* [PATCH v8 0/7] Support RAS virtualization in KVM
@ 2017-11-10 19:54 ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

This patch series mainly does the following:

1. Trap accesses to the RAS ERR* registers from Non-secure EL1 to EL2.
   KVM does only a minimal emulation: these registers are emulated as
   RAZ/WI.
2. Route synchronous External Abort exceptions from Non-secure EL0
   and EL1 to EL2. When firmware has enabled routing to EL3, the
   system traps to EL3 firmware instead of EL2 KVM. The firmware then
   checks whether EL2 routing is enabled: if it is, it jumps to EL2 KVM,
   otherwise it jumps to the EL1 host kernel.
3. Enable the APEI ARMv8 SEI notification so that the ACPI GHES driver
   can parse the CPER records for an SError. KVM calls handle_guest_sei()
   to let the ACPI driver parse the CPER record for an SError that
   happened in the guest.
4. Although the APEI driver can handle a guest SError, not all systems
   support SEI notification (kernel-first systems, for example). KVM
   therefore also classifies the error through the Exception Syndrome
   Register (ESR) and handles it according to its Asynchronous Error Type.
5. If the guest SError has not been propagated and not consumed, KVM
   returns a recoverable error status to user space; user space then
   specifies the guest ESR and injects a virtual SError. For the other
   Asynchronous Error Types, KVM either injects a virtual SError directly
   with an IMPLEMENTATION DEFINED ESR, or panics if the error is fatal.
   With the RAS extension the guest's virtual ESR must always be set,
   because an all-zero value means 'RAS error: Uncategorized' rather than
   'no valid ISS'. The ESR therefore defaults to IMPLEMENTATION DEFINED
   when user space does not specify one.
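
The classification in points 4 and 5 can be sketched in plain C. This is
only an illustration under stated assumptions, not the code from this series:
the field positions follow the architectural ESR_ELx layout for an SError
(IDS at bit 24, AET at bits [12:10], DFSC at bits [5:0]), while the enum, the
function names and the exact mapping from error type to action are
hypothetical. In particular it assumes that "fatal" means an uncontainable
error and that every remaining type is injected with an IMPLEMENTATION
DEFINED syndrome.

```c
#include <assert.h>
#include <stdint.h>

/* ESR_ELx fields for an SError (architectural layout). */
#define ESR_EC_SHIFT    26
#define ESR_EC_SERROR   0x2FU
#define ESR_IDS         (1U << 24)  /* syndrome is IMPLEMENTATION DEFINED */
#define ESR_AET_SHIFT   10
#define ESR_AET_MASK    0x7U
#define ESR_DFSC_MASK   0x3FU
#define ESR_DFSC_SERROR 0x11U       /* asynchronous SError interrupt */

/* Asynchronous Error Type encodings. */
#define AET_UC  0x0U  /* uncontainable */
#define AET_UEU 0x1U  /* unrecoverable */
#define AET_UEO 0x2U  /* restartable */
#define AET_UER 0x3U  /* recoverable */
#define AET_CE  0x6U  /* corrected */

/* Hypothetical actions, named after the cover letter's description. */
enum serror_action {
	SERROR_TO_USERSPACE,  /* report as recoverable, user space sets the ESR */
	SERROR_INJECT_IMPDEF, /* inject a virtual SError with an IMP DEF ESR */
	SERROR_PANIC,         /* fatal error */
};

/* Build a categorized SError syndrome with the given AET (test helper). */
static uint32_t serror_esr(uint32_t aet)
{
	return (ESR_EC_SERROR << ESR_EC_SHIFT) |
	       ((aet & ESR_AET_MASK) << ESR_AET_SHIFT) |
	       ESR_DFSC_SERROR;
}

static enum serror_action classify_serror(uint32_t esr)
{
	/* Without a categorized syndrome there is no usable AET field. */
	if ((esr & ESR_IDS) || (esr & ESR_DFSC_MASK) != ESR_DFSC_SERROR)
		return SERROR_INJECT_IMPDEF;

	switch ((esr >> ESR_AET_SHIFT) & ESR_AET_MASK) {
	case AET_UER:
		return SERROR_TO_USERSPACE;
	case AET_UC:
		return SERROR_PANIC;
	default:
		return SERROR_INJECT_IMPDEF;
	}
}
```

Keeping the decision in one pure function of the ESR value makes it easy to
check every encoding without triggering a real SError.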

Dongjiu Geng (5):
  acpi: apei: Add SEI notification type support for ARMv8
  KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA
  arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
  arm64: kvm: Set Virtual SError Exception Syndrome for guest
  arm64: kvm: handle SError Interrupt by categorization

James Morse (1):
  KVM: arm64: Save ESR_EL2 on guest SError

Xie XiuQi (1):
  arm64: cpufeature: Detect CPU RAS Extentions

 Documentation/virtual/kvm/api.txt    | 11 ++++++
 arch/arm/include/asm/kvm_host.h      |  1 +
 arch/arm/kvm/guest.c                 |  9 +++++
 arch/arm64/Kconfig                   | 16 +++++++++
 arch/arm64/include/asm/barrier.h     |  1 +
 arch/arm64/include/asm/cpucaps.h     |  3 +-
 arch/arm64/include/asm/esr.h         | 15 ++++++++
 arch/arm64/include/asm/kvm_arm.h     |  2 ++
 arch/arm64/include/asm/kvm_asm.h     |  3 ++
 arch/arm64/include/asm/kvm_emulate.h | 17 +++++++++
 arch/arm64/include/asm/kvm_host.h    |  2 ++
 arch/arm64/include/asm/sysreg.h      | 15 ++++++++
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/kernel/cpufeature.c       | 13 +++++++
 arch/arm64/kernel/process.c          |  3 ++
 arch/arm64/kvm/guest.c               | 14 ++++++++
 arch/arm64/kvm/handle_exit.c         | 67 +++++++++++++++++++++++++++++++++---
 arch/arm64/kvm/hyp/switch.c          | 31 +++++++++++++++--
 arch/arm64/kvm/inject_fault.c        | 13 ++++++-
 arch/arm64/kvm/reset.c               |  3 ++
 arch/arm64/kvm/sys_regs.c            | 10 ++++++
 arch/arm64/mm/fault.c                | 16 +++++++++
 drivers/acpi/apei/Kconfig            | 15 ++++++++
 drivers/acpi/apei/ghes.c             | 53 ++++++++++++++++++++++++++++
 include/acpi/ghes.h                  |  1 +
 include/uapi/linux/kvm.h             |  3 ++
 virt/kvm/arm/arm.c                   |  7 ++++
 27 files changed, 336 insertions(+), 9 deletions(-)

-- 
1.9.1


* [PATCH v8 1/7] arm64: cpufeature: Detect CPU RAS Extentions
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

From: Xie XiuQi <xiexiuqi@huawei.com>

ARM's v8.2 Extensions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending.

Add cpufeature detection and a barrier in the context-switch code.
There is no need to use alternatives for this as CPUs that don't
support this feature will treat the instruction as a nop.

Platform level RAS support may require additional firmware support.
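
As a rough illustration of the detection described above, the check reduces
to reading the 4-bit RAS field of ID_AA64PFR0_EL1 and comparing it with a
minimum value. The sketch below is a user-space approximation, not the kernel
code: the two constants match the ones this patch adds to sysreg.h, but the
helper names are hypothetical, and the register value is passed in as a
parameter instead of being read from the system register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID_AA64PFR0_RAS_SHIFT 28
#define ID_AA64PFR0_RAS_V1    0x1

/* Extract an unsigned 4-bit ID register field starting at 'shift'. */
static unsigned int id_field_unsigned(uint64_t reg, unsigned int shift)
{
	return (unsigned int)((reg >> shift) & 0xf);
}

/* True when the RAS field reports at least version 1 of the extension. */
static bool has_ras_extn(uint64_t id_aa64pfr0)
{
	return id_field_unsigned(id_aa64pfr0, ID_AA64PFR0_RAS_SHIFT) >=
	       ID_AA64PFR0_RAS_V1;
}
```

A greater-or-equal comparison, rather than an equality test, means a later
revision of the extension that reports a larger field value is still detected.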

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[Rebased, added esb and config option, reworded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig               | 16 ++++++++++++++++
 arch/arm64/include/asm/barrier.h |  1 +
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  2 ++
 arch/arm64/kernel/cpufeature.c   | 13 +++++++++++++
 arch/arm64/kernel/process.c      |  3 +++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..cc00d10 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -973,6 +973,22 @@ config ARM64_PMEM
 	  operations if DC CVAP is not supported (following the behaviour of
 	  DC CVAP itself if the system does not define a point of persistence).
 
+config ARM64_RAS_EXTN
+	bool "Enable support for RAS CPU Extensions"
+	default y
+	help
+	  CPUs that support the Reliability, Availability and Serviceability
+	  (RAS) Extensions, part of ARMv8.2 are able to track faults and
+	  errors, classify them and report them to software.
+
+	  On CPUs with these extensions system software can use additional
+	  barriers to determine if faults are pending and read the
+	  classification from a new set of registers.
+
+	  Selecting this feature will allow the kernel to use these barriers
+	  and access the new registers if the system supports the extension.
+	  Platform RAS features may additionally depend on firmware support.
+
 endmenu
 
 config ARM64_MODULE_CMODEL_LARGE
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 0fe7e43..8b0a0eb 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -30,6 +30,7 @@
 #define isb()		asm volatile("isb" : : : "memory")
 #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
 #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
+#define esb()		asm volatile("hint #16"  : : : "memory")
 
 #define mb()		dsb(sy)
 #define rmb()		dsb(ld)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da6216..4820d44 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_858921			19
 #define ARM64_WORKAROUND_CAVIUM_30115		20
 #define ARM64_HAS_DCPOP				21
+#define ARM64_HAS_RAS_EXTN			22
 
-#define ARM64_NCAPS				22
+#define ARM64_NCAPS				23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f707fed..64e2a80 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -332,6 +332,7 @@
 #define ID_AA64ISAR1_DPB_SHIFT		0
 
 /* id_aa64pfr0 */
+#define ID_AA64PFR0_RAS_SHIFT		28
 #define ID_AA64PFR0_GIC_SHIFT		24
 #define ID_AA64PFR0_ASIMD_SHIFT		20
 #define ID_AA64PFR0_FP_SHIFT		16
@@ -340,6 +341,7 @@
 #define ID_AA64PFR0_EL1_SHIFT		4
 #define ID_AA64PFR0_EL0_SHIFT		0
 
+#define ID_AA64PFR0_RAS_V1		0x1
 #define ID_AA64PFR0_FP_NI		0xf
 #define ID_AA64PFR0_FP_SUPPORTED	0x0
 #define ID_AA64PFR0_ASIMD_NI		0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 21e2c95..4846974 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -125,6 +125,7 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_RAS_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0),
 	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
 	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
@@ -900,6 +901,18 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
 		.min_field_value = 1,
 	},
 #endif
+#ifdef CONFIG_ARM64_RAS_EXTN
+	{
+		.desc = "RAS Extension Support",
+		.capability = ARM64_HAS_RAS_EXTN,
+		.def_scope = SCOPE_SYSTEM,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64PFR0_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR0_RAS_SHIFT,
+		.min_field_value = ID_AA64PFR0_RAS_V1,
+	},
+#endif /* CONFIG_ARM64_RAS_EXTN */
 	{},
 };
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 2dc0f84..5e5d2f0 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -365,6 +365,9 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	 */
 	dsb(ish);
 
+	/* Deliver any pending SError from prev */
+	esb();
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
-- 
1.9.1

* [PATCH v8 1/7] arm64: cpufeature: Detect CPU RAS Extentions
@ 2017-11-10 19:54   ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: linux-arm-kernel

From: Xie XiuQi <xiexiuqi@huawei.com>

ARM's v8.2 Extentions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending.

Add cpufeature detection and a barrier in the context-switch code.
There is no need to use alternatives for this as CPUs that don't
support this feature will treat the instruction as a nop.

Platform level RAS support may require additional firmware support.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[Rebased, added esb and config option, reworded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
---
 arch/arm64/Kconfig               | 16 ++++++++++++++++
 arch/arm64/include/asm/barrier.h |  1 +
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/include/asm/sysreg.h  |  2 ++
 arch/arm64/kernel/cpufeature.c   | 13 +++++++++++++
 arch/arm64/kernel/process.c      |  3 +++
 6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..cc00d10 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -973,6 +973,22 @@ config ARM64_PMEM
 	  operations if DC CVAP is not supported (following the behaviour of
 	  DC CVAP itself if the system does not define a point of persistence).
 
+config ARM64_RAS_EXTN
+	bool "Enable support for RAS CPU Extensions"
+	default y
+	help
+	  CPUs that support the Reliability, Availability and Serviceability
+	  (RAS) Extensions, part of ARMv8.2 are able to track faults and
+	  errors, classify them and report them to software.
+
+	  On CPUs with these extensions system software can use additional
+	  barriers to determine if faults are pending and read the
+	  classification from a new set of registers.
+
+	  Selecting this feature will allow the kernel to use these barriers
+	  and access the new registers if the system supports the extension.
+	  Platform RAS features may additionally depend on firmware support.
+
 endmenu
 
 config ARM64_MODULE_CMODEL_LARGE
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 0fe7e43..8b0a0eb 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -30,6 +30,7 @@
 #define isb()		asm volatile("isb" : : : "memory")
 #define dmb(opt)	asm volatile("dmb " #opt : : : "memory")
 #define dsb(opt)	asm volatile("dsb " #opt : : : "memory")
+#define esb()		asm volatile("hint #16"  : : : "memory")
 
 #define mb()		dsb(sy)
 #define rmb()		dsb(ld)
diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da6216..4820d44 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_858921			19
 #define ARM64_WORKAROUND_CAVIUM_30115		20
 #define ARM64_HAS_DCPOP				21
+#define ARM64_HAS_RAS_EXTN			22
 
-#define ARM64_NCAPS				22
+#define ARM64_NCAPS				23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index f707fed..64e2a80 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -332,6 +332,7 @@
 #define ID_AA64ISAR1_DPB_SHIFT		0
 
 /* id_aa64pfr0 */
+#define ID_AA64PFR0_RAS_SHIFT		28
 #define ID_AA64PFR0_GIC_SHIFT		24
 #define ID_AA64PFR0_ASIMD_SHIFT		20
 #define ID_AA64PFR0_FP_SHIFT		16
@@ -340,6 +341,7 @@
 #define ID_AA64PFR0_EL1_SHIFT		4
 #define ID_AA64PFR0_EL0_SHIFT		0
 
+#define ID_AA64PFR0_RAS_V1		0x1
 #define ID_AA64PFR0_FP_NI		0xf
 #define ID_AA64PFR0_FP_SUPPORTED	0x0
 #define ID_AA64PFR0_ASIMD_NI		0xf
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 21e2c95..4846974 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -125,6 +125,7 @@ static int __init register_cpu_hwcaps_dumper(void)
 };
 
 static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
+	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_RAS_SHIFT, 4, 0),
 	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_EXACT, ID_AA64PFR0_GIC_SHIFT, 4, 0),
 	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_ASIMD_SHIFT, 4, ID_AA64PFR0_ASIMD_NI),
 	S_ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_FP_SHIFT, 4, ID_AA64PFR0_FP_NI),
@@ -900,6 +901,18 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
 		.min_field_value = 1,
 	},
 #endif
+#ifdef CONFIG_ARM64_RAS_EXTN
+	{
+		.desc = "RAS Extension Support",
+		.capability = ARM64_HAS_RAS_EXTN,
+		.def_scope = SCOPE_SYSTEM,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64PFR0_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64PFR0_RAS_SHIFT,
+		.min_field_value = ID_AA64PFR0_RAS_V1,
+	},
+#endif /* CONFIG_ARM64_RAS_EXTN */
 	{},
 };
 
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 2dc0f84..5e5d2f0 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -365,6 +365,9 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
 	 */
 	dsb(ish);
 
+	/* Deliver any pending SError from prev */
+	esb();
+
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread
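
The capability entry added by the patch above matches when the unsigned
4-bit RAS field of ID_AA64PFR0_EL1, starting at bit 28, is at least
ID_AA64PFR0_RAS_V1. A minimal user-space sketch of that comparison
follows. The helper names id_field_unsigned() and has_ras_extn() are
hypothetical and exist only for illustration; in the kernel the check is
done through has_cpuid_feature().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch. */
#define ID_AA64PFR0_RAS_SHIFT	28
#define ID_AA64PFR0_RAS_V1	0x1

/* Extract an unsigned 4-bit ID register field that starts at bit 'shift'. */
static unsigned int id_field_unsigned(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);	/* a 4-bit field must fit in 64 bits */
	return (unsigned int)((reg >> shift) & 0xf);
}

/* True if the register value advertises at least v1 of the RAS extension. */
static bool has_ras_extn(uint64_t id_aa64pfr0)
{
	return id_field_unsigned(id_aa64pfr0, ID_AA64PFR0_RAS_SHIFT) >=
	       ID_AA64PFR0_RAS_V1;
}
```

A register value of 0x10000000 has the RAS field set to 1, so
has_ras_extn() returns true; with the field at zero it returns false.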

* [PATCH v8 2/7] KVM: arm64: Save ESR_EL2 on guest SError
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

From: James Morse <james.morse@arm.com>

When we exit a guest due to an SError the vcpu fault info isn't updated
with the ESR. Today this is only done for traps.

The v8.2 RAS Extensions define ISS values for SError. Update the vcpu's
fault_info with the ESR on SError so that handle_exit() can determine
if this was a RAS SError and decode its severity.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 arch/arm64/kvm/hyp/switch.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 945e79c..c6f17c7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -226,13 +226,20 @@ static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar)
 	return true;
 }
 
+static void __hyp_text __populate_fault_info_esr(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.fault.esr_el2 = read_sysreg_el2(esr);
+}
+
 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 {
-	u64 esr = read_sysreg_el2(esr);
-	u8 ec = ESR_ELx_EC(esr);
+	u8 ec;
+	u64 esr;
 	u64 hpfar, far;
 
-	vcpu->arch.fault.esr_el2 = esr;
+	__populate_fault_info_esr(vcpu);
+	esr = vcpu->arch.fault.esr_el2;
+	ec = ESR_ELx_EC(esr);
 
 	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
 		return true;
@@ -321,6 +328,8 @@ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 */
 	if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
 		goto again;
+	else if (ARM_EXCEPTION_CODE(exit_code) == ARM_EXCEPTION_EL1_SERROR)
+		__populate_fault_info_esr(vcpu);
 
 	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
 	    exit_code == ARM_EXCEPTION_TRAP) {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread
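
Once ESR_EL2 is saved on an SError exit, a later handler can inspect the
syndrome to decide how severe the error was. The sketch below shows one
way such decoding could look in plain C. The field positions (EC in bits
[31:26], IDS in bit 24, AET in bits [12:10], DFSC in bits [5:0]) are
taken from the architected SError syndrome layout and are assumptions
here, since this patch does not define them. serror_severity() and the
enum are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_EC_SHIFT	26
#define ESR_EC_MASK	0x3fULL
#define ESR_EC_SERROR	0x2f		/* exception class: SError interrupt */
#define ESR_IDS		(1ULL << 24)	/* ISS is IMPLEMENTATION DEFINED */
#define ESR_DFSC_MASK	0x3fULL
#define ESR_DFSC_SERROR	0x11		/* asynchronous SError interrupt */
#define ESR_AET_SHIFT	10
#define ESR_AET_MASK	0x7ULL

/* Compile-time sanity check: the class must fit in the 6-bit EC field. */
static_assert(ESR_EC_SERROR <= ESR_EC_MASK, "EC value out of range");

enum serror_severity {
	SERROR_NOT_RAS,		/* not an SError, or no architected syndrome */
	SERROR_UNCONTAINABLE,	/* AET == 0 */
	SERROR_UNRECOVERABLE,	/* AET == 1 */
	SERROR_RESTARTABLE,	/* AET == 2 */
	SERROR_RECOVERABLE,	/* AET == 3 */
	SERROR_CORRECTED,	/* AET == 6 */
};

static enum serror_severity serror_severity(uint64_t esr)
{
	if (((esr >> ESR_EC_SHIFT) & ESR_EC_MASK) != ESR_EC_SERROR)
		return SERROR_NOT_RAS;

	/* AET is only meaningful for an architected SError syndrome. */
	if ((esr & ESR_IDS) || (esr & ESR_DFSC_MASK) != ESR_DFSC_SERROR)
		return SERROR_NOT_RAS;

	switch ((esr >> ESR_AET_SHIFT) & ESR_AET_MASK) {
	case 0:
		return SERROR_UNCONTAINABLE;
	case 1:
		return SERROR_UNRECOVERABLE;
	case 2:
		return SERROR_RESTARTABLE;
	case 3:
		return SERROR_RECOVERABLE;
	case 6:
		return SERROR_CORRECTED;
	default:
		return SERROR_NOT_RAS;	/* reserved encodings */
	}
}
```

A syndrome with the IDS bit set carries an IMPLEMENTATION DEFINED ISS, so
the sketch deliberately refuses to classify it.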

* [PATCH v8 3/7] acpi: apei: Add SEI notification type support for ARMv8
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

ARMv8.2 requires implementation of the RAS extension. To go with it,
APEI provides an SEI (SError Interrupt) notification type. This patch
adds GHES handling functions for this new error source; it is parsed
and handled in the same way as SEA.

Expose ghes_notify_sei() to external users, who can call it to parse
the APEI table and handle an SEI notification.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 drivers/acpi/apei/Kconfig | 15 ++++++++++++++
 drivers/acpi/apei/ghes.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/ghes.h       |  1 +
 3 files changed, 69 insertions(+)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 52ae543..ff4afc3 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,21 @@ config ACPI_APEI_SEA
 	  option allows the OS to look for such hardware error record, and
 	  take appropriate action.
 
+config ACPI_APEI_SEI
+	bool "APEI SError (System Error) Interrupt logging/recovering support"
+	depends on ARM64 && ACPI_APEI_GHES
+	default y
+	help
+	  This option should be enabled if the system supports
+	  firmware-first handling of SEI (SError Interrupt).
+
+	  On ARMv8 systems an SEI reports an asynchronous external abort,
+	  such as an error on a device memory read. If a system supports
+	  firmware-first handling of SEI, the platform analyzes and handles
+	  hardware error notifications from SEI, and it may then form a
+	  hardware error record for the OS to parse and handle. This option
+	  allows the OS to look for such records and take appropriate action.
+
 config ACPI_APEI_MEMORY_FAILURE
 	bool "APEI memory error recovering support"
 	depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6a3f824..67cd3a7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -855,6 +855,46 @@ static inline void ghes_sea_add(struct ghes *ghes) { }
 static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
+#ifdef CONFIG_ACPI_APEI_SEI
+static LIST_HEAD(ghes_sei);
+
+/*
+ * Return 0 only if one of the SEI error sources successfully reported an error
+ * record sent from the firmware.
+ */
+int ghes_notify_sei(void)
+{
+	struct ghes *ghes;
+	int ret = -ENOENT;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, &ghes_sei, list) {
+		if (!ghes_proc(ghes))
+			ret = 0;
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
+static void ghes_sei_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_add_rcu(&ghes->list, &ghes_sei);
+	mutex_unlock(&ghes_list_mutex);
+}
+
+static void ghes_sei_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	mutex_unlock(&ghes_list_mutex);
+	synchronize_rcu();
+}
+#else /* CONFIG_ACPI_APEI_SEI */
+static inline void ghes_sei_add(struct ghes *ghes) { }
+static inline void ghes_sei_remove(struct ghes *ghes) { }
+#endif /* CONFIG_ACPI_APEI_SEI */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1086,6 +1126,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			goto err;
 		}
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
+				generic->header.source_id);
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1158,6 +1205,9 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_SEA:
 		ghes_sea_add(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		ghes_sei_add(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1211,6 +1261,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_SEA:
 		ghes_sea_remove(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		ghes_sei_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c8..9ba59e2 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct acpi_hest_generic_data *gdata)
 	     section = acpi_hest_get_next(section))
 
 int ghes_notify_sea(void);
+int ghes_notify_sei(void);
 
 #endif /* GHES_H */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread
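
ghes_notify_sei() in the patch above visits every registered SEI source
and returns 0 if at least one of them processed a record, and -ENOENT
otherwise. The user-space sketch below mimics that return convention,
with a plain array standing in for the RCU-protected list. struct
fake_source, fake_proc() and notify_all() are hypothetical stand-ins
invented for this illustration; they are not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_source {
	int has_record;	/* non-zero if firmware left a record for this source */
	int processed;	/* set once the record has been consumed */
};

/* Stand-in for ghes_proc(): 0 on success, a negative errno otherwise. */
static int fake_proc(struct fake_source *src)
{
	if (!src->has_record)
		return -ENOENT;
	src->has_record = 0;
	src->processed = 1;
	return 0;
}

/* Visit every source; succeed if any one of them yielded a record. */
static int notify_all(struct fake_source *srcs, size_t n)
{
	int ret = -ENOENT;
	size_t i;

	assert(srcs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!fake_proc(&srcs[i]))
			ret = 0;
	}
	return ret;
}
```

As in the patch, the loop does not stop at the first success, so every
source gets a chance to process its record. A caller can treat a
non-zero return as "no firmware record found" and fall back to another
way of handling the error.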

* [Devel] [PATCH v8 3/7] acpi: apei: Add SEI notification type support for ARMv8
@ 2017-11-10 19:54   ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: devel

ARMv8.2 requires implementation of the RAS extension. To go with it,
APEI provides an SEI (SError Interrupt) notification type. This patch
adds GHES handling functions for this new error source; it is parsed
and handled in the same way as SEA.

Expose ghes_notify_sei() to external users, who can call it to parse
the APEI table and handle an SEI notification.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
---
 drivers/acpi/apei/Kconfig | 15 ++++++++++++++
 drivers/acpi/apei/ghes.c  | 53 +++++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/ghes.h       |  1 +
 3 files changed, 69 insertions(+)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 52ae543..ff4afc3 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -55,6 +55,21 @@ config ACPI_APEI_SEA
 	  option allows the OS to look for such hardware error record, and
 	  take appropriate action.
 
+config ACPI_APEI_SEI
+	bool "APEI SError (System Error) Interrupt logging/recovering support"
+	depends on ARM64 && ACPI_APEI_GHES
+	default y
+	help
+	  This option should be enabled if the system supports
+	  firmware-first handling of SEI (SError Interrupt).
+
+	  On ARMv8 systems an SEI reports an asynchronous external abort,
+	  such as an error on a device memory read. If a system supports
+	  firmware-first handling of SEI, the platform analyzes and handles
+	  hardware error notifications from SEI, and it may then form a
+	  hardware error record for the OS to parse and handle. This option
+	  allows the OS to look for such records and take appropriate action.
+
 config ACPI_APEI_MEMORY_FAILURE
 	bool "APEI memory error recovering support"
 	depends on ACPI_APEI && MEMORY_FAILURE
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 6a3f824..67cd3a7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -855,6 +855,46 @@ static inline void ghes_sea_add(struct ghes *ghes) { }
 static inline void ghes_sea_remove(struct ghes *ghes) { }
 #endif /* CONFIG_ACPI_APEI_SEA */
 
+#ifdef CONFIG_ACPI_APEI_SEI
+static LIST_HEAD(ghes_sei);
+
+/*
+ * Return 0 only if one of the SEI error sources successfully reported an error
+ * record sent from the firmware.
+ */
+int ghes_notify_sei(void)
+{
+	struct ghes *ghes;
+	int ret = -ENOENT;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ghes, &ghes_sei, list) {
+		if (!ghes_proc(ghes))
+			ret = 0;
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
+static void ghes_sei_add(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_add_rcu(&ghes->list, &ghes_sei);
+	mutex_unlock(&ghes_list_mutex);
+}
+
+static void ghes_sei_remove(struct ghes *ghes)
+{
+	mutex_lock(&ghes_list_mutex);
+	list_del_rcu(&ghes->list);
+	mutex_unlock(&ghes_list_mutex);
+	synchronize_rcu();
+}
+#else /* CONFIG_ACPI_APEI_SEI */
+static inline void ghes_sei_add(struct ghes *ghes) { }
+static inline void ghes_sei_remove(struct ghes *ghes) { }
+#endif /* CONFIG_ACPI_APEI_SEI */
+
 #ifdef CONFIG_HAVE_ACPI_APEI_NMI
 /*
  * printk is not safe in NMI context.  So in NMI handler, we allocate
@@ -1086,6 +1126,13 @@ static int ghes_probe(struct platform_device *ghes_dev)
 			goto err;
 		}
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		if (!IS_ENABLED(CONFIG_ACPI_APEI_SEI)) {
+			pr_warn(GHES_PFX "Generic hardware error source: %d notified via SEI is not supported!\n",
+				generic->header.source_id);
+			goto err;
+		}
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		if (!IS_ENABLED(CONFIG_HAVE_ACPI_APEI_NMI)) {
 			pr_warn(GHES_PFX "Generic hardware error source: %d notified via NMI interrupt is not supported!\n",
@@ -1158,6 +1205,9 @@ static int ghes_probe(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_SEA:
 		ghes_sea_add(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		ghes_sei_add(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_add(ghes);
 		break;
@@ -1211,6 +1261,9 @@ static int ghes_remove(struct platform_device *ghes_dev)
 	case ACPI_HEST_NOTIFY_SEA:
 		ghes_sea_remove(ghes);
 		break;
+	case ACPI_HEST_NOTIFY_SEI:
+		ghes_sei_remove(ghes);
+		break;
 	case ACPI_HEST_NOTIFY_NMI:
 		ghes_nmi_remove(ghes);
 		break;
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index 8feb0c8..9ba59e2 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -120,5 +120,6 @@ static inline void *acpi_hest_get_next(struct acpi_hest_generic_data *gdata)
 	     section = acpi_hest_get_next(section))
 
 int ghes_notify_sea(void);
+int ghes_notify_sei(void);
 
 #endif /* GHES_H */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v8 4/7] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

ARMv8.2 adds a new bit HCR_EL2.TEA which routes synchronous external
aborts to EL2, and adds a trap control bit HCR_EL2.TERR which traps
all Non-secure EL1&0 error record accesses to EL2.

This patch enables the two bits for the guest OS, guaranteeing that
KVM takes external aborts and traps attempts to access the physical
error registers.
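
As a rough illustration of the two bits, the sketch below mirrors what
vcpu_reset_hcr() does in this patch. It is a user-space approximation: the
bit positions come from the patch, while hcr_with_ras_traps() and its
has_ras parameter are hypothetical names standing in for the
cpus_have_const_cap(ARM64_HAS_RAS_EXTN) check.

```c
#include <stdint.h>

/* Bit positions as defined by this patch in kvm_arm.h. */
#define HCR_TEA		(UINT64_C(1) << 37)	/* route sync external aborts to EL2 */
#define HCR_TERR	(UINT64_C(1) << 36)	/* trap error record accesses to EL2 */

/*
 * Hypothetical helper: return the guest HCR_EL2 value with both RAS
 * trap bits set when the CPU implements the RAS extension.
 */
static uint64_t hcr_with_ras_traps(uint64_t hcr, int has_ras)
{
	if (has_ras)
		hcr |= HCR_TEA | HCR_TERR;
	return hcr;
}
```

Without the RAS extension the value is returned unchanged, so neither bit is
set on CPUs that do not implement it.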

ERRIDR_EL1 advertises the number of error records. We return
zero, which means we can treat all the other registers as RAZ/WI too.
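
For readers unfamiliar with RAZ/WI (read-as-zero, write-ignored), the
behaviour can be sketched as follows. This is a simplified model and not the
kernel's trap_raz_wi(): struct mock_sysreg_access and mock_trap_raz_wi() are
hypothetical names, and the vcpu argument is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of a trapped system register access. */
struct mock_sysreg_access {
	bool is_write;		/* true for MSR (write), false for MRS (read) */
	uint64_t regval;	/* value written by, or returned to, the guest */
};

/*
 * RAZ/WI: a guest read returns zero and a guest write is dropped.
 * Returning true means the access was handled and the guest can resume.
 */
static bool mock_trap_raz_wi(struct mock_sysreg_access *p)
{
	if (!p->is_write)
		p->regval = 0;	/* read-as-zero */
	/* write-ignored: nothing is stored anywhere */
	return true;
}
```

With this model a guest that reads ERRIDR_EL1 sees zero error records, so it
has no reason to touch the ERX* registers.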

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
[removed specific emulation, use trap_raz_wi() directly for everything,
 rephrased parts of the commit message]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/kvm_arm.h     |  2 ++
 arch/arm64/include/asm/kvm_emulate.h |  7 +++++++
 arch/arm64/include/asm/sysreg.h      | 10 ++++++++++
 arch/arm64/kvm/sys_regs.c            | 10 ++++++++++
 4 files changed, 29 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 61d694c..1188272 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -23,6 +23,8 @@
 #include <asm/types.h>
 
 /* Hyp Configuration Register (HCR) bits */
+#define HCR_TEA		(UL(1) << 37)
+#define HCR_TERR	(UL(1) << 36)
 #define HCR_E2H		(UL(1) << 34)
 #define HCR_ID		(UL(1) << 33)
 #define HCR_CD		(UL(1) << 32)
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index e5df3fc..555b28b 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -47,6 +47,13 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
 	if (is_kernel_in_hyp_mode())
 		vcpu->arch.hcr_el2 |= HCR_E2H;
+	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+		/* route synchronous external abort exceptions to EL2 */
+		vcpu->arch.hcr_el2 |= HCR_TEA;
+		/* trap error record accesses */
+		vcpu->arch.hcr_el2 |= HCR_TERR;
+	}
+
 	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
 		vcpu->arch.hcr_el2 &= ~HCR_RW;
 }
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 64e2a80..47b967d 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -169,6 +169,16 @@
 #define SYS_AFSR0_EL1			sys_reg(3, 0, 5, 1, 0)
 #define SYS_AFSR1_EL1			sys_reg(3, 0, 5, 1, 1)
 #define SYS_ESR_EL1			sys_reg(3, 0, 5, 2, 0)
+
+#define SYS_ERRIDR_EL1			sys_reg(3, 0, 5, 3, 0)
+#define SYS_ERRSELR_EL1			sys_reg(3, 0, 5, 3, 1)
+#define SYS_ERXFR_EL1			sys_reg(3, 0, 5, 4, 0)
+#define SYS_ERXCTLR_EL1			sys_reg(3, 0, 5, 4, 1)
+#define SYS_ERXSTATUS_EL1		sys_reg(3, 0, 5, 4, 2)
+#define SYS_ERXADDR_EL1			sys_reg(3, 0, 5, 4, 3)
+#define SYS_ERXMISC0_EL1		sys_reg(3, 0, 5, 5, 0)
+#define SYS_ERXMISC1_EL1		sys_reg(3, 0, 5, 5, 1)
+
 #define SYS_FAR_EL1			sys_reg(3, 0, 6, 0, 0)
 #define SYS_PAR_EL1			sys_reg(3, 0, 7, 4, 0)
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 2e070d3..2b1fafa 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -953,6 +953,16 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
 	{ SYS_DESC(SYS_AFSR0_EL1), access_vm_reg, reset_unknown, AFSR0_EL1 },
 	{ SYS_DESC(SYS_AFSR1_EL1), access_vm_reg, reset_unknown, AFSR1_EL1 },
 	{ SYS_DESC(SYS_ESR_EL1), access_vm_reg, reset_unknown, ESR_EL1 },
+
+	{ SYS_DESC(SYS_ERRIDR_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERRSELR_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERXFR_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERXCTLR_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERXSTATUS_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERXADDR_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERXMISC0_EL1), trap_raz_wi },
+	{ SYS_DESC(SYS_ERXMISC1_EL1), trap_raz_wi },
+
 	{ SYS_DESC(SYS_FAR_EL1), access_vm_reg, reset_unknown, FAR_EL1 },
 	{ SYS_DESC(SYS_PAR_EL1), NULL, reset_unknown, PAR_EL1 },
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v8 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

The ARM64 RAS SError Interrupt (SEI) syndrome value is specific to the
guest, and user space needs a way to tell KVM this value, so add a new
ioctl. Before user space specifies the Exception Syndrome Register (ESR)
value, it first checks whether KVM has the capability to set the guest
ESR. If so, it sets the value; otherwise it does nothing.

Specifying the ESR is supported only for AArch64, not for AArch32.
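
The API documentation added by this patch says that userspace supplies only
the ISS field and cannot choose the EC field. The sketch below shows how such
a value would appear in an AArch64 ESR, assuming the architectural layout
(EC in bits [31:26], IL in bit 25, ISS in bits [24:0], EC 0x2F for SError).
serror_esr_from_iss() is a hypothetical helper for illustration only; it is
not part of this series.

```c
#include <stdint.h>

#define ESR_EC_SHIFT	26
#define ESR_EC_SERROR	UINT32_C(0x2F)		/* EC value for SError interrupt */
#define ESR_IL		(UINT32_C(1) << 25)	/* instruction length bit */
#define ESR_ISS_MASK	UINT32_C(0x01FFFFFF)	/* ISS occupies bits [24:0] */

/*
 * Hypothetical helper: build the ESR value a guest would observe for a
 * virtual SError. Only the ISS bits of the user-supplied syndrome are
 * kept; EC and IL are fixed and cannot be influenced by userspace.
 */
static uint32_t serror_esr_from_iss(uint32_t user_syndrome)
{
	return (ESR_EC_SERROR << ESR_EC_SHIFT) | ESR_IL |
	       (user_syndrome & ESR_ISS_MASK);
}
```

Masking with ESR_ISS_MASK is what keeps any EC bits that userspace may have
set from leaking into the reported syndrome.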

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Quanming Wu <wuquanming@huawei.com>

Changed the name to KVM_CAP_ARM_INJECT_SERROR_ESR instead of
XXXXX_ARM_RAS_EXTENSION, as suggested here:

https://patchwork.kernel.org/patch/9925203/
---
 Documentation/virtual/kvm/api.txt | 11 +++++++++++
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/guest.c              |  9 +++++++++
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/guest.c            |  5 +++++
 arch/arm64/kvm/reset.c            |  3 +++
 include/uapi/linux/kvm.h          |  3 +++
 virt/kvm/arm/arm.c                |  7 +++++++
 8 files changed, 40 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e63a35f..6dfb9fc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4347,3 +4347,14 @@ This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr.  Its
 value is used to denote the target vcpu for a SynIC interrupt.  For
 compatibilty, KVM initializes this msr to KVM's internal vcpu index.  When this
 capability is absent, userspace can still query this msr's value.
+
+8.13 KVM_CAP_ARM_INJECT_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify the syndrome value reported
+to the guest OS when the guest takes a virtual SError interrupt exception.
+If KVM has this capability, userspace can specify only the ISS field of the ESR
+syndrome; it cannot specify the EC field, which is not under KVM's control.
+If this virtual SError is taken to EL1 using AArch64, this value is reported
+in the ISS field of ESR_EL1.
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 4a879f6..6cf5c7b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -211,6 +211,7 @@ struct kvm_vcpu_stat {
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome);
 unsigned long kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 1e0784e..1e15fa2 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -248,6 +248,15 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/*
+ * Specifying the guest SError syndrome is supported
+ * only on ARM64, not on ARM32.
+ */
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
+{
+	return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e923b58..769cc58 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -317,6 +317,7 @@ struct kvm_vcpu_stat {
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5c7f657..738ae90 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
+{
+	return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..38c8a64 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PMU_V3:
 		r = kvm_arm_support_pmu_v3();
 		break;
+	case KVM_CAP_ARM_INJECT_SERROR_ESR:
+		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7e99999..0c861c4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -931,6 +931,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SMT_POSSIBLE 147
 #define KVM_CAP_HYPERV_SYNIC2 148
 #define KVM_CAP_HYPERV_VP_INDEX 149
+#define KVM_CAP_ARM_INJECT_SERROR_ESR 150
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1357,6 +1358,8 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_S390_CMMA_MIGRATION */
 #define KVM_S390_GET_CMMA_BITS      _IOWR(KVMIO, 0xb8, struct kvm_s390_cmma_log)
 #define KVM_S390_SET_CMMA_BITS      _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
+/* Available with KVM_CAP_ARM_INJECT_SERROR_ESR */
+#define KVM_ARM_INJECT_SERROR_ESR   _IOW(KVMIO,  0xba, __u32)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 95cba07..60dea5f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1027,6 +1027,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 			return -EFAULT;
 		return kvm_arm_vcpu_has_attr(vcpu, &attr);
 	}
+	case KVM_ARM_INJECT_SERROR_ESR: {
+		u32 syndrome;
+
+		if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
+			return -EFAULT;
+		return kvm_arm_set_sei_esr(vcpu, &syndrome);
+	}
 	default:
 		return -EINVAL;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v8 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl
@ 2017-11-10 19:54   ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

The ARM64 RAS SError Interrupt(SEI) syndrome value is specific to the
guest and user space needs a way to tell KVM this value. So we add a
new ioctl. Before user space specifies the Exception Syndrome Register
ESR(ESR), it firstly checks that whether KVM has the capability to
set the guest ESR, If has, will set it. Otherwise, nothing to do.

For this ESR specifying, Only support for AArch64, not support AArch32.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Quanming Wu <wuquanming@huawei.com>

change the name to KVM_CAP_ARM_INJECT_SERROR_ESR instead of
XXXXX_ARM_RAS_EXTENSION, suggested here

https://patchwork.kernel.org/patch/9925203/
---
 Documentation/virtual/kvm/api.txt | 11 +++++++++++
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/guest.c              |  9 +++++++++
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/guest.c            |  5 +++++
 arch/arm64/kvm/reset.c            |  3 +++
 include/uapi/linux/kvm.h          |  3 +++
 virt/kvm/arm/arm.c                |  7 +++++++
 8 files changed, 40 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index e63a35f..6dfb9fc 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4347,3 +4347,14 @@ This capability indicates that userspace can load HV_X64_MSR_VP_INDEX msr.  Its
 value is used to denote the target vcpu for a SynIC interrupt.  For
 compatibilty, KVM initializes this msr to KVM's internal vcpu index.  When this
 capability is absent, userspace can still query this msr's value.
+
+8.13 KVM_CAP_ARM_SET_SERROR_ESR
+
+Architectures: arm, arm64
+
+This capability indicates that userspace can specify syndrome value reported to
+guest OS when guest takes a virtual SError interrupt exception.
+If KVM has this capability, userspace can only specify the ISS field for the ESR
+syndrome, can not specify the EC field which is not under control by KVM.
+If this virtual SError is taken to EL1 using AArch64, this value will be reported
+into ISS filed of ESR_EL1
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 4a879f6..6cf5c7b 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -211,6 +211,7 @@ struct kvm_vcpu_stat {
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome);
 unsigned long kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 
diff --git a/arch/arm/kvm/guest.c b/arch/arm/kvm/guest.c
index 1e0784e..1e15fa2 100644
--- a/arch/arm/kvm/guest.c
+++ b/arch/arm/kvm/guest.c
@@ -248,6 +248,15 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+/*
+ * we only support guest SError syndrome specifying
+ * for ARM64, not support it for ARM32.
+ */
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
+{
+	return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	switch (read_cpuid_part()) {
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e923b58..769cc58 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -317,6 +317,7 @@ struct kvm_vcpu_stat {
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5c7f657..738ae90 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -277,6 +277,11 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 	return -EINVAL;
 }
 
+int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
+{
+	return -EINVAL;
+}
+
 int __attribute_const__ kvm_target_cpu(void)
 {
 	unsigned long implementor = read_cpuid_implementor();
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 3256b92..38c8a64 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -77,6 +77,9 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PMU_V3:
 		r = kvm_arm_support_pmu_v3();
 		break;
+	case KVM_CAP_ARM_INJECT_SERROR_ESR:
+		r = cpus_have_const_cap(ARM64_HAS_RAS_EXTN);
+		break;
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 		r = 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7e99999..0c861c4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -931,6 +931,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SMT_POSSIBLE 147
 #define KVM_CAP_HYPERV_SYNIC2 148
 #define KVM_CAP_HYPERV_VP_INDEX 149
+#define KVM_CAP_ARM_INJECT_SERROR_ESR 150
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1357,6 +1358,8 @@ struct kvm_s390_ucas_mapping {
 /* Available with KVM_CAP_S390_CMMA_MIGRATION */
 #define KVM_S390_GET_CMMA_BITS      _IOWR(KVMIO, 0xb8, struct kvm_s390_cmma_log)
 #define KVM_S390_SET_CMMA_BITS      _IOW(KVMIO, 0xb9, struct kvm_s390_cmma_log)
+/* Available with KVM_CAP_ARM_INJECT_SERROR_ESR */
+#define KVM_ARM_INJECT_SERROR_ESR   _IOW(KVMIO,  0xba, __u32)
 
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 95cba07..60dea5f 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1027,6 +1027,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 			return -EFAULT;
 		return kvm_arm_vcpu_has_attr(vcpu, &attr);
 	}
+	case KVM_ARM_INJECT_SERROR_ESR: {
+		u32 syndrome;
+
+		if (copy_from_user(&syndrome, argp, sizeof(syndrome)))
+			return -EFAULT;
+		return kvm_arm_set_sei_esr(vcpu, &syndrome);
+	}
 	default:
 		return -EINVAL;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread
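
A minimal userspace sketch of the flow this patch describes: probe the capability first, and only then issue the ioctl on the vCPU file descriptor. The capability number (150) and ioctl number (0xba) are the values proposed in this patch; they are redefined here under hypothetical SKETCH_ names on the assumption that the installed <linux/kvm.h> does not carry them. The helper sketch_inject_serror_esr() is illustrative and not part of the series.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* KVM's ioctl type byte and KVM_CHECK_EXTENSION, as in <linux/kvm.h>. */
#define SKETCH_KVMIO                 0xAE
#define SKETCH_KVM_CHECK_EXTENSION   _IO(SKETCH_KVMIO, 0x03)

/* Values proposed by this patch (assumption: absent from installed headers). */
#define SKETCH_KVM_CAP_ARM_INJECT_SERROR_ESR  150
#define SKETCH_KVM_ARM_INJECT_SERROR_ESR      _IOW(SKETCH_KVMIO, 0xba, uint32_t)

/*
 * Probe the capability on kvm_fd, then ask KVM to pend a virtual SError
 * on vcpu_fd with the given syndrome. Returns 0 on success and -1 if the
 * capability is missing or either ioctl fails.
 */
static int sketch_inject_serror_esr(int kvm_fd, int vcpu_fd, uint32_t syndrome)
{
	int has_cap = ioctl(kvm_fd, SKETCH_KVM_CHECK_EXTENSION,
			    (unsigned long)SKETCH_KVM_CAP_ARM_INJECT_SERROR_ESR);

	if (has_cap <= 0)
		return -1;	/* capability absent: nothing to do */

	if (ioctl(vcpu_fd, SKETCH_KVM_ARM_INJECT_SERROR_ESR, &syndrome) < 0)
		return -1;

	return 0;
}
```

With only this patch applied, the arm64 kvm_arm_set_sei_esr() is still a stub returning -EINVAL, so the second ioctl is expected to fail until the following patch in the series fills it in.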

* [PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

The RAS Extension adds a VSESR_EL2 register, which provides the
syndrome value reported to software on taking a virtual SError
interrupt exception. This patch adds support for specifying this
syndrome.

With the RAS Extensions we cannot leave the SError syndrome
all-zero, because that value means 'RAS error: Uncategorized' rather
than 'no valid ISS'. So set it to an IMPLEMENTATION DEFINED syndrome
by default.

Userspace also needs to be able to specify a valid syndrome value,
because in some cases the recovery is driven by userspace. This
patch lets userspace specify it.
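
The default-versus-userspace choice described above can be modelled outside the kernel. The sketch below mirrors the logic of pend_guest_serror() and kvm_arm_set_sei_esr() from the diff in this patch; struct sketch_vcpu and the SKETCH_ constants are hypothetical stand-ins for the kernel's vcpu state and for ESR_ELx_ISV, ESR_ELx_ISS_MASK and HCR_VSE.

```c
#include <stdint.h>

/* Bit positions matching ESR_ELx_ISV, ESR_ELx_ISS_MASK and HCR_VSE. */
#define SKETCH_ESR_ELx_ISV       (UINT64_C(1) << 24)
#define SKETCH_ESR_ELx_ISS_MASK  ((UINT64_C(1) << 25) - 1)
#define SKETCH_HCR_VSE           (UINT64_C(1) << 8)

/* Hypothetical stand-in for the two pieces of vcpu state involved. */
struct sketch_vcpu {
	uint64_t hcr_el2;
	uint32_t vsesr_el2;
};

/* Mirrors pend_guest_serror(): record the syndrome, then pend the SError. */
static void sketch_pend_guest_serror(struct sketch_vcpu *vcpu, uint64_t esr)
{
	vcpu->vsesr_el2 = (uint32_t)esr;
	vcpu->hcr_el2 |= SKETCH_HCR_VSE;
}

/*
 * Mirrors kvm_arm_set_sei_esr(): start from the IMPDEF default (ISV set,
 * remaining ISS zero) and override it with the userspace value, masked to
 * ISS[24:0], whenever that value is non-zero.
 */
static int sketch_set_sei_esr(struct sketch_vcpu *vcpu, uint32_t syndrome)
{
	sketch_pend_guest_serror(vcpu, SKETCH_ESR_ELx_ISV);

	if (syndrome)
		vcpu->vsesr_el2 = (uint32_t)(syndrome & SKETCH_ESR_ELx_ISS_MASK);

	return 0;
}
```

One thing the model makes visible: as posted, a non-zero argument whose low 25 bits are all clear (for example 0x80000000) passes the non-zero test and is then masked to zero, so it appears to end up as the all-zero 'Uncategorized' syndrome rather than the IMPDEF default.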

During the guest/host world switch, restore this value to VSESR_EL2
only when HCR_EL2.VSE is set. The value does not need to be saved,
because it is stale by the time the guest exits.
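
The restore condition in the paragraph above can be sketched as a small standalone model of __sysreg_set_vsesr(). struct sketch_cpu is a hypothetical stand-in: has_ras_extn plays the role of cpus_have_const_cap(ARM64_HAS_RAS_EXTN) and vsesr_el2_hw plays the role of the hardware register.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HCR_VSE  (UINT64_C(1) << 8)	/* same bit as HCR_VSE */

/* Hypothetical model of the CPU feature flag and the VSESR_EL2 register. */
struct sketch_cpu {
	bool     has_ras_extn;
	uint32_t vsesr_el2_hw;
};

/*
 * Write the saved syndrome to the modelled register only when the CPU has
 * the RAS Extension and a virtual SError is pending in the HCR_EL2 value
 * about to be used for the guest. Returns true if the write happened.
 */
static bool sketch_restore_vsesr(struct sketch_cpu *cpu, uint64_t hcr,
				 uint32_t saved_vsesr)
{
	if (cpu->has_ras_extn && (hcr & SKETCH_HCR_VSE)) {
		cpu->vsesr_el2_hw = saved_vsesr;
		return true;
	}

	return false;
}
```

There is deliberately no matching save step in the model, reflecting the statement above that the value is stale once the guest exits.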

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Quanming Wu <wuquanming@huawei.com>
[Set an impdef ESR for Virtual-SError]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
 arch/arm64/include/asm/kvm_host.h    |  1 +
 arch/arm64/include/asm/sysreg.h      |  3 +++
 arch/arm64/kvm/guest.c               | 11 ++++++++++-
 arch/arm64/kvm/hyp/switch.c          | 16 ++++++++++++++++
 arch/arm64/kvm/inject_fault.c        | 13 ++++++++++++-
 6 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 555b28b..73c84d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault.esr_el2;
 }
 
+static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.fault.vsesr_el2;
+}
+
+static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	vcpu->arch.fault.vsesr_el2 = val;
+}
+
 static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
 	u32 esr = kvm_vcpu_get_hsr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 769cc58..53d1d81 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_fault_info {
 	u32 esr_el2;		/* Hyp Syndrom Register */
 	u64 far_el2;		/* Hyp Fault Address Register */
 	u64 hpfar_el2;		/* Hyp IPA Fault Address Register */
+	u32 vsesr_el2;          /* Virtual SError Exception Syndrome Register */
 };
 
 /*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 47b967d..3b035cc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -86,6 +86,9 @@
 #define REG_PSTATE_PAN_IMM		sys_reg(0, 0, 4, 0, 4)
 #define REG_PSTATE_UAO_IMM		sys_reg(0, 0, 4, 0, 3)
 
+/* virtual SError exception syndrome register */
+#define REG_VSESR_EL2                  sys_reg(3, 4, 5, 2, 3)
+
 #define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM |	\
 				      (!!x)<<8 | 0x1f)
 #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM |	\
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 738ae90..ffad42b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
 {
-	return -EINVAL;
+	u64 reg = *syndrome;
+
+	/* inject virtual system Error or asynchronous abort */
+	kvm_inject_vabt(vcpu);
+
+	if (reg)
+		/* set vsesr_el2[24:0] with value that user space specified */
+		kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
+
+	return 0;
 }
 
 int __attribute_const__ kvm_target_cpu(void)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c6f17c7..06a71d2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -67,6 +67,14 @@ static hyp_alternate_select(__activate_traps_arch,
 			    __activate_traps_nvhe, __activate_traps_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
+static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu, u64 value)
+{
+	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+			       (value & HCR_VSE))
+		write_sysreg_s(kvm_vcpu_get_vsesr(vcpu), REG_VSESR_EL2);
+}
+
+
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
@@ -86,6 +94,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 		isb();
 	}
 	write_sysreg(val, hcr_el2);
+
+	/*
+	 * If the virtual SError interrupt is taken to EL1 using AArch64,
+	 * then VSESR_EL2 provides the syndrome value reported in ISS field
+	 * of ESR_EL1.
+	 */
+	__sysreg_set_vsesr(vcpu, val);
+
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
 	/*
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 3556715..fb94b5e 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -246,14 +246,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 		inject_undef64(vcpu);
 }
 
+static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
+{
+	kvm_vcpu_set_vsesr(vcpu, esr);
+	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+}
+
 /**
  * kvm_inject_vabt - inject an async abort / SError into the guest
  * @vcpu: The VCPU to receive the exception
  *
  * It is assumed that this code is called from the VCPU thread and that the
  * VCPU therefore is not currently executing guest code.
+ *
+ * Systems with the RAS Extensions specify an imp-def ESR (ISV/IDS = 1) with
+ * the remaining ISS all-zeros so that this error is not interpreted as an
+ * uncategorized RAS error. Without the RAS Extensions we can't specify an ESR
+ * value, so the CPU generates an imp-def value.
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+	pend_guest_serror(vcpu, ESR_ELx_ISV);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread
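
For readers checking the REG_VSESR_EL2 definition in the sysreg.h hunk above, here is a small worked example of how sys_reg(3, 4, 5, 2, 3) packs its five fields. The shifts follow the MRS/MSR instruction encoding used by the kernel's sys_reg() macro (op0 at bit 19, op1 at 16, CRn at 12, CRm at 8, op2 at 5); sketch_sys_reg() is a hypothetical re-implementation for illustration only.

```c
#include <stdint.h>

/* Field positions of a system register in the MRS/MSR encoding. */
#define SKETCH_OP0_SHIFT 19
#define SKETCH_OP1_SHIFT 16
#define SKETCH_CRN_SHIFT 12
#define SKETCH_CRM_SHIFT 8
#define SKETCH_OP2_SHIFT 5

/* Pack (op0, op1, CRn, CRm, op2) the way sys_reg() does. */
static uint32_t sketch_sys_reg(uint32_t op0, uint32_t op1, uint32_t crn,
			       uint32_t crm, uint32_t op2)
{
	return (op0 << SKETCH_OP0_SHIFT) | (op1 << SKETCH_OP1_SHIFT) |
	       (crn << SKETCH_CRN_SHIFT) | (crm << SKETCH_CRM_SHIFT) |
	       (op2 << SKETCH_OP2_SHIFT);
}
```

For VSESR_EL2 (S3_4_C5_C2_3) this works out to 0x180000 + 0x40000 + 0x5000 + 0x200 + 0x60 = 0x1C5260.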

* [PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest
@ 2017-11-10 19:54   ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: linux-arm-kernel

RAS Extension add a VSESR_EL2 register which can provides
the syndrome value reported to software on taking a virtual
SError interrupt exception. This patch supports to specify
this Syndrome.

In the RAS Extensions we can not set all-zero syndrome value
for SError, which means 'RAS error: Uncategorized' instead of
'no valid ISS'. So set it to IMPLEMENTATION DEFINED syndrome
by default.

We also need to support userspace to specify a valid syndrome
value, Because in some case, the recovery is driven by userspace.
This patch can support that userspace can specify it.

In the guest/host world switch, restore this value to VSESR_EL2
only when HCR_EL2.VSE is set. This value no need to be saved
because it is stale vale when guest exit.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Quanming Wu <wuquanming@huawei.com>
[Set an impdef ESR for Virtual-SError]
Signed-off-by: James Morse <james.morse@arm.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
 arch/arm64/include/asm/kvm_host.h    |  1 +
 arch/arm64/include/asm/sysreg.h      |  3 +++
 arch/arm64/kvm/guest.c               | 11 ++++++++++-
 arch/arm64/kvm/hyp/switch.c          | 16 ++++++++++++++++
 arch/arm64/kvm/inject_fault.c        | 13 ++++++++++++-
 6 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 555b28b..73c84d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault.esr_el2;
 }
 
+static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.fault.vsesr_el2;
+}
+
+static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	vcpu->arch.fault.vsesr_el2 = val;
+}
+
 static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
 	u32 esr = kvm_vcpu_get_hsr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 769cc58..53d1d81 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_fault_info {
 	u32 esr_el2;		/* Hyp Syndrom Register */
 	u64 far_el2;		/* Hyp Fault Address Register */
 	u64 hpfar_el2;		/* Hyp IPA Fault Address Register */
+	u32 vsesr_el2;          /* Virtual SError Exception Syndrome Register */
 };
 
 /*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 47b967d..3b035cc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -86,6 +86,9 @@
 #define REG_PSTATE_PAN_IMM		sys_reg(0, 0, 4, 0, 4)
 #define REG_PSTATE_UAO_IMM		sys_reg(0, 0, 4, 0, 3)
 
+/* virtual SError exception syndrome register */
+#define REG_VSESR_EL2                  sys_reg(3, 4, 5, 2, 3)
+
 #define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM |	\
 				      (!!x)<<8 | 0x1f)
 #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM |	\
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 738ae90..ffad42b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
 {
-	return -EINVAL;
+	u64 reg = *syndrome;
+
+	/* inject virtual system Error or asynchronous abort */
+	kvm_inject_vabt(vcpu);
+
+	if (reg)
+		/* set vsesr_el2[24:0] with value that user space specified */
+		kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
+
+	return 0;
 }
 
 int __attribute_const__ kvm_target_cpu(void)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c6f17c7..06a71d2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -67,6 +67,14 @@ static hyp_alternate_select(__activate_traps_arch,
 			    __activate_traps_nvhe, __activate_traps_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
+static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu, u64 value)
+{
+	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+			       (value & HCR_VSE))
+		write_sysreg_s(kvm_vcpu_get_vsesr(vcpu), REG_VSESR_EL2);
+}
+
+
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
@@ -86,6 +94,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 		isb();
 	}
 	write_sysreg(val, hcr_el2);
+
+	/*
+	 * If the virtual SError interrupt is taken to EL1 using AArch64,
+	 * then VSESR_EL2 provides the syndrome value reported in ISS field
+	 * of ESR_EL1.
+	 */
+	__sysreg_set_vsesr(vcpu, val);
+
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
 	/*
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 3556715..fb94b5e 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -246,14 +246,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 		inject_undef64(vcpu);
 }
 
+static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
+{
+	kvm_vcpu_set_vsesr(vcpu, esr);
+	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+}
+
 /**
  * kvm_inject_vabt - inject an async abort / SError into the guest
  * @vcpu: The VCPU to receive the exception
  *
  * It is assumed that this code is called from the VCPU thread and that the
  * VCPU therefore is not currently executing guest code.
+ *
+ * Systems with the RAS Extensions specify an imp-def ESR (ISV/IDS = 1) with
+ * the remaining ISS all-zeros so that this error is not interpreted as an
+ * uncategorized RAS error. Without the RAS Extensions we can't specify an ESR
+ * value, so the CPU generates an imp-def value.
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+	pend_guest_serror(vcpu, ESR_ELx_ISV);
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [Devel] [PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest
@ 2017-11-10 19:54   ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 6045 bytes --]

The RAS Extensions add a VSESR_EL2 register, which provides
the syndrome value reported to software on taking a virtual
SError interrupt exception. This patch adds support for
specifying that syndrome.

With the RAS Extensions we cannot use an all-zero syndrome value
for SError, because it means 'RAS error: Uncategorized' instead of
'no valid ISS'. So set an IMPLEMENTATION DEFINED syndrome
by default.

We also need to let userspace specify a valid syndrome
value, because in some cases the recovery is driven by userspace.
This patch allows userspace to specify it.

During the world switch into the guest, write this value to
VSESR_EL2 only when HCR_EL2.VSE is set. The value does not need
to be saved on guest exit because it is stale by then.

Signed-off-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
Signed-off-by: Quanming Wu <wuquanming(a)huawei.com>
[Set an impdef ESR for Virtual-SError]
Signed-off-by: James Morse <james.morse(a)arm.com>
---
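Note for reviewers: the syndrome selection described above can be
modelled outside the kernel in a few lines of plain C. This is only a
sketch under stated assumptions: pick_vsesr() is a hypothetical helper
that does not exist in this patch, and the two SKETCH_* constants are
local stand-ins for the kernel's ESR_ELx_ISV (bit 24, which doubles as
IDS for SError) and ESR_ELx_ISS_MASK (bits [24:0]).

```c
#include <stdint.h>

/* Local stand-ins for the kernel's ESR_ELx_ISV and ESR_ELx_ISS_MASK. */
#define SKETCH_ESR_IDS		(UINT32_C(1) << 24)	/* ISV bit, aka IDS for SError */
#define SKETCH_ESR_ISS_MASK	UINT32_C(0x1ffffff)	/* ISS is bits [24:0] */

/*
 * Hypothetical helper: return the value that would be kept for VSESR_EL2.
 * A zero request from user space falls back to the imp-def default;
 * anything else is truncated to the ISS field.
 */
uint32_t pick_vsesr(uint32_t user_syndrome)
{
	if (user_syndrome == 0)
		return SKETCH_ESR_IDS;

	return user_syndrome & SKETCH_ESR_ISS_MASK;
}
```

One consequence worth noting: a non-zero request whose ISS bits are all
clear (for example, one with only EC bits set) is masked down to zero,
which is the all-zero value the commit message says should be avoided.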
 arch/arm64/include/asm/kvm_emulate.h | 10 ++++++++++
 arch/arm64/include/asm/kvm_host.h    |  1 +
 arch/arm64/include/asm/sysreg.h      |  3 +++
 arch/arm64/kvm/guest.c               | 11 ++++++++++-
 arch/arm64/kvm/hyp/switch.c          | 16 ++++++++++++++++
 arch/arm64/kvm/inject_fault.c        | 13 ++++++++++++-
 6 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 555b28b..73c84d0 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -155,6 +155,16 @@ static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault.esr_el2;
 }
 
+static inline u32 kvm_vcpu_get_vsesr(const struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.fault.vsesr_el2;
+}
+
+static inline void kvm_vcpu_set_vsesr(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	vcpu->arch.fault.vsesr_el2 = val;
+}
+
 static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
 	u32 esr = kvm_vcpu_get_hsr(vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 769cc58..53d1d81 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -88,6 +88,7 @@ struct kvm_vcpu_fault_info {
 	u32 esr_el2;		/* Hyp Syndrom Register */
 	u64 far_el2;		/* Hyp Fault Address Register */
 	u64 hpfar_el2;		/* Hyp IPA Fault Address Register */
+	u32 vsesr_el2;          /* Virtual SError Exception Syndrome Register */
 };
 
 /*
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 47b967d..3b035cc 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -86,6 +86,9 @@
 #define REG_PSTATE_PAN_IMM		sys_reg(0, 0, 4, 0, 4)
 #define REG_PSTATE_UAO_IMM		sys_reg(0, 0, 4, 0, 3)
 
+/* virtual SError exception syndrome register */
+#define REG_VSESR_EL2                  sys_reg(3, 4, 5, 2, 3)
+
 #define SET_PSTATE_PAN(x) __emit_inst(0xd5000000 | REG_PSTATE_PAN_IMM |	\
 				      (!!x)<<8 | 0x1f)
 #define SET_PSTATE_UAO(x) __emit_inst(0xd5000000 | REG_PSTATE_UAO_IMM |	\
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 738ae90..ffad42b 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -279,7 +279,16 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
 
 int kvm_arm_set_sei_esr(struct kvm_vcpu *vcpu, u32 *syndrome)
 {
-	return -EINVAL;
+	u64 reg = *syndrome;
+
+	/* inject virtual system Error or asynchronous abort */
+	kvm_inject_vabt(vcpu);
+
+	if (reg)
+		/* set vsesr_el2[24:0] with value that user space specified */
+		kvm_vcpu_set_vsesr(vcpu, reg & ESR_ELx_ISS_MASK);
+
+	return 0;
 }
 
 int __attribute_const__ kvm_target_cpu(void)
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index c6f17c7..06a71d2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -67,6 +67,14 @@ static hyp_alternate_select(__activate_traps_arch,
 			    __activate_traps_nvhe, __activate_traps_vhe,
 			    ARM64_HAS_VIRT_HOST_EXTN);
 
+static void __hyp_text __sysreg_set_vsesr(struct kvm_vcpu *vcpu, u64 value)
+{
+	if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) &&
+			       (value & HCR_VSE))
+		write_sysreg_s(kvm_vcpu_get_vsesr(vcpu), REG_VSESR_EL2);
+}
+
+
 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 {
 	u64 val;
@@ -86,6 +94,14 @@ static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu)
 		isb();
 	}
 	write_sysreg(val, hcr_el2);
+
+	/*
+	 * If the virtual SError interrupt is taken to EL1 using AArch64,
+	 * then VSESR_EL2 provides the syndrome value reported in ISS field
+	 * of ESR_EL1.
+	 */
+	__sysreg_set_vsesr(vcpu, val);
+
 	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
 	write_sysreg(1 << 15, hstr_el2);
 	/*
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 3556715..fb94b5e 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -246,14 +246,25 @@ void kvm_inject_undefined(struct kvm_vcpu *vcpu)
 		inject_undef64(vcpu);
 }
 
+static void pend_guest_serror(struct kvm_vcpu *vcpu, u64 esr)
+{
+	kvm_vcpu_set_vsesr(vcpu, esr);
+	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+}
+
 /**
  * kvm_inject_vabt - inject an async abort / SError into the guest
  * @vcpu: The VCPU to receive the exception
  *
  * It is assumed that this code is called from the VCPU thread and that the
  * VCPU therefore is not currently executing guest code.
+ *
+ * Systems with the RAS Extensions specify an imp-def ESR (ISV/IDS = 1) with
+ * the remaining ISS all-zeros so that this error is not interpreted as an
+ * uncategorized RAS error. Without the RAS Extensions we can't specify an ESR
+ * value, so the CPU generates an imp-def value.
  */
 void kvm_inject_vabt(struct kvm_vcpu *vcpu)
 {
-	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+	pend_guest_serror(vcpu, ESR_ELx_ISV);
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-10 19:54   ` Dongjiu Geng
  -1 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: christoffer.dall, marc.zyngier, linux, bp, rjw, james.morse,
	pbonzini, rkrcmar, corbet, catalin.marinas, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, devel,
	gengdongjiu, huangshaoyu, wuquanming, linuxarm

If the SError is not a RAS SError, inject a virtual SError
directly, which keeps the existing behaviour. If it is a RAS
SError, first let the host ACPI code handle it. In the ACPI
path, if the error address is invalid, the APEI driver cannot
hwpoison the affected memory and cannot notify the guest to
recover. To be safe, KVM therefore goes on to categorize the
error and handles each category separately.

If the RAS error has not been propagated, let host user space
handle it. The reason is that it is sometimes enough to kill the
affected guest application instead of panicking the whole guest OS.
Host user space specifies a valid ESR and injects a virtual
SError; the guest can then kill just the current application if
the unconsumed error came from a guest application.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Quanming Wu <wuquanming@huawei.com>
---
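Note for reviewers: the categorization performed by
kvm_handle_guest_sei() can be summarised as a pure function of the ESR.
The sketch below is a user-space model, not kernel code: classify_sei(),
enum sei_action and the SKETCH_* constants are hypothetical names that
mirror the definitions this patch adds to esr.h, and the call into the
host APEI code (handle_guest_sei()) is deliberately left out.

```c
#include <stdint.h>

/* Local stand-ins for the ESR_ELx_* definitions used by the patch. */
#define SKETCH_ESR_IDS		(UINT32_C(1) << 24)	/* imp-def syndrome */
#define SKETCH_ESR_DFSC_MASK	UINT32_C(0x3f)
#define SKETCH_ESR_DFSC_SERROR	UINT32_C(0x11)		/* asynchronous SError */
#define SKETCH_ESR_AET_SHIFT	10
#define SKETCH_ESR_AET_MASK	(UINT32_C(0x7) << SKETCH_ESR_AET_SHIFT)

enum sei_action {
	SEI_INJECT_VABT,	/* inject a virtual SError into the guest */
	SEI_RESUME_GUEST,	/* nothing to do, re-enter the guest */
	SEI_EXIT_TO_USER,	/* return to user space with KVM_EXIT_EXCEPTION */
	SEI_PANIC,		/* fatal for the host */
};

/* Hypothetical model of the decision tree; has_ras models ARM64_HAS_RAS_EXTN. */
enum sei_action classify_sei(uint32_t esr, int has_ras)
{
	if (!has_ras)
		return SEI_INJECT_VABT;

	/* Imp-def or uncategorized syndromes carry no usable AET. */
	if ((esr & SKETCH_ESR_IDS) ||
	    (esr & SKETCH_ESR_DFSC_MASK) != SKETCH_ESR_DFSC_SERROR)
		return SEI_INJECT_VABT;

	switch ((esr & SKETCH_ESR_AET_MASK) >> SKETCH_ESR_AET_SHIFT) {
	case 6:	/* CE: corrected */
	case 2:	/* UEO: restartable */
		return SEI_RESUME_GUEST;
	case 3:	/* UER: recoverable */
		return SEI_EXIT_TO_USER;
	default:	/* UC, UEU and reserved encodings */
		return SEI_PANIC;
	}
}
```

For example, an ESR with DFSC = 0x11 and AET = 3 maps to
SEI_EXIT_TO_USER, while the same DFSC with AET = 0 (uncontainable) maps
to SEI_PANIC.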
 arch/arm64/include/asm/esr.h         | 15 ++++++++
 arch/arm64/include/asm/kvm_asm.h     |  3 ++
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/kvm/handle_exit.c         | 67 +++++++++++++++++++++++++++++++++---
 arch/arm64/mm/fault.c                | 16 +++++++++
 5 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 66ed8b6..aca7eee 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -102,6 +102,7 @@
 #define ESR_ELx_FSC_ACCESS	(0x08)
 #define ESR_ELx_FSC_FAULT	(0x04)
 #define ESR_ELx_FSC_PERM	(0x0C)
+#define ESR_ELx_FSC_SERROR	(0x11)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT	(24)
@@ -119,6 +120,20 @@
 #define ESR_ELx_CM_SHIFT	(8)
 #define ESR_ELx_CM 		(UL(1) << ESR_ELx_CM_SHIFT)
 
+/* ISS field definitions for SError interrupt */
+#define ESR_ELx_AET_SHIFT	(10)
+#define ESR_ELx_AET		(UL(0x7) << ESR_ELx_AET_SHIFT)
+/* Uncontainable error */
+#define ESR_ELx_AET_UC		(UL(0) << ESR_ELx_AET_SHIFT)
+/* Unrecoverable error */
+#define ESR_ELx_AET_UEU		(UL(1) << ESR_ELx_AET_SHIFT)
+/* Restartable error */
+#define ESR_ELx_AET_UEO		(UL(2) << ESR_ELx_AET_SHIFT)
+/* Recoverable error */
+#define ESR_ELx_AET_UER		(UL(3) << ESR_ELx_AET_SHIFT)
+/* Corrected */
+#define ESR_ELx_AET_CE		(UL(6) << ESR_ELx_AET_SHIFT)
+
 /* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 26a64d0..884f723 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -27,6 +27,9 @@
 #define ARM_EXCEPTION_IRQ	  0
 #define ARM_EXCEPTION_EL1_SERROR  1
 #define ARM_EXCEPTION_TRAP	  2
+/* Error code for SError Interrupt (SEI) exception */
+#define KVM_SEI_SEV_RECOVERABLE	  1
+
 /* The hyp-stub will return this for any kvm_call_hyp() call */
 #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
 
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 07aa8e3..9ee13ad 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -57,6 +57,7 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
 })
 
 int handle_guest_sea(phys_addr_t addr, unsigned int esr);
+int handle_guest_sei(void);
 
 #endif	/* __ASSEMBLY__ */
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 7debb74..1afdc87 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,6 +28,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_psci.h>
+#include <asm/system_misc.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
 	return arm_exit_handlers[hsr_ec];
 }
 
+/**
+ * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
+ * @vcpu:	the VCPU pointer
+ *
+ * For a RAS SError interrupt, first let the host kernel handle it.
+ * If the AET is ESR_ELx_AET_UER, then let user space handle it.
+ */
+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
+	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
+	unsigned int aet = esr & ESR_ELx_AET;
+
+	/*
+	 * This is not RAS SError
+	 */
+	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+		kvm_inject_vabt(vcpu);
+		return 1;
+	}
+
+	/* The host kernel may handle this abort. */
+	handle_guest_sei();
+
+	/*
+	 * Directly inject the virtual SError in either of the
+	 * following two cases:
+	 * 1. The Syndrome is IMPLEMENTATION DEFINED
+	 * 2. It is Uncategorized SEI
+	 */
+	if (impdef_syndrome ||
+		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
+		kvm_inject_vabt(vcpu);
+		return 1;
+	}
+
+	switch (aet) {
+	case ESR_ELx_AET_CE:	/* corrected error */
+	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
+		return 1;	/* continue processing the guest exit */
+	case ESR_ELx_AET_UER:	/* The error has not been propagated */
+		/*
+		 * Userspace only handles the guest SError Interrupt (SEI) if
+		 * the error has not been propagated.
+		 */
+		run->exit_reason = KVM_EXIT_EXCEPTION;
+		run->ex.exception = ESR_ELx_EC_SERROR;
+		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
+		return 0;
+	default:
+		/*
+		 * At this point the CPU supports RAS and the SEI is fatal, or
+		 * the host is not able to handle the SError.
+		 */
+		panic("This Asynchronous SError interrupt is dangerous, panic");
+	}
+
+	return 0;
+}
+
 /*
  * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
  * proper exit to userspace.
@@ -201,8 +262,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			*vcpu_pc(vcpu) -= adj;
 		}
 
-		kvm_inject_vabt(vcpu);
-		return 1;
+		return kvm_handle_guest_sei(vcpu, run);
 	}
 
 	exception_index = ARM_EXCEPTION_CODE(exception_index);
@@ -211,8 +271,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	case ARM_EXCEPTION_IRQ:
 		return 1;
 	case ARM_EXCEPTION_EL1_SERROR:
-		kvm_inject_vabt(vcpu);
-		return 1;
+		return kvm_handle_guest_sei(vcpu, run);
 	case ARM_EXCEPTION_TRAP:
 		/*
 		 * See ARM ARM B1.14.1: "Hyp traps on instructions
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index b64958b..8560672 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -728,6 +728,22 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
 }
 
 /*
+ * Handle an SError interrupt that occurred in a guest OS.
+ *
+ * The return value will be zero if the SEI was successfully handled
+ * and non-zero if handling failed.
+ */
+int handle_guest_sei(void)
+{
+	int ret = -ENOENT;
+
+	if (IS_ENABLED(CONFIG_ACPI_APEI_SEI))
+		ret = ghes_notify_sei();
+
+	return ret;
+}
+
+/*
  * Dispatch a data abort to the relevant handler.
  */
 asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [Devel] [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
@ 2017-11-10 19:54   ` Dongjiu Geng
  0 siblings, 0 replies; 98+ messages in thread
From: Dongjiu Geng @ 2017-11-10 19:54 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 7074 bytes --]

If the SError is not a RAS SError, inject a virtual SError
directly, which keeps the existing behaviour. If it is a RAS
SError, first let the host ACPI code handle it. In the ACPI
path, if the error address is invalid, the APEI driver cannot
hwpoison the affected memory and cannot notify the guest to
recover. To be safe, KVM therefore goes on to categorize the
error and handles each category separately.

If the RAS error has not been propagated, let host user space
handle it. The reason is that it is sometimes enough to kill the
affected guest application instead of panicking the whole guest OS.
Host user space specifies a valid ESR and injects a virtual
SError; the guest can then kill just the current application if
the unconsumed error came from a guest application.

Signed-off-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
Signed-off-by: Quanming Wu <wuquanming(a)huawei.com>
---
 arch/arm64/include/asm/esr.h         | 15 ++++++++
 arch/arm64/include/asm/kvm_asm.h     |  3 ++
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/kvm/handle_exit.c         | 67 +++++++++++++++++++++++++++++++++---
 arch/arm64/mm/fault.c                | 16 +++++++++
 5 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 66ed8b6..aca7eee 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -102,6 +102,7 @@
 #define ESR_ELx_FSC_ACCESS	(0x08)
 #define ESR_ELx_FSC_FAULT	(0x04)
 #define ESR_ELx_FSC_PERM	(0x0C)
+#define ESR_ELx_FSC_SERROR	(0x11)
 
 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT	(24)
@@ -119,6 +120,20 @@
 #define ESR_ELx_CM_SHIFT	(8)
 #define ESR_ELx_CM 		(UL(1) << ESR_ELx_CM_SHIFT)
 
+/* ISS field definitions for SError interrupt */
+#define ESR_ELx_AET_SHIFT	(10)
+#define ESR_ELx_AET		(UL(0x7) << ESR_ELx_AET_SHIFT)
+/* Uncontainable error */
+#define ESR_ELx_AET_UC		(UL(0) << ESR_ELx_AET_SHIFT)
+/* Unrecoverable error */
+#define ESR_ELx_AET_UEU		(UL(1) << ESR_ELx_AET_SHIFT)
+/* Restartable error */
+#define ESR_ELx_AET_UEO		(UL(2) << ESR_ELx_AET_SHIFT)
+/* Recoverable error */
+#define ESR_ELx_AET_UER		(UL(3) << ESR_ELx_AET_SHIFT)
+/* Corrected */
+#define ESR_ELx_AET_CE		(UL(6) << ESR_ELx_AET_SHIFT)
+
 /* ISS field definitions for exceptions taken in to Hyp */
 #define ESR_ELx_CV		(UL(1) << 24)
 #define ESR_ELx_COND_SHIFT	(20)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 26a64d0..884f723 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -27,6 +27,9 @@
 #define ARM_EXCEPTION_IRQ	  0
 #define ARM_EXCEPTION_EL1_SERROR  1
 #define ARM_EXCEPTION_TRAP	  2
+/* Error code for SError Interrupt (SEI) exception */
+#define KVM_SEI_SEV_RECOVERABLE	  1
+
 /* The hyp-stub will return this for any kvm_call_hyp() call */
 #define ARM_EXCEPTION_HYP_GONE	  HVC_STUB_ERR
 
diff --git a/arch/arm64/include/asm/system_misc.h b/arch/arm64/include/asm/system_misc.h
index 07aa8e3..9ee13ad 100644
--- a/arch/arm64/include/asm/system_misc.h
+++ b/arch/arm64/include/asm/system_misc.h
@@ -57,6 +57,7 @@ void hook_debug_fault_code(int nr, int (*fn)(unsigned long, unsigned int,
 })
 
 int handle_guest_sea(phys_addr_t addr, unsigned int esr);
+int handle_guest_sei(void);
 
 #endif	/* __ASSEMBLY__ */
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 7debb74..1afdc87 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -28,6 +28,7 @@
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_psci.h>
+#include <asm/system_misc.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
 	return arm_exit_handlers[hsr_ec];
 }
 
+/**
+ * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
+ * @vcpu:	the VCPU pointer
+ *
+ * For a RAS SError interrupt, first let the host kernel handle it.
+ * If the AET is ESR_ELx_AET_UER, then let user space handle it.
+ */
+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
+	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
+	unsigned int aet = esr & ESR_ELx_AET;
+
+	/*
+	 * This is not RAS SError
+	 */
+	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+		kvm_inject_vabt(vcpu);
+		return 1;
+	}
+
+	/* The host kernel may handle this abort. */
+	handle_guest_sei();
+
+	/*
+	 * In below two conditions, it will directly inject the
+	 * virtual SError:
+	 * 1. The Syndrome is IMPLEMENTATION DEFINED
+	 * 2. It is Uncategorized SEI
+	 */
+	if (impdef_syndrome ||
+		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
+		kvm_inject_vabt(vcpu);
+		return 1;
+	}
+
+	switch (aet) {
+	case ESR_ELx_AET_CE:	/* corrected error */
+	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
+		return 1;	/* continue processing the guest exit */
+	case ESR_ELx_AET_UER:	/* The error has not been propagated */
+		/*
+		 * User space handles the guest SError Interrupt (SEI) only if
+		 * the error has not been propagated.
+		 */
+		run->exit_reason = KVM_EXIT_EXCEPTION;
+		run->ex.exception = ESR_ELx_EC_SERROR;
+		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
+		return 0;
+	default:
+		/*
+		 * At this point the CPU supports RAS and the SEI is fatal, or
+		 * the host cannot handle the SError.
+		 */
+		panic("This Asynchronous SError interrupt is dangerous, panic");
+	}
+
+	return 0;
+}
+
 /*
  * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
  * proper exit to userspace.
@@ -201,8 +262,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 			*vcpu_pc(vcpu) -= adj;
 		}
 
-		kvm_inject_vabt(vcpu);
-		return 1;
+		return kvm_handle_guest_sei(vcpu, run);
 	}
 
 	exception_index = ARM_EXCEPTION_CODE(exception_index);
@@ -211,8 +271,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	case ARM_EXCEPTION_IRQ:
 		return 1;
 	case ARM_EXCEPTION_EL1_SERROR:
-		kvm_inject_vabt(vcpu);
-		return 1;
+		return kvm_handle_guest_sei(vcpu, run);
 	case ARM_EXCEPTION_TRAP:
 		/*
 		 * See ARM ARM B1.14.1: "Hyp traps on instructions
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index b64958b..8560672 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -728,6 +728,22 @@ int handle_guest_sea(phys_addr_t addr, unsigned int esr)
 }
 
 /*
+ * Handle SError interrupt that occurred in guest OS.
+ *
+ * The return value is zero if the SEI was successfully handled
+ * and non-zero if handling failed.
+ */
+int handle_guest_sei(void)
+{
+	int ret = -ENOENT;
+
+	if (IS_ENABLED(CONFIG_ACPI_APEI_SEI))
+		ret = ghes_notify_sei();
+
+	return ret;
+}
+
+/*
  * Dispatch a data abort to the relevant handler.
  */
 asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
-- 
1.9.1
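
The classification that kvm_handle_guest_sei() performs in this patch can
be modelled in user space roughly as follows. This is only a sketch under
stated assumptions: the AET and DFSC values are copied from the esr.h hunk
above, while ESR_ELx_FSC (0x3F) and the bit-24 position reused for IDS are
taken from the existing kernel header rather than from this patch.
classify_sei() and enum sei_action are hypothetical names used purely for
illustration; they do not exist in the kernel.

```c
#include <stdint.h>

/* Mirrored from the esr.h hunk so the sketch builds on its own. */
#define ESR_ELx_FSC		UINT32_C(0x3F)	/* assumed: existing kernel value */
#define ESR_ELx_FSC_SERROR	UINT32_C(0x11)
#define ESR_ELx_ISV		(UINT32_C(1) << 24)	/* bit 24 is IDS for SError */
#define ESR_ELx_AET_SHIFT	10
#define ESR_ELx_AET		(UINT32_C(0x7) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UC		(UINT32_C(0) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UEU		(UINT32_C(1) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UEO		(UINT32_C(2) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UER		(UINT32_C(3) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_CE		(UINT32_C(6) << ESR_ELx_AET_SHIFT)

/* Hypothetical names: the four outcomes of the handler in this patch. */
enum sei_action {
	SEI_INJECT_VABT,	/* inject a virtual SError into the guest */
	SEI_RESUME_GUEST,	/* return 1: keep running the guest */
	SEI_EXIT_TO_USER,	/* return 0 with KVM_EXIT_EXCEPTION */
	SEI_PANIC,		/* fatal: the patch calls panic() */
};

static enum sei_action classify_sei(uint32_t esr)
{
	/* IMPLEMENTATION DEFINED syndrome, or not a categorized RAS SError. */
	if ((esr & ESR_ELx_ISV) ||
	    (esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)
		return SEI_INJECT_VABT;

	switch (esr & ESR_ELx_AET) {
	case ESR_ELx_AET_CE:	/* corrected */
	case ESR_ELx_AET_UEO:	/* restartable */
		return SEI_RESUME_GUEST;
	case ESR_ELx_AET_UER:	/* recoverable */
		return SEI_EXIT_TO_USER;
	default:		/* UC, UEU and reserved encodings */
		return SEI_PANIC;
	}
}
```

For example, an ESR with DFSC 0x11 and AET 3 (UER) takes the exit-to-user
path, whereas the same DFSC with AET 0 (UC) ends in the panic branch.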


^ permalink raw reply related	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 0/7] Support RAS virtualization in KVM
  2017-11-10 19:54 ` Dongjiu Geng
  (?)
@ 2017-11-14 16:00   ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2017-11-14 16:00 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi Dongjiu Geng,

On 10/11/17 19:54, Dongjiu Geng wrote:
> This series patches mainly do below things:
> 
> 1. Trap RAS ERR* registers Accesses to EL2 from Non-secure EL1,
>    KVM will will do a minimum simulation, there registers are simulated
>    to RAZ/WI in KVM.
> 2. Route synchronous External Abort exceptions from Non-secure EL0
>    and EL1 to EL2. When exception EL3 routing is enabled by firmware,
>    system will trap to EL3 firmware instead of EL2 KVM, then firmware
>    judges whether El2 routing is enabled, if enabled, jump to EL2 KVM, 
>    otherwise jump to EL1 host kernel.
> 3. Enable APEI ARv8 SEI notification to parse the CPER records for SError
>    in the ACPI GHES driver, KVM will call handle_guest_sei() to let ACPI
>    driver to parse the CPER record for SError which happened in the guest
> 4. Although we can use APEI driver to handle the guest SError, but not all
>    system support SEI notification, such as kernel-first. So here KVM will
>    also classify the Error through Exception Syndrome Register and do different
>    approaches according to Asynchronous Error Type

> 5. If the guest SError error is not propagated and not consumed, then KVM return
>    recoverable error status to user-space, user-space will specify the guest ESR

I thought we'd gone over this. There should be no RAS errors/notifications in
user space. Only the symptoms should be sent, using the SIGBUS_MCEERR_A{O,R} if
the kernel has handled as much as it can. This hides the actual mechanisms the
kernel and firmware used.

User-space should not have to know how to handle RAS errors directly. This is a
service the operating system provides for it. This abstraction means the same
user-space code is portable between x86, arm64, powerpc etc.
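
As an illustration of the symptom-only interface described above, a
user-space program such as a VMM could consume the SIGBUS si_code values
along these lines. This is a minimal sketch, not code from Qemu or from
this series: classify_sigbus(), sigbus_handler(), install_sigbus_handler()
and enum mce_kind are hypothetical names, and a real VMM would act on the
reported address instead of only recording it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of what the kernel reported. */
enum mce_kind {
	MCE_NONE,		/* ordinary SIGBUS, not a memory error */
	MCE_ACTION_REQUIRED,	/* BUS_MCEERR_AR: the error was consumed */
	MCE_ACTION_OPTIONAL,	/* BUS_MCEERR_AO: found asynchronously */
};

static enum mce_kind classify_sigbus(int si_code)
{
	if (si_code == BUS_MCEERR_AR)
		return MCE_ACTION_REQUIRED;
	if (si_code == BUS_MCEERR_AO)
		return MCE_ACTION_OPTIONAL;
	return MCE_NONE;
}

/* Written by the handler, read by the main loop. */
static volatile sig_atomic_t last_mce_kind = MCE_NONE;
static void *volatile last_mce_addr;

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	/* Only the symptom is visible here: an address and a severity. */
	last_mce_kind = classify_sigbus(info->si_code);
	last_mce_addr = info->si_addr;
}

static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;	/* needed to receive siginfo_t */
	sigemptyset(&sa.sa_mask);

	return sigaction(SIGBUS, &sa, NULL);
}
```

Nothing in the handler depends on whether the host learned of the error
through firmware-first CPER records, an SError or a polled source, which
is what keeps the same user-space code usable across architectures.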

What if the firmware uses another notification method? User space should expect
the kernel to hide things like this from it.

If the kernel has no information to interpret a notification, how is user space
supposed to know?

I understand you are trying to work around your 'memory corruption at an unknown
address'[0] problem, but if the kernel can't know where this corrupt memory is
it should really reboot. What stops this corrupt data being swapped to disk?

Killing 'the thing' that was running at the time is not sufficient because we
don't know that this 'got' all the users of the corrupt memory. KSM can merge
pages between guests. This is the difference between the error persisting
forever killing off all the VMs one by one, and the corrupt page being silently
re-read from disk clearing the error.


>    and inject a virtual SError. For other Asynchronous Error Type, KVM directly
>    injects virtual SError with IMPLEMENTATION DEFINED ESR or KVM is panic if the
>    error is fatal. In the RAS extension, guest virtual ESR must be set, because
>    all-zero  means 'RAS error: Uncategorized' instead of 'no valid ISS', so set
>    this ESR to IMPLEMENTATION DEFINED by default if user space does not specify it.


Thanks,

James


[0] https://www.spinics.net/lists/arm-kernel/msg605345.html

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-11-10 19:54   ` Dongjiu Geng
  (?)
  (?)
@ 2017-11-14 16:00     ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2017-11-14 16:00 UTC (permalink / raw)
  To: Dongjiu Geng
  Cc: wuquanming, linux-doc, kvm, marc.zyngier, catalin.marinas,
	corbet, rjw, linux, linuxarm, linux-kernel, linux-acpi, bp,
	linux-arm-kernel, huangshaoyu, pbonzini, kvmarm, devel

Hi Dongjiu Geng,

On 10/11/17 19:54, Dongjiu Geng wrote:
> If it is not RAS SError, directly inject virtual SError,
> which will keep the old way. If it is RAS SError, firstly
> let host ACPI module to handle it.

> For the ACPI handling,
> if the error address is invalid, APEI driver will not
> identify the address to hwpoison memory and can not notify
> guest to do the recovery.

The guest can't do any recovery either. There is no recovery you can do without
some information about what the error is.

This is your memory corruption at an unknown address? We should reboot.

(I agree memory-failure.c's me_kernel() is ignoring kernel errors; we should
try and fix this. It makes some sense for polled or irq notifications, but not
for SEA/SEI).


> In order to safe, KVM continues
> categorizing errors and handle it separately.

> If the RAS error is not propagated, let host user space to
> handle it. 

No. Host user space should not know anything about the kernel or platform RAS
support. Doing so creates an ABI link between EL3 firmware and Qemu. This is
totally unmaintainable.

This thing needs to be portable. The kernel should handle the error, and report
any symptoms to user-space. e.g. 'this memory is gone'.

We shouldn't special case KVM.


> The reason is that sometimes we can only kill the
> guest effected application instead of panic whose guest OS.
> Host user space specifies a valid ESR and inject virtual
> SError, guest can just kill the current application if the
> non-consumed error coming from guest application.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
> Signed-off-by: Quanming Wu <wuquanming@huawei.com>

The last Signed-off-by should match the person posting the patch. It's a chain
of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want
to credit Quanming Wu you can add CC and they can Ack/Review your patch.


> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 7debb74..1afdc87 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>  	return arm_exit_handlers[hsr_ec];
>  }
>  
> +/**
> + * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
> + * @vcpu:	the VCPU pointer
> + *
> + * For RAS SError interrupt, firstly let host kernel handle it.
> + * If the AET is [ESR_ELx_AET_UER], then let user space handle it,
> + */
> +static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
> +	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
> +	unsigned int aet = esr & ESR_ELx_AET;
> +
> +	/*
> +	 * This is not RAS SError
> +	 */
> +	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> +		kvm_inject_vabt(vcpu);
> +		return 1;
> +	}

> +	/* The host kernel may handle this abort. */
> +	handle_guest_sei();

This has to claim the SError as a notification. If APEI claims the error, KVM
doesn't need to do anything more. You ignore its return code.
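
The control flow being asked for here might look like the following
user-space model. It is a sketch under assumptions: mock_handle_guest_sei()
is a hypothetical stand-in that follows the return convention documented in
the fault.c hunk (zero when the SEI was handled, non-zero otherwise), and
handle_sei_exit() and enum sei_outcome are illustrative names only.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for handle_guest_sei(): returns 0 when APEI found
 * records and claimed the SError as a notification, -ENOENT when it did not.
 */
static int mock_handle_guest_sei(bool apei_has_records)
{
	return apei_has_records ? 0 : -ENOENT;
}

/* Hypothetical outcomes of the exit handler. */
enum sei_outcome {
	SEI_CLAIMED_RESUME_GUEST,	/* host handled it, nothing more to do */
	SEI_UNCLAIMED_CLASSIFY,		/* fall through to ESR classification */
};

static enum sei_outcome handle_sei_exit(bool apei_has_records)
{
	/* Check the return code instead of discarding it. */
	if (mock_handle_guest_sei(apei_has_records) == 0)
		return SEI_CLAIMED_RESUME_GUEST;

	return SEI_UNCLAIMED_CLASSIFY;
}
```

Only the unclaimed case needs to look at the ESR at all; a claimed error
has already been dealt with by the host.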


> +
> +	/*
> +	 * In below two conditions, it will directly inject the
> +	 * virtual SError:
> +	 * 1. The Syndrome is IMPLEMENTATION DEFINED
> +	 * 2. It is Uncategorized SEI
> +	 */
> +	if (impdef_syndrome ||
> +		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
> +		kvm_inject_vabt(vcpu);
> +		return 1;
> +	}
> +
> +	switch (aet) {
> +	case ESR_ELx_AET_CE:	/* corrected error */
> +	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
> +		return 1;	/* continue processing the guest exit */

> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
> +		/*
> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
> +		 * error has not been propagated
> +		 */
> +		run->exit_reason = KVM_EXIT_EXCEPTION;
> +		run->ex.exception = ESR_ELx_EC_SERROR;
> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
> +		return 0;

We should not pass RAS notifications to user space. The kernel either handles
them, or it panic()s. User space shouldn't even know if the kernel supports RAS
until it gets an MCEERR signal.

You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.

If we get a RAS SError and there are no CPER records or values in the ERR nodes,
we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)


> +	default:
> +		/*
> +		 * Until now, the CPU supports RAS and SEI is fatal, or host
> +		 * does not support to handle the SError.
> +		 */
> +		panic("This Asynchronous SError interrupt is dangerous, panic");
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>   * proper exit to userspace.



James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Devel] [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
@ 2017-11-14 16:00     ` James Morse
  0 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2017-11-14 16:00 UTC (permalink / raw)
  To: devel

[-- Attachment #1: Type: text/plain, Size: 4886 bytes --]

Hi Dongjiu Geng,

On 10/11/17 19:54, Dongjiu Geng wrote:
> If it is not RAS SError, directly inject virtual SError,
> which will keep the old way. If it is RAS SError, firstly
> let host ACPI module to handle it.

> For the ACPI handling,
> if the error address is invalid, APEI driver will not
> identify the address to hwpoison memory and can not notify
> guest to do the recovery.

The guest can't do any recovery either. There is no recovery you can do without
some information about what the error is.

This is your memory corruption at an unknown address? We should reboot.

(I agree memory-failure.c's me_kernel() is ignoring kernel errors; we should
try and fix this. It makes some sense for polled or irq notifications, but not
for SEA/SEI).


> In order to safe, KVM continues
> categorizing errors and handle it separately.

> If the RAS error is not propagated, let host user space to
> handle it. 

No. Host user space should not know anything about the kernel or platform RAS
support. Doing so creates an ABI link between EL3 firmware and Qemu. This is
totally unmaintainable.

This thing needs to be portable. The kernel should handle the error, and report
any symptoms to user-space. e.g. 'this memory is gone'.

We shouldn't special case KVM.


> The reason is that sometimes we can only kill the
> guest effected application instead of panic whose guest OS.
> Host user space specifies a valid ESR and inject virtual
> SError, guest can just kill the current application if the
> non-consumed error coming from guest application.
> 
> Signed-off-by: Dongjiu Geng <gengdongjiu(a)huawei.com>
> Signed-off-by: Quanming Wu <wuquanming(a)huawei.com>

The last Signed-off-by should match the person posting the patch. It's a chain
of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want
to credit Quanming Wu you can add CC and they can Ack/Review your patch.


> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index 7debb74..1afdc87 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>  	return arm_exit_handlers[hsr_ec];
>  }
>  
> +/**
> + * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
> + * @vcpu:	the VCPU pointer
> + *
> + * For RAS SError interrupt, firstly let host kernel handle it.
> + * If the AET is [ESR_ELx_AET_UER], then let user space handle it,
> + */
> +static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
> +{
> +	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
> +	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
> +	unsigned int aet = esr & ESR_ELx_AET;
> +
> +	/*
> +	 * This is not RAS SError
> +	 */
> +	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
> +		kvm_inject_vabt(vcpu);
> +		return 1;
> +	}

> +	/* The host kernel may handle this abort. */
> +	handle_guest_sei();

This has to claim the SError as a notification. If APEI claims the error, KVM
doesn't need to do anything more. You ignore its return code.


> +
> +	/*
> +	 * In below two conditions, it will directly inject the
> +	 * virtual SError:
> +	 * 1. The Syndrome is IMPLEMENTATION DEFINED
> +	 * 2. It is Uncategorized SEI
> +	 */
> +	if (impdef_syndrome ||
> +		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
> +		kvm_inject_vabt(vcpu);
> +		return 1;
> +	}
> +
> +	switch (aet) {
> +	case ESR_ELx_AET_CE:	/* corrected error */
> +	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
> +		return 1;	/* continue processing the guest exit */

> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
> +		/*
> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
> +		 * error has not been propagated
> +		 */
> +		run->exit_reason = KVM_EXIT_EXCEPTION;
> +		run->ex.exception = ESR_ELx_EC_SERROR;
> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
> +		return 0;

We should not pass RAS notifications to user space. The kernel either handles
them, or it panics(). User space shouldn't even know if the kernel supports RAS
until it gets an MCEERR signal.

You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.

If we get a RAS SError and there are no CPER records or values in the ERR nodes,
we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)


> +	default:
> +		/*
> +		 * Until now, the CPU supports RAS and SEI is fatal, or host
> +		 * does not support to handle the SError.
> +		 */
> +		panic("This Asynchronous SError interrupt is dangerous, panic");
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>   * proper exit to userspace.



James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 0/7] Support RAS virtualization in KVM
  2017-11-14 16:00   ` James Morse
  (?)
  (?)
@ 2017-11-15 11:06     ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-11-15 11:06 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi James,
   Thank you very much for your comments and review.

On 2017/11/15 0:00, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 10/11/17 19:54, Dongjiu Geng wrote:
>> This patch series mainly does the following:
>>
>> 1. Trap accesses to the RAS ERR* registers from Non-secure EL1 to EL2.
>>    KVM does only a minimal emulation: these registers are treated as
>>    RAZ/WI.
>> 2. Route synchronous External Abort exceptions from Non-secure EL0
>>    and EL1 to EL2. When EL3 routing is enabled by firmware, the
>>    system traps to EL3 firmware instead of EL2 KVM; firmware then
>>    checks whether EL2 routing is enabled and, if so, jumps to EL2 KVM,
>>    otherwise it jumps to the EL1 host kernel.
>> 3. Enable the APEI ARMv8 SEI notification so that the ACPI GHES driver
>>    parses the CPER records for SError. KVM calls handle_guest_sei() to let
>>    the ACPI driver parse the CPER record for an SError that happened in the guest.
>> 4. Although the APEI driver can handle the guest SError, not all
>>    systems support SEI notification (kernel-first, for example). So KVM
>>    also classifies the error through the Exception Syndrome Register and
>>    handles it according to the Asynchronous Error Type.
> 
>> 5. If the guest SError is not propagated and not consumed, KVM returns a
>>    recoverable error status to user-space, and user-space specifies the guest ESR
> 
> I thought we'd gone over this. There should be no RAS errors/notifications in
> user space. Only the symptoms should be sent, using the SIGBUS_MCEERR_A{O,R} if
> the kernel has handled as much as it can. This hides the actual mechanisms the
> kernel and firmware used.

Yes, I understand it.
For a guest SError that is neither propagated nor consumed by the PE, where the error address recorded by firmware is not accurate,
what do you suggest we do?
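
For reference, the symptom-only interface mentioned above (SIGBUS_MCEERR_A{O,R}) can be sketched from the user-space side roughly as follows. This is a minimal illustration, not Qemu's code: the enum and the function name classify_sigbus() are hypothetical, and the fallback values assume the Linux si_code numbering.

```c
#include <signal.h>

/* Linux si_code values for SIGBUS; fallbacks cover older or non-Linux headers. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4	/* action required: the error must be handled now */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5	/* action optional: error reported, not yet consumed */
#endif

/* Hypothetical names for illustration, not taken from Qemu or the kernel. */
enum mce_symptom {
	MCE_NOT_A_MEMORY_ERROR,
	MCE_ACTION_REQUIRED,
	MCE_ACTION_OPTIONAL,
};

/* Map the si_code of a SIGBUS onto the symptom the kernel reported. */
enum mce_symptom classify_sigbus(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:
		return MCE_ACTION_REQUIRED;
	case BUS_MCEERR_AO:
		return MCE_ACTION_OPTIONAL;
	default:
		return MCE_NOT_A_MEMORY_ERROR;
	}
}
```

A handler installed with sigaction() and SA_SIGINFO would pass info->si_code to this function and read the affected address from info->si_addr. Nothing here depends on how firmware or the kernel learned about the error.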

I checked the comments in [0] again (quoted below); you suggested a system panic:

-----------------------------------------------------------------
"I think in this scenario your firmware should describe a memory-error with an
unknown address. (i.e. don't set the 'physical address valid' bit in CPER's
'Table 275 Memory Error Record'). When Linux gets one of these, it should
panic(): We know some memory is corrupt, we don't know where it is"
----------------------------------------------------------------

But I do not think a panic is the better choice. In [0] you also raised this concern:
"The fault may be in the page tables belonging to the guest kernel,
even worse they may belong to they hypervisor's stage2 page tables"

If the error is in the page tables, killing the application frees that memory. If another application
later reuses the faulty address, will that trigger another SError?


The error has not yet been consumed by the PE, so we can isolate it by killing the affected application.
Let's discuss host EL0: if a host EL0 application takes an SError and the error has not been consumed by the PE,
do you mean we should panic the host OS as well?


> 
> User-space should not have to know how to handle RAS errors directly. This is a
> service the operating system provides for it. This abstraction means the same
> user-space code is portable between x86, arm64, powerpc etc.
> 
> What if the firmware uses another notification method? User space should expect
> the kernel to hide things like this from it.
> 
> If the kernel has no information to interpret a notification, how is user space
> supposed to know?
> 
> I understand you are trying to work around your 'memory corruption at an unknown
> address'[0] problem, but if the kernel can't know where this corrupt memory is
> it should really reboot. What stops this corrupt data being swapped to disk?
> 
> Killing 'the thing' that was running at the time is not sufficient because we
> don't know that this 'got' all the users of the corrupt memory. KSM can merge

I think we would do better to use ESB to isolate errors between EL0 and EL1, and between different guests.
An error that happens at EL0 is then isolated to the EL0 application. When KSM is running, the ESB can synchronize
the error out instead of spreading it to other guests.



> pages between guests. This is the difference between the error persisting
> forever killing off all the VMs one by one, and the corrupt page being silently
> re-read from disk clearing the error.
> 
> 
>>    and injects a virtual SError. For other Asynchronous Error Types, KVM directly
>>    injects a virtual SError with an IMPLEMENTATION DEFINED ESR, or panics if the
>>    error is fatal. With the RAS extension, the guest virtual ESR must be set, because
>>    all-zero means 'RAS error: Uncategorized' instead of 'no valid ISS'; so this ESR
>>    is set to IMPLEMENTATION DEFINED by default if user space does not specify it.
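
The default described in point 5 could look roughly like the sketch below. It is only an illustration of the rule "use the user-space syndrome if there is one, otherwise mark the syndrome IMPLEMENTATION DEFINED". The function name pick_vserror_syndrome() is hypothetical, and the bit positions (IDS at bit 24, ISS in bits [24:0]) are my reading of the architecture rather than anything taken from this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings: IDS is ISS bit 24, the ISS field is bits [24:0]. */
#define ESR_ELx_IDS		(UINT64_C(1) << 24)
#define ESR_ELx_ISS_MASK	UINT64_C(0x1FFFFFF)

/*
 * Hypothetical helper: choose the syndrome for a virtual SError.
 * An all-zero syndrome would read as 'RAS error: Uncategorized' on a
 * CPU with the RAS extension, so the fallback sets IDS instead.
 */
uint64_t pick_vserror_syndrome(bool user_specified, uint64_t user_esr)
{
	if (user_specified)
		return user_esr & ESR_ELx_ISS_MASK;

	return ESR_ELx_IDS;
}
```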
> 
> 
> Thanks,
> 
> James
> 
> 
> [0] https://www.spinics.net/lists/arm-kernel/msg605345.html
> 
> .
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-11-14 16:00     ` James Morse
  (?)
  (?)
@ 2017-11-15 11:29       ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-11-15 11:29 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi James,

   Thanks a lot for the review.

On 2017/11/15 0:00, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 10/11/17 19:54, Dongjiu Geng wrote:
>> If it is not a RAS SError, directly inject a virtual SError,
>> which keeps the old behaviour. If it is a RAS SError, first
>> let the host ACPI module handle it.
> 
>> For the ACPI handling,
>> if the error address is invalid, the APEI driver cannot
>> use the address to hwpoison memory and cannot notify
>> the guest to do the recovery.
> 
> The guest can't do any recovery either. There is no recovery you can do without
> some information about what the error is.
> 
> This is your memory corruption at an unknown address? We should reboot.
> 
> (I agree memory_failure.c's::me_kernel() is ignoring kernel errors, we should
> try and fix this. It makes some sense for polled or irq notifications, but not
> SEA/SEI).
> 
> 
>> To be safe, KVM continues
>> categorizing errors and handles them separately.
> 
>> If the RAS error is not propagated, let host user space
>> handle it.
> 
> No. Host user space should not know anything about the kernel or platform RAS
> support. Doing so creates an ABI link between EL3 firmware and Qemu. This is
> totally unmaintainable.

I have two questions here:
(1) If the AET (Asynchronous Error Type) is Recoverable (UER), do you mean we should also reboot or panic?
(2) When does Qemu get the chance to set the guest ESR? Here I return an error code to Qemu; when Qemu gets this return,
    it specifies the guest ESR and injects the abort. If KVM does not return an error to Qemu, Qemu will
    not know when to set the guest ESR value and inject the abort.
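
To make the categories in question (1) concrete, the decode that the patch performs on the syndrome can be sketched as a standalone function. The macro names follow the ones used in the patch, but the numeric encodings (IDS at bit 24, DFSC 0x11 for an asynchronous SError, AET in bits [12:10]) are my understanding of the architecture, and the enum and classify_serror() are hypothetical.

```c
#include <stdint.h>

/* Assumed ESR_ELx encodings for an SError syndrome. */
#define ESR_ELx_IDS		(UINT64_C(1) << 24)
#define ESR_ELx_FSC		UINT64_C(0x3F)
#define ESR_ELx_FSC_SERROR	UINT64_C(0x11)
#define ESR_ELx_AET_SHIFT	10
#define ESR_ELx_AET		(UINT64_C(0x7) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UEO		(UINT64_C(2) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_UER		(UINT64_C(3) << ESR_ELx_AET_SHIFT)
#define ESR_ELx_AET_CE		(UINT64_C(6) << ESR_ELx_AET_SHIFT)

/* Hypothetical classification, for illustration only. */
enum serror_class {
	SERROR_IMPDEF,		/* syndrome is IMPLEMENTATION DEFINED */
	SERROR_UNCATEGORIZED,	/* DFSC does not say 'asynchronous SError' */
	SERROR_CORRECTED,	/* CE */
	SERROR_RESTARTABLE,	/* UEO */
	SERROR_RECOVERABLE,	/* UER */
	SERROR_FATAL,		/* UC, UEU, or a reserved AET value */
};

enum serror_class classify_serror(uint64_t esr)
{
	if (esr & ESR_ELx_IDS)
		return SERROR_IMPDEF;

	/* AET is only meaningful when DFSC reports an asynchronous SError. */
	if ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)
		return SERROR_UNCATEGORIZED;

	switch (esr & ESR_ELx_AET) {
	case ESR_ELx_AET_CE:
		return SERROR_CORRECTED;
	case ESR_ELx_AET_UEO:
		return SERROR_RESTARTABLE;
	case ESR_ELx_AET_UER:
		return SERROR_RECOVERABLE;
	default:
		return SERROR_FATAL;
	}
}
```

Treating reserved AET values as fatal is a conservative choice made for this sketch. What to do with SERROR_RECOVERABLE is exactly the open point in question (1).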


> 
> This thing needs to be portable. The kernel should handle the error, and report
> any symptoms to user-space. e.g. 'this memory is gone'.
> 
> We shouldn't special case KVM.
> 
> 
>> The reason is that sometimes we can just kill the
>> affected guest application instead of panicking the whole guest OS.
>> Host user space specifies a valid ESR and injects a virtual
>> SError; the guest can then just kill the current application if the
>> non-consumed error came from a guest application.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Quanming Wu <wuquanming@huawei.com>
> 
> The last Signed-off-by should match the person posting the patch. It's a chain
> of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want
> to credit Quanming Wu you can add CC and they can Ack/Review your patch.

OK, got it. Thanks a lot for your suggestion.


> 
> 
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 7debb74..1afdc87 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>>  	return arm_exit_handlers[hsr_ec];
>>  }
>>  
>> +/**
>> + * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
>> + * @vcpu:	the VCPU pointer
>> + *
>> + * For RAS SError interrupt, firstly let host kernel handle it.
>> + * If the AET is [ESR_ELx_AET_UER], then let user space handle it,
>> + */
>> +static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> +{
>> +	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
>> +	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
>> +	unsigned int aet = esr & ESR_ELx_AET;
>> +
>> +	/*
>> +	 * This is not RAS SError
>> +	 */
>> +	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
>> +		kvm_inject_vabt(vcpu);
>> +		return 1;
>> +	}
> 
>> +	/* The host kernel may handle this abort. */
>> +	handle_guest_sei();
> 
> This has to claim the SError as a notification. If APEI claims the error, KVM
> doesn't need to do anything more. You ignore its return code.

Thanks for pointing this out.
I will check the return code: if it returns success, KVM doesn't need to do anything more;
otherwise, KVM continues handling the error.
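
The change I have in mind would be shaped roughly like this. It is a standalone sketch, not the next version of the patch: it assumes the claim function returns 0 when APEI found and processed CPER records and a negative errno otherwise, and all names here (try_claim_guest_sei(), the fake_claim_* stubs, the enum) are hypothetical stand-ins for handle_guest_sei() and its callers.

```c
#include <errno.h>

/* Hypothetical outcome of asking the host (APEI/GHES) to claim the SError. */
enum sei_outcome {
	SEI_CLAIMED_BY_APEI,		/* nothing more for KVM to do */
	SEI_NEEDS_CATEGORIZATION,	/* fall through to the ESR-based handling */
};

/*
 * Assumed convention: claim() returns 0 if firmware-first records were
 * found and handled, and a negative errno if there was nothing to claim.
 */
enum sei_outcome try_claim_guest_sei(int (*claim)(void))
{
	if (claim() == 0)
		return SEI_CLAIMED_BY_APEI;

	return SEI_NEEDS_CATEGORIZATION;
}

/* Stubs standing in for the real APEI entry point. */
int fake_claim_success(void)
{
	return 0;
}

int fake_claim_no_records(void)
{
	return -ENOENT;
}
```

In the handler, SEI_CLAIMED_BY_APEI would mean returning to the guest straight away, so the categorization code only runs when APEI did not claim the error.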

> 
> 
>> +
>> +	/*
>> +	 * In below two conditions, it will directly inject the
>> +	 * virtual SError:
>> +	 * 1. The Syndrome is IMPLEMENTATION DEFINED
>> +	 * 2. It is Uncategorized SEI
>> +	 */
>> +	if (impdef_syndrome ||
>> +		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
>> +		kvm_inject_vabt(vcpu);
>> +		return 1;
>> +	}
>> +
>> +	switch (aet) {
>> +	case ESR_ELx_AET_CE:	/* corrected error */
>> +	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
>> +		return 1;	/* continue processing the guest exit */
> 
>> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
>> +		/*
>> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
>> +		 * error has not been propagated
>> +		 */
>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +		return 0;
> 
> We should not pass RAS notifications to user space. The kernel either handles
> them, or it panics(). User space shouldn't even know if the kernel supports RAS
> until it gets an MCEERR signal.

Currently I rely on this error return to let Qemu set the guest ESR; otherwise user space will not know when to set it.
If we drop it, how and when do we tell user space (Qemu) to set the guest ESR and inject the abort?


> 
> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
> 
> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)
> 
> 
>> +	default:
>> +		/*
>> +		 * Until now, the CPU supports RAS and SEI is fatal, or host
>> +		 * does not support to handle the SError.
>> +		 */
>> +		panic("This Asynchronous SError interrupt is dangerous, panic");
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  /*
>>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>   * proper exit to userspace.
> 
> 
> 
> James
> 
> .
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
@ 2017-11-15 11:29       ` gengdongjiu
  0 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-11-15 11:29 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi James,

   Thanks a lot for the review.

On 2017/11/15 0:00, James Morse wrote:
> Hi Dongjiu Geng,
> 
> On 10/11/17 19:54, Dongjiu Geng wrote:
>> If it is not RAS SError, directly inject virtual SError,
>> which will keep the old way. If it is RAS SError, firstly
>> let host ACPI module to handle it.
> 
>> For the ACPI handling,
>> if the error address is invalid, APEI driver will not
>> identify the address to hwpoison memory and can not notify
>> guest to do the recovery.
> 
> The guest can't do any recover either. There is no recovery you can do without
> some information about what the error is.
> 
> This is your memory corruption at an unknown address? We should reboot.
> 
> (I agree memory_failure.c's::me_kernel() is ignoring kernel errors, we should
> try and fix this. It makes some sense for polled or irq notifications, but not
> SEA/SEI).
> 
> 
>> In order to safe, KVM continues
>> categorizing errors and handle it separately.
> 
>> If the RAS error is not propagated, let host user space to
>> handle it. 
> 
> No. Host user space should not know anything about the kernel or platform RAS
> support. Doing so creates an ABI link between EL3 firmware and Qemu. This is
> totally unmaintainable.

I have two questions here:
(1) If the AET (Asynchronous Error Type) is Recoverable (UER), do you mean we should also reboot or panic?
(2) When does Qemu get the chance to set the guest ESR? Here I return an error code to Qemu. When Qemu gets this
    return, it specifies the guest ESR and injects the abort. If KVM does not return an error to Qemu, Qemu will
    not know when to set the guest ESR value and inject the abort.


> 
> This thing needs to be portable. The kernel should handle the error, and report
> any symptoms to user-space. e.g. 'this memory is gone'.
> 
> We shouldn't special case KVM.
> 
> 
>> The reason is that sometimes we can only kill the
>> guest effected application instead of panic whose guest OS.
>> Host user space specifies a valid ESR and inject virtual
>> SError, guest can just kill the current application if the
>> non-consumed error coming from guest application.
>>
>> Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
>> Signed-off-by: Quanming Wu <wuquanming@huawei.com>
> 
> The last Signed-off-by should match the person posting the patch. It's a chain
> of custody for GPL-signoff purposes, not a 'partially-written-by'. If you want
> to credit Quanming Wu you can add CC and they can Ack/Review your patch.

OK, got it. Thanks a lot for your suggestion.


> 
> 
>> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
>> index 7debb74..1afdc87 100644
>> --- a/arch/arm64/kvm/handle_exit.c
>> +++ b/arch/arm64/kvm/handle_exit.c
>> @@ -178,6 +179,66 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
>>  	return arm_exit_handlers[hsr_ec];
>>  }
>>  
>> +/**
>> + * kvm_handle_guest_sei - handles SError interrupt or asynchronous aborts
>> + * @vcpu:	the VCPU pointer
>> + *
>> + * For RAS SError interrupt, firstly let host kernel handle it.
>> + * If the AET is [ESR_ELx_AET_UER], then let user space handle it,
>> + */
>> +static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> +{
>> +	unsigned int esr = kvm_vcpu_get_hsr(vcpu);
>> +	bool impdef_syndrome =  esr & ESR_ELx_ISV;	/* aka IDS */
>> +	unsigned int aet = esr & ESR_ELx_AET;
>> +
>> +	/*
>> +	 * This is not RAS SError
>> +	 */
>> +	if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
>> +		kvm_inject_vabt(vcpu);
>> +		return 1;
>> +	}
> 
>> +	/* The host kernel may handle this abort. */
>> +	handle_guest_sei();
> 
> This has to claim the SError as a notification. If APEI claims the error, KVM
> doesn't need to do anything more. You ignore its return code.

Thanks for pointing this out.
I will check the return code: if it returns success, KVM doesn't need to do anything more;
otherwise, KVM continues with the rest of the handling.
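
As a rough illustration of that ordering, here is a user-space model of the decision logic rather than kernel code. The `ESR_*` constants are local stand-ins assumed to match the `ESR_ELx_*` fields used in the quoted patch, `classify_guest_sei()` and `enum sei_action` are hypothetical names, and "0 means claimed" is an assumed return convention for `handle_guest_sei()`. The categorization after the claim check simply mirrors the v8 patch as posted, including the UER exit that is still under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the ESR_ELx_* definitions (assumed bit positions). */
#define ESR_IDS          (UINT32_C(1) << 24)   /* IMPLEMENTATION DEFINED syndrome */
#define ESR_DFSC_MASK    UINT32_C(0x3f)
#define ESR_DFSC_SERROR  UINT32_C(0x11)
#define ESR_AET_MASK     (UINT32_C(7) << 10)
#define ESR_AET_UEO      (UINT32_C(2) << 10)   /* restartable */
#define ESR_AET_UER      (UINT32_C(3) << 10)   /* recoverable */
#define ESR_AET_CE       (UINT32_C(6) << 10)   /* corrected */

/* Hypothetical outcomes of handling a guest SError. */
enum sei_action {
	SEI_CLAIMED,       /* host RAS code claimed it, nothing more to do */
	SEI_INJECT_VABT,   /* inject a virtual SError into the guest */
	SEI_RESUME_GUEST,  /* corrected or restartable, re-enter the guest */
	SEI_EXIT_TO_USER,  /* v8 behaviour for UER (disputed in review) */
	SEI_PANIC          /* anything else is treated as fatal */
};

/*
 * claim_ret models the return code of handle_guest_sei();
 * 0 is assumed to mean "APEI claimed the error".
 */
static enum sei_action classify_guest_sei(bool has_ras, int claim_ret,
					  uint32_t esr)
{
	if (!has_ras)
		return SEI_INJECT_VABT;	/* not a RAS SError: old behaviour */

	if (claim_ret == 0)
		return SEI_CLAIMED;	/* the return code is no longer ignored */

	if ((esr & ESR_IDS) || (esr & ESR_DFSC_MASK) != ESR_DFSC_SERROR)
		return SEI_INJECT_VABT;

	switch (esr & ESR_AET_MASK) {
	case ESR_AET_CE:
	case ESR_AET_UEO:
		return SEI_RESUME_GUEST;
	case ESR_AET_UER:
		return SEI_EXIT_TO_USER;
	default:
		return SEI_PANIC;
	}
}
```

The only change relative to the posted patch is the early return when the error is claimed; everything after it runs only when the host RAS code did not handle the notification.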

> 
> 
>> +
>> +	/*
>> +	 * In below two conditions, it will directly inject the
>> +	 * virtual SError:
>> +	 * 1. The Syndrome is IMPLEMENTATION DEFINED
>> +	 * 2. It is Uncategorized SEI
>> +	 */
>> +	if (impdef_syndrome ||
>> +		((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
>> +		kvm_inject_vabt(vcpu);
>> +		return 1;
>> +	}
>> +
>> +	switch (aet) {
>> +	case ESR_ELx_AET_CE:	/* corrected error */
>> +	case ESR_ELx_AET_UEO:	/* restartable error, not yet consumed */
>> +		return 1;	/* continue processing the guest exit */
> 
>> +	case ESR_ELx_AET_UER:	/* The error has not been propagated */
>> +		/*
>> +		 * Userspace only handle the guest SError Interrupt(SEI) if the
>> +		 * error has not been propagated
>> +		 */
>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +		return 0;
> 
> We should not pass RAS notifications to user space. The kernel either handles
> them, or it panics(). User space shouldn't even know if the kernel supports RAS
> until it gets an MCEERR signal.

Currently I rely on this error return to let Qemu set the guest ESR; otherwise user space will not know when to set it.
If KVM does not exit to user space here, how and when do we tell user space (Qemu) to set the guest ESR and inject the abort?


> 
> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
> 
> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)
> 
> 
>> +	default:
>> +		/*
>> +		 * Until now, the CPU supports RAS and SEI is fatal, or host
>> +		 * does not support to handle the SError.
>> +		 */
>> +		panic("This Asynchronous SError interrupt is dangerous, panic");
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>  /*
>>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>   * proper exit to userspace.
> 
> 
> 
> James
> 
> .
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-11-14 16:00     ` James Morse
  (?)
  (?)
@ 2017-12-06 10:26       ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-06 10:26 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm


On 2017/11/15 0:00, James Morse wrote:
>> +		 * error has not been propagated
>> +		 */
>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +		return 0;
> We should not pass RAS notifications to user space. The kernel either handles
> them, or it panics(). User space shouldn't even know if the kernel supports RAS
> until it gets an MCEERR signal.
> 
> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
> 
> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)

Hi James,
  Sorry to disturb you!

  Do you think user space needs to set the guest ESR? If so, somewhere in KVM I need to
notify user space that an SError has happened and that it needs to set the ESR for the guest,
which is why I return an error code to user space here. You said we should not pass RAS notifications
to user space, so could you suggest how to notify user space to set the guest ESR?

Thanks a lot in advance.


> 
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-06 10:26       ` gengdongjiu
  (?)
@ 2017-12-06 19:04         ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2017-12-06 19:04 UTC (permalink / raw)
  To: gengdongjiu
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi gengdongjiu,

On 06/12/17 10:26, gengdongjiu wrote:
> On 2017/11/15 0:00, James Morse wrote:
>>> +		 * error has not been propagated
>>> +		 */
>>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>> +		return 0;
>> We should not pass RAS notifications to user space. The kernel either handles
>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
>> until it gets an MCEERR signal.
>>
>> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
>>
>> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
>> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)

> do you think whether we need to set the guest ESR by user space?  if need, I need to
> notify user space that there is a SError happen and need to set ESR for guest in some place of
> KVM.

I think you are still coming from a world where user-space gets raw RAS
notifications via KVM. This should not happen because the notification method is
private to firmware and the kernel. KVM is just in the way when a guest is running.

Notifications reaching KVM should be plumbed into the APEI-firmware-first-code
or eventually, a kernel-first mechanism if APEI doesn't 'claim' them.

The kernel RAS code may signal user-space with the symptoms of the error, and
user-space may decide to generate a new RAS notification for the guest.

This should function in exactly the same way, regardless of which notification
method is in use between the kernel and firmware. (It's the only way to make this
future-proof.)

Which notification user-space chooses to use entirely depends on what (if
anything) it advertised to the guest in the HEST. User-space has to be in
control of triggering any SError, not just overriding the ESR when KVM has
decided it wants to kill the guest.


> so here I return a error code to user space. you mean we should not pass RAS notifications
> to user space, so could you give some suggestion how to notify user space to set guest ESR.

KVM shouldn't give the guest an SError when it takes a RAS notification, it
should pass the notification to the kernel RAS code. It only needs to 'fall
through' to some default case if both APEI and kernel-first deny all knowledge
of this notification.


The end-to-end flow is then (assuming no-VHE):
(1) An error occurs, taking the CPU to EL3.
EL3: triage the error, generate CPER, notify the OS
EL2: KVM takes the notification, exits the guest, returns to host EL1.
EL1: KVM handle_exit() calls APEI to handle the error.
This is the end of KVM's involvement in RAS - it's just plumbing.

(2) APEI processes the CPER records and signals affected processes.
If KVM's user-space is affected, KVM will spot the pending signal when it goes
to re-enter the guest, and exit to user-space instead.
Qemu takes the SIGBUS_MCEERR_A{O,R}.

(3) Qemu decides it wants to hand the guest a RAS error, it populates the CPER
records (in memory only Qemu knows about), then drives the KVM API to make the
appropriate notification appear.


(1) only happens if the guest was running when the error arrived. GHES has ~4
flavours of IRQ which may be used to describe corruption in guest memory. Steps
(2) and (3) are exactly the same in this case.

Qemu may decide to trigger RAS errors all by itself (probably for testing and
debugging), in which case (1) and (2) don't happen, but (3) is exactly the same.


This way platform-firmware/host-kernel can use kernel-first or firmware-first
with any of the notifications, independently from Qemu/guest-kernel making a
different kernel-first or firmware-first with different notifications.

Passing information out of KVM breaks this, forcing Qemu to know about the
mechanism platform-firmware is using.


We need to tackle (1) and (3) separately. For (3) we need some API that lets
Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
a way of migrating pending SError yet... which is where I got stuck last time I
was looking at this.



James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
@ 2017-12-06 19:04         ` James Morse
  0 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2017-12-06 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi gengdongjiu,

On 06/12/17 10:26, gengdongjiu wrote:
> On 2017/11/15 0:00, James Morse wrote:
>>> +		 * error has not been propagated
>>> +		 */
>>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>> +		return 0;
>> We should not pass RAS notifications to user space. The kernel either handles
>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
>> until it gets an MCEERR signal.
>>
>> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
>>
>> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
>> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)

> do you think whether we need to set the guest ESR by user space?  if need, I need to
> notify user space that there is a SError happen and need to set ESR for guest in some place of
> KVM.

I think you are still coming from a world where user-space gets raw RAS
notifications via KVM. This should not happen because the notification method is
private to firmware and the kernel. KVM is just in the way when a guest is running.

Notifications reaching KVM should be plumbed into the APEI-firmware-first-code
or eventually, a kernel-first mechanism if APEI doesn't 'claim' them.

The kernel RAS code may signal user-space with the symptoms of the error, and
user-space may decided to generate a new RAS notification for the guest.

This should function in exactly the same way, regardless of which notification
method is in use between the kernel and firmware. (its the only way to make this
future-proof).

Which notification user-space chooses to use entirely depends on what (if
anything) it advertised to the guest in the HEST. User-space has to be in
control of triggering any SError, not just overriding the ESR when KVM has
decided it wants to kill the guest.


> so here I return a error code to user space. you mean we should not pass RAS notifications
> to user space, so could you give some suggestion how to notify user space to set guest ESR.

KVM shouldn't give the guest an SError when it takes a RAS notification, it
should pass the notification to the kernel RAS code. It only needs to 'fall
through' to some default cause if both APEI and kernel-first deny-all-knowledge
of this notification.


The end-to-end flow is then (assuming no-VHE):
(1) An error occurs, taking the CPU to EL3.
EL3: triage the error, generate CPER, notify the OS.
EL2: KVM takes the notification, exits the guest, returns to host EL1.
EL1: KVM handle_exit() calls APEI to handle the error.
This is the end of KVM's involvement in RAS - it's just plumbing.

(2) APEI processes the CPER records and signals affected processes.
If KVM's user-space is affected, KVM will spot the pending signal when it goes
to re-enter the guest, and exit to user-space instead.
Qemu takes the SIGBUS_MCEERR_A{O,R}.

(3) Qemu decides it wants to hand the guest a RAS error, it populates the CPER
records (in memory only Qemu knows about), then drives the KVM API to make the
appropriate notification appear.


(1) only happens if the guest was running when the error arrived. GHES has ~4
flavours of IRQ which may be used to describe corruption in guest memory. Steps
(2) and (3) are exactly the same in this case.

Qemu may decide to trigger RAS errors all by itself (probably for testing and
debugging), in which case (1) and (2) don't happen, but (3) is exactly the same.


This way platform-firmware/host-kernel can use kernel-first or firmware-first
with any of the notifications, independently of Qemu/guest-kernel making a
different choice of kernel-first or firmware-first with different notifications.

Passing information out of KVM breaks this, forcing Qemu to know about the
mechanism platform-firmware is using.


We need to tackle (1) and (3) separately. For (3) we need some API that lets
Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
a way of migrating pending SError yet... which is where I got stuck last time I
was looking at this.
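
One way to picture the two requirements together (trigger with a chosen ESR,
and be able to migrate the pending state) is a small piece of per-vCPU state
that user-space can both write and read back. The sketch below is only a
model under that assumption: the structure and the three functions are
hypothetical names, not an existing KVM interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; not an existing KVM structure. */
struct vcpu_serror_state {
	bool     pending;	/* an SError is waiting to be delivered to the guest */
	bool     has_esr;	/* esr below was supplied by user-space */
	uint64_t esr;		/* syndrome the guest should observe */
};

/* Trigger: user-space asks for an SError, optionally with a chosen ESR. */
void serror_trigger(struct vcpu_serror_state *s, bool has_esr, uint64_t esr)
{
	s->pending = true;
	s->has_esr = has_esr;
	s->esr = has_esr ? esr : 0;
}

/* Save: read the pending state back on the migration source. */
struct vcpu_serror_state serror_save(const struct vcpu_serror_state *s)
{
	return *s;
}

/* Restore: write the saved state into the vCPU on the destination. */
void serror_restore(struct vcpu_serror_state *s, struct vcpu_serror_state saved)
{
	*s = saved;
}
```

The point of the model is that "trigger" and "restore" write the same state,
so an SError that was pending but not yet taken survives a save/restore cycle.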



James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-06 19:04         ` James Morse
  (?)
  (?)
@ 2017-12-07  6:37           ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-07  6:37 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi James,

On 2017/12/7 3:04, James Morse wrote:
> Hi gengdongjiu,
> 
> On 06/12/17 10:26, gengdongjiu wrote:
>> On 2017/11/15 0:00, James Morse wrote:
>>>> +		 * error has not been propagated
>>>> +		 */
>>>> +		run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> +		run->ex.exception = ESR_ELx_EC_SERROR;
>>>> +		run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> +		return 0;
>>> We should not pass RAS notifications to user space. The kernel either handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
>>> until it gets an MCEERR signal.
>>>
>>> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
>>>
>>> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
>>> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)
> 
>> do you think whether we need to set the guest ESR by user space?  if need, I need to
>> notify user space that there is a SError happen and need to set ESR for guest in some place of
>> KVM.
> 
> I think you are still coming from a world where user-space gets raw RAS
> notifications via KVM. This should not happen because the notification method is
> private to firmware and the kernel. KVM is just in the way when a guest is running.
> 
> Notifications reaching KVM should be plumbed into the APEI-firmware-first-code
> or eventually, a kernel-first mechanism if APEI doesn't 'claim' them.
> 
> The kernel RAS code may signal user-space with the symptoms of the error, and
> user-space may decided to generate a new RAS notification for the guest.
> 
> This should function in exactly the same way, regardless of which notification
> method is in use between the kernel and firmware. (its the only way to make this
> future-proof).
> 
> Which notification user-space chooses to use entirely depends on what (if
> anything) it advertised to the guest in the HEST. User-space has to be in
> control of triggering any SError, not just overriding the ESR when KVM has
> decided it wants to kill the guest.

thanks, I will explain more.

> 
> 
>> so here I return a error code to user space. you mean we should not pass RAS notifications
>> to user space, so could you give some suggestion how to notify user space to set guest ESR.
> 
> KVM shouldn't give the guest an SError when it takes a RAS notification, it
> should pass the notification to the kernel RAS code. It only needs to 'fall
> through' to some default cause if both APEI and kernel-first deny-all-knowledge
> of this notification.
> 
> 
> The end-to-end flow is then (assuming no-VHE):
> (1)An error occurs, taking the CPU to EL3.
> EL3: triage the error, generate CPER, notify the OS
> EL2: KVM takes the notification, exits the guest, returns to host EL1.
> EL1: KVM handle_exit() calls APEI to handle the error.
> This is the end of KVMs involvement in RAS - its just plumbing.
> 
> (2)APEI processes the CPER records and signals affected processes.
> If KVM's user-space is affected, KVM will spot the pending signal when it goes
> to re-enter the guest, and exit to user-space instead.
> Qemu takes the SIGBUS_MCEERR_A{O,R}.
> 
> (3) Qemu decides it wants to hand the guest a RAS error, it populates the CPER
> records (in memory only Qemu knows about), then drives the KVM API to make the
> appropriate notification appear.
> 
> 
> (1) only happens if the guest was running when the error arrived. GHES has ~4
> flavours of IRQ which may be used to describe corruption in guest memory. Steps
> (2) and (3) are exactly the same in this case.
> 
> Qemu may decide to trigger RAS errors all by itself, (probably for testing and
> debugging), in which case (1) and (2) don't happen, but (3), is exactly the same.
> 
> 
> This way platform-firmware/host-kernel can use kernel-first or firmware-first
> with any of the notifications, independently from Qemu/guest-kernel making a
> different kernel-first or firmware-first with different notifications.
> 
> Passing information out of KVM breaks this, forcing Qemu to know about the
> mechanism platform-firmware is using.
> 
> 
> We need to tackle (1) and (3) separately. For (3) we need some API that lets
> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
> a way of migrating pending SError yet... which is where I got stuck last time I
> was looking at this.

I understand most of your idea.

But in Qemu one signal type can only correspond to one behavior, not two;
otherwise Qemu will not know what to do.

If Qemu receives the SIGBUS_MCEERR_AR signal, it will populate the CPER
records and inject an SEA into the guest through the KVM ioctl "KVM_SET_ONE_REG"; if it receives the SIGBUS_MCEERR_AO
signal, it will record the CPER and trigger an IRQ to notify the guest, as shown below:

SIGBUS_MCEERR_AR triggers a Synchronous External Abort.
SIGBUS_MCEERR_AO triggers a GPIO IRQ.

For SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR we have already specified the trigger methods, neither of which
involves _triggering_ an SError.

So there is no chance for Qemu to trigger an SError when it gets SIGBUS_MCEERR_A{O,R}.
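
The mapping described above can be written down as a small table-like
function. This is a sketch of the argument only, not Qemu code: the enum
names and notification_for() are hypothetical, and the two input values
stand in for the SIGBUS_MCEERR_AR and SIGBUS_MCEERR_AO cases.

```c
/* Hypothetical stand-ins for the two signal flavours Qemu can receive. */
enum mce_code {
	MCE_ACTION_REQUIRED,	/* SIGBUS_MCEERR_AR */
	MCE_ACTION_OPTIONAL,	/* SIGBUS_MCEERR_AO */
};

/* Hypothetical names for the ways Qemu could notify the guest. */
enum guest_notification {
	NOTIFY_SEA,		/* Synchronous External Abort */
	NOTIFY_GPIO_IRQ,	/* GPIO interrupt */
	NOTIFY_SERROR,		/* SError: never selected below */
};

/* One signal type maps to exactly one guest notification. */
enum guest_notification notification_for(enum mce_code code)
{
	switch (code) {
	case MCE_ACTION_REQUIRED:
		return NOTIFY_SEA;
	case MCE_ACTION_OPTIONAL:
		return NOTIFY_GPIO_IRQ;
	}
	/* Not reached for the two values above; keeps the compiler quiet. */
	return NOTIFY_GPIO_IRQ;
}
```

With only these two inputs, no path returns NOTIFY_SERROR, which is the gap
being pointed out.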

> 
> 
> 
> James
> 
> .
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-07  6:37           ` gengdongjiu
  (?)
  (?)
@ 2017-12-15  3:30             ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-15  3:30 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi James,

On 2017/12/7 14:37, gengdongjiu wrote:
>> We need to tackle (1) and (3) separately. For (3) we need some API that lets
>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
>> a way of migrating pending SError yet... which is where I got stuck last time I
>> was looking at this.
> I understand you most idea.
> 
> But In the Qemu one signal type can only correspond to one behavior, can not correspond to two behaviors,
> otherwise Qemu will do not know how to do.
> 
> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate the CPER
> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if receives the SIGBUS_MCEERR_AO
> signal, it will record the CPER and trigger a IRQ to notify guest, as shown below:
> 
> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
> SIGBUS_MCEERR_AO trigger GPIO IRQ.
> 
> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify trigger method, which all
> 
> not involve _trigger_ an SError.
> 
> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.

As I explained above:

If Qemu receives SIGBUS_MCEERR_AR, it records the CPER and triggers a Synchronous External Abort;
if Qemu receives SIGBUS_MCEERR_AO, it records the CPER and triggers a GPIO IRQ.
So Qemu does not know when to _trigger_ an SError.

That is why I "return an error" to Qemu if ghes_notify_sei() returns failure in [1]. If you are opposed to KVM "returning an error",
do you have a better idea? Thanks.

As for migrating a pending SError, I think it is a separate issue, because Qemu still does not know
how and when to trigger the SError.

[1]:
static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
        .......................
+       case ESR_ELx_AET_UER:   /* The error has not been propagated */
+               /*
+                * Userspace only handle the guest SError Interrupt(SEI) if the
+                * error has not been propagated
+                */
+               run->exit_reason = KVM_EXIT_EXCEPTION;
+               run->ex.exception = ESR_ELx_EC_SERROR;
+               run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
+               return 0;
        .......................
}
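
For reference, the case label in [1] switches on the Asynchronous Error Type
(AET) field of the SError syndrome. The sketch below shows how that field
could be extracted in a standalone program. The macros are local copies of
the ESR_ELx encodings as understood from the Arm architecture (EC in bits
[31:26], IDS in bit 24, AET in bits [12:10], DFSC in bits [5:0]); they are
assumptions of this sketch and should be checked against asm/esr.h before
being relied on. serror_aet() and the AET_* names are hypothetical.

```c
#include <stdint.h>

/* Local copies of the assumed ESR_ELx encodings (see the note above). */
#define SK_ESR_EC_SHIFT		26
#define SK_ESR_EC_MASK		(UINT64_C(0x3f) << SK_ESR_EC_SHIFT)
#define SK_ESR_EC_SERROR	UINT64_C(0x2f)
#define SK_ESR_IDS		(UINT64_C(1) << 24)
#define SK_ESR_DFSC_MASK	UINT64_C(0x3f)
#define SK_ESR_DFSC_SERROR	UINT64_C(0x11)
#define SK_ESR_AET_SHIFT	10
#define SK_ESR_AET_MASK		(UINT64_C(0x7) << SK_ESR_AET_SHIFT)

enum {
	AET_INVALID = -1,	/* not an architected RAS SError syndrome */
	AET_UC  = 0,		/* uncontainable */
	AET_UEU = 1,		/* unrecoverable */
	AET_UEO = 2,		/* restartable */
	AET_UER = 3,		/* recoverable */
	AET_CE  = 6,		/* corrected */
};

/* Return the AET value, or AET_INVALID if the syndrome does not carry one. */
int serror_aet(uint64_t esr)
{
	if (((esr & SK_ESR_EC_MASK) >> SK_ESR_EC_SHIFT) != SK_ESR_EC_SERROR)
		return AET_INVALID;
	if (esr & SK_ESR_IDS)			/* IMPLEMENTATION DEFINED syndrome */
		return AET_INVALID;
	if ((esr & SK_ESR_DFSC_MASK) != SK_ESR_DFSC_SERROR)
		return AET_INVALID;
	return (int)((esr & SK_ESR_AET_MASK) >> SK_ESR_AET_SHIFT);
}
```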

> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
@ 2017-12-15  3:30             ` gengdongjiu
  0 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-15  3:30 UTC (permalink / raw)
  To: James Morse
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi James,

On 2017/12/7 14:37, gengdongjiu wrote:
>> We need to tackle (1) and (3) separately. For (3) we need some API that lets
>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
>> a way of migrating pending SError yet... which is where I got stuck last time I
>> was looking at this.
> I understand you most idea.
> 
> But In the Qemu one signal type can only correspond to one behavior, can not correspond to two behaviors,
> otherwise Qemu will do not know how to do.
> 
> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate the CPER
> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if receives the SIGBUS_MCEERR_AO
> signal, it will record the CPER and trigger a IRQ to notify guest, as shown below:
> 
> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
> SIGBUS_MCEERR_AO trigger GPIO IRQ.
> 
> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify trigger method, which all
> 
> not involve _trigger_ an SError.
> 
> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.

As I explained above:

If Qemu receives SIGBUS_MCEERR_AR, it records the CPER and triggers a Synchronous External Abort;
if Qemu receives SIGBUS_MCEERR_AO, it records the CPER and triggers a GPIO IRQ.
So Qemu does not know when to _trigger_ an SError.

That is why I "return an error" to Qemu when ghes_notify_sei() fails, as in [1]. If you are opposed to KVM "returning an error",
do you have a better idea? Thanks.

Migrating a pending SError is, I think, a separate issue, because Qemu still does not know
how and when to trigger the SError.

[1]:
static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
        .......................
+       case ESR_ELx_AET_UER:   /* The error has not been propagated */
+               /*
+                * Userspace only handle the guest SError Interrupt(SEI) if the
+                * error has not been propagated
+                */
+               run->exit_reason = KVM_EXIT_EXCEPTION;
+               run->ex.exception = ESR_ELx_EC_SERROR;
+               run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
+               return 0;
        .......................
}
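
For illustration only, here is a rough sketch of what the user-space side of the exit in [1] might look like. It assumes a simplified stand-in for the kvm_run fields used above. KVM_SEI_SEV_RECOVERABLE exists only in this patch series, so its value here is a placeholder, and vmm_handle_exit() and enum vmm_action are hypothetical names, not Qemu code.

```c
#include <stdint.h>

/* Values as defined in the Linux headers (linux/kvm.h, arm64 esr.h). */
#define KVM_EXIT_EXCEPTION      1
#define ESR_ELx_EC_SERROR       0x2F

/* Hypothetical: only this patch series defines it; the value is assumed. */
#define KVM_SEI_SEV_RECOVERABLE 1

/* Simplified stand-in for the struct kvm_run fields set in [1]. */
struct run_view {
	uint32_t exit_reason;
	struct {
		uint32_t exception;
		uint32_t error_code;
	} ex;
};

enum vmm_action {
	VMM_OTHER,          /* not an exception exit; handled elsewhere */
	VMM_INJECT_VSERROR, /* user space chooses a guest ESR and injects */
	VMM_ABORT           /* unexpected exception exit; stop the VM */
};

/* Decide what the VMM does after KVM_RUN returns to user space. */
enum vmm_action vmm_handle_exit(const struct run_view *run)
{
	if (run->exit_reason != KVM_EXIT_EXCEPTION)
		return VMM_OTHER;

	if (run->ex.exception == ESR_ELx_EC_SERROR &&
	    run->ex.error_code == KVM_SEI_SEV_RECOVERABLE)
		return VMM_INJECT_VSERROR;

	return VMM_ABORT;
}
```

The sketch shows only the dispatch; how user space would then specify the ESR and inject the virtual SError is the API question still open in this thread.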

> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-07  6:37           ` gengdongjiu
  (?)
@ 2017-12-15 18:52             ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2017-12-15 18:52 UTC (permalink / raw)
  To: gengdongjiu
  Cc: christoffer.dall, marc.zyngier, linux, bp, rjw, pbonzini,
	rkrcmar, corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

Hi gengdongjiu,

On 07/12/17 06:37, gengdongjiu wrote:
> I understand you most idea.
> 
> But In the Qemu one signal type can only correspond to one behavior, can not correspond to two behaviors,
> otherwise Qemu will do not know how to do.
> 
> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate the CPER
> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if receives the SIGBUS_MCEERR_AO
> signal, it will record the CPER and trigger a IRQ to notify guest, as shown below:
> 
> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
> SIGBUS_MCEERR_AO trigger GPIO IRQ.
> 
> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify trigger method, which all
> 
> not involve _trigger_ an SError.

It's a policy choice. How does your virtual CPU notify RAS errors to its virtual
software? You could use SError for SIGBUS_MCEERR_AR; it depends on what type of
CPU you are trying to emulate.

I'd suggest using NOTIFY_SEA for SIGBUS_MCEERR_AR, as it avoids problems where
the guest doesn't take the SError immediately and instead tries to re-execute the
code KVM has unmapped from stage2 because it's corrupt. (You could detect this
happening in Qemu and try something else.)
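
A minimal sketch of that policy, under stated assumptions: what this thread calls SIGBUS_MCEERR_AR/AO is SIGBUS delivered with si_code BUS_MCEERR_AR or BUS_MCEERR_AO. The enum guest_notify and pick_notification() names are hypothetical and are not taken from Qemu.

```c
#include <signal.h>

/* Linux si_code values for SIGBUS memory errors; defined here only if
 * the C library headers do not already provide them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4 /* action required */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5 /* action optional */
#endif

/* Hypothetical notification choices available to the VMM. */
enum guest_notify {
	GUEST_NOTIFY_NONE,     /* not a memory-error signal */
	GUEST_NOTIFY_SEA,      /* synchronous external abort */
	GUEST_NOTIFY_GPIO_IRQ  /* asynchronous interrupt */
};

/* One signal type maps to exactly one guest notification. */
enum guest_notify pick_notification(int sig, int si_code)
{
	if (sig != SIGBUS)
		return GUEST_NOTIFY_NONE;

	switch (si_code) {
	case BUS_MCEERR_AR:
		return GUEST_NOTIFY_SEA;
	case BUS_MCEERR_AO:
		return GUEST_NOTIFY_GPIO_IRQ;
	default:
		return GUEST_NOTIFY_NONE;
	}
}
```

Because the mapping is a policy, a VMM emulating a different kind of CPU could return an SError-based notification for BUS_MCEERR_AR instead; nothing in the signal itself forces one choice.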


Synchronous/asynchronous external abort matters to the CPU, but once the error
has been notified to software the reasons for this distinction disappear. Once
the error has been handled, all trace of this distinction is gone.

CPER records only describe component failures. You are trying to re-create some
state that disappeared with one of the firmware-first abstractions. Trying to
re-create this information isn't worth the effort as the distinction doesn't
matter to linux, only to the CPU.


> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.

You mean there is no reason for Qemu to trigger an SError when it gets a signal
from the kernel.

The reasons the CPU might have to generate an SError don't apply to Linux and
KVM user space. User-space will never get a signal for an uncontained error; we
will always panic(). We can't give user-space a signal for imprecise exceptions,
as it can't return from the signal. The classes of error that are left are
covered by polled/irq and NOTIFY_SEA.

Qemu can decide to generate RAS SErrors for SIGBUS_MCEERR_AR if it really wants
to (but I don't think you should: the kernel may have unmapped the page at PC
from stage2 due to corruption).


I think the problem here is you're applying the CPU->software behaviour and
choices to software->software. By the time user-space gets the error, the
behaviour is different.



Thanks,

James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-15 18:52             ` James Morse
  (?)
@ 2017-12-16  3:44               ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-16  3:44 UTC (permalink / raw)
  To: James Morse
  Cc: wuquanming, kvm, linux-doc, marc.zyngier, linux-kernel, linux,
	linux-acpi, huangshaoyu, linux-arm-kernel, kvmarm

Hi James,

On 2017/12/16 2:52, James Morse wrote:
>> signal, it will record the CPER and trigger a IRQ to notify guest, as shown below:
>>
>> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
>> SIGBUS_MCEERR_AO trigger GPIO IRQ.
>>
>> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify trigger method, which all
>>
>> not involve _trigger_ an SError.
> It's a policy choice. How does your virtual CPU notify RAS errors to its virtual
> software? You could use SError for SIGBUS_MCEERR_AR, it depends on what type of
> CPU you are trying to emulate.
> 
> I'd suggest using NOTIFY_SEA for SIGBUS_MCEERR_AR as it avoids problems where
> the guest doesn't take the SError immediately, instead tries to re-execute the
I agree it is better to use NOTIFY_SEA for SIGBUS_MCEERR_AR in this case.

> code KVM has unmapped from stage2 because its corrupt. (You could detect this
> happening in Qemu and try something else)

By "something else", do you mean using NOTIFY_SEI for SIGBUS_MCEERR_AR? In the current implementation,
the only such case seems to be "KVM has unmapped from stage2"; do you think there are others?

> 
> 
> Synchronous/asynchronous external abort matters to the CPU, but once the error
> has been notified to software the reasons for this distinction disappear. Once
> the error has been handled, all trace of this distinction is gone.
> 
> CPER records only describe component failures. You are trying to re-create some
> state that disappeared with one of the firmware-first abstractions. Trying to
> re-create this information isn't worth the effort as the distinction doesn't
> matter to linux, only to the CPU.
> 
> 
>> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.
> You mean there is no reason for Qemu to trigger an SError when it gets a signal
> from the kernel.
> 
> The reasons the CPU might have to generate an SError don't apply to linux and
> KVM user space. User-space will never get a signal for an uncontained error, we
> will always panic(). We can't give user-space a signal for imprecise exceptions,
> as it can't return from the signal. The classes of error that are left are
> covered by polled/irq and NOTIFY_SEA.
> 
> Qemu can decide to generate RAS SErrors for SIGBUS_MCEERR_AR if it really wants
> to, (but I don't think you should, the kernel may have unmapped the page at PC
> from stage2 due to corruption).
Yes. You also said we should not generate RAS SErrors for SIGBUS_MCEERR_AR,
so Qemu does not know under which conditions to generate RAS SErrors.

> 
> I think the problem here is you're applying the CPU->software behaviour and
> choices to software->software. By the time user-space gets the error, the
> behaviour is different.
In KVM, keeping this API (to specify the guest ESR and trigger an SError) available as a policy choice is OK,
but Qemu, at least, does not know under which conditions to trigger it.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-11-14 16:00     ` James Morse
@ 2017-12-16  4:47       ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-16  4:47 UTC (permalink / raw)
  To: James Morse
  Cc: Dongjiu Geng, wuquanming, linux-doc, kvm, Marc Zyngier, linux,
	linuxarm, Linux Kernel Mailing List, linux-acpi, arm-mail-list,
	Huangshaoyu, kvmarm, devel

[...]
>
>> +     case ESR_ELx_AET_UER:   /* The error has not been propagated */
>> +             /*
>> +              * Userspace only handle the guest SError Interrupt(SEI) if the
>> +              * error has not been propagated
>> +              */
>> +             run->exit_reason = KVM_EXIT_EXCEPTION;
>> +             run->ex.exception = ESR_ELx_EC_SERROR;
>> +             run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +             return 0;
>
> We should not pass RAS notifications to user space. The kernel either handles
> them, or it panics(). User space shouldn't even know if the kernel supports RAS

For ESR_ELx_AET_UER (Recoverable error), let us look at its definition
below, taken from [0]:

The state of the PE is Recoverable if all of the following are true:
— The error has not been silently propagated.
— The error has not been architecturally consumed by the PE. (The PE
architectural state is not infected.)
— The exception is precise and PE can recover execution from the
preferred return address of the exception, if software locates and
repairs the error.
The PE cannot make correct progress without either consuming the error
or otherwise making the error unrecoverable. The error remains latent
in the system.
If software cannot locate and repair the error, either the application
or the VM, or both, must be isolated by software.

So we can see that the exception is precise and the PE can recover execution
from the preferred return address of the exception, so letting the guest
handle it is better. For example, if it is a RAS error in a guest application,
we can kill the guest application instead of panicking the whole OS; if it is
a RAS error in the guest kernel, the guest will panic.
The host does not know which guest application has the error, so the host
cannot handle it; panicking the OS is not a good choice for a Recoverable
error.
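
As a sketch of how the Recoverable state could be recognised from the syndrome, the following decodes the AET field of an SError ESR. The bit positions follow the ESR_ELx definitions in the arm64 Linux headers; serror_is_ras() and serror_is_recoverable() are illustrative names, not functions from the patch.

```c
#include <stdint.h>

/* ESR_ELx fields relevant to a RAS SError. */
#define ESR_ELx_IDS        (UINT64_C(1) << 24)   /* IMPLEMENTATION DEFINED syndrome */
#define ESR_ELx_FSC        UINT64_C(0x3F)        /* fault status code mask */
#define ESR_ELx_FSC_SERROR UINT64_C(0x11)        /* asynchronous SError */
#define ESR_ELx_AET        (UINT64_C(0x7) << 10) /* asynchronous error type */
#define ESR_ELx_AET_UC     (UINT64_C(0) << 10)   /* Uncontainable */
#define ESR_ELx_AET_UEU    (UINT64_C(1) << 10)   /* Unrecoverable */
#define ESR_ELx_AET_UEO    (UINT64_C(2) << 10)   /* Restartable */
#define ESR_ELx_AET_UER    (UINT64_C(3) << 10)   /* Recoverable */
#define ESR_ELx_AET_CE     (UINT64_C(6) << 10)   /* Corrected */

/* AET is only meaningful when the syndrome is architecturally defined
 * (IDS clear) and the fault status code says "asynchronous SError". */
int serror_is_ras(uint64_t esr)
{
	if (esr & ESR_ELx_IDS)
		return 0;
	return (esr & ESR_ELx_FSC) == ESR_ELx_FSC_SERROR;
}

/* True only for the Recoverable state discussed above. */
int serror_is_recoverable(uint64_t esr)
{
	return serror_is_ras(esr) && (esr & ESR_ELx_AET) == ESR_ELx_AET_UER;
}
```

Every other AET value, and any IMPLEMENTATION DEFINED syndrome, falls outside this predicate and would need one of the other paths discussed in the series.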

[0]
https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf


> until it gets an MCEERR signal.

User space will detect whether the kernel supports RAS before handling it.

>
> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
>
> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)


>
>
>> +     default:
>> +             /*
>> +              * Until now, the CPU supports RAS and SEI is fatal, or host
>> +              * does not support to handle the SError.
>> +              */
>> +             panic("This Asynchronous SError interrupt is dangerous, panic");
>> +     }
>> +
>> +     return 0;
>> +}
>> +
>>  /*
>>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>   * proper exit to userspace.
>
>
>
> James
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 98+ messages in thread

* [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
@ 2017-12-16  4:47       ` gengdongjiu
  0 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2017-12-16  4:47 UTC (permalink / raw)
  To: linux-arm-kernel

[...]
>
>> +     case ESR_ELx_AET_UER:   /* The error has not been propagated */
>> +             /*
>> +              * Userspace only handle the guest SError Interrupt(SEI) if the
>> +              * error has not been propagated
>> +              */
>> +             run->exit_reason = KVM_EXIT_EXCEPTION;
>> +             run->ex.exception = ESR_ELx_EC_SERROR;
>> +             run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +             return 0;
>
> We should not pass RAS notifications to user space. The kernel either handles
> them, or it panics(). User space shouldn't even know if the kernel supports RAS

For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
below, which get from [0]

The state of the PE is Recoverable if all of the following are true:
? The error has not been silently propagated.
? The error has not been architecturally consumed by the PE. (The PE
architectural state is not infected.)
? The exception is precise and PE can recover execution from the
preferred return address of the exception, if software locates and
repairs the error.
The PE cannot make correct progress without either consuming the error
or otherwise making the error unrecoverable. The error remains latent
in the system.
If software cannot locate and repair the error, either the application
or the VM, or both, must be isolated by software.

so we can see the  exception is precise and PE can recover execution
from the preferred return address of the exception, so let guest
handling it is
better, for example, if it is guest application RAS error, we can kill
the guest application instead of panic whole OS; if it is guest kernel
RAS error, guest will panic.
Host does not know which application of guest has error, so host can
not handle it, panic OS is not a good choice for the Recoverable
error.

[0]
https://static.docs.arm.com/ddi0587/a/RAS%20Extension-release%20candidate_march_29.pdf


> until it gets an MCEERR signal.

User space will detect whether the kernel supports RAS before handling it.

>
> You're making your firmware-first notification an EL3->EL0 signal, bypassing the OS.
>
> If we get a RAS SError and there are no CPER records or values in the ERR nodes,
> we should panic as it looks like the CPU/firmware is broken. (spurious RAS errors)


>
>
>> +     default:
>> +             /*
>> +              * Until now, the CPU supports RAS and SEI is fatal, or host
>> +              * does not support to handle the SError.
>> +              */
>> +             panic("This Asynchronous SError interrupt is dangerous, panic");
>> +     }
>> +
>> +     return 0;
>> +}
>> +
>>  /*
>>   * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on
>>   * proper exit to userspace.
>
>
>
> James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-15  3:30             ` gengdongjiu
  (?)
  (?)
@ 2018-01-12 18:05               ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2018-01-12 18:05 UTC (permalink / raw)
  To: gengdongjiu
  Cc: wuquanming, linux-doc, kvm, marc.zyngier, catalin.marinas,
	corbet, rjw, linux, linuxarm, linux-kernel, linux-acpi, bp,
	linux-arm-kernel, huangshaoyu, pbonzini, kvmarm, devel

Hi gengdongjiu,

On 15/12/17 03:30, gengdongjiu wrote:
> On 2017/12/7 14:37, gengdongjiu wrote:
>>> We need to tackle (1) and (3) separately. For (3) we need some API that lets
>>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
>>> a way of migrating pending SError yet... which is where I got stuck last time I
>>> was looking at this.
>> I understand you most idea.
>>
>> But In the Qemu one signal type can only correspond to one behavior, can not correspond to two behaviors,
>> otherwise Qemu will do not know how to do.
>>
>> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate the CPER
>> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if receives the SIGBUS_MCEERR_AO
>> signal, it will record the CPER and trigger a IRQ to notify guest, as shown below:
>>
>> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
>> SIGBUS_MCEERR_AO trigger GPIO IRQ.
>>
>> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify trigger method, which all
>>
>> not involve _trigger_ an SError.
>>
>> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.
> 
> As I explained above:
> 
> If Qemu received SIGBUS_MCEERR_AR, it will record CPER and trigger Synchronous External Abort;
> If Qemu received SIGBUS_MCEERR_AO, it will record CPER and trigger GPIO IRQ;

> So Qemu does not know when to _trigger_ an SError.

There is no answer to this. How the CPU decides is specific to the CPU design.
How Qemu decides is going to be specific to the machine it emulates.

My understanding is there is some overlap for which RAS errors are reported as
synchronous external abort, and which use SError. (Obviously the imprecise ones
are all SError). Which one the CPU uses depends on how the CPU is designed.

When you take a SIGBUS_MCEERR_AR from KVM, it's because KVM can't complete a
stage2 fault because the page is marked with PG_hwpoison. These started out as a
synchronous exception, but you could still report these with SError.

We don't have a way to signal user-space about imprecise exceptions; this isn't
a KVM-specific problem.
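
For reference, the signal-to-notification mapping described in the quoted text
can be sketched as below. This is only an illustration of the behaviour
gengdongjiu describes for Qemu, not Qemu's actual code: the enum and function
names are hypothetical. What the thread calls SIGBUS_MCEERR_AR and
SIGBUS_MCEERR_AO arrive as SIGBUS with si_code set to BUS_MCEERR_AR or
BUS_MCEERR_AO.

```c
#include <signal.h>

/* Fallbacks in case the C library headers do not expose these si_code values. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical names for the guest notifications mentioned in the thread. */
enum guest_notification {
	GUEST_NOTIFY_NONE,	/* not a memory-error SIGBUS */
	GUEST_NOTIFY_SEA,	/* Synchronous External Abort */
	GUEST_NOTIFY_GPIO_IRQ,	/* GPIO-signalled interrupt */
};

/* Map the si_code of a SIGBUS to the notification the VMM would raise. */
enum guest_notification notification_for_si_code(int si_code)
{
	switch (si_code) {
	case BUS_MCEERR_AR:	/* action required: the vCPU touched the page */
		return GUEST_NOTIFY_SEA;
	case BUS_MCEERR_AO:	/* action optional: error found asynchronously */
		return GUEST_NOTIFY_GPIO_IRQ;
	default:
		return GUEST_NOTIFY_NONE;
	}
}
```

Note that no branch yields an SError, which is the gap being pointed out: the
choice between SEA and SError would have to come from the machine model.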


> so here I "return a error" to Qemu if ghes_notify_sei() return failure in [1], if you opposed KVM "return error",
> do you have a better idea about it? thanks

If ghes_notify_sei() fails to claim the error, we should drop through to
kernel-first-handling. We don't have that yet, just the stub that ignores errors
where we can make progress.

If neither firmware-first nor kernel-first claim a RAS error, we're in trouble.
I'd like to panic() as we got a RAS notification but no description of the
error. We can't do this until we have kernel-first support, hence that stub.
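
The ordering I have in mind can be sketched like this. It is a standalone
illustration of the intended end state, not kernel code: ras_claim_fn,
handle_ras_serror() and the two stub handlers are hypothetical names, and the
real firmware-first hook in this series is ghes_notify_sei().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ras_outcome {
	RAS_HANDLED,	/* someone claimed the error and dealt with it */
	RAS_PANIC,	/* a RAS notification with no description of the error */
};

/* Hypothetical handler type: returns true if it claimed the error. */
typedef bool (*ras_claim_fn)(uint64_t esr);

/* Try firmware-first, then kernel-first; an unclaimed error is fatal. */
enum ras_outcome handle_ras_serror(uint64_t esr, ras_claim_fn firmware_first,
				   ras_claim_fn kernel_first)
{
	if (firmware_first != NULL && firmware_first(esr))
		return RAS_HANDLED;
	if (kernel_first != NULL && kernel_first(esr))
		return RAS_HANDLED;
	return RAS_PANIC;
}

/* Example stubs standing in for real handlers. */
bool claim_always(uint64_t esr) { (void)esr; return true; }
bool claim_never(uint64_t esr) { (void)esr; return false; }
```

For example, handle_ras_serror(esr, claim_never, claim_always) reports the
error as handled by the kernel-first path, while passing claim_never for both
gives RAS_PANIC. Nowhere in this chain is user space told about the raw
notification.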


> About the way of migrating pending SError, I think it is a separate case, because Qemu still does not know
> how and when to trigger the SError.

I agree, but I think we should fix this first before we add another user of this
unmigratable hypervisor state.

(I recall someone saying migration is needed for any new KVM/cpu features, but I
can't find the thread)


> [1]:
> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
> {
>         .......................
> +       case ESR_ELx_AET_UER:   /* The error has not been propagated */
> +               /*
> +                * Userspace only handle the guest SError Interrupt(SEI) if the
> +                * error has not been propagated
> +                */
> +               run->exit_reason = KVM_EXIT_EXCEPTION;
> +               run->ex.exception = ESR_ELx_EC_SERROR;

I'm against telling user space RAS errors ever happened, only the final
user-visible error when the kernel can't fix it.

This is inventing something new for RAS errors not claimed by firmware-first.
If we have kernel-first too, this will never happen. (unless your system is
losing the error description).


Your system has firmware-first, why isn't it claiming the notification?
If it's not finding CPER records written by firmware, check that firmware and
the UEFI memory map agree on the attributes to be used when reading/writing
that area.


> +               run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
> +               return 0;


Thanks,

James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-16  4:47       ` gengdongjiu
  (?)
  (?)
@ 2018-01-12 18:05         ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2018-01-12 18:05 UTC (permalink / raw)
  To: gengdongjiu
  Cc: wuquanming, kvm, linux-doc, Marc Zyngier,
	Linux Kernel Mailing List, linuxarm, linux, Dongjiu Geng,
	linux-acpi, Huangshaoyu, kvmarm, arm-mail-list, devel

Hi gengdongjiu,

On 16/12/17 04:47, gengdongjiu wrote:
> [...]
>>
>>> +     case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>> +             /*
>>> +              * Userspace only handle the guest SError Interrupt(SEI) if the
>>> +              * error has not been propagated
>>> +              */
>>> +             run->exit_reason = KVM_EXIT_EXCEPTION;
>>> +             run->ex.exception = ESR_ELx_EC_SERROR;
>>> +             run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>> +             return 0;
>>
>> We should not pass RAS notifications to user space. The kernel either handles
>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
> 
> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
> below, which get from [0]

[..]

> so we can see the  exception is precise and PE can recover execution
> from the preferred return address of the exception, 

> so let guest handling it is
> better, for example, if it is guest application RAS error, we can kill
> the guest application instead of panic whole OS; if it is guest kernel
> RAS error, guest will panic.

If the kernel takes an unhandled RAS error it should panic - we don't know where
the error is.

I understand you want to kill-off guest tasks as a result of RAS errors, but
this needs to go through the whole APEI->memory_failure()->sigbus machinery so
that the kernel knows the kernel can keep running.

This saves us signalling user-space when we don't need to. An example:
code-corruption. Linux can happily re-read affected user-space executables from
disk, there is absolutely nothing user-space can do about it.
Handling errors first in the kernel allows us to do recovery for all the
affected processes, not just the one that happens to be running right now.


> Host does not know which application of guest has error, so host can
> not handle it,

It has to work this out, otherwise the errors we can handle never get a chance.

The kernel is expected to look at the error description (which for some reason
we aren't talking about here), e.g. the CPER records, and determine what
recovery action is necessary for this error.
For memory errors this may be re-reading from disk, or in the worst case,
unmapping from all user-space users (including KVM's stage2) and raining signals
on all affected processes.

For a memory error the important piece of information is the physical address.
Only the kernel can do anything with this, it determines who owns the affected
memory and what needs doing to recover from the error.
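
As a rough sketch of that decision (standalone C, not the memory_failure()
implementation): once the physical address has been resolved to an owner, the
recovery action depends on what kind of page it is. The enum names and
choose_recovery() are hypothetical, and treating kernel-owned memory as fatal
is an assumption made for the illustration.

```c
/* Hypothetical classification of the page that contains the faulty address. */
enum page_owner {
	OWNER_CLEAN_PAGECACHE,	/* unmodified file data, a copy exists on disk */
	OWNER_USER_MODIFIED,	/* anonymous or dirty user memory, no other copy */
	OWNER_KERNEL,		/* memory the kernel itself depends on */
};

enum recovery_action {
	RECOVER_REREAD_FROM_DISK,	/* drop the page, fault it back in later */
	RECOVER_UNMAP_AND_SIGNAL,	/* unmap everywhere, SIGBUS every user */
	RECOVER_PANIC,			/* cannot make progress safely */
};

/* Pick a recovery action from the owner of the affected page. */
enum recovery_action choose_recovery(enum page_owner owner)
{
	switch (owner) {
	case OWNER_CLEAN_PAGECACHE:
		return RECOVER_REREAD_FROM_DISK;
	case OWNER_USER_MODIFIED:
		return RECOVER_UNMAP_AND_SIGNAL;
	case OWNER_KERNEL:
	default:
		return RECOVER_PANIC;
	}
}
```

The point is that the first case needs no signal at all, and the second one
signals every process that maps the page, not just the vCPU thread that
happened to be running when the error was reported.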

If you pass the notification to user-space, all it can do is signal the guest to
"stop doing whatever it is you're doing". The guest may have been able to
re-read pages from disk, or otherwise handle the error.
Has the error been handled? No: The error remains latent in the system.


> panic OS is not a good choice for the Recoverable error.

If we don't know where the error is, and we can't make progress, it's the only
sane choice.

This code is never expected to run! (why are we arguing about it?) We should get
RAS errors as GHES notifications from firmware via some mechanism. If those are
NOTIFY_SEI then APEI should claim the notification and kick off the appropriate
handling based on the CPER records. If/when we get kernel-first, that can claim
the SError. What we're left with is RAS notifications that no-one claimed
because there was no error-description found.



James

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-12 18:05               ` James Morse
@ 2018-01-15  8:33                 ` Christoffer Dall
  -1 siblings, 0 replies; 98+ messages in thread
From: Christoffer Dall @ 2018-01-15  8:33 UTC (permalink / raw)
  To: James Morse
  Cc: gengdongjiu, marc.zyngier, linux, bp, rjw, pbonzini, rkrcmar,
	corbet, catalin.marinas, kvm, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-acpi, devel, huangshaoyu,
	wuquanming, linuxarm

On Fri, Jan 12, 2018 at 06:05:23PM +0000, James Morse wrote:
> On 15/12/17 03:30, gengdongjiu wrote:
> > On 2017/12/7 14:37, gengdongjiu wrote:

[...]

> 
> (I recall someone saying migration is needed for any new KVM/cpu features, but I
> can't find the thread)
> 

I don't know of any hard set-in-stone rule for this, but I have
certainly argued that since migration is a popular technique in data
centers and often a key motivation behind using virtual machines as it
provides both load-balancing and high availability, we should think
about migration support for all features and state.  Further, experience
has shown that retroactively trying to support migration can result in
really complex interfaces for saving/restoring state (see the ITS
ordering requirements in
Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
thinking about this problem when introducing functionality is a good
idea.

Of course, if there are really good arguments for having some state that
simply cannot be migrated, then that's fine, and we should just make
sure that userspace (e.g. QEMU) and higher level components in the
stack (libvirt, openstack, etc.) can detect this state being used, and
ideally enable/disable it, so that it can predict that a particular VM
cannot be migrated off a particular host, or between a particular set of
two hosts.  As an example, migration is typically prohibited when using
VFIO direct device assignment, but userspace etc. are already aware of
this.

As a final note, if we add support for some architectural feature, which
may be present on some particular hardware and/or implementation, if the
KVM support for said feature is automatically enabled (and not
selectively from userspace), I would push back quite strongly on
something that doesn't support migration, because it would effectively
prevent migration of VMs on ARM.
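
To make the last two points concrete, here is a small sketch of how a
management layer might reason about it. The struct, the feature names and
migration_blocker() are hypothetical and exist only for this example; they are
not a QEMU, libvirt or KVM interface.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM feature record kept by userspace. */
struct vm_feature {
	const char *name;
	bool enabled;     /* switched on explicitly by userspace */
	bool migratable;  /* its state can be saved and restored */
};

/*
 * Return the first enabled feature that prevents migration,
 * or NULL if the VM can be migrated.
 */
static const struct vm_feature *migration_blocker(const struct vm_feature *f,
						  size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (f[i].enabled && !f[i].migratable)
			return &f[i];
	}
	return NULL;
}
```

A feature that is off by default never shows up as a blocker; one that the
kernel enables automatically would block every VM on that host.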

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-15  8:33                 ` Christoffer Dall
  (?)
@ 2018-01-16 11:19                   ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2018-01-16 11:19 UTC (permalink / raw)
  To: Christoffer Dall, James Morse
  Cc: marc.zyngier, linux, bp, rjw, pbonzini, rkrcmar, corbet,
	catalin.marinas, kvm, linux-doc, linux-kernel, linux-arm-kernel,
	kvmarm, linux-acpi, devel, huangshaoyu, wuquanming, linuxarm

Hi Christoffer

On 2018/1/15 16:33, Christoffer Dall wrote:
> On Fri, Jan 12, 2018 at 06:05:23PM +0000, James Morse wrote:
>> On 15/12/17 03:30, gengdongjiu wrote:
>>> On 2017/12/7 14:37, gengdongjiu wrote:
> 
> [...]
> 
>>
>> (I recall someone saying migration is needed for any new KVM/cpu features, but I
>> can't find the thread)
>>
> 
> I don't know of any hard set-in-stone rule for this, but I have
> certainly argued that since migration is a popular technique in data
> centers and often a key motivation behind using virtual machines as it
> provides both load-balancing and high availability, we should think
> about migration support for all features and state.  Further, experience
> has shown that retroactively trying to support migration can result in
> really complex interfaces for saving/restoring state (see the ITS
> ordering requirements in
> Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
> thinking about this problem when introducing functionality is a good
> idea.
> 
> Of course, if there are really good arguments for having some state that
> simply cannot be migrated, then that's fine, and we should just make
> sure that userspace (e.g. QEMU) and higher level components in the
> stack (libvirt, openstack, etc.) can detect this state being used, and
> ideally enable/disable it, so that it can predict that a particular VM
> cannot be migrated off a particular host, or between a particular set of
> two hosts.  As an example, migration is typically prohibited when using
> VFIO direct device assignment, but userspace etc. are already aware of
> this.
> 
> As a final note, if we add support for some architectural feature, which
> may be present on some particular hardware and/or implementation, if the
> KVM support for said feature is automatically enabled (and not
> selectively from userspace), I would push back quite strongly on
> something that doesn't support migration, because it would effectively
> prevent migration of VMs on ARM.
Thanks very much for this mail and reply. I will check it; please give me some
time, as I have recently been busy with other things.

> 
> Thanks,
> -Christoffer
> 
> .
> 


^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-12 18:05         ` James Morse
  (?)
@ 2018-01-16 11:22           ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2018-01-16 11:22 UTC (permalink / raw)
  To: James Morse, gengdongjiu
  Cc: wuquanming, kvm, linux-doc, Marc Zyngier,
	Linux Kernel Mailing List, linuxarm, linux, linux-acpi,
	Huangshaoyu, kvmarm, arm-mail-list, devel

Hi James,
  Thanks very much for your mail and reply; I will check it ASAP. I have been busy with other things recently, so my reply may be late.

On 2018/1/13 2:05, James Morse wrote:
> Hi gengdongjiu,
> 
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> +     case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> +             /*
>>>> +              * Userspace only handle the guest SError Interrupt(SEI) if the
>>>> +              * error has not been propagated
>>>> +              */
>>>> +             run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> +             run->ex.exception = ESR_ELx_EC_SERROR;
>>>> +             run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> +             return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
>>
>> For the  ESR_ELx_AET_UER(Recoverable error), let us see its definition
>> below, which get from [0]
> 
> [..]
> 
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception, 
> 
>> so let guest handling it is
>> better, for example, if it is guest application RAS error, we can kill
>> the guest application instead of panic whole OS; if it is guest kernel
>> RAS error, guest will panic.
> 
> If the kernel takes an unhandled RAS error it should panic - we don't know where
> the error is.
> 
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
> 
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
> 
> 
>> Host does not know which application of guest has error, so host can
>> not handle it,
> 
> It has to work this out, otherwise the errors we can handle never get a chance.
> 
> This kernel is expected to look at the error description, (which for some reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining signals
> on all affected processes.
> 
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
> 
> If you pass the notification to user-space, all it can do is signal the guest to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
> 
> 
>> panic OS is not a good choice for the Recoverable error.
> 
> If we don't know where the error is, and we can't make progress, its the only
> sane choice.
> 
> This code is never expected to run! (why are we arguing about it?) We should get
> RAS errors as GHES notifications from firmware via some mechanism. If those are
> NOTIFY_SEI then APEI should claim the notification and kick off the appropriate
> handling based on the CPER records. If/when we get kernel-first, that can claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
> 
> 
> 
> James
> 
> .
> 

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-12 18:05               ` James Morse
@ 2018-01-21  2:45                 ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2018-01-21  2:45 UTC (permalink / raw)
  To: James Morse
  Cc: gengdongjiu, wuquanming, linux-doc, kvm, Marc Zyngier,
	Catalin Marinas, Jonathan Corbet, rjw, linux, linuxarm,
	Linux Kernel Mailing List, linux-acpi, bp, arm-mail-list,
	Huangshaoyu, pbonzini, kvmarm, devel

Hi James,
   Sorry for my late response; I have been out of the office recently.

2018-01-13 2:05 GMT+08:00 James Morse <james.morse@arm.com>:
> Hi gengdongjiu,
>
> On 15/12/17 03:30, gengdongjiu wrote:
>> On 2017/12/7 14:37, gengdongjiu wrote:
>>>> We need to tackle (1) and (3) separately. For (3) we need some API that lets
>>>> Qemu _trigger_ an SError in the guest, with a specified ESR. But, we don't have
>>>> a way of migrating pending SError yet... which is where I got stuck last time I
>>>> was looking at this.
>>> I understand you most idea.
>>>
>>> But In the Qemu one signal type can only correspond to one behavior, can not correspond to two behaviors,
>>> otherwise Qemu will do not know how to do.
>>>
>>> For the Qemu, if it receives the SIGBUS_MCEERR_AR signal, it will populate the CPER
>>> records and inject a SEA to guest through KVM IOCTL "KVM_SET_ONE_REG"; if receives the SIGBUS_MCEERR_AO
>>> signal, it will record the CPER and trigger a IRQ to notify guest, as shown below:
>>>
>>> SIGBUS_MCEERR_AR trigger Synchronous External Abort.
>>> SIGBUS_MCEERR_AO trigger GPIO IRQ.
>>>
>>> For the SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR, we have already specify trigger method, which all
>>>
>>> not involve _trigger_ an SError.
>>>
>>> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.
>>
>> As I explained above:
>>
>> If Qemu received SIGBUS_MCEERR_AR, it will record CPER and trigger Synchronous External Abort;
>> If Qemu received SIGBUS_MCEERR_AO, it will record CPER and trigger GPIO IRQ;
>
>> So Qemu does not know when to _trigger_ an SError.
>
> There is no answer to this. How the CPU decides is specific to the CPU design.
> How Qemu decides is going to be specific to the machine it emulates.
>
> My understanding is there is some overlap for which RAS errors are reported as
> synchronous external abort, and which use SError. (Obviously the imprecise ones
> are all SError). Which one the CPU uses depends on how the CPU is designed.
>
> When you take an SIGBUS_MCEERR_AR from KVM, its because KVM can't complete a
> stage2 fault because the page is marked with PG_poisoned. These started out as a
> synchronous exception, but you could still report these with SError.

Yes, I agree; it is a policy choice.
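
One way to picture that policy choice is a per-machine table, sketched below.
The enums and pick_notification() are invented for this example (they are not
the siginfo si_code constants or any Qemu API); the first policy mirrors the
AR -> SEA, AO -> GPIO IRQ mapping described earlier in this thread, and the
second only shows that a different machine could report AR via SError.

```c
/* Stand-ins for the two SIGBUS flavours discussed in the thread. */
enum host_signal {
	HOST_MCEERR_AR,  /* action required */
	HOST_MCEERR_AO,  /* action optional */
};

enum guest_notification {
	NOTIFY_GUEST_SEA,       /* synchronous external abort */
	NOTIFY_GUEST_GPIO_IRQ,  /* GPIO interrupt */
	NOTIFY_GUEST_SERROR,    /* virtual SError */
};

/* Hypothetical per-machine policy: one notification per signal flavour. */
struct machine_policy {
	enum guest_notification on_action_required;
	enum guest_notification on_action_optional;
};

static enum guest_notification
pick_notification(const struct machine_policy *policy, enum host_signal sig)
{
	return sig == HOST_MCEERR_AR ? policy->on_action_required
				     : policy->on_action_optional;
}
```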

>
> We don't have a way to signal user-space about imprecise exceptions, this isn't
> a KVM specific problem.
>
>
>> so here I "return a error" to Qemu if ghes_notify_sei() return failure in [1], if you opposed KVM "return error",
>> do you have a better idea about it? thanks
>
> If ghes_notify_sei() fails to claim the error, we should drop through to
> kernel-first-handling. We don't have that yet, just the stub that ignores errors
> where we can make progress.
>
> If neither firmware-first nor kernel-first claim a RAS error, we're in trouble.
> I'd like to panic() as we got a RAS notification but no description of the
> error. We can't do this until we have kernel-first support, hence that stub.
>
>
>> About the way of migrating pending SError, I think it is a separate case, because Qemu still does not know
>> how and when to trigger the SError.
>
> I agree, but I think we should fix this first before we add another user of this
> unmigratable hypervisor state.
>
> (I recall someone saying migration is needed for any new KVM/cpu features, but I
> can't find the thread)
>
>
>> [1]:
>> static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
>> {
>>         .......................
>> +       case ESR_ELx_AET_UER:   /* The error has not been propagated */
>> +               /*
>> +                * Userspace only handle the guest SError Interrupt(SEI) if the
>> +                * error has not been propagated
>> +                */
>> +               run->exit_reason = KVM_EXIT_EXCEPTION;
>> +               run->ex.exception = ESR_ELx_EC_SERROR;
>
> I'm against telling user space RAS errors ever happened, only the final
> user-visible error when the kernel can't fix it.

Thanks for the explanation.
For ESR_ELx_AET_UER, the exception is precise, so closing the VM may
be better [1].
But if you think panic is better until we support kernel-first, that is
also OK with me.


+static int kvm_handle_guest_sei(struct kvm_vcpu *vcpu, struct kvm_run *run)
+{
+     unsigned int esr = kvm_vcpu_get_hsr(vcpu);
+     bool impdef_syndrome = esr & ESR_ELx_ISV; /* aka IDS */
+     unsigned int aet = esr & ESR_ELx_AET;
+
+     /* Without the RAS extension this is not a RAS SError. */
+     if (!cpus_have_const_cap(ARM64_HAS_RAS_EXTN)) {
+         kvm_inject_vabt(vcpu);
+         return 1;
+     }
+
+     /* For RAS the host kernel may handle this abort. */
+     if (!handle_guest_sei())
+         return 1;
+
+     /*
+      * Directly inject the virtual SError in these two cases:
+      * 1. The syndrome is IMPLEMENTATION DEFINED
+      * 2. It is an Uncategorized SEI
+      */
+     if (impdef_syndrome ||
+         ((esr & ESR_ELx_FSC) != ESR_ELx_FSC_SERROR)) {
+         kvm_inject_vabt(vcpu);
+         return 1;
+     }
+
+     switch (aet) {
+     case ESR_ELx_AET_CE:  /* corrected error */
+     case ESR_ELx_AET_UEO: /* restartable error, not yet consumed */
+         return 1; /* continue processing the guest exit */
+     case ESR_ELx_AET_UER: /* recoverable error */
+         /*
+          * The exception is precise: the error has not been silently
+          * propagated and has not been consumed by the CPU. Temporarily
+          * shut down the VM to isolate the error, in the hope that it
+          * is not touched again.
+          */
+         run->exit_reason = KVM_EXIT_EXCEPTION;
+         return 0;
+     default:
+         /*
+          * The CPU supports RAS, the SError interrupt is fatal and
+          * the host did not successfully handle it.
+          */
+         panic("This Asynchronous SError interrupt is dangerous, panic");
+     }
+
+     return 0;
+}
+
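
To make the checks in the function above easier to follow, here is a minimal
user-space sketch of how the three syndrome fields it tests could be pulled out
of an ESR value. The SKETCH_* names, the struct and decode_serror() are
hypothetical and only for illustration; the bit positions are my assumption of
the RAS SError syndrome layout (IDS in bit 24, AET in bits [12:10], DFSC in
bits [5:0]) and should be checked against the architecture manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SError syndrome layout; hypothetical names, not kernel macros. */
#define SKETCH_ESR_IDS          (UINT32_C(1) << 24)  /* IMPLEMENTATION DEFINED syndrome */
#define SKETCH_ESR_AET_SHIFT    10
#define SKETCH_ESR_AET_MASK     (UINT32_C(0x7) << SKETCH_ESR_AET_SHIFT)
#define SKETCH_ESR_DFSC_MASK    UINT32_C(0x3F)
#define SKETCH_ESR_DFSC_SERROR  UINT32_C(0x11)       /* categorized asynchronous SError */

struct serror_syndrome {
	bool impdef;      /* syndrome is IMPLEMENTATION DEFINED, AET not usable */
	bool categorized; /* DFSC says "asynchronous SError", so AET is meaningful */
	uint32_t aet;     /* raw 3-bit Asynchronous Error Type */
};

/* Split an ESR value into the fields the handler above branches on. */
static struct serror_syndrome decode_serror(uint32_t esr)
{
	struct serror_syndrome s;

	s.impdef = (esr & SKETCH_ESR_IDS) != 0;
	s.categorized = (esr & SKETCH_ESR_DFSC_MASK) == SKETCH_ESR_DFSC_SERROR;
	s.aet = (esr & SKETCH_ESR_AET_MASK) >> SKETCH_ESR_AET_SHIFT;
	return s;
}
```

With this split, the "inject a virtual SError directly" branch corresponds to
impdef being set or categorized being clear, and only the remaining case
looks at aet.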

>
> This is inventing something new for RAS errors not claimed by firmware-first.
> If we have kernel-first too, this will never happen. (unless your system is
> losing the error description).
In fact, even if we have kernel-first, I think we still need to judge the
error type by the ESR, right?
If handle_guest_sei() fails, maybe the system does not support firmware-first,
so we judge the ESR value.

>
>
> Your system has firmware-first, why isn't it claiming the notification?
> If it's not finding CPER records written by firmware, check firmware and the UEFI
> memory map agree on the attributes to be used when read/writing that area.
>
>
>> +               run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>> +               return 0;
>
>
> Thanks,
>
> James
>
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-12 18:05         ` James Morse
@ 2018-01-21  2:54           ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2018-01-21  2:54 UTC (permalink / raw)
  To: James Morse
  Cc: Dongjiu Geng, wuquanming, linux-doc, kvm, Marc Zyngier, linux,
	linuxarm, Linux Kernel Mailing List, linux-acpi, arm-mail-list,
	Huangshaoyu, kvmarm, devel

2018-01-13 2:05 GMT+08:00 James Morse <james.morse@arm.com>:
> Hi gengdongjiu,
>
> On 16/12/17 04:47, gengdongjiu wrote:
>> [...]
>>>
>>>> +     case ESR_ELx_AET_UER:   /* The error has not been propagated */
>>>> +             /*
>>>> +              * Userspace only handle the guest SError Interrupt(SEI) if the
>>>> +              * error has not been propagated
>>>> +              */
>>>> +             run->exit_reason = KVM_EXIT_EXCEPTION;
>>>> +             run->ex.exception = ESR_ELx_EC_SERROR;
>>>> +             run->ex.error_code = KVM_SEI_SEV_RECOVERABLE;
>>>> +             return 0;
>>>
>>> We should not pass RAS notifications to user space. The kernel either handles
>>> them, or it panics(). User space shouldn't even know if the kernel supports RAS
>>
>> For ESR_ELx_AET_UER (Recoverable error), let us look at its definition
>> below, taken from [0]
>
> [..]
>
>> so we can see the  exception is precise and PE can recover execution
>> from the preferred return address of the exception,
>
>> so letting the guest handle it is
>> better. For example, if it is a guest application RAS error, we can kill
>> the guest application instead of panicking the whole OS; if it is a guest
>> kernel RAS error, the guest will panic.
>
> If the kernel takes an unhandled RAS error it should panic - we don't know where
> the error is.
OK, here I will panic.

>
> I understand you want to kill-off guest tasks as a result of RAS errors, but
> this needs to go through the whole APEI->memory_failure()->sigbus machinery so
> that the kernel knows the kernel can keep running.
>
> This saves us signalling user-space when we don't need to. An example:
> code-corruption. Linux can happily re-read affected user-space executables from
> disk, there is absolutely nothing user-space can do about it.
> Handling errors first in the kernel allows us to do recovery for all the
> affected processes, not just the one that happens to be running right now.
>
>
>> The host does not know which guest application has the error, so the host
>> cannot handle it,
>
> It has to work this out, otherwise the errors we can handle never get a chance.
>
> This kernel is expected to look at the error description, (which for some reason
> we aren't talking about here), e.g. the CPER records, and determine what
> recovery action is necessary for this error.
> For memory errors this may be re-reading from disk, or at the worst case,
> unmapping from all user-space users (including KVM's stage2) and raining signals
> on all affected processes.
>
> For a memory error the important piece of information is the physical address.
> Only the kernel can do anything with this, it determines who owns the affected
> memory and what needs doing to recover from the error.
>
> If you pass the notification to user-space, all it can do is signal the guest to
> "stop doing whatever it is you're doing". The guest may have been able to
> re-read pages from disk, or otherwise handle the error.
> Has the error been handled? No: The error remains latent in the system.
>
>
>> Panicking the OS is not a good choice for a Recoverable error.
>
> If we don't know where the error is, and we can't make progress, it's the only
> sane choice.
Ok, I will panic here.

>
> This code is never expected to run! (why are we arguing about it?) We should get
> RAS errors as GHES notifications from firmware via some mechanism. If those are
> NOTIFY_SEI then APEI should claim the notification and kick off the appropriate
> handling based on the CPER records. If/when we get kernel-first, that can claim
> the SError. What we're left with is RAS notifications that no-one claimed
> because there was no error-description found.
>
>
>
> James
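
The claim order James describes above (firmware-first via APEI, then
kernel-first once it exists, and panic only when nobody claims the
notification) can be sketched in plain C. This is only an illustration under
my own assumptions: ras_dispatch(), the ras_claim_fn type and the two stub
handlers are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of offering a RAS notification to every handler in turn. */
enum ras_outcome {
	RAS_CLAIMED,         /* some handler found an error description and handled it */
	RAS_UNCLAIMED_PANIC, /* nobody claimed it: the caller should panic() */
};

/* A handler returns true when it claims (and handles) the notification. */
typedef bool (*ras_claim_fn)(void);

/* Hypothetical stand-ins for firmware-first and kernel-first handling. */
static bool claim_always(void) { return true; }
static bool claim_never(void) { return false; }

/* Try each handler in priority order; stop at the first one that claims. */
static enum ras_outcome ras_dispatch(const ras_claim_fn *handlers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (handlers[i] && handlers[i]())
			return RAS_CLAIMED;
	}
	return RAS_UNCLAIMED_PANIC;
}
```

Nothing in this chain returns to user-space: the only outcomes are "handled in
the kernel" or "panic", which is the point made in the quoted text.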

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-15  8:33                 ` Christoffer Dall
@ 2018-01-21  3:10                   ` gengdongjiu
  -1 siblings, 0 replies; 98+ messages in thread
From: gengdongjiu @ 2018-01-21  3:10 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: James Morse, wuquanming, linux-doc, kvm, Marc Zyngier,
	Catalin Marinas, Jonathan Corbet, rjw, linux, linuxarm,
	gengdongjiu, linux-acpi, bp, arm-mail-list, Huangshaoyu,
	pbonzini, kvmarm, Linux Kernel Mailing List, devel

2018-01-15 16:33 GMT+08:00 Christoffer Dall <christoffer.dall@linaro.org>:
> On Fri, Jan 12, 2018 at 06:05:23PM +0000, James Morse wrote:
>> On 15/12/17 03:30, gengdongjiu wrote:
>> > On 2017/12/7 14:37, gengdongjiu wrote:
>
> [...]
>
>>
>> (I recall someone saying migration is needed for any new KVM/cpu features, but I
>> can't find the thread)
>>
>
> I don't know of any hard set-in-stone rule for this, but I have
> certainly argued that since migration is a popular technique in data
> centers and often a key motivation behind using virtual machines as it
> provides both load-balancing and high availability, we should think
> about migration support for all features and state.  Further, experience
> has shown that retroactively trying to support migration can result in
> really complex interfaces for saving/restoring state (see the ITS
> ordering requirements in
> Documentation/virtual/kvm/devices/arm-vgic-its.txt as an example) so
> thinking about this problem when introducing functionality is a good
> idea.
Yes, I agree.

>
> Of course, if there are really good arguments for having some state that
> simply cannot be migrated, then that's fine, and we should just make
> sure that userspace (e.g. QEMU) and higher level components in the
> stack (libvirt, openstack, etc.) can detect this state being used, and
> ideally enable/disable it, so that it can predict that a particular VM
> cannot be migrated off a particular host, or between a particular set of
> two hosts.  As an example, migration is typically prohibited when using
> VFIO direct device assignment, but userspace etc. are already aware of
> this.
OK, I think this problem is similar to migrating a VM that uses an irqchip in
userspace and has set the IRQ or FIQ lines using KVM_IRQ_LINE.

>
> As a final note, if we add support for some architectural feature, which
> may be present on some particular hardware and/or implementation, if the
> KVM support for said feature is automatically enabled (and not
> selectively from userspace), I would push back quite strongly on
> something that doesn't support migration, because it would effectively
> prevent migration of VMs on ARM.
Ok, got it.

>
> Thanks,
> -Christoffer
> _______________________________________________
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2018-01-21  2:45                 ` gengdongjiu
  (?)
  (?)
@ 2018-01-22 19:32                   ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2018-01-22 19:32 UTC (permalink / raw)
  To: gengdongjiu
  Cc: wuquanming, linux-acpi, kvm, linux-doc, Marc Zyngier,
	Catalin Marinas, Jonathan Corbet, rjw, linux, gengdongjiu,
	linuxarm, bp, arm-mail-list, pbonzini, Huangshaoyu, kvmarm,
	Linux Kernel Mailing List, devel

Hi gengdongjiu,

On 21/01/18 02:45, gengdongjiu wrote:
> For the ESR_ELx_AET_UER, this exception is precise, closing the VM may
> be better[1].
> But if you think panic is better until we support kernel-first, it is
> also OK to me.

I'm not convinced an SError while a guest was running means only guest memory could
be affected. Mechanisms like KSM mean the error could affect multiple guests.

Both firmware-first and kernel-first will give us the address, with which we can
know which processes are affected, isolate the memory and signal the affected
processes.

Until we have one of these, panic() is the only way we have to contain an error,
but it's an interim fix.
Not panic()ing the host for an error that should be contained to the guest is a
fudge, we don't actually know it's safe (KSM, page-table etc). I want to improve
on this with {firmware, kernel}-first support (or both!), I don't want to expose
that this is happening to user-space, as once we have one of {firmware,
kernel}-first, it shouldn't happen.


>> This is inventing something new for RAS errors not claimed by firmware-first.
>> If we have kernel-first too, this will never happen. (unless your system is
>> losing the error description).

> In fact, if we have kernel-first, I think we still need to judge the
> error type by ESR, right?

The kernel-first mechanism should consider the ESR/FAR, yes, but once the error
has been claimed and handled, KVM shouldn't care about any of these values.
(maybe we'll sanity check for uncontained errors, just in case the error escaped
to the RAS code...)

My point here was exposing 'unhandled' (ignored) RAS errors to user-space
creates an ABI: someone will complain once we start handling the error, and they
no longer get a notification via this 'unhandled' interface. Code written to use
this interface becomes useless/untested.


> If handle_guest_sei() fails, maybe the system does not support firmware-first,
> so we judge the ESR value,

...and panic()/ignore as appropriate.

I agree not all systems will support firmware-first, (big-endian is the obvious
example), but if we get kernel-first support this ESR guessing can disappear,
I'm against exposing it to user-space in the meantime.


Thanks,

James
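
As a rough illustration of "panic()/ignore as appropriate" for an SError that
nobody claimed, the interim policy could look like the sketch below. It takes
only the 3-bit AET value; the enum names and sketch_unclaimed_serror() are
hypothetical, and the AET encodings are my assumption of the architectural
values, so treat it as a sketch rather than the kernel's code. Note there is
deliberately no "exit to user-space" outcome.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed Asynchronous Error Type encodings (unshifted 3-bit field). */
enum sketch_aet {
	SKETCH_AET_UC  = 0, /* uncontainable */
	SKETCH_AET_UEU = 1, /* unrecoverable */
	SKETCH_AET_UEO = 2, /* restartable, not yet consumed */
	SKETCH_AET_UER = 3, /* recoverable */
	SKETCH_AET_CE  = 6, /* corrected */
};

enum sketch_action {
	SKETCH_IGNORE, /* safe to make progress, keep running the guest */
	SKETCH_PANIC,  /* cannot locate or contain the error */
};

/* Interim policy for an SError that neither firmware-first nor kernel-first claimed. */
static enum sketch_action sketch_unclaimed_serror(uint32_t aet)
{
	switch (aet) {
	case SKETCH_AET_CE:
	case SKETCH_AET_UEO:
		return SKETCH_IGNORE;
	default:
		/* Includes recoverable errors: without an address we cannot isolate them. */
		return SKETCH_PANIC;
	}
}
```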

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization
  2017-12-16  3:44               ` gengdongjiu
@ 2018-01-22 19:36                 ` James Morse
  -1 siblings, 0 replies; 98+ messages in thread
From: James Morse @ 2018-01-22 19:36 UTC (permalink / raw)
  To: gengdongjiu
  Cc: christoffer.dall, marc.zyngier, linux, kvm, linux-doc,
	linux-kernel, linux-arm-kernel, kvmarm, linux-acpi, huangshaoyu,
	wuquanming

Hi gengdongjiu,

On 16/12/17 03:44, gengdongjiu wrote:
> On 2017/12/16 2:52, James Morse wrote:
>>> signal, it will record the CPER and trigger an IRQ to notify the guest, as shown below:
>>>
>>> SIGBUS_MCEERR_AR triggers a Synchronous External Abort.
>>> SIGBUS_MCEERR_AO triggers a GPIO IRQ.
>>>
>>> For SIGBUS_MCEERR_AO and SIGBUS_MCEERR_AR we have already specified the trigger
>>> methods, and neither involves _triggering_ an SError.
>> It's a policy choice. How does your virtual CPU notify RAS errors to its virtual
>> software? You could use SError for SIGBUS_MCEERR_AR, it depends on what type of
>> CPU you are trying to emulate.
>>
>> I'd suggest using NOTIFY_SEA for SIGBUS_MCEERR_AR as it avoids problems where
>> the guest doesn't take the SError immediately, instead tries to re-execute the

> I agree it is better to use NOTIFY_SEA for SIGBUS_MCEERR_AR in this case.

>> code KVM has unmapped from stage2 because it's corrupt. (You could detect this
>> happening in Qemu and try something else)

> For something else, using NOTIFY_SEI for SIGBUS_MCEERR_AR?

Sorry that was unclear. If you use NOTIFY_SEI, the guest may have PSTATE.A set,
in which case the CPU will patiently wait for it to unmask (or consume it
with an ESB instruction) before delivering the notification. The guest may not
have unmasked SError because it's hammering the same page, taking the same fault
again and again. Pending the asynchronous notification and re-running the vcpu
doesn't guarantee progress will be made.

In this case user-space can spot that it has pended an asynchronous notification
(for the same address!) more than once in the last few seconds, and try something
else, like firing a guest:reboot watchdog on another CPU.
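
The check described above could be sketched roughly as follows. This is a minimal illustration, not Qemu code: `struct pend_tracker`, `pend_is_repeat()` and the 5-second window are all assumptions made up for the example, and the timestamp is expected to come from a monotonic clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REPEAT_WINDOW_SECS 5 /* assumed value for "the last few seconds" */

/* Hypothetical per-vcpu record of the last asynchronous notification pended. */
struct pend_tracker {
	bool     valid;     /* false until the first notification is recorded */
	uint64_t last_addr; /* guest physical address of the last error */
	uint64_t last_secs; /* monotonic time of the last pend, in seconds */
};

/*
 * Record that a notification for 'addr' is being pended at 'now_secs'.
 * Returns true if the same address was already pended within the window,
 * i.e. the guest is probably not making progress and the caller should
 * try something else instead of just re-running the vcpu.
 */
static bool pend_is_repeat(struct pend_tracker *t, uint64_t addr,
			   uint64_t now_secs)
{
	bool repeat;

	assert(t != NULL);

	repeat = t->valid && t->last_addr == addr &&
		 now_secs - t->last_secs <= REPEAT_WINDOW_SECS;

	t->valid = true;
	t->last_addr = addr;
	t->last_secs = now_secs;
	return repeat;
}
```

A caller would zero-initialise one tracker per vcpu and, when `pend_is_repeat()` returns true, escalate (for example the watchdog mentioned above) rather than pend the same notification again.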


> In the current implementation, it seems we only have this case, "KVM has
> unmapped from stage2". Do you think there is something else?

I'm wary that this only works for errors where we know the guest PC accessed the
faulting location.

The arch code will send this signal too if user-space touches the PG_poisoned
page. (I recall you checked Qemu spots this case and acts differently).
Migration is the obvious example of Qemu reading/writing guest memory.

On x86 the MachineCheck code sends these signals too, so our kernel-first
implementation may do the same. As a response to a RAS error notified by
synchronous-external-abort, this is fine. But we need to remember '_AR' implies
the error is related to the code the signal interrupted, which wouldn't be true
for an error notified by SError.
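
One way to picture the policy being discussed is the sketch below. Everything in it is hypothetical: the enum names only mirror the two signal codes and the notification methods mentioned in this thread, and `vcpu_accessed` stands for "the signal interrupted the vcpu thread, not Qemu's own access to guest memory". What Qemu should do when it touched the poisoned page itself is not specified here, so the sketch just reports that no guest notification is chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for SIGBUS_MCEERR_AR / SIGBUS_MCEERR_AO. */
enum mce_signal { MCE_ACTION_REQUIRED, MCE_ACTION_OPTIONAL };

/* Hypothetical notification choices for the virtual machine. */
enum guest_notify {
	NOTIFY_VIA_SEA, /* synchronous external abort */
	NOTIFY_VIA_IRQ, /* polled/irq style notification, e.g. GPIO IRQ */
	NOTIFY_NONE     /* left to the VMM; not covered by this sketch */
};

static enum guest_notify choose_notify(enum mce_signal sig, bool vcpu_accessed)
{
	assert(sig == MCE_ACTION_REQUIRED || sig == MCE_ACTION_OPTIONAL);

	if (sig == MCE_ACTION_OPTIONAL)
		return NOTIFY_VIA_IRQ;

	/*
	 * '_AR' implies the error relates to the interrupted code. That
	 * only maps onto a synchronous abort if the vcpu made the access.
	 */
	return vcpu_accessed ? NOTIFY_VIA_SEA : NOTIFY_NONE;
}
```

SError is deliberately absent from the choices: as noted above, the guest may have it masked, and '_AR' would not describe an error notified that way.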


>> Synchronous/asynchronous external abort matters to the CPU, but once the error
>> has been notified to software the reasons for this distinction disappear. Once
>> the error has been handled, all trace of this distinction is gone.
>>
>> CPER records only describe component failures. You are trying to re-create some
>> state that disappeared with one of the firmware-first abstractions. Trying to
>> re-create this information isn't worth the effort as the distinction doesn't
>> matter to linux, only to the CPU.
>>
>>
>>> so there is no chance for Qemu to trigger the SError when gets the SIGBUS_MCEERR_A{O,R}.
>> You mean there is no reason for Qemu to trigger an SError when it gets a signal
>> from the kernel.
>>
>> The reasons the CPU might have to generate an SError don't apply to linux and
>> KVM user space. User-space will never get a signal for an uncontained error, we
>> will always panic(). We can't give user-space a signal for imprecise exceptions,
>> as it can't return from the signal. The classes of error that are left are
>> covered by polled/irq and NOTIFY_SEA.
>>
>> Qemu can decide to generate RAS SErrors for SIGBUS_MCEERR_AR if it really wants
>> to, (but I don't think you should, the kernel may have unmapped the page at PC
>> from stage2 due to corruption).

> yes, you also said you do not want to generate RAS SErrors for SIGBUS_MCEERR_AR,
> so Qemu does not know in which condition to generate RAS SErrors.

There are two things going on here. Firstly, the guest may have masked PSTATE.A
and be hammering an unmapped page (this is the 'sorry that was unclear' case
above). This would happen if the exception-entry code or stack became corrupt
when an exception was taken.
The second is: what does existing non-RAS-aware software do? For SError it
panic()s, whereas for synchronous external abort there are some cases that can
be handled (e.g. on linux: synchronous external abort from user-space).


>> I think the problem here is you're applying the CPU->software behaviour and
>> choices to software->software. By the time user-space gets the error, the
>> behaviour is different.

> In KVM, as a policy choice, reserving this API to specify the guest ESR and
> trigger an SError is OK.
> Qemu, at least, does not know under which conditions to trigger it.

I think you're saying "let's keep it in KVM for now, Qemu doesn't have a better
idea of what to do."


Thanks,

James





^ permalink raw reply	[flat|nested] 98+ messages in thread

end of thread, other threads:[~2018-01-22 19:38 UTC | newest]

Thread overview: 98+ messages
2017-11-10 19:54 [PATCH v8 0/7] Support RAS virtualization in KVM Dongjiu Geng
2017-11-10 19:54 ` [Devel] " Dongjiu Geng
2017-11-10 19:54 ` Dongjiu Geng
2017-11-10 19:54 ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 1/7] arm64: cpufeature: Detect CPU RAS Extentions Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 2/7] KVM: arm64: Save ESR_EL2 on guest SError Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 3/7] acpi: apei: Add SEI notification type support for ARMv8 Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 4/7] KVM: arm64: Trap RAS error registers and set HCR_EL2's TERR & TEA Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 5/7] arm64: kvm: Introduce KVM_ARM_SET_SERROR_ESR ioctl Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 6/7] arm64: kvm: Set Virtual SError Exception Syndrome for guest Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54 ` [PATCH v8 7/7] arm64: kvm: handle SError Interrupt by categorization Dongjiu Geng
2017-11-10 19:54   ` [Devel] " Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-10 19:54   ` Dongjiu Geng
2017-11-14 16:00   ` James Morse
2017-11-14 16:00     ` [Devel] " James Morse
2017-11-14 16:00     ` James Morse
2017-11-14 16:00     ` James Morse
2017-11-15 11:29     ` gengdongjiu
2017-11-15 11:29       ` [Devel] " gengdongjiu
2017-11-15 11:29       ` gengdongjiu
2017-11-15 11:29       ` gengdongjiu
2017-12-06 10:26     ` gengdongjiu
2017-12-06 10:26       ` [Devel] " gengdongjiu
2017-12-06 10:26       ` gengdongjiu
2017-12-06 10:26       ` gengdongjiu
2017-12-06 19:04       ` James Morse
2017-12-06 19:04         ` [Devel] " James Morse
2017-12-06 19:04         ` James Morse
2017-12-07  6:37         ` gengdongjiu
2017-12-07  6:37           ` [Devel] " gengdongjiu
2017-12-07  6:37           ` gengdongjiu
2017-12-07  6:37           ` gengdongjiu
2017-12-15  3:30           ` gengdongjiu
2017-12-15  3:30             ` [Devel] " gengdongjiu
2017-12-15  3:30             ` gengdongjiu
2017-12-15  3:30             ` gengdongjiu
2018-01-12 18:05             ` James Morse
2018-01-12 18:05               ` [Devel] " James Morse
2018-01-12 18:05               ` James Morse
2018-01-12 18:05               ` James Morse
2018-01-15  8:33               ` Christoffer Dall
2018-01-15  8:33                 ` Christoffer Dall
2018-01-16 11:19                 ` gengdongjiu
2018-01-16 11:19                   ` [Devel] " gengdongjiu
2018-01-16 11:19                   ` gengdongjiu
2018-01-21  3:10                 ` gengdongjiu
2018-01-21  3:10                   ` gengdongjiu
2018-01-21  2:45               ` gengdongjiu
2018-01-21  2:45                 ` gengdongjiu
2018-01-22 19:32                 ` James Morse
2018-01-22 19:32                   ` [Devel] " James Morse
2018-01-22 19:32                   ` James Morse
2018-01-22 19:32                   ` James Morse
2017-12-15 18:52           ` James Morse
2017-12-15 18:52             ` [Devel] " James Morse
2017-12-15 18:52             ` James Morse
2017-12-16  3:44             ` gengdongjiu
2017-12-16  3:44               ` gengdongjiu
2017-12-16  3:44               ` gengdongjiu
2018-01-22 19:36               ` James Morse
2018-01-22 19:36                 ` James Morse
2017-12-16  4:47     ` gengdongjiu
2017-12-16  4:47       ` gengdongjiu
2018-01-12 18:05       ` James Morse
2018-01-12 18:05         ` [Devel] " James Morse
2018-01-12 18:05         ` James Morse
2018-01-12 18:05         ` James Morse
2018-01-16 11:22         ` gengdongjiu
2018-01-16 11:22           ` [Devel] " gengdongjiu
2018-01-16 11:22           ` gengdongjiu
2018-01-21  2:54         ` gengdongjiu
2018-01-21  2:54           ` gengdongjiu
2017-11-14 16:00 ` [PATCH v8 0/7] Support RAS virtualization in KVM James Morse
2017-11-14 16:00   ` [Devel] " James Morse
2017-11-14 16:00   ` James Morse
2017-11-15 11:06   ` gengdongjiu
2017-11-15 11:06     ` [Devel] " gengdongjiu
2017-11-15 11:06     ` gengdongjiu
2017-11-15 11:06     ` gengdongjiu
