* [PATCH v6 0/7] Add RAS virtualization support for SEA/SEI notification type in KVM
From: Dongjiu Geng @ 2017-08-28 10:38 UTC
  To: christoffer.dall, marc.zyngier, rkrcmar, linux, catalin.marinas,
	will.deacon, lenb, robert.moore, lv.zheng, mark.rutland,
	james.morse, xiexiuqi, cov, david.daney, suzuki.poulose, stefan,
	Dave.Martin, kristina.martsenko, wangkefeng.wang, tbaicar,
	ard.biesheuvel, mingo, bp, shiju.jose, zjzhang, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, devel, mst, john.garry,
	jonathan.cameron, shameerali.kolot
  Cc: zhengqiang10, wuquanming, huangshaoyu, linuxarm, gengdongjiu

In the firmware-first RAS solution, corrupt data is detected in a
memory location when guest OS application software executing at EL0
or guest OS kernel software executing at EL1 reads from that memory.
The memory node records the error in an error record accessible
through system registers.

Because SCR_EL3.EA is 1, the CPU traps to EL3 firmware. The EL3
firmware reads the system registers and records the error in the
APEI table.

Because the error was taken from a lower Exception level, if the
exception is an SEA/SEI and HCR_EL2.TEA/HCR_EL2.AMO is 1, firmware
sets ESR_EL2/FAR_EL2 to fake an exception trap to EL2 and then
transfers control to the hypervisor.
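
The routing condition described above can be sketched in C as follows.
This is only an illustration, not code from this series: the bit
positions are the architectural ones for SCR_EL3.EA, HCR_EL2.AMO and
HCR_EL2.TEA, while the enum, the function name and the from_lower_el
parameter are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions (ARMv8-A, TEA from the RAS Extensions). */
#define SCR_EL3_EA   (UINT64_C(1) << 3)   /* external aborts/SErrors go to EL3 */
#define HCR_EL2_AMO  (UINT64_C(1) << 5)   /* physical SErrors routed to EL2 */
#define HCR_EL2_TEA  (UINT64_C(1) << 37)  /* synchronous external aborts routed to EL2 */

/* Hypothetical type used only by this sketch. */
enum ras_exc_type { RAS_EXC_SEA, RAS_EXC_SEI };

/*
 * Hypothetical helper: after EL3 firmware has recorded the error, decide
 * whether it should fake an exception entry to EL2 (by filling in
 * ESR_EL2/FAR_EL2) and hand over to the hypervisor.
 */
static bool fw_should_forward_to_el2(uint64_t scr_el3, uint64_t hcr_el2,
                                     enum ras_exc_type type,
                                     bool from_lower_el)
{
    /* Firmware only sees the error at all when SCR_EL3.EA is set. */
    if (!(scr_el3 & SCR_EL3_EA) || !from_lower_el)
        return false;

    /* SEA is gated by HCR_EL2.TEA, SEI by HCR_EL2.AMO. */
    if (type == RAS_EXC_SEA)
        return (hcr_el2 & HCR_EL2_TEA) != 0;
    return (hcr_el2 & HCR_EL2_AMO) != 0;
}
```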

For a synchronous external abort (SEA), the hypervisor calls
ghes_handle_memory_failure() to deal with the error. That function
reads the APEI table and calls memory_failure() to decide whether
a SIGBUS signal needs to be delivered to user space. The advantage
of notifying user space with SIGBUS is that it stays compatible
with non-KVM users.

For an SError Interrupt (SEI), KVM first classifies the error and
does not call memory_failure() to handle it: the error address
recorded by APEI is not accurate, so the address to hwpoison cannot
be identified. If the SError comes from guest user mode and is not
propagated, KVM signals user space to handle it; otherwise it
directly injects a virtual SError, or panics if the error is fatal.
When user space handles the error, it specifies the syndrome for the
injected virtual SError. This syndrome value is written to VSESR_EL2,
a new register in the ARMv8.2 RAS Extensions that provides the
syndrome value reported to software on taking a virtual SError
interrupt exception.
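
As a rough illustration of the SEI classification described above,
the decision could look like the sketch below. All names here
(struct sei_info, enum sei_action, classify_sei) are hypothetical and
are not the identifiers used in the patches; checking the fatal case
first is also an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical outcome of classifying a guest SError. */
enum sei_action {
    SEI_SIGNAL_USERSPACE,   /* let user space pick the syndrome and inject */
    SEI_INJECT_VSERROR,     /* KVM injects a virtual SError directly */
    SEI_PANIC,              /* error is fatal for the host */
};

/* Hypothetical summary of what KVM learned from the syndrome. */
struct sei_info {
    bool fatal;             /* error cannot be contained */
    bool from_guest_user;   /* SError was taken from guest EL0 */
    bool propagated;        /* corrupt data may already have spread */
};

static enum sei_action classify_sei(const struct sei_info *info)
{
    /* Assumption: a fatal error overrides every other consideration. */
    if (info->fatal)
        return SEI_PANIC;

    /* Contained error from guest user mode: hand it to user space. */
    if (info->from_guest_user && !info->propagated)
        return SEI_SIGNAL_USERSPACE;

    /* Everything else: inject a virtual SError into the guest. */
    return SEI_INJECT_VSERROR;
}
```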

Dongjiu Geng (5):
  acpi: apei: remove the unused code
  arm64: kvm: support user space to query RAS extension feature
  arm64: kvm: route synchronous external abort exceptions to el2
  KVM: arm64: allow get exception information from userspace
  arm64: kvm: handle SEI notification and pass the virtual syndrome

James Morse (1):
  KVM: arm64: Save ESR_EL2 on guest SError

Xie XiuQi (1):
  arm64: cpufeature: Detect CPU RAS Extentions

 arch/arm/include/asm/kvm_host.h      |  2 ++
 arch/arm/kvm/guest.c                 |  9 ++++++
 arch/arm64/Kconfig                   | 16 +++++++++++
 arch/arm64/include/asm/barrier.h     |  1 +
 arch/arm64/include/asm/cpucaps.h     |  3 +-
 arch/arm64/include/asm/esr.h         | 11 +++++++
 arch/arm64/include/asm/kvm_arm.h     |  2 ++
 arch/arm64/include/asm/kvm_emulate.h | 17 +++++++++++
 arch/arm64/include/asm/kvm_host.h    |  2 ++
 arch/arm64/include/asm/sysreg.h      |  5 ++++
 arch/arm64/include/asm/system_misc.h |  1 +
 arch/arm64/include/uapi/asm/kvm.h    |  5 ++++
 arch/arm64/kernel/cpufeature.c       | 13 +++++++++
 arch/arm64/kernel/process.c          |  3 ++
 arch/arm64/kvm/guest.c               | 50 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/handle_exit.c         | 56 ++++++++++++++++++++++++++++++++----
 arch/arm64/kvm/hyp/switch.c          | 29 +++++++++++++++++--
 arch/arm64/kvm/reset.c               |  3 ++
 arch/arm64/mm/fault.c                | 34 ++++++++++++++++++++++
 drivers/acpi/apei/ghes.c             | 14 ---------
 include/uapi/linux/kvm.h             |  3 ++
 virt/kvm/arm/arm.c                   |  7 +++++
 22 files changed, 263 insertions(+), 23 deletions(-)

-- 
2.14.1

* Re: [PATCH v6 6/7] KVM: arm64: allow get exception information from userspace
From: gengdongjiu @ 2017-10-20 15:36 UTC
  To: 'James Morse', Christoffer Dall, Marc Zyngier
  Cc: lishuo (A), Liujun (Jun Liu), Lilei (Lious),
	kvmarm, kvm, linux-arm-kernel, Huangshaoyu, Wuquanming

CC James.

> > In the user space, we can check the si_code, if it is
> > "BUS_MCEERR_AR", we use SEA notification type for the guest; if it is "BUS_MCEERR_AO", we use SEI notification type for the guest.
> > Because there are only two values for si_code("BUS_MCEERR_AR" and
> > BUS_MCEERR_AO), in which case we can use the GSIV(IRQ)
> > notification type?
> 
> This is for Qemu/kvmtool to decide, it depends on what sort of machine they are emulating.
> 
> For example, the physical machine's memory-controller may notify the 
> CPU about memory errors by triggering SError trapped to EL3, or with a 
> dedicated FIQ, also routed to EL3. By the time this gets to the host kernel the distinction doesn't matter. The host has handled the error.
> 
> For a guest, your memory-controller is effectively the host kernel. It 
> will give you an BUS_MCEERR_AO signal for any guest memory that is affected, and a BUS_MCEERR_AR if the guest directly accesses a page of affected memory.
> 
> What Qemu/kvmtool do with this is up to them. If they're emulating a machine with no RAS features, printing an error and exit.
> 
> Otherwise BUS_MCEERR_AR could be notified as one of the flavours of 
> IRQ, unless the affected vcpu has interrupts masked, in which case an SEA notification gives you some NMI-like behaviour.
> 
> For BUS_MCEERR_AO you could use SEI, IRQ or polled notification. My 
> choice would be IRQ, as you can't know if the guest supports SEI and 
> it would be a shame to kill it with an SError if the affected memory was free. SEA for synchronous errors is still a good choice even if the guest doesn't support it as that memory is still gone so its still a valid guest:Synchronous-external-abort.
> 

Add James.

CC some huawei's hardware engineers.

Hi James/Marc/Christoffer,

  As we discussed, the solution is as follows:
When an SEA/SEI happens in the guest, KVM calls memory_failure() to send an asynchronous SIGBUS signal (BUS_MCEERR_AO) to QEMU and marks the address as poisoned.
After QEMU receives this BUS_MCEERR_AO, it records the address in a CPER and notifies the guest. When the guest takes a stage 2 page fault, KVM sends a synchronous SIGBUS (BUS_MCEERR_AR) to QEMU, and QEMU again records a CPER and immediately injects an SEA.

But this solution still has some problems.

1. In some situations, when an SEA happens, the hardware cannot provide the physical address of the error to software; it can only provide the virtual address in FAR_ELx.
So the BIOS cannot record a physical address in the APEI table. In this case, when firmware jumps to the hypervisor, the hypervisor cannot call memory_failure(), because the APEI driver calls memory_failure() only when a physical address is recorded and valid. The host therefore does not send SIGBUS to QEMU, and the guest cannot know that an SEA happened.
At least Huawei's platform has this issue (for firmware-first RAS it cannot provide the PA, only the VA in FAR_ELx).

2. If an SEA/SEI happens and only SIGBUS is delivered to notify QEMU, the information is limited.
 SIGBUS can only provide an address and an si_code (BUS_MCEERR_AO/BUS_MCEERR_AR), nothing else.
 If QEMU records a CPER and injects an SEA or specifies an ESR, it may need more information.
For example, to inject an SEA it needs to set up several guest registers, such as FAR_EL1, and to set FAR_EL1 it needs to know FAR_EL2.
 But QEMU cannot obtain this information unless KVM passes more fault information to QEMU.
 Of course, we could mark the guest FAR_EL1 as invalid, but sometimes the guest needs it, namely when the host cannot provide the PA.

3. For SEI, the address is invalid, so on some platforms firmware does not record the PA. At least on Huawei's platform, firmware does not record it.
  We cannot assume that every platform can record a PA for RAS; sometimes it may use a VA (in FAR_ELx).
  For SEI, if the address is not recorded, memory_failure() is not called, so the guest will not know that an SEI happened.
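
To make points 1 and 3 concrete: memory_failure() is reached only when the firmware record marks its physical address as valid. A simplified sketch of that gate is below. The struct and function names are hypothetical and reduced for illustration; MEM_VALID_PA mirrors the "physical address valid" bit (bit 1) of the CPER memory error section, and the 4 KiB page size is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the CPER memory error "physical address valid" bit (bit 1). */
#define MEM_VALID_PA  UINT64_C(0x0002)

/* Assumption of this sketch: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical, reduced view of a memory error record. */
struct mem_err_record {
    uint64_t validation_bits;
    uint64_t physical_addr;
};

/*
 * Returns true and fills *pfn_out only when the record carries a valid
 * physical address. When firmware can only report a VA in FAR_ELx, this
 * returns false, memory_failure() is skipped, and no SIGBUS reaches QEMU.
 */
static bool can_call_memory_failure(const struct mem_err_record *rec,
                                    uint64_t *pfn_out)
{
    if (!(rec->validation_bits & MEM_VALID_PA))
        return false;

    *pfn_out = rec->physical_addr >> SKETCH_PAGE_SHIFT;
    return true;
}
```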

