linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
@ 2017-05-04 16:52 gengdongjiu
  2017-05-05 12:31 ` gengdongjiu
  2017-05-08 17:28 ` James Morse
  0 siblings, 2 replies; 13+ messages in thread
From: gengdongjiu @ 2017-05-04 16:52 UTC (permalink / raw)
  To: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, James Morse, Xiongfeng Wang,
	gengdongjiu

Dear James,
   Thanks a lot for your review and comments. I am very sorry for the
late response.


2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1984@gmail.com>:
>  Hi Dongjiu Geng,
>
> On 30/04/17 06:37, Dongjiu Geng wrote:
>> when happen SEA, deliver signal bus and handle the ioctl that
>> inject SEA abort to guest, so that guest can handle the SEA error.
>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 105b6ab..a96594f 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -20,8 +20,10 @@
>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>   __coherent_cache_guest_page(vcpu, pfn, size);
>>  }
>>
>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool hwpoison)
>> +{
>> + siginfo_t info;
>> +
>> + info.si_signo   = SIGBUS;
>> + info.si_errno   = 0;
>> + if (hwpoison)
>> + info.si_code    = BUS_MCEERR_AR;
>> + else
>> + info.si_code    = 0;
>> +
>> + info.si_addr    = (void __user *)address;
>> + if (hugetlb)
>> + info.si_addr_lsb = PMD_SHIFT;
>> + else
>> + info.si_addr_lsb = PAGE_SHIFT;
>> +
>> + send_sig_info(SIGBUS, &info, current);
>> +}
>> +
> «  [hide part of quote]
>
> Punit reviewed the other version of this patch, this PMD_SHIFT is not the right
> thing to do, it needs a more accurate set of calls and shifts as there may be
> hugetlbfs pages other than PMD_SIZE.
>
> https://www.spinics.net/lists/arm-kernel/msg568919.html
>
> I haven't posted a new version of that patch because I was still hunting a bug
> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT
> returned to userspace instead of this hwpoison code being invoked.

  Ok, got it, thanks for your information.
>
> Please avoid duplicating functionality between patches, it wastes reviewers
> time, especially when we know there are problems with this approach.
>
>
>> +static void kvm_handle_bad_page(unsigned long address,
>> + bool hugetlb, bool hwpoison)
>> +{
>> + /* handle both hwpoison and other synchronous external Abort */
>> + if (hwpoison)
>> + kvm_send_signal(address, hugetlb, true);
>> + else
>> + kvm_send_signal(address, hugetlb, false);
>> +}
>
> Why the extra level of indirection? We only want to signal userspace like this
> from KVM for hwpoison. Signals for RAS related reasons should come from the bits
> of the kernel that decoded the error.

For the SEA, the are maily two types:
0b010000 Synchronous External Abort on memory access.
0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
encode the level.

hwpoison should belong to the  "Synchronous External Abort on memory access"
if the SEA type is not hwpoison, such as page table walk, do you mean
KVM do not deliver the SIGBUS?
If so, how the KVM handle the SEA type other than hwpoison?

>
> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>
>
>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>> index b37446a..780e3c4 100644
>> --- a/arch/arm64/kvm/guest.c
>> +++ b/arch/arm64/kvm/guest.c
>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>   return -EINVAL;
>>  }
>>
>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu)
>> +{
>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>> +
>> + return 0;
>> +}
>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index bb02909..1d2e2e7 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>  /* Available with KVM_CAP_X86_SMM */
>>  #define KVM_SMI                   _IO(KVMIO,   0xb7)
>> +#define KVM_ARM_SEA               _IO(KVMIO,   0xb8)
>>
>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>
>
> Why do we need a userspace API for SEA? It can also be done by using
> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
> way is you can choose which ESR value to use.
>
> Adding a new API call to do something you could do with an old one doesn't look
> right.

James, I considered your suggestion before that use the
KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
not have difference to use the alread existed KVM API.  so may be
changing the vcpu registers in qemu will duplicate with the KVM APIs.

injection a SEA is no more than setting some registers: elr_el1, PC,
PSTATE, SPSR_el1, far_el1, esr_el1
I seen this KVM API do the same thing as Qemu.  do you found call this
API will have issue and necessary to choose another ESR value?

I pasted the alread existed KVM API code:

static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned
long addr)
{
 unsigned long cpsr = *vcpu_cpsr(vcpu);
 bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
 u32 esr = 0;
 *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
 *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
 *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
 *vcpu_spsr(vcpu) = cpsr;
 vcpu_sys_reg(vcpu, FAR_EL1) = addr;
 /*
  * Build an {i,d}abort, depending on the level and the
  * instruction set. Report an external synchronous abort.
  */
 if (kvm_vcpu_trap_il_is32bit(vcpu))
  esr |= ESR_ELx_IL;
 /*
  * Here, the guest runs in AArch64 mode when in EL1. If we get
  * an AArch32 fault, it means we managed to trap an EL0 fault.
  */
 if (is_aarch32 || (cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
  esr |= (ESR_ELx_EC_IABT_LOW << ESR_ELx_EC_SHIFT);
 else
  esr |= (ESR_ELx_EC_IABT_CUR << ESR_ELx_EC_SHIFT);
 if (!is_iabt)
  esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 vcpu_sys_reg(vcpu, ESR_EL1) = esr | ESR_ELx_FSC_EXTABT;
}

static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
    unsigned long addr)
{
 u32 vect_offset;
 u32 *far, *fsr;
 bool is_lpae;
 if (is_pabt) {
  vect_offset = 12;
  far = &vcpu_cp15(vcpu, c6_IFAR);
  fsr = &vcpu_cp15(vcpu, c5_IFSR);
 } else { /* !iabt */
  vect_offset = 16;
  far = &vcpu_cp15(vcpu, c6_DFAR);
  fsr = &vcpu_cp15(vcpu, c5_DFSR);
 }
 prepare_fault32(vcpu, COMPAT_PSR_MODE_ABT | COMPAT_PSR_A_BIT, vect_offset);
 *far = addr;
 /* Give the guest an IMPLEMENTATION DEFINED exception */
 is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
 if (is_lpae)
  *fsr = 1 << 9 | 0x34;
 else
  *fsr = 0x14;
}


/**
 * kvm_inject_dabt - inject a data abort into the guest
 * @vcpu: The VCPU to receive the undefined exception
 * @addr: The address to report in the DFAR
 *
 * It is assumed that this code is called from the VCPU thread and that the
 * VCPU therefore is not currently executing guest code.
 */
void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
{
 if (!(vcpu->arch.hcr_el2 & HCR_RW))
  inject_abt32(vcpu, false, addr);
 else
  inject_abt64(vcpu, false, addr);
}


>
>
> Thanks,
>
> James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-04 16:52 [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error gengdongjiu
@ 2017-05-05 12:31 ` gengdongjiu
  2017-05-12 17:24   ` James Morse
  2017-05-08 17:28 ` James Morse
  1 sibling, 1 reply; 13+ messages in thread
From: gengdongjiu @ 2017-05-05 12:31 UTC (permalink / raw)
  To: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, James Morse, Xiongfeng Wang,
	gengdongjiu

HI James,

2017-05-05 0:52 GMT+08:00 gengdongjiu <gengdj.1984@gmail.com>:
> Dear James,
>    Thanks a lot for your review and comments. I am very sorry for the
> late response.
>
>
> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1984@gmail.com>:
>>  Hi Dongjiu Geng,
>>
>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>> when happen SEA, deliver signal bus and handle the ioctl that
>>> inject SEA abort to guest, so that guest can handle the SEA error.
>>
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 105b6ab..a96594f 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> @@ -20,8 +20,10 @@
>>> @@ -1238,6 +1240,36 @@ static void coherent_cache_guest_page(struct kvm_vcpu *vcpu, kvm_pfn_t pfn,
>>>   __coherent_cache_guest_page(vcpu, pfn, size);
>>>  }
>>>
>>> +static void kvm_send_signal(unsigned long address, bool hugetlb, bool hwpoison)
>>> +{
>>> + siginfo_t info;
>>> +
>>> + info.si_signo   = SIGBUS;
>>> + info.si_errno   = 0;
>>> + if (hwpoison)
>>> + info.si_code    = BUS_MCEERR_AR;
>>> + else
>>> + info.si_code    = 0;
>>> +
>>> + info.si_addr    = (void __user *)address;
>>> + if (hugetlb)
>>> + info.si_addr_lsb = PMD_SHIFT;
>>> + else
>>> + info.si_addr_lsb = PAGE_SHIFT;
>>> +
>>> + send_sig_info(SIGBUS, &info, current);
>>> +}
>>> +
>> «  [hide part of quote]
>>
>> Punit reviewed the other version of this patch, this PMD_SHIFT is not the right
>> thing to do, it needs a more accurate set of calls and shifts as there may be
>> hugetlbfs pages other than PMD_SIZE.
>>
>> https://www.spinics.net/lists/arm-kernel/msg568919.html
>>
>> I haven't posted a new version of that patch because I was still hunting a bug
>> in the hugepage/hwpoison code, even with Punit's fixes series I see -EFAULT
>> returned to userspace instead of this hwpoison code being invoked.
>
>   Ok, got it, thanks for your information.
>>
>> Please avoid duplicating functionality between patches, it wastes reviewers
>> time, especially when we know there are problems with this approach.
>>
>>
>>> +static void kvm_handle_bad_page(unsigned long address,
>>> + bool hugetlb, bool hwpoison)
>>> +{
>>> + /* handle both hwpoison and other synchronous external Abort */
>>> + if (hwpoison)
>>> + kvm_send_signal(address, hugetlb, true);
>>> + else
>>> + kvm_send_signal(address, hugetlb, false);
>>> +}
>>
>> Why the extra level of indirection? We only want to signal userspace like this
>> from KVM for hwpoison. Signals for RAS related reasons should come from the bits
>> of the kernel that decoded the error.
>
> For the SEA, the are maily two types:
> 0b010000 Synchronous External Abort on memory access.
> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
> encode the level.
>
> hwpoison should belong to the  "Synchronous External Abort on memory access"
> if the SEA type is not hwpoison, such as page table walk, do you mean
> KVM do not deliver the SIGBUS?
> If so, how the KVM handle the SEA type other than hwpoison?
>
>>
>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>>
>>
>>> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
>>> index b37446a..780e3c4 100644
>>> --- a/arch/arm64/kvm/guest.c
>>> +++ b/arch/arm64/kvm/guest.c
>>> @@ -277,6 +277,13 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
>>>   return -EINVAL;
>>>  }
>>>
>>> +int kvm_vcpu_ioctl_sea(struct kvm_vcpu *vcpu)
>>> +{
>>> + kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
>>> +
>>> + return 0;
>>> +}
>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index bb02909..1d2e2e7 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>>  /* Available with KVM_CAP_X86_SMM */
>>>  #define KVM_SMI                   _IO(KVMIO,   0xb7)
>>> +#define KVM_ARM_SEA               _IO(KVMIO,   0xb8)
>>>
>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>>
>>
>> Why do we need a userspace API for SEA? It can also be done by using
>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
>> way is you can choose which ESR value to use.
>>
>> Adding a new API call to do something you could do with an old one doesn't look
>> right.
>
> James, I considered your suggestion before that use the
> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
> not have difference to use the alread existed KVM API.  so may be
> changing the vcpu registers in qemu will duplicate with the KVM APIs.
>
> injection a SEA is no more than setting some registers: elr_el1, PC,
> PSTATE, SPSR_el1, far_el1, esr_el1
> I seen this KVM API do the same thing as Qemu.  do you found call this
> API will have issue and necessary to choose another ESR value?

I consider again, you are right.

when guest OS happen an SEA, My current solution is shown below:

(1) host EL3 firmware firstly handle the SEA error and generate the CPER record.
(2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3,
far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2.
(3) then jump the EL2 hypervisor

so the EL2 hypervisor uses the ESR that come from esr_el3,  here the
ESR(esr_el3) value may be different with the exist KVM API's ESR.


>
> I pasted the alread existed KVM API code:
>
> static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned
> long addr)
> {
>  unsigned long cpsr = *vcpu_cpsr(vcpu);
>  bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
>  u32 esr = 0;
>  *vcpu_elr_el1(vcpu) = *vcpu_pc(vcpu);
>  *vcpu_pc(vcpu) = get_except_vector(vcpu, except_type_sync);
>  *vcpu_cpsr(vcpu) = PSTATE_FAULT_BITS_64;
>  *vcpu_spsr(vcpu) = cpsr;
>  vcpu_sys_reg(vcpu, FAR_EL1) = addr;
>  /*
>   * Build an {i,d}abort, depending on the level and the
>   * instruction set. Report an external synchronous abort.
>   */
>  if (kvm_vcpu_trap_il_is32bit(vcpu))
>   esr |= ESR_ELx_IL;
>  /*
>   * Here, the guest runs in AArch64 mode when in EL1. If we get
>   * an AArch32 fault, it means we managed to trap an EL0 fault.
>   */
>  if (is_aarch32 || (cpsr & PSR_MODE_MASK) == PSR_MODE_EL0t)
>   esr |= (ESR_ELx_EC_IABT_LOW << ESR_ELx_EC_SHIFT);
>  else
>   esr |= (ESR_ELx_EC_IABT_CUR << ESR_ELx_EC_SHIFT);
>  if (!is_iabt)
>   esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
>  vcpu_sys_reg(vcpu, ESR_EL1) = esr | ESR_ELx_FSC_EXTABT;
> }
>
> static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
>     unsigned long addr)
> {
>  u32 vect_offset;
>  u32 *far, *fsr;
>  bool is_lpae;
>  if (is_pabt) {
>   vect_offset = 12;
>   far = &vcpu_cp15(vcpu, c6_IFAR);
>   fsr = &vcpu_cp15(vcpu, c5_IFSR);
>  } else { /* !iabt */
>   vect_offset = 16;
>   far = &vcpu_cp15(vcpu, c6_DFAR);
>   fsr = &vcpu_cp15(vcpu, c5_DFSR);
>  }
>  prepare_fault32(vcpu, COMPAT_PSR_MODE_ABT | COMPAT_PSR_A_BIT, vect_offset);
>  *far = addr;
>  /* Give the guest an IMPLEMENTATION DEFINED exception */
>  is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
>  if (is_lpae)
>   *fsr = 1 << 9 | 0x34;
>  else
>   *fsr = 0x14;
> }
>
>
> /**
>  * kvm_inject_dabt - inject a data abort into the guest
>  * @vcpu: The VCPU to receive the undefined exception
>  * @addr: The address to report in the DFAR
>  *
>  * It is assumed that this code is called from the VCPU thread and that the
>  * VCPU therefore is not currently executing guest code.
>  */
> void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
> {
>  if (!(vcpu->arch.hcr_el2 & HCR_RW))
>   inject_abt32(vcpu, false, addr);
>  else
>   inject_abt64(vcpu, false, addr);
> }
>
>
>>
>>
>> Thanks,
>>
>> James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-04 16:52 [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error gengdongjiu
  2017-05-05 12:31 ` gengdongjiu
@ 2017-05-08 17:28 ` James Morse
  2017-05-08 17:54   ` Christoffer Dall
  2017-05-10  8:44   ` gengdongjiu
  1 sibling, 2 replies; 13+ messages in thread
From: James Morse @ 2017-05-08 17:28 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, Xiongfeng Wang

Hi gengdongjiu,

On 04/05/17 17:52, gengdongjiu wrote:
> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1984@gmail.com>:
>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>> index 105b6ab..a96594f 100644
>>> --- a/arch/arm/kvm/mmu.c
>>> +++ b/arch/arm/kvm/mmu.c
>>> +static void kvm_handle_bad_page(unsigned long address,
>>> + bool hugetlb, bool hwpoison)
>>> +{
>>> + /* handle both hwpoison and other synchronous external Abort */
>>> + if (hwpoison)
>>> + kvm_send_signal(address, hugetlb, true);
>>> + else
>>> + kvm_send_signal(address, hugetlb, false);
>>> +}
>>
>> Why the extra level of indirection? We only want to signal userspace like this
>> from KVM for hwpoison. Signals for RAS related reasons should come from the bits
>> of the kernel that decoded the error.
> 
> For the SEA, the are maily two types:
> 0b010000 Synchronous External Abort on memory access.
> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
> encode the level.

(KVM shouldn't have to make decisions about this)


> hwpoison should belong to the  "Synchronous External Abort on memory access"
> if the SEA type is not hwpoison, such as page table walk, do you mean
> KVM do not deliver the SIGBUS?


The flow of events should be SEI/SEA from firmware to the hosts's APEI code. KVM
should only be involved to get us back to the host if we were running a guest.
The APEI/hwpoison code may cause a set of processes to be sent signals. The code
in mm/memory-failure.c does this by walking the process rmaps using the physical
addresses in the CPER records.

We want user space to be sent signals as this can (and should) work in exactly
the same way on arm64 as it does on x86 or any other architecture. If a
web-browser can handle SIGBUS notifications for memory-corruption, it shouldn't
have to care what architecture it is running on.

So what is that KVM+SIGBUS patch about?...

>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
>> Qemu and KVM. This isn't the example of how user-space gets signalled.)

KVM creates guests as if they were additional users of Qemu's memory. The code
in mm/memory-failure.c may find that Qemu didn't have the affected page mapped
to user-space - but it may have been in use by stage2.

The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when the
guest touches the hwpoison page as if Qemu had touched the page itself.

Signals from KVM is a corner case, for firmware-first decisions should happen in
the APEI code based on CPER records.


> If so, how the KVM handle the SEA type other than hwpoison?

To deliver to a guest? It shouldn't have to know, user space should use a KVM
API to drive this.

When received from hardware? It shouldn't have to care, these things should be
passed into the APEI code for handling. KVM just needs to put the host registers
back.


>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index bb02909..1d2e2e7 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>>  /* Available with KVM_CAP_X86_SMM */
>>>  #define KVM_SMI                   _IO(KVMIO,   0xb7)
>>> +#define KVM_ARM_SEA               _IO(KVMIO,   0xb8)
>>>
>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>>
>>
>> Why do we need a userspace API for SEA? It can also be done by using
>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
>> way is you can choose which ESR value to use.
>>
>> Adding a new API call to do something you could do with an old one doesn't look
>> right.
> 
> James, I considered your suggestion before that use the
> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
> not have difference to use the alread existed KVM API. 

(Only that is an in-kernel helper, not a published API)


> so may be
> changing the vcpu registers in qemu will duplicate with the KVM APIs.

That is true, but the alternative is a new API that doesn't do anything new, its
just more convenient.

Marc and Christoffer are the people to convince.
I argue the existing API is sufficient.


> injection a SEA is no more than setting some registers: elr_el1, PC,
> PSTATE, SPSR_el1, far_el1, esr_el1
> I seen this KVM API do the same thing as Qemu.  do you found call this
> API will have issue and necessary to choose another ESR value?

Should we let user-space pick the ESR to deliver to the guest? Yes, letting
user-space specify the ESR gives the most flexibility to do something clever in
the future. An obvious choice for SEA is between the external-abort and 'parity
or ECC error' codes. If we tell user-space which of these happened (I don't
think Linux does today) then Qemu can relay that information to the guest.



Thanks,

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-08 17:28 ` James Morse
@ 2017-05-08 17:54   ` Christoffer Dall
  2017-05-09 14:28     ` James Morse
  2017-05-10  8:44   ` gengdongjiu
  1 sibling, 1 reply; 13+ messages in thread
From: Christoffer Dall @ 2017-05-08 17:54 UTC (permalink / raw)
  To: James Morse
  Cc: gengdongjiu, Tyler Baicar, Christoffer Dall, Marc Zyngier,
	pbonzini, rkrcmar, linux, catalin.marinas, will.deacon, rjw,
	Len Brown, matt, robert.moore, lv.zheng, nkaje, zjzhang,
	mark.rutland, akpm, eun.taik.lee, Sandeepa Prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, Fu Wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi,
	linux-efi, devel, Suzuki.Poulose, Punit Agrawal, astone, harba,
	Hanjun Guo, John Garry, Shiju Jose, joe, Xiongfeng Wang

On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:

[...]

> 
> 
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index bb02909..1d2e2e7 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
> >>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
> >>>  /* Available with KVM_CAP_X86_SMM */
> >>>  #define KVM_SMI                   _IO(KVMIO,   0xb7)
> >>> +#define KVM_ARM_SEA               _IO(KVMIO,   0xb8)
> >>>
> >>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> >>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
> >>>
> >>
> >> Why do we need a userspace API for SEA? It can also be done by using
> >> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
> >> way is you can choose which ESR value to use.
> >>
> >> Adding a new API call to do something you could do with an old one doesn't look
> >> right.
> > 
> > James, I considered your suggestion before that use the
> > KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
> > not have difference to use the alread existed KVM API. 
> 
> (Only that is an in-kernel helper, not a published API)
> 
> 
> > so may be
> > changing the vcpu registers in qemu will duplicate with the KVM APIs.
> 
> That is true, but the alternative is a new API that doesn't do anything new, its
> just more convenient.
> 
> Marc and Christoffer are the people to convince.
> I argue the existing API is sufficient.
> 

I must admit I am losing track of exactly what this proposed API was
supposed to do.

However, if it's a question about setting up VCPU registers to a certain
state and potentially modifying memory, then I think experience has
shown us (psci) that emulating something in the kernel that userspace
can have fine-grained control over is a bad idea, and should be left to
userspace using as generic APIs as possible.

Furthermore, if I understand what injecting a SEA requires, it is very
similar to resetting the CPU and loading data into guest memory, which
QEMU already does today, and there is no reason to introduce additional
APIs if it can be done using KVM_GET/SET_ONE_REG ioctls.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-08 17:54   ` Christoffer Dall
@ 2017-05-09 14:28     ` James Morse
  2017-05-10  9:15       ` gengdongjiu
  0 siblings, 1 reply; 13+ messages in thread
From: James Morse @ 2017-05-09 14:28 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: gengdongjiu, Tyler Baicar, Christoffer Dall, Marc Zyngier,
	pbonzini, rkrcmar, linux, catalin.marinas, will.deacon, rjw,
	Len Brown, matt, robert.moore, lv.zheng, nkaje, zjzhang,
	mark.rutland, akpm, eun.taik.lee, Sandeepa Prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, Fu Wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi,
	linux-efi, devel, Suzuki.Poulose, Punit Agrawal, astone, harba,
	Hanjun Guo, John Garry, Shiju Jose, joe, Xiongfeng Wang

Hi Christoffer,

On 08/05/17 18:54, Christoffer Dall wrote:
> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:
> I must admit I am losing track of exactly what this proposed API was
> supposed to do.

There are two, and we keep jumping between them!
This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'.

SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these
today using the KVM_GET/SET_ONE_REG API whenever it wants to.

SEI uses SError, is asynchronous and can be masked. In addition these need to be
consumed/synchronised by the ESB instruction, even when executed by a guest.
Hardware has the necessary bits to drive all this, we need to expose an API to
drive it.

(I try to spell them out each time so I don't confuse SEI with something
synchronous!)


This patch was about SEA. I think you've answered our question:

> However, if it's a question about setting up VCPU registers to a certain
> state and potentially modifying memory, then I think experience has
> shown us (psci) that emulating something in the kernel that userspace
> can have fine-grained control over is a bad idea, and should be left to
> userspace using as generic APIs as possible.
> 
> Furthermore, if I understand what injecting a SEA requires, it is very
> similar to resetting the CPU and loading data into guest memory, which
> QEMU already does today, and there is no reason to introduce additional
> APIs if it can be done using KVM_GET/SET_ONE_REG ioctls.


Thanks,

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-08 17:28 ` James Morse
  2017-05-08 17:54   ` Christoffer Dall
@ 2017-05-10  8:44   ` gengdongjiu
  2017-05-12 17:25     ` James Morse
  1 sibling, 1 reply; 13+ messages in thread
From: gengdongjiu @ 2017-05-10  8:44 UTC (permalink / raw)
  To: James Morse, gengdongjiu
  Cc: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, Xiongfeng Wang

Hi James,
  thanks a lot for your answer.

On 2017/5/9 1:28, James Morse wrote:
> Hi gengdongjiu,
> 
> On 04/05/17 17:52, gengdongjiu wrote:
>> 2017-05-04 23:42 GMT+08:00 gengdongjiu <gengdj.1984@gmail.com>:
>>> On 30/04/17 06:37, Dongjiu Geng wrote:
>>>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>>>> index 105b6ab..a96594f 100644
>>>> --- a/arch/arm/kvm/mmu.c
>>>> +++ b/arch/arm/kvm/mmu.c
>>>> +static void kvm_handle_bad_page(unsigned long address,
>>>> + bool hugetlb, bool hwpoison)
>>>> +{
>>>> + /* handle both hwpoison and other synchronous external Abort */
>>>> + if (hwpoison)
>>>> + kvm_send_signal(address, hugetlb, true);
>>>> + else
>>>> + kvm_send_signal(address, hugetlb, false);
>>>> +}
>>>
>>> Why the extra level of indirection? We only want to signal userspace like this
>>> from KVM for hwpoison. Signals for RAS related reasons should come from the bits
>>> of the kernel that decoded the error.
>>
>> For the SEA, the are maily two types:
>> 0b010000 Synchronous External Abort on memory access.
>> 0b0101xx Synchronous External Abort on page table walk. DFSC[1:0]
>> encode the level.
> 
> (KVM shouldn't have to make decisions about this)
> 
> 
>> hwpoison should belong to the  "Synchronous External Abort on memory access"
>> if the SEA type is not hwpoison, such as page table walk, do you mean
>> KVM do not deliver the SIGBUS?
> 
> 
> The flow of events should be SEI/SEA from firmware to the hosts's APEI code. KVM
> should only be involved to get us back to the host if we were running a guest.
> The APEI/hwpoison code may cause a set of processes to be sent signals. The code
> in mm/memory-failure.c does this by walking the process rmaps using the physical
> addresses in the CPER records.
> 
> We want user space to be sent signals as this can (and should) work in exactly
> the same way on arm64 as it does on x86 or any other architecture. If a
> web-browser can handle SIGBUS notifications for memory-corruption, it shouldn't
> have to care what architecture it is running on.
 Ok, James, understand.

> 
> So what is that KVM+SIGBUS patch about?...
> 
>>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
>>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
> 
> KVM creates guests as if they were additional users of Qemu's memory. The code
> in mm/memory-failure.c may find that Qemu didn't have the affected page mapped
> to user-space - but it may have been in use by stage2.
> 
> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when the
> guest touches the hwpoison page as if Qemu had touched the page itself.
> 
> Signals from KVM is a corner case, for firmware-first decisions should happen in
> the APEI code based on CPER records.
> 
> 
>> If so, how the KVM handle the SEA type other than hwpoison?
> 
> To deliver to a guest? It shouldn't have to know, user space should use a KVM
> API to drive this.
> 
> When received from hardware? It shouldn't have to care, these things should be
> passed into the APEI code for handling. KVM just needs to put the host registers
> back.
Recently I confirmed with the hardware team. they said almost all the SEA errors have the
Poison flag, so may be there is no need to consider other SEA errors other than  hwPoison.
only consider SEA hwpoison errors can be enough.

> 
> 
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index bb02909..1d2e2e7 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -1306,6 +1306,7 @@ struct kvm_s390_ucas_mapping {
>>>>  #define KVM_S390_GET_IRQ_STATE  _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>>>>  /* Available with KVM_CAP_X86_SMM */
>>>>  #define KVM_SMI                   _IO(KVMIO,   0xb7)
>>>> +#define KVM_ARM_SEA               _IO(KVMIO,   0xb8)
>>>>
>>>>  #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>>>>  #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
>>>>
>>>
>>> Why do we need a userspace API for SEA? It can also be done by using
>>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. The advantage of doing it this
>>> way is you can choose which ESR value to use.
>>>
>>> Adding a new API call to do something you could do with an old one doesn't look
>>> right.
>>
>> James, I considered your suggestion before that use the
>> KVM_{G,S}ET_ONE_REG to change the vcpu registers. but I found it does
>> not have difference to use the alread existed KVM API. 
> 
> (Only that is an in-kernel helper, not a published API)

yes, the kvm_inject_dabt is an in-kernel API.

> 
> 
>> so may be
>> changing the vcpu registers in qemu will duplicate with the KVM APIs.
> 
> That is true, but the alternative is a new API that doesn't do anything new, its
> just more convenient.
> 
> Marc and Christoffer are the people to convince.
> I argue the existing API is sufficient.
> 
> 
>> injection a SEA is no more than setting some registers: elr_el1, PC,
>> PSTATE, SPSR_el1, far_el1, esr_el1
>> I seen this KVM API do the same thing as Qemu.  do you found call this
>> API will have issue and necessary to choose another ESR value?
> 
> Should we let user-space pick the ESR to deliver to the guest? Yes, letting
> user-space specify the ESR gives the most flexibility to do something clever in
> the future. An obvious choice for SEA is between the external-abort and 'parity
> or ECC error' codes. If we tell user-space which of these happened (I don't
> think Linux does today) then Qemu can relay that information to the guest.

may be the ESR is delivered by the KVM.
(1) guest OS EL0 happen SEA due to hwpoison
(2) CPU traps to EL3 firmware, and update the ESR_EL3
(3) the EL3 firmware copies the ESR_EL3 to ESR_EL2
(4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the SEA.

May be the esr_el2 can provide the accurate error information.
or do you think user-space specify the ESR instead of esr_el2 is better?


> 
> 
> 
> Thanks,
> 
> James
> 
> .
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-09 14:28     ` James Morse
@ 2017-05-10  9:15       ` gengdongjiu
  2017-05-10 12:20         ` Christoffer Dall
  0 siblings, 1 reply; 13+ messages in thread
From: gengdongjiu @ 2017-05-10  9:15 UTC (permalink / raw)
  To: James Morse, Christoffer Dall
  Cc: gengdongjiu, Tyler Baicar, Christoffer Dall, Marc Zyngier,
	pbonzini, rkrcmar, linux, catalin.marinas, will.deacon, rjw,
	Len Brown, matt, robert.moore, lv.zheng, nkaje, zjzhang,
	mark.rutland, akpm, eun.taik.lee, Sandeepa Prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, Fu Wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi,
	linux-efi, devel, Suzuki.Poulose, Punit Agrawal, astone, harba,
	Hanjun Guo, John Garry, Shiju Jose, joe, Xiongfeng Wang

Thanks James's explanation.

Hi Christoffer,

On 2017/5/9 22:28, James Morse wrote:
> Hi Christoffer,
> 
> On 08/05/17 18:54, Christoffer Dall wrote:
>> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:
>> I must admit I am losing track of exactly what this proposed API was
>> supposed to do.
> 
> There are two, and we keep jumping between them!
> This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'.
> 
> SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these
> today using the KVM_GET/SET_ONE_REG API whenever it wants to.
> 
> SEI uses SError, is asynchronous and can be masked. In addition these need to be
> consumed/synchronised by the ESB instruction, even when executed by a guest.
> Hardware has the necessary bits to drive all this, we need to expose an API to
> drive it.
> 
> (I try to spell them out each time so I don't confuse SEI with something
> synchronous!)
> 
> 
> This patch was about SEA. I think you've answered our question:

we are talking about the SEA(synchronous data abort) injection two methods:

(1)change vcpu registers in the Qemu/kvmtools and using the KVM_GET/SET_ONE_REG API to set.
(2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL command from Qemu.


> 
>> However, if it's a question about setting up VCPU registers to a certain
>> state and potentially modifying memory, then I think experience has
>> shown us (psci) that emulating something in the kernel that userspace
>> can have fine-grained control over is a bad idea, and should be left to
>> userspace using as generic APIs as possible.
>>
>> Furthermore, if I understand what injecting a SEA requires, it is very
>> similar to resetting the CPU and loading data into guest memory, which
>> QEMU already does today, and there is no reason to introduce additional
>> APIs if it can be done using KVM_GET/SET_ONE_REG ioctls.
> 
> 
> Thanks,
> 
> James
> 
> .
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-10  9:15       ` gengdongjiu
@ 2017-05-10 12:20         ` Christoffer Dall
  2017-05-10 12:37           ` gengdongjiu
  0 siblings, 1 reply; 13+ messages in thread
From: Christoffer Dall @ 2017-05-10 12:20 UTC (permalink / raw)
  To: gengdongjiu
  Cc: James Morse, gengdongjiu, Tyler Baicar, Christoffer Dall,
	Marc Zyngier, pbonzini, rkrcmar, linux, catalin.marinas,
	will.deacon, rjw, Len Brown, matt, robert.moore, lv.zheng, nkaje,
	zjzhang, mark.rutland, akpm, eun.taik.lee, Sandeepa Prabhu,
	labbott, shijie.huang, rruigrok, paul.gortmaker, tn, Fu Wei,
	rostedt, bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel,
	linux-acpi, linux-efi, devel, Suzuki.Poulose, Punit Agrawal,
	astone, harba, Hanjun Guo, John Garry, Shiju Jose, joe,
	Xiongfeng Wang

On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote:
> Thanks James's explanation.
> 
> Hi Christoffer,
> 
> On 2017/5/9 22:28, James Morse wrote:
> > Hi Christoffer,
> > 
> > On 08/05/17 18:54, Christoffer Dall wrote:
> >> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:
> >> I must admit I am losing track of exactly what this proposed API was
> >> supposed to do.
> > 
> > There are two, and we keep jumping between them!
> > This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'.
> > 
> > SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these
> > today using the KVM_GET/SET_ONE_REG API whenever it wants to.
> > 
> > SEI uses SError, is asynchronous and can be masked. In addition these need to be
> > consumed/synchronised by the ESB instruction, even when executed by a guest.
> > Hardware has the necessary bits to drive all this, we need to expose an API to
> > drive it.
> > 
> > (I try to spell them out each time so I don't confuse SEI with something
> > synchronous!)
> > 
> > 
> > This patch was about SEA. I think you've answered our question:
> 
> we are talking about the SEA(synchronous data abort) injection two methods:
> 
> (1)change vcpu registers in the Qemu/kvmtools and using the KVM_GET/SET_ONE_REG API to set.

Yes, if this is possible, why would you want something more?

> (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL command from Qemu.
> 

I'm not really going to consider this, because "use internal API from
userspace" doesn't work.

So this should be:

  (2) Introduce a new API to do X.

I still think you know what my preference is; use the existing API if at
all possible.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-10 12:20         ` Christoffer Dall
@ 2017-05-10 12:37           ` gengdongjiu
  0 siblings, 0 replies; 13+ messages in thread
From: gengdongjiu @ 2017-05-10 12:37 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: James Morse, gengdongjiu, Tyler Baicar, Christoffer Dall,
	Marc Zyngier, pbonzini, rkrcmar, linux, catalin.marinas,
	will.deacon, rjw, Len Brown, matt, robert.moore, lv.zheng, nkaje,
	zjzhang, mark.rutland, akpm, eun.taik.lee, Sandeepa Prabhu,
	labbott, shijie.huang, rruigrok, paul.gortmaker, tn, Fu Wei,
	rostedt, bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel,
	linux-acpi, linux-efi, devel, Suzuki.Poulose, Punit Agrawal,
	astone, harba, Hanjun Guo, John Garry, Shiju Jose, joe,
	Xiongfeng Wang

Hi Christoffer,

On 2017/5/10 20:20, Christoffer Dall wrote:
> On Wed, May 10, 2017 at 05:15:04PM +0800, gengdongjiu wrote:
>> Thanks James's explanation.
>>
>> Hi Christoffer,
>>
>> On 2017/5/9 22:28, James Morse wrote:
>>> Hi Christoffer,
>>>
>>> On 08/05/17 18:54, Christoffer Dall wrote:
>>>> On Mon, May 08, 2017 at 06:28:02PM +0100, James Morse wrote:
>>>> I must admit I am losing track of exactly what this proposed API was
>>>> supposed to do.
>>>
>>> There are two, and we keep jumping between them!
>>> This is about two notification methods APEI has for arm64, 'SEA' and 'SEI'.
>>>
>>> SEA is synchronous and looks like a data abort. Qemu/kvmtool can inject these
>>> today using the KVM_GET/SET_ONE_REG API whenever it wants to.
>>>
>>> SEI uses SError, is asynchronous and can be masked. In addition these need to be
>>> consumed/synchronised by the ESB instruction, even when executed by a guest.
>>> Hardware has the necessary bits to drive all this, we need to expose an API to
>>> drive it.
>>>
>>> (I try to spell them out each time so I don't confuse SEI with something
>>> synchronous!)
>>>
>>>
>>> This patch was about SEA. I think you've answered our question:
>>
>> we are talking about the SEA(synchronous data abort) injection two methods:
>>
>> (1)change vcpu registers in the Qemu/kvmtools and using the KVM_GET/SET_ONE_REG API to set.
> 
> Yes, if this is possible, why would you want something more?
  we will use this method.

> 
>> (2)using existed in-kernel API "kvm_inject_dabt" to inject through IOCTL command from Qemu.
>>
> 
> I'm not really going to consider this, because "use internal API from
> userspace" doesn't work.
> 
> So this should be:
> 
>   (2) Introduce a new API to do X.
  you can ignore the second method, now we will not use it.

> 
> I still think you know what my preference is; use the existing API if at
> all possible.
> 
> Thanks,
> -Christoffer
> 
> .
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-05 12:31 ` gengdongjiu
@ 2017-05-12 17:24   ` James Morse
  2017-05-21  8:24     ` gengdongjiu
  0 siblings, 1 reply; 13+ messages in thread
From: James Morse @ 2017-05-12 17:24 UTC (permalink / raw)
  To: gengdongjiu
  Cc: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, Xiongfeng Wang

Hi gengdongjiu,

On 05/05/17 13:31, gengdongjiu wrote:
> when guest OS happen an SEA, My current solution is shown below:
> 
> (1) host EL3 firmware firstly handle the SEA error and generate the CPER record.
> (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3,
> far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2.

Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm sure
you exclude values from EL3 or the secure-world, we should never hand those to
the normal world.


> (3) then jump the EL2 hypervisor

> so the EL2 hypervisor uses the ESR that come from esr_el3,  here the
> ESR(esr_el3) value may be different with the exist KVM API's ESR.

The ESR may be different between EL3 and EL2. The ESR contains the severity of
the event, the CPU will choose this when it takes the SError to EL3. If it had
taken the SError to EL2, the CPU may have classified the error differently.

Firmware may need to generate a more severe ESR if it receives an error that
would be propagated by delivering SEI to a lower exception level, for example if
an EL2 system register is 'infected'.

This is the same for Qemu/kvmtool. A contained error at EL2 may be an
uncontained error if we hand it to guest EL1. Linux's RAS code will decide this
with its choice of signal to send, (and possibly which code to set).
Qemu/kvmtool need to choose an appropriate APEI notification, which may involve
generating a relevant ESR.

Also relevant is the problem we discussed earlier with trying to deliver fake
Physical-SError from software at EL3: If the SError is routed to EL2, and EL2
has PSTATE.A masked, EL3 has to wait and try again later. This is another case
where firmware may have to upgrade the classification of an error to uncontainable.


Thanks,

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-10  8:44   ` gengdongjiu
@ 2017-05-12 17:25     ` James Morse
  2017-05-21  9:23       ` gengdongjiu
  0 siblings, 1 reply; 13+ messages in thread
From: James Morse @ 2017-05-12 17:25 UTC (permalink / raw)
  To: gengdongjiu, gengdongjiu
  Cc: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, Xiongfeng Wang

Hi gengdongjiu,

On 10/05/17 09:44, gengdongjiu wrote:
> On 2017/5/9 1:28, James Morse wrote:
>>>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two users,
>>>> Qemu and KVM. This isn't the example of how user-space gets signalled.)
>>
>> KVM creates guests as if they were additional users of Qemu's memory. The code
>> in mm/memory-failure.c may find that Qemu didn't have the affected page mapped
>> to user-space - but it may have been in use by stage2.
>>
>> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal when the
>> guest touches the hwpoison page as if Qemu had touched the page itself.
>>
>> Signals from KVM is a corner case, for firmware-first decisions should happen in
>> the APEI code based on CPER records.

>>> If so, how the KVM handle the SEA type other than hwpoison?

>> To deliver to a guest? It shouldn't have to know, user space should use a KVM
>> API to drive this.
>>
>> When received from hardware? It shouldn't have to care, these things should be
>> passed into the APEI code for handling. KVM just needs to put the host registers
>> back.

> Recently I confirmed with the hardware team. they said almost all the SEA errors have the
> Poison flag, so may be there is no need to consider other SEA errors other than  hwPoison.
> only consider SEA hwpoison errors can be enough.

We should be careful here, by hwpoison I meant the Linux feature.
>From Documentation/vm/hwpoison.txt:
> Upcoming Intel CPUs have support for recovering from some memory errors
> (``MCA recovery''). This requires the OS to declare a page "poisoned",
> kill the processes associated with it and avoid using it in the future.

We were talking about KVM's reaction to 'the OS declaring a page poisoned'.
Lets try to call this one memory-failure, as that is its Kconfig name. (now I
understand why we've been confusing each other!)

Your hwpoison looks like something the CPU reports in the ERR<n>STATUS registers
(4.6.10 of DDI0587). This is something firmware should read, then describe to
the OS via CPER records. Depending on these CPER records linux may invoke its
memory-failure code.


>>> injection a SEA is no more than setting some registers: elr_el1, PC,
>>> PSTATE, SPSR_el1, far_el1, esr_el1
>>> I seen this KVM API do the same thing as Qemu.  do you found call this
>>> API will have issue and necessary to choose another ESR value?
>>
>> Should we let user-space pick the ESR to deliver to the guest? Yes, letting
>> user-space specify the ESR gives the most flexibility to do something clever in
>> the future. An obvious choice for SEA is between the external-abort and 'parity
>> or ECC error' codes. If we tell user-space which of these happened (I don't
>> think Linux does today) then Qemu can relay that information to the guest.

> may be the ESR is delivered by the KVM.
> (1) guest OS EL0 happen SEA due to hwpoison
> (2) CPU traps to EL3 firmware, and update the ESR_EL3
> (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2
> (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the SEA.
> 
> May be the esr_el2 can provide the accurate error information.
> or do you think user-space specify the ESR instead of esr_el2 is better?

I think the severity needs to be considered as the notification is handled by
each exception level. There are cases where it will need to be upgraded from
'contained' to 'uncontained'. (more discussion on another part of the thread).


Thanks,

James

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-12 17:24   ` James Morse
@ 2017-05-21  8:24     ` gengdongjiu
  0 siblings, 0 replies; 13+ messages in thread
From: gengdongjiu @ 2017-05-21  8:24 UTC (permalink / raw)
  To: James Morse
  Cc: Tyler Baicar, Christoffer Dall, Marc Zyngier, pbonzini, rkrcmar,
	linux, catalin.marinas, will.deacon, rjw, Len Brown, matt,
	robert.moore, lv.zheng, nkaje, zjzhang, mark.rutland, akpm,
	eun.taik.lee, Sandeepa Prabhu, labbott, shijie.huang, rruigrok,
	paul.gortmaker, tn, Fu Wei, rostedt, bristot, linux-arm-kernel,
	kvmarm, kvm, linux-kernel, linux-acpi, linux-efi, devel,
	Suzuki.Poulose, Punit Agrawal, astone, harba, Hanjun Guo,
	John Garry, Shiju Jose, joe, Xiongfeng Wang

Hi James,
    sorry for the late response due to recently verify and debug the
RAS solution.

2017-05-13 1:24 GMT+08:00, James Morse <james.morse@arm.com>:
> Hi gengdongjiu,
>
> On 05/05/17 13:31, gengdongjiu wrote:
>> when guest OS happen an SEA, My current solution is shown below:
>>
>> (1) host EL3 firmware firstly handle the SEA error and generate the CPER
>> record.
>> (2) EL3 firmware separately copy the esr_el3, elr_el3, SPSR_el3,
>> far_el3 to the esr_el2, elr_el2, SPSR_el2, far_el2.
>
> Copying {ELR,SPSR,FAR}_EL3 to the EL2 registers rings some alarm bells: I'm
> sure
> you exclude values from EL3 or the secure-world, we should never hand those
> to
> the normal world.

 it is sure that needs to  exclude the EL3 Error and secure-world.

>
>
>> (3) then jump the EL2 hypervisor
>
>> so the EL2 hypervisor uses the ESR that come from esr_el3,  here the
>> ESR(esr_el3) value may be different with the exist KVM API's ESR.
>
> The ESR may be different between EL3 and EL2. The ESR contains the severity
> of
> the event, the CPU will choose this when it takes the SError to EL3. If it
> had
> taken the SError to EL2, the CPU may have classified the error differently.
>
> Firmware may need to generate a more severe ESR if it receives an error
> that
> would be propagated by delivering SEI to a lower exception level, for
> example if
> an EL2 system register is 'infected'.
>
> This is the same for Qemu/kvmtool. A contained error at EL2 may be an
> uncontained error if we hand it to guest EL1. Linux's RAS code will decide
> this
> with its choice of signal to send, (and possibly which code to set).
> Qemu/kvmtool need to choose an appropriate APEI notification, which may
> involve
> generating a relevant ESR.
>
> Also relevant is the problem we discussed earlier with trying to deliver
> fake
> Physical-SError from software at EL3: If the SError is routed to EL2, and
> EL2
> has PSTATE.A masked, EL3 has to wait and try again later. This is another
> case
> where firmware may have to upgrade the classification of an error to
> uncontainable.
   it makes sense. thanks to James.

>
>
> Thanks,
>
> James
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error
  2017-05-12 17:25     ` James Morse
@ 2017-05-21  9:23       ` gengdongjiu
  0 siblings, 0 replies; 13+ messages in thread
From: gengdongjiu @ 2017-05-21  9:23 UTC (permalink / raw)
  To: James Morse
  Cc: gengdongjiu, Tyler Baicar, Christoffer Dall, Marc Zyngier,
	pbonzini, rkrcmar, linux, catalin.marinas, will.deacon, rjw,
	Len Brown, matt, robert.moore, lv.zheng, nkaje, zjzhang,
	mark.rutland, akpm, eun.taik.lee, Sandeepa Prabhu, labbott,
	shijie.huang, rruigrok, paul.gortmaker, tn, Fu Wei, rostedt,
	bristot, linux-arm-kernel, kvmarm, kvm, linux-kernel, linux-acpi,
	linux-efi, devel, Suzuki.Poulose, Punit Agrawal, astone, harba,
	Hanjun Guo, John Garry, Shiju Jose, joe, Xiongfeng Wang

2017-05-13 1:25 GMT+08:00, James Morse <james.morse@arm.com>:
> Hi gengdongjiu,
>
> On 10/05/17 09:44, gengdongjiu wrote:
>> On 2017/5/9 1:28, James Morse wrote:
>>>>> (hwpoison for KVM is a corner case as Qemu's memory effectively has two
>>>>> users,
>>>>> Qemu and KVM. This isn't the example of how user-space gets
>>>>> signalled.)
>>>
>>> KVM creates guests as if they were additional users of Qemu's memory. The
>>> code
>>> in mm/memory-failure.c may find that Qemu didn't have the affected page
>>> mapped
>>> to user-space - but it may have been in use by stage2.
>>>
>>> The KVM+SIGBUS patch hides this difference, meaning Qemu gets a signal
>>> when the
>>> guest touches the hwpoison page as if Qemu had touched the page itself.
>>>
>>> Signals from KVM is a corner case, for firmware-first decisions should
>>> happen in
>>> the APEI code based on CPER records.
>
>>>> If so, how the KVM handle the SEA type other than hwpoison?
>
>>> To deliver to a guest? It shouldn't have to know, user space should use a
>>> KVM
>>> API to drive this.
>>>
>>> When received from hardware? It shouldn't have to care, these things
>>> should be
>>> passed into the APEI code for handling. KVM just needs to put the host
>>> registers
>>> back.
>
>> Recently I confirmed with the hardware team. they said almost all the SEA
>> errors have the
>> Poison flag, so may be there is no need to consider other SEA errors other
>> than  hwPoison.
>> only consider SEA hwpoison errors can be enough.
>
> We should be careful here, by hwpoison I meant the Linux feature.
> From Documentation/vm/hwpoison.txt:
>> Upcoming Intel CPUs have support for recovering from some memory errors
>> (``MCA recovery''). This requires the OS to declare a page "poisoned",
>> kill the processes associated with it and avoid using it in the future.
>
> We were talking about KVM's reaction to 'the OS declaring a page poisoned'.
> Lets try to call this one memory-failure, as that is its Kconfig name. (now
> I
> understand why we've been confusing each other!)
>
> Your hwpoison looks like something the CPU reports in the ERR<n>STATUS
> registers
> (4.6.10 of DDI0587). This is something firmware should read, then describe
> to
> the OS via CPER records. Depending on these CPER records linux may invoke
> its
> memory-failure code.
  yes

>
>
>>>> injection a SEA is no more than setting some registers: elr_el1, PC,
>>>> PSTATE, SPSR_el1, far_el1, esr_el1
>>>> I seen this KVM API do the same thing as Qemu.  do you found call this
>>>> API will have issue and necessary to choose another ESR value?
>>>
>>> Should we let user-space pick the ESR to deliver to the guest? Yes,
>>> letting
>>> user-space specify the ESR gives the most flexibility to do something
>>> clever in
>>> the future. An obvious choice for SEA is between the external-abort and
>>> 'parity
>>> or ECC error' codes. If we tell user-space which of these happened (I
>>> don't
>>> think Linux does today) then Qemu can relay that information to the
>>> guest.
>
>> may be the ESR is delivered by the KVM.
>> (1) guest OS EL0 happen SEA due to hwpoison
>> (2) CPU traps to EL3 firmware, and update the ESR_EL3
>> (3) the EL3 firmware copies the ESR_EL3 to ESR_EL2
>> (4) then jump to EL2 hypervisor, hypervisor uses the ESR_EL2 to inject the
>> SEA.
>>
>> May be the esr_el2 can provide the accurate error information.
>> or do you think user-space specify the ESR instead of esr_el2 is better?
>
> I think the severity needs to be considered as the notification is handled
> by
> each exception level. There are cases where it will need to be upgraded
> from
> 'contained' to 'uncontained'. (more discussion on another part of the
> thread).
  understand it.

>
>
> Thanks,
>
> James
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-05-21  9:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04 16:52 [PATCH v3 3/3] arm/arm64: signal SIBGUS and inject SEA Error gengdongjiu
2017-05-05 12:31 ` gengdongjiu
2017-05-12 17:24   ` James Morse
2017-05-21  8:24     ` gengdongjiu
2017-05-08 17:28 ` James Morse
2017-05-08 17:54   ` Christoffer Dall
2017-05-09 14:28     ` James Morse
2017-05-10  9:15       ` gengdongjiu
2017-05-10 12:20         ` Christoffer Dall
2017-05-10 12:37           ` gengdongjiu
2017-05-10  8:44   ` gengdongjiu
2017-05-12 17:25     ` James Morse
2017-05-21  9:23       ` gengdongjiu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).