From: Xie XiuQi <xiexiuqi@huawei.com>
To: James Morse <james.morse@arm.com>,
	Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: <gengdongjiu@huawei.com>, <punit.agrawal@arm.com>,
	<mark.rutland@arm.com>, <linux-efi@vger.kernel.org>,
	<kvm@vger.kernel.org>, <rkrcmar@redhat.com>,
	<matt@codeblueprint.co.uk>, <catalin.marinas@arm.com>,
	Tyler Baicar <tbaicar@codeaurora.org>, <will.deacon@arm.com>,
	<robert.moore@intel.com>, <paul.gortmaker@windriver.com>,
	<lv.zheng@intel.com>, <kvmarm@lists.cs.columbia.edu>,
	<fu.wei@linaro.org>, <tn@semihalf.com>, <zjzhang@codeaurora.org>,
	<linux@armlinux.org.uk>, <linux-acpi@vger.kernel.org>,
	<eun.taik.lee@samsung.com>, <shijie.huang@arm.com>,
	<labbott@redhat.com>, <lenb@kernel.org>, <harba@codeaurora.org>,
	<Suzuki.Poulose@arm.com>, <marc.zyngier@arm.com>,
	<john.garry@huawei.com>, <rostedt@goodmis.org>,
	<nkaje@codeaurora.org>, <sandeepa.s.prabhu@gmail.com>,
	<linux-arm-kernel@lists.infradead.org>, <devel@acpica.org>,
	<rjw@rjwysocki.net>, <rruigrok@codeaurora.org>,
	<linux-kernel@vger.kernel.org>, <astone@redhat.com>,
	<hanjun.guo@linaro.org>, <joe@perches.com>, <pbonzini@redhat.com>,
	<akpm@linux-foundation.org>, <bristot@redhat.com>,
	<christoffer.dall@linaro.org>, <shiju.jose@huawei.com>
Subject: Re: [PATCH V11 10/10] arm/arm64: KVM: add guest SEA support
Date: Wed, 22 Mar 2017 20:08:48 +0800
Message-ID: <b57e2cb1-129b-7369-96e0-33e5c87dbb74@huawei.com>
In-Reply-To: <58D25C8D.3070507@arm.com>

Hi James,

On 2017/3/22 19:14, James Morse wrote:
> Hi Wang Xiongfeng,
> 
> On 22/03/17 02:46, Xiongfeng Wang wrote:
>>> Guests are a special case as QEMU may never access the faulty memory itself, so
>>> it won't receive the 'late' signal. It looks like ARM/arm64 KVM lacks support
>>> for KVM_PFN_ERR_HWPOISON which sends SIGBUS from KVM's fault-handling code. I
>>> have patches to add support for this which I intend to send at rc1.
>>>
>>> [0] suggests 'KVM qemu' sets these MCE flags to take the 'early' path, but given
>>> x86's KVM_PFN_ERR_HWPOISON, this may be out of date.
>>>
>>>
>>> Either way, once QEMU gets a signal indicating the virtual address, it can
>>> generate its own APEI CPER records and use the KVM APIs to mock up a
>>> Synchronous External Abort (or inject an IRQ or run the vcpu waiting for the
>>> guest's polling thread to come round, whichever was described to the guest via
>>> the HEST/GHES tables).
>>>
>>
>> I have another point of confusion about the SIGBUS signal: can QEMU always get a SIGBUS when needed?
>> I know of one circumstance in which SIGBUS is sent. ghes_handle_memory_failure(), called from
>> ghes_do_proc(), will send SIGBUS to QEMU, but only when the GHES contains a memory section,
>> that is, when the section type is CPER_SEC_PLATFORM_MEM.
>> Suppose a load error in a guest application causes an SEA, and the firmware takes it.
>> The firmware begins to scan the error records and fill in the GHES, but the error record in the
>> memory node has already been read by another handler.
> 
> (this looks like a race)
> 
>> The firmware won't add a memory section to the GHES, so ghes_handle_memory_failure() won't be called.
> 
> I think this would be a firmware bug. Firmware can reserve as much memory as it
> needs for writing CPER records, there should not be a case where 'the memory' is
> currently being processed by another handler.

I have a question here.
Consider this case: the memory controller first detects a memory error,
but the error has not yet been consumed, so no SEA is generated. The memory
error may be reported to the OS by an IRQ, with a MEM section in the CPER
record. After a while, the erroneous data is loaded into the cache and
consumed, and only then is the SEA generated. Is it possible that the CPER
record for the SEA contains only a processor section and no MEM section?

Clearly there are two different GHES entries involved here, one for SEA and
another for IRQ/GSIV. Can we assume that the SEA GHES will contain a MEM section?
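
To make the scenario concrete, here is a toy Python model, not kernel code.
The names ErrorStatusBlock, SEC_MEM, SEC_PROC and would_send_sigbus are made
up for illustration. The only thing taken from the discussion is that the
SIGBUS path is reached only when the records carry a memory section
(CPER_SEC_PLATFORM_MEM). That the SEA records hold only a processor section
is an assumption, and is exactly the case being asked about.

```python
from dataclasses import dataclass, field

# Hypothetical section labels; the real kernel matches section-type GUIDs
# such as CPER_SEC_PLATFORM_MEM.
SEC_MEM = "platform-memory"
SEC_PROC = "arm-processor"


@dataclass
class ErrorStatusBlock:
    """Toy stand-in for the CPER records published through one GHES."""
    source: str                                  # e.g. "IRQ/GSIV" or "SEA"
    sections: list = field(default_factory=list)


def would_send_sigbus(block):
    """Model of the rule discussed above: only a memory section leads to
    the memory-failure handling that ends in SIGBUS."""
    return SEC_MEM in block.sections


# Step 1: the memory controller detects the error; firmware reports it by IRQ.
irq_block = ErrorStatusBlock("IRQ/GSIV", [SEC_MEM])
# Step 2: the data is consumed later; ASSUME firmware adds only a processor section.
sea_block = ErrorStatusBlock("SEA", [SEC_PROC])
```

In this model the IRQ-notified GHES would reach the SIGBUS path, while the
SEA-notified one would not unless firmware also includes a MEM section.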

> 
> The memory that firmware uses to write CPER records also shouldn't be published to the
> OS until it has finished. Once firmware has finished writing the CPER records it
> can update the memory pointed to by GHES->ErrorStatusAddress with the location
> of the CPER records and invoke the Notification method for this GHES. (SEI, SEA,
> IRQ etc). We should always get a complete set of CPER records to describe the error.
> 

Does this mean that the BIOS is responsible for ensuring that the GHES contains
complete error information, including memory, bus, TLB, cache and other related error info?
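
As I understand the ordering described in this thread, it can be sketched as
the toy Python model below. This is not the ACPI table layout or any real
firmware interface: the class Ghes, its fields, and os_handler are
hypothetical names, and the section labels are placeholders.

```python
class Ghes:
    """Toy model of one GHES entry (hypothetical, not the ACPI layout)."""

    def __init__(self):
        self.error_status_address = None  # what the OS is allowed to read
        self.busy = False                 # True until the OS acks (GHESv2-style)

    def firmware_report(self, sections, notify):
        # One entry can only describe one error at a time.
        if self.busy:
            raise RuntimeError("previous records not yet acknowledged")
        records = list(sections)             # 1. write the full set of records
        self.error_status_address = records  # 2. only then publish them
        self.busy = True
        notify(self)                         # 3. finally notify the OS (SEA, IRQ, ...)

    def os_ack(self):
        # The OS has finished with the records; firmware may reuse the memory.
        self.error_status_address = None
        self.busy = False


seen = []


def os_handler(ghes):
    # The OS sees every section at once because publishing came after writing.
    seen.append(tuple(ghes.error_status_address))
    ghes.os_ack()


ghes = Ghes()
ghes.firmware_report(["memory", "processor"], os_handler)
```

The point of the model is only the order of the three steps: if firmware
notifies before all sections are written, the OS could see a partial set.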

-- 
Thanks,
Xie XiuQi

> If firmware uses GHESv2 it can get an 'ack' write from APEI once it has finished
> processing the records. Once it gets this, firmware knows it can re-use the memory.
> 
> (Obviously each GHES entry can only process one error at a time. Firmware should
> either handle this, or have one entry for each Error Source that can happen
> independently)
> 
> 
>> I mean that we may not be able to rely on ghes_handle_memory_failure() to send SIGBUS to QEMU.
>> Should we add some other code to send SIGBUS in handle_guest_abort()? I don't know whether the
>> ARM/arm64 KVM_PFN_ERR_HWPOISON support you mentioned above will cover all the cases.
> 
> The SIGBUS routine is part of the kernel's recovery method for memory errors. It
> should cover all the errors reported with this CPER_SEC_PLATFORM_MEM.
> 
> Back to the race you describe. It shouldn't matter if one CPU processes an error
> for guest memory while a vcpu is running on another. This may happen if the
> error was detected by DRAM's background scrub.
> If we don't treat KVM/Qemu as anything special the memory_failure()->SIGBUS path
> will happen regardless of whether the fault interrupted the guest or not.
> 
> 
> There are other types of error such as PCIe, CPU, BUS error etc. If it's
> possible to recover from these we may need additional code in the kernel. This
> shouldn't necessarily treat KVM as a special case.
> 
> 
> Thanks,
> 
> James
> 
> 
> .
> 

