KVM Archive on lore.kernel.org
 help / color / Atom feed
From: gengdongjiu <gengdongjiu@huawei.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Shannon Zhao <shannon.zhaosl@gmail.com>,
	Fam Zheng <fam@euphon.net>, Richard Henderson <rth@twiddle.net>,
	Eduardo Habkost <ehabkost@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	"xuwei (O)" <xuwei5@huawei.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	James Morse <james.morse@arm.com>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	kvm-devel <kvm@vger.kernel.org>, qemu-arm <qemu-arm@nongnu.org>,
	Zheng Xiang <zhengxiang9@huawei.com>,
	Linuxarm <linuxarm@huawei.com>
Subject: Re: [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
Date: Fri, 17 Jan 2020 18:04:39 +0800
Message-ID: <c89db331-cb94-8e0b-edf8-25bfb64f826d@huawei.com> (raw)
In-Reply-To: <CAFEAcA_=PgkrWjwPxD89fCi85XPpcTHssXkSmE04Ctoj7AX0kA@mail.gmail.com>

On 2020/1/17 0:28, Peter Maydell wrote:
> On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
> 
>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> +{
>> +    ram_addr_t ram_addr;
>> +    hwaddr paddr;
>> +
>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>> +
>> +    if (acpi_enabled && addr &&
>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>> +        ram_addr = qemu_ram_addr_from_host(addr);
>> +        if (ram_addr != RAM_ADDR_INVALID &&
>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>> +            kvm_hwpoison_page_add(ram_addr);
>> +            /*
>> +             * Asynchronous signal will be masked by main thread, so
>> +             * only handle synchronous signal.
>> +             */
> 
> I don't understand this comment. (I think we've had discussions
> about it before, but it's still not clear to me.)
> 
> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
> 
> (1) in the vcpu thread:
>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>     for an immediate vcpu exit
>   * the vcpu thread reads the flag on exit from KVM_RUN and
>     calls kvm_arch_on_sigbus_vcpu() directly
>   * the error could be MCEERR_AR or MCEERR_AOFor the vcpu thread, the error can be MCEERR_AR or MCEERR_AO,
but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO, because it needs do action immediately. For MCEERR_AO error, the action is optional and the error can be ignored.
At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.

> (2) MCE errors on other threads:
>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>     errors will cause the kernel to just kill the QEMU process
>   * MCEERR_AO errors will be handled via the iothread's use
>     of signalfd(), so kvm_on_sigbus() will get called from
>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>   * in this case the passed in CPUState will (arbitrarily) be that
>     for the first vCPU

For the MCE errors on other threads, it can only handle MCEERR_AO. If it is MCEERR_AR, the QEMU will assert and exit[2].

Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the iothread's use of signalfd() through above path.
Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it because SIGBUS is masked in main thread[3], for this case, I do not see QEMU handle it via signalfd() for MCEERR_AO errors from my test.

For Case1,I think we should not let guest know it, because it is not triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.
For Case2,it does not call call kvm_arch_on_sigbus_vcpu().


[1]:
/* Called synchronously (via signalfd) in main thread.  */
int kvm_on_sigbus(int code, void *addr)
{
#ifdef KVM_HAVE_MCE_INJECTION
    /* Action required MCE kills the process if SIGBUS is blocked.  Because
     * that's what happens in the I/O thread, where we handle MCE via signalfd,
     * we can only get action optional here.
     */
[2]: assert(code != BUS_MCEERR_AR);
    kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
    return 0;
#else
    return 1;
#endif
}

[3]: https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03575.html


> 
> For MCEERR_AR, the code here looks correct -- we know this is
> only going to happen for the relevant vCPU so we can go ahead
> and do the "record it in the ACPI table and tell the guest" bit.
> 
> But shouldn't we be doing something with the MCEERR_AO too?

Above all, from my test, for MCEERR_AO error which is triggered by guest, it not call kvm_arch_on_sigbus_vcpu().
so I think currently we can just report error. If afterwards  MCEERR_AO error can call kvm_arch_on_sigbus_vcpu(), we can let the guest know about it.

> That of course will be trickier because we're not necessarily
> in the vcpu thread, but it would be nice to let the guest
> know about it.
> 
> One comment that would work with the current code would be:
> 
> /*
>  * If this is a BUS_MCEERR_AR, we know we have been called
>  * synchronously from the vCPU thread, so we can easily
>  * synchronize the state and inject an error.
>  *
>  * TODO: we currently don't tell the guest at all about BUS_MCEERR_AO.
>  * In that case we might either be being called synchronously from
>  * the vCPU thread, or a bit later from the main thread, so doing
At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
In the main thread, signalfd() is not called when it is BUS_MCEERR_AO which is triggered by guest.

>  * the injection of the error would be more complicated.
>  */
> 
> but I don't know if that's what you meant to say/implement...
we can implement MCEERR_AO logic when QEMU can receive MCEERR_AO error which is triggered by guest.

> 
>> +            if (code == BUS_MCEERR_AR) {
>> +                kvm_cpu_synchronize_state(c);
>> +                if (!acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
>> +                    kvm_inject_arm_sea(c);
>> +                } else {
>> +                    error_report("failed to record the error");
>> +                    abort();
>> +                }
>> +            }
>> +            return;
>> +        }
>> +        if (code == BUS_MCEERR_AO) {
>> +            error_report("Hardware memory error at addr %p for memory used by "
>> +                "QEMU itself instead of guest system!", addr);
>> +        }
>> +    }
>> +
>> +    if (code == BUS_MCEERR_AR) {
>> +        error_report("Hardware memory error!");
>> +        exit(1);
>> +    }
>> +}
>>
> 
> thanks
> -- PMM
> .
> 


  parent reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-08 11:32 [PATCH v22 0/9] Add ARMv8 RAS virtualization support in QEMU Dongjiu Geng
2020-01-08 11:32 ` [PATCH v22 1/9] hw/arm/virt: Introduce a RAS machine option Dongjiu Geng
2020-01-08 11:32 ` [PATCH v22 2/9] docs: APEI GHES generation and CPER record description Dongjiu Geng
2020-01-15 15:11   ` Igor Mammedov
2020-01-08 11:32 ` [PATCH v22 3/9] ACPI: Build related register address fields via hardware error fw_cfg blob Dongjiu Geng
2020-01-23 15:14   ` Igor Mammedov
2020-02-02 14:01     ` gengdongjiu
2020-01-08 11:32 ` [PATCH v22 4/9] ACPI: Build Hardware Error Source Table Dongjiu Geng
2020-01-23 15:48   ` Igor Mammedov
2020-02-02 14:21     ` gengdongjiu
2020-02-05 16:43   ` Jonathan Cameron
2020-02-10 11:18     ` gengdongjiu
2020-01-08 11:32 ` [PATCH v22 5/9] ACPI: Record the Generic Error Status Block address Dongjiu Geng
2020-01-16 16:44   ` Peter Maydell
2020-01-17 10:36     ` gengdongjiu
2020-02-13 15:28     ` gengdongjiu
2020-01-17  7:39   ` Philippe Mathieu-Daudé
2020-01-17 10:47     ` gengdongjiu
2020-01-17 11:20       ` Philippe Mathieu-Daudé
2020-01-28 14:41   ` Igor Mammedov
2020-01-28 16:19     ` Igor Mammedov
2020-02-02 12:44     ` gengdongjiu
2020-02-03  7:51       ` Igor Mammedov
2020-01-08 11:32 ` [PATCH v22 6/9] KVM: Move hwpoison page related functions into kvm-all.c Dongjiu Geng
2020-01-08 11:32 ` [PATCH v22 7/9] ACPI: Record Generic Error Status Block(GESB) table Dongjiu Geng
2020-01-28 15:29   ` Igor Mammedov
2020-02-02 13:42     ` gengdongjiu
2020-02-03  7:55       ` Igor Mammedov
2020-01-08 11:32 ` [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM Dongjiu Geng
2020-01-16 16:28   ` Peter Maydell
2020-01-16 16:40     ` Peter Maydell
2020-01-17 10:04     ` gengdongjiu [this message]
2020-01-20 12:15       ` Peter Maydell
2020-01-22 15:30         ` gengdongjiu
2020-01-08 11:32 ` [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries Dongjiu Geng
2020-01-16 16:46   ` Peter Maydell
2020-01-17  7:22     ` Philippe Mathieu-Daudé
2020-01-17 10:16       ` gengdongjiu
2020-01-17 11:09       ` Peter Maydell
2020-01-17 11:22         ` Philippe Mathieu-Daudé
2020-01-19  8:19           ` gengdongjiu
2020-01-17 12:19         ` Michael S. Tsirkin
2020-01-09  3:38 ` [PATCH v22 0/9] Add ARMv8 RAS virtualization support in QEMU gengdongjiu

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c89db331-cb94-8e0b-edf8-25bfb64f826d@huawei.com \
    --to=gengdongjiu@huawei.com \
    --cc=ehabkost@redhat.com \
    --cc=fam@euphon.net \
    --cc=imammedo@redhat.com \
    --cc=james.morse@arm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=shannon.zhaosl@gmail.com \
    --cc=xuwei5@huawei.com \
    --cc=zhengxiang9@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git