KVM Archive on lore.kernel.org
 help / color / Atom feed
From: gengdongjiu <gengdongjiu@huawei.com>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: Fam Zheng <fam@euphon.net>, Eduardo Habkost <ehabkost@redhat.com>,
	kvm-devel <kvm@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Linuxarm <linuxarm@huawei.com>,
	Shannon Zhao <shannon.zhaosl@gmail.com>,
	Zheng Xiang <zhengxiang9@huawei.com>,
	qemu-arm <qemu-arm@nongnu.org>, James Morse <james.morse@arm.com>,
	"xuwei (O)" <xuwei5@huawei.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM
Date: Wed, 22 Jan 2020 23:30:13 +0800
Message-ID: <a6030ba3-c82b-af4c-950c-4e15c315ad41@huawei.com> (raw)
In-Reply-To: <CAFEAcA_Qs3p=iEU+D5iqjyZYpPQO0D16AWvjp0wcvbvRNdGAGg@mail.gmail.com>

On 2020/1/20 20:15, Peter Maydell wrote:
> On Fri, 17 Jan 2020 at 10:05, gengdongjiu <gengdongjiu@huawei.com> wrote:
>>
>> On 2020/1/17 0:28, Peter Maydell wrote:
>>> On Wed, 8 Jan 2020 at 11:33, Dongjiu Geng <gengdongjiu@huawei.com> wrote:
>>>
>>>> +void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>> +{
>>>> +    ram_addr_t ram_addr;
>>>> +    hwaddr paddr;
>>>> +
>>>> +    assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>>>> +
>>>> +    if (acpi_enabled && addr &&
>>>> +            object_property_get_bool(qdev_get_machine(), "ras", NULL)) {
>>>> +        ram_addr = qemu_ram_addr_from_host(addr);
>>>> +        if (ram_addr != RAM_ADDR_INVALID &&
>>>> +            kvm_physical_memory_addr_from_host(c->kvm_state, addr, &paddr)) {
>>>> +            kvm_hwpoison_page_add(ram_addr);
>>>> +            /*
>>>> +             * Asynchronous signal will be masked by main thread, so
>>>> +             * only handle synchronous signal.
>>>> +             */
>>>
>>> I don't understand this comment. (I think we've had discussions
>>> about it before, but it's still not clear to me.)
>>>
>>> This function (kvm_arch_on_sigbus_vcpu()) will be called in two contexts:
>>>
>>> (1) in the vcpu thread:
>>>   * the real SIGBUS handler sigbus_handler() sets a flag and arranges
>>>     for an immediate vcpu exit
>>>   * the vcpu thread reads the flag on exit from KVM_RUN and
>>>     calls kvm_arch_on_sigbus_vcpu() directly
>>>   * the error could be MCEERR_AR or MCEERR_AOFor the vcpu thread, the error can be MCEERR_AR or MCEERR_AO,
>> but kernel/KVM usually uses MCEERR_AR(action required) instead of MCEERR_AO, because it needs do action immediately. For MCEERR_AO error, the action is optional and the error can be ignored.
>> At least I do not find Linux kernel/KVM deliver MCEERR_AO in the vcpu threads.
>>
>>> (2) MCE errors on other threads:
>>>   * here SIGBUS is blocked, so MCEERR_AR (action-required)
>>>     errors will cause the kernel to just kill the QEMU process
>>>   * MCEERR_AO errors will be handled via the iothread's use
>>>     of signalfd(), so kvm_on_sigbus() will get called from
>>>     the main thread, and it will call kvm_arch_on_sigbus_vcpu()
>>>   * in this case the passed in CPUState will (arbitrarily) be that
>>>     for the first vCPU
>>
>> For the MCE errors on other threads, it can only handle MCEERR_AO. If it is MCEERR_AR, the QEMU will assert and exit[2].
>>
>> Case1: Other APP indeed can send MCEERR_AO to QEMU, QEMU handle it via the iothread's use of signalfd() through above path.
>> Case2: But if the MCEERR_AO is delivered by kernel, I see QEMU ignore it because SIGBUS is masked in main thread[3], for this case, I do not see QEMU handle it via signalfd() for MCEERR_AO errors from my test.
> 
> SIGBUS is blocked in the main thread because we use signalfd().
> The function sigfd_handler() should be called and it will then
> manually invoke the correct function for the signal.
> 
>> For Case1,I think we should not let guest know it, because it is not triggered by guest. only other APP send SIGBUS to tell QEMU do somethings.
> 
> I don't understand what you mean here by "other app" or
> "guest" triggering of MCEERR. I thought that an MCEERR meant
> "the hardware has detected that there is a problem with the
> RAM". If there's a problem with the RAM and it's the RAM that's
> being used as guest RAM, we need to tell the guest, surely ?

  sure, If the error is guest RAM, we need to test the guest.
  I mean if the RAM that is being used as QEMU RAM(not guest RAM), we should not tell the guest.
  OR if another user space manually send SIGBUS to qemu, such as using "kill -s SIGBUS xxx" commands, we should not tell the guest.

> 
>> For Case2,it does not call call kvm_arch_on_sigbus_vcpu().
> 
> It should do. The code you quote calls that function
> for that case:
  According to our analysis, I also think it should call the function for that case.
  But from my test, I see  kvm_arch_on_sigbus_vcpu() is not called when KVM/Kernel delivers SIGBUS to QEMU main thread.
  So I am also confused. I haven't even dig into the reason yet.

 If anyone has done the test or knows the reason, welcome comments.


> 
>> [1]:
>> /* Called synchronously (via signalfd) in main thread.  */
>> int kvm_on_sigbus(int code, void *addr)
>> {
>> #ifdef KVM_HAVE_MCE_INJECTION
>>     /* Action required MCE kills the process if SIGBUS is blocked.  Because
>>      * that's what happens in the I/O thread, where we handle MCE via signalfd,
>>      * we can only get action optional here.
>>      */
>> [2]: assert(code != BUS_MCEERR_AR);
>>     kvm_arch_on_sigbus_vcpu(first_cpu, code, addr);
>>     return 0;
>> #else
>>     return 1;
>> #endif
>> }
> 
> 
>> Above all, from my test, for MCEERR_AO error which is triggered by guest, it not call
> kvm_arch_on_sigbus_vcpu().
> 
> I'm not sure what you mean by "triggered by guest". I assume that
> exactly what kind of errors the kernel can report and when will
> depend to some extent on the underlying hardware/firmware
> implementation of reporting of memory errors, but in principle
> the ABI allows the kernel to send SIGBUS_(BUS_MCEERR_AO) to the
> main thread, the signal should be handled by signalfd, our code
> for working with multiple fds should mean that the main thread
> calls sigfd_handler() to deal with reading bytes from the signalfd
> fd, and that function should then call sigbus_handler(), which
> calls kvm_on_sigbus(), which calls kvm_arch_on_sigbus_vcpu().
> If something in that code path is not working then we need to
> find out what it is.

  I agree with you, we need to check why it does not call sigbus_handler() for the SIGBUS delivered by kernel/KVM.
  But I think it can  put it in another series, this series we only handle the SIGBUS_(BUS_MCEERR_AR), whether do you think it is OK?
  Of course I will update the comments that you ever mentioned.

  By the way, If using "kill -s SIGBUS xxx" command to send SIGBUS to QEMU main thread, it indeed will be handled by signalfd.

> 
> thanks
> -- PMM
> .
> 


  reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-08 11:32 [PATCH v22 0/9] Add ARMv8 RAS virtualization support in QEMU Dongjiu Geng
2020-01-08 11:32 ` [PATCH v22 1/9] hw/arm/virt: Introduce a RAS machine option Dongjiu Geng
2020-01-08 11:32 ` [PATCH v22 2/9] docs: APEI GHES generation and CPER record description Dongjiu Geng
2020-01-15 15:11   ` Igor Mammedov
2020-01-08 11:32 ` [PATCH v22 3/9] ACPI: Build related register address fields via hardware error fw_cfg blob Dongjiu Geng
2020-01-23 15:14   ` Igor Mammedov
2020-02-02 14:01     ` gengdongjiu
2020-01-08 11:32 ` [PATCH v22 4/9] ACPI: Build Hardware Error Source Table Dongjiu Geng
2020-01-23 15:48   ` Igor Mammedov
2020-02-02 14:21     ` gengdongjiu
2020-02-05 16:43   ` Jonathan Cameron
2020-02-10 11:18     ` gengdongjiu
2020-01-08 11:32 ` [PATCH v22 5/9] ACPI: Record the Generic Error Status Block address Dongjiu Geng
2020-01-16 16:44   ` Peter Maydell
2020-01-17 10:36     ` gengdongjiu
2020-02-13 15:28     ` gengdongjiu
2020-01-17  7:39   ` Philippe Mathieu-Daudé
2020-01-17 10:47     ` gengdongjiu
2020-01-17 11:20       ` Philippe Mathieu-Daudé
2020-01-28 14:41   ` Igor Mammedov
2020-01-28 16:19     ` Igor Mammedov
2020-02-02 12:44     ` gengdongjiu
2020-02-03  7:51       ` Igor Mammedov
2020-01-08 11:32 ` [PATCH v22 6/9] KVM: Move hwpoison page related functions into kvm-all.c Dongjiu Geng
2020-01-08 11:32 ` [PATCH v22 7/9] ACPI: Record Generic Error Status Block(GESB) table Dongjiu Geng
2020-01-28 15:29   ` Igor Mammedov
2020-02-02 13:42     ` gengdongjiu
2020-02-03  7:55       ` Igor Mammedov
2020-01-08 11:32 ` [PATCH v22 8/9] target-arm: kvm64: handle SIGBUS signal from kernel or KVM Dongjiu Geng
2020-01-16 16:28   ` Peter Maydell
2020-01-16 16:40     ` Peter Maydell
2020-01-17 10:04     ` gengdongjiu
2020-01-20 12:15       ` Peter Maydell
2020-01-22 15:30         ` gengdongjiu [this message]
2020-01-08 11:32 ` [PATCH v22 9/9] MAINTAINERS: Add ACPI/HEST/GHES entries Dongjiu Geng
2020-01-16 16:46   ` Peter Maydell
2020-01-17  7:22     ` Philippe Mathieu-Daudé
2020-01-17 10:16       ` gengdongjiu
2020-01-17 11:09       ` Peter Maydell
2020-01-17 11:22         ` Philippe Mathieu-Daudé
2020-01-19  8:19           ` gengdongjiu
2020-01-17 12:19         ` Michael S. Tsirkin
2020-01-09  3:38 ` [PATCH v22 0/9] Add ARMv8 RAS virtualization support in QEMU gengdongjiu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a6030ba3-c82b-af4c-950c-4e15c315ad41@huawei.com \
    --to=gengdongjiu@huawei.com \
    --cc=ehabkost@redhat.com \
    --cc=fam@euphon.net \
    --cc=imammedo@redhat.com \
    --cc=james.morse@arm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=shannon.zhaosl@gmail.com \
    --cc=xuwei5@huawei.com \
    --cc=zhengxiang9@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git