From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [PATCH -v4] QEMU-KVM: MCE: Relay UCR MCE to guest
Date: Tue, 22 Sep 2009 09:14:32 +0300
Message-ID: <4AB86B48.8060604@redhat.com>
References: <1253501005.15717.548.camel@yhuang-dev.sh.intel.com> <4AB750A6.1090000@redhat.com> <1253581974.15717.726.camel@yhuang-dev.sh.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti , Andi Kleen , Anthony Liguori , "kvm@vger.kernel.org"
To: Huang Ying
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:8655 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751474AbZIVGOl (ORCPT ); Tue, 22 Sep 2009 02:14:41 -0400
In-Reply-To: <1253581974.15717.726.camel@yhuang-dev.sh.intel.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 09/22/2009 04:12 AM, Huang Ying wrote:
> On Mon, 2009-09-21 at 18:08 +0800, Avi Kivity wrote:
>
>> On 09/21/2009 05:43 AM, Huang Ying wrote:
>>
>>> UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
>>> where some hardware error such as some memory error can be reported
>>> without PCC (processor context corrupted). To recover from such MCE,
>>> the corresponding memory will be unmapped, and all processes accessing
>>> the memory will be killed via SIGBUS.
>>>
>>> For KVM, if QEMU/KVM is killed, all guest processes will be killed
>>> too. So we relay SIGBUS from host OS to guest system via a UCR MCE
>>> injection. Then guest OS can isolate corresponding memory and kill
>>> necessary guest processes only. SIGBUS sent to main thread (not VCPU
>>> threads) will be broadcast to all VCPU threads as UCR MCE.
>>>
>>>
>>>
>>> --- a/qemu-kvm.c
>>> +++ b/qemu-kvm.c
>>> @@ -27,10 +27,23 @@
>>>  #include
>>>  #include
>>>  #include
>>> +#include
>>>
>>>
>> This causes a build failure, since not all hosts have,
>> but more importantly:
>>
> Maybe we can just add necessary fields to struct qemu_signalfd_siginfo.
> But this may be not portable.
>

That is what I did.

>>> +
>>> +static void sigbus_handler(int n, struct signalfd_siginfo *siginfo, void *ctx)
>>> +{
>>>
>>>
>> Here you accept signalfd_siginfo, while
>>
>>
>>> +
>>> +    memset(&action, 0, sizeof(action));
>>> +    action.sa_flags = SA_SIGINFO;
>>> +    action.sa_sigaction = (void (*)(int, siginfo_t*, void*))sigbus_handler;
>>> +    sigaction(SIGBUS,&action, NULL);
>>> +    prctl(PR_MCE_KILL, 1, 1);
>>>      return 0;
>>>
>>>
>> here you arm the function with something that will send it a siginfo_t.
>> So it looks like this is broken if a signal is ever received directly?
>> But can this happen due to signalfd?
>>
> Because SIGBUS is blocked, I think the signal handler will not be called
> directly, but from sigfd_handler.
>

Yes, I think so too.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.