From: Thomas Huth
Message-ID: <5649B2EA.6040108@redhat.com>
In-Reply-To: <5649AAFE.5030002@linux.vnet.ibm.com>
Date: Mon, 16 Nov 2015 11:41:46 +0100
Subject: Re: [Qemu-devel] [PATCH 4/4] target-ppc: Handle NMI guest exit
To: Aravinda Prasad
Cc: benh@au1.ibm.com, aik@ozlabs.ru, agraf@suse.de, qemu-devel@nongnu.org,
 qemu-ppc@nongnu.org, paulus@samba.org, sam.bobroff@au1.ibm.com,
 david@gibson.dropbear.id.au

On 16/11/15 11:07, Aravinda Prasad wrote:
>
> On Monday 16 November 2015 01:22 PM, Thomas Huth wrote:
>> On 12/11/15 19:49, Aravinda Prasad wrote:
>>>
>>> On Thursday 12 November 2015 03:10 PM, Thomas Huth wrote:
>> ...
>>>> Also LoPAPR talks about 'subsequent processors report "fatal error
>>>> previously reported"', so maybe the other processors should report
>>>> that condition in this case?
>>>
>>> I feel the guest kernel is responsible for that. Or does that mean
>>> that QEMU should report the same error, which the first processor
>>> encountered, for the subsequent processors, too? In that case, what
>>> if the error encountered by the first processor was recovered?
>>
>> I simply referred to this text in LoPAPR:
>>
>>   Multiple processors of the same OS image may experience fatal
>>   events at, or about, the same time. The first processor to enter
>>   the machine check handling firmware reports the fatal error.
>>   Subsequent processors serialize waiting for the first processor
>>   to issue the ibm,nmi-interlock call. These subsequent processors
>>   report "fatal error previously reported".
>
> Yes, I asked this because I am not clear what "fatal error previously
> reported" means as described in PAPR.

Looking at "Table 137. RTAS Event Return Format (Fixed Part)" in
LoPAPR, there is an "ALREADY_REPORTED" severity - I assume this is
what is meant by the cited paragraph?

>> Is there code in the host kernel already that takes care of this (I
>> haven't checked)? If so, how does the host kernel know that the
>> event happened "at or about the same time", since you're checking
>> the mutex condition on the QEMU side?
>
> I don't think the host kernel takes care of this; it simply forwards
> such errors to QEMU via an NMI exit. I feel the time referred to by
> "at or about the same time" is the duration between the invocation
> of the registered machine check handler and the corresponding
> interlock call issued by the guest, which QEMU knows about and
> protects with a mutex.

I agree, that makes sense.
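Just to illustrate what I have in mind, the serialization on the QEMU
side could look roughly like the sketch below. This is only a sketch:
the helper names (deliver_machine_check, build_rtas_error_log,
nmi_interlock) and the severity constants are made up for illustration
and are not from the actual patch; only the QemuMutex/QemuCond
primitives are the real ones from include/qemu/thread.h.

#include "qemu/thread.h"

/* Initialized at machine setup with qemu_mutex_init()/qemu_cond_init() */
static QemuMutex fwnmi_lock;          /* protects the fields below */
static QemuCond fwnmi_interlock_cond;
static bool fwnmi_in_progress;        /* true until ibm,nmi-interlock */

/* Called when a vCPU takes a machine check that has to be forwarded
 * to the guest's registered FWNMI handler. */
static void deliver_machine_check(PowerPCCPU *cpu)
{
    bool already_reported;

    qemu_mutex_lock(&fwnmi_lock);
    already_reported = fwnmi_in_progress;
    while (fwnmi_in_progress) {
        /* LoPAPR: subsequent processors serialize waiting for the
         * first processor to issue the ibm,nmi-interlock call. */
        qemu_cond_wait(&fwnmi_interlock_cond, &fwnmi_lock);
    }
    fwnmi_in_progress = true;
    qemu_mutex_unlock(&fwnmi_lock);

    /* Late arrivals merely report that the fatal error has already
     * been reported by the first processor, then the vCPU is diverted
     * into the guest's registered handler as usual. */
    build_rtas_error_log(cpu, already_reported
                              ? RTAS_SEVERITY_ALREADY_REPORTED
                              : RTAS_SEVERITY_FATAL);
}

/* Called from the ibm,nmi-interlock RTAS call. */
static void nmi_interlock(void)
{
    qemu_mutex_lock(&fwnmi_lock);
    fwnmi_in_progress = false;
    qemu_cond_broadcast(&fwnmi_interlock_cond);
    qemu_mutex_unlock(&fwnmi_lock);
}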
>>>> And of course you've also got to check that the same CPU is not
>>>> getting multiple NMIs before the interlock function has been
>>>> called again.
>>>
>>> I think it is good to check that. However, shouldn't the guest
>>> enable ME until it calls the interlock function?
>>
>> First, the hypervisor should never trust the guest to do the right
>> things. Second, LoPAPR says "the OS permanently relinquishes to
>> firmware the Machine State Register's Machine Check Enable bit", and
>> Paul also said something similar in another mail to this thread, so
>> I think you really have to check this in QEMU instead.
>
> Hmm, ok. Since ME is always set when running in the guest (assuming
> the guest is not disabling it), we cannot check the ME bit to figure
> out whether the same CPU is getting UEs before the interlock is
> called. One way is to record the CPU ID upon such an error and, before
> invoking the registered machine check handler, check whether that CPU
> has a pending interlock call. Terminate the guest if there is a
> pending interlock call for that CPU, rather than causing the guest to
> trigger recursive machine check errors.

Do we have some kind of checkstop state emulation in QEMU (sorry, I
haven't checked yet)? If yes, it might be nicer to use that and set
the guest state to PANIC instead of exiting QEMU directly - i.e. to do
something similar to the guest_panicked() function in
target-s390x/kvm.c. That way the management layer (libvirt) can decide
on its own whether to terminate the guest, reboot it, or keep it in
the crashed state for further analysis.
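For illustration, a minimal sketch of what I mean, assuming the
qemu_system_guest_panicked() helper from sysemu/sysemu.h (which is
what guest_panicked() in target-s390x/kvm.c ends up calling); the
interlock_pending bookkeeping, handle_guest_machine_check() and
MAX_VCPUS are made-up names, not real code:

#include "sysemu/sysemu.h"   /* qemu_system_guest_panicked() */

/* One flag per vCPU: set when we enter the guest's FWNMI handler,
 * cleared again by the ibm,nmi-interlock RTAS call. */
static bool interlock_pending[MAX_VCPUS];

static void handle_guest_machine_check(PowerPCCPU *cpu)
{
    int idx = CPU(cpu)->cpu_index;

    if (interlock_pending[idx]) {
        /* Second machine check on this vCPU before it issued
         * ibm,nmi-interlock: don't re-enter the guest handler
         * recursively; report a panicked guest instead and let
         * libvirt decide whether to terminate, reboot or keep it. */
        qemu_system_guest_panicked();
        return;
    }
    interlock_pending[idx] = true;
    /* ... build the error log and enter the registered handler ... */
}

 Thomas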