From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrey Korolyov
Subject: Re: [Qemu-devel] E5-2620v2 - emulation stop error
Date: Wed, 1 Apr 2015 15:26:53 +0300
References: <20150330185634.GE13271@potion.brq.redhat.com>
 <20150331134512.GG13271@potion.brq.redhat.com>
 <20150331164539.GD14262@potion.brq.redhat.com>
 <20150401114923.GH13271@potion.brq.redhat.com>
In-Reply-To: <20150401114923.GH13271@potion.brq.redhat.com>
To: Radim Krčmář
Cc: Bandan Das, "Kevin O'Connor", "Dr. David Alan Gilbert", Paolo Bonzini,
 Gerd Hoffmann, "qemu-devel@nongnu.org", "kvm@vger.kernel.org"

On Wed, Apr 1, 2015 at 2:49 PM, Radim Krčmář wrote:
> 2015-03-31 21:23+0300, Andrey Korolyov:
>> On Tue, Mar 31, 2015 at 9:04 PM, Bandan Das wrote:
>> > Bandan Das writes:
>> >> Andrey Korolyov writes:
>> >> ...
>> >>> http://xdel.ru/downloads/kvm-e5v2-issue/another-tracepoint-fail-with-apicv.dat.gz
>> >>>
>> >>> Something a bit more interesting, but the mess is happening just
>> >>> *after* NMI firing.
>> >>
>> >> What happens if NMI is turned off on the host ?
>> >
>> > Sorry, I meant the watchdog..
>>
>> Thanks, everything goes well (as it probably should go there):
>> http://xdel.ru/downloads/kvm-e5v2-issue/apicv-enabled-nmi-disabled.dat.gz
>
> Nice revelation!
>
> KVM doesn't expect host's NMIs to look like this so it doesn't pass them
> to the host. What was the watchdog that casually sent NMIs?
> (It worked after "nmi_watchdog=0" on the host?)
>
> (Guest's NMI should have a different result as well. NMI_EXCEPTION is
> an expected exit reason for guest's hard exceptions, they are then
> differentiated by intr_info and nothing hinted that this was a NMI.)

Yes, I disabled the host watchdog at runtime. Indeed, a guest-induced
NMI would look different, and the guest had no reason to fire one at
this stage. I'd suspect hardware misbehavior on the hypervisor host, but
I have very little idea of how APICv behavior (which depends entirely on
the CPU and its microcode and is decoupled from peripheral hardware) may
vary at this point. I am using microcode version 1.20140913.1 from
Debian, in case it matters.

I will send the trace suggested by Paolo in the next couple of hours. It
would also be great to ask the hardware folks at Intel, who could prove
or disprove the statement above (I have been unable to reproduce the
problem on a 2603v2 so far, so this hypothesis has some chance of being
right).
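
For readers following the quoted remark that these exits are "differentiated by intr_info", here is a minimal sketch of how the interruption-information word can be decoded. It assumes the field layout documented in the Intel SDM (vector in bits 7:0, type in bits 10:8, valid flag in bit 31). The constant names mirror those in the Linux KVM sources, but the helper functions are hypothetical and this is not the code KVM runs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of the VM-exit interruption-information word (assumed
 * from the Intel SDM). Standalone illustration, not KVM code. */
#define INTR_INFO_VECTOR_MASK     0x000000ffu  /* bits 7:0: vector */
#define INTR_INFO_INTR_TYPE_MASK  0x00000700u  /* bits 10:8: event type */
#define INTR_INFO_VALID_MASK      0x80000000u  /* bit 31: word is valid */

#define INTR_TYPE_NMI_INTR        (2u << 8)    /* type 2: NMI */
#define INTR_TYPE_HARD_EXCEPTION  (3u << 8)    /* type 3: hardware exception */

/* Hypothetical helper: true if the word is valid and describes an NMI. */
static inline bool intr_info_is_nmi(uint32_t intr_info)
{
    return (intr_info & INTR_INFO_VALID_MASK) &&
           (intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR;
}

/* Hypothetical helper: true if the word is valid and describes a
 * hardware exception such as a page fault. */
static inline bool intr_info_is_hard_exception(uint32_t intr_info)
{
    return (intr_info & INTR_INFO_VALID_MASK) &&
           (intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_HARD_EXCEPTION;
}

/* Hypothetical helper: extract the vector number (2 for NMI, 14 for #PF). */
static inline uint32_t intr_info_vector(uint32_t intr_info)
{
    return intr_info & INTR_INFO_VECTOR_MASK;
}
```

Under this layout, a host NMI taken while the guest runs would carry type 2 with vector 2, whereas a guest page fault would carry type 3 with vector 14, so the two share an exit reason but remain distinguishable.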