From: Andrey Korolyov
Subject: Re: [Qemu-devel] E5-2620v2 - emulation stop error
Date: Thu, 26 Mar 2015 18:05:09 +0300
To: Bandan Das
Cc: "Dr. David Alan Gilbert", "Kevin O'Connor", Paolo Bonzini, Gerd Hoffmann, "qemu-devel@nongnu.org", "kvm@vger.kernel.org"

On Thu, Mar 26, 2015 at 12:18 PM, Andrey Korolyov wrote:
> On Thu, Mar 26, 2015 at 5:47 AM, Bandan Das wrote:
>> Hi Andrey,
>>
>> Andrey Korolyov writes:
>>
>>> On Mon, Mar 16, 2015 at 10:17 PM, Andrey Korolyov wrote:
>>>> For now, it looks like bug have a mixed Murphy-Heisenberg nature, as
>>>> it appearance is very rare (compared to the number of actual launches)
>>>> and most probably bounded to the physical characteristics of my
>>>> production nodes. As soon as I reach any reproducible path for a
>>>> regular workstation environment, I`ll let everyone know. Also I am
>>>> starting to think that issue can belong to the particular motherboard
>>>> firmware revision, despite fact that the CPU microcode is the same
>>>> everywhere.
>>
>> I will take the risk and say this - "could it be a processor bug ?" :)
>>
>>>
>>> Hello everyone, I`ve managed to reproduce this issue
>>> *deterministically* with latest seabios with smp fix and 3.18.3. The
>>> error occuring just *once* per vm until hypervisor reboots, at least
>>> in my setup, this is definitely crazy...
>>>
>>> - launch two VMs (Centos 7 in my case),
>>> - wait a little while they are booting,
>>> - attach serial console (I am using virsh list for this exact purpose),
>>> - issue acpi reboot or reset, does not matter,
>>> - VM always hangs at boot, most times with sgabios initialization
>>> string printed out [1], but sometimes it hangs a bit later [2],
>>> - no matter how many times I try to relaunch the QEMU afterwards, the
>>> issue does not appear on VM which experienced problem once;
>>> - trace and sample args can be seen in [3] and [4] respectively.
>>
>> My system is a Dell R720 dual socket which has 2620v2s. I tried your
>> setup but couldn't reproduce (my qemu cmdline isn't exactly the same
>> as yours), although, if you could simplify your command line a bit,
>> I can try again.
>>
>> Bandan
>>
>>> 1)
>>> Google, Inc.
>>> Serial Graphics Adapter 06/11/14
>>> SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $
>>> (pbuilder@zorak) Wed Jun 11 05:57:34 UTC 2014
>>> Term: 211x62
>>> 4 0
>>>
>>> 2)
>>> Google, Inc.
>>> Serial Graphics Adapter 06/11/14
>>> SGABIOS $Id: sgabios.S 8 2010-04-22 00:03:40Z nlaredo $
>>> (pbuilder@zorak) Wed Jun 11 05:57:34 UTC 2014
>>> Term: 211x62
>>> 4 0
>>> [...empty screen...]
>>> SeaBIOS (version 1.8.1-20150325_230423-testnode)
>>> Machine UUID 3c78721f-7317-4f85-bcbe-f5ad46d293a1
>>>
>>>
>>> iPXE (http://ipxe.org) 00:02.0 C100 PCI2.10 PnP PMM+3FF95BA0+3FEF5BA0 C10
>>>
>>> 3)
>>>
>>> KVM internal error. Suberror: 2
>>> extra data[0]: 800000ef
>>> extra data[1]: 80000b0d
>>> EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
>>> ESI=00000000 EDI=00000000 EBP=00000000 ESP=00006d2c
>>> EIP=0000d331 EFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
>>> ES =0000 00000000 0000ffff 00009300
>>> CS =f000 000f0000 0000ffff 00009b00
>>> SS =0000 00000000 0000ffff 00009300
>>> DS =0000 00000000 0000ffff 00009300
>>> FS =0000 00000000 0000ffff 00009300
>>> GS =0000 00000000 0000ffff 00009300
>>> LDT=0000 00000000 0000ffff 00008200
>>> TR =0000 00000000 0000ffff 00008b00
>>> GDT= 000f6cb0 00000037
>>> IDT= 00000000 000003ff
>>> CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
>>> DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
>>> DR3=0000000000000000
>>> DR6=00000000ffff0ff0 DR7=0000000000000400
>>> EFER=0000000000000000
>>> Code=66 c3 cd 02 cb cd 10 cb cd 13 cb cd 15 cb cd 16 cb cd 18 cb
>>> 19 cb cd 1c cb cd 4a cb fa fc 66 ba 47 d3 0f 00 e9 ad fe f3 90 f0 0f
>>> ba 2d d4 fe fb 3f
>>>
>>> 4)
>>> /usr/bin/qemu-system-x86_64 -name centos71 -S -machine
>>> pc-i440fx-2.1,accel=kvm,usb=off -cpu SandyBridge,+kvm_pv_eoi -bios
>>> /usr/share/seabios/bios.bin -m 1024 -realtime mlock=off -smp
>>> 12,sockets=1,cores=12,threads=12 -uuid
>>> 3c78721f-7317-4f85-bcbe-f5ad46d293a1 -nographic -no-user-config
>>> -nodefaults -device sga -chardev
>>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/centos71.monitor,server,nowait
>>> -mon chardev=charmonitor,id=monitor,mode=control -rtc
>>> base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard
>>> -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global
>>> PIIX4_PM.disable_s4=1 -boot strict=on -device
>>> nec-usb-xhci,id=usb,bus=pci.0,addr=0x3 -device
>>> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
>>> file=rbd:dev-rack2/centos7-1.raw:id=qemukvm:key=XXXXXXXXXXXXXXXXXXXXXXXXXX:auth_supported=cephx\;none:mon_host=10.6.0.1\:6789\;10.6.0.3\:6789\;10.6.0.4\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=writeback,aio=native
>>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>>> -chardev pty,id=charserial0 -device
>>> isa-serial,chardev=charserial0,id=serial0 -chardev
>>> socket,id=charchannel0,path=/var/lib/libvirt/qemu/centos71.sock,server,nowait
>>> -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1
>>> -msg timestamp=on
>
> Hehe, 2.2 works just perfectly but 2.1 isn`t. I`ll bisect the issue in
> a next couple of days and post the right commit (but as can remember
> none of commits b/w 2.1 and 2.2 can fix simular issue by a purpose).
> I`ve attached a reference xml to simplify playing with libvirt if
> anyone willing to do so.

Sorry, 2.2 hangs as well, just more rarely. It looks like it is
important to run the test sequence on a freshly booted host, as the
issue tends not to reappear within the same hypervisor boot cycle.
Please let me know if the host kernel config is needed, for example if
nobody is able to reproduce this.
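
For anyone reading the trace quoted in [3], here is a minimal sketch of how the two "extra data" words could be decoded. It rests on two assumptions that are not stated in the thread: that suberror 2 reports data[0] as the event being delivered and data[1] as the exception that interrupted the delivery, and that both words use the Intel VMX interruption-information layout (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31). The helper names are made up for illustration and are not part of QEMU or KVM.

```python
# Sketch only: decode the "extra data" words from the quoted KVM trace.
# Assumption: both words follow the VMX interruption-information layout.

INTR_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}


def decode_intr_info(word):
    """Split one 32-bit interruption-information word into its fields."""
    type_code = (word >> 8) & 0x7
    return {
        "valid": bool(word & (1 << 31)),
        "vector": word & 0xFF,
        "type": INTR_TYPES.get(type_code, "reserved"),
        "error_code_valid": bool(word & (1 << 11)),
    }


def linear_address(segment_base, offset):
    """Linear address of segment:offset, using the base shown in the dump."""
    return segment_base + offset


if __name__ == "__main__":
    # Values copied from the register dump in [3].
    for label, word in (("data[0]", 0x800000EF), ("data[1]", 0x80000B0D)):
        info = decode_intr_info(word)
        print(f"{label}: vector=0x{info['vector']:02x} type={info['type']} "
              f"error_code_valid={info['error_code_valid']}")
    print(f"CS:EIP linear = 0x{linear_address(0x000F0000, 0x0000D331):x}")
```

Under those assumptions, data[0] decodes to an external interrupt with vector 0xef and data[1] to a hardware exception with vector 0x0d (#GP) carrying an error code, and the faulting CS:EIP maps to linear address 0xfd331, inside the BIOS segment. This only restates what the dump already contains and does not point at a root cause.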