From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58323) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YEepi-0007VE-Lx for qemu-devel@nongnu.org; Fri, 23 Jan 2015 09:03:40 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YEepf-0005oz-EC for qemu-devel@nongnu.org; Fri, 23 Jan 2015 09:03:34 -0500
Received: from mail-we0-f171.google.com ([74.125.82.171]:57061) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YEepf-0005ot-4t for qemu-devel@nongnu.org; Fri, 23 Jan 2015 09:03:31 -0500
Received: by mail-we0-f171.google.com with SMTP id q58so381059wes.2 for ; Fri, 23 Jan 2015 06:03:30 -0800 (PST)
Received: from [192.168.49.165] ([62.217.45.26]) by mx.google.com with ESMTPSA id i3sm1966832wie.23.2015.01.23.06.03.28 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 23 Jan 2015 06:03:29 -0800 (PST)
Message-ID: <54C254AF.7010101@profitbricks.com>
Date: Fri, 23 Jan 2015 15:03:27 +0100
From: Mikhail Sennikovskii
MIME-Version: 1.0
References: <20150118030317.23598.27686.malonedeb@chaenomeles.canonical.com> <20150118030317.23598.27686.malonedeb@chaenomeles.canonical.com>
In-Reply-To: <20150118030317.23598.27686.malonedeb@chaenomeles.canonical.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
To: qemu-devel@nongnu.org

Hi all,

I'm running a slightly modified "migration over TCP" test in virt-test. It migrates one "smp=2" VM to another on the same host over TCP and generates some dummy CPU load inside the GUEST during the migration. After a series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the guest, which happens when

"An expected clock interrupt was not received on a secondary processor in an MP system within the allocated interval. This indicates that the specified processor is hung and not processing interrupts."

This seems to happen with every qemu version I've tested (1.2 and above, including upstream). I tested with the 3.13.0-44-generic kernel on my Ubuntu 14.04.1 LTS SMP4 host, as well as with the 3.12.26-1 kernel on a Debian 6 SMP6 host.

One thing I noticed is that generating dummy CPU load on the HOST (for example, running multiple instances of the "while true; do false; done" script) in parallel with the migration makes the issue quite easy to reproduce.

Looking inside the Windows crash dump, the second CPU is just running at IRQL 0, and it is apparently not hung: Windows is able to save its state in the crash dump correctly, which implies that some code was still running on it. So this appears to be a timing issue (for example, the host scheduler not scheduling the thread that executes the secondary CPU's code in time).

Could you give me some insight on this, i.e. is there a way to configure QEMU/KVM to avoid this issue? If you think this might be a qemu/kvm issue, I can provide any info you need, such as Windows crash dumps or the test case to reproduce it.

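The host-side load described above can be sketched as a small shell helper. This is only a minimal sketch: the function names start_load and stop_load and the LOAD_PIDS variable are hypothetical, and the number of instances is an assumption, since the exact count used in the test runs is not stated here.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of virt-test).
# start_load N: start N background copies of the busy loop
# "while true; do false; done" and remember their PIDs.
start_load() {
    n=$1
    LOAD_PIDS=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # Each subshell spins on one host CPU until it is killed.
        ( while true; do false; done ) &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
}

# stop_load: kill every busy loop started by start_load and reap it.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null
    LOAD_PIDS=""
}
```

A typical use would be to call "start_load 4" (the count is an assumption, pick roughly the number of host CPUs), run the migration test, and then call "stop_load".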
qemu is started as:

from-VM:

qemu-system-x86_64 \
    -S \
    -name 'virt-tests-vm1' \
    -sandbox off \
    -M pc-1.0 \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \
    -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \
    -m 2G \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    -cpu phenom \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :0 \
    -rtc base=localtime,clock=host,driftfix=none \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm

to-VM:

qemu-system-x86_64 \
    -S \
    -name 'virt-tests-vm1' \
    -sandbox off \
    -M pc-1.0 \
    -nodefaults \
    -vga std \
    -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \
    -mon chardev=qmp_id_qmp1,mode=control \
    -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \
    -device isa-serial,chardev=serial_id_serial0 \
    -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \
    -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
    -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
    -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
    -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
    -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \
    -netdev user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023 \
    -m 2G \
    -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
    -cpu phenom \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
    -vnc :1 \
    -rtc base=localtime,clock=host,driftfix=none \
    -boot order=cdn,once=c,menu=off \
    -enable-kvm \
    -incoming tcp:0:5200

Thanks,
Mikhail