XP machine freeze

From: Saso Slavicic @ 2015-03-16 15:10 UTC
  To: kvm

Hi,

I'm fairly experienced with KVM (CentOS 5/6), running about a dozen servers
with 20-30 different systems (Linux and MS platforms).
I have one Windows XP machine that behaves very strangely: it freezes. My
monitoring reports ping timeouts for the VM, and the machine spins 2 or 3
cores at full CPU usage. The interesting thing is that once you open the
console, it suddenly starts working again. You can see the clock catching
up, as if it had been frozen in time, and everything works normally once
the timer has caught up. It usually happens about once a month, although it
happened both yesterday and today.

This machine is on CentOS 6, qemu-kvm-0.12.1.2-2.448.el6_6, kernel
2.6.32-504.3.3.el6.x86_64.
I was able to do some debugging while the machine was frozen, so I have some
data to work with:

# virsh qemu-monitor-command --hmp DBserver 'info cpus'
* CPU #0: pc=0x0000000080501fdd thread_id=32595
  CPU #1: pc=0x00000000806e7a9b thread_id=32596
  CPU #2: pc=0x00000000ba2da162 (halted) thread_id=32597
  CPU #3: pc=0x00000000ba2da162 (halted) thread_id=32598
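
If the freeze recurs, the 'info cpus' output above can be reduced to the list of vCPUs that are not halted. The following is only a minimal sketch, assuming the output format shown in this message; parse_info_cpus is a hypothetical helper name, not part of QEMU or libvirt.

```python
import re

# Matches lines such as:
#   "* CPU #0: pc=0x0000000080501fdd thread_id=32595"
#   "  CPU #2: pc=0x00000000ba2da162 (halted) thread_id=32597"
CPU_LINE = re.compile(
    r"^\*?\s*CPU #(?P<index>\d+):\s+pc=0x(?P<pc>[0-9a-fA-F]+)"
    r"(?P<halted>\s+\(halted\))?\s+thread_id=(?P<tid>\d+)"
)


def parse_info_cpus(text):
    """Return one dict per vCPU line found in HMP 'info cpus' output."""
    cpus = []
    for line in text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a CPU line
        cpus.append({
            "index": int(match.group("index")),
            "pc": int(match.group("pc"), 16),
            "halted": match.group("halted") is not None,
            "thread_id": int(match.group("tid")),
        })
    return cpus


# Output copied from this message.
SAMPLE = """\
* CPU #0: pc=0x0000000080501fdd thread_id=32595
  CPU #1: pc=0x00000000806e7a9b thread_id=32596
  CPU #2: pc=0x00000000ba2da162 (halted) thread_id=32597
  CPU #3: pc=0x00000000ba2da162 (halted) thread_id=32598
"""

busy = [cpu for cpu in parse_info_cpus(SAMPLE) if not cpu["halted"]]
for cpu in busy:
    print("CPU #%d running at pc=0x%08x (host thread %d)"
          % (cpu["index"], cpu["pc"], cpu["thread_id"]))
```

The thread_id values are host thread IDs, so the busy vCPUs can be matched against per-thread CPU usage on the host.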

In both yesterday's and today's events, CPU0 was stopped at
0x0000000080501fdd. I disassembled the surrounding function and got this:

 0x0000000080501fb5:  int3
 0x0000000080501fb6:  mov    %edi,%edi
 0x0000000080501fb8:  push   %ebp
 0x0000000080501fb9:  mov    %esp,%ebp
 0x0000000080501fbb:  push   %esi
 0x0000000080501fbc:  mov    %fs:0x20,%eax
 0x0000000080501fc2:  mov    0x8(%ebp),%ecx
 0x0000000080501fc5:  lea    -0x1(%ecx),%esi
 0x0000000080501fc8:  test   %esi,%ecx
 0x0000000080501fca:  lea    0x7ec(%eax),%edx
 0x0000000080501fd0:  pop    %esi
 0x0000000080501fd1:  je     0x80501fdd
 0x0000000080501fd3:  lea    0x7a0(%eax),%edx
 0x0000000080501fd9:  jmp    0x80501fdd
 *0x0000000080501fdb:  pause
 0x0000000080501fdd:  cmpl   $0x0,(%edx)
 0x0000000080501fe0:  jne    0x80501fdb
 0x0000000080501fe2:  pop    %ebp
 0x0000000080501fe3:  ret    $0x4
 0x0000000080501fe6:  int3

The mov %edi,%edi at fb6 is clearly the start of some function. From what
I've been able to understand, the code fetches the _KPRCB pointer
(%fs:0x20) and then spins between fdb and fe0, waiting for the value at the
address in EDX (0xffdff8c0), possibly PacketBarrier (?), to become 0.
However, $pc always shows the fdd address. Shouldn't it move between fdb and
fe0? It seems as if it were stuck at fdd.
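
The address arithmetic in the disassembly can be checked against the register values from the 'info registers' dump in this message. This is a minimal sketch that only mirrors the instructions from fc5 to fd9; barrier_address is a hypothetical name, and the code makes no claim about what the two _KPRCB offsets actually represent.

```python
MASK32 = 0xFFFFFFFF


def barrier_address(eax, ecx):
    """Mirror the instructions from 0x80501fc5 to 0x80501fd9.

    eax: value loaded from %fs:0x20 (the _KPRCB pointer, per the message)
    ecx: the function argument loaded from 0x8(%ebp)
    """
    esi = (ecx - 1) & MASK32          # lea -0x1(%ecx),%esi
    if (esi & ecx) == 0:              # test %esi,%ecx ; je 0x80501fdd
        offset = 0x7EC                # lea 0x7ec(%eax),%edx
    else:
        offset = 0x7A0                # lea 0x7a0(%eax),%edx
    return (eax + offset) & MASK32


# EAX and ECX as reported by 'info registers' while the guest was frozen.
edx = barrier_address(eax=0xFFDFF120, ecx=0x0000000E)
print(hex(edx))  # 0xffdff8c0, the same value as EDX in the dump
```

Because ECX (0xe) has more than one bit set, the test does not yield zero, so the 0x7a0 offset is used, which is consistent with EDX pointing at 0xffdff8c0.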

# virsh qemu-monitor-command --hmp DBserver 'info registers'
 EAX=ffdff120 EBX=c06ddf58 ECX=0000000e EDX=ffdff8c0
 ESI=be6e3921 EDI=c06ddf60 EBP=ba4ff708 ESP=ba4ff708
 EIP=80501fdd EFL=00000202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
 CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
 SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
 DS =0023 00000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
 FS =0030 ffdff000 00001fff 00c09300 DPL=0 DS   [-WA]
 GS =0000 00000000 000fffff 00000000
 LDT=0000 00000000 000fffff 00000000
 TR =0028 80042000 000020ab 00008b00 DPL=0 TSS32-busy
 GDT=     8003f000 000003ff
 IDT=     8003f400 000007ff
 CR0=8001003b CR2=dbbec000 CR3=0b3c0020 CR4=000006f8
 DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
 DR6=ffff0ff0 DR7=00000400
 FCW=027f FSW=0020 [ST=0] FTW=00 MXCSR=00001fa0
 FPR0=8053632b003c1658 c048 FPR1=e1e0c048bf80f6ab 76f8
 FPR2=e1e0000000000000 0023 FPR3=0b017c30003c1658 0000
 FPR4=0000003bba1a7604 1e64 FPR5=0007268c00000000 003b
 FPR6=000002020000001b 2684 FPR7=e3e0a9b4e1b50de4 ca0b
 XMM00=0000000000a1fc95000000000020027f
 XMM01=0000ffff00001fa000001c4c00000001
 XMM02=000000000000c0488053632b003c1658
 XMM03=00000000000076f8e1e0c048bf80f6ab
 XMM04=0000000000000023e1e0000000000000
 XMM05=00000000000000000b017c30003c1658
 XMM06=0000000000001e640000003bba1a7604
 XMM07=000000000000003b0007268c00000000

Clearly, the value at the address in EDX is not 0:

[root@linux ~]# virsh qemu-monitor-command --hmp DBserver 'x/1xb 0xFFDFF8C0'
00000000ffdff8c0: 0x0e

[root@linux ~]# virt-manager

[root@linux ~]# virsh qemu-monitor-command --hmp DBserver 'x/1xb 0xFFDFF8C0'
00000000ffdff8c0: 0x00

However, as soon as the VM console is opened and the machine resumes, the
value at the address in EDX is set to 0 and the loop is broken.
Does anybody recognize what function that is? What could possibly cause
opening the console and moving the mouse a little to unfreeze the machine?
The VM has the .81 virtio drivers from the Fedora repo at the moment.
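
The two reads of 0xFFDFF8C0 above could also be scripted, so the byte can be sampled repeatedly during a freeze instead of being read by hand. This is a minimal sketch, assuming the 'x/1xb' output format shown in this message; parse_hmp_byte and read_guest_byte are hypothetical helper names, and the virsh invocation is the same one used above.

```python
import re
import subprocess

# Matches a line such as "00000000ffdff8c0: 0x0e".
BYTE_LINE = re.compile(
    r"^\s*([0-9a-fA-F]+):\s+0x([0-9a-fA-F]{2})\s*$", re.MULTILINE
)


def parse_hmp_byte(output):
    """Return (address, value) from the output of an HMP 'x/1xb' command."""
    match = BYTE_LINE.search(output)
    if match is None:
        raise ValueError("unexpected monitor output: %r" % output)
    return int(match.group(1), 16), int(match.group(2), 16)


def read_guest_byte(domain, address):
    """Run the virsh command from this message and return the byte value.

    Requires virsh on the host and a running domain with this name.
    """
    cmd = ["virsh", "qemu-monitor-command", "--hmp", domain,
           "x/1xb 0x%08X" % address]
    output = subprocess.check_output(cmd).decode()
    return parse_hmp_byte(output)[1]


# Parsing the two outputs quoted in this message (no virsh needed).
frozen = parse_hmp_byte("00000000ffdff8c0: 0x0e\n")
resumed = parse_hmp_byte("00000000ffdff8c0: 0x00\n")
print("frozen: 0x%02x, after opening the console: 0x%02x"
      % (frozen[1], resumed[1]))
```

On the host, read_guest_byte("DBserver", 0xFFDFF8C0) would issue the same query as the manual commands; calling it in a loop with a timestamp is one way to see exactly when the value drops to 0.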

The configuration of the machine is pretty standard:

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made
using:
  virsh edit DBserver
or other application using the libvirt API.
-->

 <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>DBserver</name>
  <uuid>e42b4cf2-7264-515f-4d24-6267eaa24be8</uuid>
  <memory unit='KiB'>3145728</memory>
  <currentMemory unit='KiB'>3145728</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='rhel6.6.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <cpu>
    <topology sockets='1' cores='4' threads='4'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/drbd1'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/disk/by-id/usb-WD_Ext_HDD_1021_574D415A4138353838383731-0:0'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:a6:92:ca'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='vga' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-set'/>
    <qemu:arg value='device.virtio-disk0.x-data-plane=on'/>
  </qemu:commandline>
 </domain>

The above config has already been changed: I first experimented with
removing the USB tablet (and installing VMware mouse drivers), turning on
x-data-plane and so on, hoping to solve the problem. Is there anything else
I can check the next time the machine freezes?

Regards,
Saso Slavicic
