* HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
@ 2010-07-07 18:42 Gianni Tedesco
From: Gianni Tedesco @ 2010-07-07 18:42 UTC (permalink / raw)
  To: Xen Devel

Hi,

I've spent a few weeks investigating a very reproducible guest-hang bug
which appears to affect all Xen versions from at least 3.4.2 through 4.0
to unstable.

To reproduce, set up a RHEL5.2 guest for a kickstart network install,
something like this:

vmlinuz ks=nfs:1.2.3.4:ks-rhel52.cfg ksdevice=eth0 console=tty0
	console=ttyS0,9600n8 serial initrd=initrd.img root=/dev/ram0

With a vm profile something like this:

kernel = "/usr/lib/xen/boot/hvmloader"
builder = 'hvm'
memory = 128
name = "RHEL5.2-ks"
vcpus = 2
vif = [ 'type=ioemu,bridge=xenbr0,mac=00:26:b9:87:0e:d3' ]
disk = [ 'phy:/dev/sdb1,hda,w' ]
device_model = '/usr/lib/xen/bin/qemu-dm'

As long as vcpus > 1, the guest repeatedly hangs after userspace has
started. In all cases xen-hvmctx reports that the kernel is spinning away
in cpu_idle(), as if waiting for an interrupt, with EFLAGS.IF = 1. If a
key is pressed, either on the keyboard or via serial, the system unhangs
itself. The system still responds to network traffic (e.g. ping) during
this time, but that doesn't unhang it.
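
For readers who want to check the EFLAGS.IF claim against the dumps, here
is a minimal sketch that decodes the rflags value 0x246 reported for both
VCPUs. The bit positions are the architectural x86 ones; the helper name
decode_rflags is made up for this illustration and is not part of
xen-hvmctx.

```python
# Architectural x86 RFLAGS status/control bits (bit 1 is reserved, always 1).
RFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}

def decode_rflags(value):
    """Return the names of the flags set in an RFLAGS value (hypothetical helper)."""
    return [name for bit, name in sorted(RFLAGS_BITS.items())
            if value & (1 << bit)]

# rflags as shown for both VCPUs in the dump
print(decode_rflags(0x246))  # ['PF', 'ZF', 'IF']
```

IF (bit 9) is set, so maskable interrupts are enabled on both VCPUs at
the point where they sit in cpu_idle().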

I have ruled out all the usual suspects: timer modes, vpt_align, guest
kernel clock sources, ACPI, HPET, HAP and OOS. With a little debugging I
was able to show that timer IRQs from the HPET as well as RESCHED IPIs
were still getting delivered during the hangs. A full context dump follows.

Help! :)

HVM save record for domain 94
Entry 0: type 1 instance 0, length 24
     Header: magic 0x54381286, version 1
             Xen changeset 0
             CPUID[0][%eax] 0x000106a5
             gtsc_khz 2666735
Entry 1: type 2 instance 0, length 1024
    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
            rcx 0x0000000000000000     rdx 0x0000000000000000
            rbp 0x0000000000030000     rsi 0x0000000000000001
            rdi 0xffffffff802e5658     rsp 0xffffffff803cff90
             r8 0xffffffff803ce000      r9 0x000000000000003e
            r10 0xffff8100070a0038     r11 0xffff81000769f7a0
            r12 0x0000000000000000     r13 0x0000000000000000
            r14 0x0000000000000000     r15 0x0000000000000000
            rip 0xffffffff8006ad64  rflags 0x0000000000000246
            cr0 0x000000008005003b     cr2 0x000000000042cc00
            cr3 0x0000000006bf9000     cr4 0x00000000000006e0
            dr0 0x0000000000000000     dr1 0x0000000000000000
            dr2 0x0000000000000000     dr3 0x0000000000000000
            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
             gs 0x00000000 (0xffffffff8039e000 + 0xffffffff / 0x00c00)
             ss 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             tr 0x00000040 (0xffff810001033000 + 0x0000206f / 0x0008b)
           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
           itdr            (0xffffffff8041d000 + 0x00000fff)
           gdtr            (0xffffffff803d0000 + 0x00000080)
    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
      shadow gs 0x0000000000000000
      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
           star 0x0023001000000000  cstar 0xffffffff80061584
         sfmask 0x0000000000003700   efer 0x0000000000000d01
            tsc 0x0000001af07d0e03
          event 0x00000000 error 0x00000000
    FPU:    fcw 0x037f fsw 0x0000
            ftw 0x00 (0x00) fop 0x0000
          fpuip 0x0000000000000000 fpudp 0x0000000000000000
          mxcsr 0x00001fa0 mask 0x0000ffff
            mm0 0x00000000000000000000 (0x000000000000)
            mm1 0x00000000000000000000 (0x000000000000)
            mm2 0x00000000000000000000 (0x000000000000)
            mm3 0x00000000000000000000 (0x000000000000)
            mm4 0x00000000000000000000 (0x000000000000)
            mm5 0x00000000000000000000 (0x000000000000)
            mm6 0x00000000000000000000 (0x000000000000)
            mm7 0x00000000000000000000 (0x000000000000)
          xmm00 0x00000000000000003fe333333f19999a
          xmm01 0x00000000000000000000000040266666
          xmm02 0x00000000000000000000000000000000
          xmm03 0x00000000000000000000000000000000
          xmm04 0x00000000000000000000000000000000
          xmm05 0x00000000000000000000000000000000
          xmm06 0x00000000000000000000000000000000
          xmm07 0x00000000000000000000000000000000
          xmm08 0x00000000000000000000000000000000
          xmm09 0x00000000000000000000000000000000
          xmm10 0x00000000000000000000000000000000
          xmm11 0x00000000000000000000000000000000
          xmm12 0x00000000000000000000000000000000
          xmm13 0x00000000000000000000000000000000
          xmm14 0x00000000000000000000000000000000
          xmm15 0x00000000000000000000000000000000
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
Entry 2: type 2 instance 1, length 1024
    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
            rcx 0x0000000000000000     rdx 0x0000000000000000
            rbp 0x0000000000000001     rsi 0x0000000000000001
            rdi 0xffffffff802e5658     rsp 0xffff81000708fef0
             r8 0xffff81000708e000      r9 0x000000000000003f
            r10 0xffff8100070a0008     r11 0xffff810006b5a200
            r12 0x00000000000000ff     r13 0xffffffff803a6080
            r14 0x0000000000000100     r15 0xffffffff803c8280
            rip 0xffffffff8006ad64  rflags 0x0000000000000246
            cr0 0x000000008005003b     cr2 0x0000000000866290
            cr3 0x0000000000201000     cr4 0x00000000000006e0
            dr0 0x0000000000000000     dr1 0x0000000000000000
            dr2 0x0000000000000000     dr3 0x0000000000000000
            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
             gs 0x00000000 (0xffff810007080b40 + 0xffffffff / 0x00c00)
             ss 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
             tr 0x00000040 (0xffff81000103b580 + 0x0000206f / 0x0008b)
           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
           itdr            (0xffffffff8041d000 + 0x00000fff)
           gdtr            (0xffff810007085000 + 0x00000080)
    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
      shadow gs 0x0000000000000000
      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
           star 0x0023001000000000  cstar 0xffffffff80061584
         sfmask 0x0000000000003700   efer 0x0000000000000d01
            tsc 0x0000001af07d53b6
          event 0x00000000 error 0x00000000
    FPU:    fcw 0x037f fsw 0x0000
            ftw 0x00 (0x00) fop 0x0000
          fpuip 0x0000000000000000 fpudp 0x0000000000000000
          mxcsr 0x00001fa0 mask 0x0000ffff
            mm0 0x00000000000000000000 (0x000000000000)
            mm1 0x00000000000000000000 (0x000000000000)
            mm2 0x00000000000000000000 (0x000000000000)
            mm3 0x00000000000000000000 (0x000000000000)
            mm4 0x00000000000000000000 (0x000000000000)
            mm5 0x00000000000000000000 (0x000000000000)
            mm6 0x00000000000000000000 (0x000000000000)
            mm7 0x00000000000000000000 (0x000000000000)
          xmm00 0x00000000000000003fe333333f19999a
          xmm01 0x00000000000000000000000040266666
          xmm02 0x00000000000000000000000000000000
          xmm03 0x00000000000000000000000000000000
          xmm04 0x00000000000000000000000000000000
          xmm05 0x00000000000000000000000000000000
          xmm06 0x00000000000000000000000000000000
          xmm07 0x00000000000000000000000000000000
          xmm08 0x00000000000000000000000000000000
          xmm09 0x00000000000000000000000000000000
          xmm10 0x00000000000000000000000000000000
          xmm11 0x00000000000000000000000000000000
          xmm12 0x00000000000000000000000000000000
          xmm13 0x00000000000000000000000000000000
          xmm14 0x00000000000000000000000000000000
          xmm15 0x00000000000000000000000000000000
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
Entry 3: type 3 instance 0, length 8
    PIC: IRQ base 0x20, irr 0x1, imr 0xfa, isr 0
         init_state 0, priority_add 0, readsel_isr 0, poll 0
         auto_eoi 1, rotate_on_auto_eoi 0
         special_fully_nested_mode 0, special_mask_mode 0
         is_master 1, elcr 0x24, int_output 0x1
Entry 4: type 3 instance 1, length 8
    PIC: IRQ base 0x28, irr 0, imr 0xff, isr 0
         init_state 0, priority_add 0, readsel_isr 0, poll 0
         auto_eoi 0, rotate_on_auto_eoi 0
         special_fully_nested_mode 0, special_mask_mode 0
         is_master 0, elcr 0xc, int_output 0
Entry 5: type 4 instance 0, length 400
    IOAPIC: base_address 0xfec00000, ioregsel 0x1c id 0x1
            pin 00: 0x0000000000010000
            pin 01: 0x0000000000000039
            pin 02: 0x0000000000000031
            pin 03: 0x0000000000000041
            pin 04: 0x0000000000000049
            pin 05: 0x000000000001a051
            pin 06: 0x0000000000000059
            pin 07: 0x0000000000000061
            pin 08: 0x0000000000000069
            pin 09: 0x0000000000000071
            pin 10: 0x000000000001a079
            pin 11: 0x000000000001a081
            pin 12: 0x0000000000000089
            pin 13: 0x0000000000000091
            pin 14: 0x0000000000000099
            pin 15: 0x00000000000000a1
            pin 16: 0x0000000000010000
            pin 17: 0x0000000000010000
            pin 18: 0x0000000000010000
            pin 19: 0x0000000000010000
            pin 20: 0x0000000000010000
            pin 21: 0x0000000000010000
            pin 22: 0x0000000000010000
            pin 23: 0x0000000000010000
            pin 24: 0x0000000000010000
            pin 25: 0x0000000000010000
            pin 26: 0x0000000000010000
            pin 27: 0x0000000000010000
            pin 28: 0x0000000000010000
            pin 29: 0x0000000000010000
            pin 30: 0x0000000000010000
            pin 31: 0x0000000000010000
            pin 32: 0x0000000000010000
            pin 33: 0x0000000000010000
            pin 34: 0x0000000000010000
            pin 35: 0x0000000000010000
            pin 36: 0x0000000000010000
            pin 37: 0x0000000000010000
            pin 38: 0x0000000000010000
            pin 39: 0x0000000000010000
            pin 40: 0x0000000000010000
            pin 41: 0x0000000000010000
            pin 42: 0x0000000000010000
            pin 43: 0x0000000000010000
            pin 44: 0x0000000000010000
            pin 45: 0x0000000000010000
            pin 46: 0x0000000000010000
            pin 47: 0x0000000000010000
Entry 6: type 5 instance 0, length 16
    LAPIC: base_msr 0xfee00900, disabled 0, timer_divisor 0x10
Entry 7: type 5 instance 1, length 16
    LAPIC: base_msr 0xfee00800, disabled 0, timer_divisor 0x10
Entry 8: type 6 instance 0, length 1024
    LAPIC registers:
          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
          0x0020: 0x0000000000000000   0x0030: 0x0000000000050014
          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
          0x00c0: 0x0000000000000000   0x00d0: 0x0000000001000000
          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
          0x0300: 0x00000000000000fc   0x0310: 0x0000000002000000
          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
          0x0360: 0x0000000000000400   0x0370: 0x00000000000000fe
          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
Entry 9: type 6 instance 1, length 1024
    LAPIC registers:
          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
          0x0020: 0x0000000002000000   0x0030: 0x0000000000050014
          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
          0x00c0: 0x0000000000000000   0x00d0: 0x0000000002000000
          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
          0x0300: 0x00000000000000fd   0x0310: 0x0000000000000000
          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
          0x0360: 0x0000000000010400   0x0370: 0x00000000000000fe
          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
Entry 10: type 7 instance 0, length 16
    PCI IRQs: 0x00000000000000000000000000000000
Entry 11: type 8 instance 0, length 8
    ISA IRQs: 0x0001
Entry 12: type 9 instance 0, length 8
    PCI LINK: 5 10 11 5
Entry 13: type 10 instance 0, length 56
    PIT: speaker off
         ch 0: count 0x4a9, latched_count 0x4a5, count_latched 0
               status 0, status_latched 0
               rd_state 0x3, wr_state 0x3, wr_latch 0xa9, rw_mode 0x3
               mode 0x2, bcd 0, gate 0x1
         ch 1: count 0x10000, latched_count 0, count_latched 0
               status 0, status_latched 0
               rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
               mode 0xff, bcd 0, gate 0x1
Entry 14: type 11 instance 0, length 16
    RTC: regs 0x18 0x00 0x36 0x00 0x18 0x00 0x03 0x07
              0x07 0x10 0x26 0x02 0x00 0x80, index 0x10
Entry 15: type 12 instance 0, length 1048
    HPET: capability 0xf424008086a201 config 0
          isr 0 counter 0xa1ad6b9c
          timer0 config 0xf0000000000030 cmp 0
          timer0 period 0 fsb 0
          timer1 config 0xf0000000000030 cmp 0
          timer1 period 0 fsb 0
          timer2 config 0xf0000000000030 cmp 0
          timer2 period 0 fsb 0
Entry 16: type 13 instance 0, length 8
    ACPI PM: TMR_VAL 0x8fff, PM1a_STS 0x0, PM1a_EN 0x0
Entry 17: type 14 instance 0, length 240
    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
          var 0 0x00000000f0000000 0x0000000ff8000800
          var 1 0x00000000f8000000 0x0000000ffc000800
          var 2 0x0000000000000000 0x0000000000000000
          var 3 0x0000000000000000 0x0000000000000000
          var 4 0x0000000000000000 0x0000000000000000
          var 5 0x0000000000000000 0x0000000000000000
          var 6 0x0000000000000000 0x0000000000000000
          var 7 0x0000000000000000 0x0000000000000000
          fixed 00 0x0606060606060606
          fixed 01 0x0606060606060606
          fixed 02 0x0101010101010101
          fixed 03 0x0606060606060606
          fixed 04 0x0606060606060606
          fixed 05 0x0606060606060606
          fixed 06 0x0606060606060606
          fixed 07 0x0606060606060606
          fixed 08 0x0606060606060606
          fixed 09 0x0606060606060606
          fixed 10 0x0606060606060606
Entry 18: type 14 instance 1, length 240
    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
          var 0 0x00000000f0000000 0x0000000ff8000800
          var 1 0x00000000f8000000 0x0000000ffc000800
          var 2 0x0000000000000000 0x0000000000000000
          var 3 0x0000000000000000 0x0000000000000000
          var 4 0x0000000000000000 0x0000000000000000
          var 5 0x0000000000000000 0x0000000000000000
          var 6 0x0000000000000000 0x0000000000000000
          var 7 0x0000000000000000 0x0000000000000000
          fixed 00 0x0606060606060606
          fixed 01 0x0606060606060606
          fixed 02 0x0101010101010101
          fixed 03 0x0606060606060606
          fixed 04 0x0606060606060606
          fixed 05 0x0606060606060606
          fixed 06 0x0606060606060606
          fixed 07 0x0606060606060606
          fixed 08 0x0606060606060606
          fixed 09 0x0606060606060606
          fixed 10 0x0606060606060606
Entry 19: type 0 instance 0, length 0
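
The IOAPIC pin values in the dump are raw 64-bit redirection table
entries. As a reading aid, the sketch below splits a few of them into
fields using the standard I/O APIC redirection entry layout (vector in
bits 0-7, delivery mode in bits 8-10, polarity bit 13, trigger mode bit
15, mask bit 16, destination in bits 56-63). The function name decode_rte
and the choice of pins are only for illustration.

```python
def decode_rte(rte):
    """Split a 64-bit I/O APIC redirection table entry into fields.

    Hypothetical helper; field layout follows the standard I/O APIC format.
    """
    return {
        "vector": rte & 0xFF,
        "delivery_mode": (rte >> 8) & 0x7,
        "active_low": bool(rte & (1 << 13)),
        "level_triggered": bool(rte & (1 << 15)),
        "masked": bool(rte & (1 << 16)),
        "dest": (rte >> 56) & 0xFF,
    }

# A few pins copied from the dump above
pins = {0: 0x10000, 1: 0x39, 2: 0x31, 5: 0x1a051, 10: 0x1a079, 11: 0x1a081}

for pin, rte in sorted(pins.items()):
    f = decode_rte(rte)
    state = "masked" if f["masked"] else "unmasked"
    trigger = "level" if f["level_triggered"] else "edge"
    print("pin %02d: vector 0x%02x, %s, %s" % (pin, f["vector"], state, trigger))
```

Read this way, pins 1 and 2 are unmasked edge-triggered entries, while
pins 5, 10 and 11 (the ones listed under PCI LINK) are level-triggered,
active-low and masked in this snapshot.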
[scara@habil xen-unstable.hg]$ sudo xen-hvmctx 94 | grep rip
            rip 0xffffffff8006ad64  rflags 0x0000000000000246
            rip 0xffffffff8006ad64  rflags 0x0000000000000246
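
The two full dumps in this mail show the same rip on both VCPUs but
different tsc values. A rough sketch of how much guest time separates
them, assuming the tsc field advances at the gtsc_khz rate given in the
save header (that relationship is an assumption here, not something the
dump itself states):

```python
GTSC_KHZ = 2666735  # gtsc_khz from the save record header

def tsc_elapsed_seconds(tsc_before, tsc_after, khz=GTSC_KHZ):
    """Convert a TSC delta to seconds, assuming the TSC ticks at `khz` kHz."""
    return (tsc_after - tsc_before) / (khz * 1000.0)

# tsc values for each VCPU, taken from the first and second dump
vcpu0 = tsc_elapsed_seconds(0x1af07d0e03, 0x1ffb1191c5)
vcpu1 = tsc_elapsed_seconds(0x1af07d53b6, 0x1ffb11d380)

print("vcpu0: %.2f s, vcpu1: %.2f s" % (vcpu0, vcpu1))
```

Under that assumption the snapshots are roughly 8 seconds apart, with
both VCPUs still at the same instruction pointer.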
[scara@habil xen-unstable.hg]$ sudo xen-hvmctx 94
HVM save record for domain 94
Entry 0: type 1 instance 0, length 24
     Header: magic 0x54381286, version 1
             Xen changeset 0
             CPUID[0][%eax] 0x000106a5
             gtsc_khz 2666735
Entry 1: type 2 instance 0, length 1024
    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
            rcx 0x0000000000000000     rdx 0x0000000000000000
            rbp 0x0000000000030000     rsi 0x0000000000000001
            rdi 0xffffffff802e5658     rsp 0xffffffff803cff90
             r8 0xffffffff803ce000      r9 0x000000000000003e
            r10 0xffff8100070a0038     r11 0xffff81000769f7a0
            r12 0x0000000000000000     r13 0x0000000000000000
            r14 0x0000000000000000     r15 0x0000000000000000
            rip 0xffffffff8006ad64  rflags 0x0000000000000246
            cr0 0x000000008005003b     cr2 0x000000000042cc00
            cr3 0x0000000006bf9000     cr4 0x00000000000006e0
            dr0 0x0000000000000000     dr1 0x0000000000000000
            dr2 0x0000000000000000     dr3 0x0000000000000000
            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
             gs 0x00000000 (0xffffffff8039e000 + 0xffffffff / 0x00c00)
             ss 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             tr 0x00000040 (0xffff810001033000 + 0x0000206f / 0x0008b)
           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
           itdr            (0xffffffff8041d000 + 0x00000fff)
           gdtr            (0xffffffff803d0000 + 0x00000080)
    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
      shadow gs 0x0000000000000000
      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
           star 0x0023001000000000  cstar 0xffffffff80061584
         sfmask 0x0000000000003700   efer 0x0000000000000d01
            tsc 0x0000001ffb1191c5
          event 0x00000000 error 0x00000000
    FPU:    fcw 0x037f fsw 0x0000
            ftw 0x00 (0x00) fop 0x0000
          fpuip 0x0000000000000000 fpudp 0x0000000000000000
          mxcsr 0x00001fa0 mask 0x0000ffff
            mm0 0x00000000000000000000 (0x000000000000)
            mm1 0x00000000000000000000 (0x000000000000)
            mm2 0x00000000000000000000 (0x000000000000)
            mm3 0x00000000000000000000 (0x000000000000)
            mm4 0x00000000000000000000 (0x000000000000)
            mm5 0x00000000000000000000 (0x000000000000)
            mm6 0x00000000000000000000 (0x000000000000)
            mm7 0x00000000000000000000 (0x000000000000)
          xmm00 0x00000000000000003fe333333f19999a
          xmm01 0x00000000000000000000000040266666
          xmm02 0x00000000000000000000000000000000
          xmm03 0x00000000000000000000000000000000
          xmm04 0x00000000000000000000000000000000
          xmm05 0x00000000000000000000000000000000
          xmm06 0x00000000000000000000000000000000
          xmm07 0x00000000000000000000000000000000
          xmm08 0x00000000000000000000000000000000
          xmm09 0x00000000000000000000000000000000
          xmm10 0x00000000000000000000000000000000
          xmm11 0x00000000000000000000000000000000
          xmm12 0x00000000000000000000000000000000
          xmm13 0x00000000000000000000000000000000
          xmm14 0x00000000000000000000000000000000
          xmm15 0x00000000000000000000000000000000
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
Entry 2: type 2 instance 1, length 1024
    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
            rcx 0x0000000000000000     rdx 0x0000000000000000
            rbp 0x0000000000000001     rsi 0x0000000000000001
            rdi 0xffffffff802e5658     rsp 0xffff81000708fef0
             r8 0xffff81000708e000      r9 0x000000000000003f
            r10 0xffff8100070a0008     r11 0xffff810006b5a480
            r12 0x00000000000000ff     r13 0xffffffff803a6080
            r14 0x0000000000000100     r15 0xffffffff803c8280
            rip 0xffffffff8006ad64  rflags 0x0000000000000246
            cr0 0x000000008005003b     cr2 0x0000000000866290
            cr3 0x0000000000201000     cr4 0x00000000000006e0
            dr0 0x0000000000000000     dr1 0x0000000000000000
            dr2 0x0000000000000000     dr3 0x0000000000000000
            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
             gs 0x00000000 (0xffff810007080b40 + 0xffffffff / 0x00c00)
             ss 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
             tr 0x00000040 (0xffff81000103b580 + 0x0000206f / 0x0008b)
           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
           itdr            (0xffffffff8041d000 + 0x00000fff)
           gdtr            (0xffff810007085000 + 0x00000080)
    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
      shadow gs 0x0000000000000000
      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
           star 0x0023001000000000  cstar 0xffffffff80061584
         sfmask 0x0000000000003700   efer 0x0000000000000d01
            tsc 0x0000001ffb11d380
          event 0x00000000 error 0x00000000
    FPU:    fcw 0x037f fsw 0x0000
            ftw 0x00 (0x00) fop 0x0000
          fpuip 0x0000000000000000 fpudp 0x0000000000000000
          mxcsr 0x00001fa0 mask 0x0000ffff
            mm0 0x00000000000000000000 (0x000000000000)
            mm1 0x00000000000000000000 (0x000000000000)
            mm2 0x00000000000000000000 (0x000000000000)
            mm3 0x00000000000000000000 (0x000000000000)
            mm4 0x00000000000000000000 (0x000000000000)
            mm5 0x00000000000000000000 (0x000000000000)
            mm6 0x00000000000000000000 (0x000000000000)
            mm7 0x00000000000000000000 (0x000000000000)
          xmm00 0x00000000000000003fe333333f19999a
          xmm01 0x00000000000000000000000040266666
          xmm02 0x00000000000000000000000000000000
          xmm03 0x00000000000000000000000000000000
          xmm04 0x00000000000000000000000000000000
          xmm05 0x00000000000000000000000000000000
          xmm06 0x00000000000000000000000000000000
          xmm07 0x00000000000000000000000000000000
          xmm08 0x00000000000000000000000000000000
          xmm09 0x00000000000000000000000000000000
          xmm10 0x00000000000000000000000000000000
          xmm11 0x00000000000000000000000000000000
          xmm12 0x00000000000000000000000000000000
          xmm13 0x00000000000000000000000000000000
          xmm14 0x00000000000000000000000000000000
          xmm15 0x00000000000000000000000000000000
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
               (0x00000000000000000000000000000000)
Entry 3: type 3 instance 0, length 8
    PIC: IRQ base 0x20, irr 0x1, imr 0xfa, isr 0
         init_state 0, priority_add 0, readsel_isr 0, poll 0
         auto_eoi 1, rotate_on_auto_eoi 0
         special_fully_nested_mode 0, special_mask_mode 0
         is_master 1, elcr 0x24, int_output 0x1
Entry 4: type 3 instance 1, length 8
    PIC: IRQ base 0x28, irr 0, imr 0xff, isr 0
         init_state 0, priority_add 0, readsel_isr 0, poll 0
         auto_eoi 0, rotate_on_auto_eoi 0
         special_fully_nested_mode 0, special_mask_mode 0
         is_master 0, elcr 0xc, int_output 0
Entry 5: type 4 instance 0, length 400
    IOAPIC: base_address 0xfec00000, ioregsel 0x1c id 0x1
            pin 00: 0x0000000000010000
            pin 01: 0x0000000000000039
            pin 02: 0x0000000000000031
            pin 03: 0x0000000000000041
            pin 04: 0x0000000000000049
            pin 05: 0x000000000001a051
            pin 06: 0x0000000000000059
            pin 07: 0x0000000000000061
            pin 08: 0x0000000000000069
            pin 09: 0x0000000000000071
            pin 10: 0x000000000001a079
            pin 11: 0x000000000001a081
            pin 12: 0x0000000000000089
            pin 13: 0x0000000000000091
            pin 14: 0x0000000000000099
            pin 15: 0x00000000000000a1
            pin 16: 0x0000000000010000
            pin 17: 0x0000000000010000
            pin 18: 0x0000000000010000
            pin 19: 0x0000000000010000
            pin 20: 0x0000000000010000
            pin 21: 0x0000000000010000
            pin 22: 0x0000000000010000
            pin 23: 0x0000000000010000
            pin 24: 0x0000000000010000
            pin 25: 0x0000000000010000
            pin 26: 0x0000000000010000
            pin 27: 0x0000000000010000
            pin 28: 0x0000000000010000
            pin 29: 0x0000000000010000
            pin 30: 0x0000000000010000
            pin 31: 0x0000000000010000
            pin 32: 0x0000000000010000
            pin 33: 0x0000000000010000
            pin 34: 0x0000000000010000
            pin 35: 0x0000000000010000
            pin 36: 0x0000000000010000
            pin 37: 0x0000000000010000
            pin 38: 0x0000000000010000
            pin 39: 0x0000000000010000
            pin 40: 0x0000000000010000
            pin 41: 0x0000000000010000
            pin 42: 0x0000000000010000
            pin 43: 0x0000000000010000
            pin 44: 0x0000000000010000
            pin 45: 0x0000000000010000
            pin 46: 0x0000000000010000
            pin 47: 0x0000000000010000
Entry 6: type 5 instance 0, length 16
    LAPIC: base_msr 0xfee00900, disabled 0, timer_divisor 0x10
Entry 7: type 5 instance 1, length 16
    LAPIC: base_msr 0xfee00800, disabled 0, timer_divisor 0x10
Entry 8: type 6 instance 0, length 1024
    LAPIC registers:
          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
          0x0020: 0x0000000000000000   0x0030: 0x0000000000050014
          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
          0x00c0: 0x0000000000000000   0x00d0: 0x0000000001000000
          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
          0x0300: 0x00000000000000fc   0x0310: 0x0000000002000000
          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
          0x0360: 0x0000000000000400   0x0370: 0x00000000000000fe
          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
Entry 9: type 6 instance 1, length 1024
    LAPIC registers:
          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
          0x0020: 0x0000000002000000   0x0030: 0x0000000000050014
          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
          0x00c0: 0x0000000000000000   0x00d0: 0x0000000002000000
          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
          0x0300: 0x00000000000000fd   0x0310: 0x0000000000000000
          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
          0x0360: 0x0000000000010400   0x0370: 0x00000000000000fe
          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
Entry 10: type 7 instance 0, length 16
    PCI IRQs: 0x00000000000000000000000000000000
Entry 11: type 8 instance 0, length 8
    ISA IRQs: 0x0001
Entry 12: type 9 instance 0, length 8
    PCI LINK: 5 10 11 5
Entry 13: type 10 instance 0, length 56
    PIT: speaker off
         ch 0: count 0x4a9, latched_count 0x4a7, count_latched 0
               status 0, status_latched 0
               rd_state 0x3, wr_state 0x3, wr_latch 0xa9, rw_mode 0x3
               mode 0x2, bcd 0, gate 0x1
         ch 1: count 0x10000, latched_count 0, count_latched 0
               status 0, status_latched 0
               rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
               mode 0xff, bcd 0, gate 0x1
Entry 14: type 11 instance 0, length 16
    RTC: regs 0x26 0x00 0x36 0x00 0x18 0x00 0x03 0x07
              0x07 0x10 0x26 0x02 0x00 0x80, index 0x10
Entry 15: type 12 instance 0, length 1048
    HPET: capability 0xf424008086a201 config 0
          isr 0 counter 0xbfed04a9
          timer0 config 0xf0000000000030 cmp 0
          timer0 period 0 fsb 0
          timer1 config 0xf0000000000030 cmp 0
          timer1 period 0 fsb 0
          timer2 config 0xf0000000000030 cmp 0
          timer2 period 0 fsb 0
Entry 16: type 13 instance 0, length 8
    ACPI PM: TMR_VAL 0x8fff, PM1a_STS 0x0, PM1a_EN 0x0
Entry 17: type 14 instance 0, length 240
    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
          var 0 0x00000000f0000000 0x0000000ff8000800
          var 1 0x00000000f8000000 0x0000000ffc000800
          var 2 0x0000000000000000 0x0000000000000000
          var 3 0x0000000000000000 0x0000000000000000
          var 4 0x0000000000000000 0x0000000000000000
          var 5 0x0000000000000000 0x0000000000000000
          var 6 0x0000000000000000 0x0000000000000000
          var 7 0x0000000000000000 0x0000000000000000
          fixed 00 0x0606060606060606
          fixed 01 0x0606060606060606
          fixed 02 0x0101010101010101
          fixed 03 0x0606060606060606
          fixed 04 0x0606060606060606
          fixed 05 0x0606060606060606
          fixed 06 0x0606060606060606
          fixed 07 0x0606060606060606
          fixed 08 0x0606060606060606
          fixed 09 0x0606060606060606
          fixed 10 0x0606060606060606
Entry 18: type 14 instance 1, length 240
    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
          var 0 0x00000000f0000000 0x0000000ff8000800
          var 1 0x00000000f8000000 0x0000000ffc000800
          var 2 0x0000000000000000 0x0000000000000000
          var 3 0x0000000000000000 0x0000000000000000
          var 4 0x0000000000000000 0x0000000000000000
          var 5 0x0000000000000000 0x0000000000000000
          var 6 0x0000000000000000 0x0000000000000000
          var 7 0x0000000000000000 0x0000000000000000
          fixed 00 0x0606060606060606
          fixed 01 0x0606060606060606
          fixed 02 0x0101010101010101
          fixed 03 0x0606060606060606
          fixed 04 0x0606060606060606
          fixed 05 0x0606060606060606
          fixed 06 0x0606060606060606
          fixed 07 0x0606060606060606
          fixed 08 0x0606060606060606
          fixed 09 0x0606060606060606
          fixed 10 0x0606060606060606
Entry 19: type 0 instance 0, length 0


* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-07 18:42 HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1 Gianni Tedesco
@ 2010-07-08 10:03 ` George Dunlap
  2010-07-08 11:55   ` Gianni Tedesco
  2010-07-12 14:04 ` Konrad Rzeszutek Wilk
  2010-07-21 17:29 ` Gianni Tedesco
  2 siblings, 1 reply; 18+ messages in thread
From: George Dunlap @ 2010-07-08 10:03 UTC (permalink / raw)
  To: Gianni Tedesco; +Cc: Xen Devel

If both CPUs are idling with EFLAGS.IF=1, that would imply that the
kernel thinks it's waiting on a device, yes?  One thing you could do
is trace the interaction between the guest and its devices, and see
if you can figure out what it's waiting for and why the thing it's
waiting for isn't happening.  You can use xentrace + xenalyze
(http://xenbits.xensource.com/ext/xenalyze.hg) to see all the PIO,
MMIO, and interrupts delivered to the guest.
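
For reference, here is a minimal sketch of how the rflags value in the
dump decodes.  It is plain Python and not part of xen-hvmctx; the
helper name decode_rflags is made up for illustration.  Bit 9 is IF,
so the 0x246 reported for both VCPUs means interrupts are enabled.

```python
# Architectural flag bits in the low 12 bits of RFLAGS (x86).
RFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_rflags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(RFLAGS_BITS.items())
            if value & (1 << bit)]


# Value reported by xen-hvmctx for both VCPUs in the dump.
flags = decode_rflags(0x246)
print(flags)  # ['PF', 'ZF', 'IF']
```

Bit 1 is also set in 0x246, but it is reserved and always reads as 1,
so the sketch leaves it out.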

Unfortunately this would mean understanding, at some level, the
interface the device presents, which may involve a lot of digging
through driver code and QEMU, which doesn't sound fun. :-/
Maybe someone else will have some suggestions...

I ended up with a similar-looking problem during boot with a stock
2.6.18.8 kernel, after hacking up a work-around to allow it to get
past the timer synchronization stage.  It might be easier to track
down if you have a failure mode that's quicker to reproduce and a
guest kernel that's easier to modify.  (But of course there's always
the possibility that it's a different bug with similar symptoms...)

 -George


On Wed, Jul 7, 2010 at 7:42 PM, Gianni Tedesco
<gianni.tedesco@citrix.com> wrote:
> Hi,
>
> I've spent a few weeks investigating a very reproducible guest-hangs bug
> which appears to affect all hypervisors from at least 3.4.2 through 4.0
> to unstable.
>
> To reproduce setup an RHEL5.2 guest for kickstart network install
> something like this:
>
> vmlinuz ks=nfs:1.2.3.4:ks-rhel52.cfg ksdevice=eth0 console=tty0
>        console=ttyS0,9600n8 serial initrd=initrd.img root=/dev/ram0
>
> With a vm profile something like this:
>
> kernel = "/usr/lib/xen/boot/hvmloader"
> builder = 'hvm'
> memory = 128
> name = "RHEL5.2-ks"
> vcpus = 2
> vif = [ 'type=ioemu,bridge=xenbr0,mac=00:26:b9:87:0e:d3' ]
> disk = [ 'phy:/dev/sdb1,hda,w' ]
> device_model = '/usr/lib/xen/bin/qemu-dm'
>
> As long as VCPU's > 1 the guest repeatedly hangs after userspace has
> started. In all cases hvmctx reports that the kernel is spinning away in
> cpu_idle() as if waiting for an interrupt and EFLAGS.IF = 1. If a key is
> pressed either on kb or via serial the system unhangs itself. The system
> still responds to network traffic (eg. ping) during this time but that
> doesn't unhang it.
>
> I have ruled out all the usual suspects, timer modes, vpt_align, guest
> kernel clock sources, acpi, hpet, hap and oos. With a little debugging I
> was able to show that timer IRQ's from HPET as well as RESCHED IPI's
> were still getting delivered during the hangs. Full ctx dump follows.
>
> Help! :)
>
> HVM save record for domain 94
> Entry 0: type 1 instance 0, length 24
>     Header: magic 0x54381286, version 1
>             Xen changeset 0
>             CPUID[0][%eax] 0x000106a5
>             gtsc_khz 2666735
> Entry 1: type 2 instance 0, length 1024
>    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
>            rcx 0x0000000000000000     rdx 0x0000000000000000
>            rbp 0x0000000000030000     rsi 0x0000000000000001
>            rdi 0xffffffff802e5658     rsp 0xffffffff803cff90
>             r8 0xffffffff803ce000      r9 0x000000000000003e
>            r10 0xffff8100070a0038     r11 0xffff81000769f7a0
>            r12 0x0000000000000000     r13 0x0000000000000000
>            r14 0x0000000000000000     r15 0x0000000000000000
>            rip 0xffffffff8006ad64  rflags 0x0000000000000246
>            cr0 0x000000008005003b     cr2 0x000000000042cc00
>            cr3 0x0000000006bf9000     cr4 0x00000000000006e0
>            dr0 0x0000000000000000     dr1 0x0000000000000000
>            dr2 0x0000000000000000     dr3 0x0000000000000000
>            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
>             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
>             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>             gs 0x00000000 (0xffffffff8039e000 + 0xffffffff / 0x00c00)
>             ss 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             tr 0x00000040 (0xffff810001033000 + 0x0000206f / 0x0008b)
>           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>           itdr            (0xffffffff8041d000 + 0x00000fff)
>           gdtr            (0xffffffff803d0000 + 0x00000080)
>    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
>      shadow gs 0x0000000000000000
>      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
>           star 0x0023001000000000  cstar 0xffffffff80061584
>         sfmask 0x0000000000003700   efer 0x0000000000000d01
>            tsc 0x0000001af07d0e03
>          event 0x00000000 error 0x00000000
>    FPU:    fcw 0x037f fsw 0x0000
>            ftw 0x00 (0x00) fop 0x0000
>          fpuip 0x0000000000000000 fpudp 0x0000000000000000
>          mxcsr 0x00001fa0 mask 0x0000ffff
>            mm0 0x00000000000000000000 (0x000000000000)
>            mm1 0x00000000000000000000 (0x000000000000)
>            mm2 0x00000000000000000000 (0x000000000000)
>            mm3 0x00000000000000000000 (0x000000000000)
>            mm4 0x00000000000000000000 (0x000000000000)
>            mm5 0x00000000000000000000 (0x000000000000)
>            mm6 0x00000000000000000000 (0x000000000000)
>            mm7 0x00000000000000000000 (0x000000000000)
>          xmm00 0x00000000000000003fe333333f19999a
>          xmm01 0x00000000000000000000000040266666
>          xmm02 0x00000000000000000000000000000000
>          xmm03 0x00000000000000000000000000000000
>          xmm04 0x00000000000000000000000000000000
>          xmm05 0x00000000000000000000000000000000
>          xmm06 0x00000000000000000000000000000000
>          xmm07 0x00000000000000000000000000000000
>          xmm08 0x00000000000000000000000000000000
>          xmm09 0x00000000000000000000000000000000
>          xmm10 0x00000000000000000000000000000000
>          xmm11 0x00000000000000000000000000000000
>          xmm12 0x00000000000000000000000000000000
>          xmm13 0x00000000000000000000000000000000
>          xmm14 0x00000000000000000000000000000000
>          xmm15 0x00000000000000000000000000000000
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
> Entry 2: type 2 instance 1, length 1024
>    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
>            rcx 0x0000000000000000     rdx 0x0000000000000000
>            rbp 0x0000000000000001     rsi 0x0000000000000001
>            rdi 0xffffffff802e5658     rsp 0xffff81000708fef0
>             r8 0xffff81000708e000      r9 0x000000000000003f
>            r10 0xffff8100070a0008     r11 0xffff810006b5a200
>            r12 0x00000000000000ff     r13 0xffffffff803a6080
>            r14 0x0000000000000100     r15 0xffffffff803c8280
>            rip 0xffffffff8006ad64  rflags 0x0000000000000246
>            cr0 0x000000008005003b     cr2 0x0000000000866290
>            cr3 0x0000000000201000     cr4 0x00000000000006e0
>            dr0 0x0000000000000000     dr1 0x0000000000000000
>            dr2 0x0000000000000000     dr3 0x0000000000000000
>            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
>             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
>             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>             gs 0x00000000 (0xffff810007080b40 + 0xffffffff / 0x00c00)
>             ss 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>             tr 0x00000040 (0xffff81000103b580 + 0x0000206f / 0x0008b)
>           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>           itdr            (0xffffffff8041d000 + 0x00000fff)
>           gdtr            (0xffff810007085000 + 0x00000080)
>    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
>      shadow gs 0x0000000000000000
>      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
>           star 0x0023001000000000  cstar 0xffffffff80061584
>         sfmask 0x0000000000003700   efer 0x0000000000000d01
>            tsc 0x0000001af07d53b6
>          event 0x00000000 error 0x00000000
>    FPU:    fcw 0x037f fsw 0x0000
>            ftw 0x00 (0x00) fop 0x0000
>          fpuip 0x0000000000000000 fpudp 0x0000000000000000
>          mxcsr 0x00001fa0 mask 0x0000ffff
>            mm0 0x00000000000000000000 (0x000000000000)
>            mm1 0x00000000000000000000 (0x000000000000)
>            mm2 0x00000000000000000000 (0x000000000000)
>            mm3 0x00000000000000000000 (0x000000000000)
>            mm4 0x00000000000000000000 (0x000000000000)
>            mm5 0x00000000000000000000 (0x000000000000)
>            mm6 0x00000000000000000000 (0x000000000000)
>            mm7 0x00000000000000000000 (0x000000000000)
>          xmm00 0x00000000000000003fe333333f19999a
>          xmm01 0x00000000000000000000000040266666
>          xmm02 0x00000000000000000000000000000000
>          xmm03 0x00000000000000000000000000000000
>          xmm04 0x00000000000000000000000000000000
>          xmm05 0x00000000000000000000000000000000
>          xmm06 0x00000000000000000000000000000000
>          xmm07 0x00000000000000000000000000000000
>          xmm08 0x00000000000000000000000000000000
>          xmm09 0x00000000000000000000000000000000
>          xmm10 0x00000000000000000000000000000000
>          xmm11 0x00000000000000000000000000000000
>          xmm12 0x00000000000000000000000000000000
>          xmm13 0x00000000000000000000000000000000
>          xmm14 0x00000000000000000000000000000000
>          xmm15 0x00000000000000000000000000000000
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
> Entry 3: type 3 instance 0, length 8
>    PIC: IRQ base 0x20, irr 0x1, imr 0xfa, isr 0
>         init_state 0, priority_add 0, readsel_isr 0, poll 0
>         auto_eoi 1, rotate_on_auto_eoi 0
>         special_fully_nested_mode 0, special_mask_mode 0
>         is_master 1, elcr 0x24, int_output 0x1
> Entry 4: type 3 instance 1, length 8
>    PIC: IRQ base 0x28, irr 0, imr 0xff, isr 0
>         init_state 0, priority_add 0, readsel_isr 0, poll 0
>         auto_eoi 0, rotate_on_auto_eoi 0
>         special_fully_nested_mode 0, special_mask_mode 0
>         is_master 0, elcr 0xc, int_output 0
> Entry 5: type 4 instance 0, length 400
>    IOAPIC: base_address 0xfec00000, ioregsel 0x1c id 0x1
>            pin 00: 0x0000000000010000
>            pin 01: 0x0000000000000039
>            pin 02: 0x0000000000000031
>            pin 03: 0x0000000000000041
>            pin 04: 0x0000000000000049
>            pin 05: 0x000000000001a051
>            pin 06: 0x0000000000000059
>            pin 07: 0x0000000000000061
>            pin 08: 0x0000000000000069
>            pin 09: 0x0000000000000071
>            pin 10: 0x000000000001a079
>            pin 11: 0x000000000001a081
>            pin 12: 0x0000000000000089
>            pin 13: 0x0000000000000091
>            pin 14: 0x0000000000000099
>            pin 15: 0x00000000000000a1
>            pin 16: 0x0000000000010000
>            pin 17: 0x0000000000010000
>            pin 18: 0x0000000000010000
>            pin 19: 0x0000000000010000
>            pin 20: 0x0000000000010000
>            pin 21: 0x0000000000010000
>            pin 22: 0x0000000000010000
>            pin 23: 0x0000000000010000
>            pin 24: 0x0000000000010000
>            pin 25: 0x0000000000010000
>            pin 26: 0x0000000000010000
>            pin 27: 0x0000000000010000
>            pin 28: 0x0000000000010000
>            pin 29: 0x0000000000010000
>            pin 30: 0x0000000000010000
>            pin 31: 0x0000000000010000
>            pin 32: 0x0000000000010000
>            pin 33: 0x0000000000010000
>            pin 34: 0x0000000000010000
>            pin 35: 0x0000000000010000
>            pin 36: 0x0000000000010000
>            pin 37: 0x0000000000010000
>            pin 38: 0x0000000000010000
>            pin 39: 0x0000000000010000
>            pin 40: 0x0000000000010000
>            pin 41: 0x0000000000010000
>            pin 42: 0x0000000000010000
>            pin 43: 0x0000000000010000
>            pin 44: 0x0000000000010000
>            pin 45: 0x0000000000010000
>            pin 46: 0x0000000000010000
>            pin 47: 0x0000000000010000
> Entry 6: type 5 instance 0, length 16
>    LAPIC: base_msr 0xfee00900, disabled 0, timer_divisor 0x10
> Entry 7: type 5 instance 1, length 16
>    LAPIC: base_msr 0xfee00800, disabled 0, timer_divisor 0x10
> Entry 8: type 6 instance 0, length 1024
>    LAPIC registers:
>          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
>          0x0020: 0x0000000000000000   0x0030: 0x0000000000050014
>          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
>          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
>          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
>          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
>          0x00c0: 0x0000000000000000   0x00d0: 0x0000000001000000
>          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
>          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
>          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
>          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
>          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
>          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
>          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
>          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
>          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
>          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
>          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
>          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
>          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
>          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
>          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
>          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
>          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
>          0x0300: 0x00000000000000fc   0x0310: 0x0000000002000000
>          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
>          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
>          0x0360: 0x0000000000000400   0x0370: 0x00000000000000fe
>          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
>          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
>          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
>          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
> Entry 9: type 6 instance 1, length 1024
>    LAPIC registers:
>          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
>          0x0020: 0x0000000002000000   0x0030: 0x0000000000050014
>          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
>          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
>          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
>          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
>          0x00c0: 0x0000000000000000   0x00d0: 0x0000000002000000
>          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
>          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
>          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
>          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
>          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
>          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
>          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
>          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
>          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
>          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
>          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
>          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
>          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
>          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
>          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
>          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
>          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
>          0x0300: 0x00000000000000fd   0x0310: 0x0000000000000000
>          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
>          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
>          0x0360: 0x0000000000010400   0x0370: 0x00000000000000fe
>          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
>          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
>          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
>          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
> Entry 10: type 7 instance 0, length 16
>    PCI IRQs: 0x00000000000000000000000000000000
> Entry 11: type 8 instance 0, length 8
>    ISA IRQs: 0x0001
> Entry 12: type 9 instance 0, length 8
>    PCI LINK: 5 10 11 5
> Entry 13: type 10 instance 0, length 56
>    PIT: speaker off
>         ch 0: count 0x4a9, latched_count 0x4a5, count_latched 0
>               status 0, status_latched 0
>               rd_state 0x3, wr_state 0x3, wr_latch 0xa9, rw_mode 0x3
>               mode 0x2, bcd 0, gate 0x1
>         ch 1: count 0x10000, latched_count 0, count_latched 0
>               status 0, status_latched 0
>               rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
>               mode 0xff, bcd 0, gate 0x1
> Entry 14: type 11 instance 0, length 16
>    RTC: regs 0x18 0x00 0x36 0x00 0x18 0x00 0x03 0x07
>              0x07 0x10 0x26 0x02 0x00 0x80, index 0x10
> Entry 15: type 12 instance 0, length 1048
>    HPET: capability 0xf424008086a201 config 0
>          isr 0 counter 0xa1ad6b9c
>          timer0 config 0xf0000000000030 cmp 0
>          timer0 period 0 fsb 0
>          timer1 config 0xf0000000000030 cmp 0
>          timer1 period 0 fsb 0
>          timer2 config 0xf0000000000030 cmp 0
>          timer2 period 0 fsb 0
> Entry 16: type 13 instance 0, length 8
>    ACPI PM: TMR_VAL 0x8fff, PM1a_STS 0x0, PM1a_EN 0x0
> Entry 17: type 14 instance 0, length 240
>    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
>          var 0 0x00000000f0000000 0x0000000ff8000800
>          var 1 0x00000000f8000000 0x0000000ffc000800
>          var 2 0x0000000000000000 0x0000000000000000
>          var 3 0x0000000000000000 0x0000000000000000
>          var 4 0x0000000000000000 0x0000000000000000
>          var 5 0x0000000000000000 0x0000000000000000
>          var 6 0x0000000000000000 0x0000000000000000
>          var 7 0x0000000000000000 0x0000000000000000
>          fixed 00 0x0606060606060606
>          fixed 01 0x0606060606060606
>          fixed 02 0x0101010101010101
>          fixed 03 0x0606060606060606
>          fixed 04 0x0606060606060606
>          fixed 05 0x0606060606060606
>          fixed 06 0x0606060606060606
>          fixed 07 0x0606060606060606
>          fixed 08 0x0606060606060606
>          fixed 09 0x0606060606060606
>          fixed 10 0x0606060606060606
> Entry 18: type 14 instance 1, length 240
>    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
>          var 0 0x00000000f0000000 0x0000000ff8000800
>          var 1 0x00000000f8000000 0x0000000ffc000800
>          var 2 0x0000000000000000 0x0000000000000000
>          var 3 0x0000000000000000 0x0000000000000000
>          var 4 0x0000000000000000 0x0000000000000000
>          var 5 0x0000000000000000 0x0000000000000000
>          var 6 0x0000000000000000 0x0000000000000000
>          var 7 0x0000000000000000 0x0000000000000000
>          fixed 00 0x0606060606060606
>          fixed 01 0x0606060606060606
>          fixed 02 0x0101010101010101
>          fixed 03 0x0606060606060606
>          fixed 04 0x0606060606060606
>          fixed 05 0x0606060606060606
>          fixed 06 0x0606060606060606
>          fixed 07 0x0606060606060606
>          fixed 08 0x0606060606060606
>          fixed 09 0x0606060606060606
>          fixed 10 0x0606060606060606
> Entry 19: type 0 instance 0, length 0
> [scara@habil xen-unstable.hg]$ sudo xen-hvmctx 94 | grep rip
>            rip 0xffffffff8006ad64  rflags 0x0000000000000246
>            rip 0xffffffff8006ad64  rflags 0x0000000000000246
> [scara@habil xen-unstable.hg]$ sudo xen-hvmctx 94
> HVM save record for domain 94
> Entry 0: type 1 instance 0, length 24
>     Header: magic 0x54381286, version 1
>             Xen changeset 0
>             CPUID[0][%eax] 0x000106a5
>             gtsc_khz 2666735
> Entry 1: type 2 instance 0, length 1024
>    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
>            rcx 0x0000000000000000     rdx 0x0000000000000000
>            rbp 0x0000000000030000     rsi 0x0000000000000001
>            rdi 0xffffffff802e5658     rsp 0xffffffff803cff90
>             r8 0xffffffff803ce000      r9 0x000000000000003e
>            r10 0xffff8100070a0038     r11 0xffff81000769f7a0
>            r12 0x0000000000000000     r13 0x0000000000000000
>            r14 0x0000000000000000     r15 0x0000000000000000
>            rip 0xffffffff8006ad64  rflags 0x0000000000000246
>            cr0 0x000000008005003b     cr2 0x000000000042cc00
>            cr3 0x0000000006bf9000     cr4 0x00000000000006e0
>            dr0 0x0000000000000000     dr1 0x0000000000000000
>            dr2 0x0000000000000000     dr3 0x0000000000000000
>            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
>             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
>             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>             gs 0x00000000 (0xffffffff8039e000 + 0xffffffff / 0x00c00)
>             ss 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             tr 0x00000040 (0xffff810001033000 + 0x0000206f / 0x0008b)
>           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>           itdr            (0xffffffff8041d000 + 0x00000fff)
>           gdtr            (0xffffffff803d0000 + 0x00000080)
>    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
>      shadow gs 0x0000000000000000
>      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
>           star 0x0023001000000000  cstar 0xffffffff80061584
>         sfmask 0x0000000000003700   efer 0x0000000000000d01
>            tsc 0x0000001ffb1191c5
>          event 0x00000000 error 0x00000000
>    FPU:    fcw 0x037f fsw 0x0000
>            ftw 0x00 (0x00) fop 0x0000
>          fpuip 0x0000000000000000 fpudp 0x0000000000000000
>          mxcsr 0x00001fa0 mask 0x0000ffff
>            mm0 0x00000000000000000000 (0x000000000000)
>            mm1 0x00000000000000000000 (0x000000000000)
>            mm2 0x00000000000000000000 (0x000000000000)
>            mm3 0x00000000000000000000 (0x000000000000)
>            mm4 0x00000000000000000000 (0x000000000000)
>            mm5 0x00000000000000000000 (0x000000000000)
>            mm6 0x00000000000000000000 (0x000000000000)
>            mm7 0x00000000000000000000 (0x000000000000)
>          xmm00 0x00000000000000003fe333333f19999a
>          xmm01 0x00000000000000000000000040266666
>          xmm02 0x00000000000000000000000000000000
>          xmm03 0x00000000000000000000000000000000
>          xmm04 0x00000000000000000000000000000000
>          xmm05 0x00000000000000000000000000000000
>          xmm06 0x00000000000000000000000000000000
>          xmm07 0x00000000000000000000000000000000
>          xmm08 0x00000000000000000000000000000000
>          xmm09 0x00000000000000000000000000000000
>          xmm10 0x00000000000000000000000000000000
>          xmm11 0x00000000000000000000000000000000
>          xmm12 0x00000000000000000000000000000000
>          xmm13 0x00000000000000000000000000000000
>          xmm14 0x00000000000000000000000000000000
>          xmm15 0x00000000000000000000000000000000
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
> Entry 2: type 2 instance 1, length 1024
>    CPU:    rax 0x0000000000000000     rbx 0xffffffff8006ad3b
>            rcx 0x0000000000000000     rdx 0x0000000000000000
>            rbp 0x0000000000000001     rsi 0x0000000000000001
>            rdi 0xffffffff802e5658     rsp 0xffff81000708fef0
>             r8 0xffff81000708e000      r9 0x000000000000003f
>            r10 0xffff8100070a0008     r11 0xffff810006b5a480
>            r12 0x00000000000000ff     r13 0xffffffff803a6080
>            r14 0x0000000000000100     r15 0xffffffff803c8280
>            rip 0xffffffff8006ad64  rflags 0x0000000000000246
>            cr0 0x000000008005003b     cr2 0x0000000000866290
>            cr3 0x0000000000201000     cr4 0x00000000000006e0
>            dr0 0x0000000000000000     dr1 0x0000000000000000
>            dr2 0x0000000000000000     dr3 0x0000000000000000
>            dr6 0x00000000ffff0ff0     dr7 0x0000000000000400
>             cs 0x00000010 (0x0000000000000000 + 0xffffffff / 0x00a9b)
>             ds 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             es 0x00000018 (0x0000000000000000 + 0xffffffff / 0x00c93)
>             fs 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>             gs 0x00000000 (0xffff810007080b40 + 0xffffffff / 0x00c00)
>             ss 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>             tr 0x00000040 (0xffff81000103b580 + 0x0000206f / 0x0008b)
>           ldtr 0x00000000 (0x0000000000000000 + 0xffffffff / 0x00c00)
>           itdr            (0xffffffff8041d000 + 0x00000fff)
>           gdtr            (0xffff810007085000 + 0x00000080)
>    sysenter cs 0x00000010  eip 0xffffffff80061408  esp 0x0000000000000000
>      shadow gs 0x0000000000000000
>      MSR flags 0x0000000000000007  lstar 0xffffffff8005d098
>           star 0x0023001000000000  cstar 0xffffffff80061584
>         sfmask 0x0000000000003700   efer 0x0000000000000d01
>            tsc 0x0000001ffb11d380
>          event 0x00000000 error 0x00000000
>    FPU:    fcw 0x037f fsw 0x0000
>            ftw 0x00 (0x00) fop 0x0000
>          fpuip 0x0000000000000000 fpudp 0x0000000000000000
>          mxcsr 0x00001fa0 mask 0x0000ffff
>            mm0 0x00000000000000000000 (0x000000000000)
>            mm1 0x00000000000000000000 (0x000000000000)
>            mm2 0x00000000000000000000 (0x000000000000)
>            mm3 0x00000000000000000000 (0x000000000000)
>            mm4 0x00000000000000000000 (0x000000000000)
>            mm5 0x00000000000000000000 (0x000000000000)
>            mm6 0x00000000000000000000 (0x000000000000)
>            mm7 0x00000000000000000000 (0x000000000000)
>          xmm00 0x00000000000000003fe333333f19999a
>          xmm01 0x00000000000000000000000040266666
>          xmm02 0x00000000000000000000000000000000
>          xmm03 0x00000000000000000000000000000000
>          xmm04 0x00000000000000000000000000000000
>          xmm05 0x00000000000000000000000000000000
>          xmm06 0x00000000000000000000000000000000
>          xmm07 0x00000000000000000000000000000000
>          xmm08 0x00000000000000000000000000000000
>          xmm09 0x00000000000000000000000000000000
>          xmm10 0x00000000000000000000000000000000
>          xmm11 0x00000000000000000000000000000000
>          xmm12 0x00000000000000000000000000000000
>          xmm13 0x00000000000000000000000000000000
>          xmm14 0x00000000000000000000000000000000
>          xmm15 0x00000000000000000000000000000000
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
>               (0x00000000000000000000000000000000)
> Entry 3: type 3 instance 0, length 8
>    PIC: IRQ base 0x20, irr 0x1, imr 0xfa, isr 0
>         init_state 0, priority_add 0, readsel_isr 0, poll 0
>         auto_eoi 1, rotate_on_auto_eoi 0
>         special_fully_nested_mode 0, special_mask_mode 0
>         is_master 1, elcr 0x24, int_output 0x1
> Entry 4: type 3 instance 1, length 8
>    PIC: IRQ base 0x28, irr 0, imr 0xff, isr 0
>         init_state 0, priority_add 0, readsel_isr 0, poll 0
>         auto_eoi 0, rotate_on_auto_eoi 0
>         special_fully_nested_mode 0, special_mask_mode 0
>         is_master 0, elcr 0xc, int_output 0
> Entry 5: type 4 instance 0, length 400
>    IOAPIC: base_address 0xfec00000, ioregsel 0x1c id 0x1
>            pin 00: 0x0000000000010000
>            pin 01: 0x0000000000000039
>            pin 02: 0x0000000000000031
>            pin 03: 0x0000000000000041
>            pin 04: 0x0000000000000049
>            pin 05: 0x000000000001a051
>            pin 06: 0x0000000000000059
>            pin 07: 0x0000000000000061
>            pin 08: 0x0000000000000069
>            pin 09: 0x0000000000000071
>            pin 10: 0x000000000001a079
>            pin 11: 0x000000000001a081
>            pin 12: 0x0000000000000089
>            pin 13: 0x0000000000000091
>            pin 14: 0x0000000000000099
>            pin 15: 0x00000000000000a1
>            pin 16: 0x0000000000010000
>            pin 17: 0x0000000000010000
>            pin 18: 0x0000000000010000
>            pin 19: 0x0000000000010000
>            pin 20: 0x0000000000010000
>            pin 21: 0x0000000000010000
>            pin 22: 0x0000000000010000
>            pin 23: 0x0000000000010000
>            pin 24: 0x0000000000010000
>            pin 25: 0x0000000000010000
>            pin 26: 0x0000000000010000
>            pin 27: 0x0000000000010000
>            pin 28: 0x0000000000010000
>            pin 29: 0x0000000000010000
>            pin 30: 0x0000000000010000
>            pin 31: 0x0000000000010000
>            pin 32: 0x0000000000010000
>            pin 33: 0x0000000000010000
>            pin 34: 0x0000000000010000
>            pin 35: 0x0000000000010000
>            pin 36: 0x0000000000010000
>            pin 37: 0x0000000000010000
>            pin 38: 0x0000000000010000
>            pin 39: 0x0000000000010000
>            pin 40: 0x0000000000010000
>            pin 41: 0x0000000000010000
>            pin 42: 0x0000000000010000
>            pin 43: 0x0000000000010000
>            pin 44: 0x0000000000010000
>            pin 45: 0x0000000000010000
>            pin 46: 0x0000000000010000
>            pin 47: 0x0000000000010000
> Entry 6: type 5 instance 0, length 16
>    LAPIC: base_msr 0xfee00900, disabled 0, timer_divisor 0x10
> Entry 7: type 5 instance 1, length 16
>    LAPIC: base_msr 0xfee00800, disabled 0, timer_divisor 0x10
> Entry 8: type 6 instance 0, length 1024
>    LAPIC registers:
>          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
>          0x0020: 0x0000000000000000   0x0030: 0x0000000000050014
>          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
>          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
>          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
>          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
>          0x00c0: 0x0000000000000000   0x00d0: 0x0000000001000000
>          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
>          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
>          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
>          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
>          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
>          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
>          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
>          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
>          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
>          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
>          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
>          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
>          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
>          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
>          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
>          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
>          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
>          0x0300: 0x00000000000000fc   0x0310: 0x0000000002000000
>          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
>          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
>          0x0360: 0x0000000000000400   0x0370: 0x00000000000000fe
>          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
>          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
>          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
>          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
> Entry 9: type 6 instance 1, length 1024
>    LAPIC registers:
>          0x0000: 0x0000000000000000   0x0010: 0x0000000000000000
>          0x0020: 0x0000000002000000   0x0030: 0x0000000000050014
>          0x0040: 0x0000000000000000   0x0050: 0x0000000000000000
>          0x0060: 0x0000000000000000   0x0070: 0x0000000000000000
>          0x0080: 0x0000000000000000   0x0090: 0x0000000000000000
>          0x00a0: 0x0000000000000000   0x00b0: 0x0000000000000000
>          0x00c0: 0x0000000000000000   0x00d0: 0x0000000002000000
>          0x00e0: 0x00000000ffffffff   0x00f0: 0x00000000000001ff
>          0x0100: 0x0000000000000000   0x0110: 0x0000000000000000
>          0x0120: 0x0000000000000000   0x0130: 0x0000000000000000
>          0x0140: 0x0000000000000000   0x0150: 0x0000000000000000
>          0x0160: 0x0000000000000000   0x0170: 0x0000000000000000
>          0x0180: 0x0000000000000000   0x0190: 0x0000000000000000
>          0x01a0: 0x0000000000000000   0x01b0: 0x0000000000000000
>          0x01c0: 0x0000000000000000   0x01d0: 0x0000000000000000
>          0x01e0: 0x0000000000000000   0x01f0: 0x0000000000000000
>          0x0200: 0x0000000000000000   0x0210: 0x0000000000000000
>          0x0220: 0x0000000000000000   0x0230: 0x0000000000000000
>          0x0240: 0x0000000000000000   0x0250: 0x0000000000000000
>          0x0260: 0x0000000000000000   0x0270: 0x0000000000000000
>          0x0280: 0x0000000000000000   0x0290: 0x0000000000000000
>          0x02a0: 0x0000000000000000   0x02b0: 0x0000000000000000
>          0x02c0: 0x0000000000000000   0x02d0: 0x0000000000000000
>          0x02e0: 0x0000000000000000   0x02f0: 0x0000000000000000
>          0x0300: 0x00000000000000fd   0x0310: 0x0000000000000000
>          0x0320: 0x00000000000200ef   0x0330: 0x0000000000010000
>          0x0340: 0x0000000000010000   0x0350: 0x0000000000000400
>          0x0360: 0x0000000000010400   0x0370: 0x00000000000000fe
>          0x0380: 0x000000000000186a   0x0390: 0x0000000000000000
>          0x03a0: 0x0000000000000000   0x03b0: 0x0000000000000000
>          0x03c0: 0x0000000000000000   0x03d0: 0x0000000000000000
>          0x03e0: 0x0000000000000003   0x03f0: 0x0000000000000000
> Entry 10: type 7 instance 0, length 16
>    PCI IRQs: 0x00000000000000000000000000000000
> Entry 11: type 8 instance 0, length 8
>    ISA IRQs: 0x0001
> Entry 12: type 9 instance 0, length 8
>    PCI LINK: 5 10 11 5
> Entry 13: type 10 instance 0, length 56
>    PIT: speaker off
>         ch 0: count 0x4a9, latched_count 0x4a7, count_latched 0
>               status 0, status_latched 0
>               rd_state 0x3, wr_state 0x3, wr_latch 0xa9, rw_mode 0x3
>               mode 0x2, bcd 0, gate 0x1
>         ch 1: count 0x10000, latched_count 0, count_latched 0
>               status 0, status_latched 0
>               rd_state 0, wr_state 0, wr_latch 0, rw_mode 0
>               mode 0xff, bcd 0, gate 0x1
> Entry 14: type 11 instance 0, length 16
>    RTC: regs 0x26 0x00 0x36 0x00 0x18 0x00 0x03 0x07
>              0x07 0x10 0x26 0x02 0x00 0x80, index 0x10
> Entry 15: type 12 instance 0, length 1048
>    HPET: capability 0xf424008086a201 config 0
>          isr 0 counter 0xbfed04a9
>          timer0 config 0xf0000000000030 cmp 0
>          timer0 period 0 fsb 0
>          timer1 config 0xf0000000000030 cmp 0
>          timer1 period 0 fsb 0
>          timer2 config 0xf0000000000030 cmp 0
>          timer2 period 0 fsb 0
> Entry 16: type 13 instance 0, length 8
>    ACPI PM: TMR_VAL 0x8fff, PM1a_STS 0x0, PM1a_EN 0x0
> Entry 17: type 14 instance 0, length 240
>    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
>          var 0 0x00000000f0000000 0x0000000ff8000800
>          var 1 0x00000000f8000000 0x0000000ffc000800
>          var 2 0x0000000000000000 0x0000000000000000
>          var 3 0x0000000000000000 0x0000000000000000
>          var 4 0x0000000000000000 0x0000000000000000
>          var 5 0x0000000000000000 0x0000000000000000
>          var 6 0x0000000000000000 0x0000000000000000
>          var 7 0x0000000000000000 0x0000000000000000
>          fixed 00 0x0606060606060606
>          fixed 01 0x0606060606060606
>          fixed 02 0x0101010101010101
>          fixed 03 0x0606060606060606
>          fixed 04 0x0606060606060606
>          fixed 05 0x0606060606060606
>          fixed 06 0x0606060606060606
>          fixed 07 0x0606060606060606
>          fixed 08 0x0606060606060606
>          fixed 09 0x0606060606060606
>          fixed 10 0x0606060606060606
> Entry 18: type 14 instance 1, length 240
>    MTRR: PAT 0x7040600070406, cap 0x508, default 0xc06
>          var 0 0x00000000f0000000 0x0000000ff8000800
>          var 1 0x00000000f8000000 0x0000000ffc000800
>          var 2 0x0000000000000000 0x0000000000000000
>          var 3 0x0000000000000000 0x0000000000000000
>          var 4 0x0000000000000000 0x0000000000000000
>          var 5 0x0000000000000000 0x0000000000000000
>          var 6 0x0000000000000000 0x0000000000000000
>          var 7 0x0000000000000000 0x0000000000000000
>          fixed 00 0x0606060606060606
>          fixed 01 0x0606060606060606
>          fixed 02 0x0101010101010101
>          fixed 03 0x0606060606060606
>          fixed 04 0x0606060606060606
>          fixed 05 0x0606060606060606
>          fixed 06 0x0606060606060606
>          fixed 07 0x0606060606060606
>          fixed 08 0x0606060606060606
>          fixed 09 0x0606060606060606
>          fixed 10 0x0606060606060606
> Entry 19: type 0 instance 0, length 0
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
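The rflags value in the VCPU 1 record above (Entry 2) can be checked
against the subject line by decoding it bit by bit. A minimal sketch in
Python, assuming the standard x86 flag bit positions; the helper name
decode_rflags is made up for illustration and is not part of hvmctx:

```python
# Decode the rflags value reported for VCPU 1 in the hvmctx dump.
# Bit positions are the architectural x86 (R)FLAGS positions.
FLAG_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF",
}


def decode_rflags(value):
    """Return the names of the flags set in value, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items())
            if value & (1 << bit)]


rflags = 0x246  # copied from the "rip ... rflags" line of Entry 2
print(decode_rflags(rflags))
```

Bit 9 (IF) is set in 0x246, which matches the EFLAGS.IF = 1 observation
in the subject.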

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-08 10:03 ` George Dunlap
@ 2010-07-08 11:55   ` Gianni Tedesco
  2010-07-08 13:28     ` George Dunlap
  0 siblings, 1 reply; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-08 11:55 UTC (permalink / raw)
  To: George Dunlap; +Cc: Xen Devel

On Thu, 2010-07-08 at 11:03 +0100, George Dunlap wrote:
> If both cpus are idling with EFLAGS.IF=1, this would imply that the
> kernel thinks it's waiting on a device, yes?  One thing you could do
> is to track the interaction between the guest and the devices, and see
> if you can figure out what it's waiting for and why the thing it's
> waiting for isn't happening.  You can use xentrace + xenalyze
> (http://xenbits.xensource.com/ext/xenalyze.hg) to see all the PIO,
> MMIO, and interrupts delivered to the guest.
> 
> Unfortunately this would mean understanding at some level the
> interface the device presents, which may involve a lot of going
> through driver code / going through QEMU, which doesn't sound fun. :-/
>  Maybe someone else will have some suggestions...

Hmm, yeah, usually that's a headache to do for one device never mind the
whole system...

> I ended up with a similar-looking problem during boot with a stock
> 2.6.18.8 kernel, after hacking up a work-around to allow it to get
> past the timer synchronization stage.  It might be easier to track
> down if you have a failure mode that's quicker to reproduce and a
> guest kernel that's easier to modify.  (But of course there's always
> the possibility that it's a different bug with similar symptoms...)

Well, this reproduces relatively quickly, but because it's a vendor
kernel + custom initrd it's a bit harder to modify components. Just
rebuilding the original turns out to be a pain.

I think for now my time is probably best spent trying to minimise the
code required to reproduce the thing and hopefully, in turn, minimise
the amount of PIO + MMIO + IRQ traces to go through.

Argh :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-08 11:55   ` Gianni Tedesco
@ 2010-07-08 13:28     ` George Dunlap
  2010-07-09 16:20       ` Gianni Tedesco
  2010-07-21 16:46       ` Gianni Tedesco
  0 siblings, 2 replies; 18+ messages in thread
From: George Dunlap @ 2010-07-08 13:28 UTC (permalink / raw)
  To: Gianni Tedesco; +Cc: Xen Devel

On Thu, Jul 8, 2010 at 12:55 PM, Gianni Tedesco
<gianni.tedesco@citrix.com> wrote:
> Hmm, yeah, usually that's a headache to do for one device never mind the
> whole system...

But realistically, there's only a handful of devices which it might be
waiting on -- seems like the disk is the most likely culprit.

I'm happy to help with the tracing / analysis bit.

In any case, thanks for doing this, and good luck.

 -George

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-08 13:28     ` George Dunlap
@ 2010-07-09 16:20       ` Gianni Tedesco
  2010-07-21 16:46       ` Gianni Tedesco
  1 sibling, 0 replies; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-09 16:20 UTC (permalink / raw)
  To: George Dunlap; +Cc: Xen Devel

On Thu, 2010-07-08 at 14:28 +0100, George Dunlap wrote:
> On Thu, Jul 8, 2010 at 12:55 PM, Gianni Tedesco
> <gianni.tedesco@citrix.com> wrote:
> > Hmm, yeah, usually that's a headache to do for one device never mind the
> > whole system...
> 
> But realistically, there's only a handful of devices which it might be
> waiting on -- seems like the disk is the most likely culprit.
> 
> I'm happy to help with the tracing / analysis bit.
> 
> In any case, thanks for doing this, and good luck.

Problem is I seem to have ruled out most of that now. IDE is firing off
IRQs and the host is ACKing them properly. There are even hangs during
periods of no IDE activity - just working from ramdisks. Timers are
getting through. Networking always seems to work fine and the hang
occurs regardless of e1000 vs rtl8139. The bug reproduces without serial
and regardless of acpi, std-vga, etc., so there isn't much else that
could be going wrong here.
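
As a cross-check on interrupt routing, the IOAPIC pin values in the
hvmctx dump from the original post can be decoded field by field. A
rough sketch in Python, assuming the standard I/O APIC redirection-entry
layout (vector in bits 0-7, delivery mode in bits 8-10, polarity in bit
13, trigger mode in bit 15, mask in bit 16); the helper name decode_rte
is made up for illustration:

```python
# Decode the low-word fields of I/O APIC redirection table entries.
def decode_rte(entry):
    """Split a redirection entry into the fields of interest."""
    return {
        "vector": entry & 0xFF,
        "delivery_mode": (entry >> 8) & 0x7,
        "active_low": bool(entry & (1 << 13)),
        "level_triggered": bool(entry & (1 << 15)),
        "masked": bool(entry & (1 << 16)),
    }


# A few pins copied from Entry 5 of the dump.
pins = {0: 0x10000, 1: 0x39, 2: 0x31, 5: 0x1A051, 14: 0x99}

for pin, raw in sorted(pins.items()):
    fields = decode_rte(raw)
    state = "masked" if fields["masked"] else "enabled"
    trigger = "level" if fields["level_triggered"] else "edge"
    print(f"pin {pin:2d}: vector 0x{fields['vector']:02x} {trigger} {state}")
```

Read this way, pins 5, 10 and 11 (value 0x1a0xx) are level-triggered,
active-low and masked, while pins such as 2 and 14 are edge-triggered
and enabled.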

I managed to get a shell up and running while the system is hung and
just spawning "busybox ls" also hangs in an uninterruptible state
(although the uninterruptible part may have been due to lack of job
control in the shell.)

Since the device model and IRQ delivery seem to work as expected I am
wondering if there could be an artefact that causes, for example, the
kernel's semaphore implementation to not work as it should?

Gianni

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-07 18:42 HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1 Gianni Tedesco
  2010-07-08 10:03 ` George Dunlap
@ 2010-07-12 14:04 ` Konrad Rzeszutek Wilk
  2010-07-12 14:24   ` George Dunlap
  2010-07-12 15:09   ` Gianni Tedesco
  2010-07-21 17:29 ` Gianni Tedesco
  2 siblings, 2 replies; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-07-12 14:04 UTC (permalink / raw)
  To: Gianni Tedesco; +Cc: Xen Devel

On Wed, Jul 07, 2010 at 07:42:35PM +0100, Gianni Tedesco wrote:
> Hi,
> 
> I've spent a few weeks investigating a very reproducible guest-hangs bug
> which appears to affect all hypervisors from at least 3.4.2 through 4.0
> to unstable.
> 
> To reproduce setup an RHEL5.2 guest for kickstart network install
> something like this:

Does this happen with RHEL5.4? CentOS 5.4?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-12 14:04 ` Konrad Rzeszutek Wilk
@ 2010-07-12 14:24   ` George Dunlap
  2010-07-12 15:09   ` Gianni Tedesco
  1 sibling, 0 replies; 18+ messages in thread
From: George Dunlap @ 2010-07-12 14:24 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Xen Devel, Gianni Tedesco

Something superficially similar happens when booting an
(already-installed) Debian Etch system with a 2.6.18-6 kernel, and a
kernel.org version of 2.6.18.8; but there's no guarantee it's the same
root cause.

 -George

On Mon, Jul 12, 2010 at 3:04 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Wed, Jul 07, 2010 at 07:42:35PM +0100, Gianni Tedesco wrote:
>> Hi,
>>
>> I've spent a few weeks investigating a very reproducible guest-hangs bug
>> which appears to affect all hypervisors from at least 3.4.2 through 4.0
>> to unstable.
>>
>> To reproduce setup an RHEL5.2 guest for kickstart network install
>> something like this:
>
> Does this happen with RHEL5.4? CentOS 5.4?
>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-12 14:04 ` Konrad Rzeszutek Wilk
  2010-07-12 14:24   ` George Dunlap
@ 2010-07-12 15:09   ` Gianni Tedesco
  2010-07-12 15:44     ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-12 15:09 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: George Dunlap, Xen Devel

On Mon, 2010-07-12 at 15:04 +0100, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 07, 2010 at 07:42:35PM +0100, Gianni Tedesco wrote:
> > Hi,
> > 
> > I've spent a few weeks investigating a very reproducible guest-hangs bug
> > which appears to affect all hypervisors from at least 3.4.2 through 4.0
> > to unstable.
> > 
> > To reproduce setup an RHEL5.2 guest for kickstart network install
> > something like this:
> 
> Does this happen with RHEL5.4? CentOS 5.4?

I used the same setup to test RHEL5, RHEL5.[1234] and can only reproduce
on 5.1 and 5.2.

As George pointed out, this may be an issue with a far wider range of
kernels but difficult to reproduce.

My most recent finding is that this is reproduced by loading modules
which create a kernel thread. A sysrq-t during the hangs is showing
every task waiting for kthread_create to complete except for anaconda
which is either waiting on a tty_ioctl() (serial install) or poll (vga
install).

Gianni Tedesco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-12 15:09   ` Gianni Tedesco
@ 2010-07-12 15:44     ` Konrad Rzeszutek Wilk
  2010-07-13 16:31       ` Gianni Tedesco
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-07-12 15:44 UTC (permalink / raw)
  To: Gianni Tedesco; +Cc: George Dunlap, Xen Devel

On Mon, Jul 12, 2010 at 04:09:36PM +0100, Gianni Tedesco wrote:
> On Mon, 2010-07-12 at 15:04 +0100, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jul 07, 2010 at 07:42:35PM +0100, Gianni Tedesco wrote:
> > > Hi,
> > > 
> > > I've spent a few weeks investigating a very reproducible guest-hangs bug
> > > which appears to affect all hypervisors from at least 3.4.2 through 4.0
> > > to unstable.
> > > 
> > > To reproduce setup an RHEL5.2 guest for kickstart network install
> > > something like this:
> > 
> > Does this happen with RHEL5.4? CentOS 5.4?
> 
> I used the same setup to test RHEL5, RHEL5.[1234] and can only reproduce
> on 5.1 and 5.2.

Ok, did you look in the changelog for RHEL5.3 and above? It might be
that you are hitting a bug that was fixed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-12 15:44     ` Konrad Rzeszutek Wilk
@ 2010-07-13 16:31       ` Gianni Tedesco
  2010-07-13 18:13         ` Gianni Tedesco
  0 siblings, 1 reply; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-13 16:31 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: George Dunlap, Xen Devel

On Mon, 2010-07-12 at 16:44 +0100, Konrad Rzeszutek Wilk wrote:
> On Mon, Jul 12, 2010 at 04:09:36PM +0100, Gianni Tedesco wrote:
> > On Mon, 2010-07-12 at 15:04 +0100, Konrad Rzeszutek Wilk wrote:
> > > On Wed, Jul 07, 2010 at 07:42:35PM +0100, Gianni Tedesco wrote:
> > > > Hi,
> > > > 
> > > > I've spent a few weeks investigating a very reproducible guest-hangs bug
> > > > which appears to affect all hypervisors from at least 3.4.2 through 4.0
> > > > to unstable.
> > > > 
> > > > To reproduce setup an RHEL5.2 guest for kickstart network install
> > > > something like this:
> > > 
> > > Does this happen with RHEL5.4? CentOS 5.4?
> > 
> > I used the same setup to test RHEL5, RHEL5.[1234] and can only reproduce
> > on 5.1 and 5.2.
> 
> Ok, did you look in the changelog for RHEL5.3 and above. It might be
> that you are hitting a bug that was fixed.

There are several potential candidates, but it is difficult to get
further info due to Red Hat Bugzilla. Trying to get in touch with a
relevant engineer to confirm / falsify that theory.

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-13 16:31       ` Gianni Tedesco
@ 2010-07-13 18:13         ` Gianni Tedesco
  2010-07-13 18:42           ` Dan Magenheimer
  0 siblings, 1 reply; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-13 18:13 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: George Dunlap, Xen Devel

On Tue, 2010-07-13 at 17:31 +0100, Gianni Tedesco wrote:
> > > I used the same setup to test RHEL5, RHEL5.[1234] and can only reproduce
> > > on 5.1 and 5.2.
> > 
> > Ok, did you look in the changelog for RHEL5.3 and above. It might be
> > that you are hitting a bug that was fixed.
> 
> There are several potential candidates but difficult to get further info
> due to redhat bugzilla. Trying to get in touch with a relevant engineer
> to confirm / falsify that theory.

The patch "Fix gettimeofday reliability issues with TSC, HPET, and
PM-Timer" seems to mask the bug and make it much less reproducible. This
patch was to fix some gettimeofday-goes-backwards issues on bare metal.
As a result of this I can now confirm the bug is still present in
RHEL5.3 at least - I shall test the others shortly.

Looks like TSC/PIT timesource is either a) unreliable in xen, b)
unreliable in RHEL kernels or c) all of the above.

Gianni

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-13 18:13         ` Gianni Tedesco
@ 2010-07-13 18:42           ` Dan Magenheimer
  2010-07-13 19:13             ` Gianni Tedesco
  0 siblings, 1 reply; 18+ messages in thread
From: Dan Magenheimer @ 2010-07-13 18:42 UTC (permalink / raw)
  To: Gianni Tedesco, Konrad Wilk; +Cc: George Dunlap, Xen Devel

This may be totally unrelated, but just in case...

Are you using xl to create your problem domains?
If so, you might want to set timer_mode=1 in your
vm.cfg.  (See other xen-devel thread "xen tsc problems".)
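
For illustration, the setting would sit alongside the other options in
the guest profile from the original post, something like the fragment
below. The builder and vcpus lines are copied from that profile; only
timer_mode is new, and the rest of the profile is omitted:

```python
# Fragment of the HVM guest profile; only timer_mode is the suggested change.
builder = 'hvm'
vcpus = 2
timer_mode = 1
```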

> -----Original Message-----
> From: Gianni Tedesco [mailto:gianni.tedesco@citrix.com]
> Sent: Tuesday, July 13, 2010 12:13 PM
> To: Konrad Rzeszutek Wilk
> Cc: George Dunlap; Xen Devel
> Subject: Re: [Xen-devel] HVM SMP linux guest hangs in cpu_idle() with
> EFLAGS.IF = 1
> 
> On Tue, 2010-07-13 at 17:31 +0100, Gianni Tedesco wrote:
> > > > I used the same setup to test RHEL5, RHEL5.[1234] and can only
> reproduce
> > > > on 5.1 and 5.2.
> > >
> > > Ok, did you look in the changelog for RHEL5.3 and above. It might
> be
> > > that you are hitting a bug that was fixed.
> >
> > There are several potential candidates but difficult to get further
> info
> > due to redhat bugzilla. Trying to get in touch with a relevant
> engineer
> > to confirm / falsify that theory.
> 
> The patch "Fix gettimeofday reliability issues with TSC, HPET, and
> PM-Timer" seems to mask the bug and make it much less reproducible.
> This
> patch was to fix some gettimeofday-goes-backwards issues on bare metal.
> As a result of this I can now confirm the bug is still present in
> RHEL5.3 at least - I shall test the others shortly.
> 
> Looks like TSC/PIT timesource is either a) unreliable in xen, b)
> unreliable in RHEL kernels or c) all of the above.
> 
> Gianni
> 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-13 18:42           ` Dan Magenheimer
@ 2010-07-13 19:13             ` Gianni Tedesco
  2010-07-13 19:27               ` Gianni Tedesco
  0 siblings, 1 reply; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-13 19:13 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: George Dunlap, Xen Devel, Konrad Wilk

On Tue, 2010-07-13 at 19:42 +0100, Dan Magenheimer wrote:
> This may be totally unrelated, but just in case...
> 
> Are you using xl to create your problem domains?
> If so, you might want to set timer_mode=1 in your
> vm.cfg.  (See other xen-devel thread "xen tsc problems".)

Yeah, I just had that chat with Stefano, but I am running with
timer_mode=1 now and see no change. I had timer_mode=3 before, and 0
before that.

Currently instrumenting my kernel to check for any time-sources going
backwards...

Gianni

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-13 19:13             ` Gianni Tedesco
@ 2010-07-13 19:27               ` Gianni Tedesco
  0 siblings, 0 replies; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-13 19:27 UTC (permalink / raw)
  To: Dan Magenheimer; +Cc: George Dunlap, Xen Devel, Konrad Wilk

On Tue, 2010-07-13 at 20:13 +0100, Gianni Tedesco wrote:
> On Tue, 2010-07-13 at 19:42 +0100, Dan Magenheimer wrote:
> > This may be totally unrelated, but just in case...
> > 
> > Are you using xl to create your problem domains?
> > If so, you might want to set timer_mode=1 in your
> > vm.cfg.  (See other xen-devel thread "xen tsc problems".)
> 
> Yeah I just had that chat with stefano but I am running with
> timer_mode=1 now and no changes, I had timer_mode=3 before, and 0 before
> that.
> 
> Currently instrumenting my kernel to check for any time-sources going
> backwards...

In fact, strike this: the clocksource is jiffies, which is being bumped
by interrupts. According to hvmctx, HPET is delivering ISR 0, so there
is no way wall time should be going backwards.

Damn

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-08 13:28     ` George Dunlap
  2010-07-09 16:20       ` Gianni Tedesco
@ 2010-07-21 16:46       ` Gianni Tedesco
  1 sibling, 0 replies; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-21 16:46 UTC (permalink / raw)
  To: George Dunlap; +Cc: Xen Devel

On Thu, 2010-07-08 at 14:28 +0100, George Dunlap wrote:
> On Thu, Jul 8, 2010 at 12:55 PM, Gianni Tedesco
> <gianni.tedesco@citrix.com> wrote:
> > Hmm, yeah, usually that's a headache to do for one device never mind the
> > whole system...
> 
> But realistically, there's only a handful of devices which it might be
> waiting on -- seems like the disk is the most likely culprit.

I can now *categorically* rule this out. I have injected IRQs for every
installed device and none of this un-sticks the system...

Gianni

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-07 18:42 HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1 Gianni Tedesco
  2010-07-08 10:03 ` George Dunlap
  2010-07-12 14:04 ` Konrad Rzeszutek Wilk
@ 2010-07-21 17:29 ` Gianni Tedesco
  2010-07-21 17:56   ` Nakajima, Jun
  2 siblings, 1 reply; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-21 17:29 UTC (permalink / raw)
  To: Xen Devel

Another data-point I have on this but haven't mentioned here yet is that
the userspace (anaconda) processes are usually hanging in
tty_wait_until_sent() which is apparently why things are woken up by
receiving a keypress on the serial line.

At other times the hang occurs while userspace (hotplug, also anaconda)
is waiting in do_poll() on a kernel netlink socket. In that case a
keypress on serial line also wakes it (!!)

Gianni Tedesco
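
For reference, a task blocked in poll() is normally expected to wake
only when one of its own descriptors becomes ready or its timeout
expires, which is why a wake-up caused by an unrelated serial keypress
is surprising. A minimal user-space sketch of that expected behaviour
follows, assuming a Linux host; the local socket pair is a stand-in for
the kernel netlink socket, and wait_for_event is a hypothetical helper
name.

```python
import select
import socket


def wait_for_event(sock: socket.socket, timeout_ms: int) -> bool:
    """Block in poll() until sock is readable or timeout_ms elapses."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() returns an empty list when the timeout expires with no events.
    return bool(poller.poll(timeout_ms))


if __name__ == "__main__":
    # Assumption: a socket pair stands in for the netlink socket.
    reader, writer = socket.socketpair()
    print(wait_for_event(reader, 100))  # nothing sent yet, so poll() times out
    writer.send(b"event")
    print(wait_for_event(reader, 100))  # data pending, so poll() reports it
    reader.close()
    writer.close()
```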


* RE: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-21 17:29 ` Gianni Tedesco
@ 2010-07-21 17:56   ` Nakajima, Jun
  2010-07-21 17:59     ` Gianni Tedesco
  0 siblings, 1 reply; 18+ messages in thread
From: Nakajima, Jun @ 2010-07-21 17:56 UTC (permalink / raw)
  To: Gianni Tedesco, Xen Devel


Gianni Tedesco wrote on Wed, 21 Jul 2010 at 10:29:43:

> Another data-point I have on this but haven't mentioned here yet is that
> the userspace (anaconda) processes are usually hanging in
> tty_wait_until_sent() which is apparently why things are woken up by
> receiving a keypress on the serial line.
> 
> At other times the hang occurs while userspace (hotplug, also anaconda)
> is waiting in do_poll() on a kernel netlink socket. In that case a
> keypress on serial line also wakes it (!!)
>

I'm not sure I followed all the emails in the thread, but did you see
such a hang without a serial connection? I had the impression (i.e. I
vaguely remember) that a serial connection at boot time caused Linux to
hang, especially with SMP (on native).
 
> Gianni Tedesco
> 

Jun
___
Intel Open Source Technology Center






* RE: HVM SMP linux guest hangs in cpu_idle() with EFLAGS.IF = 1
  2010-07-21 17:56   ` Nakajima, Jun
@ 2010-07-21 17:59     ` Gianni Tedesco
  0 siblings, 0 replies; 18+ messages in thread
From: Gianni Tedesco @ 2010-07-21 17:59 UTC (permalink / raw)
  To: Nakajima, Jun; +Cc: Xen Devel

On Wed, 2010-07-21 at 18:56 +0100, Nakajima, Jun wrote:
> Gianni Tedesco wrote on Wed, 21 Jul 2010 at 10:29:43:
> 
> > Another data-point I have on this but haven't mentioned here yet is that
> > the userspace (anaconda) processes are usually hanging in
> > tty_wait_until_sent() which is apparently why things are woken up by
> > receiving a keypress on the serial line.
> > 
> > At other times the hang occurs while userspace (hotplug, also anaconda)
> > is waiting in do_poll() on a kernel netlink socket. In that case a
> > keypress on serial line also wakes it (!!)
> >
> 
> I'm not sure I followed all the emails in the thread, but did you see
> such a hang without a serial connection? I had the impression (i.e. I
> vaguely remember) that a serial connection at boot time caused Linux to
> hang, especially with SMP (on native).

I have seen the hang without serial console enabled but with serial
still attached... In that case the hangs still happen but much more
rarely. I am going to try some further tests to remove serial from the
equation and see what happens.

Thanks for the hint


