* Re: Need help with fixing the Xen waitqueue feature
       [not found] <20111122150755.GA18727@aepfle.de>
@ 2011-11-22 15:40 ` Keir Fraser
  2011-11-22 15:54   ` Keir Fraser
  2011-11-22 17:36   ` Olaf Hering
  0 siblings, 2 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-22 15:40 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 22/11/2011 15:07, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
> 
>> On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:
>> 
>>> On Tue, Nov 22, Keir Fraser wrote:
>>> 
>>>> I think I checked before, but: also unresponsive to serial debug keys?
>>> 
>>> Good point, I will check that. So far I haven't used these keys.
>> 
>> If they work then 'd' will give you a backtrace on every CPU, and 'q' will
>> dump domain and vcpu states. That should make things easier!
> 
> They do indeed work. The backtrace below is from another system.
> Looks like hpet_broadcast_exit() is involved.
> 
> Does that output below give any good hints?

It tells us that the hypervisor itself is in good shape. The deterministic
RIP in hpet_broadcast_exit() is simply because the serial rx interrupt is
always waking us from the idle loop. That RIP value will simply be the first
possible interruption point after the HLT instruction.

I have a new theory, which is that if we go round the for-loop in
wait_event() more than once, the vcpu's pause counter gets messed up and
goes negative, condemning it to sleep forever.
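
(For reference, a paraphrased sketch of the wait_event() loop in question;
this is illustrative only, not the exact xen/include/xen/wait.h source. The
concern is whether the vcpu_pause_nosync() done by prepare_to_wait() and the
unpause done by wake_up() stay correctly paired when the loop goes round
more than once.)

    /* Paraphrased wait_event()-style loop, for illustration only. */
    #define wait_event_sketch(wq, condition)                              \
    do {                                                                  \
        if ( condition )                                                  \
            break;                                                        \
        for ( ;; )                                                        \
        {                                                                 \
            prepare_to_wait(&(wq));  /* pauses the current vcpu */        \
            if ( condition )                                              \
                break;                                                    \
            wait();                  /* sleep until wake_up() unpauses */ \
        }                                                                 \
        finish_wait(&(wq));          /* dequeue the vcpu */               \
    } while ( 0 )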

I have *just* pushed a change to the debug 'q' key (ignore the changeset
comment referring to 'd' key, I got that wrong!) which will print per-vcpu
and per-domain pause_count values. Please get the system stuck again, and
send the output from 'q' key with that new changeset (c/s 24178).

Finally, I don't really know what the prep/wake/done messages from your logs
mean, as you didn't send the patch that prints them.

 -- Keir 

>> Try the attached patch (please also try reducing the size of the new
>> parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the
>> domain-crashing path).
> 
> Thanks, I will try it.
> 
> 
> Olaf
> 
> 
> ..........
> 
> (XEN) 'q' pressed -> dumping domain info (now=0x5E:F50D77F8)
> (XEN) General information for domain 0:
> (XEN)     refcnt=3 dying=0 nr_pages=1852873 xenheap_pages=5 dirty_cpus={}
> max_pages=4294967295
> (XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
> (XEN) Rangesets belonging to domain 0:
> (XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-807, 80c-cfb,
> d00-ffff }
> (XEN)     Interrupts { 0-207 }
> (XEN)     I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
> (XEN) Memory pages belonging to domain 0:
> (XEN)     DomPage list too long to display
> (XEN)     XenPage 000000000021e6d9: caf=c000000000000002, taf=7400000000000002
> (XEN)     XenPage 000000000021e6d8: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000021e6d7: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000021e6d6: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 00000000000db2fe: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 0:
> (XEN)     VCPU0: CPU0 [has=F] flags=0 poll=0 upcall_pend = 01, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN)     250 Hz periodic timer (period 4 ms)
> (XEN) General information for domain 1:
> (XEN)     refcnt=3 dying=0 nr_pages=3645 xenheap_pages=6 dirty_cpus={}
> max_pages=131328
> (XEN)     handle=d80155e4-8f8b-94e1-8382-94084b7f1e51 vm_assist=00000000
> (XEN)     paging assistance: hap refcounts log_dirty translate external
> (XEN) Rangesets belonging to domain 1:
> (XEN)     I/O Ports  { }
> (XEN)     Interrupts { }
> (XEN)     I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN)     DomPage list too long to display
> (XEN)     PoD entries=0 cachesize=0
> (XEN)     XenPage 000000000020df70: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020e045: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020c58c: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020c5a4: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 0000000000019f1e: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020eb23: caf=c000000000000001, taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 1:
> (XEN)     VCPU0: CPU0 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
> (XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
> (XEN) 'q' pressed -> dumping domain info (now=0x60:A7DD8B08)
> (XEN) General information for domain 0:
> (XEN)     refcnt=3 dying=0 nr_pages=1852873 xenheap_pages=5 dirty_cpus={}
> max_pages=4294967295
> (XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
> (XEN) Rangesets belonging to domain 0:
> (XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-807, 80c-cfb,
> d00-ffff }
> (XEN)     Interrupts { 0-207 }
> (XEN)     I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
> (XEN) Memory pages belonging to domain 0:
> (XEN)     DomPage list too long to display
> (XEN)     XenPage 000000000021e6d9: caf=c000000000000002, taf=7400000000000002
> (XEN)     XenPage 000000000021e6d8: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000021e6d7: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000021e6d6: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 00000000000db2fe: caf=c000000000000002, taf=7400000000000002
> (XEN) VCPU information and callbacks for domain 0:
> (XEN)     VCPU0: CPU0 [has=F] flags=0 poll=0 upcall_pend = 01, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN)     250 Hz periodic timer (period 4 ms)
> (XEN) General information for domain 1:
> (XEN)     refcnt=3 dying=0 nr_pages=3645 xenheap_pages=6 dirty_cpus={}
> max_pages=131328
> (XEN)     handle=d80155e4-8f8b-94e1-8382-94084b7f1e51 vm_assist=00000000
> (XEN)     paging assistance: hap refcounts log_dirty translate external
> (XEN) Rangesets belonging to domain 1:
> (XEN)     I/O Ports  { }
> (XEN)     Interrupts { }
> (XEN)     I/O Memory { }
> (XEN) Memory pages belonging to domain 1:
> (XEN)     DomPage list too long to display
> (XEN)     PoD entries=0 cachesize=0
> (XEN)     XenPage 000000000020df70: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020e045: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020c58c: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020c5a4: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 0000000000019f1e: caf=c000000000000001, taf=7400000000000001
> (XEN)     XenPage 000000000020eb23: caf=c000000000000001, taf=7400000000000001
> (XEN) VCPU information and callbacks for domain 1:
> (XEN)     VCPU0: CPU0 [has=F] flags=4 poll=0 upcall_pend = 00, upcall_mask =
> 00 dirty_cpus={} cpu_affinity={0}
> (XEN)     paging assistance: hap, 4 levels
> (XEN)     No periodic timer
> (XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
> (XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)
> (XEN) 'd' pressed -> dumping registers
> (XEN)
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) ----[ Xen-4.2.24169-20111122.144218  x86_64  debug=y  Tainted:    C
> ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
> (XEN) rax: 0000000000003b40   rbx: 000000674742e72d   rcx: 0000000000000001
> (XEN) rdx: 0000000000000000   rsi: ffff82c48030f000   rdi: ffff82c4802bfea0
> (XEN) rbp: ffff82c4802bfee0   rsp: ffff82c4802bfe78   r8:  000000008c858211
> (XEN) r9:  0000000000000003   r10: ffff82c4803064e0   r11: 000000676bf885a3
> (XEN) r12: ffff83021e70e840   r13: ffff83021e70e8d0   r14: 00000067471bdb62
> (XEN) r15: ffff82c48030e440   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000db4c4000   cr2: 0000000000beb000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfe78:
> (XEN)    ffff82c48019f0ca ffff82c4802bff18 ffffffffffffffff ffff82c4802bfed0
> (XEN)    0000000180124b57 0000000000000000 0000000000000000 ffff82c48025b200
> (XEN)    0000152900006fe3 ffff82c4802bff18 ffff82c48025b200 ffff82c4802bff18
> (XEN)    ffff82c48030e468 ffff82c4802bff10 ffff82c48015a88d 0000000000000000
> (XEN)    ffff8300db6c6000 ffff8300db6c6000 ffffffffffffffff ffff82c4802bfe00
> (XEN)    0000000000000000 0000000000001000 0000000000001000 0000000000000000
> (XEN)    8000000000000427 ffff8801d8579010 0000000000000246 00000000deadbeef
> (XEN)    ffff8801d8579000 ffff8801d8579000 00000000fffffffe ffffffff8000302a
> (XEN)    00000000deadbeef 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN)    ffffffff8000302a 000000000000e033 0000000000000246 ffff8801a515bd10
> (XEN)    000000000000e02b 000000000000beef 000000000000beef 000000000000beef
> (XEN)    000000000000beef 0000000000000000 ffff8300db6c6000 0000000000000000
> (XEN)    0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN)    [<ffff82c48015a88d>] idle_loop+0x6c/0x7b
> (XEN)
> (XEN) 'd' pressed -> dumping registers
> (XEN)
> (XEN) *** Dumping CPU0 host state: ***
> (XEN) ----[ Xen-4.2.24169-20111122.144218  x86_64  debug=y  Tainted:    C
> ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
> (XEN) rax: 0000000000003b40   rbx: 00000078f4fbe7ed   rcx: 0000000000000001
> (XEN) rdx: 0000000000000000   rsi: ffff82c48030f000   rdi: ffff82c4802bfea0
> (XEN) rbp: ffff82c4802bfee0   rsp: ffff82c4802bfe78   r8:  00000000cd4f8db6
> (XEN) r9:  0000000000000002   r10: ffff82c480308780   r11: 000000790438291d
> (XEN) r12: ffff83021e70e840   r13: ffff83021e70e8d0   r14: 00000078f412a61c
> (XEN) r15: ffff82c48030e440   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 00000000db4c4000   cr2: 0000000000beb000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802bfe78:
> (XEN)    ffff82c48019f0ca ffff82c4802bff18 ffffffffffffffff ffff82c4802bfed0
> (XEN)    0000000180124b57 0000000000000000 0000000000000000 ffff82c48025b200
> (XEN)    0000239e00007657 ffff82c4802bff18 ffff82c48025b200 ffff82c4802bff18
> (XEN)    ffff82c48030e468 ffff82c4802bff10 ffff82c48015a88d 0000000000000000
> (XEN)    ffff8300db6c6000 ffff8300db6c6000 ffffffffffffffff ffff82c4802bfe00
> (XEN)    0000000000000000 0000000000001000 0000000000001000 0000000000000000
> (XEN)    8000000000000427 ffff8801d8579010 0000000000000246 00000000deadbeef
> (XEN)    ffff8801d8579000 ffff8801d8579000 00000000fffffffe ffffffff8000302a
> (XEN)    00000000deadbeef 00000000deadbeef 00000000deadbeef 0000010000000000
> (XEN)    ffffffff8000302a 000000000000e033 0000000000000246 ffff8801a515bd10
> (XEN)    000000000000e02b 000000000000beef 000000000000beef 000000000000beef
> (XEN)    000000000000beef 0000000000000000 ffff8300db6c6000 0000000000000000
> (XEN)    0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48019bfe6>] hpet_broadcast_exit+0x0/0x1f9
> (XEN)    [<ffff82c48015a88d>] idle_loop+0x6c/0x7b
> (XEN)
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 15:40 ` Need help with fixing the Xen waitqueue feature Keir Fraser
@ 2011-11-22 15:54   ` Keir Fraser
  2011-11-22 17:36   ` Olaf Hering
  1 sibling, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-22 15:54 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 22/11/2011 15:40, "Keir Fraser" <keir@xen.org> wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

Further to this, can you please try moving the call to __prepare_to_wait()
from just after the spinlock region to just before (i.e., immediately after
the ASSERT), in prepare_to_wait(). That could well make things work better.
On UP at least -- for SMP systems I will also need to fix the broken usage
of wqv->esp...
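
As a rough sketch of that ordering (illustration only; the real change is
the 01-prep-to-wait-reorder patch attached later in this thread):

    void prepare_to_wait(struct waitqueue_head *wq)
    {
        struct vcpu *curr = current;
        struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;

        ASSERT(list_empty(&wqv->list));

        /* Moved up: snapshot the stack first, then queue and pause, so
         * the vcpu is still on the waitqueue when this function returns,
         * even if it has already been woken. */
        __prepare_to_wait(wqv);

        spin_lock(&wq->lock);
        list_add_tail(&wqv->list, &wq->list);
        vcpu_pause_nosync(curr);
        spin_unlock(&wq->lock);
    }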

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 15:40 ` Need help with fixing the Xen waitqueue feature Keir Fraser
  2011-11-22 15:54   ` Keir Fraser
@ 2011-11-22 17:36   ` Olaf Hering
  2011-11-22 17:42     ` Keir Fraser
  1 sibling, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-22 17:36 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 22, Keir Fraser wrote:

> I have a new theory, which is that if we go round the for-loop in
> wait_event() more than once, the vcpu's pause counter gets messed up and
> goes negative, condemning it to sleep forever.

I have added a check for that; it's not negative.
 
> I have *just* pushed a change to the debug 'q' key (ignore the changeset
> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
> and per-domain pause_count values. Please get the system stuck again, and
> send the output from 'q' key with that new changeset (c/s 24178).

To me it looks like dom0 gets paused, perhaps due to some uneven pause/unpause calls.
I will see if I can figure it out.

Olaf

(XEN) 'q' pressed -> dumping domain info (now=0xA1:4BC733CC)
(XEN) General information for domain 0:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=5991502 xenheap_pages=5 dirty_cpus={} max_pages=4294967295
(XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=00000004
(XEN) Rangesets belonging to domain 0:
(XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-3f7, 400-407, 40c-cfb, d00-ffff }
(XEN)     Interrupts { 0-303 }
(XEN)     I/O Memory { 0-febff, fec01-fec8f, fec91-fedff, fee01-ffffffffffffffff }
(XEN) Memory pages belonging to domain 0:
(XEN)     DomPage list too long to display
(XEN)     XenPage 000000000036ff8d: caf=c000000000000002, taf=7400000000000002
(XEN)     XenPage 000000000036ff8c: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000036ff8b: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000036ff8a: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000008befd: caf=c000000000000002, taf=7400000000000002
(XEN) VCPU information and callbacks for domain 0:
(XEN)     VCPU0: CPU0 [has=F] poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN)     pause_count=1 pause_flags=0
(XEN)     250 Hz periodic timer (period 4 ms)
(XEN) General information for domain 1:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=17549 xenheap_pages=6 dirty_cpus={} max_pages=131328
(XEN)     handle=5499728e-7f38-dbb0-b6cc-22866a6864f3 vm_assist=00000000
(XEN)     paging assistance: hap refcounts translate external
(XEN) Rangesets belonging to domain 1:
(XEN)     I/O Ports  { }
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN)     DomPage list too long to display
(XEN)     PoD entries=0 cachesize=0
(XEN)     XenPage 0000000000200b7c: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 0000000000203bfe: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 0000000000200b48: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000021291d: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000003ebfc: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 0000000000202ef4: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 1:
(XEN)     VCPU0: CPU0 [has=F] poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0}
(XEN)     pause_count=0 pause_flags=4
(XEN)     paging assistance: hap, 4 levels
(XEN)     No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 0, stat 0/-1/-1)
(XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/0/0)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 17:36   ` Olaf Hering
@ 2011-11-22 17:42     ` Keir Fraser
  2011-11-22 18:04       ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-22 17:42 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 22/11/2011 17:36, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
> 
>> I have a new theory, which is that if we go round the for-loop in
>> wait_event() more than once, the vcpu's pause counter gets messed up and
>> goes negative, condemning it to sleep forever.
> 
> I have added a check for that; it's not negative.
>  
>> I have *just* pushed a change to the debug 'q' key (ignore the changeset
>> comment referring to 'd' key, I got that wrong!) which will print per-vcpu
>> and per-domain pause_count values. Please get the system stuck again, and
>> send the output from 'q' key with that new changeset (c/s 24178).
> 
> To me it looks like dom0 gets paused, perhaps due to some uneven pause/unpause
> calls.
> I will see if I can figure it out.

Could it have ended up on the waitqueue?

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 17:42     ` Keir Fraser
@ 2011-11-22 18:04       ` Olaf Hering
  2011-11-22 21:15         ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-22 18:04 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 22, Keir Fraser wrote:

> Could it have ended up on the waitqueue?

Unlikely, but I will add checks for that as well.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 18:04       ` Olaf Hering
@ 2011-11-22 21:15         ` Olaf Hering
  2011-11-22 21:53           ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-22 21:15 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 22, Olaf Hering wrote:

> On Tue, Nov 22, Keir Fraser wrote:
> 
> > Could it have ended up on the waitqueue?
> 
> Unlikely, but I will add checks for that as well.

I posted three changes which make use of the wait queues.
For some reason the code at the very end of p2m_mem_paging_populate()
triggers when d is dom0, so its vcpu is put to sleep.


Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 21:15         ` Olaf Hering
@ 2011-11-22 21:53           ` Keir Fraser
  2011-11-23 17:00             ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-22 21:53 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 22/11/2011 21:15, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Olaf Hering wrote:
> 
>> On Tue, Nov 22, Keir Fraser wrote:
>> 
>>> Could it have ended up on the waitqueue?
>> 
>> Unlikely, but I will add checks for that as well.
> 
> I posted three changes which make use of the wait queues.
> For some reason the code at the very end of p2m_mem_paging_populate()
> triggers when d is dom0, so its vcpu is put to sleep.

We obviously can't have dom0 going to sleep on paging work. This, at least,
isn't a wait-queue bug.

> Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 21:53           ` Keir Fraser
@ 2011-11-23 17:00             ` Olaf Hering
  2011-11-23 17:16               ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-23 17:00 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 22, Keir Fraser wrote:

> We obviously can't have dom0 going to sleep on paging work. This, at least,
> isn't a wait-queue bug.

I had to rearrange some code in p2m_mem_paging_populate for my debug
stuff. This led to an uninitialized req, and as a result req.flags
sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
not catch that..
Now waitqueues appear to work ok for me. Thanks!


What do you think about C99 initializers in p2m_mem_paging_populate,
just to avoid such mistakes?

   mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };
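
(For what it's worth, a C99 designated initializer zero-initializes every
member that is not named explicitly, so req.flags could not pick up stale
stack contents. A minimal sketch with made-up field names:)

    /* Sketch only: 'type' and 'flags' stand in for the real
     * mem_event_request_t members. */
    struct req_sketch { unsigned int type; unsigned int flags; };

    void example(void)
    {
        struct req_sketch a = { .type = 1 }; /* a.flags == 0, guaranteed */
        struct req_sketch b;                 /* b.flags is indeterminate */
    }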

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 17:00             ` Olaf Hering
@ 2011-11-23 17:16               ` Keir Fraser
  2011-11-23 18:06                 ` Olaf Hering
  2011-11-23 18:18                 ` Keir Fraser
  0 siblings, 2 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-23 17:16 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
> 
>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>> isn't a wait-queue bug.
> 
> I had to rearrange some code in p2m_mem_paging_populate for my debug
> stuff. This led to an uninitialized req, and as a result req.flags
> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
> not catch that..
> Now waitqueues appear to work ok for me. Thanks!

Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
pretty sure that the hypervisor will blow up pretty quickly when you resume
testing with multiple physical CPUs, for example. I need to create a couple
of fixup patches which I will then send to you for test.

By the way, did you test my patch to domain_crash when the stack-save area
isn't large enough?

> What do you think about C99 initializers in p2m_mem_paging_populate,
> just to avoid such mistakes?
> 
>    mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };

We like them.

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 17:16               ` Keir Fraser
@ 2011-11-23 18:06                 ` Olaf Hering
  2011-11-23 18:23                   ` Keir Fraser
  2011-11-23 18:18                 ` Keir Fraser
  1 sibling, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-23 18:06 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Wed, Nov 23, Keir Fraser wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
> > On Tue, Nov 22, Keir Fraser wrote:
> > 
> >> We obviously can't have dom0 going to sleep on paging work. This, at least,
> >> isn't a wait-queue bug.
> > 
> > I had to rearrange some code in p2m_mem_paging_populate for my debug
> > stuff. This led to an uninitialized req, and as a result req.flags
> > sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
> > not catch that..
> > Now waitqueues appear to work ok for me. Thanks!
> 
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

Good, I look forward to these fixes.

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?

I ran into the ->esp == 0 case right away, but I need to retest with a
clean tree.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 17:16               ` Keir Fraser
  2011-11-23 18:06                 ` Olaf Hering
@ 2011-11-23 18:18                 ` Keir Fraser
  2011-11-23 18:31                   ` Olaf Hering
  1 sibling, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-23 18:18 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 23/11/2011 17:16, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
>> On Tue, Nov 22, Keir Fraser wrote:
>> 
>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>> isn't a wait-queue bug.
>> 
>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>> stuff. This led to an uninitialized req, and as a result req.flags
>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>> not catch that..
>> Now waitqueues appear to work ok for me. Thanks!
> 
> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
> pretty sure that the hypervisor will blow up pretty quickly when you resume
> testing with multiple physical CPUs, for example. I need to create a couple
> of fixup patches which I will then send to you for test.

We have quite a big waitqueue problem actually. The current scheme of
per-cpu stacks doesn't work nicely, as the stack pointer will change if a
vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
work nicely with preempted C code, which may implement frame pointers and/or
arbitrarily take the address of on-stack variables. The result will be
hideous cross-stack corruptions, as these frame pointers and cached
addresses of automatic variables will reference the wrong cpu's stack!
Fixing or detecting this in general is not possible afaics.

So, we'll have to switch to per-vcpu stacks, probably with separate per-cpu
irq stacks (as a later followup). That's quite a nuisance!
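
A minimal, hypothetical illustration of the hazard (not actual Xen code):
a cached address of an on-stack variable survives the sleep, but the saved
frame may be replayed on a different CPU's stack at a different address.

    /* Hypothetical example only. */
    static void some_preemptible_path(void)
    {
        int done = 0;
        int *p = &done;    /* cached address of an on-stack variable */

        /* ... sleeps on a waitqueue here (e.g. via wait_event()); the
         * stack image is saved and may later be restored onto a
         * different CPU's stack ... */

        /* After waking elsewhere, 'done' lives at a new address in the
         * restored frame, but 'p' still holds the old one and now points
         * into the previous CPU's stack. */
        *p = 1;
    }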

 -- Keir

> By the way, did you test my patch to domain_crash when the stack-save area
> isn't large enough?
> 
>> What do you think about C99 initializers in p2m_mem_paging_populate,
>> just to avoid such mistakes?
>> 
>>    mem_event_request_t req = { .type = MEM_EVENT_TYPE_PAGING };
> 
> We like them.
> 
>  -- Keir
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 18:06                 ` Olaf Hering
@ 2011-11-23 18:23                   ` Keir Fraser
  0 siblings, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-23 18:23 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 23/11/2011 18:06, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
> 
>> On 23/11/2011 17:00, "Olaf Hering" <olaf@aepfle.de> wrote:
>> 
>>> On Tue, Nov 22, Keir Fraser wrote:
>>> 
>>>> We obviously can't have dom0 going to sleep on paging work. This, at least,
>>>> isn't a wait-queue bug.
>>> 
>>> I had to rearrange some code in p2m_mem_paging_populate for my debug
>>> stuff. This led to an uninitialized req, and as a result req.flags
>>> sometimes had MEM_EVENT_FLAG_VCPU_PAUSED set. For some reason gcc did
>>> not catch that..
>>> Now waitqueues appear to work ok for me. Thanks!
>> 
>> Great. However, while eyeballing wait.c I spotted at least two bugs. I'm
>> pretty sure that the hypervisor will blow up pretty quickly when you resume
>> testing with multiple physical CPUs, for example. I need to create a couple
>> of fixup patches which I will then send to you for test.
> 
> Good, I look forward to these fixes.
> 
>> By the way, did you test my patch to domain_crash when the stack-save area
>> isn't large enough?
> 
> I ran into the ->esp == 0 case right away, but I need to retest with a
> clean tree.

I think I have a test the wrong way round. This doesn't really matter now
anyway. As I say in my previous email, stack management will have to be
redone for waitqueues.

 -- Keir

> Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 18:18                 ` Keir Fraser
@ 2011-11-23 18:31                   ` Olaf Hering
  2011-11-23 19:21                     ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-23 18:31 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Jan Beulich

On Wed, Nov 23, Keir Fraser wrote:

> We have quite a big waitqueue problem actually. The current scheme of
> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
> work nicely with preempted C code, which may implement frame pointers and/or
> arbitrarily take the address of on-stack variables. The result will be
> hideous cross-stack corruptions, as these frame pointers and cached
> addresses of automatic variables will reference the wrong cpu's stack!
> Fixing or detecting this in general is not possible afaics.

Yes, I was thinking about that wakeup on different cpu as well.
As a quick fix/hack, perhaps the scheduler could make sure the vcpu
wakes up on the same cpu?

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 18:31                   ` Olaf Hering
@ 2011-11-23 19:21                     ` Keir Fraser
  2011-11-23 21:03                       ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-23 19:21 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, Jan Beulich

On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Wed, Nov 23, Keir Fraser wrote:
> 
>> We have quite a big waitqueue problem actually. The current scheme of
>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>> work nicely with preempted C code, which may implement frame pointers and/or
>> arbitrarily take the address of on-stack variables. The result will be
>> hideous cross-stack corruptions, as these frame pointers and cached
>> addresses of automatic variables will reference the wrong cpu's stack!
>> Fixing or detecting this in general is not possible afaics.
> 
> Yes, I was thinking about that wakeup on different cpu as well.
> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
> wakes up on the same cpu?

Could save old affinity and then vcpu_set_affinity. That will have to do for
now. Actually it should work okay as long as toolstack doesn't mess with
affinity meanwhile. I'll sort out a patch for this.

 -- Keir

> Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 19:21                     ` Keir Fraser
@ 2011-11-23 21:03                       ` Keir Fraser
  2011-11-23 22:30                         ` Olaf Hering
  2011-11-24  9:15                         ` Jan Beulich
  0 siblings, 2 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-23 21:03 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1645 bytes --]

On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:

> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
>> On Wed, Nov 23, Keir Fraser wrote:
>> 
>>> We have quite a big waitqueue problem actually. The current scheme of
>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>> work nicely with preempted C code, which may implement frame pointers and/or
>>> arbitrarily take the address of on-stack variables. The result will be
>>> hideous cross-stack corruptions, as these frame pointers and cached
>>> addresses of automatic variables will reference the wrong cpu's stack!
>>> Fixing or detecting this in general is not possible afaics.
>> 
>> Yes, I was thinking about that wakeup on different cpu as well.
>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>> wakes up on the same cpu?
> 
> Could save old affinity and then vcpu_set_affinity. That will have to do for
> now. Actually it should work okay as long as toolstack doesn't mess with
> affinity meanwhile. I'll sort out a patch for this.

Attached three patches for you to try. They apply in sequence.
00: A fixed version of "domain_crash on stack overflow"
01: Reorders prepare_to_wait so that the vcpu will always be on the
waitqueue on exit (even if it has just been woken).
02: Ensures the vcpu wakes up on the same cpu that it slept on.

We need all of these. Just need testing to make sure they aren't horribly
broken. You should be able to test multi-processor host again with these.

 -- Keir

>  -- Keir
> 
>> Olaf
> 
> 


[-- Attachment #2: 00-prep-to-wait-dom-crash --]
[-- Type: application/octet-stream, Size: 2335 bytes --]

# HG changeset patch
# Parent 84b3e46aa7d24a4605c36940606e7da9679b0e7f

diff -r 84b3e46aa7d2 xen/common/wait.c
--- a/xen/common/wait.c	Wed Nov 23 12:03:37 2011 +0000
+++ b/xen/common/wait.c	Wed Nov 23 19:43:35 2011 +0000
@@ -106,13 +106,16 @@ void wake_up(struct waitqueue_head *wq)
 static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 {
     char *cpu_info = (char *)get_cpu_info();
+
     asm volatile (
 #ifdef CONFIG_X86_64
         "push %%rax; push %%rbx; push %%rcx; push %%rdx; push %%rdi; "
         "push %%rbp; push %%r8; push %%r9; push %%r10; push %%r11; "
         "push %%r12; push %%r13; push %%r14; push %%r15; call 1f; "
         "1: mov 80(%%rsp),%%rdi; mov 96(%%rsp),%%rcx; mov %%rsp,%%rsi; "
-        "sub %%rsi,%%rcx; rep movsb; mov %%rsp,%%rsi; pop %%rax; "
+        "sub %%rsi,%%rcx; cmp %3,%%rcx; jbe 2f; "
+        "xor %%esi,%%esi; jmp 3f; "
+        "2: rep movsb; mov %%rsp,%%rsi; 3: pop %%rax; "
         "pop %%r15; pop %%r14; pop %%r13; pop %%r12; "
         "pop %%r11; pop %%r10; pop %%r9; pop %%r8; "
         "pop %%rbp; pop %%rdi; pop %%rdx; pop %%rcx; pop %%rbx; pop %%rax"
@@ -120,13 +123,20 @@ static void __prepare_to_wait(struct wai
         "push %%eax; push %%ebx; push %%ecx; push %%edx; push %%edi; "
         "push %%ebp; call 1f; "
         "1: mov 8(%%esp),%%edi; mov 16(%%esp),%%ecx; mov %%esp,%%esi; "
-        "sub %%esi,%%ecx; rep movsb; mov %%esp,%%esi; pop %%eax; "
+        "sub %%esi,%%ecx; cmp %3,%%ecx; jbe 2f; "
+        "xor %%esi,%%esi; jmp 3f; "
+        "2: rep movsb; mov %%esp,%%esi; 3: pop %%eax; "
         "pop %%ebp; pop %%edi; pop %%edx; pop %%ecx; pop %%ebx; pop %%eax"
 #endif
         : "=S" (wqv->esp)
-        : "c" (cpu_info), "D" (wqv->stack)
+        : "c" (cpu_info), "D" (wqv->stack), "i" (PAGE_SIZE)
         : "memory" );
-    BUG_ON((cpu_info - (char *)wqv->esp) > PAGE_SIZE);
+
+    if ( unlikely(wqv->esp == 0) )
+    {
+        gdprintk(XENLOG_ERR, "Stack too large in %s\n", __FUNCTION__);
+        domain_crash_synchronous();
+    }
 }
 
 static void __finish_wait(struct waitqueue_vcpu *wqv)
@@ -162,6 +172,7 @@ void prepare_to_wait(struct waitqueue_he
     struct vcpu *curr = current;
     struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;
 
+    ASSERT(!in_atomic());
     ASSERT(list_empty(&wqv->list));
 
     spin_lock(&wq->lock);

[-- Attachment #3: 01-prep-to-wait-reorder --]
[-- Type: application/octet-stream, Size: 920 bytes --]

# HG changeset patch
# Parent cc055dab3606529fcabc28685c5bf0debdfc213c
diff -r cc055dab3606 -r 84628291e585 xen/common/wait.c
--- a/xen/common/wait.c	Wed Nov 23 18:01:44 2011 +0000
+++ b/xen/common/wait.c	Wed Nov 23 18:03:55 2011 +0000
@@ -107,6 +107,8 @@ static void __prepare_to_wait(struct wai
 {
     char *cpu_info = (char *)get_cpu_info();
 
+    ASSERT(wqv->esp == 0);
+
     asm volatile (
 #ifdef CONFIG_X86_64
         "push %%rax; push %%rbx; push %%rcx; push %%rdx; push %%rdi; "
@@ -173,14 +175,13 @@ void prepare_to_wait(struct waitqueue_he
     struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;
 
     ASSERT(!in_atomic());
+    __prepare_to_wait(wqv);
+
     ASSERT(list_empty(&wqv->list));
-
     spin_lock(&wq->lock);
     list_add_tail(&wqv->list, &wq->list);
     vcpu_pause_nosync(curr);
     spin_unlock(&wq->lock);
-
-    __prepare_to_wait(wqv);
 }
 
 void finish_wait(struct waitqueue_head *wq)

[-- Attachment #4: 02-waitq-set-vcpu-affinity --]
[-- Type: application/octet-stream, Size: 2130 bytes --]

# HG changeset patch
# Parent 18a1f35af1c55ddbd87bd39a9317c38ffa2a1f7a
diff -r 18a1f35af1c5 -r 0f22064d98ae xen/common/wait.c
--- a/xen/common/wait.c	Wed Nov 23 19:43:46 2011 +0000
+++ b/xen/common/wait.c	Wed Nov 23 20:02:05 2011 +0000
@@ -34,6 +34,8 @@ struct waitqueue_vcpu {
      */
     void *esp;
     char *stack;
+    cpumask_t saved_affinity;
+    unsigned int wakeup_cpu;
 #endif
 };
 
@@ -106,9 +108,19 @@ void wake_up(struct waitqueue_head *wq)
 static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 {
     char *cpu_info = (char *)get_cpu_info();
+    struct vcpu *curr = current;
 
     ASSERT(wqv->esp == 0);
 
+    /* Save current VCPU affinity; force wakeup on *this* CPU only. */
+    wqv->wakeup_cpu = smp_processor_id();
+    cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity);
+    if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
+    {
+        gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
+        domain_crash_synchronous();
+    }
+
     asm volatile (
 #ifdef CONFIG_X86_64
         "push %%rax; push %%rbx; push %%rcx; push %%rdx; push %%rdi; "
@@ -144,6 +156,7 @@ static void __prepare_to_wait(struct wai
 static void __finish_wait(struct waitqueue_vcpu *wqv)
 {
     wqv->esp = NULL;
+    (void)vcpu_set_affinity(current, &wqv->saved_affinity);
 }
 
 void check_wakeup_from_wait(void)
@@ -155,6 +168,20 @@ void check_wakeup_from_wait(void)
     if ( likely(wqv->esp == NULL) )
         return;
 
+    /* Check if we woke up on the wrong CPU. */
+    if ( unlikely(smp_processor_id() != wqv->wakeup_cpu) )
+    {
+        /* Re-set VCPU affinity and re-enter the scheduler. */
+        struct vcpu *curr = current;
+        cpumask_copy(&wqv->saved_affinity, curr->cpu_affinity);
+        if ( vcpu_set_affinity(curr, cpumask_of(wqv->wakeup_cpu)) )
+        {
+            gdprintk(XENLOG_ERR, "Unable to set vcpu affinity\n");
+            domain_crash_synchronous();
+        }
+        wait(); /* takes us back into the scheduler */
+    }
+
     asm volatile (
         "mov %1,%%"__OP"sp; rep movsb; jmp *(%%"__OP"sp)"
         : : "S" (wqv->stack), "D" (wqv->esp),

[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 21:03                       ` Keir Fraser
@ 2011-11-23 22:30                         ` Olaf Hering
  2011-11-23 23:12                           ` Keir Fraser
  2011-11-24  9:15                         ` Jan Beulich
  1 sibling, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-23 22:30 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Wed, Nov 23, Keir Fraser wrote:

> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
> waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
> 
> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test multi-processor host again with these.

Thanks Keir.

In a first test they work OK with multiple processors.
I get vcpu hangs when I balloon up and down with mem-set. That's most
likely caused by uneven vcpu_pause/unpause calls in my changes, which use
wait queues in mem_event handling and ept_get_entry. I will debug that
further.

After the vcpu hung I killed the guest and tried to start a new one.
Oddly enough I wasn't able to fully kill the guest; it remained in --p--d
state. Most vcpus were in paused state before that.

In another attempt I was able to run firefox in a guest. But after
trying to open all "latest headlines" in tabs the guest crashed. qemu-dm
log had alot of this (but nothing in xen dmesg):
track_dirty_vram(f0000000, 12c) failed (-1, 3)

xl vcpu-list shows
(null)                               1     0    -   --p      47.3  any cpu
(null)                               1     1   12   ---      13.4  any cpu
(null)                               1     2    -   --p       4.3  any cpu
(null)                               1     3    -   --p       7.8  any cpu
(null)                               1     4    -   --p       3.5  any cpu
(null)                               1     5    -   --p       1.9  any cpu
(null)                               1     6    -   --p       1.6  any cpu
(null)                               1     7    -   --p       1.4  any cpu

Hmm, qemu-dm doesn't get killed in all cases; killing it destroys the guest.
I have seen that before already.


I will provide more test results tomorrow.


Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 22:30                         ` Olaf Hering
@ 2011-11-23 23:12                           ` Keir Fraser
  2011-11-24 10:00                             ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-23 23:12 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 23/11/2011 22:30, "Olaf Hering" <olaf@aepfle.de> wrote:

> After the vcpu hung I killed the guest and tried to start a new one.
> Oddly enough I wasn't able to fully kill the guest; it remained in --p--d
> state. Most vcpus were in paused state before that.

Dying but kept as a zombie by memory references...

> In another attempt I was able to run firefox in a guest. But after
> trying to open all "latest headlines" in tabs the guest crashed. qemu-dm
> log had a lot of this (but nothing in xen dmesg):
> track_dirty_vram(f0000000, 12c) failed (-1, 3)
> 
> xl vcpu-list shows
> (null)                               1     0    -   --p      47.3  any cpu
> (null)                               1     1   12   ---      13.4  any cpu
> (null)                               1     2    -   --p       4.3  any cpu
> (null)                               1     3    -   --p       7.8  any cpu
> (null)                               1     4    -   --p       3.5  any cpu
> (null)                               1     5    -   --p       1.9  any cpu
> (null)                               1     6    -   --p       1.6  any cpu
> (null)                               1     7    -   --p       1.4  any cpu
> 
> Hmm, qemu-dm doesn't get killed in all cases; killing it destroys the guest.
> I have seen that before already.

...from qemu-dm. Problem is that toolstack is not killing qemu-dm, or
qemu-dm is not responding to some shutdown signal.

> I will provide more test results tomorrow.

Thanks.

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 21:03                       ` Keir Fraser
  2011-11-23 22:30                         ` Olaf Hering
@ 2011-11-24  9:15                         ` Jan Beulich
  2011-11-24  9:51                           ` Keir Fraser
  2011-11-24  9:58                           ` Keir Fraser
  1 sibling, 2 replies; 47+ messages in thread
From: Jan Beulich @ 2011-11-24  9:15 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Olaf Hering, xen-devel

>>> On 23.11.11 at 22:03, Keir Fraser <keir.xen@gmail.com> wrote:
> On 23/11/2011 19:21, "Keir Fraser" <keir.xen@gmail.com> wrote:
> 
>> On 23/11/2011 18:31, "Olaf Hering" <olaf@aepfle.de> wrote:
>> 
>>> On Wed, Nov 23, Keir Fraser wrote:
>>> 
>>>> We have quite a big waitqueue problem actually. The current scheme of
>>>> per-cpu stacks doesn't work nicely, as the stack pointer will change if a
>>>> vcpu goes to sleep and then wakes up on a different cpu. This really doesn't
>>>> work nicely with preempted C code, which may implement frame pointers and/or
>>>> arbitrarily take the address of on-stack variables. The result will be
>>>> hideous cross-stack corruptions, as these frame pointers and cached
>>>> addresses of automatic variables will reference the wrong cpu's stack!
>>>> Fixing or detecting this in general is not possible afaics.
>>> 
>>> Yes, I was thinking about that wakeup on different cpu as well.
>>> As a quick fix/hack, perhaps the scheduler could make sure the vcpu
>>> wakes up on the same cpu?
>> 
>> Could save old affinity and then vcpu_set_affinity. That will have to do for
>> now. Actually it should work okay as long as toolstack doesn't mess with
>> affinity meanwhile. I'll sort out a patch for this.
> 
> Attached three patches for you to try. They apply in sequence.
> 00: A fixed version of "domain_crash on stack overflow"
> 01: Reorders prepare_to_wait so that the vcpu will always be on the
> waitqueue on exit (even if it has just been woken).
> 02: Ensures the vcpu wakes up on the same cpu that it slept on.

Didn't we (long ago) settle on not permitting new calls to
domain_crash_synchronous()? Is it really impossible to just
domain_crash() in any of the instances these add?

Jan

> We need all of these. Just need testing to make sure they aren't horribly
> broken. You should be able to test multi-processor host again with these.
> 
>  -- Keir
> 
>>  -- Keir
>> 
>>> Olaf
>> 
>> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-24  9:15                         ` Jan Beulich
@ 2011-11-24  9:51                           ` Keir Fraser
  2011-11-24  9:58                           ` Keir Fraser
  1 sibling, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-24  9:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Olaf Hering, xen-devel

On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> 
>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>> waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
> 
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()? Is it really impossible to just
> domain_crash() in any of the instances these add?

It's safe because you must be in a context that is safe to preempt. That's a
pre-condition for using a waitqueue. It's not safe to use domain_crash()
because the caller of wait_event() may not handle the exceptional return.
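
Illustrative only: a typical wait_event() caller assumes the condition
holds once the call returns, so an early return after a plain
domain_crash(), which just marks the domain crashed and returns to its
caller, would let code like the hypothetical sketch below carry on with
the condition still false.

    /* Hypothetical types and caller, purely to illustrate the point. */
    struct page_wait {
        struct waitqueue_head wq;
        bool_t page_ready;
    };

    static int consume_page(struct page_wait *w);  /* hypothetical */

    static int fetch_paged_in_page(struct page_wait *w)
    {
        wait_event(w->wq, w->page_ready);  /* not written to return early  */
        return consume_page(w);            /* assumes w->page_ready is set */
    }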

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-24  9:15                         ` Jan Beulich
  2011-11-24  9:51                           ` Keir Fraser
@ 2011-11-24  9:58                           ` Keir Fraser
  1 sibling, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-24  9:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Olaf Hering, xen-devel

On 24/11/2011 09:15, "Jan Beulich" <JBeulich@suse.com> wrote:

>> Attached three patches for you to try. They apply in sequence.
>> 00: A fixed version of "domain_crash on stack overflow"
>> 01: Reorders prepare_to_wait so that the vcpu will always be on the
>> waitqueue on exit (even if it has just been woken).
>> 02: Ensures the vcpu wakes up on the same cpu that it slept on.
> 
> Didn't we (long ago) settle on not permitting new calls to
> domain_crash_synchronous()?

This was a reaction to lazy patches which sprinkled d_c_s calls around
liberally, and in unsafe locations, as a dodge around proper error handling.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-23 23:12                           ` Keir Fraser
@ 2011-11-24 10:00                             ` Olaf Hering
  2011-11-25 12:56                               ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-24 10:00 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Wed, Nov 23, Keir Fraser wrote:

> ...from qemu-dm. Problem is that toolstack is not killing qemu-dm, or
> qemu-dm is not responding to some shutdown signal.

In the first crash there was no qemu-dm process left from what I
remember. I will see if it happens again.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-24 10:00                             ` Olaf Hering
@ 2011-11-25 12:56                               ` Olaf Hering
  2011-11-25 18:26                                 ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-25 12:56 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Thu, Nov 24, Olaf Hering wrote:

> On Wed, Nov 23, Keir Fraser wrote:
> 
> > ...from qemu-dm. Problem is that toolstack is not killing qemu-dm, or
> > qemu-dm is not responding to some shutdown signal.
> 
> In the first crash there was no qemu-dm process left from what I
> remember. I will see if it happens again.

I see the patches were already committed. Thanks.

After more investigation, my config file has on_crash="preserve".
To me it looks like the guest kills itself, since nothing is in the
logs. So after all that's not a waitqueue issue, and most likely also not
a paging bug.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-25 12:56                               ` Olaf Hering
@ 2011-11-25 18:26                                 ` Olaf Hering
  2011-11-25 19:35                                   ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-25 18:26 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel


One more thing:

Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
happens to be in a queue by the time xl destroy is called, the hypervisor
will crash.

Perhaps there should be some sort of domain destructor for each
waitqueue?

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-25 18:26                                 ` Olaf Hering
@ 2011-11-25 19:35                                   ` Keir Fraser
  0 siblings, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-25 19:35 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 25/11/2011 18:26, "Olaf Hering" <olaf@aepfle.de> wrote:

> 
> One more thing:
> 
> Is the BUG_ON in destroy_waitqueue_vcpu really required? If the vcpu
> happens to be in a queue by the time xl destroy is called the hypervisor
> will crash.

We could fix this by having waitqueues that contain a vcpu hold a reference
to that vcpu's domain.
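
A minimal sketch of that idea, assuming the usual get_domain()/put_domain()
refcount helpers (the wq_add/wq_del names are made up, locking is omitted,
and this is untested):

    /* Take a domain reference when a vcpu joins a waitqueue and drop it
     * when the vcpu leaves, so the domain cannot be fully torn down while
     * one of its vcpus is still queued. */
    static bool_t wq_add(struct waitqueue_head *wq, struct vcpu *v)
    {
        if ( !get_domain(v->domain) )
            return 0;           /* domain already dying: don't queue it */
        list_add_tail(&v->waitqueue_vcpu->list, &wq->list);
        return 1;
    }

    static void wq_del(struct vcpu *v)
    {
        list_del_init(&v->waitqueue_vcpu->list);
        put_domain(v->domain);
    }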

> Perhaps there should be some sort of domain destructor for each
> waitqueue?

Not sure what you mean.

 -- Keir

> Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09 21:21     ` Andres Lagar-Cavilla
@ 2011-11-22 14:34       ` George Dunlap
  0 siblings, 0 replies; 47+ messages in thread
From: George Dunlap @ 2011-11-22 14:34 UTC (permalink / raw)
  To: andres; +Cc: Olaf Hering, keir.xen, xen-devel

On Wed, Nov 9, 2011 at 9:21 PM, Andres Lagar-Cavilla
<andres@lagarcavilla.org> wrote:
> PoD also does emergency sweeps under memory pressure to identify zeroes;
> that can be easily implemented by a user-space utility.

PoD is certainly a special-case, hypervisor-handled version of paging.
 The main question is whether a user-space version can be made to
perform well enough.  My guess is that it can, but it's far from
certain.  If it can, I'm all in favor of making the paging handle PoD.

> The hypervisor code keeps a list of 2M superpages -- that feature would be
> lost.

This is actually pretty important; Windows scrubs memory on boot, so
it's guaranteed that the majority of the memory will be touched and
re-populated.

> But I doubt this would fly anyways: PoD works for non-ept modes, which I
> guess don't want to lose that functionality.

Is there a particular reason we can't do paging on shadow code? I
thought it was just that doing HAP was simpler to get started with.
That would be another blocker to getting rid of PoD, really.

 -George

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 13:54               ` Olaf Hering
@ 2011-11-22 14:24                 ` Keir Fraser
  0 siblings, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-22 14:24 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]

On 22/11/2011 13:54, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 22, Keir Fraser wrote:
> 
>> I think I checked before, but: also unresponsive to serial debug keys?
> 
> Good point, I will check that. So far I haven't used these keys.

If they work then 'd' will give you a backtrace on every CPU, and 'q' will
dump domain and vcpu states. That should make things easier!

>> Forgot about it. Done now!
> 
> What about domain_crash() instead of BUG_ON() in __prepare_to_wait()?
> If the stack size were checked before it is copied, the hypervisor could
> survive.

Try the attached patch (please also try reducing the size of the new
parameter to the inline asm from PAGE_SIZE down to e.g. 2000 to force the
domain-crashing path).

 -- Keir

> Olaf


[-- Attachment #2: 00-prep-to-wait-dom-crash --]
[-- Type: application/octet-stream, Size: 2261 bytes --]

diff -r d3859e348951 xen/common/wait.c
--- a/xen/common/wait.c	Tue Nov 22 13:29:48 2011 +0000
+++ b/xen/common/wait.c	Tue Nov 22 14:21:42 2011 +0000
@@ -106,13 +106,16 @@ void wake_up(struct waitqueue_head *wq)
 static void __prepare_to_wait(struct waitqueue_vcpu *wqv)
 {
     char *cpu_info = (char *)get_cpu_info();
+
     asm volatile (
 #ifdef CONFIG_X86_64
         "push %%rax; push %%rbx; push %%rcx; push %%rdx; push %%rdi; "
         "push %%rbp; push %%r8; push %%r9; push %%r10; push %%r11; "
         "push %%r12; push %%r13; push %%r14; push %%r15; call 1f; "
         "1: mov 80(%%rsp),%%rdi; mov 96(%%rsp),%%rcx; mov %%rsp,%%rsi; "
-        "sub %%rsi,%%rcx; rep movsb; mov %%rsp,%%rsi; pop %%rax; "
+        "sub %%rsi,%%rcx; cmp %3,%%rcx; ja 2f; "
+        "xor %%esi,%%esi; jmp 3f; "
+        "2: rep movsb; mov %%rsp,%%rsi; 3: pop %%rax; "
         "pop %%r15; pop %%r14; pop %%r13; pop %%r12; "
         "pop %%r11; pop %%r10; pop %%r9; pop %%r8; "
         "pop %%rbp; pop %%rdi; pop %%rdx; pop %%rcx; pop %%rbx; pop %%rax"
@@ -120,13 +123,20 @@ static void __prepare_to_wait(struct wai
         "push %%eax; push %%ebx; push %%ecx; push %%edx; push %%edi; "
         "push %%ebp; call 1f; "
         "1: mov 8(%%esp),%%edi; mov 16(%%esp),%%ecx; mov %%esp,%%esi; "
-        "sub %%esi,%%ecx; rep movsb; mov %%esp,%%esi; pop %%eax; "
+        "sub %%esi,%%ecx; cmp %3,%%ecx; ja 2f; "
+        "xor %%esi,%%esi; jmp 3f; "
+        "2: rep movsb; mov %%esp,%%esi; 3: pop %%eax; "
         "pop %%ebp; pop %%edi; pop %%edx; pop %%ecx; pop %%ebx; pop %%eax"
 #endif
         : "=S" (wqv->esp)
-        : "c" (cpu_info), "D" (wqv->stack)
+        : "c" (cpu_info), "D" (wqv->stack), "i" (PAGE_SIZE)
         : "memory" );
-    BUG_ON((cpu_info - (char *)wqv->esp) > PAGE_SIZE);
+
+    if ( unlikely(wqv->esp == 0) )
+    {
+        gdprintk(XENLOG_ERR, "Stack too large in %s\n", __FUNCTION__);
+        domain_crash_synchronous();
+    }
 }
 
 static void __finish_wait(struct waitqueue_vcpu *wqv)
@@ -162,6 +172,7 @@ void prepare_to_wait(struct waitqueue_he
     struct vcpu *curr = current;
     struct waitqueue_vcpu *wqv = curr->waitqueue_vcpu;
 
+    ASSERT(!in_atomic());
     ASSERT(list_empty(&wqv->list));
 
     spin_lock(&wq->lock);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 13:04             ` Keir Fraser
@ 2011-11-22 13:54               ` Olaf Hering
  2011-11-22 14:24                 ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-22 13:54 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 22, Keir Fraser wrote:

> I think I checked before, but: also unresponsive to serial debug keys?

Good point, I will check that. So far I haven't used these keys.

> Forgot about it. Done now!

What about domain_crash() instead of BUG_ON() in __prepare_to_wait()?
If the stack size were checked before it is copied, the hypervisor could
survive.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-22 11:40           ` Olaf Hering
@ 2011-11-22 13:04             ` Keir Fraser
  2011-11-22 13:54               ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-22 13:04 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 22/11/2011 11:40, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Sat, Nov 12, Keir Fraser wrote:
> 
>> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
>> 
>> So you run with a single CPU, and with wait_event() in one location, and
>> that works for a while (actually doing full waitqueue work: executing wait()
>> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
>> understood correctly.
> 
> Yes, that's what happens with a single CPU in dom0 and domU.
> I have added some more debugging. After the backtrace below I see one more
> call to check_wakeup_from_wait() for dom0, and then the host hangs hard.

I think I checked before, but: also unresponsive to serial debug keys?

And dom0 isn't getting put on a waitqueue I assume? Since I guess dom0 is
doing the work to wake things from the waitqueue, that couldn't go well. :-)

>>> Also, the 3K stacksize is still too small, this path uses 3096.
>> 
>> I'll allocate a whole page for the stack then.
> 
> Thanks.

Forgot about it. Done now!

 -- Keir

> 
> Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-12  7:00         ` Keir Fraser
@ 2011-11-22 11:40           ` Olaf Hering
  2011-11-22 13:04             ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-22 11:40 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Sat, Nov 12, Keir Fraser wrote:

> On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
> > Keir,
> > 
> > just to dump my findings to the list:
> > 
> > On Tue, Nov 08, Keir Fraser wrote:
> > 
> >> Tbh I wonder anyway whether stale hypercall context would be likely to cause
> >> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> >> between CPUs as a cause of inconsistencies, or pin the guest under test.
> >> Another problem could be sleeping with locks held, but we do test for that
> >> (in debug builds at least) and I'd expect crash/hang rather than silent
> >> reboot. Another problem could be if the vcpu has its own state in an
> >> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
> >> which then is attempted to be restored during a waitqueue wakeup. That could
> >> certainly cause a reboot, but I don't know of an example where this might
> >> happen.
> > 
> > The crashes also happen with maxcpus=1 and a single guest cpu.
> > Today I added wait_event to ept_get_entry and this works.
> > 
> > But at some point the codepath below is executed; after that wake_up the
> > host hangs hard. I will trace it further next week; maybe the backtrace
> > gives a clue as to what the cause could be.
> 
> So you run with a single CPU, and with wait_event() in one location, and
> that works for a while (actually doing full waitqueue work: executing wait()
> and wake_up()), but then hangs? That's weird, but pretty interesting if I've
> understood correctly.

Yes, that's what happens with a single CPU in dom0 and domU.
I have added some more debugging. After the backtrace below I see one more
call to check_wakeup_from_wait() for dom0, and then the host hangs hard.

> > Also, the 3K stacksize is still too small, this path uses 3096.
> 
> I'll allocate a whole page for the stack then.

Thanks.


Olaf

> > (XEN) prep 127a 30 0
> > (XEN) wake 127a 30
> > (XEN) prep 1cf71 30 0
> > (XEN) wake 1cf71 30
> > (XEN) prep 1cf72 30 0
> > (XEN) wake 1cf72 30
> > (XEN) prep 1cee9 30 0
> > (XEN) wake 1cee9 30
> > (XEN) prep 121a 30 0
> > (XEN) wake 121a 30
> > 
> > (This means 'gfn  (p2m_unshare << 4) in_atomic')
> > 
> > (XEN) prep 1ee61 20 0
> > (XEN) max stacksize c18
> > (XEN) Xen WARN at wait.c:126
> > (XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted:    C
> > ]----
> > (XEN) CPU:    0
> > (XEN) RIP:    e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> > (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000   rbx: ffff830201f76000   rcx: 0000000000000000
> > (XEN) rdx: ffff82c4802b7f18   rsi: 000000000000000a   rdi: ffff82c4802673f0
> > (XEN) rbp: ffff82c4802b73a8   rsp: ffff82c4802b7378   r8:  0000000000000000
> > (XEN) r9:  ffff82c480221da0   r10: 00000000fffffffa   r11: 0000000000000003
> > (XEN) r12: ffff82c4802b7f18   r13: ffff830201f76000   r14: ffff83003ea5c000
> > (XEN) r15: 000000000001ee61   cr0: 000000008005003b   cr4: 00000000000026f0
> > (XEN) cr3: 000000020336d000   cr2: 00007fa88ac42000
> > (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802b7378:
> > (XEN)    0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
> > (XEN)    ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
> > (XEN)    ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
> > (XEN)    0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
> > (XEN)    000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
> > (XEN)    0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
> > (XEN)    ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
> > (XEN)    00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
> > (XEN)    ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
> > (XEN)    ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
> > (XEN)    000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
> > (XEN)    ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
> > (XEN)    ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
> > (XEN)    ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
> > (XEN)    ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
> > (XEN)    0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
> > (XEN)    0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
> > (XEN)    ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
> > (XEN)    00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
> > (XEN)    ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> > (XEN)    [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
> > (XEN)    [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
> > (XEN)    [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
> > (XEN)    [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
> > (XEN)    [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
> > (XEN)    [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
> > (XEN)    [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
> > (XEN)    [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
> > (XEN)    [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
> > (XEN)    [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
> > (XEN)    [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
> > (XEN)    [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
> > (XEN)
> > (XEN) wake 1ee61 20
> > 
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-11 22:56       ` Olaf Hering
@ 2011-11-12  7:00         ` Keir Fraser
  2011-11-22 11:40           ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-12  7:00 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 11/11/2011 22:56, "Olaf Hering" <olaf@aepfle.de> wrote:

> Keir,
> 
> just to dump my findings to the list:
> 
> On Tue, Nov 08, Keir Fraser wrote:
> 
>> Tbh I wonder anyway whether stale hypercall context would be likely to cause
>> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
>> between CPUs as a cause of inconsistencies, or pin the guest under test.
>> Another problem could be sleeping with locks held, but we do test for that
>> (in debug builds at least) and I'd expect crash/hang rather than silent
>> reboot. Another problem could be if the vcpu has its own state in an
>> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
>> which then is attempted to be restored during a waitqueue wakeup. That could
>> certainly cause a reboot, but I don't know of an example where this might
>> happen.
> 
> The crashes also happen with maxcpus=1 and a single guest cpu.
> Today I added wait_event to ept_get_entry and this works.
> 
> But at some point the codepath below is executed; after that wake_up the
> host hangs hard. I will trace it further next week; maybe the backtrace
> gives a clue as to what the cause could be.

So you run with a single CPU, and with wait_event() in one location, and
that works for a while (actually doing full waitqueue work: executing wait()
and wake_up()), but then hangs? That's weird, but pretty interesting if I've
understood correctly.

> Also, the 3K stacksize is still too small, this path uses 3096.

I'll allocate a whole page for the stack then.

 -- Keir

> (XEN) prep 127a 30 0
> (XEN) wake 127a 30
> (XEN) prep 1cf71 30 0
> (XEN) wake 1cf71 30
> (XEN) prep 1cf72 30 0
> (XEN) wake 1cf72 30
> (XEN) prep 1cee9 30 0
> (XEN) wake 1cee9 30
> (XEN) prep 121a 30 0
> (XEN) wake 121a 30
> 
> (This means 'gfn  (p2m_unshare << 4) in_atomic')
> 
> (XEN) prep 1ee61 20 0
> (XEN) max stacksize c18
> (XEN) Xen WARN at wait.c:126
> (XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted:    C
> ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> (XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: ffff830201f76000   rcx: 0000000000000000
> (XEN) rdx: ffff82c4802b7f18   rsi: 000000000000000a   rdi: ffff82c4802673f0
> (XEN) rbp: ffff82c4802b73a8   rsp: ffff82c4802b7378   r8:  0000000000000000
> (XEN) r9:  ffff82c480221da0   r10: 00000000fffffffa   r11: 0000000000000003
> (XEN) r12: ffff82c4802b7f18   r13: ffff830201f76000   r14: ffff83003ea5c000
> (XEN) r15: 000000000001ee61   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 000000020336d000   cr2: 00007fa88ac42000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802b7378:
> (XEN)    0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
> (XEN)    ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
> (XEN)    ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
> (XEN)    0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
> (XEN)    000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
> (XEN)    0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
> (XEN)    ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
> (XEN)    00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
> (XEN)    ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
> (XEN)    ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
> (XEN)    000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
> (XEN)    ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
> (XEN)    ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
> (XEN)    ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
> (XEN)    ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
> (XEN)    0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
> (XEN)    0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
> (XEN)    ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
> (XEN)    00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
> (XEN)    ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
> (XEN)    [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
> (XEN)    [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
> (XEN)    [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
> (XEN)    [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
> (XEN)    [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
> (XEN)    [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
> (XEN)    [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
> (XEN)    [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
> (XEN)    [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
> (XEN)    [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
> (XEN)    [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
> (XEN)    [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
> (XEN)
> (XEN) wake 1ee61 20
> 
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-08 22:54     ` Keir Fraser
@ 2011-11-11 22:56       ` Olaf Hering
  2011-11-12  7:00         ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-11 22:56 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

Keir,

just to dump my findings to the list:

On Tue, Nov 08, Keir Fraser wrote:

> Tbh I wonder anyway whether stale hypercall context would be likely to cause
> a silent machine reboot. Booting with max_cpus=1 would eliminate moving
> between CPUs as a cause of inconsistencies, or pin the guest under test.
> Another problem could be sleeping with locks held, but we do test for that
> (in debug builds at least) and I'd expect crash/hang rather than silent
> reboot. Another problem could be if the vcpu has its own state in an
> inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
> which then is attempted to be restored during a waitqueue wakeup. That could
> certainly cause a reboot, but I don't know of an example where this might
> happen.

The crashes also happen with maxcpus=1 and a single guest cpu.
Today I added wait_event to ept_get_entry and this works.

But at some point the codepath below is executed; after that wake_up the
host hangs hard. I will trace it further next week; maybe the backtrace
gives a clue as to what the cause could be.

Also, the 3K stacksize is still too small, this path uses 3096.

(XEN) prep 127a 30 0
(XEN) wake 127a 30
(XEN) prep 1cf71 30 0
(XEN) wake 1cf71 30
(XEN) prep 1cf72 30 0
(XEN) wake 1cf72 30
(XEN) prep 1cee9 30 0
(XEN) wake 1cee9 30
(XEN) prep 121a 30 0
(XEN) wake 121a 30

(This means 'gfn  (p2m_unshare << 4) in_atomic')

(XEN) prep 1ee61 20 0
(XEN) max stacksize c18
(XEN) Xen WARN at wait.c:126
(XEN) ----[ Xen-4.2.24114-20111111.221356  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: ffff830201f76000   rcx: 0000000000000000
(XEN) rdx: ffff82c4802b7f18   rsi: 000000000000000a   rdi: ffff82c4802673f0
(XEN) rbp: ffff82c4802b73a8   rsp: ffff82c4802b7378   r8:  0000000000000000
(XEN) r9:  ffff82c480221da0   r10: 00000000fffffffa   r11: 0000000000000003
(XEN) r12: ffff82c4802b7f18   r13: ffff830201f76000   r14: ffff83003ea5c000
(XEN) r15: 000000000001ee61   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000020336d000   cr2: 00007fa88ac42000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802b7378:
(XEN)    0000000000000020 000000000001ee61 0000000000000002 ffff830201aa9e90
(XEN)    ffff830201aa9f60 0000000000000020 ffff82c4802b7428 ffff82c4801e02f9
(XEN)    ffff830000000002 0000000000000000 ffff82c4802b73f8 ffff82c4802b73f4
(XEN)    0000000000000000 ffff82c4802b74e0 ffff82c4802b74e4 0000000101aa9e90
(XEN)    000000ffffffffff ffff830201aa9e90 000000000001ee61 ffff82c4802b74e4
(XEN)    0000000000000002 0000000000000000 ffff82c4802b7468 ffff82c4801d810f
(XEN)    ffff82c4802b74e0 000000000001ee61 ffff830201aa9e90 ffff82c4802b75bc
(XEN)    00000000002167f5 ffff88001ee61900 ffff82c4802b7518 ffff82c480211b80
(XEN)    ffff8302167f5000 ffff82c4801c168c 0000000000000000 ffff83003ea5c000
(XEN)    ffff88001ee61900 0000000001805063 0000000001809063 000000001ee001e3
(XEN)    000000001ee61067 00000000002167f5 000000000022ee70 000000000022ed10
(XEN)    ffffffffffffffff 0000000a00000007 0000000000000004 ffff82c48025db80
(XEN)    ffff83003ea5c000 ffff82c4802b75bc ffff88001ee61900 ffff830201aa9e90
(XEN)    ffff82c4802b7528 ffff82c480211cb1 ffff82c4802b7568 ffff82c4801da97f
(XEN)    ffff82c4801be053 0000000000000008 ffff82c4802b7b58 ffff88001ee61900
(XEN)    0000000000000000 ffff82c4802b78b0 ffff82c4802b75f8 ffff82c4801aaec8
(XEN)    0000000000000003 ffff88001ee61900 ffff82c4802b78b0 ffff82c4802b7640
(XEN)    ffff83003ea5c000 00000000000000a0 0000000000000900 0000000000000008
(XEN)    00000003802b7650 0000000000000004 00000003802b7668 0000000000000000
(XEN)    ffff82c4802b7b58 0000000000000001 0000000000000003 ffff82c4802b78b0
(XEN) Xen call trace:
(XEN)    [<ffff82c48012b85e>] prepare_to_wait+0x178/0x1b2
(XEN)    [<ffff82c4801e02f9>] ept_get_entry+0x81/0xd8
(XEN)    [<ffff82c4801d810f>] gfn_to_mfn_type_p2m+0x55/0x114
(XEN)    [<ffff82c480211b80>] hap_p2m_ga_to_gfn_4_levels+0x1c4/0x2d6
(XEN)    [<ffff82c480211cb1>] hap_gva_to_gfn_4_levels+0x1f/0x2e
(XEN)    [<ffff82c4801da97f>] paging_gva_to_gfn+0xae/0xc4
(XEN)    [<ffff82c4801aaec8>] hvmemul_linear_to_phys+0xf1/0x25c
(XEN)    [<ffff82c4801ab762>] hvmemul_rep_movs+0xe8/0x31a
(XEN)    [<ffff82c48018de07>] x86_emulate+0x4e01/0x10fde
(XEN)    [<ffff82c4801aab3c>] hvm_emulate_one+0x12d/0x1c5
(XEN)    [<ffff82c4801b68a9>] handle_mmio+0x4e/0x1d8
(XEN)    [<ffff82c4801b3a1e>] hvm_hap_nested_page_fault+0x1e7/0x302
(XEN)    [<ffff82c4801d1ff6>] vmx_vmexit_handler+0x12cf/0x1594
(XEN)
(XEN) wake 1ee61 20

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-10 10:18           ` Olaf Hering
@ 2011-11-10 12:05             ` Olaf Hering
  0 siblings, 0 replies; 47+ messages in thread
From: Olaf Hering @ 2011-11-10 12:05 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: keir.xen, xen-devel

On Thu, Nov 10, Olaf Hering wrote:

> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> > The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> > it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?
> I'm running SLES11 as dom0. It's really odd that there is no ENOENT
> handling in mainline; I will go and check the code.

xen_remap_domain_mfn_range() has to catch -ENOENT returned from
HYPERVISOR_mmu_update() and return it to its callers. Then
drivers/xen/xenfs/privcmd.c:traverse_pages() will do the right thing.
See
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/0051d294bb60

The granttable part needs more changes, see
http://xenbits.xen.org/hg/linux-2.6.18-xen.hg/rev/7c7efaea8b54


Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-10  4:29         ` Andres Lagar-Cavilla
  2011-11-10  9:20           ` Jan Beulich
  2011-11-10  9:26           ` Keir Fraser
@ 2011-11-10 10:18           ` Olaf Hering
  2011-11-10 12:05             ` Olaf Hering
  2 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-10 10:18 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: keir.xen, xen-devel

On Wed, Nov 09, Andres Lagar-Cavilla wrote:

> Olaf,
> > On Wed, Nov 09, Andres Lagar-Cavilla wrote:
> >
> >> After a bit of thinking, things are far more complicated. I don't think
> >> this is a "race." If the pager removed a page that later gets scheduled by
> >> the guest OS for IO, qemu will want to foreign-map that. With the
> >> hypervisor returning ENOENT, the foreign map will fail, and there goes qemu.
> >
> > The tools are supposed to catch ENOENT and try again.
> > linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> > appears to do that as well. What code path uses qemu that leads to a
> > crash?
> 
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?

I'm running SLES11 as dom0. It's really odd that there is no ENOENT
handling in mainline; I will go and check the code.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

A while ago I fixed the grant status handling; perhaps that change was
never forwarded to pvops, at least I didn't do it at that time.

> I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
> evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
> harder, but qemu also dies. No pv drivers. I haven't been able to trace
> back the qemu crash (segfault on a NULL ide_if field for a dma callback)
> to the exact paging action yet, but no crashes without paging.

If the kernel is pvops it may need some auditing to check the ENOENT
handling.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-10  4:29         ` Andres Lagar-Cavilla
  2011-11-10  9:20           ` Jan Beulich
@ 2011-11-10  9:26           ` Keir Fraser
  2011-11-10 10:18           ` Olaf Hering
  2 siblings, 0 replies; 47+ messages in thread
From: Keir Fraser @ 2011-11-10  9:26 UTC (permalink / raw)
  To: andres, Olaf Hering; +Cc: xen-devel

On 10/11/2011 04:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:

>> The tools are supposed to catch ENOENT and try again.
>> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
>> appears to do that as well. What code path uses qemu that leads to a
>> crash?
> 
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?
> 
> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

Getting this working without a new Linux kernel -- and with
as-yet-to-be-written new stuff in it -- is unlikely to be on the cards,
is it?

I think you suggested an in-kernel mechanism to wait for page-in and then
retry mapping. If that could be used by the in-kernel drivers and exposed
via our privcmd interface for qemu and rest of userspace too, that may be
the best single solution. Perhaps it could be largely hidden behind the
existing privcmd-mmap ioctls.

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-10  4:29         ` Andres Lagar-Cavilla
@ 2011-11-10  9:20           ` Jan Beulich
  2011-11-10  9:26           ` Keir Fraser
  2011-11-10 10:18           ` Olaf Hering
  2 siblings, 0 replies; 47+ messages in thread
From: Jan Beulich @ 2011-11-10  9:20 UTC (permalink / raw)
  To: Olaf Hering, Andres Lagar-Cavilla; +Cc: keir.xen, xen-devel

>>> On 10.11.11 at 05:29, "Andres Lagar-Cavilla" <andres@lagarcavilla.org> wrote:
> The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
> it isn't on mainline linux 3.0, 3.1, etc.

Seems like nobody cared to port over the code from the old 2.6.18 tree
(or the forward ports thereof).

> Which dom0 kernel are you using?

Certainly one of our forward port kernels.

> And for backend drivers implemented in the kernel (netback, etc), there is
> no retrying.

As above, seems like nobody cared to forward port those bits either.

Jan

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09 22:11       ` Olaf Hering
@ 2011-11-10  4:29         ` Andres Lagar-Cavilla
  2011-11-10  9:20           ` Jan Beulich
                             ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-10  4:29 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel, keir.xen, Andres Lagar-Cavilla

Olaf,
> On Wed, Nov 09, Andres Lagar-Cavilla wrote:
>
>> After a bit of thinking, things are far more complicated. I don't think
>> this is a "race." If the pager removed a page that later gets scheduled by
>> the guest OS for IO, qemu will want to foreign-map that. With the
>> hypervisor returning ENOENT, the foreign map will fail, and there goes qemu.
>
> The tools are supposed to catch ENOENT and try again.
> linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
> appears to do that as well. What code path uses qemu that leads to a
> crash?

The tools retry as long as IOCTL_PRIVCMD_MMAPBATCH_V2 is supported. Which
it isn't on mainline linux 3.0, 3.1, etc. Which dom0 kernel are you using?

And for backend drivers implemented in the kernel (netback, etc), there is
no retrying.

All those ram_paging types and their interactions give me a headache, but
I'll trust you that only one event is put in the ring.

I'm using 24066:54a5e994a241. I start windows 7, make xenpaging try to
evict 90% of the RAM, qemu lasts for about two seconds. Linux fights
harder, but qemu also dies. No pv drivers. I haven't been able to trace
back the qemu crash (segfault on a NULL ide_if field for a dma callback)
to the exact paging action yet, but no crashes without paging.

Andres

>
>> I guess qemu/migrate/libxc could retry until the pager is done and the
>> mapping succeeds. It will be delicate. It won't work for pv backends. It
>> will flood the mem_event ring.
>
> There will be no flood; only one request is sent per gfn in
> p2m_mem_paging_populate().
>
> Olaf
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09 21:30     ` Andres Lagar-Cavilla
@ 2011-11-09 22:11       ` Olaf Hering
  2011-11-10  4:29         ` Andres Lagar-Cavilla
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-09 22:11 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: keir.xen, xen-devel

On Wed, Nov 09, Andres Lagar-Cavilla wrote:

> After a bit of thinking, things are far more complicated. I don't think
> this is a "race." If the pager removed a page that later gets scheduled by
> the guest OS for IO, qemu will want to foreign-map that. With the
> hypervisor returning ENOENT, the foreign map will fail, and there goes
> qemu.

The tools are supposed to catch ENOENT and try again.
linux_privcmd_map_foreign_bulk() does that. linux_gnttab_grant_map()
appears to do that as well. What code path uses qemu that leads to a
crash?
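
For what it's worth, the retry pattern being described boils down to something like the following (a self-contained sketch with a toy stand-in for the real mapping call; the function name and back-off are illustrative, not the libxc source):

  #include <errno.h>
  #include <stdio.h>
  #include <unistd.h>

  /* Toy stand-in for the real foreign-mapping call: pretend the gfn is
   * paged out for the first couple of attempts. */
  static int map_foreign_page(unsigned long gfn)
  {
      static int attempts;
      (void)gfn;
      return (++attempts < 3) ? -ENOENT : 0;
  }

  static int map_with_retry(unsigned long gfn)
  {
      for (int i = 0; i < 1000; i++) {
          int rc = map_foreign_page(gfn);
          if (rc != -ENOENT)
              return rc;                /* success, or a real error */
          usleep(1000);                 /* page-in in progress: back off, retry */
      }
      return -ETIMEDOUT;
  }

  int main(void)
  {
      printf("map_with_retry: %d\n", map_with_retry(0x1ee61));
      return 0;
  }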

> I guess qemu/migrate/libxc could retry until the pager is done and the
> mapping succeeds. It will be delicate. It won't work for pv backends. It
> will flood the mem_event ring.

There will be no flood; only one request is sent per gfn in
p2m_mem_paging_populate().

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09  7:09   ` Olaf Hering
  2011-11-09 21:21     ` Andres Lagar-Cavilla
@ 2011-11-09 21:30     ` Andres Lagar-Cavilla
  2011-11-09 22:11       ` Olaf Hering
  1 sibling, 1 reply; 47+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-09 21:30 UTC (permalink / raw)
  To: Olaf Hering; +Cc: keir.xen, xen-devel

Also,
> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
>> Tbh, for paging to be effective, we need to be prepared to yield on every
>> p2m lookup.
>
> Yes, if a gfn is missing the vcpu should go to sleep rather than
> returning -ENOENT to the caller. Only the query part of gfn_to_mfn
> should return the p2m paging types.
>
>> Let's compare paging to PoD. They're essentially the same thing: pages
>> disappear, and get allocated on the fly when you need them. PoD is a
>> highly optimized in-hypervisor optimization that does not need a
>> user-space helper -- but the pager could do PoD easily and remove all that
>> p2m-pod.c code from the hypervisor.
>
> Perhaps PoD and paging could be merged; I haven't had time to study the
> PoD code.
>
>> PoD only introduces extraneous side-effects when there is a complete
>> absence of memory to allocate pages. The same cannot be said of paging, to
>> put it mildly. It returns EINVAL all over the place. Right now, qemu can
>> be crashed in a blink by paging out the right gfn.
>
> I have seen qemu crashes when using emulated storage, but haven't
> debugged them yet. I suspect they were caused by a race between nominate
> and evict.

After a bit of thinking, things are far more complicated. I don't think
this is a "race." If the pager removed a page that later gets scheduled by
the guest OS for IO, qemu will want to foreign-map that. With the
hypervisor returning ENOENT, the foreign map will fail, and there goes
qemu.

Same will happen for pv backend mapping grants, or the checkpoint/migrate
code.

I guess qemu/migrate/libxc could retry until the pager is done and the
mapping succeeds. It will be delicate. It won't work for pv backends. It
will flood the mem_event ring.

Wait-queueing the dom0 vcpu is a no-go -- the machine will deadlock quickly.

My thinking is that the best bet is to wait-queue the dom0 process. The
dom0 kernel code handling the foreign map will need to put the mapping
thread in a wait-queue. It can establish a ring-based notification
mechanism with Xen. When Xen completes the page-in, it can add a
notification to the ring; dom0 can then wake the mapping thread and
retry.

Not simple at all. Ideas out there?

Andres

>
> Olaf
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09  7:09   ` Olaf Hering
@ 2011-11-09 21:21     ` Andres Lagar-Cavilla
  2011-11-22 14:34       ` George Dunlap
  2011-11-09 21:30     ` Andres Lagar-Cavilla
  1 sibling, 1 reply; 47+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-09 21:21 UTC (permalink / raw)
  To: Olaf Hering; +Cc: keir.xen, xen-devel

Hi there,
> On Tue, Nov 08, Andres Lagar-Cavilla wrote:
>
>> Tbh, for paging to be effective, we need to be prepared to yield on every
>> p2m lookup.
>
> Yes, if a gfn is missing the vcpu should go to sleep rather than
> returning -ENOENT to the caller. Only the query part of gfn_to_mfn
> should return the p2m paging types.
>
>> Let's compare paging to PoD. They're essentially the same thing: pages
>> disappear, and get allocated on the fly when you need them. PoD is a
>> highly optimized in-hypervisor optimization that does not need a
>> user-space helper -- but the pager could do PoD easily and remove all that
>> p2m-pod.c code from the hypervisor.
>
> Perhaps PoD and paging could be merged; I haven't had time to study the
> PoD code.

Well, PoD can be implemented with a pager that simply shortcuts the step
that actually populates the page with contents. A zeroed heap page is good
enough. It's fairly simple for a pager to know for which pages it should
return zero.
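
A minimal sketch of that shortcut (plain C with toy stand-ins, not xenpaging code): a pager that knows a gfn was evicted before ever being written can satisfy the page-in with a zeroed buffer instead of reading its pagefile.

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  #define PAGE_SIZE 4096

  /* Toy stand-ins for the pager's real bookkeeping and storage backend. */
  static bool gfn_never_written(unsigned long gfn) { return (gfn & 1) == 0; }
  static void read_from_pagefile(unsigned long gfn, void *buf)
  {
      (void)gfn;
      memset(buf, 0xaa, PAGE_SIZE);            /* pretend to read real contents */
  }

  static void page_in(unsigned long gfn, void *buf)
  {
      if (gfn_never_written(gfn))
          memset(buf, 0, PAGE_SIZE);           /* PoD-style: hand back a zero page */
      else
          read_from_pagefile(gfn, buf);        /* ordinary paging path */
  }

  int main(void)
  {
      unsigned char page[PAGE_SIZE];

      page_in(0x1234, page);
      printf("first byte: %#x\n", page[0]);
      return 0;
  }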

PoD also does emergency sweeps under memory pressure to identify zeroes;
that can easily be implemented by a user-space utility.

The hypervisor code keeps a list of 2M superpages -- that feature would be
lost.

But I doubt this would fly anyway: PoD also works in non-EPT modes, and I
guess we don't want to lose that functionality there.

>
>> PoD only introduces extraneous side-effects when there is a complete
>> absence of memory to allocate pages. The same cannot be said of paging, to
>> put it mildly. It returns EINVAL all over the place. Right now, qemu can
>> be crashed in a blink by paging out the right gfn.
>
> I have seen qemu crashes when using emulated storage, but haven't
> debugged them yet. I suspect they were caused by a race between nominate
> and evict.
>
> Olaf
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09  3:52 ` Andres Lagar-Cavilla
@ 2011-11-09  7:09   ` Olaf Hering
  2011-11-09 21:21     ` Andres Lagar-Cavilla
  2011-11-09 21:30     ` Andres Lagar-Cavilla
  0 siblings, 2 replies; 47+ messages in thread
From: Olaf Hering @ 2011-11-09  7:09 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: keir.xen, xen-devel

On Tue, Nov 08, Andres Lagar-Cavilla wrote:

> Tbh, for paging to be effective, we need to be prepared to yield on every
> p2m lookup.

Yes, if a gfn is missing the vcpu should go to sleep rather than
returning -ENOENT to the caller. Only the query part of gfn_to_mfn
should return the p2m paging types.

> Let's compare paging to PoD. They're essentially the same thing: pages
> disappear, and get allocated on the fly when you need them. PoD is a
> highly optimized in-hypervisor optimization that does not need a
> user-space helper -- but the pager could do PoD easily and remove all that
> p2m-pod.c code from the hypervisor.

Perhaps PoD and paging could be merged; I haven't had time to study the
PoD code.

> PoD only introduces extraneous side-effects when there is a complete
> absence of memory to allocate pages. The same cannot be said of paging, to
> put it mildly. It returns EINVAL all over the place. Right now, qemu can
> be crashed in a blink by paging out the right gfn.

I have seen qemu crashes when using emulated storage, but haven't
debugged them yet. I suspect they were caused by a race between nominate
and evict.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-09  3:37 ` Andres Lagar-Cavilla
@ 2011-11-09  7:02   ` Olaf Hering
  0 siblings, 0 replies; 47+ messages in thread
From: Olaf Hering @ 2011-11-09  7:02 UTC (permalink / raw)
  To: Andres Lagar-Cavilla; +Cc: xen-devel

On Tue, Nov 08, Andres Lagar-Cavilla wrote:

> Olaf,
> are waitqueues on the mem-event ring meant to be the way to deal with
> ring exhaustion? I.e., is this meant to go beyond a testing vehicle for
> waitqueues?

Putting the guest to sleep when the ring is full is at least required
for p2m_mem_paging_drop_page(), so that the pager gets informed about all
gfns from decrease_reservation.
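
For illustration only, a pthread analogue of that discipline (a sketch, not the Xen mem_event code; error handling such as an empty ring is omitted): the producer sleeps while the ring is full instead of dropping the request, and the consumer wakes it once a slot frees up.

  #include <pthread.h>

  #define RING_SIZE 4

  static unsigned long ring[RING_SIZE];
  static int head, tail, count;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

  /* Producer: analogue of mem_event_put_request() waiting instead of losing
   * the event when the ring is full. */
  static void put_request(unsigned long gfn)
  {
      pthread_mutex_lock(&lock);
      while (count == RING_SIZE)
          pthread_cond_wait(&not_full, &lock);  /* sleep until a slot frees up */
      ring[head] = gfn;
      head = (head + 1) % RING_SIZE;
      count++;
      pthread_mutex_unlock(&lock);
  }

  /* Consumer: analogue of the pager taking a request; the resume path wakes
   * any sleeping producer. */
  static unsigned long take_request(void)
  {
      pthread_mutex_lock(&lock);
      unsigned long gfn = ring[tail];
      tail = (tail + 1) % RING_SIZE;
      count--;
      pthread_cond_signal(&not_full);
      pthread_mutex_unlock(&lock);
      return gfn;
  }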

> With the pager itself generating events, and foreign mappings generating
> events, you'll end up putting dom0 vcpus in a waitqueue. This will
> basically deadlock the host.

Those vcpus cannot go to sleep, and my change handles that case.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
       [not found] <20111108224414.83985CF73A@homiemail-mx7.g.dreamhost.com>
@ 2011-11-09  3:52 ` Andres Lagar-Cavilla
  2011-11-09  7:09   ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-09  3:52 UTC (permalink / raw)
  To: keir.xen; +Cc: olaf, xen-devel

> Date: Tue, 08 Nov 2011 22:05:41 +0000
> From: Keir Fraser <keir.xen@gmail.com>
> Subject: Re: [Xen-devel] Need help with fixing the Xen waitqueue
> 	feature
> To: Olaf Hering <olaf@aepfle.de>,	<xen-devel@lists.xensource.com>
> Message-ID: <CADF5835.245E1%keir.xen@gmail.com>
> Content-Type: text/plain;	charset="US-ASCII"
>
> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>
>> Another thing is that sometimes the host suddenly reboots without any
>> message. I think the reason for this is that a vcpu whose stack was put
>> aside and that was later resumed may find itself on another physical
>> cpu. And if that happens, wouldn't that invalidate some of the local
>> variables back in the callchain? If some of them point to the old
>> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
>> needed in some places.
>
> From how many call sites can we end up on a wait queue? I know we were going
> to end up with a small and explicit number (e.g., in __hvm_copy()) but does
> this patch make it a more generally-used mechanism? There will unavoidably
> be many constraints on callers who want to be able to yield the cpu. We can
> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
> I don't think it's *that* common that hypercall contexts cache things like
> per-cpu pointers. But every caller will need auditing, I expect.

Tbh, for paging to be effective, we need to be prepared to yield on every
p2m lookup.

Let's compare paging to PoD. They're essentially the same thing: pages
disappear, and get allocated on the fly when you need them. PoD is a
highly optimized in-hypervisor optimization that does not need a
user-space helper -- but the pager could do PoD easily and remove all that
p2m-pod.c code from the hypervisor.

PoD only introduces extraneous side-effects when there is a complete
absence of memory to allocate pages. The same cannot be said of paging, to
put it mildly. It returns EINVAL all over the place. Right now, qemu can
be crashed in a blink by paging out the right gfn.

To get paging to where PoD is, all these situations need to be handled in
a manner other than returning EINVAL. That means putting the vcpu on a
waitqueue at every location where p2m_pod_demand_populate is called, not just
__hvm_copy.

I don't know that that's gonna be altogether doable. Many of these gfn
lookups happen in atomic contexts, not to mention cpu-specific pointers.
But at least we should aim for that.

Andres
>
> A sudden reboot is very extreme. No message even on a serial line? That most
> commonly indicates bad page tables. Most other bugs you'd at least get a
> double fault message.
>
>  -- Keir
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
       [not found] <20111108214540.EAEBB72C4A1@homiemail-mx8.g.dreamhost.com>
@ 2011-11-09  3:37 ` Andres Lagar-Cavilla
  2011-11-09  7:02   ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Andres Lagar-Cavilla @ 2011-11-09  3:37 UTC (permalink / raw)
  To: olaf; +Cc: xen-devel

Olaf,
Are waitqueues on the mem-event ring meant to be the way to deal with
ring exhaustion? I.e., is this meant to go beyond a testing vehicle for
waitqueues?

With the pager itself generating events, and foreign mappings generating
events, you'll end up putting dom0 vcpus in a waitqueue. This will
basically deadlock the host.

Am I missing something here?
Andres

> Date: Tue, 8 Nov 2011 22:20:24 +0100
> From: Olaf Hering <olaf@aepfle.de>
> Subject: [Xen-devel] Need help with fixing the Xen waitqueue feature
> To: xen-devel@lists.xensource.com
> Message-ID: <20111108212024.GA5276@aepfle.de>
> Content-Type: text/plain; charset=utf-8
>
>
> The patch 'mem_event: use wait queue when ring is full' I just sent out
> makes use of the waitqueue feature. There are two issues I get with the
> change applied:
>
> I think I got the logic right, and in my testing vcpu->pause_count drops
> to zero in p2m_mem_paging_resume(). But for some reason the vcpu does
> not make progress after the first wakeup. In my debugging there is one
> wakeup, the ring is still full, but further wakeups don't happen.
> The fully decoded xentrace output may provide some hints about the
> underlying issue. But it's hard to get due to the second issue.
>
> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the callchain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.
>
> I will check whether pinning the guest's vcpus to physical cpus actually
> avoids the sudden reboots.
>
> Olaf
>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-08 22:20   ` Olaf Hering
@ 2011-11-08 22:54     ` Keir Fraser
  2011-11-11 22:56       ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-08 22:54 UTC (permalink / raw)
  To: Olaf Hering; +Cc: xen-devel

On 08/11/2011 22:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> On Tue, Nov 08, Keir Fraser wrote:
> 
>> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
>> 
>>> Another thing is that sometimes the host suddenly reboots without any
>>> message. I think the reason for this is that a vcpu whose stack was put
>>> aside and that was later resumed may find itself on another physical
>>> cpu. And if that happens, wouldn't that invalidate some of the local
>>> variables back in the callchain? If some of them point to the old
>>> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
>>> needed in some places.
>> 
>> From how many call sites can we end up on a wait queue? I know we were going
>> to end up with a small and explicit number (e.g., in __hvm_copy()) but does
>> this patch make it a more generally-used mechanism? There will unavoidably
>> be many constraints on callers who want to be able to yield the cpu. We can
>> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
>> I don't think it's *that* common that hypercall contexts cache things like
>> per-cpu pointers. But every caller will need auditing, I expect.
> 
> I haven't started to audit the callers. In my testing
> mem_event_put_request() is called from p2m_mem_paging_drop_page() and
> p2m_mem_paging_populate(). The latter is called from more places.

Tbh I wonder anyway whether stale hypercall context would be likely to cause
a silent machine reboot. Booting with max_cpus=1 would eliminate moving
between CPUs as a cause of inconsistencies, or pin the guest under test.
Another problem could be sleeping with locks held, but we do test for that
(in debug builds at least) and I'd expect crash/hang rather than silent
reboot. Another problem could be if the vcpu has its own state in an
inconsistent/invalid state temporarily (e.g., its pagetable base pointers)
which then is attempted to be restored during a waitqueue wakeup. That could
certainly cause a reboot, but I don't know of an example where this might
happen.

 -- Keir

> My plan is to put the sleep into ept_get_entry(), but I'm not there yet.
> First I want to test waitqueues in a rather simple code path like
> mem_event_put_request().
> 
>> A sudden reboot is very extreme. No message even on a serial line? That most
>> commonly indicates bad page tables. Most other bugs you'd at least get a
>> double fault message.
> 
> There is no output on serial, I boot with this cmdline:
>   vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all
>   sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin
>   dom0_max_vcpus=2
> My base changeset is 24003, the testhost is a Xeon X5670  @ 2.93GHz.
> 
> Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-08 22:05 ` Keir Fraser
@ 2011-11-08 22:20   ` Olaf Hering
  2011-11-08 22:54     ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-08 22:20 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel

On Tue, Nov 08, Keir Fraser wrote:

> On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:
> 
> > Another thing is that sometimes the host suddenly reboots without any
> > message. I think the reason for this is that a vcpu whose stack was put
> > aside and that was later resumed may find itself on another physical
> > cpu. And if that happens, wouldn't that invalidate some of the local
> > variables back in the callchain? If some of them point to the old
> > physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> > needed in some places.
> 
> From how many call sites can we end up on a wait queue? I know we were going
> to end up with a small and explicit number (e.g., in __hvm_copy()) but does
> this patch make it a more generally-used mechanism? There will unavoidably
> be many constraints on callers who want to be able to yield the cpu. We can
> add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
> I don't think it's *that* common that hypercall contexts cache things like
> per-cpu pointers. But every caller will need auditing, I expect.

I haven't started to audit the callers. In my testing
mem_event_put_request() is called from p2m_mem_paging_drop_page() and
p2m_mem_paging_populate(). The latter is called from more places.

My plan is to put the sleep into ept_get_entry(), but I'm not there yet.
First I want to test waitqueues in a rather simple code path like
mem_event_put_request().

> A sudden reboot is very extreme. No message even on a serial line? That most
> commonly indicates bad page tables. Most other bugs you'd at least get a
> double fault message.

There is no output on serial, I boot with this cmdline:
  vga=mode-normal console=com1 com1=57600 loglvl=all guest_loglvl=all
  sync_console conring_size=123456 maxcpus=8 dom0_vcpus_pin
  dom0_max_vcpus=2
My base changeset is 24003, the testhost is a Xeon X5670  @ 2.93GHz.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Need help with fixing the Xen waitqueue feature
  2011-11-08 21:20 Olaf Hering
@ 2011-11-08 22:05 ` Keir Fraser
  2011-11-08 22:20   ` Olaf Hering
  0 siblings, 1 reply; 47+ messages in thread
From: Keir Fraser @ 2011-11-08 22:05 UTC (permalink / raw)
  To: Olaf Hering, xen-devel

On 08/11/2011 21:20, "Olaf Hering" <olaf@aepfle.de> wrote:

> Another thing is that sometimes the host suddenly reboots without any
> message. I think the reason for this is that a vcpu whose stack was put
> aside and that was later resumed may find itself on another physical
> cpu. And if that happens, wouldn't that invalidate some of the local
> variables back in the callchain? If some of them point to the old
> physical cpu, how could this be fixed? Perhaps a few "volatiles" are
> needed in some places.

From how many call sites can we end up on a wait queue? I know we were going
to end up with a small and explicit number (e.g., in __hvm_copy()) but does
this patch make it a more generally-used mechanism? There will unavoidably
be many constraints on callers who want to be able to yield the cpu. We can
add Linux-style get_cpu/put_cpu abstractions to catch some of them. Actually
I don't think it's *that* common that hypercall contexts cache things like
per-cpu pointers. But every caller will need auditing, I expect.
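
A self-contained user-space illustration of that hazard (not Xen code; sched_getcpu() stands in for smp_processor_id()): anything cached before blocking, like a per-cpu pointer, can be stale afterwards because the thread, much like a woken vcpu, may be rescheduled on a different CPU.

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int before = sched_getcpu();   /* analogue of caching a per-cpu pointer */
      sleep(1);                      /* analogue of sleeping on a waitqueue */
      int after = sched_getcpu();

      if (before != after)
          printf("migrated: CPU %d was cached, now running on CPU %d\n",
                 before, after);
      else
          printf("still on CPU %d; the hazard is only probabilistic\n", before);
      return 0;
  }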

A sudden reboot is very extreme. No message even on a serial line? That most
commonly indicates bad page tables. Most other bugs you'd at least get a
double fault message.

 -- Keir

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Need help with fixing the Xen waitqueue feature
@ 2011-11-08 21:20 Olaf Hering
  2011-11-08 22:05 ` Keir Fraser
  0 siblings, 1 reply; 47+ messages in thread
From: Olaf Hering @ 2011-11-08 21:20 UTC (permalink / raw)
  To: xen-devel


The patch 'mem_event: use wait queue when ring is full' I just sent out
makes use of the waitqueue feature. There are two issues I get with the
change applied:

I think I got the logic right, and in my testing vcpu->pause_count drops
to zero in p2m_mem_paging_resume(). But for some reason the vcpu does
not make progress after the first wakeup. In my debugging there is one
wakeup, the ring is still full, but further wakeups don't happen.
The fully decoded xentrace output may provide some hints about the
underlying issue. But it's hard to get due to the second issue.

Another thing is that sometimes the host suddenly reboots without any
message. I think the reason for this is that a vcpu whose stack was put
aside and that was later resumed may find itself on another physical
cpu. And if that happens, wouldn't that invalidate some of the local
variables back in the callchain? If some of them point to the old
physical cpu, how could this be fixed? Perhaps a few "volatiles" are
needed in some places.

I will check whether pinning the guest's vcpus to physical cpus actually
avoids the sudden reboots.

Olaf

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2011-11-25 19:35 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20111122150755.GA18727@aepfle.de>
2011-11-22 15:40 ` Need help with fixing the Xen waitqueue feature Keir Fraser
2011-11-22 15:54   ` Keir Fraser
2011-11-22 17:36   ` Olaf Hering
2011-11-22 17:42     ` Keir Fraser
2011-11-22 18:04       ` Olaf Hering
2011-11-22 21:15         ` Olaf Hering
2011-11-22 21:53           ` Keir Fraser
2011-11-23 17:00             ` Olaf Hering
2011-11-23 17:16               ` Keir Fraser
2011-11-23 18:06                 ` Olaf Hering
2011-11-23 18:23                   ` Keir Fraser
2011-11-23 18:18                 ` Keir Fraser
2011-11-23 18:31                   ` Olaf Hering
2011-11-23 19:21                     ` Keir Fraser
2011-11-23 21:03                       ` Keir Fraser
2011-11-23 22:30                         ` Olaf Hering
2011-11-23 23:12                           ` Keir Fraser
2011-11-24 10:00                             ` Olaf Hering
2011-11-25 12:56                               ` Olaf Hering
2011-11-25 18:26                                 ` Olaf Hering
2011-11-25 19:35                                   ` Keir Fraser
2011-11-24  9:15                         ` Jan Beulich
2011-11-24  9:51                           ` Keir Fraser
2011-11-24  9:58                           ` Keir Fraser
     [not found] <20111108224414.83985CF73A@homiemail-mx7.g.dreamhost.com>
2011-11-09  3:52 ` Andres Lagar-Cavilla
2011-11-09  7:09   ` Olaf Hering
2011-11-09 21:21     ` Andres Lagar-Cavilla
2011-11-22 14:34       ` George Dunlap
2011-11-09 21:30     ` Andres Lagar-Cavilla
2011-11-09 22:11       ` Olaf Hering
2011-11-10  4:29         ` Andres Lagar-Cavilla
2011-11-10  9:20           ` Jan Beulich
2011-11-10  9:26           ` Keir Fraser
2011-11-10 10:18           ` Olaf Hering
2011-11-10 12:05             ` Olaf Hering
     [not found] <20111108214540.EAEBB72C4A1@homiemail-mx8.g.dreamhost.com>
2011-11-09  3:37 ` Andres Lagar-Cavilla
2011-11-09  7:02   ` Olaf Hering
2011-11-08 21:20 Olaf Hering
2011-11-08 22:05 ` Keir Fraser
2011-11-08 22:20   ` Olaf Hering
2011-11-08 22:54     ` Keir Fraser
2011-11-11 22:56       ` Olaf Hering
2011-11-12  7:00         ` Keir Fraser
2011-11-22 11:40           ` Olaf Hering
2011-11-22 13:04             ` Keir Fraser
2011-11-22 13:54               ` Olaf Hering
2011-11-22 14:24                 ` Keir Fraser
