From: Stefan Bader <stefan.bader@canonical.com>
To: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>
Subject: Strange PVM spinlock case revisited
Date: Fri, 08 Feb 2013 16:33:19 +0100
Message-ID: <51151ABF.1070007@canonical.com>



A while ago I reported an issue that causes Xen PVM guests to hang
(apparently related to spinlocks). Since then I have been able to
gather a few more facts, which I try to provide below. As for the real
cause I am still puzzled and would be thankful for any hints or
directions.

- Happens with Xen 4.1.2 and Xen 4.2 host-side.
- PVM guest with 8 VCPUs is running 800 threads doing DB updates.
- When hanging, at least 2 VCPUs are stuck in xen_spin_lock_slow()
  even though the lock is free. IIRC the weird lock always belongs to
  one VCPU's runqueue.
- Testcase shows the problem for guest kernel versions v3.2..v3.5 (not
  verified farther back). Since v3.6 it cannot be reproduced. Bisect
  ends up with: 611ae8e3f5204f7480b3b405993b3352cfa16662
    "x86/tlb: enable tlb flush range support for x86"
- The other potential fix was to prevent xen_spin_lock_slow() from
  unmasking events (enabling interrupts) if they were enabled before
  the spinlock call (see the sketch of the slow path below).

Maybe allowing or not allowing interrupts in xen_spin_lock_slow() just
tips the likelihood of getting nested deeper, the same way the changed
TLB flush may reduce it by causing fewer IPIs. From the static
information I can extract from a dump it is hard to tell what exactly
is going on. VCPU2 seems to be doing something, but looks to be rather
deep in event handling. Its irq_count is at 2, though irq_count seems
to be used in a few more places than just the xen_hypervisor_callback.
That callback at least seems to be the entry I can see on VCPU2's
stack. The contents of the per-cpu variables irq_regs and
irq_stack_ptr at least look consistent with a callback having come in
while the VCPU was in the sched_op hypercall. But then the bt command
seems to be operating on a completely different stack.
VCPU2 seems to be active, not polling (event processing could have
been initiated from enable_irqs via the force callback, or before
finally exiting the sched_op hypercall). There seems to be a pending
upcall, but upcalls are masked.
The next thing I would try is to instrument the sites incrementing and
decrementing irq_count...
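
What I have in mind is something like the (entirely hypothetical)
helper below; irq_count_trace() and the log format are made up, and
the real increment/decrement sites are mostly in
arch/x86/kernel/entry_64.S, so they would have to call into it from
the asm stubs:

/*
 * Hypothetical instrumentation sketch, not existing code: log every
 * change of the per-cpu irq_count together with the caller, so the
 * nesting history of a VCPU can be read back from the ftrace ring
 * buffer (which survives into the dump and can be read with crash).
 */
#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/smp.h>

/* on x86-64 this is normally declared in the arch headers */
DECLARE_PER_CPU(unsigned int, irq_count);

static __always_inline void irq_count_trace(const char *site)
{
	trace_printk("%s: cpu=%d irq_count=%d caller=%pS\n",
		     site, raw_smp_processor_id(),
		     (int)this_cpu_read(irq_count),
		     __builtin_return_address(0));
}

/*
 * e.g. call irq_count_trace("xen_hv_callback") right after the stub
 * has incremented irq_count and again just before it decrements it.
 */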
---

Below is some of the info seen from the dom0 debug keys and from
looking at a dump with crash:

Dom0 Info:

Polling vCPUs: {1,5-7}
VCPU0: CPU0 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU1: CPU1 [has=F] poll=10 upcall_pend = 00, upcall_mask = 00
VCPU2: CPU6 [has=T] poll=0  upcall_pend = 01, upcall_mask = 01
VCPU3: CPU2 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU4: CPU4 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU5: CPU3 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU6: CPU7 [has=F] poll=40 upcall_pend = 01, upcall_mask = 01
VCPU7: CPU4 [has=F] poll=46 upcall_pend = 01, upcall_mask = 01

10 [0/1]: s=6 n=1 x=0
40 [0/1]: s=6 n=6 x=0
46 [0/1]: s=6 n=7 x=0

Guest Info:

lock_spinners:
[0] (struct xen_spinlock *) 0x
[1] (struct xen_spinlock *) 0xffff8803bffc8ae8 = { lock=1, spinners=2 }
[2] (struct xen_spinlock *) 0xffff8803bfc93700 = { lock=0, spinners=2 }
[3] (struct xen_spinlock *) 0x
[4] (struct xen_spinlock *) 0x
[5] (struct xen_spinlock *) 0xffff8803bffc8ae8 -> [1]
[6] (struct xen_spinlock *) 0xffff8803bfc93700 -> [2]
[7] (struct xen_spinlock *) 0xffffffff81f15ef0 = { lock=1, spinners=1 }

VCPU[2,6] -> lock of runqueue[2] = (struct rq *) 0xffff8803bfc93700

irq_count[1] = 1
irq_count[2] = 2
It is -1 for the rest.

*(struct pt_regs *) irq_regs[2] = {
	.ip = 0xffffffff810013aa == hypercall_page+938 -> sched_op
	.sp = 0xffff8803bfc83ce8
}

char *irq_stack_ptr[2] = 0xffff8803bfc83fc0

#> bt -T ffff88033d65c4a0
PID: 2050   TASK: ffff88033d65c4a0  CPU: 2   COMMAND: "postgres"
  [ffff88033d72d530] get_page_from_freelist at ffffffff8111e6df
  [ffff88033d72d600] __alloc_pages_nodemask at ffffffff8111ebec
  [ffff88033d72d620] check_preempt_curr at ffffffff81050254
  [ffff88033d72d640] pull_task at ffffffff8105e33e
  [ffff88033d72d670] balance_tasks at ffffffff8105e4b6
  [ffff88033d72d680] radix_tree_lookup_slot at ffffffff8130db5e
  [ffff88033d72d690] find_get_page at ffffffff811161ee
  [ffff88033d72d6b0] find_lock_page at ffffffff81116286
  [ffff88033d72d6e0] shmem_getpage_gfp at ffffffff8112dd10
  [ffff88033d72d748] pte_pfn_to_mfn at ffffffff81005a78
  [ffff88033d72d7a0] get_page_from_freelist at ffffffff8111e6df
  [ffff88033d72d7c0] _raw_spin_lock at ffffffff81655e4e
  [ffff88033d72d7d0] ext4_ext_check_cache at ffffffff81239248
  [ffff88033d72d820] ext4_ext_map_blocks at ffffffff8123e269
  [ffff88033d72d870] _raw_spin_lock at ffffffff81655e4e
  [ffff88033d72d880] enqueue_to_backlog at ffffffff8153e55f
  [ffff88033d72d890] __wake_up_common at ffffffff8104c1c8
  [ffff88033d72d8e0] netif_rx.part.82 at ffffffff8153f2de
  [ffff88033d72d920] netif_rx at ffffffff8153f400
  [ffff88033d72d960] loopback_xmit at ffffffff8146fb1c
  [ffff88033d72d990] dev_hard_start_xmit at ffffffff8153fea6
  [ffff88033d72d9d0] do_softirq at ffffffff81016284
  [ffff88033d72d9f0] local_bh_enable at ffffffff8106d614
  [ffff88033d72da00] dev_queue_xmit at ffffffff81540309
  [ffff88033d72da50] ip_finish_output at ffffffff815769ab
  [ffff88033d72da90] ip_output at ffffffff81577518
  [ffff88033d72dac0] ip_local_out at ffffffff81576c09
  [ffff88033d72dae0] ip_send_skb at ffffffff81577f4b
  [ffff88033d72db00] udp_send_skb at ffffffff8159aaa1
  [ffff88033d72db40] ip_generic_getfrag at ffffffff81575030
  [ffff88033d72db50] udp_sendmsg at ffffffff8159bd38
  [ffff88033d72db90] irq_to_desc at ffffffff810d70b7
  [ffff88033d72dbd0] notify_remote_via_irq at ffffffff813a74c1
  [ffff88033d72dc10] ttwu_queue at ffffffff81052672
  [ffff88033d72dc38] _raw_spin_unlock_irqrestore at ffffffff8165605e
  [ffff88033d72dc90] inet_sendmsg at ffffffff815a6474
  --- bt -t would end here
  [ffff88033d72dcb0] apparmor_socket_sendmsg at ffffffff812d43f7
  [ffff88033d72dcd0] sock_sendmsg at ffffffff815266fe
  [ffff88033d72dd70] md_make_request at ffffffff814e0606
  [ffff88033d72ddd0] kmem_cache_free at ffffffff811605cf
  [ffff88033d72de10] mempool_free_slab at ffffffff81118507
  [ffff88033d72de20] mempool_free at ffffffff811185b7
  [ffff88033d72de50] sys_sendto at ffffffff8152974d
  [ffff88033d72dee0] ext4_sync_file at ffffffff8120f07b
  [ffff88033d72df00] vfs_write at ffffffff811764b0
  [ffff88033d72df50] xen_hypervisor_callback at ffffffff816606db
    RIP: 00007f6852c7b39a  RSP: 00007fff431454e0  RFLAGS: 00000212
    RAX: 00007f6852fb38b8  RBX: 00007f6852fb3720  RCX: 000000000000014a
    RDX: 0000000000000150  RSI: 0000000000000140  RDI: 00007f6852fb3720
    RBP: 0000000000000150   R8: 0000000000000015   R9: 0000000000000000
    R10: 0000000000000000  R11: 00007f6852c9174a  R12: 00007f6852fb3778
    R13: 0000000000000140  R14: 00007f6852fb38b8  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: e033  SS: e02b

#> bt -T ffff88033b812dc0
PID: 2069   TASK: ffff88033b812dc0  CPU: 6   COMMAND: "postgres"
  [ffff88033b897530] get_page_from_freelist at ffffffff8111e6df
  [ffff88033b897600] __alloc_pages_nodemask at ffffffff8111ebec
  [ffff88033b897640] radix_tree_lookup at ffffffff8130db6b
  [ffff88033b897650] irq_to_desc at ffffffff810d70b7
  [ffff88033b897660] irq_get_irq_data at ffffffff810d9f4e
  [ffff88033b897670] balance_tasks at ffffffff8105e493
  [ffff88033b897680] radix_tree_lookup_slot at ffffffff8130db5e
  [ffff88033b897690] find_get_page at ffffffff811161ee
  [ffff88033b8976b0] find_lock_page at ffffffff81116286
  [ffff88033b8976e0] shmem_getpage_gfp at ffffffff8112dd10
  [ffff88033b897748] pte_pfn_to_mfn at ffffffff81005a78
  [ffff88033b897770] xen_set_pte_at at ffffffff81008e29
  [ffff88033b897788] __raw_callee_save_xen_make_pte at ffffffff810052cd
  [ffff88033b8977b0] unlock_page at ffffffff8111589a
  [ffff88033b8977d0] __do_fault at ffffffff81138ac9
  [ffff88033b8978a0] pvclock_clocksource_read at ffffffff8103dd15
  [ffff88033b8978f0] xen_clocksource_read at ffffffff8100a850
  [ffff88033b897900] sched_clock at ffffffff8101bd79
  [ffff88033b897910] blk_rq_init at ffffffff812ec262
  [ffff88033b897930] get_request at ffffffff812f004e
  [ffff88033b8979c0] cpumask_next_and at ffffffff81308c66
  [ffff88033b8979e0] find_busiest_group at ffffffff8105a061
  --- bt -t ends here
  [ffff88033b897a30] radix_tree_lookup at ffffffff8130db6b
  [ffff88033b897a40] irq_to_desc at ffffffff810d70b7
  [ffff88033b897a50] irq_get_irq_data at ffffffff810d9f4e
  [ffff88033b897a60] info_for_irq at ffffffff813a636e
  [ffff88033b897a80] xen_poll_irq_timeout at ffffffff813a696e
  [ffff88033b897ac0] xen_poll_irq at ffffffff813a8450
  [ffff88033b897ad0] xen_spin_lock_slow at ffffffff8163ad77
  [ffff88033b897b20] xen_spin_lock at ffffffff810121da
  [ffff88033b897b40] _raw_spin_lock at ffffffff81655e4e
  [ffff88033b897b50] double_rq_lock at ffffffff8105119c
  [ffff88033b897b80] load_balance at ffffffff8105e788
  [ffff88033b897b88] _raw_spin_unlock_irqrestore at ffffffff8165605e
  [ffff88033b897c10] idle_balance at ffffffff8163d57c
  [ffff88033b897c60] __schedule at ffffffff81653ea9
  [ffff88033b897cc0] sleep_on_page at ffffffff81115810
  [ffff88033b897ce0] schedule at ffffffff81653fcf
  [ffff88033b897cf0] io_schedule at ffffffff8165407f
  [ffff88033b897d10] sleep_on_page at ffffffff8111581e
  [ffff88033b897d20] __wait_on_bit at ffffffff8165489f
  [ffff88033b897d70] wait_on_page_bit at ffffffff81115988
  [ffff88033b897da0] wake_bit_function at ffffffff8108a140
  [ffff88033b897dc0] filemap_fdatawait_range at ffffffff81115a9c
  [ffff88033b897e60] do_writepages at ffffffff81120ee1
  [ffff88033b897eb0] filemap_write_and_wait_range at ffffffff81117398
  [ffff88033b897ee0] ext4_sync_file at ffffffff8120ef9f
  [ffff88033b897f00] vfs_write at ffffffff811764b0
  [ffff88033b897f40] do_fsync at ffffffff811a4a36
  [ffff88033b897f70] sys_fdatasync at ffffffff811a4d83
  [ffff88033b897f80] system_call_fastpath at ffffffff8165e442
    RIP: 00007f6852ce8240  RSP: 00007fff43145618  RFLAGS: 00000246
    RAX: 000000000000004b  RBX: ffffffff8165e442  RCX: 00007f6852ce1900
    RDX: 00000000000000c5  RSI: 000000000000000c  RDI: 000000000000000c
    RBP: 00000000000000c5   R8: 000000000073a550   R9: 0000000000000000
    R10: 0000000000000004  R11: 0000000000000246  R12: ffffffff811a4d83
    R13: ffff88033b897f78  R14: 0000000000000001  R15: 0000000000af7488
    ORIG_RAX: 000000000000004b  CS: e033  SS: e02b

