From: Stefan Bader
Subject: Strange PVM spinlock case revisited
Date: Fri, 08 Feb 2013 16:33:19 +0100
To: xen-devel@lists.xen.org
Cc: Andrew Cooper, Ian Campbell

A while ago I reported an issue which causes Xen PVM guests to hang
(apparently related to spinlocks). Since then I have been able to gather
a few more facts, which I try to provide below. As for the real cause, I
am still puzzled and would be thankful for any hints or directions.

- Happens with Xen 4.1.2 and Xen 4.2 on the host side.
- A PVM guest with 8 VCPUs is running 800 threads doing DB updates.
- When hanging, at least 2 VCPUs sit in xen_spin_lock_slow with the lock
  free. IIRC the weird lock always belongs to one VCPU's runqueue.
- The testcase shows the problem for guest kernel versions v3.2..v3.5
  (not verified farther back). Since v3.6 it cannot be reproduced.
  Bisect ends up with:
  611ae8e3f5204f7480b3b405993b3352cfa16662
  "x86/tlb: enable tlb flush range support for x86"
- The other potential fix for this was to prevent xen_spin_lock_slow()
  from unmasking events (enabling interrupts) if those were enabled
  before the spinlock call.

Maybe allowing or not allowing interrupts in xen_spin_lock_slow just
manages to tip the likelihood of getting nested deeper, in the same way
that the changed TLB flush may reduce it by causing fewer IPIs.

From the static information I can extract from a dump it is hard to tell
what exactly is going on. VCPU2 seems to be doing something but looks to
be rather deep in event handling. The irq_count is at 2, but irq_count
seems to be used in a few more places than just the xen_hypervisor
callback, though that at least seems to be the entry I can see on the
stack for VCPU2. The contents of the per-cpu variables irq_regs and
irq_stack_ptr at least seem consistent with a callback having come in
while on the sched_op hypercall. But then the bt command seems to be
operating on a completely different stack.

VCPU2 seems to be active, not polling (event processing could have been
initiated from enable_irqs via a forced callback, or before finally
exiting the sched_op hypercall); there seems to be a pending upcall, but
upcalls are masked.

The next thing I would try is to instrument the sites incrementing and
decrementing irq_count...
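For reference, the slow path under discussion looks roughly like the
sketch below (paraphrased from arch/x86/xen/spinlock.c of the affected
kernel versions; statistics, the nested-pickup handling and exact
signatures are simplified, so take the details with a grain of salt).
The marked raw_local_irq_enable() is the event unmasking that the
potential fix above would skip.

/* Simplified sketch of xen_spin_lock_slow(), not the verbatim code. */
static noinline int xen_spin_lock_slow_sketch(struct xen_spinlock *xl,
                                              bool irq_enable)
{
        int irq = __this_cpu_read(lock_kicker_irq); /* per-VCPU kicker evtchn */
        struct xen_spinlock *prev;
        int ret = 0;

        /* Announce that this VCPU waits on xl; this is what shows up in
         * the per-cpu lock_spinners entries further down.  The previous
         * value is kept for the nested (interrupted-spinner) case. */
        prev = spinning_lock(xl);

        do {
                unsigned long flags;

                xen_clear_irq_pending(irq);

                /* The lock may have become free while we registered. */
                ret = xen_spin_trylock(xl);
                if (ret)
                        break;

                flags = arch_local_save_flags();
                if (irq_enable)
                        raw_local_irq_enable(); /* <- the unmasking the
                                                 *    potential fix skips */

                /* Block in Xen until the lock holder kicks our event
                 * channel (SCHEDOP_poll, see the sketch at the end). */
                xen_poll_irq(irq);

                raw_local_irq_restore(flags);
        } while (!xen_test_irq_pending(irq)); /* retry on spurious wakeup */

        unspinning_lock(xl, prev);
        return ret;
}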
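As for the irq_count instrumentation: the increments/decrements
themselves live in the entry code (arch/x86/kernel/entry_64.S), so the
real instrumentation has to touch the assembly, but a minimal helper to
dump the current value from C at interesting places (say entry/exit of
xen_evtchn_do_upcall) could look like the hypothetical sketch below,
assuming the per-cpu irq_count defined in arch/x86/kernel/cpu/common.c
is visible:

#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/smp.h>

/* Defined (64-bit only) in arch/x86/kernel/cpu/common.c, initialized to
 * -1; the entry code bumps it when switching to the irq stack. */
DECLARE_PER_CPU(unsigned int, irq_count);

/* Hypothetical helper: record where we are and how deep irq_count is. */
static inline void trace_irq_count(const char *site)
{
        trace_printk("%s: cpu%d irq_count=%d\n", site,
                     raw_smp_processor_id(),
                     (int)__this_cpu_read(irq_count));
}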
--- Below some of the info seen from dom0 debugging keys and from
looking at a dump with crash:

Dom0 Info:

Polling vCPUs: {1,5-7}
VCPU0: CPU0 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU1: CPU1 [has=F] poll=10 upcall_pend = 00, upcall_mask = 00
VCPU2: CPU6 [has=T] poll=0  upcall_pend = 01, upcall_mask = 01
VCPU3: CPU2 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU4: CPU4 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU5: CPU3 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
VCPU6: CPU7 [has=F] poll=40 upcall_pend = 01, upcall_mask = 01
VCPU7: CPU4 [has=F] poll=46 upcall_pend = 01, upcall_mask = 01

10 [0/1]: s=6 n=1 x=0
40 [0/1]: s=6 n=6 x=0
46 [0/1]: s=6 n=7 x=0

Guest Info:

lock_spinners:
[0] (struct xen_spinlock *) 0x
[1] (struct xen_spinlock *) 0xffff8803bffc8ae8 = { lock=1, spinners=2 }
[2] (struct xen_spinlock *) 0xffff8803bfc93700 = { lock=0, spinners=2 }
[3] (struct xen_spinlock *) 0x
[4] (struct xen_spinlock *) 0x
[5] (struct xen_spinlock *) 0xffff8803bffc8ae8 -> [1]
[6] (struct xen_spinlock *) 0xffff8803bfc93700 -> [2]
[7] (struct xen_spinlock *) 0xffffffff81f15ef0 = { lock=1, spinners=1 }

VCPU[2,6] -> lock of runqueue[2] = (struct rq *) 0xffff8803bfc93700

irq_count[1] = 1
irq_count[2] = 2
It is -1 for the rest.

*(struct pt_regs *) irq_regs[2] = {
  .ip = 0xffffffff810013aa == hypercall_page+938 -> sched_op
  .sp = 0xffff8803bfc83ce8
}
char *irq_stack_ptr[2] = 0xffff8803bfc83fc0
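(For reading the table above: the per-cpu lock_spinners pointers and the
xen_spinlock contents come from roughly the following definitions in
arch/x86/xen/spinlock.c of this kernel; paraphrased, details may differ
slightly.)

struct xen_spinlock {
        unsigned char lock;      /* 0 -> free; 1 -> locked */
        unsigned short spinners; /* count of waiting cpus */
};

/* Per-VCPU: which lock this VCPU is currently spinning on in the slow
 * path.  spinning_lock() writes it and returns the previous value, so a
 * nested slow-path entry (taken from the event callback) restores the
 * outer lock when it is done.  Two VCPUs pointing at the same lock, as
 * with [1]/[5] and [2]/[6] above, simply means two waiters on that
 * lock. */
static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);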
#> bt -T ffff88033d65c4a0
PID: 2050   TASK: ffff88033d65c4a0   CPU: 2   COMMAND: "postgres"
  [ffff88033d72d530] get_page_from_freelist at ffffffff8111e6df
  [ffff88033d72d600] __alloc_pages_nodemask at ffffffff8111ebec
  [ffff88033d72d620] check_preempt_curr at ffffffff81050254
  [ffff88033d72d640] pull_task at ffffffff8105e33e
  [ffff88033d72d670] balance_tasks at ffffffff8105e4b6
  [ffff88033d72d680] radix_tree_lookup_slot at ffffffff8130db5e
  [ffff88033d72d690] find_get_page at ffffffff811161ee
  [ffff88033d72d6b0] find_lock_page at ffffffff81116286
  [ffff88033d72d6e0] shmem_getpage_gfp at ffffffff8112dd10
  [ffff88033d72d748] pte_pfn_to_mfn at ffffffff81005a78
  [ffff88033d72d7a0] get_page_from_freelist at ffffffff8111e6df
  [ffff88033d72d7c0] _raw_spin_lock at ffffffff81655e4e
  [ffff88033d72d7d0] ext4_ext_check_cache at ffffffff81239248
  [ffff88033d72d820] ext4_ext_map_blocks at ffffffff8123e269
  [ffff88033d72d870] _raw_spin_lock at ffffffff81655e4e
  [ffff88033d72d880] enqueue_to_backlog at ffffffff8153e55f
  [ffff88033d72d890] __wake_up_common at ffffffff8104c1c8
  [ffff88033d72d8e0] netif_rx.part.82 at ffffffff8153f2de
  [ffff88033d72d920] netif_rx at ffffffff8153f400
  [ffff88033d72d960] loopback_xmit at ffffffff8146fb1c
  [ffff88033d72d990] dev_hard_start_xmit at ffffffff8153fea6
  [ffff88033d72d9d0] do_softirq at ffffffff81016284
  [ffff88033d72d9f0] local_bh_enable at ffffffff8106d614
  [ffff88033d72da00] dev_queue_xmit at ffffffff81540309
  [ffff88033d72da50] ip_finish_output at ffffffff815769ab
  [ffff88033d72da90] ip_output at ffffffff81577518
  [ffff88033d72dac0] ip_local_out at ffffffff81576c09
  [ffff88033d72dae0] ip_send_skb at ffffffff81577f4b
  [ffff88033d72db00] udp_send_skb at ffffffff8159aaa1
  [ffff88033d72db40] ip_generic_getfrag at ffffffff81575030
  [ffff88033d72db50] udp_sendmsg at ffffffff8159bd38
  [ffff88033d72db90] irq_to_desc at ffffffff810d70b7
  [ffff88033d72dbd0] notify_remote_via_irq at ffffffff813a74c1
  [ffff88033d72dc10] ttwu_queue at ffffffff81052672
  [ffff88033d72dc38] _raw_spin_unlock_irqrestore at ffffffff8165605e
  [ffff88033d72dc90] inet_sendmsg at ffffffff815a6474   --- bt -t would end here
  [ffff88033d72dcb0] apparmor_socket_sendmsg at ffffffff812d43f7
  [ffff88033d72dcd0] sock_sendmsg at ffffffff815266fe
  [ffff88033d72dd70] md_make_request at ffffffff814e0606
  [ffff88033d72ddd0] kmem_cache_free at ffffffff811605cf
  [ffff88033d72de10] mempool_free_slab at ffffffff81118507
  [ffff88033d72de20] mempool_free at ffffffff811185b7
  [ffff88033d72de50] sys_sendto at ffffffff8152974d
  [ffff88033d72dee0] ext4_sync_file at ffffffff8120f07b
  [ffff88033d72df00] vfs_write at ffffffff811764b0
  [ffff88033d72df50] xen_hypervisor_callback at ffffffff816606db
    RIP: 00007f6852c7b39a  RSP: 00007fff431454e0  RFLAGS: 00000212
    RAX: 00007f6852fb38b8  RBX: 00007f6852fb3720  RCX: 000000000000014a
    RDX: 0000000000000150  RSI: 0000000000000140  RDI: 00007f6852fb3720
    RBP: 0000000000000150   R8: 0000000000000015   R9: 0000000000000000
    R10: 0000000000000000  R11: 00007f6852c9174a  R12: 00007f6852fb3778
    R13: 0000000000000140  R14: 00007f6852fb38b8  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: e033  SS: e02b

#> bt -T ffff88033b812dc0
PID: 2069   TASK: ffff88033b812dc0   CPU: 6   COMMAND: "postgres"
  [ffff88033b897530] get_page_from_freelist at ffffffff8111e6df
  [ffff88033b897600] __alloc_pages_nodemask at ffffffff8111ebec
  [ffff88033b897640] radix_tree_lookup at ffffffff8130db6b
  [ffff88033b897650] irq_to_desc at ffffffff810d70b7
  [ffff88033b897660] irq_get_irq_data at ffffffff810d9f4e
  [ffff88033b897670] balance_tasks at ffffffff8105e493
  [ffff88033b897680] radix_tree_lookup_slot at ffffffff8130db5e
  [ffff88033b897690] find_get_page at ffffffff811161ee
  [ffff88033b8976b0] find_lock_page at ffffffff81116286
  [ffff88033b8976e0] shmem_getpage_gfp at ffffffff8112dd10
  [ffff88033b897748] pte_pfn_to_mfn at ffffffff81005a78
  [ffff88033b897770] xen_set_pte_at at ffffffff81008e29
  [ffff88033b897788] __raw_callee_save_xen_make_pte at ffffffff810052cd
  [ffff88033b8977b0] unlock_page at ffffffff8111589a
  [ffff88033b8977d0] __do_fault at ffffffff81138ac9
  [ffff88033b8978a0] pvclock_clocksource_read at ffffffff8103dd15
  [ffff88033b8978f0] xen_clocksource_read at ffffffff8100a850
  [ffff88033b897900] sched_clock at ffffffff8101bd79
  [ffff88033b897910] blk_rq_init at ffffffff812ec262
  [ffff88033b897930] get_request at ffffffff812f004e
  [ffff88033b8979c0] cpumask_next_and at ffffffff81308c66
  [ffff88033b8979e0] find_busiest_group at ffffffff8105a061   --- bt -t ends here
  [ffff88033b897a30] radix_tree_lookup at ffffffff8130db6b
  [ffff88033b897a40] irq_to_desc at ffffffff810d70b7
  [ffff88033b897a50] irq_get_irq_data at ffffffff810d9f4e
  [ffff88033b897a60] info_for_irq at ffffffff813a636e
  [ffff88033b897a80] xen_poll_irq_timeout at ffffffff813a696e
  [ffff88033b897ac0] xen_poll_irq at ffffffff813a8450
  [ffff88033b897ad0] xen_spin_lock_slow at ffffffff8163ad77
  [ffff88033b897b20] xen_spin_lock at ffffffff810121da
  [ffff88033b897b40] _raw_spin_lock at ffffffff81655e4e
  [ffff88033b897b50] double_rq_lock at ffffffff8105119c
  [ffff88033b897b80] load_balance at ffffffff8105e788
  [ffff88033b897b88] _raw_spin_unlock_irqrestore at ffffffff8165605e
  [ffff88033b897c10] idle_balance at ffffffff8163d57c
  [ffff88033b897c60] __schedule at ffffffff81653ea9
  [ffff88033b897cc0] sleep_on_page at ffffffff81115810
  [ffff88033b897ce0] schedule at ffffffff81653fcf
  [ffff88033b897cf0] io_schedule at ffffffff8165407f
  [ffff88033b897d10] sleep_on_page at ffffffff8111581e
  [ffff88033b897d20] __wait_on_bit at ffffffff8165489f
  [ffff88033b897d70] wait_on_page_bit at ffffffff81115988
  [ffff88033b897da0] wake_bit_function at ffffffff8108a140
  [ffff88033b897dc0] filemap_fdatawait_range at ffffffff81115a9c
  [ffff88033b897e60] do_writepages at ffffffff81120ee1
  [ffff88033b897eb0] filemap_write_and_wait_range at ffffffff81117398
  [ffff88033b897ee0] ext4_sync_file at ffffffff8120ef9f
  [ffff88033b897f00] vfs_write at ffffffff811764b0
  [ffff88033b897f40] do_fsync at ffffffff811a4a36
  [ffff88033b897f70] sys_fdatasync at ffffffff811a4d83
  [ffff88033b897f80] system_call_fastpath at ffffffff8165e442
    RIP: 00007f6852ce8240  RSP: 00007fff43145618  RFLAGS: 00000246
    RAX: 000000000000004b  RBX: ffffffff8165e442  RCX: 00007f6852ce1900
    RDX: 00000000000000c5  RSI: 000000000000000c  RDI: 000000000000000c
    RBP: 00000000000000c5   R8: 000000000073a550   R9: 0000000000000000
    R10: 0000000000000004  R11: 0000000000000246  R12: ffffffff811a4d83
    R13: ffff88033b897f78  R14: 0000000000000001  R15: 0000000000af7488
    ORIG_RAX: 000000000000004b  CS: e033  SS: e02b
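For completeness, the xen_poll_irq() frames above end up in a
SCHEDOP_poll hypercall on the per-VCPU kicker event channel, roughly as
sketched below (paraphrased from drivers/xen/events.c of this kernel;
evtchn_from_irq() and VALID_EVTCHN are helpers local to that file). That
is why dom0's debug output lists VCPUs 1, 6 and 7 as polling on ports
10, 40 and 46, and why a VCPU blocked this way sits in the sched_op
entry of the hypercall page.

/* Paraphrased from drivers/xen/events.c, not the verbatim code. */
void xen_poll_irq_timeout(unsigned irq, u64 timeout)
{
        int evtchn = evtchn_from_irq(irq);

        if (VALID_EVTCHN(evtchn)) {
                struct sched_poll poll;

                poll.nr_ports = 1;
                poll.timeout = timeout;  /* 0 == wait until pending */
                set_xen_guest_handle(poll.ports, &evtchn);

                /* The VCPU is descheduled here until one of the polled
                 * ports has a pending event (or the timeout expires). */
                if (HYPERVISOR_sched_op(SCHEDOP_poll, &poll) != 0)
                        BUG();
        }
}

void xen_poll_irq(int irq)
{
        xen_poll_irq_timeout(irq, 0 /* no timeout */);
}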