From: Stefan Bader
Subject: Re: Strange PVM spinlock case revisited
Date: Tue, 12 Feb 2013 15:50:13 +0100
Message-ID: <511A56A5.1040305@canonical.com>
In-Reply-To: <1360678069.20449.169.camel@zakaz.uk.xensource.com>
To: Ian Campbell
Cc: Andrew Cooper, xen-devel@lists.xen.org

On 12.02.2013 15:07, Ian Campbell wrote:
> On Tue, 2013-02-12 at 13:42 +0000, Stefan Bader wrote:
>> On 11.02.2013 18:29, Ian Campbell wrote:
>>> On Fri, 2013-02-08 at 15:33 +0000, Stefan Bader wrote:
>>>> A while ago I had been reporting an issue which causes Xen PVM guests
>>>> to hang (apparently related to spinlocks). Since then I was able to
>>>> gather a few more facts which I try to provide below. For the real
>>>> reasons, I am still puzzled and would be thankful for any hints or
>>>> direction.
>>>>
>>>> - Happens with Xen 4.1.2 and Xen 4.2 host-side.
>>>> - PVM guest with 8 VCPUs is running 800 threads doing DB updates.
>>>
>>> This is on a host with >= 8 PCPUs, correct?
>>
>> Maybe >=4 but I cannot remember for sure anymore.
>
> OK, so the important point I was getting at was whether the guest was
> overcommitting the host or not, it seems like it is (2xVCPUs for each
> PCPU)

Err not so much. I run on an 8-core host. ;)

>
>
>>>> From the static information I can extract from a dump it is hard to
>>>> tell what exactly is going on. VCPU2 seems to be doing something but
>>>> looks to be rather deep in event handling. The irq_count is at 2, but
>>>> irq_count seems to be used in a few more places than just xen_hypervisor
>>>> callback. Though that at least seems to be the entry on the stack I can
>>>> see for VCPU2. The contents of the per-cpu variables for irq_reqs and
>>>> irq_stack_ptr at least seem consistent as if a callback came in while
>>>> being on the sched_op hypercall. But then the bt command seems to be
>>>> operating on a completely different stack.
>>>> VCPU2 seems to be active, not polling (event processing could be initiated
>>>> from enable_irqs via force callback or before finally exiting the sched_op
>>>> hypercall), there seems to be a pending upcall but upcalls are masked.
>>>> The next thing I would try is to instrument the sites incrementing and
>>>> decrementing irq_count...
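Side note, so that the data below is easier to read: the PV spinlock slow
path in arch/x86/xen/spinlock.c roughly looks like the sketch below. This
is heavily simplified and written down from memory (no flag save/restore,
no handling of the nested-poll corner case, no stats), so take it only as
an illustration of where lock_spinners, the spinners count and the poll
hypercall come in, not as the exact code.

struct xen_spinlock {
        unsigned char lock;             /* 0 -> free, 1 -> taken */
        unsigned short spinners;        /* vcpus waiting in the slow path */
};

/* per-vcpu wakeup event channel and "lock I am polling on" marker */
static DEFINE_PER_CPU(int, lock_kicker_irq);
static DEFINE_PER_CPU(struct xen_spinlock *, lock_spinners);

static int xen_spin_lock_slow(struct arch_spinlock *lock, bool irq_enable)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;
        struct xen_spinlock *prev;
        int irq = __this_cpu_read(lock_kicker_irq);
        int got_it = 0;

        /* advertise which lock we are waiting for (this is the
         * lock_spinners entry seen in the dump) and bump xl->spinners */
        prev = spinning_lock(xl);

        do {
                xen_clear_irq_pending(irq);

                got_it = xen_spin_trylock(lock);        /* freed meanwhile? */
                if (got_it)
                        break;

                if (irq_enable)
                        raw_local_irq_enable();         /* window for nested upcalls */

                xen_poll_irq(irq);      /* SCHEDOP_poll until the unlocker
                                         * kicks our event channel */

                if (irq_enable)
                        raw_local_irq_disable();
        } while (!xen_test_irq_pending(irq));           /* spurious wakeup -> retry */

        unspinning_lock(xl, prev);      /* drop xl->spinners, restore prev entry */
        return got_it;
}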
>>>> ---
>>>>
>>>> Below some of the info seen from dom0 debugging keys and looking
>>>> at a dump with crash:
>>>>
>>>> Dom0 Info:
>>>>
>>>> Polling vCPUs: {1,5-7}
>>>> VCPU0: CPU0 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
>>>> VCPU1: CPU1 [has=F] poll=10 upcall_pend = 00, upcall_mask = 00
>>>> VCPU2: CPU6 [has=T] poll=0  upcall_pend = 01, upcall_mask = 01
>>>> VCPU3: CPU2 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
>>>> VCPU4: CPU4 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
>>>> VCPU5: CPU3 [has=F] poll=0  upcall_pend = 00, upcall_mask = 00
>>>> VCPU6: CPU7 [has=F] poll=40 upcall_pend = 01, upcall_mask = 01
>>>> VCPU7: CPU4 [has=F] poll=46 upcall_pend = 01, upcall_mask = 01
>>>>
>>>> 10 [0/1]: s=6 n=1 x=0
>>>> 40 [0/1]: s=6 n=6 x=0
>>>> 46 [0/1]: s=6 n=7 x=0
>>>>
>>>> Guest Info:
>>>>
>>>> lock_spinners:
>>>> [0] (struct xen_spinlock *) 0x
>>>> [1] (struct xen_spinlock *) 0xffff8803bffc8ae8 = { lock=1, spinners=2 }
>>>> [2] (struct xen_spinlock *) 0xffff8803bfc93700 = { lock=0, spinners=2 }
>>>
>>> If I understand correctly this means that 2 threads (VCPU2 & 6, I
>>> suppose) are spinning on this lock, but no one is holding it?
>>
>> To my reading both went into xen_spinlock_slow(). VCPU2 may be either in the
>> raw_local_irq_enable() before doing poll_irq or maybe in handling some event
>> after receiving the wakeup irq but before returning from the hv call. So the
>> VCPU that had the lock has returned it and likely the wakeup has been sent.
>
> Makes sense, so we are in the window between one holder unlocking and
> the next unlocking, which explains why noone is currently holding it
> (which I previously thought odd).
>
>>> An interesting hack^Wexperiment might be to make xen_poll_irq use a
>>> timeout and see if that unwedges things -- this would help confirm that
>>> the issue is on nested wakeup.
>>
>> I could do that. This would at least re-activate the other waiters. In the case
>> shown VCPU6 and if something in the path of execution on VCPU2 deadlocks there
>> this would move things ahead.
>
> Yes, it would be interesting e.g. if only either 6 or 2 made progress
> and the other remained wedged for all time.

On my list next...

>
>>> I suppose xen_spin_unlock could also be instrumented to indicate who it
>>> last kicked and for which lock and that might show us something?
>>
>> IIRC I had done this but it did not really show much. What I have done in the
>> meantime was to instrument the IRQ service functions in
>> arch/x86/kernel/entry_64.S to give me a history of callbacks. This shows (in
>> another experiment where it is 1,2,5,6 in the same lock and 2,5,6 still polling)
>> that the last events on the vcpu not polling are:
>>
>> xen_do_hypervisor_callback+127
>> call_softirq+29
>> call_softirq+125
>> call_softirq+29
>> call_softirq+125
>> call_softirq+29
>> call_softirq+125
>> xen_do_hypervisor_callback+28
>> call_softirq+29
>> xen_do_hypervisor_callback+28
>>
>> The lower offsets are when irq_count gets incremented and the higher offsets are
>> when irq_count gets decremented before ending the callback. This shows that at
>> least in this case there was an upcall, a softirq and another upcall being
>> triggered without finishing the previous one. There was another experiment where
>> I saw softirq, upcall, upcall. But at least it seems that there is always a
>> three level stack.
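For completeness, the recording side of that instrumentation boils down to
a small per-cpu ring buffer fed from the entry stubs. The sketch below is
hypothetical rather than the literal patch; trace_irq_count, irq_hist and
IRQ_HIST_LEN are made-up names, and the actual calls are made from the
spots in entry_64.S that increment and decrement irq_count.

#define IRQ_HIST_LEN 16

struct irq_hist {
        unsigned int idx;
        struct {
                unsigned long caller;   /* return address inside the entry stub */
                int count;              /* irq_count value after the inc/dec */
        } ent[IRQ_HIST_LEN];
};

static DEFINE_PER_CPU(struct irq_hist, irq_hist);

/* called with interrupts off from the instrumented entry code */
notrace void trace_irq_count(unsigned long caller, int count)
{
        struct irq_hist *h = this_cpu_ptr(&irq_hist);
        unsigned int i = h->idx++ % IRQ_HIST_LEN;

        h->ent[i].caller = caller;
        h->ent[i].count = count;
}

The recorded addresses can then be resolved with crash into the
symbol+offset form listed above.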
>> I believe that softirq, upcall would be ok as softirq processing enables
>> interrupts but is it expected to have an upcall interrupted by another one?
>
> I'm not sure. evtchn's don't have a priority mechanism so I expect not
> in general but individual interrupt handlers could well reenable
> interrupts, I think (it might be one of the flags you pass when calling
> request_irq?).

Just a gut feeling but it feels wrong to enable interrupts in any interrupt
handler. I thought anything that needs interrupts enabled and/or takes
longer should be pushed off to a workqueue...

>
>>> Can someone explain why the non-locked update of lock_spinners in
>>> spinning_lock() is safe against xen_spin_unlock reading the field from
>>> another VCPU? I suspect it isn't, so what happens if the other VCPU
>>> kicks the lock which is just about to become prev? maybe this is handled
>>> in xen_spin_lock_slow somehow?
>>
>> Isn't that safe because lock_spinners is percpu?
>
> xen_spin_unlock_slow() accesses the lock_spinners of remote CPUs, which
> is what I think might be unsafe.

Ah right, that. Hm, the way those two play together has always been a bit
brain-hurting. Though somehow it feels as if the top level poll_irq would
return and count things as a spurious wakeup. I think in that case I would
expect the lock_spinner entry of one vcpu not to match the freed lock...
Not a really scientific argument.

>
> Ian.
>
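P.S. To make that last point a bit more concrete, the two paths we are
talking about look roughly like the pair below. Again this is simplified
and written down from memory (xen_send_IPI_one, XEN_SPIN_UNLOCK_VECTOR and
the exact barriers are from my recollection of the code), so the details
may be off.

/* Lock side: advertise which lock this vcpu is about to poll on.  The
 * write to lock_spinners is a plain per-cpu store; only the spinners
 * count inside the lock itself is updated atomically. */
static struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
{
        struct xen_spinlock *prev;

        prev = __this_cpu_read(lock_spinners);  /* may be a nested lock */
        __this_cpu_write(lock_spinners, xl);

        wmb();          /* publish the lock before bumping the count */

        asm(LOCK_PREFIX " incw %0" : "+m" (xl->spinners) : : "memory");

        return prev;
}

/* Unlock side: look for a vcpu that advertises this lock and kick its
 * wakeup event channel.  Note that this reads the *remote* cpu's
 * lock_spinners entry, which is the interaction in question, and IIRC
 * it only kicks the first match. */
static void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
        int cpu;

        for_each_online_cpu(cpu) {
                if (per_cpu(lock_spinners, cpu) == xl) {
                        xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                        break;
                }
        }
}

For the timeout experiment I would simply replace the xen_poll_irq(irq)
call in the poll loop with xen_poll_irq_timeout(irq, <deadline>), assuming
that variant is available in this tree, so that a wedged waiter eventually
rechecks the lock on its own and we can see whether only one of the vcpus
stays stuck.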