From: Stefan Bader
Subject: Re: Strange PVM spinlock case revisited
Date: Tue, 12 Feb 2013 16:53:06 +0100
To: Ian Campbell
Cc: Andrew Cooper, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 12.02.2013 16:07, Ian Campbell wrote:
> On Tue, 2013-02-12 at 14:50 +0000, Stefan Bader wrote:
>> On 12.02.2013 15:07, Ian Campbell wrote:
>>> On Tue, 2013-02-12 at 13:42 +0000, Stefan Bader wrote:
>>>> On 11.02.2013 18:29, Ian Campbell wrote:
>>>>> On Fri, 2013-02-08 at 15:33 +0000, Stefan Bader wrote:
>>>>>> A while ago I reported an issue which causes Xen PVM guests to
>>>>>> hang (apparently related to spinlocks). Since then I was able to
>>>>>> gather a few more facts which I try to provide below. As for the
>>>>>> real reasons, I am still puzzled and would be thankful for any
>>>>>> hints or direction.
>>>>>>
>>>>>> - Happens with Xen 4.1.2 and Xen 4.2 host-side.
>>>>>> - PVM guest with 8 VCPUs is running 800 threads doing DB updates.
>>>>>
>>>>> This is on a host with >= 8 PCPUs, correct?
>>>>
>>>> Maybe >= 4, but I cannot remember for sure anymore.
>>>
>>> OK, so the important point I was getting at was whether the guest
>>> was overcommitting the host or not; it seems like it is (2x VCPUs
>>> for each PCPU).
>>
>> Err, not so much. I run on an 8-core host. ;)
>
> Um, which is what I asked. What is >= 4 then?

Uhm, that was me not being able to discern between P and V at some
point. What I was thinking was that I believe I saw this when running
with 4 VCPUs or 8 VCPUs on the same 8-PCPU host...

>>>>> I suppose xen_spin_unlock could also be instrumented to indicate
>>>>> who it last kicked and for which lock, and that might show us
>>>>> something?
>>>>
>>>> IIRC I had done this but it did not really show much. What I have
>>>> done in the meantime was to instrument the IRQ service functions in
>>>> arch/x86/kernel/entry_64.S to give me a history of callbacks. This
>>>> shows (in another experiment where VCPUs 1, 2, 5 and 6 spin on the
>>>> same lock and 2, 5 and 6 are still polling) that the last events on
>>>> the VCPU not polling are:
>>>>
>>>> xen_do_hypervisor_callback+127
>>>> call_softirq+29
>>>> call_softirq+125
>>>> call_softirq+29
>>>> call_softirq+125
>>>> call_softirq+29
>>>> call_softirq+125
>>>> xen_do_hypervisor_callback+28
>>>> call_softirq+29
>>>> xen_do_hypervisor_callback+28
>>>>
>>>> The lower offsets are where irq_count gets incremented and the
>>>> higher offsets where irq_count gets decremented before ending the
>>>> callback. This shows that at least in this case there was an
>>>> upcall, a softirq and another upcall being triggered without
>>>> finishing the previous one. There was another experiment where I
>>>> saw softirq, upcall, upcall. But at least it seems that there is
>>>> always a three-level stack.
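
(Side note on the instrumentation, in case someone wants to reproduce
it: conceptually it just appends the call site to a small percpu ring
at the points where irq_count is incremented and decremented. Below is
a C-ish sketch of what the few lines of asm do; all names are made up
for illustration, this is not the actual patch:

#include <linux/percpu.h>
#include <linux/compiler.h>

#define CB_HIST_LEN 16

/* one ring of logged entry/exit sites per cpu */
static DEFINE_PER_CPU(unsigned long, cb_hist[CB_HIST_LEN]);
static DEFINE_PER_CPU(unsigned int, cb_hist_idx);

/* log the address of the irq_count increment/decrement site */
static __always_inline void cb_log(unsigned long site)
{
	unsigned int idx;

	idx = __this_cpu_inc_return(cb_hist_idx) % CB_HIST_LEN;
	__this_cpu_write(cb_hist[idx], site);
}

The list above is just that ring dumped oldest to newest and resolved
to symbol+offset.)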
>>>> I believe that softirq, upcall would be ok as softirq processing
>>>> enables interrupts, but is it expected to have an upcall
>>>> interrupted by another one?
>>>
>>> I'm not sure. evtchns don't have a priority mechanism so I expect
>>> not in general, but individual interrupt handlers could well
>>> reenable interrupts, I think (it might be one of the flags you pass
>>> when calling request_irq?).
>>
>> Just a gut feeling, but it feels wrong to enable interrupts in any
>> interrupt handler. I thought anything that needs interrupts enabled
>> and/or takes longer should be pushed off to a workqueue...
>
> Either that or some sort of interrupt prioritisation mechanism in the
> hardware so only higher priority interrupts than the current one will
> occur.
>
>>>>> Can someone explain why the non-locked update of lock_spinners in
>>>>> spinning_lock() is safe against xen_spin_unlock reading the field
>>>>> from another VCPU? I suspect it isn't, so what happens if the
>>>>> other VCPU kicks the lock which is just about to become prev?
>>>>> Maybe this is handled in xen_spin_lock_slow somehow?
>>>>
>>>> Isn't that safe because lock_spinners is percpu?
>>>
>>> xen_spin_unlock_slow() accesses the lock_spinners of remote CPUs,
>>> which is what I think might be unsafe.
>>
>> Ah right, that. Hm, the way those two play together always was a bit
>> brain-hurting.
>
> Yes...
>
>> Though somehow it feels like the top level poll_irq would return and
>> count things as a spurious wakeup. I think in that case I would
>> expect the lock_spinners entry of one VCPU to not match the freed
>> lock... Not a really scientific argument.
>
> Adding a BUG_ON(__this_cpu_read(lock_spinners) != xl) in
> unspinning_lock might be interesting?

Hm, besides writing prev into there (which should never be xl), I
think since write access only happens in xen_spin_lock_slow that
should stack OK. The two cases that would go wrong are sending a
wakeup event when the lock was just replaced on the other VCPU, or not
sending one when it should be.
IIRC I had one attempt print when xen_spin_unlock_slow went through
all VCPUs without finding one to signal. That seemed to happen rather
often, but would then resolve itself, as xen_spin_lock_slow for the
other lock would set the event pending for the prev lock if there was
one (after being successful with the top level lock).
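
For reference, the interplay I mean looks roughly like this (quoted
from memory and abridged from arch/x86/xen/spinlock.c of that era, so
take the details with a grain of salt):

/* announce that this cpu spins on xl; remember any nested lock */
static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
{
	struct xen_spinlock *prev;

	prev = __this_cpu_read(lock_spinners);	/* lock of an interrupted slow path */
	__this_cpu_write(lock_spinners, xl);	/* the non-locked, remotely read update */
	return prev;
}

/* in xen_spin_lock_slow(), after taking the top level lock: */
	ret = xen_spin_trylock(lock);
	if (ret) {
		/*
		 * We interrupted another spinner; leave the event
		 * pending so the outer slow path re-checks its lock
		 * (prev) instead of blocking on a kick that we may
		 * just have consumed.
		 */
		if (prev != NULL)
			xen_set_irq_pending(irq);
		goto out;
	}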
I think there is room for improvement by finding a way not always to
search from VCPU 0 in xen_spin_unlock_slow, and maybe by checking in
xen_spin_lock_slow whether returning from poll without the current
lock being free may be because prev is free now. When there is more
than one spinner, more than one might be able to proceed... (a rough
sketch of the first idea is at the end of this mail).

>
> I wonder if this is something where the ftrace and related kernel
> infrastructure might be useful?

Maybe. I am not sure there are hooks for what we are looking for, and
whether the system is not hanging too badly for that by then. My usual
approach of looking at the dump would require manually finding the
buffer and parsing the records...
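
To illustrate the search improvement (untested and purely
hypothetical; the loop is modelled on how I remember
xen_spin_unlock_slow looking): remember which VCPU got the last kick
and start the next scan after it, instead of always walking up from 0:

/* hypothetical tweak: rotate the search starting point */
static void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
	static unsigned int last_kicked;	/* only a hint, races are harmless */
	unsigned int i, cpu;

	for (i = 0; i < nr_cpu_ids; i++) {
		cpu = (last_kicked + 1 + i) % nr_cpu_ids;
		if (!cpu_online(cpu))
			continue;
		if (per_cpu(lock_spinners, cpu) == xl) {
			last_kicked = cpu;
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			break;
		}
	}
}

That way successive unlocks would stop always favouring the
low-numbered VCPUs.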