From: Stefan Bader
Subject: Re: Strange PVM spinlock case revisited
Date: Tue, 12 Feb 2013 16:53:06 +0100
To: Ian Campbell
Cc: Andrew Cooper, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On 12.02.2013 16:07, Ian Campbell wrote:
> On Tue, 2013-02-12 at 14:50 +0000, Stefan Bader wrote:
>> On 12.02.2013 15:07, Ian Campbell wrote:
>>> On Tue, 2013-02-12 at 13:42 +0000, Stefan Bader wrote:
>>>> On 11.02.2013 18:29, Ian Campbell wrote:
>>>>> On Fri, 2013-02-08 at 15:33 +0000, Stefan Bader wrote:
>>>>>> A while ago I reported an issue which causes Xen PVM guests to
>>>>>> hang (apparently related to spinlocks). Since then I was able to
>>>>>> gather a few more facts which I try to provide below. As for the
>>>>>> real reasons, I am still puzzled and would be thankful for any
>>>>>> hints or direction.
>>>>>>
>>>>>> - Happens with Xen 4.1.2 and Xen 4.2 host-side.
>>>>>> - PVM guest with 8 VCPUs is running 800 threads doing DB updates.
>>>>>
>>>>> This is on a host with >= 8 PCPUs, correct?
>>>>
>>>> Maybe >= 4, but I cannot remember for sure anymore.
>>>
>>> OK, so the important point I was getting at was whether the guest
>>> was overcommitting the host or not; it seems like it is (2x VCPUs
>>> for each PCPU).
>>
>> Err, not so much. I run on an 8-core host. ;)
>
> Um, which is what I asked. What is >= 4 then?

Uhm, that was me not being able to discern between P and V at some
point. What I was thinking was that I believe I saw this when running
with 4 VCPUs or 8 VCPUs on the same 8-PCPU host...

>>>>> I suppose xen_spin_unlock could also be instrumented to indicate
>>>>> who it last kicked and for which lock, and that might show us
>>>>> something?
>>>>
>>>> IIRC I had done this but it did not really show much. What I have
>>>> done in the meantime was to instrument the IRQ service functions in
>>>> arch/x86/kernel/entry_64.S to give me a history of callbacks. This
>>>> shows (in another experiment where VCPUs 1, 2, 5 and 6 spin on the
>>>> same lock and 2, 5 and 6 are still polling) that the last events on
>>>> the VCPU not polling are:
>>>>
>>>> xen_do_hypervisor_callback+127
>>>> call_softirq+29
>>>> call_softirq+125
>>>> call_softirq+29
>>>> call_softirq+125
>>>> call_softirq+29
>>>> call_softirq+125
>>>> xen_do_hypervisor_callback+28
>>>> call_softirq+29
>>>> xen_do_hypervisor_callback+28
>>>>
>>>> The lower offsets are where irq_count gets incremented and the
>>>> higher offsets where irq_count gets decremented before ending the
>>>> callback. This shows that at least in this case there was an
>>>> upcall, a softirq and another upcall being triggered without
>>>> finishing the previous one. There was another experiment where I
>>>> saw softirq, upcall, upcall. But at least it seems that there is
>>>> always a three-level stack.
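
(Side note on the instrumentation, in case someone wants to reproduce
it: conceptually it just appends the call site to a small percpu ring
at the points where irq_count is incremented and decremented. Below is
a C-ish sketch of what the few lines of asm do; all names are made up
for illustration, this is not the actual patch:

#include <linux/percpu.h>
#include <linux/compiler.h>

#define CB_HIST_LEN 16

/* one ring of logged entry/exit sites per cpu */
static DEFINE_PER_CPU(unsigned long, cb_hist[CB_HIST_LEN]);
static DEFINE_PER_CPU(unsigned int, cb_hist_idx);

/* log the address of the irq_count increment/decrement site */
static __always_inline void cb_log(unsigned long site)
{
	unsigned int idx;

	idx = __this_cpu_inc_return(cb_hist_idx) % CB_HIST_LEN;
	__this_cpu_write(cb_hist[idx], site);
}

The list above is just that ring dumped oldest to newest and resolved
to symbol+offset.)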
>>>> I believe that softirq, upcall would be ok as softirq processing
>>>> enables interrupts, but is it expected to have an upcall
>>>> interrupted by another one?
>>>
>>> I'm not sure. evtchns don't have a priority mechanism so I expect
>>> not in general, but individual interrupt handlers could well
>>> reenable interrupts, I think (it might be one of the flags you pass
>>> when calling request_irq?).
>>
>> Just a gut feeling, but it feels wrong to enable interrupts in any
>> interrupt handler. I thought anything that needs interrupts enabled
>> and/or takes longer should be pushed off to a workqueue...
>
> Either that or some sort of interrupt prioritisation mechanism in the
> hardware so only higher priority interrupts than the current one will
> occur.
>
>>>>> Can someone explain why the non-locked update of lock_spinners in
>>>>> spinning_lock() is safe against xen_spin_unlock reading the field
>>>>> from another VCPU? I suspect it isn't, so what happens if the
>>>>> other VCPU kicks the lock which is just about to become prev?
>>>>> Maybe this is handled in xen_spin_lock_slow somehow?
>>>>
>>>> Isn't that safe because lock_spinners is percpu?
>>>
>>> xen_spin_unlock_slow() accesses the lock_spinners of remote CPUs,
>>> which is what I think might be unsafe.
>>
>> Ah right, that. Hm, the way those two play together always was a bit
>> brain-hurting.
>
> Yes...
>
>> Though somehow it feels like the top level poll_irq would return and
>> count things as a spurious wakeup. I think in that case I would
>> expect the lock_spinners entry of one VCPU to not match the freed
>> lock... Not a really scientific argument.
>
> Adding a BUG_ON(__this_cpu_read(lock_spinners) != xl) in
> unspinning_lock might be interesting?

Hm, besides writing prev into there (which should never be xl), I
think since write access only happens in xen_spin_lock_slow that
should stack OK. The two cases that would go wrong are sending a
wakeup event when the lock was just replaced on the other VCPU, or not
sending one when it should be.
IIRC I had one attempt print when xen_spin_unlock_slow went through
all VCPUs without finding one to signal. That seemed to happen rather
often, but would then resolve itself, as xen_spin_lock_slow for the
other lock would set the event pending for the prev lock if there was
one (after being successful with the top level lock).
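
For reference, the interplay I mean looks roughly like this (quoted
from memory and abridged from arch/x86/xen/spinlock.c of that era, so
take the details with a grain of salt):

/* announce that this cpu spins on xl; remember any nested lock */
static inline struct xen_spinlock *spinning_lock(struct xen_spinlock *xl)
{
	struct xen_spinlock *prev;

	prev = __this_cpu_read(lock_spinners);	/* lock of an interrupted slow path */
	__this_cpu_write(lock_spinners, xl);	/* the non-locked, remotely read update */
	return prev;
}

/* in xen_spin_lock_slow(), after taking the top level lock: */
	ret = xen_spin_trylock(lock);
	if (ret) {
		/*
		 * We interrupted another spinner; leave the event
		 * pending so the outer slow path re-checks its lock
		 * (prev) instead of blocking on a kick that we may
		 * just have consumed.
		 */
		if (prev != NULL)
			xen_set_irq_pending(irq);
		goto out;
	}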
I think there is room for improvement by finding a way not always to
search from VCPU 0 in xen_spin_unlock_slow, and maybe by checking in
xen_spin_lock_slow whether returning from poll without the current
lock being free may be because prev is free now. When there is more
than one spinner, more than one might be able to proceed... (a rough
sketch of the first idea is at the end of this mail).

>
> I wonder if this is something where the ftrace and related kernel
> infrastructure might be useful?

Maybe. I am not sure there are hooks for what we are looking for, and
whether the system is not hanging too badly for that by then. My usual
approach of looking at the dump would require manually finding the
buffer and parsing the records...
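
To illustrate the search improvement (untested and purely
hypothetical; the loop is modelled on how I remember
xen_spin_unlock_slow looking): remember which VCPU got the last kick
and start the next scan after it, instead of always walking up from 0:

/* hypothetical tweak: rotate the search starting point */
static void xen_spin_unlock_slow(struct xen_spinlock *xl)
{
	static unsigned int last_kicked;	/* only a hint, races are harmless */
	unsigned int i, cpu;

	for (i = 0; i < nr_cpu_ids; i++) {
		cpu = (last_kicked + 1 + i) % nr_cpu_ids;
		if (!cpu_online(cpu))
			continue;
		if (per_cpu(lock_spinners, cpu) == xl) {
			last_kicked = cpu;
			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
			break;
		}
	}
}

That way successive unlocks would stop always favouring the
low-numbered VCPUs.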