From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH v2 1/2] xen: fix a (latent) cpupool-related race during domain destroy Date: Fri, 15 Jul 2016 13:52:42 +0200 Message-ID: <1468583562.13039.96.camel@citrix.com> References: <146851288308.22413.4619190133086534604.stgit@Solace.fritz.box> <146851308019.22413.8905002507733716302.stgit@Solace.fritz.box> <5788AF33.2030603@suse.com> <1468577660.13039.78.camel@citrix.com> <5788BCA2.4020404@suse.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6701675374421404407==" Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1bO1ga-0000af-Rq for xen-devel@lists.xenproject.org; Fri, 15 Jul 2016 11:53:40 +0000 In-Reply-To: <5788BCA2.4020404@suse.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Juergen Gross , xen-devel@lists.xenproject.org Cc: George Dunlap , Andrew Cooper , Jan Beulich List-Id: xen-devel@lists.xenproject.org --===============6701675374421404407== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-Sj3BbNhJNQy4SLR+Fz74" --=-Sj3BbNhJNQy4SLR+Fz74 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2016-07-15 at 12:36 +0200, Juergen Gross wrote: > On 15/07/16 12:14, Dario Faggioli wrote: > > In particular, I'm probably not fully understanding, from that > > commit > > changelog, what is the set of operations/command that I should run > > to > > check whether or not I reintroduced the issue back. > You need to create a domain in a cpupool and destroy it again while > some dom0 process still is holding a reference to it (resulting in a > zombie domain). Then try to destroy the cpupool. >=20 Ah, I see. I wasn't get the fact that it needed to be a zombie domain from anywhere. > > What am I missing? > The domain being a zombie domain might change the picture. Moving it > to > cpupool0 was failing before my patch and it might do so again with > your > patch applied. >=20 Mmmm... I don't immediately see the reason why moving a zombie domain fails either, but I guess I'll have to try. But then, correct me if I'm wrong, the situation is like this: =C2=A0- right now there's a (potential) race between domain's scheduling=C2= =A0 =C2=A0 =C2=A0data destruction and domain removal from a cpupool; =C2=A0- with my patch, the race goes away, but we risk not being able to=C2= =A0 =C2=A0 =C2=A0destroy a cpupool with a zombie domain in it. I don't think we want to be in either of these two situations. :-( The race is never triggering so far, but I've already patches to Credit2 (finishing implementing soft affinity for it) that make it a real thing. I think I can work around that specific case by doing something like the below: diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index bc0e794..d91fd70 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -193,12 +193,9 @@ struct cpupool =C2=A0 =C2=A0static inline cpumask_t* cpupool_domain_cpumask(struct domain *d) =C2=A0{ -=C2=A0=C2=A0=C2=A0=C2=A0/* -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* d->cpupool is NULL only for the idle domai= n, and no one should -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0* be interested in calling this for the idle= domain. -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0*/ -=C2=A0=C2=A0=C2=A0=C2=A0ASSERT(d->cpupool !=3D NULL); -=C2=A0=C2=A0=C2=A0=C2=A0return d->cpupool->cpu_valid; +=C2=A0=C2=A0=C2=A0=C2=A0/* No one should be interested in calling this for= the idle domain! */ +=C2=A0=C2=A0=C2=A0=C2=A0ASSERT(!is_idle_domain(d)); +=C2=A0=C2=A0=C2=A0=C2=A0return d->cpupool ? d->cpupool->cpu_valid : cpupoo= l0->cpu_valid; =C2=A0} =C2=A0 =C2=A0#endif /* __XEN_SCHED_IF_H__ */ But even if that would be acceptable (what do you think?), I still don't like to have the race there lurking. :-/ Therefore, I still think this patch is correct, but I'm up for investigating further and finding a way to solve the "zombie in cpupool" issue as well. Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-Sj3BbNhJNQy4SLR+Fz74 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXiM6PAAoJEBZCeImluHPu+fIQAJxzf6TzUn7BKcFRnseDaefj Gq4JUjUXPJTVb99BuEWkRI/YWXvUoPbt3AdTxgQvBeHDNVBZFKl17rW8T6Y+rOjC 7xTwHSilW8KOab1KF5anTiQjiwnXAVqoKR9mMuHqZc8x7aFfxacIi22fvLsB4lXQ AE5jb8Guj41g7sKp371/PkWnYsM+BXBEHpIGqnt8AKl0t6LO3R7dj4ubw9zFOR3A gCgzAZvBGnK6BajWwwdi8jbN+betHLqf1a7yHN5G1ngk2TsbxM/+T5bIsfYpWN7+ hgcuLP7ZlbK1/wJIcGJ1a/iJn+JfzwigoTL73teIbsA/g1wkegUac67FXMtcrZ0g G02YniVVwnmcCBaFwdb259+LnwjlbiBKQwxBnlZ+NQWHzoWI7vH4Mxj3fHbiSb5L DJjat8fpYCPa+OWgdiC2YZxlLcjo1YuwFeJg0NXesePBY8xL+fRz6q9y/j059Lkb V6Kxriqa4hR6PCa14qzzNnVD0v4kalALhLg9/fGGEQxGq5YPAs0kNEdyuS5eHYnU qvH14i73QhBx119DMjlip+MxGeVdHscVPhBR0efHKl719ZSLGcEb8SeP/ZBJ9znF lQL0ZbOzvyxoyWDkfPFEtaqr3bO3K2IeknGCchjDcN7T5yGfGxqmK5KXc7XX32YX 59HgoIhRdqm4ZCbodveG =feww -----END PGP SIGNATURE----- --=-Sj3BbNhJNQy4SLR+Fz74-- --===============6701675374421404407== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============6701675374421404407==--