From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dario Faggioli
Subject: Re: [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
Date: Tue, 5 Apr 2016 18:26:31 +0200
Message-ID: <1459873591.3166.54.camel@citrix.com>
References: <20160318185524.8117.74837.stgit@Solace.station> <20160318190505.8117.89778.stgit@Solace.station> <56F2E8F7.4000200@citrix.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3485179828167848187=="
Return-path:
Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1anTrF-0004ho-GZ for xen-devel@lists.xenproject.org; Tue, 05 Apr 2016 16:29:37 +0000
In-Reply-To: <56F2E8F7.4000200@citrix.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel"
To: George Dunlap , xen-devel@lists.xenproject.org
Cc: Tianyang Chen , Meng Xu
List-Id: xen-devel@lists.xenproject.org

--===============3485179828167848187==
Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=-KZf2xvKwlt55fC6vNmzc"

--=-KZf2xvKwlt55fC6vNmzc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, 2016-03-23 at 19:05 +0000, George Dunlap wrote:
> On 18/03/16 19:05, Dario Faggioli wrote:
> >
> > by using the sched_switch hook that we have introduced in
> > the various schedulers.
> >
> > The key is to let the actual switch of scheduler and the
> > remapping of the scheduler lock for the CPU (if necessary)
> > happen together (in the same critical section) protected
> > (at least) by the old scheduler lock for the CPU.
> >
> > This also means that, in Credit2 and RTDS, we can get rid
> > of the code that was doing the scheduler lock remapping
> > in csched2_free_pdata() and rt_free_pdata(), and of their
> > triggering ASSERT-s.
> >
> > Signed-off-by: Dario Faggioli
> Similar to my comment before -- in my own tree I squashed patches 6-9
> into a single commit and found it much easier to review. :-)
>
I understand your point. I'll consider doing something like this for v2
(which I'm just finishing putting together), but I'm not sure I like it.

For instance, although the issue has the same roots and similar
consequences for all schedulers, the actual race is different between
Credit1 and Credit2 (RTDS is the same as Credit2), and having distinct
patches for each scheduler allows me to describe both situations in
detail, in their respective changelogs, without the changelogs
themselves becoming too long (they're actually quite long already!!).

> One important question...
> >
> > --- a/xen/common/schedule.c
> > +++ b/xen/common/schedule.c
> >
> > @@ -1652,17 +1661,20 @@ int schedule_cpu_switch(unsigned int cpu,
> > struct cpupool *c)
> >          return -ENOMEM;
> >      }
> >
> > -    lock = pcpu_schedule_lock_irq(cpu);
> > -
> >      SCHED_OP(old_ops, tick_suspend, cpu);
> > +
> > +    /*
> > +     * The actual switch, including (if necessary) the rerouting
> > of the
> > +     * scheduler lock to whatever new_ops prefers,  needs to
> > happen in one
> > +     * critical section, protected by old_ops' lock, or races are
> > possible.
> > +     * Since each scheduler has its own contraints and locking
> > scheme, do
> > +     * that inside specific scheduler code, rather than here.
> > +     */
> >      vpriv_old = idle->sched_priv;
> > -    idle->sched_priv = vpriv;
> > -    per_cpu(scheduler, cpu) = new_ops;
> >      ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
> > -    per_cpu(schedule_data, cpu).sched_priv = ppriv;
> > -    SCHED_OP(new_ops, tick_resume, cpu);
> > +    SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
> Is it really safe to read sched_priv without the lock held?
>
So, you put down a lot more reasoning on this issue in another email,
and I'll reply at more length to that one.

But just about this specific thing. We're in schedule_cpu_switch(), and
schedule_cpu_switch() is indeed the only function that changes the
content of sd->sched_priv when the system is _live_. It reads the old
pointer, stashes it, allocates the new one, assigns it, and frees the
old one. It's therefore only because of this function that a race can
happen.

In fact, the only other situation where sched_priv changes is during
cpu bringup (CPU_UP_PREPARE phase), or teardown. But those cases are
not of much concern (and, in fact, there's no locking in there,
independently of this series).

Now, schedule_cpu_switch is called by:

1 cpupool.c  cpupool_assign_cpu_locked    268 ret = schedule_cpu_switch(cpu, c);
2 cpupool.c  cpupool_unassign_cpu_helper  325 ret = schedule_cpu_switch(cpu, NULL);

to move the cpu into or out of a cpupool, and in both cases, we have
already taken the cpupool_lock spinlock when calling it.

So, yes, it looks to me that it is safe to manipulate sched_priv the
way the patch does...

Am I overlooking something?
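
To make the argument concrete, here is a minimal sketch of the pattern I
have in mind. It is not Xen code: pool_lock, sched_lock, sched_priv,
switch_priv_locked() and switch_priv() are all hypothetical names, and
pthread mutexes stand in for Xen spinlocks. The only point is that, if
every writer of the pointer runs under one outer lock, reading the old
value under that same outer lock cannot race with a write.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for cpupool_lock: serializes every switch. */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for the per-CPU scheduler lock. */
static pthread_mutex_t sched_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for per_cpu(schedule_data, cpu).sched_priv. */
static void *sched_priv;

/*
 * Models the shape of schedule_cpu_switch(). The caller must hold
 * pool_lock. The old pointer is read without sched_lock: in this model
 * that is safe only because this function is the sole writer and
 * pool_lock admits one caller at a time.
 */
static void *switch_priv_locked(void *new_priv)
{
    void *old_priv = sched_priv;      /* read: sched_lock not held */

    /* The write is still done under sched_lock, as the patch requires
     * the actual switch to happen in one critical section. */
    pthread_mutex_lock(&sched_lock);
    sched_priv = new_priv;
    pthread_mutex_unlock(&sched_lock);

    return old_priv;                  /* the caller would free this */
}

/* Models the two cpupool.c callers: take the outer lock, then switch. */
static void *switch_priv(void *new_priv)
{
    void *old_priv;

    pthread_mutex_lock(&pool_lock);
    old_priv = switch_priv_locked(new_priv);
    pthread_mutex_unlock(&pool_lock);

    return old_priv;
}
```

If some path could write the pointer without holding the outer lock, the
unlocked read would no longer be safe, which is why it matters that
bringup and teardown are the only other writers.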

Thanks and Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--=-KZf2xvKwlt55fC6vNmzc
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEABECAAYFAlcD5zcACgkQk4XaBE3IOsSdFQCghjZDRYtnag5MR1YnDXcQefZ0
H2QAn3INQTg9BQQ8fnS/84SAAPCsXgWh
=Fb2j
-----END PGP SIGNATURE-----

--=-KZf2xvKwlt55fC6vNmzc--

--===============3485179828167848187==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs
IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y
Zy94ZW4tZGV2ZWwK

--===============3485179828167848187==--