From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH 5/5] xen: RCU: avoid busy waiting until the end of grace period. Date: Tue, 1 Aug 2017 02:47:25 +0200 Message-ID: <1501548445.30551.5.camel@citrix.com> References: <150114201043.22910.12807057883146318803.stgit@Solace> <150114249858.22910.4601418126082976816.stgit@Solace> <1501538621.30551.3.camel@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6858362725583148101==" Return-path: Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dcLLp-0006WN-Th for xen-devel@lists.xenproject.org; Tue, 01 Aug 2017 00:47:58 +0000 In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Stefano Stabellini Cc: xen-devel@lists.xenproject.org, Julien Grall , Jan Beulich , Andrew Cooper List-Id: xen-devel@lists.xenproject.org --===============6858362725583148101== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-gOnQlNNZN9w0nPLWWUou" --=-gOnQlNNZN9w0nPLWWUou Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, 2017-07-31 at 16:58 -0700, Stefano Stabellini wrote: > On Tue, 1 Aug 2017, Dario Faggioli wrote: > > On Mon, 2017-07-31 at 14:20 -0700, Stefano Stabellini wrote: > > > On Thu, 27 Jul 2017, Dario Faggioli wrote: > > > >=20 > > > > diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c > > > > index f0fdc87..4586f2a 100644 > > > > --- a/xen/common/rcupdate.c > > > > +++ b/xen/common/rcupdate.c > > > > @@ -84,8 +84,14 @@ struct rcu_data { > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int cpu; > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0struct rcu_head barrier; > > > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0last_rs_qlen;=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0/* qlen during the last > > > > resched */ > > > > + > > > > +=C2=A0=C2=A0=C2=A0=C2=A0/* 3) idle CPUs handling */ > > > > +=C2=A0=C2=A0=C2=A0=C2=A0struct timer idle_timer; > > > > +=C2=A0=C2=A0=C2=A0=C2=A0bool idle_timer_active; > > > > =C2=A0}; > > > > =C2=A0 > > > > +#define RCU_IDLE_TIMER_PERIOD MILLISECS(10) > > >=20 > > > Isn't this a bit too short? How is it chosen? > > >=20 > >=20 > > What makes you think it's short? >=20 > In terms of power saving and CPU sleep states, 10ms is not much to > sleep > for. I wonder if there are any power saving benefits in sleeping for > only 10ms (especially on x86 where entering and exiting CPU sleep > states > takes longer, to be confirmed). > I *think* we should be fine with, say, 100ms. But that's again, guess/rule of thumb, nothing more than that. And, TBH, I'm not even sure what a good experiment/benchmark would be, to assess whether a particular value is good or bad. :-/ > =C2=A0=C2=A0We might as well do the thing we need > to do immediately? I guess we cannot do that? > You're guessing correct, we can't. It's indeed a bit tricky, and it took it a little bit to me as well to figure all of it out properly. Basically, let's say that, at time t1, on CPU1, someone calls call_rcu(). The situation on the other CPUs is: CPU0 busy; CPU2 idle (no timer pending); CPU3 busy. So, a new grace period starts, and its exact end will be when CPU0, CPU1 and CPU3 have quiesced once (in Xen, quiescence means: "going through do_softirq()"). But RCU it's a passive mechanism, i.e., we rely on each CPU coming to the RCU core logic, and tell <>. Let's say that CPU0 quiesces at time t2 > t1. CPU1 quiesces at time t3 > t2, and goes fully idle (no timer pending). CPU3 quiesces at time t4 > t3. Now, at time t4, CPU1 can actually invoke the callbeck, queued at time t1, from within call_rcu(). This patch series solves two problems, of our current RCU implementation: 1) right now, we don't only wait for CPU0, CPU1 and CPU3, we also wait=C2= =A0 for CPU2. But since CPU2 is fully idle, it just won't bother=C2=A0 telling=C2=A0RCU that it has quiesced (well, on x86, that would actually= =C2=A0 happen=C2=A0at some point, while on ARM, it is really possible that this= =C2=A0 never happens!). This is solved in patch 3, by introducing the=C2=A0 cpumask; 2) now (after patch 3) we know that we just can avoid waiting for=C2=A0 CPU2. Good. But at time t4, when CPU3 quiesces, marking the end of the grace period, how would CPU1 --which is fully idle-- know that it=C2=A0can now safely invoke the callback? Again, RCU is passive, so it relies=C2=A0on CPU1 to figure that out on its own, next time it wakes up,=C2=A0e.g.,=C2=A0because of the periodic tick timer. But we don't hav= e a periodic tick timer! Well, this again means that, on x86, CPU1 will actually=C2=A0figure that out at some (unpredictable) point in future. On=C2=A0ARM,=C2=A0not=C2=A0so much. The purpose of the timer in this pat= ch is to make sure it always will. In fact, with patches 4 and 5 applied, at time t3, we let CPU1 go=C2=A0 idle, but we program the timer. It will fire at t3+T (with T=3D10ms,=C2= =A0 right now). When this happens, if t3+T > t4, CPU1 invokes the callback, and we're done. If not (and CPU1 is still idle) we retry in another T. So, when you say "immediately", *when* do you actually mean? We can't invoke the callback at t3 (i.e., immediately after CPU1 quiesces), because we need to wait for CPU3 to do the same. We can't invoke the callback when CPU3 quiesces, at t4 (i.e., immediately after the grace period ends) either, because the callback it's on CPU1, not on CPU3. Linux's solution is not to stop the tick for CPU1 at time t3. It will be stopped only after the first firing of the tick itself at time t > t4 (if CPU1 is still idle, of course). This timer thing is, IMO, pretty similar. The only difference is that we don't have a tick to not stop... So I had to make up a very special one. :-D TL;DR, yes, I also would have loved the world (or even just this problem) to be simple. It's not! ;-P Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-gOnQlNNZN9w0nPLWWUou Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZf8+dAAoJEBZCeImluHPujFEQAO/Qnpb0VdJYfUsd8K9TUUlO QfEL+1md6Ojxi56Y5xfdYD53NC2IU1PVlBFfKPjzRKWGnMO5xwSkH+Ve5CBa/ZqV i89PA/212EyzxPX8U2xdeXTYJskEVIjbCDAJ0/lfKf2Q0ygOd2oMru9AnV+o6k0F 6uNO1FmdfKWZnIWlCwNhrGfCn/dApCgTxyL7FTJ2IMaJVs9y/ygWK32+ZA4m+QTs yfu9FkEErTfpjj/Yz8kWJ0tdg7oPZkv+RRKb1IV/X6VnWBtz+MYGKOcPO99MJ/UQ yQZoOJzQxYq2NQJ6h868p8UQiCSGW9Y3YgHmwgJZ1koVHUv+OXi/COrPBY3GkTvS e9QWIE5/7C6Xy6lONxzvqBCTaV+HzOwvsvPyp5V5UY54s/iOUjBGQXasyh69hsHc 4R6GXkgFnObQGD2BlpHfCppMLAc5je20NcWZDWncCottLa+EjQxMzQb1DumRhH3i IXm1BtIfmiTT1Ci/LIA78hfB50DB6bAAOubs8Z0US2nvFFWCd6Nj5ikS4BVy/jiy fddrc73yH4FOr+Cm6HEpUs1665zJ1E5W6WeM79jMa2qA1d2ZPEom7+iyKUYyMpCS EZMWvUNhkK2tC8R/b6nyJtUWLBkkarFCMSbVS4f1FPuvGsUIvdezXYivEHfzFpE3 3qL0xc8zhGJtIXVYZiib =1gXD -----END PGP SIGNATURE----- --=-gOnQlNNZN9w0nPLWWUou-- --===============6858362725583148101== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============6858362725583148101==--