From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started. Date: Fri, 11 Aug 2017 19:25:09 +0200 Message-ID: <1502472309.5719.35.camel@citrix.com> References: <150114201043.22910.12807057883146318803.stgit@Solace> <150114248433.22910.16140726025093688678.stgit@Solace> <20170809113823.GA40847@deinos.phlegethon.org> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============4773016210969510265==" Return-path: Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dgDge-0007nl-Ry for xen-devel@lists.xenproject.org; Fri, 11 Aug 2017 17:25:28 +0000 In-Reply-To: <20170809113823.GA40847@deinos.phlegethon.org> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: Tim Deegan Cc: xen-devel@lists.xenproject.org, Julien Grall , Stefano Stabellini , Jan Beulich , Andrew Cooper List-Id: xen-devel@lists.xenproject.org --===============4773016210969510265== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-VUIwlhf944mcBMZGtekm" --=-VUIwlhf944mcBMZGtekm Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, 2017-08-09 at 12:38 +0100, Tim Deegan wrote: > Hi, >=20 Hey! :-) > At 10:01 +0200 on 27 Jul (1501149684), Dario Faggioli wrote: > > In Xen, that is impossible, and that's particularly problematic > > when system is idle (or lightly loaded) systems, as CPUs that > > are idle may never have the chance to tell RCU about their > > quiescence, and grace periods could extend indefinitely! >=20 > [...] > > The first step for fixing this situation is for RCU to record, > > at the beginning of a grace period, which CPUs are already idle. > > In fact, being idle, they can't be in the middle of any read-side > > critical section, and we don't have to wait for them to declare > > a grace period finished. >=20 > AIUI this patch fixes a bug where: > =C2=A0- a CPU is idle/asleep; > =C2=A0- it is added to the cpumask of a new RCU grace period; and > =C2=A0- because the CPU is asleep, the grace period never ends.=C2=A0 > Have I understood? >=20 Yes indeed. Basically, without this patch, all the CPUs are always added/considered for all the grace periods. And if there are some that are idle/asleep, we need to wait for them to wake up, before declared the grace period ended. So the duration of the grace period itself is out of any control (and in fact, it happens to be reasonable, on x86, even on fully idle system, while not so much on ARM). Even virtually infinite... :-O > I think we might be left with a race condition: > =C2=A0- CPU A is about to idle.=C2=A0=C2=A0It runs the RCU softirq and > =C2=A0=C2=A0=C2=A0clears itself from the current grace period. > =C2=A0- CPU B ends the grace period and starts a new one. > =C2=A0- CPU A calls rcu_idle_enter() and sleeps. > =C2=A0- The new grace period never ends. >=20 Yes, I think this is a thing. I've give it some thoughts, and will post something about it as another email. > Is that fixed by your later rcu_idle_timer patch?=C2=A0=C2=A0AIUI that's = only > invoked when the calling CPU has pending RCU callbacks. >=20 The timer is meant at dealing with the CPU with pending callback, yes. I don't think we can solve this issue with the timer, even if we extend its scope to all CPUs with whatever pending RCU processing... we'd still have problems with grace periods that starts (on CPU B) after the check for whether or not we need the timer has happened. > Or can it be fixed here by something like this in rcu_idle_enter? > =C2=A0- lock rcp->lock > =C2=A0- add ourselves to the idle mask > =C2=A0- if we are in rcp->cpumask: > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0- cpu_quiet() > =C2=A0- unlock rcp->lock >=20 If rcu_idle_enter() leaves inside an IRQ disabled section (e.g., in mwait_idle(), as it does right now), I can't take rcp->lock (because of spinlock IRQ safety). If I move it outside of that, then yes, the above may work. Or, actually, we may not even need it... But as I said, more on this in another message. > There's also the code at the top of rcu_check_quiescent_state() that > requres _two_ idle states per batch.=C2=A0=C2=A0I don't know what race th= at's > protecting against so I don't know whether we need to worry about it > here as well. :) >=20 Not sure I'm getting this part. It's not a race, it is that, literally, the first time (after someone has started a new batch) a CPU executes rcu_check_quiescent_state(), realizes we're in a new grace period: =C2=A0=C2=A0=C2=A06.681421066 -.-.-.|.--x-|--- d0v7 rcu_call fn=3D0xffff82d= 080207c4c, rcp_cur=3D-300, rdp_quiescbatch=3D-300 =C2=A0=C2=A0=C2=A06.681422601 -.-.-.|.--x-|--- d0v7 do_softirq, pending =3D= 0x00000010 =C2=A0=C2=A0=C2=A06.681422889 -.-.-.|.--x-|--- d0v7 rcu_pending? yes (ret= =3D2): no pending entries but new entries =C2=A0=C2=A0=C2=A06.681423214 -.-.-.|.--x-|--- d0v7 raise_softirq nr 3 =C2=A0=C2=A0=C2=A06.681423494 -.-.-.|.--x-|--- d0v7 softirq_handler nr 3 =C2=A0=C2=A0=C2=A06.681423494 -.-.-.|.--x-|--- d0v7 rcu_process_callbacks, = rcp_completed=3D-300, rdp_batch=3D0, rdp_curlist: null, rdp_nxtlist: yes, r= dp_donelist: null =C2=A0=C2=A0=C2=A06.681423494 -.-.-.|.--x-|--- d0v7 rcu_next_batch, rcp_cur= =3D-300, rdp_batch=3D-299, rcp_next_pending? no =C2=A0=C2=A0=C2=A06.681423494 -.-.-.|.--x-|--- d0v7 rcu_start_batch, rcp_cu= r=3D-299, rcp_cpumask=3D0x00001442 1->6.681423494 -.-.-.|.--x-|--- d0v7 rcu_check_quiesc_state, rcp_cur=3D-299= , rdp_quiescbatch=3D-300, rdp_qs_pending: no =C2=A0=C2=A0=C2=A06.681423494 -.-.-.|.--x-|--- d0v7 rcu_grace_start, rcp_cu= r=3D-299 The second realizes the grace period ended: =C2=A0=C2=A0=C2=A06.681425277 -.-.-.|.--x-|--- d0v7 do_softirq, pending =3D= 0x00000010 =C2=A0=C2=A0=C2=A06.681425647 -.-.-.|.--x-|--- d0v7 rcu_pending? yes (ret= =3D5): waiting for CPU to quiesce (rdp_qs_pending=3D1) =C2=A0=C2=A0=C2=A06.681425872 -.-.-.|.--x-|--- d0v7 raise_softirq nr 3 =C2=A0=C2=A0=C2=A06.681426117 -.-.-.|.--x-|--- d0v7 softirq_handler nr 3 =C2=A0=C2=A0=C2=A06.681426117 -.-.-.|.--x-|--- d0v7 rcu_process_callbacks, = rcp_completed=3D-300, rdp_batch=3D-299, rdp_curlist: yes, rdp_nxtlist: null= , rdp_donelist: null 2->6.681426117 -.-.-.|.--x-|--- d0v7 rcu_check_quiesc_state, rcp_cur=3D-299= , rdp_quiescbatch=3D-299, rdp_qs_pending: yes =C2=A0=C2=A0=C2=A06.681426117 -.-.-.|.--x-|--- d0v7 rcu_grace_done, rcp_cur= =3D-299, rcp_completed=3D-300, rdp_quiescbatch=3D-299 =C2=A0=C2=A0=C2=A06.681426117 -.-.-.|.--x-|--- d0v7 rcu_cpu_quiet, rcp_cur= =3D-299, rcp_cpumask=3D0x00001042 This trace above was for the CPU which actaully started the grace period, and which has the callback queued. Here's the same for another one: 6.682213282 -.-.-.x.----|--- d0v2 vcpu_block d0v2 6.682213549 -.-.-.x.----|--- d0v2 raise_softirq nr 1 6.682213999 -.-.-.x.----|--- d0v2 do_softirq, pending =3D 0x00000002 6.682214397 -.-.-.x.----|--- d0v2 rcu_pending? yes (ret=3D4): waiting fo= r CPU to quiesce (rdp_qs_pending=3D0) 6.682214667 -.-.-.x.----|--- d0v2 raise_softirq nr 3 [...] 6.682224404 -.-.-.x.----|--- d32767v6 softirq_handler nr 3 6.682224404 -.-.-.x.----|--- d32767v6 rcu_process_callbacks, rcp_complet= ed=3D-300, rdp_batch=3D0, rdp_curlist: null, rdp_nxtlist: null, rdp_donelis= t: null 1->6.682224404 -.-.-.x.----|--- d32767v6 rcu_check_quiesc_state, rcp_cur=3D= -299, rdp_quiescbatch=3D-300, rdp_qs_pending: no 6.682224404 -.-.-.x.----|--- d32767v6 rcu_grace_start, rcp_cur=3D-299 6.682225310 -.-.-.x.----|--- d32767v6 do_softirq, pending =3D 0x00000000 =C2=A0=C2=A0=C2=A06.682225570 -.-.-.x.----|--- d32767v6 rcu_pending? yes (r= et=3D5): waiting for CPU to quiesce (rdp_qs_pending=3D1) =C2=A0=C2=A0=C2=A06.682225790 -.-.-.x.----|--- d32767v6 raise_softirq nr 3 =C2=A0=C2=A0=C2=A06.682226032 -.-.-.x.----|--- d32767v6 softirq_handler nr = 3 =C2=A0=C2=A0=C2=A06.682226032 -.-.-.x.----|--- d32767v6 rcu_process_callbac= ks, rcp_completed=3D-300, rdp_batch=3D0, rdp_curlist: null, rdp_nxtlist: nu= ll, rdp_donelist: null 2->6.682226032 -.-.-.x.----|--- d32767v6 rcu_check_quiesc_state, rcp_cur=3D= -299, rdp_quiescbatch=3D-299, rdp_qs_pending: yes =C2=A0=C2=A0=C2=A06.682226032 -.-.-.x.----|--- d32767v6 rcu_grace_done, rcp= _cur=3D-299, rcp_completed=3D-300, rdp_quiescbatch=3D-299 =C2=A0=C2=A0=C2=A06.682226032 -.-.-.x.----|--- d32767v6 rcu_cpu_quiet, rcp_= cur=3D-299, rcp_cpumask=3D0x00000000 But maybe I've not understood properly what you meant... Thanks and Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-VUIwlhf944mcBMZGtekm Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJZjeh4AAoJEBZCeImluHPudwwP/AiImkIWhD3JrOvp2MuTqTtf 8KclWqXp5yEblJ9p3cuU+ousWhIUoyCiJdSL/JUqsz9/P7cZjglGlz0tBhrMEd+e b5kZ2m6ZnIYHYtLvLskLZ+gZ5//0b0dEMwY4N6zlQuaGIk+VCCLBSifOHG2CBRau EvixeldMlZnyGHwPzcHHR1AB/vYBS/8mPJA/fju3RUngjA/XSHIpHeAJmfWwriuQ 5LWyMCb86TJwQVI+v5JDEeyFH15M5n2p6ueqRLiJc8wMp9r7RrVYnJUwgFwzZ5rF E9q3qcRXjkHqWfEfornueiz+XzKN9DpGPnH/UABa2IU+TNOrAhzhLLggVYU+9Rl3 zWTCs3Ao4sC33JNoK9n75h2urcloxsvJy8Si5Y9B8SWmYMcIXUEHB9RYOaV0dTIR NDbNfGSjEqKXeuXlloktPun3qYdK2ysAcqxPo2M8+q23DM+B9mOJKLumLd/GbQX6 nOZzfHFDS4ZgmdC4/yInIkVYf4SpHX3twlGk3PzVd/mbDAev1LtUg4+2P02JYe8x +y7kd6+DKvDmryAUg6OrX0sBGvzr82Ow0/C2Mg3x0HdAFtSYcWv9Q7cUJ6UTE74c ZVO4FOAs8+/2QnDvFdBDAy6L3RD9x2CtD0beQ4GxMErSGxYs1Ss1yhcaRgWkIhqJ KAURAlCpwTdChbsXLd2K =bgwX -----END PGP SIGNATURE----- --=-VUIwlhf944mcBMZGtekm-- --===============4773016210969510265== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwczovL2xpc3RzLnhlbi5v cmcveGVuLWRldmVsCg== --===============4773016210969510265==--