From mboxrd@z Thu Jan 1 00:00:00 1970 From: Dario Faggioli Subject: Re: [PATCH 0/3] VMX: Properly handle pi descriptor and per-cpu blocking list Date: Fri, 24 Jun 2016 09:22:32 +0200 Message-ID: <1466752952.18398.81.camel@citrix.com> References: <1463734431-22353-1-git-send-email-feng.wu@intel.com> <573F02B102000078000ED304@prv-mh.provo.novell.com> <5742D6BB02000078000EDA57@prv-mh.provo.novell.com> <5742E0BE02000078000EDABD@prv-mh.provo.novell.com> <1464007152.21930.55.camel@citrix.com> <1464098547.21930.107.camel@citrix.com> <1466694703.18398.69.camel@citrix.com> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============6239519136574168291==" Return-path: In-Reply-To: List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" To: "Wu, Feng" , Jan Beulich Cc: "george.dunlap@eu.citrix.com" , "andrew.cooper3@citrix.com" , "Tian, Kevin" , "keir@xen.org" , "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org --===============6239519136574168291== Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="=-eNsElQS3DmoqllWJw3h0" --=-eNsElQS3DmoqllWJw3h0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 2016-06-24 at 06:11 +0000, Wu, Feng wrote: > > -----Original Message----- > > From: Dario Faggioli [mailto:dario.faggioli@citrix.com] > > No, because we call cpu_disable_scheduler() from __cpu_disable(), > > only > > when system state is SYS_STATE_suspend already, and hence we take > > the > > then branch of the 'if', which does never return an error. > Thanks for the elaboration. I find __cpu_disable() can be called with > system state not being SYS_STATE_suspend. Here is my experiment: >=20 > 1. Launch a guest and pin vCPU 3 to pCPU 3 (which makes the > experiment simpler) > 2. offline pCPU 3 via "xen-hptool cpu-offline 3" >=20 Ah, yes, of course. I should have included cpu-hot(un)plug in my analysis in the first place, sorry for not doing so. > The call path of the above steps is: > arch_do_sysctl() > =C2=A0 -> cpu_down_helper() > =C2=A0=C2=A0=C2=A0=C2=A0-> cpu_down() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0-> take_cpu_down() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0-> __cpu_disable() > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0-> cpu_disabl= e_scheduler() (enter the 'else'=C2=A0=C2=A0part) >=20 Right, and the important part is this one: > int stop_machine_run(int (*fn)(void *), void *data, unsigned int cpu) > { >=20 > ...... >=20 > point 1 >=20 > =C2=A0=C2=A0=C2=A0=C2=A0for_each_cpu ( i, &allbutself ) > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0tasklet_schedule_on_cpu(&= per_cpu(stopmachine_tasklet, i), i); >=20 > point 2 > ...... >=20 > } >=20 > at point 1 above,=C2=A0 > vCPU3->runstate.state: 0, vCPU3->is_running: 1 > while at point 2 above: > vCPU3->runstate.state: 1, vCPU3->is_running: 0 >=20 This is exactly as you describe. cpu hotplug is done in stop machine context. Check the comment close to the definition of stop_machine_run: /** =C2=A0* stop_machine_run: freeze the machine on all CPUs and run this funct= ion =C2=A0* @fn: the function to run =C2=A0* @data: the data ptr for the @fn() =C2=A0* @cpu: the cpu to run @fn() on (or all, if @cpu =3D=3D NR_CPUS). =C2=A0* =C2=A0* Description: This causes every other cpu to enter a safe point, wit= h =C2=A0* each of which disables interrupts, and finally interrupts are disab= led =C2=A0* on the current CPU.=C2=A0=C2=A0The result is that none is holding a= spinlock =C2=A0* or inside any other preempt-disabled region when @fn() runs. =C2=A0* =C2=A0* This can be thought of as a very heavy write lock, equivalent to =C2=A0* grabbing every spinlock in the kernel. */ As you discovered yourself, this is achieved by forcing the execution of a tasklet on all the pcpus, which include pCPU 3 of your example. So, vCPU 3 was running, but then some called stop_machine_run(), which causes the descheduling of vCPU 3, and the execution of the stopmachine tasklet. Thet's why you find is_running to be 0, and that's why =C2=A0we never retur= n EAGAIN. > I tested it for many times and got the same result. I am not sure the > vcpu > state transition just happens to occurs here or not? If the > transition doesn't > happen and is_running is still 1 when we get vcpu_migrate() in > cpu_disable_scheduler() in the above case, should it be a problem? > I'm not sure what you mean here (in particular with "just happen to occur"). If you're wondering whether the fact that vCPU 3 gets descheduled happens by chance or by design, it's indeed the latter, and so, no, we don't have a problem with this code path. Regards, Dario --=20 <> (Raistlin Majere) ----------------------------------------------------------------- Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) --=-eNsElQS3DmoqllWJw3h0 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: This is a digitally signed message part Content-Transfer-Encoding: 7bit -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXbN+5AAoJEBZCeImluHPujUAQAOd0WnN0hW9fV+/diOYwv9BR z9BbB7tjo7900wEzgZye/2ri8k7iZecz44j4QKHSTGAFzP8tkYr4sOv4qelRG2pM 70Mxv2fh3gDVfhwtEQYWfu0I33jub7BQ7euy12PZsbWe1J7eDB12t9kbrB8ed91z 3cvb4cUPhyPEjGHXrjFhVyBZAP7dFaajCa85Abj406feJMiX5XsC280UgEIIWqbu BhtuTAnxst3Fw8NfnwHzQI4mrdUy7xWL34ulYZ8gjXYgELHz1LlH9tM6CRr0CwrM Y1I9+ImuGcdhTVMOkjlACXupBQ6fy2Rgdqt12Na2xFU0tzZ7oCVkevjWPr30ZcWi sTGlGmmaJLusJESrQQk0CYwoZ0sAvdRa1yb/to8OxQZ5uNgaWPU3SppepKrBQwZL YhtuL8CbUt+MyEQwZ5znPn1aRdGYY5Aq/kRgGXwCS2VCDFr8S0fsGHBFzRHe+3Yf 1YBAKLnB2RLetkzRK07Uxoq5X7szTauxC6WUdV1COCHgn6vggvE1sJvDKDfNHBsj 3MFAaP8wgwr3mf6tU8Oivcd3hRRiaw+/tvxC2E9LIVAMWB57aJUeYEKiiIW8EwlF 7+SFix7q/+o4QIZ1mZ2UcWAL9hIOwPXt2bR8tWvDQR3sxvhg37zkgGkuYVHajlty ZQdxVK0jVbplDTPADW7C =C0cE -----END PGP SIGNATURE----- --=-eNsElQS3DmoqllWJw3h0-- --===============6239519136574168291== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KWGVuLWRldmVs IG1haWxpbmcgbGlzdApYZW4tZGV2ZWxAbGlzdHMueGVuLm9yZwpodHRwOi8vbGlzdHMueGVuLm9y Zy94ZW4tZGV2ZWwK --===============6239519136574168291==--