From: Dario Faggioli
Subject: Re: Xen on ARM IRQ latency and scheduler overhead
Date: Thu, 16 Feb 2017 13:20:14 +0100
To: Stefano Stabellini
Cc: george.dunlap@eu.citrix.com, edgar.iglesias@xilinx.com, julien.grall@arm.com, xen-devel@lists.xen.org

On Fri, 2017-02-10 at 10:32 -0800, Stefano Stabellini wrote:
> On Fri, 10 Feb 2017, Dario Faggioli wrote:
> > Right, interesting use case. I'm glad to see there's some interest
> > in it, and am happy to help investigating and trying to make
> > things better.
> 
> Thank you!
> 
Hey, FYI, I am looking into this. It's just that I've got a couple of
other things on my plate right now.

> > Ok, do you (or anyone) mind explaining in a little bit more detail
> > what the app tries to measure and how it does that?
> 
> Have a look at app/xen/guest_irq_latency/apu.c:
> 
> https://github.com/edgarigl/tbm/blob/master/app/xen/guest_irq_latency/apu.c
> 
> This is my version, which uses the phys_timer (instead of the
> virt_timer):
> 
> https://github.com/sstabellini/tbm/blob/phys-timer/app/xen/guest_irq_latency/apu.c
> 
Yep, I did look at those.

> Edgar can jump in to add more info if needed (he is the author of
> the app), but as you can see from the code, the app is very simple.
> It sets a timer event in the future and then, after receiving the
> event, it checks the current time and compares it with the deadline.
> 
Right, and you check the current time with:

  now = aarch64_irq_get_stamp(el);

which I guess is compatible with the values you use for the counter.
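
(For anyone reading the archives who has not opened apu.c: the
measurement boils down to something like the sketch below. This is not
the actual tbm code, just an illustration of the idea; the register
accessors and the IRQ plumbing are simplified and assumed, and it uses
the EL1 physical timer, as in Stefano's phys-timer branch.)

    /* Sketch of the latency measurement idea, not the real apu.c.
     * Assumes a bare-metal guest at EL1 receiving the physical timer
     * PPI; GIC setup, IRQ ack and result printing are omitted. */
    #include <stdint.h>

    static inline uint64_t read_cntpct(void)
    {
        uint64_t t;
        asm volatile("mrs %0, cntpct_el0" : "=r" (t));
        return t;
    }

    static volatile uint64_t deadline;
    static volatile uint64_t latency;     /* in counter ticks */

    static void timer_irq_handler(void)
    {
        /* stands in for 'now = aarch64_irq_get_stamp(el);' */
        uint64_t now = read_cntpct();

        latency = now - deadline;
        /* disable the timer: CNTP_CTL_EL0.ENABLE = 0 */
        asm volatile("msr cntp_ctl_el0, %0" :: "r" (0UL));
    }

    static void arm_timer_and_wait(uint64_t ticks_from_now)
    {
        deadline = read_cntpct() + ticks_from_now;
        asm volatile("msr cntp_cval_el0, %0" :: "r" (deadline));
        /* ENABLE = 1, IMASK = 0 */
        asm volatile("msr cntp_ctl_el0, %0" :: "r" (1UL));
        asm volatile("wfi");   /* or busy-wait, for the "no WFI" runs */
    }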
> > > These are the results, in nanosec:
> > > 
> > >                         AVG     MIN     MAX     WARM MAX
> > > 
> > > NODEBUG no WFI          1890    1800    3170    2070
> > > NODEBUG WFI             4850    4810    7030    4980
> > > NODEBUG no WFI credit2  2217    2090    3420    2650
> > > NODEBUG WFI credit2     8080    7890    10320   8300
> > > 
> > > DEBUG no WFI            2252    2080    3320    2650
> > > DEBUG WFI               6500    6140    8520    8130
> > > DEBUG WFI, credit2      8050    7870    10680   8450
> > > 
> > > DEBUG means Xen DEBUG build.
> > > 
[...]
> > > As you can see, depending on whether the guest issues a WFI or
> > > not while waiting for interrupts, the results change
> > > significantly. Interestingly, credit2 does worse than credit1 in
> > > this area.
> > > 
> > This is with current staging, right?
> 
> That's right.
> 
So, when you have the chance, can I see the output of

  xl debug-key r
  xl dmesg

both under Credit1 and Credit2?

> > I can try sending a quick patch for disabling the tick when a CPU
> > is idle, but I'd need your help in testing it.
> 
> That might be useful. However, if I understand this right, we don't
> actually want a periodic timer in Xen just to make the system more
> responsive, do we?
> 
IMO, no. I'd call that a hack, and I don't think we should go that
route; not until we have figured out and squeezed as much as possible
out of all the other sources of latency, and that has proven not to be
enough, at least.

I'll send the patch.

> > > Assuming that the problem is indeed the scheduler, one workaround
> > > that we could introduce today would be to avoid calling
> > > vcpu_unblock on guest WFI and call vcpu_yield instead. This
> > > change makes things significantly better:
> > > 
> > >                                      AVG     MIN     MAX     WARM MAX
> > > DEBUG WFI (yield, no block)          2900    2190    5130    5130
> > > DEBUG WFI (yield, no block) credit2  3514    2280    6180    5430
> > > 
> > > Is that a reasonable change to make? Would it cause significantly
> > > more power consumption in Xen (because
> > > xen/arch/arm/domain.c:idle_loop might not be called anymore)?
> > > 
> > Exactly. So, I think that, as Linux has 'idle=poll', it is
> > conceivable to have something similar in Xen, and if we do, I
> > guess it can be implemented as you suggest.
> > 
> > But, no, I don't think this is satisfying as a default, not before
> > trying to figure out what is going on, and whether we can improve
> > things in other ways.
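
(To make the change being discussed concrete, here is a rough sketch
of the guest WFI/WFE trap handling and of where a yield-instead-of-
block behaviour would plug in. This is illustrative only: the real
code lives in xen/arch/arm/traps.c and differs in detail, and the
CONFIG_GUEST_WFI_POLL knob is made up for the example.)

    /* Illustrative approximation of the guest WFI/WFE trap path; the
     * actual implementation in xen/arch/arm/traps.c differs. */
    static void handle_wfi_wfe(bool is_wfe)
    {
        if ( is_wfe )
        {
            /* WFE: let other runnable vCPUs in, but stay runnable. */
            vcpu_yield();
            return;
        }

    #ifdef CONFIG_GUEST_WFI_POLL  /* hypothetical knob, name made up */
        /*
         * Proposed workaround: never block on WFI, just yield.  The
         * vCPU stays runnable, so delivering the timer interrupt does
         * not have to go through the wakeup/unblock path, at the cost
         * of burning the pCPU (idle_loop may never run).
         */
        vcpu_yield();
    #else
        /*
         * Current behaviour: block until an event (e.g. the timer
         * interrupt) is pending, at which point vcpu_unblock() puts
         * the vCPU back on a runqueue.
         */
        vcpu_block_unless_event_pending(current);
    #endif
    }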
> 
> OK. Should I write a patch for that? I guess it would be ARM
> specific initially. What do you think would be a good name for the
> option?
> 
Well, I think such an option may be useful on other arches too, but
we'd better measure/verify that first. Therefore, I'd be OK with this
being implemented only on ARM for now.

As for the name, I actually like 'idle=', and as values, what about
'sleep' or 'block' for the current default, and sticking to 'poll' for
the new behaviour you'll implement? Or do you think it is at risk of
confusion with Linux?

An alternative would be something like 'wfi=[sleep,idle]' or
'wfi=[block,poll]', but that is ARM specific, and it'd mean we will
need another option for making x86 behave similarly.

> > But it would not be much more useful than that, IMO.
> 
> Why? Actually, I know of several potential users of Xen on ARM
> interested exactly in this use case. They only have a statically
> defined number of guests, with a total number of vcpus lower than or
> equal to the number of pcpus in the system. Wouldn't a scheduler
> like that help in this scenario?
> 
What I'm saying is that it would be rather inflexible: it won't be
possible to have statically pinned and dynamically moving vcpus in the
same guest, it would be hard to control which vcpu is statically
assigned to which pcpu, making a domain statically assigned would mean
moving it to another cpupool (which is the only way to use a different
scheduler, right now, in Xen), and things like this.

I know there are static use cases... But I'm not entirely sure how
static they really are, and whether they, in the end, will really like
such a degree of inflexibility.

But anyway, I can indeed give you a scheduler that, provided it lives
in a cpupool with M pcpus, as soon as a new domain with n vcpus is
moved inside the pool, statically assigns each of its vcpus
n0, n1, ..., nk (k <= M) to a pcpu, and always sticks with that. And
we'll see what will happen! :-)
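
(As a very rough illustration of what I mean, the core of such a
static assignment would be little more than the sketch below. This is
just the idea, not actual Xen scheduler code: a real implementation
has to live behind the scheduler hooks, and the structures and names
here are invented for the example.)

    /* Sketch only: one-to-one, static vcpu-to-pcpu assignment inside
     * a cpupool.  Names and data structures are made up; a real Xen
     * scheduler would implement this behind the sched_ops hooks. */
    struct vcpu;                       /* Xen's vcpu type */

    #define POOL_PCPUS  8              /* M pcpus in the pool (example) */

    static struct vcpu *pcpu_owner[POOL_PCPUS]; /* who owns each pcpu */

    /* Conceptually called when a vcpu of a newly added domain shows
     * up: give it the first free pcpu, and never move it afterwards. */
    static int assign_vcpu(struct vcpu *v)
    {
        unsigned int cpu;

        for ( cpu = 0; cpu < POOL_PCPUS; cpu++ )
        {
            if ( !pcpu_owner[cpu] )
            {
                pcpu_owner[cpu] = v;   /* v runs here, and only here */
                return cpu;
            }
        }

        return -1;  /* more vcpus than pcpus: refuse (n <= M assumed) */
    }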
Regards,
Dario
-- 
<> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)