From mboxrd@z Thu Jan  1 00:00:00 1970
From: Dario Faggioli <dario.faggioli@citrix.com>
Subject: Re: [PATCH] Modified RTDS scheduler to use an
 event-driven model instead of polling.
Date: Fri, 12 Jun 2015 11:28:13 +0200
Message-ID: <1434101293.18921.45.camel@citrix.com>
References: <1433764019-2194-1-git-send-email-dgolomb@seas.upenn.edu>
	<1433854393.2403.141.camel@citrix.com>
	<CAENZ-+=kgB83=kcSt_oi_NOBbwKJixn1jdKshtUJbjjARP08HQ@mail.gmail.com>
	<1433958412.2482.68.camel@citrix.com>
	<CAENZ-+msfnQGUMDY_AK+BHdgn--hHFQzXmxZXDBZ5tJgn_s9vQ@mail.gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============8490284930797920307=="
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <CAENZ-+msfnQGUMDY_AK+BHdgn--hHFQzXmxZXDBZ5tJgn_s9vQ@mail.gmail.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Meng Xu <xumengpanda@gmail.com>
Cc: Wei Liu <wei.liu2@citrix.com>, Sisu Xi <xisisu@gmail.com>, George Dunlap <george.dunlap@eu.citrix.com>, "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>, "mengxu@cis.upenn.edu" <mengxu@cis.upenn.edu>, Jan Beulich <jbeulich@suse.com>, Chong Li <lichong659@gmail.com>, Dagaen Golomb <dgolomb@seas.upenn.edu>
List-Id: xen-devel@lists.xenproject.org

--===============8490284930797920307==
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="=-ohK+mdeYYSs1XRrCzFeN"

--=-ohK+mdeYYSs1XRrCzFeN
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, 2015-06-10 at 22:50 -0700, Meng Xu wrote:
> Hi Dario,
>=20
> First I think I got most of the points you raised/explained! They are
> very very clear and thanks for the detailed explanation of your idea!
>=20
Glad you got it and (sort of :-) ) agree.

> >> 2015-06-09 5:53 GMT-07:00 Dario Faggioli <dario.faggioli@citrix.com>:
> >
> >> 2) Using cyclictest as Dario mentioned before to test the real-time
> >> performance at end user. Dagaen, I can provide you the commands to run
> >> it, which is actually quite simple to run.

> > So, yes, seeing some results would be great, independently from the
> > specific work done in this patch.
>=20
> Right! I have some ideas about this test but won't want to mess up the
> focus of this thread. :-)
> I will raise this test again when we come to it.
>=20
Ok. Looking forward to see this happening.

> > Yeah, and what I can't figure out is why you decided to do so. The
> > reason I don't like it is that things become a lot (more) complex to
> > understand, maintain and modify.
>=20
> Now I get why you think it is harder to maintain. Right. reusing the
> timer will just make the rt_schedule complex and make the hot path
> longer.
>
Exactly.

> >               runq_tickle(snext)...WAIT, WHAT?!?! :-O
> >
> > I mean, and I'm noticing this now, if the replenishments done during a
> > particular call to rt_schedule() are not enough to change the situation
> > on that particular pcpu, and hence the task which was running (and that
> > you are deliberately disturbing with _a_full_execution_ of the
> > scheduler's core function!) should continue to do so, you're calling
> > runq_tickle() right on it. So, AFAICT, you're tickling scurr, not a
> > newly replenished vcpu!
> >
> > Am I missing or misreading something? Let's assume not, and see what
> > happens in this case...
>=20
> You are right! Although this issue (i.e., tickling on scurr instead of
> the next high priority VCPU) can be "fixed" (dirtily), it can be
> avoided with the design option a) you said.
>=20
Of course it can be fixed.. Pretty much everything can! Point is the
reason why it happened, and how to make these things not happen and/or
easier to figure out. That's (one of) the point(s) of keeping things
simple and self contained, even within a single component (like a
scheduler), instead of "let's do everything in this 10k lines
function!" :-P

Glad that you saw what I meat.

> > Jokes apart, if the above analysis is accurate, I think this is a good
> > enough example of what I meant when saying to Dagaen "this is making
> > things too complex".
> Yes. The flow chart you drew is very clear! Thanks!
>=20
> @Dagaen, what do you think? Please comment on Dario's reply with your
> opinion and raise any of your concerns.
>=20
Indeed.

> > Here's how I envision things to go. Again, I'm talking about sticking
> > with option a), so no per-vcpu timers, just 1 timer and a global queue,
> > which now is a replenishment queue:
> >
> >   timer interrupt
> >     TIMER_SOFTIRQ raised
> >       process softirqs
> >         replenishment_timer_handler()
> >           [spin_lock]
> >             <for_each_replenishment_event(repl_time < NOW()) {
> >                replenish(vcpu)
> >                runq_tickle(vcpu)
> >              }>
> >           [spin_lock]
> >
> > Then, on the tickled pcpus (if any):
> >
> >   process softirqs
> >     SCHEDULE_SOFTIRQ
> >       rt_schedule()
> >         [spin_lock]
> >         snext =3D runq_pick(): snext =3D=3D vcpu X
> >         [spin_unlock]
> >
> > And get rid of __repl_update(), which makes my eyes bleed every time I
> > open sched_rt.c. :-)
> >
> > Oh, and note that we probably can use a new spin lock for the
> > replenishment queue, different from the existing global scheduler
> > spinlock, lowering contention and increasing scalabiliy even further!
>=20
> Here is the main question I have about your advice.
> If we are introducing a new spin lock for the replenishment queue,
> what is the relation between the new lock and the old lock of the
> runq?
>
That's hard to tell in advance, you'll know it completely only when
trying to implement it. But, yes, when more locks are involved, it's
impotant to figure out the relationships between each other, or we risk
introducing deadlock or, even if things are correct, fail to improve the
performance (or do even worse!!).

The idea is, since the two operations (scheduling/budget enforcement and
replenishments) are logically independent, and if they're implemented in
a way that stress this independence, then it make sens to try to use
different spin locks.

As soon as you have more than one spin lock, what you should pay the
most attention to is, if they need to 'nest' (one is acquired when the
other is already being held), that has to happen consistently, or
deadlock will occur! :-/

> Because the deadline decides the priority of VCPUs and thus decides
> the ordering of VCPUs in the runq, the "replenish(vcpu)" will operate
> on the runq as well. As shown in the workflow in your reply:
>      >                replenish(vcpu)
>      >                runq_tickle(vcpu)
> The runq_tickle(vcpu) will pick the desired CPU. On that CPU,
> rt_schedule will pick snext by runq_pick(). Therefore, the replenished
> VCPU need to be resorted in the runq. So replenish(vcpu) will operates
> on the runq.
>=20
Oh, sure, absolutely. Calling __runq_insert() and runq_tickle(), clearly
must be done with the runqueue lock (which is the global scheduler lock,
at the moment) held.

Reading again what I wrote, I realize that I probably expressed myself
really bad, when hinting at that "let's use another lock" thing.

What I really wanted to say is that, if there will be a replenishment
queue, i.e., an event queue on which replenishment events are queued,
and that the timer handling routine scans, there may be some locking
required for consistently dealing with the queue itself. If that will be
the case, we probably can use _another_ spin lock, and let the timer
handling routine acquire the runqueue lock *only* when/if it gets to do
the insertions and the ticklings.
Sorry for this... I added that paragraph quickly, right before hitting
'send'... I probably would have better not done so! :-P

Again, this may or may not be necessary, and may or may not be possible,
depending on whether the two locks would nest nicely everywhere they're
required. This also depends on how we'll deal with the replenishment
timer.

So, the bottom line of all this is: get down to it, ad we'll see how it
will be better put together! :-D

> Another question in my mind is: do we really need the replenish queue
> to record the replenish time? Because the period =3D deadline in the
> current implementation, the deadline of VCPUs in runq is actually the
> replenish time. (We are reusing the runq here and I'm unsure if it
> good or not in terms of maintenance.)
>=20
Again, let's see. The timer will be always programmed to fire at the
most imminent replenishment event, so it seems to me that you need
something that will tell you, when servicing replenishment X, what's the
next time instant you want the timer to fire, to perform the next
replenishment.

Actually, the depleted queue you have know, can well become the
replenishment queue (it will have to be kept sorted, though, I think).
Whether you want to keep it as the tail of the actual runqueue, or split
the two of them, it does not matter much, IMO.

Regards,
Dario

--=20
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

--=-ohK+mdeYYSs1XRrCzFeN
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: This is a digitally signed message part
Content-Transfer-Encoding: 7bit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEUEABECAAYFAlV6pi4ACgkQk4XaBE3IOsSoWQCghtVtEjAvO+/EoeUH3XoEIGOK
LtoAmJGqCGsqqX/GG0RSs/vibSOIOfA=
=XNuZ
-----END PGP SIGNATURE-----

--=-ohK+mdeYYSs1XRrCzFeN--


--===============8490284930797920307==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============8490284930797920307==--