From: Meng Xu <xumengpanda@gmail.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
Tianyang Chen <tiche@seas.upenn.edu>,
George Dunlap <george.dunlap@citrix.com>,
Dagaen Golomb <dgolomb@seas.upenn.edu>,
Meng Xu <mengxu@cis.upenn.edu>
Subject: Re: [PATCH v7]xen: sched: convert RTDS from time to event driven model
Date: Wed, 9 Mar 2016 23:00:35 -0500 [thread overview]
Message-ID: <CAENZ-+=2R7st5jP6fKMmt61_AFQy16iT01oGS_5+iSCFXJ4o+Q@mail.gmail.com> (raw)
In-Reply-To: <1457538364.3102.418.camel@citrix.com>
On Wed, Mar 9, 2016 at 10:46 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Tue, 2016-03-08 at 23:33 -0500, Meng Xu wrote:
>> I didn't mark out all repeated style issues. I think you can correct
>> all of the style issues, such as the spaces in the code, in the next
>> version.
>>
> Yes, please. At v7, style issues definitely shouldn't be there any
> longer.
>
> Have another look at CODING_STYLE, perhaps, especially focusing on
> spaces in conditional expressions and line length.
Tianyang, please correct those coding style issues in the next
version. At this point, it's time to pay close attention to coding
style...
>
>> Hi Dario,
>> Could you help check if my two concerns in the comments below make
>> sense?
>> One is small, about if we should use () to make the code clearer.
>> The other is about the logic in the replenishment handler.
>>
> I think tickling during replenishment does have issues, yes. See below.
>
> Anyway, Meng, can you trim your quotes when replying? By this, I mean
> cut the part of conversation/patches that are not relevant for what you
> are saying in this very email (but leave enough context in place for
> making what you want to point out understandable).
Sure, got it.
>> > @@ -1150,6 +1276,71 @@ rt_dom_cntl(
>> > return rc;
>> > }
>> >
>> > +/*
>> > + * The replenishment timer handler picks vcpus
>> > + * from the replq and does the actual replenishment.
>> > + */
>> > +static void repl_handler(void *data){
>> > + unsigned long flags;
>> > + s_time_t now = NOW();
>> > + struct scheduler *ops = data;
>> > + struct rt_private *prv = rt_priv(ops);
>> > + struct list_head *replq = rt_replq(ops);
>> > + struct timer *repl_timer = prv->repl_timer;
>> > + struct list_head *iter, *tmp;
>> > + struct rt_vcpu *svc = NULL;
>> > +
>> > + spin_lock_irqsave(&prv->lock, flags);
>> > +
>> > + stop_timer(repl_timer);
>> > +
>> > + list_for_each_safe(iter, tmp, replq)
>> > + {
>> > + svc = replq_elem(iter);
>> > +
>> > + if ( now < svc->cur_deadline )
>> > + break;
>> > +
>> > + rt_update_deadline(now, svc);
>> > +
>> > + /*
>> > + * when the replenishment happens
>> > + * svc is either on a pcpu or on
>> > + * runq/depletedq
>> > + */
>> > + if( __vcpu_on_q(svc) )
>> > + {
>> > + /* put back to runq */
>> > + __q_remove(svc);
>> > + __runq_insert(ops, svc);
>> > + }
>> > +
>> > + /*
>> > + * tickle regardless where it's at
>> > + * because a running vcpu could have
>> > + * a later deadline than others after
>> > + * replenishment
>> > + */
>> > + runq_tickle(ops, svc);
>> Well, I'm thinking about the correctness of tickle here.
>> The svc is running on a core, say P_i, right now. After replenishing
>> the budget for svc, svc's deadline also changes. So svc should
>> probably be preempted by the head vcpu in the runq. If that's the
>> case, we should use the head of runq to tickle, instead of using svc.
>> Right?
>>
> I agree that, in case of replenishment of a running vcpu, calling
> runq_tickle() like this is not correct. In fact, the whole idea of
> tickling in Credit1 and 2, from which it has been borrowed to here, was
> meant at (potentially) putting a vcpu in execution, not vice versa. :-/
>
> I therefore agree that, if svc ends up with a later deadline than the
> first vcpu in the runqueue, we should actually call runq_tickle() on the
> latter!
>
>> Actually, is the runq_tickle necessary here? Can we just check if the
>> updated svc has higher priority (smaller deadline) than the head vcpu
>> in the runq? If so, we just tickle the cpu P_i, where svc is
>> currently
>> running on.
>>
> Well, if we are replenishing a vcpu that is depleted, whose new
> deadline can be earlier than any of the ones of the vcpus that are
> running (can't it?) and/or there can be idle vcpus on which you can run
> it, now that it has budget. So, in this case, I think we need what's
> done in runq_tickle()...
>
> The third case is that we replenish a vcpu that is in the runqueue, so
> it had budget, but was not running. In that case, there should be no
> chance that it should preempt a running vcpu, as it was waiting in the
> runqueue, which means --if we have M pcpus-- it was not among the M
> earliest deadline vcpus, and we just pushed its deadline away!
> It should also not be necessary to do any deadline comparison with
> other vcpus in the runqueue. In fact, still considering that we are
> moving the deadline ahead, it's going to either be in the same or
> "worst" position in the runqueue.
>
>> But either way, I'm thinking the logic here is incorrect. If the
>> updated svc has a lower priority, you will end up tickle no core
>> although the svc should be preempted.
>>
> It seems to me that the logic here could be (sort-of):
>
> for_each_to_be_replenished_vcpu(v)
> {
> deadline_queue_remove(replq, v);
> rt_update_deadline(v);
>
> if ( curr_on_cpu(v->processor) == v)) //running
> {
> if ( v->cur_deadline >= runq[0]->cur_deadline )
> tickle(runq[0]); /* runq[0] => first vcpu in the runq */
> }
> else if ( __vcpu_on_q(v) )
> {
> if ( v->cur_budget == 0 ) //depleted
> tickle(v);
> else //runnable
> /* do nothing */
> }
> deadline_queue_insert(v, replq);
> }
>
> Thoughts?
I like the logic here, but there is one thing to note/change.
After we run rt_update_deadline(v), v->cur_budget will be reset to
v->budget, which means we can no longer use v->cur_budget to tell
whether the vcpu was on the runq or the depletedq.
I suggest setting a flag (based on v->cur_budget == 0) before we run
rt_update_deadline(v), and using that flag afterwards.
>
> I actually think there may be room for a nasty race, in case we do more
> than one replenishments.
>
> In fact, assume that, at time t:
> - we do the replenishment of v1, which is running and v2, which is
> runnable,
> - we have 1 cpu,
> - v1 and v2 have, currently, the same deadline == t,
> - v1's new deadline is going to be t+100, and v2's new deadline is
> going to be t+150,
> - v2 is the only (i.e., also the first) runnable vcpu,
> - v1's replenishment event comes first in the replenishment queue.
>
> With the code above, we update v1's deadline (to t+100), go check it
> against v2's _not_updated_ deadline (t) and (since t+100 > t) find that
> a preemption is necessary, so we call tickle(v2). That will raise
> SCHEDULE_SOFTIRQ for the cpu, because v2 should --as far as situation
> is right now-- preempt v1.
>
> However, right after that, we update v2's deadline to t+150, and we do
> nothing. So, even if the vcpu with the earliest deadline (v1) is
> running, we reschedule.
Hmm, I see what you want to convey with the example here, and I agree.
Let me summarize my understanding:
When we replenish the vcpus one by one and tickle after each
replenishment, we may end up tickling based on a vcpu whose deadline
is about to be updated, and that tickle will likely be "kicked out" by
the next to-be-updated vcpu.
rt_schedule() will then probably do a no-op for the tickled vcpu,
since it can tell that the vcpu has actually reached its deadline (the
fact that the vcpu is about to have its deadline updated means it has
just reached/passed its current deadline).
>
> This should be harmless, as we do figure out during rt_schedule() that
> no real context switch is necessary, but I think it would better be
> avoided, if possible.
Right.
>
> It looks to me that we can re-arrange the code above as follows:
>
> for_each_to_be_replenished_vcpu(v)
> {
> rt_update_deadline(v);
> }
We have to record the number of vcpus that have had their deadlines
updated, so that we know where to stop in the second loop.
>
> for_each_to_be_replenished_vcpu(v)
> {
> deadline_queue_remove(replq, v);
>
> if ( curr_on_cpu(v->processor) == v)) //running
> {
> if ( v->cur_deadline >= runq[0]->cur_deadline )
> tickle(runq[0]); /* runq[0] => first vcpu in the runq */
> }
> else if ( __vcpu_on_q(v) )
> {
> if ( v->cur_budget == 0 ) //depleted
> tickle(v);
> else //runnable
> /* do nothing */
> }
> deadline_queue_insert(v, replq);
> }
>
> Basically, by doing all the replenishments (which includes updating all
> the deadlines) upfront, we should be able to prevent the above
> situation.
>
> So, again, thoughts?
I think we need to determine which vcpus are on the depletedq before
updating their deadlines, and we also need to record which vcpus were
updated. So I added some code to your version:
#define __RTDS_is_depleted 3
#define RTDS_is_depleted (1<<__RTDS_is_depleted)

int num_repl_vcpus = 0;

for_each_to_be_replenished_vcpu(v)
{
    if ( v->cur_budget <= 0 )
        set_bit(__RTDS_is_depleted, &v->flags);

    rt_update_deadline(v);
    num_repl_vcpus++;
}

for_each_to_be_replenished_vcpu(v)
{
    deadline_queue_remove(replq, v);

    if ( curr_on_cpu(v->processor) == v ) /* running */
    {
        if ( v->cur_deadline >= runq[0]->cur_deadline )
            tickle(runq[0]); /* runq[0] => first vcpu in the runq */
    }
    else if ( __vcpu_on_q(v) )
    {
        if ( v->flags & RTDS_is_depleted ) /* depleted */
        {
            clear_bit(__RTDS_is_depleted, &v->flags);
            tickle(v);
        }
        /* else: runnable; do nothing */
    }

    deadline_queue_insert(v, replq);

    /* stop at the vcpu that does not need replenishment */
    if ( !--num_repl_vcpus )
        break;
}
What do you think of this version?
Thanks and Best Regards,
Meng
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Thread overview: 10+ messages
2016-03-05 1:39 [PATCH v7]xen: sched: convert RTDS from time to event driven model Tianyang Chen
2016-03-09 4:33 ` Meng Xu
2016-03-09 15:46 ` Dario Faggioli
2016-03-10 4:00 ` Meng Xu [this message]
2016-03-10 10:38 ` Dario Faggioli
2016-03-10 15:28 ` Meng Xu
2016-03-10 16:43 ` Dario Faggioli
2016-03-10 18:08 ` Meng Xu
2016-03-10 23:53 ` Dario Faggioli
2016-03-11 0:33 ` Meng Xu