From: Meng Xu <xumengpanda@gmail.com>
To: Dario Faggioli <dario.faggioli@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
Tianyang Chen <tiche@seas.upenn.edu>,
George Dunlap <george.dunlap@citrix.com>,
Dagaen Golomb <dgolomb@seas.upenn.edu>,
Meng Xu <mengxu@cis.upenn.edu>
Subject: Re: [PATCH v7]xen: sched: convert RTDS from time to event driven model
Date: Wed, 9 Mar 2016 23:00:35 -0500 [thread overview]
Message-ID: <CAENZ-+=2R7st5jP6fKMmt61_AFQy16iT01oGS_5+iSCFXJ4o+Q@mail.gmail.com> (raw)
In-Reply-To: <1457538364.3102.418.camel@citrix.com>
On Wed, Mar 9, 2016 at 10:46 AM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> On Tue, 2016-03-08 at 23:33 -0500, Meng Xu wrote:
>> I didn't mark out all repeated style issues. I think you can correct
>> all of the style issues, such as the spaces in the code, in the next
>> version.
>>
> Yes, please. At v7, style issues definitely shouldn't be there any
> longer.
>
> Have another look at CODING_STYLE, perhaps, especially focusing on
> spaces in conditional expressions and line length.
Tianyang, please correct those coding style issues in the next
version. At this point, it's time to pay close attention to coding
style...
>
>> Hi Dario,
>> Could you help check if my two concerns in the comments below make
>> sense?
>> One is small, about if we should use () to make the code clearer.
>> The other is about the logic in the replenishment handler.
>>
> I think tickling during replenishment does have issues, yes. See below.
>
> Anyway, Meng, can you trim your quotes when replying? By this, I mean
> cut the part of conversation/patches that are not relevant for what you
> are saying in this very email (but leave enough context in place for
> making what you want to point out understandable).
Sure, got it.
>> > @@ -1150,6 +1276,71 @@ rt_dom_cntl(
>> > return rc;
>> > }
>> >
>> > +/*
>> > + * The replenishment timer handler picks vcpus
>> > + * from the replq and does the actual replenishment.
>> > + */
>> > +static void repl_handler(void *data){
>> > + unsigned long flags;
>> > + s_time_t now = NOW();
>> > + struct scheduler *ops = data;
>> > + struct rt_private *prv = rt_priv(ops);
>> > + struct list_head *replq = rt_replq(ops);
>> > + struct timer *repl_timer = prv->repl_timer;
>> > + struct list_head *iter, *tmp;
>> > + struct rt_vcpu *svc = NULL;
>> > +
>> > + spin_lock_irqsave(&prv->lock, flags);
>> > +
>> > + stop_timer(repl_timer);
>> > +
>> > + list_for_each_safe(iter, tmp, replq)
>> > + {
>> > + svc = replq_elem(iter);
>> > +
>> > + if ( now < svc->cur_deadline )
>> > + break;
>> > +
>> > + rt_update_deadline(now, svc);
>> > +
>> > + /*
>> > + * when the replenishment happens
>> > + * svc is either on a pcpu or on
>> > + * runq/depletedq
>> > + */
>> > + if( __vcpu_on_q(svc) )
>> > + {
>> > + /* put back to runq */
>> > + __q_remove(svc);
>> > + __runq_insert(ops, svc);
>> > + }
>> > +
>> > + /*
>> > + * tickle regardless where it's at
>> > + * because a running vcpu could have
>> > + * a later deadline than others after
>> > + * replenishment
>> > + */
>> > + runq_tickle(ops, svc);
>> Well, I'm thinking about the correctness of tickle here.
>> The svc is running on a core, say P_i, right now. After replenishing
>> the budget for svc, svc's deadline also changes. So svc should
>> probably be preempted by the head vcpu in the runq. If that's the
>> case, we should use the head of runq to tickle, instead of using svc.
>> Right?
>>
> I agree that, in case of replenishment of a running vcpu, calling
> runq_tickle() like this is not correct. In fact, the whole idea of
> tickling in Credit1 and 2, from which it has been borrowed to here, was
> meant at (potentially) putting a vcpu in execution, not vice versa. :-/
>
> I therefore agree that, if svc ends up with a later deadline than the
> first vcpu in the runqueue, we should actually call runq_tickle() on the
> latter!
>
>> Actually, is the runq_tickle necessary here? Can we just check if the
>> updated svc has higher priority (smaller deadline) than the head vcpu
>> in the runq? If so, we just tickle the cpu P_i, where svc is
>> currently
>> running on.
>>
> Well, if we are replenishing a vcpu that is depleted, whose new
> deadline can be earlier than any of the ones of the vcpus that are
> running (can't it?) and/or there can be idle vcpus on which you can run
> it, now that it has budget. So, in this case, I think we need what's
> done in runq_tickle()...
>
> The third case is that we replenish a vcpu that is in the runqueue, so
> it had budget, but was not running. In that case, there should be no
> chance that it should preempt a running vcpu, as it was waiting in the
> runqueue, which means --if we have M pcpus-- it was not among the M
> earliest deadline vcpus, and we just pushed its deadline away!
> It should also not be necessary to do any deadline comparison with
> other vcpus in the runqueue. In fact, still considering that we are
> moving the deadline ahead, it's going to either be in the same or
> "worst" position in the runqueue.
>
>> But either way, I'm thinking the logic here is incorrect. If the
>> updated svc has a lower priority, you will end up tickle no core
>> although the svc should be preempted.
>>
> It seems to me that the logic here could be (sort-of):
>
> for_each_to_be_replenished_vcpu(v)
> {
> deadline_queue_remove(replq, v);
> rt_update_deadline(v);
>
> if ( curr_on_cpu(v->processor) == v)) //running
> {
> if ( v->cur_deadline >= runq[0]->cur_deadline )
> tickle(runq[0]); /* runq[0] => first vcpu in the runq */
> }
> else if ( __vcpu_on_q(v) )
> {
> if ( v->cur_budget == 0 ) //depleted
> tickle(v);
> else //runnable
> /* do nothing */
> }
> deadline_queue_insert(v, replq);
> }
>
> Thoughts?
I like the logic here, but there is one thing to note/change.
After we run rt_update_deadline(v), v->cur_budget will be reset to
v->budget, which means we can no longer use v->cur_budget to tell
whether the vcpu was on the runq or the depletedq.
I suggest setting a flag (based on v->cur_budget == 0) before we run
rt_update_deadline(v), and using that flag afterwards.
>
> I actually think there may be room for a nasty race, in case we do more
> than one replenishments.
>
> In fact, assume that, at time t:
> - we do the replenishment of v1, which is running and v2, which is
> runnable,
> - we have 1 cpu,
> - v1 and v2 have, currently, the same deadline == t,
> - v1's new deadline is going to be t+100, and v2's new deadline is
> going to be t+150,
> - v2 is the only (i.e., also the first) runnable vcpu,
> - v1's replenishment event comes first in the replenishment queue.
>
> With the code above, we update v1's deadline (to t+100), go check it
> against v2's _not_updated_ deadline (t) and (since t+100 > t) find that
> a preemption is necessary, so we call tickle(v2). That will raise
> SCHEDULE_SOFTIRQ for the cpu, because v2 should --as far as situation
> is right now-- preempt v1.
>
> However, right after that, we update v2's deadline to t+150, and we do
> nothing. So, even if the vcpu with the earliest deadline (v1) is
> running, we reschedule.
Hmm, I see what you want to convey with the example here, and I agree.
Let me summarize my understanding:
When we replenish the vcpus one by one and tickle after each
replenishment, we may end up tickling based on a vcpu whose deadline
is about to be updated, and that tickle will likely be "kicked out" by
the next to-be-updated vcpu.
rt_schedule() will then probably do a no-op for the tickled vcpu,
since it can tell that the vcpu has actually reached its deadline (the
fact that the vcpu is about to have its deadline updated means it has
just reached/passed its current deadline).
>
> This should be harmless, as we do figure out during rt_schedule() that
> no real context switch is necessary, but I think it would better be
> avoided, if possible.
Right.
>
> It looks to me that we can re-arrange the code above as follows:
>
> for_each_to_be_replenished_vcpu(v)
> {
> rt_update_deadline(v);
> }
We have to record the number of vcpus that have had their deadlines
updated, so that we know where to stop in the second loop.
>
> for_each_to_be_replenished_vcpu(v)
> {
> deadline_queue_remove(replq, v);
>
> if ( curr_on_cpu(v->processor) == v)) //running
> {
> if ( v->cur_deadline >= runq[0]->cur_deadline )
> tickle(runq[0]); /* runq[0] => first vcpu in the runq */
> }
> else if ( __vcpu_on_q(v) )
> {
> if ( v->cur_budget == 0 ) //depleted
> tickle(v);
> else //runnable
> /* do nothing */
> }
> deadline_queue_insert(v, replq);
> }
>
> Basically, by doing all the replenishments (which includes updating all
> the deadlines) upfront, we should be able to prevent the above
> situation.
>
> So, again, thoughts?
I think we need to determine which vcpus are on the depletedq before
updating their deadlines, and we also need to record which vcpus were
updated. So I added some code to your version:
#define __RTDS_is_depleted 3
#define RTDS_is_depleted (1<<__RTDS_is_depleted)

int num_repl_vcpus = 0;

for_each_to_be_replenished_vcpu(v)
{
    if ( v->cur_budget <= 0 )
        set_bit(__RTDS_is_depleted, &v->flags);

    rt_update_deadline(v);
    num_repl_vcpus++;
}

for_each_to_be_replenished_vcpu(v)
{
    deadline_queue_remove(replq, v);

    if ( curr_on_cpu(v->processor) == v ) /* running */
    {
        if ( v->cur_deadline >= runq[0]->cur_deadline )
            tickle(runq[0]); /* runq[0] => first vcpu in the runq */
    }
    else if ( __vcpu_on_q(v) )
    {
        if ( v->flags & RTDS_is_depleted ) /* depleted */
        {
            clear_bit(__RTDS_is_depleted, &v->flags);
            tickle(v);
        }
        /* else: runnable; do nothing */
    }

    deadline_queue_insert(v, replq);

    /* stop at the vcpu that does not need replenishment */
    if ( !--num_repl_vcpus )
        break;
}
What do you think of this version?
Thanks and Best Regards,
Meng
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Thread overview: 10+ messages
2016-03-05 1:39 [PATCH v7]xen: sched: convert RTDS from time to event driven model Tianyang Chen
2016-03-09 4:33 ` Meng Xu
2016-03-09 15:46 ` Dario Faggioli
2016-03-10 4:00 ` Meng Xu [this message]
2016-03-10 10:38 ` Dario Faggioli
2016-03-10 15:28 ` Meng Xu
2016-03-10 16:43 ` Dario Faggioli
2016-03-10 18:08 ` Meng Xu
2016-03-10 23:53 ` Dario Faggioli
2016-03-11 0:33 ` Meng Xu