All of lore.kernel.org
 help / color / mirror / Atom feed
From: Luca Abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@arm.com>,
	Claudio Scordino <claudio@evidence.eu.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Tommaso Cucinotta <tommaso.cucinotta@sssup.it>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Joel Fernandes <joelaf@google.com>,
	Mathieu Poirier <mathieu.poirier@linaro.org>
Subject: Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
Date: Mon, 24 Jul 2017 10:06:09 +0200	[thread overview]
Message-ID: <20170724100609.4ec38473@luca> (raw)
In-Reply-To: <20170324224715.4098dbfb@nowhere>

Hi Peter,

On Fri, 24 Mar 2017 22:47:15 +0100
luca abeni <luca.abeni@santannapisa.it> wrote:

> Hi Peter,
> 
> On Fri, 24 Mar 2017 14:20:41 +0100
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Mar 24, 2017 at 04:52:55AM +0100, luca abeni wrote:
> >   
> > > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > > index d67eee8..952cac8 100644
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -445,16 +445,33 @@ struct sched_dl_entity {
> > >  	 *
> > >  	 * @dl_yielded tells if task gave up the CPU before
> > > consuming
> > >  	 * all its available runtime during the last job.
> > > +	 *
> > > +	 * @dl_non_contending tells if task is inactive while still
> > > +	 * contributing to the active utilization. In other words,
> > > it
> > > +	 * indicates if the inactive timer has been armed and its
> > > handler
> > > +	 * has not been executed yet. This flag is useful to avoid
> > > race
> > > +	 * conditions between the inactive timer handler and the
> > > wakeup
> > > +	 * code.
> > >  	 */
> > >  	int				dl_throttled;
> > >  	int				dl_boosted;
> > >  	int				dl_yielded;
> > > +	int				dl_non_contending;    
> > 
> > We should maybe look into making those bits :1, not something for this
> > patch though;  
> 
> Yes, grouping all the flags in a single field was my intention too... I
> planned to submit a patch to do this after merging the reclaiming
> patches... But maybe it is better to do this first :)

I implemented this change, but before submitting the patch I have a
small question.
I implemented some helpers to access the various
{throttled,boosted,yielded,non_contending} flags. I have some
"dl_{throttled,boosted,...}()" inline functions for reading the values
of the flags, and some inline functions for setting / clearing the
flags. For these, I have two possibilities:
- using two separate "dl_set_{throttled,...}()" and
  "dl_clear_{throttled,..}()" functions for each flag
- using one single "dl_set_{throttled,...}(dl, value)" function per
  flag, in which the flag's value is specified.

I have no preferences (with the first proposal, I introduce more inline
functions, but I think the functions can be made more efficient /
optimized). Which one of the two proposals is preferred? (or, there is
a third, better, idea that I overlooked?)


			Thanks,
				Luca

> 
> 
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index cef9adb..86aed82 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -65,6 +65,107 @@ void sub_running_bw(u64 dl_bw, struct dl_rq
> > > *dl_rq) dl_rq->running_bw = 0;
> > >  }
> > >  
> > > +void dl_change_utilization(struct task_struct *p, u64 new_bw)
> > > +{
> > > +	if (!task_on_rq_queued(p)) {
> > > +		struct rq *rq = task_rq(p);
> > > +
> > > +		if (p->dl.dl_non_contending) {
> > > +			sub_running_bw(p->dl.dl_bw, &rq->dl);
> > > +			p->dl.dl_non_contending = 0;
> > > +			/*
> > > +			 * If the timer handler is currently
> > > running and the
> > > +			 * timer cannot be cancelled,
> > > inactive_task_timer()
> > > +			 * will see that dl_not_contending is not
> > > set, and
> > > +			 * will not touch the rq's active
> > > utilization,
> > > +			 * so we are still safe.
> > > +			 */
> > > +			if
> > > (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
> > > +				put_task_struct(p);
> > > +		}
> > > +	}
> > > +}    
> > 
> > If we rearrange that slightly we can avoid the double indent:  
> [...]
> Ok; I'll rework the code in this way on Monday.
> 
> [...]
> > > +		if (hrtimer_try_to_cancel(&dl_se->inactive_timer)
> > > == 1)
> > > +			put_task_struct(dl_task_of(dl_se));
> > > +		dl_se->dl_non_contending = 0;    
> > 
> > This had me worried for a little while as being a use-after-free, but
> > this put_task_struct() cannot be the last in this case. Might be
> > worth a comment.  
> 
> Or maybe it is better to move "dl_se->dl_non_contending = 0;" before
> hrtimer_try_to_cancel()?
> 
> >   
> > > +	} else {
> > > +		/*
> > > +		 * Since "dl_non_contending" is not set, the
> > > +		 * task's utilization has already been removed from
> > > +		 * active utilization (either when the task
> > > blocked,
> > > +		 * when the "inactive timer" fired).
> > > +		 * So, add it back.
> > > +		 */
> > > +		add_running_bw(dl_se->dl_bw, dl_rq);
> > > +	}
> > > +}    
> > 
> > In general I feel it would be nice to have a state diagram included
> > somewhere near these two functions. It would be nice to not have to
> > dig out the PDF every time.  
> 
> Ok... Since I am not good at ascii art, would it be ok to add a textual
> description? If yes, I'll add a comment like:
> "
> The utilization of a task is added to the runqueue's active utilization
> when the task becomes active (is enqueued in the runqueue), and is
> removed when the task becomes inactive. A task does not become
> immediately inactive when it blocks, but becomes inactive at the so
> called "0 lag time"; so, we setup the "inactive timer" to fire at the
> "0 lag time". When the "inactive timer" fires, the task utilization is
> removed from the runqueue's active utilization. If the task wakes up
> again on the same runqueue before the "0 lag time", the active
> utilization must not be changed and the "inactive timer" must be
> cancelled. If the task wakes up again on a different runqueue before
> the "0 lag time", then the task's utilization must be removed from the
> previous runqueue's active utilization and must be added to the new
> runqueue's active utilization.
> In order to avoid races between a task waking up on a runqueue while the
> "inactive timer" is running on a different CPU, the "dl_non_contending"
> flag is used to indicate that a task is not on a runqueue but is active
> (so, the flag is set when the task blocks and is cleared when the
> "inactive timer" fires or when the task  wakes up).
> "
> (if this is ok, where can I add this comment?)
> 
> 
> > > +static void migrate_task_rq_dl(struct task_struct *p)
> > > +{
> > > +	if ((p->state == TASK_WAKING) &&
> > > (p->dl.dl_non_contending)) {
> > > +		struct rq *rq = task_rq(p);
> > > +
> > > +		raw_spin_lock(&rq->lock);
> > > +		sub_running_bw(p->dl.dl_bw, &rq->dl);
> > > +		p->dl.dl_non_contending = 0;
> > > +		/*
> > > +		 * If the timer handler is currently running and
> > > the
> > > +		 * timer cannot be cancelled, inactive_task_timer()
> > > +		 * will see that dl_not_contending is not set, and
> > > +		 * will not touch the rq's active utilization,
> > > +		 * so we are still safe.
> > > +		 */
> > > +		if (hrtimer_try_to_cancel(&p->dl.inactive_timer)
> > > == 1)
> > > +			put_task_struct(p);
> > > +
> > > +		raw_spin_unlock(&rq->lock);
> > > +	}
> > > +}    
> > 
> > This can similarly be reduced in indent, albeit this is only a single
> > indent level, so not as onerous as the other one.  
> 
> Ok; I'll do this on Monday.
> 
> 
> > What had me puzzled for a while here is taking the lock; because some
> > callers of set_task_cpu() must in fact hold this lock already. Worth a
> > comment I feel.  
> 
> Ok; I'll add a comment mentioning that since state is TASK_WAKING we do
> not have the lock.
> 
> 
> > Once I figured out the exact locking; I realized you do this on
> > cross-cpu wakeups. We spend a fair amount of effort not to take both
> > rq locks. But I suppose you have to here.  
> 
> The problem is that when a task wakes up before the "0 lag time" on a
> different runqueue, we must "migrate" its utilization from the old
> runqueue to the new one (see comment above). And I need the lock to
> avoid races... The only alternative I can think about is to ad a new
> lock protecting the active utilization, and to take it instead of the
> runqueue lock (I do not know if this is enough, I need to check...).
> I'll have a look on Monday.
> 
> 
> 
> 			Thanks,
> 				Luca

  parent reply	other threads:[~2017-07-24  8:06 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-24  3:52 [RFC v5 0/9] CPU reclaiming for SCHED_DEADLINE luca abeni
2017-03-24  3:52 ` [RFC v5 1/9] sched/deadline: track the active utilization luca abeni
2017-03-26 17:04   ` Mathieu Poirier
2017-03-26 20:55     ` luca abeni
2017-03-24  3:52 ` [RFC v5 2/9] sched/deadline: improve the tracking of " luca abeni
2017-03-24 13:20   ` Peter Zijlstra
2017-03-24 21:47     ` luca abeni
2017-03-25  2:31       ` Steven Rostedt
2017-03-27  8:20         ` Luca Abeni
2017-03-27  8:54           ` Claudio Scordino
2017-03-27  7:17       ` Juri Lelli
2017-03-27  7:43         ` Luca Abeni
2017-03-27  8:45           ` Juri Lelli
2017-03-27  7:36       ` Luca Abeni
2017-07-24  8:06       ` Luca Abeni [this message]
2017-07-24  9:04         ` Peter Zijlstra
2017-07-25  6:41           ` Luca Abeni
2017-03-24 13:23   ` Peter Zijlstra
2017-07-24  7:54     ` Luca Abeni
2017-07-24  9:11       ` Peter Zijlstra
2017-07-25  6:46         ` Luca Abeni
2017-03-26 17:32   ` Mathieu Poirier
2017-03-26 21:01     ` luca abeni
2017-03-24  3:52 ` [RFC v5 3/9] sched/deadline: fix the update of the total -deadline utilization luca abeni
2017-03-24  3:52 ` [RFC v5 4/9] sched/deadline: implement GRUB accounting luca abeni
2017-03-24  3:52 ` [RFC v5 5/9] sched/deadline: do not reclaim the whole CPU bandwidth luca abeni
2017-03-24 14:00   ` Peter Zijlstra
2017-03-24 21:58     ` luca abeni
2017-03-25  2:38       ` Steven Rostedt
2017-03-27  8:35         ` Peter Zijlstra
2017-03-24  3:52 ` [RFC v5 6/9] sched/deadline: make GRUB a task's flag luca abeni
2017-03-24  3:53 ` [RFC v5 7/9] sched/deadline: track the "total rq utilization" too luca abeni
2017-03-24  3:53 ` [RFC v5 8/9] sched/deadline: base GRUB reclaiming on the inactive utilization luca abeni
2017-03-27 14:26   ` Peter Zijlstra
2017-03-27 14:56     ` Luca Abeni
2017-03-27 15:53       ` Peter Zijlstra
2017-03-27 17:02         ` luca abeni
2017-05-08  7:41     ` Luca Abeni
2017-05-08  8:06       ` Peter Zijlstra
2017-05-09  9:37         ` Luca Abeni
2017-03-24  3:53 ` [RFC v5 9/9] sched/deadline: also reclaim bandwidth not used by dl tasks luca abeni
2017-03-27 14:03   ` Peter Zijlstra
2017-03-27 14:48     ` Luca Abeni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170724100609.4ec38473@luca \
    --to=luca.abeni@santannapisa.it \
    --cc=bristot@redhat.com \
    --cc=claudio@evidence.eu.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.poirier@linaro.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tommaso.cucinotta@sssup.it \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.