From: Gregory Haskins <gregory.haskins@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Gregory Haskins <ghaskins@novell.com>,
	mingo@elte.hu, peterz@infradead.org,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	npiggin@suse.de
Subject: Re: [TIP/SCHED/DEVEL PATCH v3 6/6] sched: create "pushable_tasks" list to limit pushing to one attempt
Date: Thu, 04 Sep 2008 17:26:46 -0400
Message-ID: <48C05296.8000909@gmail.com>
In-Reply-To: <alpine.DEB.1.10.0809041706050.3849@gandalf.stny.rr.com>


Steven Rostedt wrote:
> On Thu, 4 Sep 2008, Gregory Haskins wrote:
>   
>> However, the current implementation suffers from a limitation in the
>> push logic.  Once overloaded, all tasks (other than current) on the
>> RQ are analyzed on every push operation, even if they were previously
>> found unpushable (due to affinity, etc.).  What's more, the operation
>> stops at the first task that is unpushable and will not look at items
>> lower in the queue.  This causes two problems:
>>
>> 1) We can have the same tasks analyzed over and over again during each
>>    push, which lengthens the scheduler's fast path for no gain.
>>    Consider an RQ that has dozens of tasks bound to a core: each one
>>    of those tasks will be encountered and skipped on every push
>>    operation while they are queued.
>>
>> 2) There may be lower-priority tasks under the unpushable task that
>>    could have been successfully pushed, but will never be considered
>>    until either the unpushable task is cleared or a pull operation
>>    succeeds.  The net result is a potential latency source for
>>    mid-priority tasks.
>>     
>
> Yep, we knew of these limitations when we created it. It was a big
> TODO. Thanks for actually doing it. It is what? One year already?
>
> ;-)
>   

Yeah, agreed.  Sorry if it sounded like I was representing it as a new find.
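
(For context, the core of the change, abridged from the patch below;
this is a sketch rather than the literal hunks:

	/* before: rescan the whole RT priority array on every push,
	 * stopping at the first unpushable task */
	next_task = pick_next_highest_task_rt(rq, -1);

	/* after: pop the highest-priority entry from a plist that only
	 * ever contains pushable tasks */
	next_task = pick_next_pushable_task(rq);

Tasks that have already failed a push are simply no longer on the list.)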

>   
>> This patch aims to rectify these two conditions by introducing a new
>> priority-sorted list: "pushable_tasks".  A task is added to the list
>> each time it is activated or preempted.  It is removed from the list
>> any time it is deactivated, made current, or fails to push.
>>
>> This works because a push only needs to be attempted once per task.
>> After an initial failure to push, the other cpus will eventually try to
>> pull the task when the conditions are proper.  This also solves the
>> problem of not analyzing all tasks after encountering an unpushable
>> one.  Now every task will have a push attempted (when appropriate).
>>
>> This reduces latency both by shortening the critical section of the
>> rq->lock for certain workloads, and by making sure the algorithm
>> considers all eligible tasks in the system.
>>
>> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
>> CC: Steven Rostedt <srostedt@redhat.com>
>> ---
>>
>>  include/linux/init_task.h |    1 
>>  include/linux/sched.h     |    1 
>>  kernel/sched.c            |    4 ++
>>  kernel/sched_rt.c         |  110 +++++++++++++++++++++++++++++++++++++++++----
>>  4 files changed, 106 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
>> index 021d8e7..7f910f4 100644
>> --- a/include/linux/init_task.h
>> +++ b/include/linux/init_task.h
>> @@ -140,6 +140,7 @@ extern struct group_info init_groups;
>>  		.nr_cpus_allowed = NR_CPUS,				\
>>  	},								\
>>  	.tasks		= LIST_HEAD_INIT(tsk.tasks),			\
>> +	.pushable_tasks = PLIST_NODE_INIT(tsk.pushable_tasks, MAX_PRIO), \
>>  	.ptraced	= LIST_HEAD_INIT(tsk.ptraced),			\
>>  	.ptrace_entry	= LIST_HEAD_INIT(tsk.ptrace_entry),		\
>>  	.real_parent	= &tsk,						\
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index cf8cd8c..3480c8a 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1078,6 +1078,7 @@ struct task_struct {
>>  #endif
>>  
>>  	struct list_head tasks;
>> +	struct plist_node pushable_tasks;
>>  
>>  	struct mm_struct *mm, *active_mm;
>>  
>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index ddc3877..dbb9e8c 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -447,6 +447,7 @@ struct rt_rq {
>>  #ifdef CONFIG_SMP
>>  	unsigned long rt_nr_migratory;
>>  	int overloaded;
>> +	struct plist_head pushable_tasks;
>>  #endif
>>  	int rt_throttled;
>>  	u64 rt_time;
>> @@ -2383,6 +2384,8 @@ void sched_fork(struct task_struct *p, int clone_flags)
>>  	/* Want to start with kernel preemption disabled. */
>>  	task_thread_info(p)->preempt_count = 1;
>>  #endif
>> +	plist_node_init(&p->pushable_tasks, MAX_PRIO);
>> +
>>  	put_cpu();
>>  }
>>  
>> @@ -8009,6 +8012,7 @@ static void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq)
>>  #ifdef CONFIG_SMP
>>  	rt_rq->rt_nr_migratory = 0;
>>  	rt_rq->overloaded = 0;
>> +	plist_head_init(&rq->rt.pushable_tasks, &rq->lock);
>>  #endif
>>  
>>  	rt_rq->rt_time = 0;
>> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
>> index 277ccd2..b22d18a 100644
>> --- a/kernel/sched_rt.c
>> +++ b/kernel/sched_rt.c
>> @@ -49,6 +49,24 @@ static void update_rt_migration(struct rq *rq)
>>  		rq->rt.overloaded = 0;
>>  	}
>>  }
>> +
>> +static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
>> +{
>> +	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
>> +	plist_node_init(&p->pushable_tasks, p->prio);
>> +	plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
>> +}
>> +
>> +static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
>> +{
>> +	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
>> +}
>> +
>> +#else
>> +
>> +#define enqueue_pushable_task(rq, p) do { } while (0)
>> +#define dequeue_pushable_task(rq, p) do { } while (0)
>> +
>>  #endif /* CONFIG_SMP */
>>  
>>  static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
>> @@ -669,6 +687,9 @@ static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int wakeup)
>>  
>>  	enqueue_rt_entity(rt_se);
>>  
>> +	if (!task_current(rq, p) && p->rt.nr_cpus_allowed > 1)
>> +		enqueue_pushable_task(rq, p);
>> +
>>  	inc_cpu_load(rq, p->se.load.weight);
>>  }
>>  
>> @@ -679,6 +700,8 @@ static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int sleep)
>>  	update_curr_rt(rq);
>>  	dequeue_rt_entity(rt_se);
>>  
>> +	dequeue_pushable_task(rq, p);
>> +
>>  	dec_cpu_load(rq, p->se.load.weight);
>>  }
>>  
>> @@ -824,7 +847,7 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
>>  	return next;
>>  }
>>  
>> -static struct task_struct *pick_next_task_rt(struct rq *rq)
>> +static struct task_struct *_pick_next_task_rt(struct rq *rq)
>>  {
>>  	struct sched_rt_entity *rt_se;
>>  	struct task_struct *p;
>> @@ -846,6 +869,18 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)
>>  
>>  	p = rt_task_of(rt_se);
>>  	p->se.exec_start = rq->clock;
>> +
>> +	return p;
>> +}
>> +
>> +static struct task_struct *pick_next_task_rt(struct rq *rq)
>> +{
>> +	struct task_struct *p = _pick_next_task_rt(rq);
>> +
>> +	/* The running task is never eligible for pushing */
>> +	if (p)
>> +		dequeue_pushable_task(rq, p);
>> +
>>  	return p;
>>  }
>>  
>> @@ -853,6 +888,13 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
>>  {
>>  	update_curr_rt(rq);
>>  	p->se.exec_start = 0;
>> +
>> +	/*
>> +	 * The previous task needs to be made eligible for pushing
>> +	 * if it is still active
>> +	 */
>> +	if (p->se.on_rq && p->rt.nr_cpus_allowed > 1)
>> +		enqueue_pushable_task(rq, p);
>>  }
>>  
>>  #ifdef CONFIG_SMP
>> @@ -1031,6 +1073,28 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>>  	return lowest_rq;
>>  }
>>  
>> +static inline int has_pushable_tasks(struct rq *rq)
>> +{
>> +	return !plist_head_empty(&rq->rt.pushable_tasks);
>> +}
>> +
>> +static struct task_struct *pick_next_pushable_task(struct rq *rq)
>> +{
>> +	struct task_struct *p;
>> +
>> +	if (!has_pushable_tasks(rq))
>> +		return NULL;
>> +
>> +	p = plist_first_entry(&rq->rt.pushable_tasks,
>> +			      struct task_struct, pushable_tasks);
>> +
>> +	BUG_ON(rq->cpu != task_cpu(p));
>> +	BUG_ON(task_current(rq, p));
>> +	BUG_ON(p->rt.nr_cpus_allowed <= 1);
>>     
>
> I have two new BUG_ONs for you. (This isn't a fast path)
>
> 	BUG_ON(!p->se.on_rq);
> 	BUG_ON(!rt_task(p));
>   

ack
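
So pick_next_pushable_task() in v4 would look something like this
(sketch only; your two new checks added verbatim, the rest unchanged
from the hunk above):

	static struct task_struct *pick_next_pushable_task(struct rq *rq)
	{
		struct task_struct *p;

		if (!has_pushable_tasks(rq))
			return NULL;

		p = plist_first_entry(&rq->rt.pushable_tasks,
				      struct task_struct, pushable_tasks);

		/* not a fast path, so these sanity checks are cheap */
		BUG_ON(rq->cpu != task_cpu(p));
		BUG_ON(task_current(rq, p));
		BUG_ON(p->rt.nr_cpus_allowed <= 1);
		BUG_ON(!p->se.on_rq);
		BUG_ON(!rt_task(p));

		return p;
	}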

>
>   
>> +
>> +	return p;
>> +}
>> +
>>  /*
>>   * If the current CPU has more than one RT task, see if the non
>>   * running task can migrate over to a CPU that is running a task
>> @@ -1040,13 +1104,12 @@ static int push_rt_task(struct rq *rq)
>>  {
>>  	struct task_struct *next_task;
>>  	struct rq *lowest_rq;
>> -	int ret = 0;
>>  	int paranoid = RT_MAX_TRIES;
>>  
>>  	if (!rq->rt.overloaded)
>>  		return 0;
>>  
>> -	next_task = pick_next_highest_task_rt(rq, -1);
>> +	next_task = pick_next_pushable_task(rq);
>>  	if (!next_task)
>>  		return 0;
>>  
>> @@ -1078,12 +1141,19 @@ static int push_rt_task(struct rq *rq)
>>  		 * so it is possible that next_task has changed.
>>  		 * If it has, then try again.
>>  		 */
>> -		task = pick_next_highest_task_rt(rq, -1);
>> +		task = pick_next_pushable_task(rq);
>>  		if (unlikely(task != next_task) && task && paranoid--) {
>>  			put_task_struct(next_task);
>>  			next_task = task;
>>  			goto retry;
>>  		}
>> +
>> +		/*
>> +		 * Once we have failed to push this task, we will not
>> +		 * try again, since the other cpus will pull from us
>> +		 * when they are ready
>> +		 */
>> +		dequeue_pushable_task(rq, next_task);
>>  		goto out;
>>  	}
>>  
>> @@ -1095,11 +1165,10 @@ static int push_rt_task(struct rq *rq)
>>  
>>  	double_unlock_balance(rq, lowest_rq);
>>  
>> -	ret = 1;
>>  out:
>>  	put_task_struct(next_task);
>>  
>> -	return ret;
>> +	return 1;
>>  }
>>  
>>  /*
>> @@ -1128,7 +1197,7 @@ static int pull_rt_task(struct rq *this_rq)
>>  	if (likely(!rt_overloaded(this_rq)))
>>  		return 0;
>>  
>> -	next = pick_next_task_rt(this_rq);
>> +	next = _pick_next_task_rt(this_rq);
>>  
>>  	for_each_cpu_mask_nr(cpu, this_rq->rd->rto_mask) {
>>  		if (this_cpu == cpu)
>> @@ -1145,7 +1214,7 @@ static int pull_rt_task(struct rq *this_rq)
>>  		if (double_lock_balance(this_rq, src_rq)) {
>>  			struct task_struct *old_next = next;
>>  
>> -			next = pick_next_task_rt(this_rq);
>> +			next = _pick_next_task_rt(this_rq);
>>  			if (next != old_next)
>>  				ret = 1;
>>  		}
>> @@ -1217,7 +1286,7 @@ static void pre_schedule_rt(struct rq *rq, struct task_struct *prev)
>>   */
>>  static int needs_post_schedule_rt(struct rq *rq)
>>  {
>> -	return rq->rt.overloaded ? 1 : 0;
>> +	return has_pushable_tasks(rq);
>>  }
>>  
>>  static void post_schedule_rt(struct rq *rq)
>> @@ -1240,7 +1309,7 @@ static void task_wake_up_rt(struct rq *rq, struct task_struct *p)
>>  {
>>  	if (!task_running(rq, p) &&
>>  	    !test_tsk_need_resched(rq->curr) &&
>> -	    rq->rt.overloaded &&
>> +	    has_pushable_tasks(rq) &&
>>  	    p->rt.nr_cpus_allowed > 1)
>>  		push_rt_tasks(rq);
>>  }
>> @@ -1277,6 +1346,24 @@ static void set_cpus_allowed_rt(struct task_struct *p,
>>  	if (p->se.on_rq && (weight != p->rt.nr_cpus_allowed)) {
>>  		struct rq *rq = task_rq(p);
>>  
>> +		if (!task_current(rq, p)) {
>> +			/*
>> +			 * Make sure we dequeue this task from the pushable list
>> +			 * before going further.  It will either remain off of
>> +			 * the list because we are no longer pushable, or it
>> +			 * will be requeued.
>> +			 */
>> +			if (p->rt.nr_cpus_allowed > 1)
>> +				dequeue_pushable_task(rq, p);
>> +
>> +			/*
>> +			 * Requeue if our weight is changing and still > 1
>> +			 */
>> +			if (weight > 1)
>> +				enqueue_pushable_task(rq, p);
>> +
>> +		}
>> +
>>  		if ((p->rt.nr_cpus_allowed <= 1) && (weight > 1)) {
>>  			rq->rt.rt_nr_migratory++;
>>  		} else if ((p->rt.nr_cpus_allowed > 1) && (weight <= 1)) {
>> @@ -1453,6 +1540,9 @@ static void set_curr_task_rt(struct rq *rq)
>>  	struct task_struct *p = rq->curr;
>>  
>>  	p->se.exec_start = rq->clock;
>> +
>> +	/* The running task is never eligible for pushing */
>> +	dequeue_pushable_task(rq, p);
>>  }
>>  
>>  static const struct sched_class rt_sched_class = {
>>
>>     
>
> OK, I like the patch, but I'm not 100% sure that it doesn't have bugs.
> I'll ACK it, but this definitely needs to be heavily tested before
> going mainline.  But for linux-tip, -mm, or linux-next, I think it is,
> on the surface, good to go (with the added BUG_ONs).
>
> I'll pull it into -rt as well.
>
> Acked-by: Steven Rostedt <srostedt@redhat.com>
>   

Thanks Steve, I'll spin a v4 with the fixes you requested and get it out
to you.

-Greg
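
P.S. One subtlety for reviewers less familiar with plists: a plist node
carries its own sort key, so enqueue_pushable_task() has to re-init the
node with the task's current prio before re-adding it.  That pattern,
shown in isolation (illustrative only; same code as the hunk above):

	/* requeue p at its (possibly changed) priority */
	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
	plist_node_init(&p->pushable_tasks, p->prio);	/* set sort key */
	plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);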



