* [PATCH] sched: change pulling RT task to be pulling the highest-prio run-queue first
From: Hillf Danton @ 2011-05-28 14:34 UTC
  To: LKML
  Cc: Yong Zhang, Mike Galbraith, Steven Rostedt, Peter Zijlstra, Ingo Molnar

When pulling, RT tasks are currently pulled from one overloaded run-queue
after another; change this to pull tasks from the highest-prio run-queue
first.

A new function, cpupri_find_prio(), is added to ease pulling in priority
order.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
---

--- tip-git/kernel/sched_rt.c	Sun May 22 20:12:01 2011
+++ sched_rt.c	Sat May 28 21:24:13 2011
@@ -1434,18 +1434,33 @@ static void push_rt_tasks(struct rq *rq)
 		;
 }

+static DEFINE_PER_CPU(cpumask_var_t, high_cpu_mask);
+
 static int pull_rt_task(struct rq *this_rq)
 {
 	int this_cpu = this_rq->cpu, ret = 0, cpu;
 	struct task_struct *p;
 	struct rq *src_rq;
+	struct cpumask *high_mask = __get_cpu_var(high_cpu_mask);
+	int prio = 0;

 	if (likely(!rt_overloaded(this_rq)))
 		return 0;
+loop:
+	if (! (prio < this_rq->rt.highest_prio.curr))
+		return ret;
+
+	if (! cpupri_find_prio(&this_rq->rd->cpupri, prio,
+				this_rq->rd->rto_mask, high_mask)) {
+		prio++;
+		goto loop;
+	}

-	for_each_cpu(cpu, this_rq->rd->rto_mask) {
-		if (this_cpu == cpu)
-			continue;
+	if (cpumask_test_cpu(this_cpu, high_mask))
+		/* no efforts needed if this RQ is high enough in prio */
+		return ret;
+
+	for_each_cpu(cpu, high_mask) {

 		src_rq = cpu_rq(cpu);

@@ -1508,8 +1523,13 @@ static int pull_rt_task(struct rq *this_
 		}
 skip:
 		double_unlock_balance(this_rq, src_rq);
+		if (ret)
+			goto done;
 	}

+	prio++;
+	goto loop;
+done:
 	return ret;
 }

@@ -1630,9 +1650,12 @@ static inline void init_sched_rt_class(v
 {
 	unsigned int i;

-	for_each_possible_cpu(i)
+	for_each_possible_cpu(i) {
 		zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
 					GFP_KERNEL, cpu_to_node(i));
+		zalloc_cpumask_var_node(&per_cpu(high_cpu_mask, i),
+					GFP_KERNEL, cpu_to_node(i));
+	}
 }
 #endif /* CONFIG_SMP */

--- tip-git/kernel/sched_cpupri.h	Sat May 14 15:21:56 2011
+++ sched_cpupri.h	Sat May 28 21:29:26 2011
@@ -26,6 +26,7 @@ struct cpupri {
 #ifdef CONFIG_SMP
 int  cpupri_find(struct cpupri *cp,
 		 struct task_struct *p, struct cpumask *lowest_mask);
+int cpupri_find_prio(struct cpupri *,int, struct cpumask *, struct cpumask *);
 void cpupri_set(struct cpupri *cp, int cpu, int pri);
 int cpupri_init(struct cpupri *cp);
 void cpupri_cleanup(struct cpupri *cp);
--- tip-git/kernel/sched_cpupri.c	Sat May 14 15:21:56 2011
+++ sched_cpupri.c	Sat May 28 21:46:51 2011
@@ -202,3 +202,33 @@ void cpupri_cleanup(struct cpupri *cp)
 	for (i = 0; i < CPUPRI_NR_PRIORITIES; i++)
 		free_cpumask_var(cp->pri_to_cpu[i].mask);
 }
+
+/*
+ * cpupri_find_prio - find CPUs for a given prio
+ * @cp: The cpupri context
+ * @task_prio: The given prio for which CPUs are selected
+ * @filter_mask: A mask to filter selected CPUs
+ * @ret_mask: A mask to fill in with selected CPUs
+ *
+ * Returns: (int)bool - CPUs were found
+ */
+int cpupri_find_prio(struct cpupri *cp, int task_prio,
+			struct cpumask *filter_mask,
+			struct cpumask *ret_mask)
+{
+	task_prio = convert_prio(task_prio);
+
+	if (cp->pri_active[task_prio / BITS_PER_LONG] &
+		  (1UL << (task_prio % BITS_PER_LONG))) {
+
+		struct cpupri_vec *vec  = &cp->pri_to_cpu[task_prio];
+
+		cpumask_and(ret_mask, filter_mask, vec->mask);
+
+		if (cpumask_any(ret_mask) < nr_cpu_ids)
+			return 1;
+	}
+
+	return 0;
+}
+
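
For illustration only (not part of the patch): below is a minimal userspace
sketch of the pull order the change aims for. A plain unsigned long stands in
for struct cpumask, find_cpus_at_prio() is a hypothetical stand-in for
cpupri_find_prio(), and lower numbers mean higher priority, matching the
rq->rt.highest_prio convention.

#include <stdio.h>

#define NR_CPUS 8

/* Highest-priority runnable task per CPU; lower value = higher prio. */
static int highest_prio[NR_CPUS] = { 10, 3, 99, 3, 7, 99, 10, 5 };

/* Mask of overloaded CPUs whose top waiting task sits exactly at @prio. */
static unsigned long find_cpus_at_prio(int prio, unsigned long rto_mask)
{
	unsigned long mask = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if ((rto_mask & (1UL << cpu)) && highest_prio[cpu] == prio)
			mask |= 1UL << cpu;
	return mask;
}

int main(void)
{
	unsigned long rto_mask = 0xdb;	/* pretend CPUs 0,1,3,4,6,7 are overloaded */
	int this_prio = 10;		/* prio of the task running on this CPU */
	int prio;

	/* Walk priorities from the highest (0) down to just above our own. */
	for (prio = 0; prio < this_prio; prio++) {
		unsigned long mask = find_cpus_at_prio(prio, rto_mask);

		if (!mask)
			continue;
		printf("prio %d: pull from CPU mask 0x%lx first\n", prio, mask);
		break;	/* like the patch, stop once a pull has succeeded */
	}
	return 0;
}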


* Re: [PATCH] sched: change pulling RT task to be pulling the highest-prio run-queue first
From: Steven Rostedt @ 2011-05-31 15:00 UTC
  To: Hillf Danton
  Cc: LKML, Yong Zhang, Mike Galbraith, Peter Zijlstra, Ingo Molnar

On Sat, 2011-05-28 at 22:34 +0800, Hillf Danton wrote:
> When pulling, RT tasks are currently pulled from one overloaded run-queue
> after another; change this to pull tasks from the highest-prio run-queue
> first.

First off, a change like this requires a rationale, preferably in the
form of benchmarks and test cases that demonstrate the problems of the
current scheduler and show us that these changes improve the situation.

There is neither a rationale nor any benchmarks explaining why this is
better than the current method.

> 
> A new function, cpupri_find_prio(), is added to ease pulling in priority
> order.
> 
> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> ---
> 
> --- tip-git/kernel/sched_rt.c	Sun May 22 20:12:01 2011
> +++ sched_rt.c	Sat May 28 21:24:13 2011
> @@ -1434,18 +1434,33 @@ static void push_rt_tasks(struct rq *rq)
>  		;
>  }
> 
> +static DEFINE_PER_CPU(cpumask_var_t, high_cpu_mask);
> +
>  static int pull_rt_task(struct rq *this_rq)
>  {
>  	int this_cpu = this_rq->cpu, ret = 0, cpu;
>  	struct task_struct *p;
>  	struct rq *src_rq;
> +	struct cpumask *high_mask = __get_cpu_var(high_cpu_mask);
> +	int prio = 0;
> 
>  	if (likely(!rt_overloaded(this_rq)))
>  		return 0;
> +loop:
> +	if (! (prio < this_rq->rt.highest_prio.curr))
> +		return ret;
> +
> +	if (! cpupri_find_prio(&this_rq->rd->cpupri, prio,
> +				this_rq->rd->rto_mask, high_mask)) {
> +		prio++;
> +		goto loop;
> +	}

This loop looks to be expensive in the hot path.

Note, in practice, not many RT tasks are running at the same time. If
this is not the case, then please explain what situation has multiple RT
tasks contending for more than one CPU, where RT tasks are forced to
migrate continuously, and how this patch fixes that situation.

I understand that the current code looks a bit expensive, as it loops
through the CPUs that are overloaded, and pulls over the RT tasks
waiting to run that are of higher priority than the one currently
running on this CPU. If it picks wrong, it could potentially pull over
more than one task.

But in practice (and I've traced this a while back), it seldom ever
happens.
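
For illustration, a rough standalone model of the two scan orders being
compared here (plain arrays stand in for the kernel's rq and cpumask state;
this is only a sketch, not the scheduler code):

#include <stdio.h>

#define NR_CPUS 4

/* Highest-priority waiting RT task per overloaded CPU; lower = higher prio. */
static int highest_prio[NR_CPUS] = { 50, 20, 5, 20 };

int main(void)
{
	int this_cpu = 0, cpu, prio;

	/* Current method: visit overloaded CPUs in CPU-number order. */
	printf("current order: ");
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == this_cpu)
			continue;
		printf("cpu%d(prio %d) ", cpu, highest_prio[cpu]);
	}
	printf("\n");

	/* Patched method: visit them in priority order, highest first. */
	printf("patched order: ");
	for (prio = 0; prio < highest_prio[this_cpu]; prio++)
		for (cpu = 0; cpu < NR_CPUS; cpu++)
			if (cpu != this_cpu && highest_prio[cpu] == prio)
				printf("cpu%d(prio %d) ", cpu, highest_prio[cpu]);
	printf("\n");
	return 0;
}

In this example the current order reaches the prio-20 queue on cpu1 before
the prio-5 queue on cpu2, which is the "picks wrong" case above; the patched
order visits cpu2 first, at the cost of scanning priority levels.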

But if you see that this code is hitting the slow path constantly, and
your code shows better performance, and you can demonstrate this via a
benchmark that I could use to reproduce, then I will consider taking
these changes.

-- Steve


> 
> -	for_each_cpu(cpu, this_rq->rd->rto_mask) {
> -		if (this_cpu == cpu)
> -			continue;
> +	if (cpumask_test_cpu(this_cpu, high_mask))
> +		/* no efforts needed if this RQ is high enough in prio */
> +		return ret;
> +
> +	for_each_cpu(cpu, high_mask) {
> 
>  		src_rq = cpu_rq(cpu);
> 
> @@ -1508,8 +1523,13 @@ static int pull_rt_task(struct rq *this_
>  		}
>  skip:
>  		double_unlock_balance(this_rq, src_rq);
> +		if (ret)
> +			goto done;
>  	}
> 
> +	prio++;
> +	goto loop;
> +done:
>  	return ret;
>  }
> 
> @@ -1630,9 +1650,12 @@ static inline void init_sched_rt_class(v
>  {
>  	unsigned int i;
> 
> -	for_each_possible_cpu(i)
> +	for_each_possible_cpu(i) {
>  		zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
>  					GFP_KERNEL, cpu_to_node(i));
> +		zalloc_cpumask_var_node(&per_cpu(high_cpu_mask, i),
> +					GFP_KERNEL, cpu_to_node(i));
> +	}
>  }
>  #endif /* CONFIG_SMP */
> 
> --- tip-git/kernel/sched_cpupri.h	Sat May 14 15:21:56 2011
> +++ sched_cpupri.h	Sat May 28 21:29:26 2011
> @@ -26,6 +26,7 @@ struct cpupri {
>  #ifdef CONFIG_SMP
>  int  cpupri_find(struct cpupri *cp,
>  		 struct task_struct *p, struct cpumask *lowest_mask);
> +int cpupri_find_prio(struct cpupri *,int, struct cpumask *, struct cpumask *);
>  void cpupri_set(struct cpupri *cp, int cpu, int pri);
>  int cpupri_init(struct cpupri *cp);
>  void cpupri_cleanup(struct cpupri *cp);
> --- tip-git/kernel/sched_cpupri.c	Sat May 14 15:21:56 2011
> +++ sched_cpupri.c	Sat May 28 21:46:51 2011
> @@ -202,3 +202,33 @@ void cpupri_cleanup(struct cpupri *cp)
>  	for (i = 0; i < CPUPRI_NR_PRIORITIES; i++)
>  		free_cpumask_var(cp->pri_to_cpu[i].mask);
>  }
> +
> +/*
> + * cpupri_find_prio - find CPUs for a given prio
> + * @cp: The cpupri context
> + * @task_prio: The given prio for which CPUs are selected
> + * @filter_mask: A mask to filter selected CPUs
> + * @ret_mask: A mask to fill in with selected CPUs
> + *
> + * Returns: (int)bool - CPUs were found
> + */
> +int cpupri_find_prio(struct cpupri *cp, int task_prio,
> +			struct cpumask *filter_mask,
> +			struct cpumask *ret_mask)
> +{
> +	task_prio = convert_prio(task_prio);
> +
> +	if (cp->pri_active[task_prio / BITS_PER_LONG] &
> +		  (1UL << (task_prio % BITS_PER_LONG))) {
> +
> +		struct cpupri_vec *vec  = &cp->pri_to_cpu[task_prio];
> +
> +		cpumask_and(ret_mask, filter_mask, vec->mask);
> +
> +		if (cpumask_any(ret_mask) < nr_cpu_ids)
> +			return 1;
> +	}
> +
> +	return 0;
> +}
> +




* Re: [PATCH] sched: change pulling RT task to be pulling the highest-prio run-queue first
From: Hillf Danton @ 2011-06-03 15:11 UTC
  To: Steven Rostedt
  Cc: LKML, Yong Zhang, Mike Galbraith, Peter Zijlstra, Ingo Molnar

On Tue, May 31, 2011 at 11:00 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Sat, 2011-05-28 at 22:34 +0800, Hillf Danton wrote:
>> When pulling, RT tasks are currently pulled from one overloaded run-queue
>> after another; change this to pull tasks from the highest-prio run-queue
>> first.
>
> First off, a change like this requires a rationale, preferably in the
> form of benchmarks and test cases that demonstrate the problems of the
> current scheduler and show us that these changes improve the situation.
>
> There is neither a rationale nor any benchmarks explaining why this is
> better than the current method.
>

Hi Steven

Thanks for your review; it points out what the patch is missing: a test case.


>>
>> A new function, cpupri_find_prio(), is added to ease pulling in priority
>> order.
>>
>> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> ---
>>
>> --- tip-git/kernel/sched_rt.c Sun May 22 20:12:01 2011
>> +++ sched_rt.c        Sat May 28 21:24:13 2011
>> @@ -1434,18 +1434,33 @@ static void push_rt_tasks(struct rq *rq)
>>               ;
>>  }
>>
>> +static DEFINE_PER_CPU(cpumask_var_t, high_cpu_mask);
>> +
>>  static int pull_rt_task(struct rq *this_rq)
>>  {
>>       int this_cpu = this_rq->cpu, ret = 0, cpu;
>>       struct task_struct *p;
>>       struct rq *src_rq;
>> +     struct cpumask *high_mask = __get_cpu_var(high_cpu_mask);
>> +     int prio = 0;
>>
>>       if (likely(!rt_overloaded(this_rq)))
>>               return 0;
>> +loop:
>> +     if (! (prio < this_rq->rt.highest_prio.curr))
>> +             return ret;
>> +
>> +     if (! cpupri_find_prio(&this_rq->rd->cpupri, prio,
>> +                             this_rq->rd->rto_mask, high_mask)) {
>> +             prio++;
>> +             goto loop;
>> +     }
>
> This loop looks to be expensive in the hot path.
>

You are right; in the worst case the introduced overhead is
this_rq->rt.highest_prio.curr bit tests of the form

        if (cp->pri_active[task_prio / BITS_PER_LONG] &
             (1UL << ((BITS_PER_LONG - 1) - (task_prio % BITS_PER_LONG)))) {

which I think slows down the hot path a lot :/
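
For concreteness, a tiny standalone demo of that per-priority bit test
(using the word/bit indexing from the patch; BITS_PER_LONG taken from the
host, so this is only an illustration):

#include <stdio.h>

#define BITS_PER_LONG	(8 * (int)sizeof(unsigned long))

int main(void)
{
	unsigned long pri_active[2] = { 0 };
	int prio = 40;	/* some converted priority index */

	/* Mark the vector for this prio as non-empty... */
	pri_active[prio / BITS_PER_LONG] |= 1UL << (prio % BITS_PER_LONG);

	/* ...so the per-prio test in the pull loop is one word load plus an AND. */
	if (pri_active[prio / BITS_PER_LONG] & (1UL << (prio % BITS_PER_LONG)))
		printf("prio %d maps to word %d, bit %d\n",
		       prio, prio / BITS_PER_LONG, prio % BITS_PER_LONG);
	return 0;
}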

> Note, in practice, not many RT tasks are running at the same time. If
> this is not the case, then please explain what situation has multiple RT
> tasks contending for more than one CPU, where RT tasks are forced to
> migrate continuously, and how this patch fixes that situation.
>

The situation is hard to construct; I guess it is only captured by
rt_overloaded().


> I understand that the current code looks a bit expensive, as it loops
> through the CPUs that are overloaded, and pulls over the RT tasks
> waiting to run that are of higher priority than the one currently
> running on this CPU. If it picks wrong, it could potentially pull over
> more than one task.
>
> But in practice (and I've traced this a while back), it seldom ever
> happens.
>
> But if you see that this code is hitting the slow path constantly, and
> your code shows better performance, and you can demonstrate this via a
> benchmark that I could use to reproduce, then I will consider taking
> these changes.
>

Since you have already traced it, I believe that hitting the slow path
constantly does not happen.

thanks
           Hillf

