linux-kernel.vger.kernel.org archive mirror
* [PATCH v8] sched/deadline: support dl task migration during cpu hotplug
@ 2015-02-25 11:50 Wanpeng Li
  2015-03-02 12:11 ` Juri Lelli
  0 siblings, 1 reply; 3+ messages in thread
From: Wanpeng Li @ 2015-02-25 11:50 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra; +Cc: Juri Lelli, linux-kernel, Wanpeng Li

I observe that a dl task can't be migrated to other cpus during cpu hotplug;
in addition, the task may or may not run again if the cpu is added back. The
root cause I found is that the dl task is throttled and removed from the
dl rq after consuming all of its budget, so the stop task can't pick it up
from the dl rq and migrate it to other cpus during hotplug.

The method to reproduce:
schedtool -E -t 50000:100000 -e ./test
Here test is just a simple for loop. Then observe which cpu the test
task is on, and take that cpu (cpuN) offline:
echo 0 > /sys/devices/system/cpu/cpuN/online

This patch adds dl task migration during cpu hotplug: after the dl timer
fires, if the current rq is offline, find the most suitable later deadline
rq; if no suitable later deadline rq is found, fall back to any eligible
online cpu so that the deadline task will come back to us, and the
push/pull mechanism should then move it around properly.
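
The two-step target selection described above can be modelled in user space.
This is only a rough sketch and not the kernel code: the toy_* names and the
64-bit mask type are hypothetical stand-ins, later_cpu < 0 stands for
"find_lock_later_rq() found nothing", and the toy lookup simply returns the
lowest common cpu. Locking and the actual migration are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cpu mask (hypothetical): bit i set means cpu i is in the set. */
typedef uint64_t toy_cpumask_t;

/*
 * Return the lowest-numbered cpu present in both masks, or nr_cpus if the
 * intersection is empty. The "result >= nr_cpus means nothing found"
 * convention mirrors the cpu >= nr_cpu_ids check in the patch.
 */
static unsigned int toy_any_and(toy_cpumask_t a, toy_cpumask_t b,
				unsigned int nr_cpus)
{
	assert(nr_cpus <= 64);

	toy_cpumask_t both = a & b;

	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	return nr_cpus;
}

/*
 * Pick the cpu to migrate to: prefer the later-deadline cpu if one was
 * found (later_cpu >= 0), otherwise fall back to any cpu that is both
 * active and allowed for the task. Returns -1 if there is no such cpu.
 */
static int toy_pick_target(int later_cpu, toy_cpumask_t active,
			   toy_cpumask_t allowed, unsigned int nr_cpus)
{
	if (later_cpu >= 0)
		return later_cpu;

	unsigned int cpu = toy_any_and(active, allowed, nr_cpus);

	if (cpu >= nr_cpus)
		return -1;	/* the "task will never come back" case */
	return (int)cpu;
}
```

With active = 0x6 (cpus 1 and 2) and allowed = 0xC (cpus 2 and 3), the
fallback picks cpu 2; with disjoint masks it reports failure, which is the
case the patch warns about.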

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
---
v7 -> v8:
 * remove rd->span related modification since Pang's commit 16b269436b72 
   (sched/deadline: Modify cpudl::free_cpus to reflect rd->online) merged 
   upstream, which Juri pointed out can handle the exclusive cpusets.
 * rebase 
v6 -> v7:
 * rebase
v5 -> v6:
 * add double_lock_balance in the fallback path
v4 -> v5:
 * remove raw_spin_unlock(&rq->lock)
 * clean up code, spotted by Peterz
 * clean up patch description
v3 -> v4:
 * use tsk_cpus_allowed wrapper
 * fix compile error
v2 -> v3:
 * don't get_task_struct
 * if cannot preempt any rq, fallback to pick any online cpus
 * use cpu_active_mask as original later_mask if cpu is offline
v1 -> v2:
 * push the task to another cpu in dl_task_timer() if rq is offline.
 kernel/sched/deadline.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 3fa8fa6..49f92c8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -492,6 +492,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
 	return hrtimer_active(&dl_se->dl_timer);
 }
 
+static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
 /*
  * This is the bandwidth enforcement timer callback. If here, we know
  * a task is not on its dl_rq, since the fact that the timer was running
@@ -537,6 +538,43 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	update_rq_clock(rq);
 
 	/*
+	 * So if we find that the rq the task was on is no longer
+	 * available, we need to select a new rq.
+	 */
+	if (unlikely(!rq->online)) {
+		struct rq *later_rq = NULL;
+
+		later_rq = find_lock_later_rq(p, rq);
+
+		if (!later_rq) {
+			int cpu;
+
+			/*
+			 * If cannot preempt any rq, fallback to pick any
+			 * online cpu.
+			 */
+			cpu = cpumask_any_and(cpu_active_mask,
+					tsk_cpus_allowed(p));
+			if (cpu >= nr_cpu_ids) {
+				pr_warn("fail to find any online cpu and task will never come back\n");
+				goto unlock;
+			}
+			later_rq = cpu_rq(cpu);
+			double_lock_balance(rq, later_rq);
+		}
+
+		deactivate_task(rq, p, 0);
+		set_task_cpu(p, later_rq->cpu);
+		activate_task(later_rq, p, ENQUEUE_REPLENISH);
+
+		resched_curr(later_rq);
+
+		double_unlock_balance(rq, later_rq);
+
+		goto unlock;
+	}
+
+	/*
 	 * If the throttle happened during sched-out; like:
 	 *
 	 *   schedule()
-- 
1.9.1



* Re: [PATCH v8] sched/deadline: support dl task migration during cpu hotplug
  2015-02-25 11:50 [PATCH v8] sched/deadline: support dl task migration during cpu hotplug Wanpeng Li
@ 2015-03-02 12:11 ` Juri Lelli
  2015-03-02 22:59   ` Wanpeng Li
  0 siblings, 1 reply; 3+ messages in thread
From: Juri Lelli @ 2015-03-02 12:11 UTC (permalink / raw)
  To: Wanpeng Li, Ingo Molnar, Peter Zijlstra; +Cc: linux-kernel

Hi,

On 25/02/2015 11:50, Wanpeng Li wrote:
> I observe that dl task can't be migrated to other cpus during cpu hotplug,
> in addition, task may/may not be running again if cpu is added back. The
> root cause which I found is that dl task will be throttled and removed from
> dl rq after consuming all budget, which leads to stop task can't pick it up
> from dl rq and migrate to other cpus during hotplug.
> 
> The method to reproduce:
> schedtool -E -t 50000:100000 -e ./test
> Actually test is just a simple for loop. Then observe which cpu the test
> task is on.
> echo 0 > /sys/devices/system/cpu/cpuN/online
> 
> This patch adds the dl task migration during cpu hotplug by finding a most
> suitable later deadline rq after dl timer fire if current rq is offline,
> if fail to find a suitable later deadline rq then fallback to any eligible
> online cpu in order that the deadline task will come back to us, and the
> push/pull mechanism should then move it around properly.
> 
> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
> ---
> v7 -> v8:
>  * remove rd->span related modification since Pang's commit 16b269436b72 
>    (sched/deadline: Modify cpudl::free_cpus to reflect rd->online) merged 
>    upstream, which Juri pointed out can handle the exclusive cpusets.
>  * rebase 
> v6 -> v7:
>  * rebase
> v5 -> v6:
>  * add double_lock_balance in the fallback path
> v4 -> v5:
>  * remove raw_spin_unlock(&rq->lock)
>  * cleanup codes, spotted by Peterz
>  * cleanup patch description
> v3 -> v4:
>  * use tsk_cpus_allowed wrapper
>  * fix compile error
> v2 -> v3:
>  * don't get_task_struct
>  * if cannot preempt any rq, fallback to pick any online cpus
>  * use cpu_active_mask as original later_mask if cpu is offline
> v1 -> v2:
>  * push the task to another cpu in dl_task_timer() if rq is offline.
>  kernel/sched/deadline.c | 38 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 38 insertions(+)
> 
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 3fa8fa6..49f92c8 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -492,6 +492,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
>  	return hrtimer_active(&dl_se->dl_timer);
>  }
>  
> +static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
>  /*
>   * This is the bandwidth enforcement timer callback. If here, we know
>   * a task is not on its dl_rq, since the fact that the timer was running
> @@ -537,6 +538,43 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>  	update_rq_clock(rq);
>  
>  	/*
> +	 * So if we find that the rq the task was on is no longer
> +	 * available, we need to select a new rq.
> +	 */
> +	if (unlikely(!rq->online)) {
> +		struct rq *later_rq = NULL;
> +
> +		later_rq = find_lock_later_rq(p, rq);
> +
> +		if (!later_rq) {
> +			int cpu;
> +
> +			/*
> +			 * If cannot preempt any rq, fallback to pick any
> +			 * online cpu.
> +			 */
> +			cpu = cpumask_any_and(cpu_active_mask,
> +					tsk_cpus_allowed(p));

Please align this to cpu_active_mask above.

> +			if (cpu >= nr_cpu_ids) {
> +				pr_warn("fail to find any online cpu and task will never come back\n");

Wouldn't a WARN_ON(1) be better here? It is a pretty
serious situation.

> +				goto unlock;
> +			}
> +			later_rq = cpu_rq(cpu);
> +			double_lock_balance(rq, later_rq);
> +		}
> +
> +		deactivate_task(rq, p, 0);
> +		set_task_cpu(p, later_rq->cpu);
> +		activate_task(later_rq, p, ENQUEUE_REPLENISH);
> +
> +		resched_curr(later_rq);

Your later_rq can also come from the cpumask_any_and() fallback, so we
should check whether a resched is actually needed here.
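
To make the point concrete, here is a rough user-space sketch of such a
check. It is not the kernel implementation: struct toy_rq and the toy_*
helpers are hypothetical, and the model assumes that a deadline task should
preempt any non-deadline task (ignoring the stop class) and should preempt
another deadline task only if its absolute deadline is earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Wraparound-safe "a is earlier than b" for 64-bit absolute deadlines. */
static bool toy_deadline_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Hypothetical, minimal view of what is currently running on a rq. */
struct toy_rq {
	bool curr_is_deadline;	/* is the current task a deadline task? */
	uint64_t curr_deadline;	/* its absolute deadline, if so */
};

/*
 * Decide whether a task with absolute deadline new_deadline, just
 * enqueued on rq, should trigger a reschedule there.
 */
static bool toy_needs_resched(const struct toy_rq *rq, uint64_t new_deadline)
{
	assert(rq != NULL);

	/* Assumption: a deadline task preempts any non-deadline task. */
	if (!rq->curr_is_deadline)
		return true;

	/* Among deadline tasks, only an earlier deadline preempts. */
	return toy_deadline_before(new_deadline, rq->curr_deadline);
}
```

A rq returned by find_lock_later_rq() is one the task is expected to
preempt, while a rq picked by the fallback may be running a task with an
earlier deadline, in which case the unconditional resched is not needed.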

Best,

- Juri

> +
> +		double_unlock_balance(rq, later_rq);
> +
> +		goto unlock;
> +	}
> +
> +	/*
>  	 * If the throttle happened during sched-out; like:
>  	 *
>  	 *   schedule()
> 



* Re: [PATCH v8] sched/deadline: support dl task migration during cpu hotplug
  2015-03-02 12:11 ` Juri Lelli
@ 2015-03-02 22:59   ` Wanpeng Li
  0 siblings, 0 replies; 3+ messages in thread
From: Wanpeng Li @ 2015-03-02 22:59 UTC (permalink / raw)
  To: Juri Lelli; +Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, Wanpeng Li

Hi Juri,
On Mon, Mar 02, 2015 at 12:11:48PM +0000, Juri Lelli wrote:
>Hi,
>
>On 25/02/2015 11:50, Wanpeng Li wrote:
>> I observe that dl task can't be migrated to other cpus during cpu hotplug,
>> in addition, task may/may not be running again if cpu is added back. The
>> root cause which I found is that dl task will be throttled and removed from
>> dl rq after consuming all budget, which leads to stop task can't pick it up
>> from dl rq and migrate to other cpus during hotplug.
>> 
>> The method to reproduce:
>> schedtool -E -t 50000:100000 -e ./test
>> Actually test is just a simple for loop. Then observe which cpu the test
>> task is on.
>> echo 0 > /sys/devices/system/cpu/cpuN/online
>> 
>> This patch adds the dl task migration during cpu hotplug by finding a most
>> suitable later deadline rq after dl timer fire if current rq is offline,
>> if fail to find a suitable later deadline rq then fallback to any eligible
>> online cpu in order that the deadline task will come back to us, and the
>> push/pull mechanism should then move it around properly.
>> 
>> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
>> ---
>> v7 -> v8:
>>  * remove rd->span related modification since Pang's commit 16b269436b72 
>>    (sched/deadline: Modify cpudl::free_cpus to reflect rd->online) merged 
>>    upstream, which Juri pointed out can handle the exclusive cpusets.
>>  * rebase 
>> v6 -> v7:
>>  * rebase
>> v5 -> v6:
>>  * add double_lock_balance in the fallback path
>> v4 -> v5:
>>  * remove raw_spin_unlock(&rq->lock)
>>  * cleanup codes, spotted by Peterz
>>  * cleanup patch description
>> v3 -> v4:
>>  * use tsk_cpus_allowed wrapper
>>  * fix compile error
>> v2 -> v3:
>>  * don't get_task_struct
>>  * if cannot preempt any rq, fallback to pick any online cpus
>>  * use cpu_active_mask as original later_mask if cpu is offline
>> v1 -> v2:
>>  * push the task to another cpu in dl_task_timer() if rq is offline.
>>  kernel/sched/deadline.c | 38 ++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 38 insertions(+)
>> 
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 3fa8fa6..49f92c8 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -492,6 +492,7 @@ static int start_dl_timer(struct sched_dl_entity *dl_se, bool boosted)
>>  	return hrtimer_active(&dl_se->dl_timer);
>>  }
>>  
>> +static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq);
>>  /*
>>   * This is the bandwidth enforcement timer callback. If here, we know
>>   * a task is not on its dl_rq, since the fact that the timer was running
>> @@ -537,6 +538,43 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
>>  	update_rq_clock(rq);
>>  
>>  	/*
>> +	 * So if we find that the rq the task was on is no longer
>> +	 * available, we need to select a new rq.
>> +	 */
>> +	if (unlikely(!rq->online)) {
>> +		struct rq *later_rq = NULL;
>> +
>> +		later_rq = find_lock_later_rq(p, rq);
>> +
>> +		if (!later_rq) {
>> +			int cpu;
>> +
>> +			/*
>> +			 * If cannot preempt any rq, fallback to pick any
>> +			 * online cpu.
>> +			 */
>> +			cpu = cpumask_any_and(cpu_active_mask,
>> +					tsk_cpus_allowed(p));
>
>Please align this to cpu_active_mask above.

Ok.

>
>> +			if (cpu >= nr_cpu_ids) {
>> +				pr_warn("fail to find any online cpu and task will never come back\n");
>
>Wouldn't a WARN_ON(1) be better here? It is a pretty
>serious situation.

Good idea.

>
>> +				goto unlock;
>> +			}
>> +			later_rq = cpu_rq(cpu);
>> +			double_lock_balance(rq, later_rq);
>> +		}
>> +
>> +		deactivate_task(rq, p, 0);
>> +		set_task_cpu(p, later_rq->cpu);
>> +		activate_task(later_rq, p, ENQUEUE_REPLENISH);
>> +
>> +		resched_curr(later_rq);
>
>Your later_rq can also come from the cpumask_any_and(), we
>should check if we need a resched here.

I will add the check in the next version. Many thanks for your review. ;-)

Regards,
Wanpeng Li 

>
>Best,
>
>- Juri
>
>> +
>> +		double_unlock_balance(rq, later_rq);
>> +
>> +		goto unlock;
>> +	}
>> +
>> +	/*
>>  	 * If the throttle happened during sched-out; like:
>>  	 *
>>  	 *   schedule()
>> 
