* [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
       [not found] <20140624074148.8738.57690.stgit@tkhai>
@ 2014-06-24  7:53 ` Kirill Tkhai
  2014-06-24 17:03   ` bsegall
  2014-06-26 11:08   ` Srikar Dronamraju
  2014-06-24  7:53 ` [PATCH v2 2/3] sched/rt: __disable_runtime: Enqueue just unthrottled rt_rq back on the stack Kirill Tkhai
  2014-06-24  7:54 ` [PATCH v2 3/3] sched: Rework check_for_tasks() Kirill Tkhai
  2 siblings, 2 replies; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-24  7:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, tkhai, Srikar Dronamraju,
	Mike Galbraith, khorenko, Ben Segall, Paul Turner

We kill rq->rd on the CPU_DOWN_PREPARE stage:

	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline

This unthrottles all throttled cfs_rqs.

But the cpu is still able to call schedule() till

	take_cpu_down->__cpu_disable()

is called from stop_machine.

In this case the tasks from the just unthrottled cfs_rqs are pickable
in the standard scheduler way, and they are picked by the dying cpu.
The cfs_rqs become throttled again, and migrate_tasks()
in migration_call skips their tasks (one more unthrottle
in migrate_tasks()->CPU_DYING does not happen, because rq->rd
is already NULL).

This patch sets runtime_enabled to zero. This guarantees the runtime
is not accounted, the cfs_rqs won't exceed the given
cfs_rq->runtime_remaining = 1, and tasks will be pickable
in migrate_tasks(). runtime_enabled is recalculated again
when the rq becomes online again.
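
For reference, runtime_enabled is what gates the per-tick quota
accounting; the snippet below is roughly paraphrased from
kernel/sched/fair.c of that era (an illustration from memory, not part
of this patch):

	/* with runtime_enabled == 0 the quota accounting, and thus any
	 * new throttling, is skipped entirely */
	static __always_inline
	void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
	{
		if (!cfs_bandwidth_used() || !cfs_rq->runtime_enabled)
			return;

		__account_cfs_rq_runtime(cfs_rq, delta_exec);
	}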

Ben Segall also noticed that we always enable runtime in
tg_set_cfs_bandwidth(). Actually, we should do that for online
cpus only. To fix that, we check whether a cpu is online while
its rq is locked. This guarantees we do not race with
set_rq_offline(), which also requires rq->lock.
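
For context, set_rq_offline() itself is called with rq->lock held (it
is reached from rq_attach_root(), which takes the lock), so a check
done under the same lock cannot race with it. Roughly paraphrased from
kernel/sched/core.c, not part of this patch:

	static void set_rq_offline(struct rq *rq)
	{
		if (rq->online) {
			const struct sched_class *class;

			for_each_class(class) {
				if (class->rq_offline)
					class->rq_offline(rq); /* -> rq_offline_fair() */
			}

			cpumask_clear_cpu(rq->cpu, rq->rd->online);
			rq->online = 0;
		}
	}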

v2: Fix race with tg_set_cfs_bandwidth().
    Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
CC: Konstantin Khorenko <khorenko@parallels.com>
CC: Ben Segall <bsegall@google.com>
CC: Paul Turner <pjt@google.com>
CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Mike Galbraith <umgwanakikbuti@gmail.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |   15 +++++++++++----
 kernel/sched/fair.c |   22 ++++++++++++++++++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f3063c..707a3c5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 		struct rq *rq = cfs_rq->rq;
 
 		raw_spin_lock_irq(&rq->lock);
-		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
+		/*
+		 * Do not enable runtime on offline runqueues. We specially
+		 * make it disabled in unthrottle_offline_cfs_rqs().
+		 */
+		if (cpu_online(i)) {
+			cfs_rq->runtime_enabled = runtime_enabled;
+			cfs_rq->runtime_remaining = 0;
+
+			if (cfs_rq->throttled)
+				unthrottle_cfs_rq(cfs_rq);
+		}
 
-		if (cfs_rq->throttled)
-			unthrottle_cfs_rq(cfs_rq);
 		raw_spin_unlock_irq(&rq->lock);
 	}
 	if (runtime_was_enabled && !runtime_enabled)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f9c457..5616d23 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3776,6 +3776,19 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 	hrtimer_cancel(&cfs_b->slack_timer);
 }
 
+static void __maybe_unused update_runtime_enabled(struct rq *rq)
+{
+	struct cfs_rq *cfs_rq;
+
+	for_each_leaf_cfs_rq(rq, cfs_rq) {
+		struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+
+		raw_spin_lock(&cfs_b->lock);
+		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
+		raw_spin_unlock(&cfs_b->lock);
+	}
+}
+
 static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 {
 	struct cfs_rq *cfs_rq;
@@ -3789,6 +3802,12 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 		 * there's some valid quota amount
 		 */
 		cfs_rq->runtime_remaining = 1;
+		/*
+		 * Offline rq is schedulable till cpu is completely disabled
+		 * in take_cpu_down(), so we prevent new cfs throttling here.
+		 */
+		cfs_rq->runtime_enabled = 0;
+
 		if (cfs_rq_throttled(cfs_rq))
 			unthrottle_cfs_rq(cfs_rq);
 	}
@@ -3832,6 +3851,7 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
 	return NULL;
 }
 static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
+static inline void update_runtime_enabled(struct rq *rq) {}
 static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
 
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -7325,6 +7345,8 @@ void trigger_load_balance(struct rq *rq)
 static void rq_online_fair(struct rq *rq)
 {
 	update_sysctl();
+
+	update_runtime_enabled(rq);
 }
 
 static void rq_offline_fair(struct rq *rq)




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/3] sched/rt: __disable_runtime: Enqueue just unthrottled rt_rq back on the stack
       [not found] <20140624074148.8738.57690.stgit@tkhai>
  2014-06-24  7:53 ` [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq Kirill Tkhai
@ 2014-06-24  7:53 ` Kirill Tkhai
  2014-06-24  7:54 ` [PATCH v2 3/3] sched: Rework check_for_tasks() Kirill Tkhai
  2 siblings, 0 replies; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-24  7:53 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, tkhai, Srikar Dronamraju,
	Mike Galbraith, khorenko, Ben Segall, Paul Turner


Make rt_rq available for pick_next_task(). Otherwise, its tasks
stay imprisoned for a long time, until the dead cpu becomes alive again.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
CC: Konstantin Khorenko <khorenko@parallels.com>
CC: Ben Segall <bsegall@google.com>
CC: Paul Turner <pjt@google.com>
CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Mike Galbraith <umgwanakikbuti@gmail.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/rt.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a490831..671a8b5 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -740,6 +740,9 @@ static void __disable_runtime(struct rq *rq)
 		rt_rq->rt_throttled = 0;
 		raw_spin_unlock(&rt_rq->rt_runtime_lock);
 		raw_spin_unlock(&rt_b->rt_runtime_lock);
+
+		/* Make rt_rq available for pick_next_task() */
+		sched_rt_rq_enqueue(rt_rq);
 	}
 }
 




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/3] sched: Rework check_for_tasks()
       [not found] <20140624074148.8738.57690.stgit@tkhai>
  2014-06-24  7:53 ` [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq Kirill Tkhai
  2014-06-24  7:53 ` [PATCH v2 2/3] sched/rt: __disable_runtime: Enqueue just unthrottled rt_rq back on the stack Kirill Tkhai
@ 2014-06-24  7:54 ` Kirill Tkhai
  2 siblings, 0 replies; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-24  7:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, tkhai, Srikar Dronamraju,
	Mike Galbraith, khorenko, Ben Segall, Paul Turner


1) Iterate through all threads in the system.
   Check all threads, not only group leaders
   (see the sketch after this list).

2) Check p->on_rq instead of p->state and cputime.
   A preempted task in a !TASK_RUNNING state or a just
   created task may be queued, and we want it to be
   reported too.

3) Use read_lock() instead of write_lock().
   This function does not change any structures, so
   read_lock() is enough.
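
A minimal sketch (not from the patch) contrasting the two iterators
mentioned in point 1:

	struct task_struct *g, *p;

	for_each_process(p) {
		/* visits one task per thread group (the group leader) */
	}

	do_each_thread(g, p) {
		/* visits every thread of every thread group */
	} while_each_thread(g, p);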

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
CC: Konstantin Khorenko <khorenko@parallels.com>
CC: Ben Segall <bsegall@google.com>
CC: Paul Turner <pjt@google.com>
CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
CC: Mike Galbraith <umgwanakikbuti@gmail.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Ingo Molnar <mingo@kernel.org>
---
 kernel/cpu.c |   33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a343bde..81e2a38 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -274,21 +274,28 @@ void clear_tasks_mm_cpumask(int cpu)
 	rcu_read_unlock();
 }
 
-static inline void check_for_tasks(int cpu)
+static inline void check_for_tasks(int dead_cpu)
 {
-	struct task_struct *p;
-	cputime_t utime, stime;
+	struct task_struct *g, *p;
 
-	write_lock_irq(&tasklist_lock);
-	for_each_process(p) {
-		task_cputime(p, &utime, &stime);
-		if (task_cpu(p) == cpu && p->state == TASK_RUNNING &&
-		    (utime || stime))
-			pr_warn("Task %s (pid = %d) is on cpu %d (state = %ld, flags = %x)\n",
-				p->comm, task_pid_nr(p), cpu,
-				p->state, p->flags);
-	}
-	write_unlock_irq(&tasklist_lock);
+	read_lock_irq(&tasklist_lock);
+	do_each_thread(g, p) {
+		if (!p->on_rq)
+			continue;
+		/*
+		 * We do the check with task_rq(p)->lock unlocked.
+		 * Order the reads so that we do not warn about a task
+		 * which was running on this cpu in the past and has
+		 * just been woken on another cpu.
+		 */
+		rmb();
+		if (task_cpu(p) != dead_cpu)
+			continue;
+
+		pr_warn("Task %s (pid=%d) is on cpu %d (state=%ld, flags=%x)\n",
+			p->comm, task_pid_nr(p), dead_cpu, p->state, p->flags);
+	} while_each_thread(g, p);
+	read_unlock_irq(&tasklist_lock);
 }
 
 struct take_cpu_down_param {




^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-24  7:53 ` [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq Kirill Tkhai
@ 2014-06-24 17:03   ` bsegall
  2014-06-24 19:01     ` Kirill Tkhai
  2014-06-26 11:08   ` Srikar Dronamraju
  1 sibling, 1 reply; 14+ messages in thread
From: bsegall @ 2014-06-24 17:03 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, tkhai,
	Srikar Dronamraju, Mike Galbraith, khorenko, Paul Turner

Kirill Tkhai <ktkhai@parallels.com> writes:

> We kill rq->rd on the CPU_DOWN_PREPARE stage:
>
> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>
> This unthrottles all throttled cfs_rqs.
>
> But the cpu is still able to call schedule() till
>
> 	take_cpu_down->__cpu_disable()
>
> is called from stop_machine.
>
> This case the tasks from just unthrottled cfs_rqs are pickable
> in a standard scheduler way, and they are picked by dying cpu.
> The cfs_rqs becomes throttled again, and migrate_tasks()
> in migration_call skips their tasks (one more unthrottle
> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> is already NULL).
>
> Patch sets runtime_enabled to zero. This guarantees, the runtime
> is not accounted, and the cfs_rqs won't exceed given
> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> in migrate_tasks(). runtime_enabled is recalculated again
> when rq becomes online again.
>
> Ben Segall also noticed, we always enable runtime in
> tg_set_cfs_bandwidth(). Actually, we should do that for online
> cpus only. To fix that, we check if a cpu is online when
> its rq is locked. This guarantees we do not have races with
> set_rq_offline(), which also requires rq->lock.
>
> v2: Fix race with tg_set_cfs_bandwidth().
>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>
> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> CC: Konstantin Khorenko <khorenko@parallels.com>
> CC: Ben Segall <bsegall@google.com>
> CC: Paul Turner <pjt@google.com>
> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Ingo Molnar <mingo@kernel.org>
> ---
>  kernel/sched/core.c |   15 +++++++++++----
>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
>  2 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7f3063c..707a3c5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>  		struct rq *rq = cfs_rq->rq;
>  
>  		raw_spin_lock_irq(&rq->lock);
> -		cfs_rq->runtime_enabled = runtime_enabled;
> -		cfs_rq->runtime_remaining = 0;
> +		/*
> +		 * Do not enable runtime on offline runqueues. We specially
> +		 * make it disabled in unthrottle_offline_cfs_rqs().
> +		 */
> +		if (cpu_online(i)) {
> +			cfs_rq->runtime_enabled = runtime_enabled;
> +			cfs_rq->runtime_remaining = 0;
> +
> +			if (cfs_rq->throttled)
> +				unthrottle_cfs_rq(cfs_rq);
> +		}

We can just do for_each_online_cpu, yes? Also we probably need
get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
right?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-24 17:03   ` bsegall
@ 2014-06-24 19:01     ` Kirill Tkhai
  2014-06-24 19:13       ` bsegall
  0 siblings, 1 reply; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-24 19:01 UTC (permalink / raw)
  To: bsegall
  Cc: Kirill Tkhai, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Srikar Dronamraju, Mike Galbraith, khorenko, Paul Turner

On 24.06.2014 21:03, bsegall@google.com wrote:
> Kirill Tkhai <ktkhai@parallels.com> writes:
> 
>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
>>
>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>>
>> This unthrottles all throttled cfs_rqs.
>>
>> But the cpu is still able to call schedule() till
>>
>> 	take_cpu_down->__cpu_disable()
>>
>> is called from stop_machine.
>>
>> This case the tasks from just unthrottled cfs_rqs are pickable
>> in a standard scheduler way, and they are picked by dying cpu.
>> The cfs_rqs becomes throttled again, and migrate_tasks()
>> in migration_call skips their tasks (one more unthrottle
>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>> is already NULL).
>>
>> Patch sets runtime_enabled to zero. This guarantees, the runtime
>> is not accounted, and the cfs_rqs won't exceed given
>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
>> in migrate_tasks(). runtime_enabled is recalculated again
>> when rq becomes online again.
>>
>> Ben Segall also noticed, we always enable runtime in
>> tg_set_cfs_bandwidth(). Actually, we should do that for online
>> cpus only. To fix that, we check if a cpu is online when
>> its rq is locked. This guarantees we do not have races with
>> set_rq_offline(), which also requires rq->lock.
>>
>> v2: Fix race with tg_set_cfs_bandwidth().
>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>>
>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
>> CC: Konstantin Khorenko <khorenko@parallels.com>
>> CC: Ben Segall <bsegall@google.com>
>> CC: Paul Turner <pjt@google.com>
>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
>> CC: Peter Zijlstra <peterz@infradead.org>
>> CC: Ingo Molnar <mingo@kernel.org>
>> ---
>>  kernel/sched/core.c |   15 +++++++++++----
>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
>>  2 files changed, 33 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 7f3063c..707a3c5 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>>  		struct rq *rq = cfs_rq->rq;
>>  
>>  		raw_spin_lock_irq(&rq->lock);
>> -		cfs_rq->runtime_enabled = runtime_enabled;
>> -		cfs_rq->runtime_remaining = 0;
>> +		/*
>> +		 * Do not enable runtime on offline runqueues. We specially
>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
>> +		 */
>> +		if (cpu_online(i)) {
>> +			cfs_rq->runtime_enabled = runtime_enabled;
>> +			cfs_rq->runtime_remaining = 0;
>> +
>> +			if (cfs_rq->throttled)
>> +				unthrottle_cfs_rq(cfs_rq);
>> +		}
> 
> We can just do for_each_online_cpu, yes? Also we probably need
> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
> right?
> 

Yes, we can use for_each_online_cpu/for_each_active_cpu with
get_online_cpus() taken. But that adds one more lock dependency.
This looks worse to me.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-24 19:01     ` Kirill Tkhai
@ 2014-06-24 19:13       ` bsegall
  2014-06-24 19:26         ` Kirill Tkhai
  0 siblings, 1 reply; 14+ messages in thread
From: bsegall @ 2014-06-24 19:13 UTC (permalink / raw)
  To: tkhai
  Cc: Kirill Tkhai, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Srikar Dronamraju, Mike Galbraith, khorenko, Paul Turner

Kirill Tkhai <tkhai@yandex.ru> writes:

> On 24.06.2014 21:03, bsegall@google.com wrote:
>> Kirill Tkhai <ktkhai@parallels.com> writes:
>> 
>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
>>>
>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>>>
>>> This unthrottles all throttled cfs_rqs.
>>>
>>> But the cpu is still able to call schedule() till
>>>
>>> 	take_cpu_down->__cpu_disable()
>>>
>>> is called from stop_machine.
>>>
>>> This case the tasks from just unthrottled cfs_rqs are pickable
>>> in a standard scheduler way, and they are picked by dying cpu.
>>> The cfs_rqs becomes throttled again, and migrate_tasks()
>>> in migration_call skips their tasks (one more unthrottle
>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>>> is already NULL).
>>>
>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
>>> is not accounted, and the cfs_rqs won't exceed given
>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
>>> in migrate_tasks(). runtime_enabled is recalculated again
>>> when rq becomes online again.
>>>
>>> Ben Segall also noticed, we always enable runtime in
>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
>>> cpus only. To fix that, we check if a cpu is online when
>>> its rq is locked. This guarantees we do not have races with
>>> set_rq_offline(), which also requires rq->lock.
>>>
>>> v2: Fix race with tg_set_cfs_bandwidth().
>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>>>
>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
>>> CC: Konstantin Khorenko <khorenko@parallels.com>
>>> CC: Ben Segall <bsegall@google.com>
>>> CC: Paul Turner <pjt@google.com>
>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
>>> CC: Peter Zijlstra <peterz@infradead.org>
>>> CC: Ingo Molnar <mingo@kernel.org>
>>> ---
>>>  kernel/sched/core.c |   15 +++++++++++----
>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
>>>  2 files changed, 33 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 7f3063c..707a3c5 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>>>  		struct rq *rq = cfs_rq->rq;
>>>  
>>>  		raw_spin_lock_irq(&rq->lock);
>>> -		cfs_rq->runtime_enabled = runtime_enabled;
>>> -		cfs_rq->runtime_remaining = 0;
>>> +		/*
>>> +		 * Do not enable runtime on offline runqueues. We specially
>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
>>> +		 */
>>> +		if (cpu_online(i)) {
>>> +			cfs_rq->runtime_enabled = runtime_enabled;
>>> +			cfs_rq->runtime_remaining = 0;
>>> +
>>> +			if (cfs_rq->throttled)
>>> +				unthrottle_cfs_rq(cfs_rq);
>>> +		}
>> 
>> We can just do for_each_online_cpu, yes? Also we probably need
>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
>> right?
>> 
>
> Yes, we can use for_each_online_cpu/for_each_active_cpu with
> get_online_cpus() taken. But it adds one more lock dependence.
> This looks worse for me.

I mean, you need get_online_cpus anyway - cpu_online is just a test
against the same mask that for_each_online_cpu uses, and without taking
the lock you can still race with offlining and reset runtime_enabled.
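
For illustration, a hypothetical variant of that loop with the hotplug
lock held might look like the untested sketch below (same identifiers
as the patch context, just reshuffled):

	get_online_cpus();
	for_each_online_cpu(i) {
		struct cfs_rq *cfs_rq = tg->cfs_rq[i];
		struct rq *rq = cfs_rq->rq;

		raw_spin_lock_irq(&rq->lock);
		cfs_rq->runtime_enabled = runtime_enabled;
		cfs_rq->runtime_remaining = 0;

		if (cfs_rq->throttled)
			unthrottle_cfs_rq(cfs_rq);
		raw_spin_unlock_irq(&rq->lock);
	}
	put_online_cpus();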

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-24 19:13       ` bsegall
@ 2014-06-24 19:26         ` Kirill Tkhai
  2014-06-25  7:53           ` Kirill Tkhai
  0 siblings, 1 reply; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-24 19:26 UTC (permalink / raw)
  To: bsegall
  Cc: Kirill Tkhai, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Srikar Dronamraju, Mike Galbraith, khorenko, Paul Turner

On 24.06.2014 23:13, bsegall@google.com wrote:
> Kirill Tkhai <tkhai@yandex.ru> writes:
> 
>> On 24.06.2014 21:03, bsegall@google.com wrote:
>>> Kirill Tkhai <ktkhai@parallels.com> writes:
>>>
>>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
>>>>
>>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>>>>
>>>> This unthrottles all throttled cfs_rqs.
>>>>
>>>> But the cpu is still able to call schedule() till
>>>>
>>>> 	take_cpu_down->__cpu_disable()
>>>>
>>>> is called from stop_machine.
>>>>
>>>> This case the tasks from just unthrottled cfs_rqs are pickable
>>>> in a standard scheduler way, and they are picked by dying cpu.
>>>> The cfs_rqs becomes throttled again, and migrate_tasks()
>>>> in migration_call skips their tasks (one more unthrottle
>>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>>>> is already NULL).
>>>>
>>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
>>>> is not accounted, and the cfs_rqs won't exceed given
>>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
>>>> in migrate_tasks(). runtime_enabled is recalculated again
>>>> when rq becomes online again.
>>>>
>>>> Ben Segall also noticed, we always enable runtime in
>>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
>>>> cpus only. To fix that, we check if a cpu is online when
>>>> its rq is locked. This guarantees we do not have races with
>>>> set_rq_offline(), which also requires rq->lock.
>>>>
>>>> v2: Fix race with tg_set_cfs_bandwidth().
>>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>>>>
>>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
>>>> CC: Konstantin Khorenko <khorenko@parallels.com>
>>>> CC: Ben Segall <bsegall@google.com>
>>>> CC: Paul Turner <pjt@google.com>
>>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
>>>> CC: Peter Zijlstra <peterz@infradead.org>
>>>> CC: Ingo Molnar <mingo@kernel.org>
>>>> ---
>>>>  kernel/sched/core.c |   15 +++++++++++----
>>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
>>>>  2 files changed, 33 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 7f3063c..707a3c5 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>>>>  		struct rq *rq = cfs_rq->rq;
>>>>  
>>>>  		raw_spin_lock_irq(&rq->lock);
>>>> -		cfs_rq->runtime_enabled = runtime_enabled;
>>>> -		cfs_rq->runtime_remaining = 0;
>>>> +		/*
>>>> +		 * Do not enable runtime on offline runqueues. We specially
>>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
>>>> +		 */
>>>> +		if (cpu_online(i)) {
>>>> +			cfs_rq->runtime_enabled = runtime_enabled;
>>>> +			cfs_rq->runtime_remaining = 0;
>>>> +
>>>> +			if (cfs_rq->throttled)
>>>> +				unthrottle_cfs_rq(cfs_rq);
>>>> +		}
>>>
>>> We can just do for_each_online_cpu, yes? Also we probably need
>>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
>>> right?
>>>
>>
>> Yes, we can use for_each_online_cpu/for_each_active_cpu with
>> get_online_cpus() taken. But it adds one more lock dependence.
>> This looks worse for me.
> 
> I mean, you need get_online_cpus anyway - cpu_online is just a test
> against the same mask that for_each_online_cpu uses, and without taking
> the lock you can still race with offlining and reset runtime_enabled.
> 

Oh, I see. Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-24 19:26         ` Kirill Tkhai
@ 2014-06-25  7:53           ` Kirill Tkhai
  2014-06-25  8:03             ` Kirill Tkhai
  2014-06-25 16:52             ` bsegall
  0 siblings, 2 replies; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-25  7:53 UTC (permalink / raw)
  To: Ben Segall
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Srikar Dronamraju,
	Mike Galbraith, khorenko, Paul Turner, tkhai

On Tue, 24/06/2014 at 23:26 +0400, Kirill Tkhai wrote:
> On 24.06.2014 23:13, bsegall@google.com wrote:
> > Kirill Tkhai <tkhai@yandex.ru> writes:
> > 
> >> On 24.06.2014 21:03, bsegall@google.com wrote:
> >>> Kirill Tkhai <ktkhai@parallels.com> writes:
> >>>
> >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
> >>>>
> >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> >>>>
> >>>> This unthrottles all throttled cfs_rqs.
> >>>>
> >>>> But the cpu is still able to call schedule() till
> >>>>
> >>>> 	take_cpu_down->__cpu_disable()
> >>>>
> >>>> is called from stop_machine.
> >>>>
> >>>> This case the tasks from just unthrottled cfs_rqs are pickable
> >>>> in a standard scheduler way, and they are picked by dying cpu.
> >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
> >>>> in migration_call skips their tasks (one more unthrottle
> >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> >>>> is already NULL).
> >>>>
> >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
> >>>> is not accounted, and the cfs_rqs won't exceed given
> >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> >>>> in migrate_tasks(). runtime_enabled is recalculated again
> >>>> when rq becomes online again.
> >>>>
> >>>> Ben Segall also noticed, we always enable runtime in
> >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
> >>>> cpus only. To fix that, we check if a cpu is online when
> >>>> its rq is locked. This guarantees we do not have races with
> >>>> set_rq_offline(), which also requires rq->lock.
> >>>>
> >>>> v2: Fix race with tg_set_cfs_bandwidth().
> >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> >>>>
> >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
> >>>> CC: Ben Segall <bsegall@google.com>
> >>>> CC: Paul Turner <pjt@google.com>
> >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> >>>> CC: Peter Zijlstra <peterz@infradead.org>
> >>>> CC: Ingo Molnar <mingo@kernel.org>
> >>>> ---
> >>>>  kernel/sched/core.c |   15 +++++++++++----
> >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
> >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >>>> index 7f3063c..707a3c5 100644
> >>>> --- a/kernel/sched/core.c
> >>>> +++ b/kernel/sched/core.c
> >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
> >>>>  		struct rq *rq = cfs_rq->rq;
> >>>>  
> >>>>  		raw_spin_lock_irq(&rq->lock);
> >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
> >>>> -		cfs_rq->runtime_remaining = 0;
> >>>> +		/*
> >>>> +		 * Do not enable runtime on offline runqueues. We specially
> >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
> >>>> +		 */
> >>>> +		if (cpu_online(i)) {
> >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
> >>>> +			cfs_rq->runtime_remaining = 0;
> >>>> +
> >>>> +			if (cfs_rq->throttled)
> >>>> +				unthrottle_cfs_rq(cfs_rq);
> >>>> +		}
> >>>
> >>> We can just do for_each_online_cpu, yes? Also we probably need
> >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
> >>> right?
> >>>
> >>
> >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
> >> get_online_cpus() taken. But it adds one more lock dependence.
> >> This looks worse for me.
> > 
> > I mean, you need get_online_cpus anyway - cpu_online is just a test
> > against the same mask that for_each_online_cpu uses, and without taking
> > the lock you can still race with offlining and reset runtime_enabled.
> > 
> 
> Oh, I see. Thanks.

But we can check rq->online, can't we? How about this?

    sched/fair: Disable runtime_enabled on dying rq
    
    We kill rq->rd on the CPU_DOWN_PREPARE stage:
    
    	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
    	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
    
    This unthrottles all throttled cfs_rqs.
    
    But the cpu is still able to call schedule() till
    
    	take_cpu_down->__cpu_disable()
    
    is called from stop_machine.
    
    In this case the tasks from the just unthrottled cfs_rqs are
    pickable in the standard scheduler way, and they are picked by the
    dying cpu. The cfs_rqs become throttled again, and migrate_tasks()
    in migration_call skips their tasks (one more unthrottle
    in migrate_tasks()->CPU_DYING does not happen, because rq->rd
    is already NULL).
    
    This patch sets runtime_enabled to zero. This guarantees the
    runtime is not accounted, the cfs_rqs won't exceed the given
    cfs_rq->runtime_remaining = 1, and tasks will be pickable
    in migrate_tasks(). runtime_enabled is recalculated again
    when the rq becomes online again.
    
    Ben Segall also noticed that we always enable runtime in
    tg_set_cfs_bandwidth(). Actually, we should do that for online
    cpus only. To fix that, we check whether a cpu is online while
    its rq is locked. This guarantees we do not race with
    set_rq_offline(), which also requires rq->lock.
    
    v2: Fix race with tg_set_cfs_bandwidth().
        Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
    v3: Check for rq->online instead of cpu_active.
    
    Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ceea8d0..4c163c9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7837,11 +7837,17 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
 		struct rq *rq = cfs_rq->rq;
 
 		raw_spin_lock_irq(&rq->lock);
-		cfs_rq->runtime_enabled = runtime_enabled;
-		cfs_rq->runtime_remaining = 0;
+		/*
+		 * Do not enable runtime on offline runqueues. We specially
+		 * disable it in unthrottle_offline_cfs_rqs().
+		 */
+		if (rq->online) {
+			cfs_rq->runtime_enabled = runtime_enabled;
+			cfs_rq->runtime_remaining = 0;
 
-		if (cfs_rq->throttled)
-			unthrottle_cfs_rq(cfs_rq);
+			if (cfs_rq->throttled)
+				unthrottle_cfs_rq(cfs_rq);
+		}
 		raw_spin_unlock_irq(&rq->lock);
 	}
 	if (runtime_was_enabled && !runtime_enabled)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f9c457..5616d23 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3776,6 +3776,19 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 	hrtimer_cancel(&cfs_b->slack_timer);
 }
 
+static void __maybe_unused update_runtime_enabled(struct rq *rq)
+{
+	struct cfs_rq *cfs_rq;
+
+	for_each_leaf_cfs_rq(rq, cfs_rq) {
+		struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
+
+		raw_spin_lock(&cfs_b->lock);
+		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
+		raw_spin_unlock(&cfs_b->lock);
+	}
+}
+
 static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 {
 	struct cfs_rq *cfs_rq;
@@ -3789,6 +3802,12 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 		 * there's some valid quota amount
 		 */
 		cfs_rq->runtime_remaining = 1;
+		/*
+		 * Offline rq is schedulable till cpu is completely disabled
+		 * in take_cpu_down(), so we prevent new cfs throttling here.
+		 */
+		cfs_rq->runtime_enabled = 0;
+
 		if (cfs_rq_throttled(cfs_rq))
 			unthrottle_cfs_rq(cfs_rq);
 	}
@@ -3832,6 +3851,7 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
 	return NULL;
 }
 static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
+static inline void update_runtime_enabled(struct rq *rq) {}
 static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
 
 #endif /* CONFIG_CFS_BANDWIDTH */
@@ -7325,6 +7345,8 @@ void trigger_load_balance(struct rq *rq)
 static void rq_online_fair(struct rq *rq)
 {
 	update_sysctl();
+
+	update_runtime_enabled(rq);
 }
 
 static void rq_offline_fair(struct rq *rq)



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-25  7:53           ` Kirill Tkhai
@ 2014-06-25  8:03             ` Kirill Tkhai
  2014-06-25 16:52             ` bsegall
  1 sibling, 0 replies; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-25  8:03 UTC (permalink / raw)
  To: Ben Segall
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Srikar Dronamraju,
	Mike Galbraith, khorenko, Paul Turner, tkhai

On Wed, 25/06/2014 at 11:53 +0400, Kirill Tkhai wrote:
> On Tue, 24/06/2014 at 23:26 +0400, Kirill Tkhai wrote:
> > On 24.06.2014 23:13, bsegall@google.com wrote:
> > > Kirill Tkhai <tkhai@yandex.ru> writes:
> > > 
> > >> On 24.06.2014 21:03, bsegall@google.com wrote:
> > >>> Kirill Tkhai <ktkhai@parallels.com> writes:
> > >>>
> > >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
> > >>>>
> > >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> > >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> > >>>>
> > >>>> This unthrottles all throttled cfs_rqs.
> > >>>>
> > >>>> But the cpu is still able to call schedule() till
> > >>>>
> > >>>> 	take_cpu_down->__cpu_disable()
> > >>>>
> > >>>> is called from stop_machine.
> > >>>>
> > >>>> This case the tasks from just unthrottled cfs_rqs are pickable
> > >>>> in a standard scheduler way, and they are picked by dying cpu.
> > >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
> > >>>> in migration_call skips their tasks (one more unthrottle
> > >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> > >>>> is already NULL).
> > >>>>
> > >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
> > >>>> is not accounted, and the cfs_rqs won't exceed given
> > >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> > >>>> in migrate_tasks(). runtime_enabled is recalculated again
> > >>>> when rq becomes online again.
> > >>>>
> > >>>> Ben Segall also noticed, we always enable runtime in
> > >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
> > >>>> cpus only. To fix that, we check if a cpu is online when
> > >>>> its rq is locked. This guarantees we do not have races with
> > >>>> set_rq_offline(), which also requires rq->lock.
> > >>>>
> > >>>> v2: Fix race with tg_set_cfs_bandwidth().
> > >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> > >>>>
> > >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> > >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
> > >>>> CC: Ben Segall <bsegall@google.com>
> > >>>> CC: Paul Turner <pjt@google.com>
> > >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> > >>>> CC: Peter Zijlstra <peterz@infradead.org>
> > >>>> CC: Ingo Molnar <mingo@kernel.org>
> > >>>> ---
> > >>>>  kernel/sched/core.c |   15 +++++++++++----
> > >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
> > >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
> > >>>>
> > >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > >>>> index 7f3063c..707a3c5 100644
> > >>>> --- a/kernel/sched/core.c
> > >>>> +++ b/kernel/sched/core.c
> > >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
> > >>>>  		struct rq *rq = cfs_rq->rq;
> > >>>>  
> > >>>>  		raw_spin_lock_irq(&rq->lock);
> > >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
> > >>>> -		cfs_rq->runtime_remaining = 0;
> > >>>> +		/*
> > >>>> +		 * Do not enable runtime on offline runqueues. We specially
> > >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
> > >>>> +		 */
> > >>>> +		if (cpu_online(i)) {
> > >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
> > >>>> +			cfs_rq->runtime_remaining = 0;
> > >>>> +
> > >>>> +			if (cfs_rq->throttled)
> > >>>> +				unthrottle_cfs_rq(cfs_rq);
> > >>>> +		}
> > >>>
> > >>> We can just do for_each_online_cpu, yes? Also we probably need
> > >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
> > >>> right?
> > >>>
> > >>
> > >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
> > >> get_online_cpus() taken. But it adds one more lock dependence.
> > >> This looks worse for me.
> > > 
> > > I mean, you need get_online_cpus anyway - cpu_online is just a test
> > > against the same mask that for_each_online_cpu uses, and without taking
> > > the lock you can still race with offlining and reset runtime_enabled.
> > > 
> > 
> > Oh, I see. Thanks.
> 
> But we can check for rq->online, don't we? How about this?

Oh, rq->online is present only in the SMP case... OK, let's go with
get_online_cpus()...
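
(For reference, an approximate paraphrase of struct rq from
kernel/sched/sched.h; the surrounding fields are an assumption, but
rq->rd and rq->online only exist under CONFIG_SMP:)

	struct rq {
		...
	#ifdef CONFIG_SMP
		struct root_domain *rd;
		struct sched_domain *sd;
		...
		int online;
	#endif
		...
	};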

> 
>     sched/fair: Disable runtime_enabled on dying rq
>     
>     We kill rq->rd on the CPU_DOWN_PREPARE stage:
>     
>     	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>     	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>     
>     This unthrottles all throttled cfs_rqs.
>     
>     But the cpu is still able to call schedule() till
>     
>     	take_cpu_down->__cpu_disable()
>     
>     is called from stop_machine.
>     
>     This case the tasks from just unthrottled cfs_rqs are pickable
>     in a standard scheduler way, and they are picked by dying cpu.
>     The cfs_rqs becomes throttled again, and migrate_tasks()
>     in migration_call skips their tasks (one more unthrottle
>     in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>     is already NULL).
>     
>     Patch sets runtime_enabled to zero. This guarantees, the runtime
>     is not accounted, and the cfs_rqs won't exceed given
>     cfs_rq->runtime_remaining = 1, and tasks will be pickable
>     in migrate_tasks(). runtime_enabled is recalculated again
>     when rq becomes online again.
>     
>     Ben Segall also noticed, we always enable runtime in
>     tg_set_cfs_bandwidth(). Actually, we should do that for online
>     cpus only. To fix that, we check if a cpu is online when
>     its rq is locked. This guarantees we do not have races with
>     set_rq_offline(), which also requires rq->lock.
>     
>     v2: Fix race with tg_set_cfs_bandwidth().
>         Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>     v3: Check for rq->online instead of cpu_active.
>     
>     Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ceea8d0..4c163c9 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7837,11 +7837,17 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>  		struct rq *rq = cfs_rq->rq;
>  
>  		raw_spin_lock_irq(&rq->lock);
> -		cfs_rq->runtime_enabled = runtime_enabled;
> -		cfs_rq->runtime_remaining = 0;
> +		/*
> +		 * Do not enable runtime on offline runqueues. We specially
> +		 * disable it in unthrottle_offline_cfs_rqs().
> +		 */
> +		if (rq->online) {
> +			cfs_rq->runtime_enabled = runtime_enabled;
> +			cfs_rq->runtime_remaining = 0;
>  
> -		if (cfs_rq->throttled)
> -			unthrottle_cfs_rq(cfs_rq);
> +			if (cfs_rq->throttled)
> +				unthrottle_cfs_rq(cfs_rq);
> +		}
>  		raw_spin_unlock_irq(&rq->lock);
>  	}
>  	if (runtime_was_enabled && !runtime_enabled)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1f9c457..5616d23 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3776,6 +3776,19 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
>  	hrtimer_cancel(&cfs_b->slack_timer);
>  }
>  
> +static void __maybe_unused update_runtime_enabled(struct rq *rq)
> +{
> +	struct cfs_rq *cfs_rq;
> +
> +	for_each_leaf_cfs_rq(rq, cfs_rq) {
> +		struct cfs_bandwidth *cfs_b = &cfs_rq->tg->cfs_bandwidth;
> +
> +		raw_spin_lock(&cfs_b->lock);
> +		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
> +		raw_spin_unlock(&cfs_b->lock);
> +	}
> +}
> +
>  static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>  {
>  	struct cfs_rq *cfs_rq;
> @@ -3789,6 +3802,12 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>  		 * there's some valid quota amount
>  		 */
>  		cfs_rq->runtime_remaining = 1;
> +		/*
> +		 * Offline rq is schedulable till cpu is completely disabled
> +		 * in take_cpu_down(), so we prevent new cfs throttling here.
> +		 */
> +		cfs_rq->runtime_enabled = 0;
> +
>  		if (cfs_rq_throttled(cfs_rq))
>  			unthrottle_cfs_rq(cfs_rq);
>  	}
> @@ -3832,6 +3851,7 @@ static inline struct cfs_bandwidth *tg_cfs_bandwidth(struct task_group *tg)
>  	return NULL;
>  }
>  static inline void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
> +static inline void update_runtime_enabled(struct rq *rq) {}
>  static inline void unthrottle_offline_cfs_rqs(struct rq *rq) {}
>  
>  #endif /* CONFIG_CFS_BANDWIDTH */
> @@ -7325,6 +7345,8 @@ void trigger_load_balance(struct rq *rq)
>  static void rq_online_fair(struct rq *rq)
>  {
>  	update_sysctl();
> +
> +	update_runtime_enabled(rq);
>  }
>  
>  static void rq_offline_fair(struct rq *rq)
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-25  7:53           ` Kirill Tkhai
  2014-06-25  8:03             ` Kirill Tkhai
@ 2014-06-25 16:52             ` bsegall
  2014-06-25 17:31               ` Kirill Tkhai
  1 sibling, 1 reply; 14+ messages in thread
From: bsegall @ 2014-06-25 16:52 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Srikar Dronamraju,
	Mike Galbraith, khorenko, Paul Turner, tkhai

Kirill Tkhai <ktkhai@parallels.com> writes:

> On Tue, 24/06/2014 at 23:26 +0400, Kirill Tkhai wrote:
>> On 24.06.2014 23:13, bsegall@google.com wrote:
>> > Kirill Tkhai <tkhai@yandex.ru> writes:
>> > 
>> >> On 24.06.2014 21:03, bsegall@google.com wrote:
>> >>> Kirill Tkhai <ktkhai@parallels.com> writes:
>> >>>
>> >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
>> >>>>
>> >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>> >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>> >>>>
>> >>>> This unthrottles all throttled cfs_rqs.
>> >>>>
>> >>>> But the cpu is still able to call schedule() till
>> >>>>
>> >>>> 	take_cpu_down->__cpu_disable()
>> >>>>
>> >>>> is called from stop_machine.
>> >>>>
>> >>>> This case the tasks from just unthrottled cfs_rqs are pickable
>> >>>> in a standard scheduler way, and they are picked by dying cpu.
>> >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
>> >>>> in migration_call skips their tasks (one more unthrottle
>> >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>> >>>> is already NULL).
>> >>>>
>> >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
>> >>>> is not accounted, and the cfs_rqs won't exceed given
>> >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
>> >>>> in migrate_tasks(). runtime_enabled is recalculated again
>> >>>> when rq becomes online again.
>> >>>>
>> >>>> Ben Segall also noticed, we always enable runtime in
>> >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
>> >>>> cpus only. To fix that, we check if a cpu is online when
>> >>>> its rq is locked. This guarantees we do not have races with
>> >>>> set_rq_offline(), which also requires rq->lock.
>> >>>>
>> >>>> v2: Fix race with tg_set_cfs_bandwidth().
>> >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>> >>>>
>> >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
>> >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
>> >>>> CC: Ben Segall <bsegall@google.com>
>> >>>> CC: Paul Turner <pjt@google.com>
>> >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
>> >>>> CC: Peter Zijlstra <peterz@infradead.org>
>> >>>> CC: Ingo Molnar <mingo@kernel.org>
>> >>>> ---
>> >>>>  kernel/sched/core.c |   15 +++++++++++----
>> >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
>> >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
>> >>>>
>> >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >>>> index 7f3063c..707a3c5 100644
>> >>>> --- a/kernel/sched/core.c
>> >>>> +++ b/kernel/sched/core.c
>> >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>> >>>>  		struct rq *rq = cfs_rq->rq;
>> >>>>  
>> >>>>  		raw_spin_lock_irq(&rq->lock);
>> >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
>> >>>> -		cfs_rq->runtime_remaining = 0;
>> >>>> +		/*
>> >>>> +		 * Do not enable runtime on offline runqueues. We specially
>> >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
>> >>>> +		 */
>> >>>> +		if (cpu_online(i)) {
>> >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
>> >>>> +			cfs_rq->runtime_remaining = 0;
>> >>>> +
>> >>>> +			if (cfs_rq->throttled)
>> >>>> +				unthrottle_cfs_rq(cfs_rq);
>> >>>> +		}
>> >>>
>> >>> We can just do for_each_online_cpu, yes? Also we probably need
>> >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
>> >>> right?
>> >>>
>> >>
>> >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
>> >> get_online_cpus() taken. But it adds one more lock dependence.
>> >> This looks worse for me.
>> > 
>> > I mean, you need get_online_cpus anyway - cpu_online is just a test
>> > against the same mask that for_each_online_cpu uses, and without taking
>> > the lock you can still race with offlining and reset runtime_enabled.
>> > 
>> 
>> Oh, I see. Thanks.
>
> But we can check for rq->online, don't we? How about this?

Yeah, that should work.
>
>     sched/fair: Disable runtime_enabled on dying rq
>     
>     We kill rq->rd on the CPU_DOWN_PREPARE stage:
>     
>     	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>     	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>     
>     This unthrottles all throttled cfs_rqs.
>     
>     But the cpu is still able to call schedule() till
>     
>     	take_cpu_down->__cpu_disable()
>     
>     is called from stop_machine.
>     
>     This case the tasks from just unthrottled cfs_rqs are pickable
>     in a standard scheduler way, and they are picked by dying cpu.
>     The cfs_rqs becomes throttled again, and migrate_tasks()
>     in migration_call skips their tasks (one more unthrottle
>     in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>     is already NULL).
>     
>     Patch sets runtime_enabled to zero. This guarantees, the runtime
>     is not accounted, and the cfs_rqs won't exceed given
>     cfs_rq->runtime_remaining = 1, and tasks will be pickable
>     in migrate_tasks(). runtime_enabled is recalculated again
>     when rq becomes online again.
>     
>     Ben Segall also noticed, we always enable runtime in
>     tg_set_cfs_bandwidth(). Actually, we should do that for online
>     cpus only. To fix that, we check if a cpu is online when
>     its rq is locked. This guarantees we do not have races with
>     set_rq_offline(), which also requires rq->lock.
>     
>     v2: Fix race with tg_set_cfs_bandwidth().
>         Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>     v3: Check for rq->online instead of cpu_active.
>     
>     Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>

Reviewed-by: Ben Segall <bsegall@google.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-25 16:52             ` bsegall
@ 2014-06-25 17:31               ` Kirill Tkhai
  2014-06-25 17:40                 ` bsegall
  0 siblings, 1 reply; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-25 17:31 UTC (permalink / raw)
  To: bsegall
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Srikar Dronamraju,
	Mike Galbraith, khorenko, Paul Turner, tkhai

On Wed, 25/06/2014 at 09:52 -0700, bsegall@google.com wrote:
> Kirill Tkhai <ktkhai@parallels.com> writes:
> 
> > On Tue, 24/06/2014 at 23:26 +0400, Kirill Tkhai wrote:
> >> On 24.06.2014 23:13, bsegall@google.com wrote:
> >> > Kirill Tkhai <tkhai@yandex.ru> writes:
> >> > 
> >> >> On 24.06.2014 21:03, bsegall@google.com wrote:
> >> >>> Kirill Tkhai <ktkhai@parallels.com> writes:
> >> >>>
> >> >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
> >> >>>>
> >> >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> >> >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> >> >>>>
> >> >>>> This unthrottles all throttled cfs_rqs.
> >> >>>>
> >> >>>> But the cpu is still able to call schedule() till
> >> >>>>
> >> >>>> 	take_cpu_down->__cpu_disable()
> >> >>>>
> >> >>>> is called from stop_machine.
> >> >>>>
> >> >>>> This case the tasks from just unthrottled cfs_rqs are pickable
> >> >>>> in a standard scheduler way, and they are picked by dying cpu.
> >> >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
> >> >>>> in migration_call skips their tasks (one more unthrottle
> >> >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> >> >>>> is already NULL).
> >> >>>>
> >> >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
> >> >>>> is not accounted, and the cfs_rqs won't exceed given
> >> >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> >> >>>> in migrate_tasks(). runtime_enabled is recalculated again
> >> >>>> when rq becomes online again.
> >> >>>>
> >> >>>> Ben Segall also noticed, we always enable runtime in
> >> >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
> >> >>>> cpus only. To fix that, we check if a cpu is online when
> >> >>>> its rq is locked. This guarantees we do not have races with
> >> >>>> set_rq_offline(), which also requires rq->lock.
> >> >>>>
> >> >>>> v2: Fix race with tg_set_cfs_bandwidth().
> >> >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> >> >>>>
> >> >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> >> >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
> >> >>>> CC: Ben Segall <bsegall@google.com>
> >> >>>> CC: Paul Turner <pjt@google.com>
> >> >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> >> >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> >> >>>> CC: Peter Zijlstra <peterz@infradead.org>
> >> >>>> CC: Ingo Molnar <mingo@kernel.org>
> >> >>>> ---
> >> >>>>  kernel/sched/core.c |   15 +++++++++++----
> >> >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
> >> >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
> >> >>>>
> >> >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> >>>> index 7f3063c..707a3c5 100644
> >> >>>> --- a/kernel/sched/core.c
> >> >>>> +++ b/kernel/sched/core.c
> >> >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
> >> >>>>  		struct rq *rq = cfs_rq->rq;
> >> >>>>  
> >> >>>>  		raw_spin_lock_irq(&rq->lock);
> >> >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
> >> >>>> -		cfs_rq->runtime_remaining = 0;
> >> >>>> +		/*
> >> >>>> +		 * Do not enable runtime on offline runqueues. We specially
> >> >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
> >> >>>> +		 */
> >> >>>> +		if (cpu_online(i)) {
> >> >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
> >> >>>> +			cfs_rq->runtime_remaining = 0;
> >> >>>> +
> >> >>>> +			if (cfs_rq->throttled)
> >> >>>> +				unthrottle_cfs_rq(cfs_rq);
> >> >>>> +		}
> >> >>>
> >> >>> We can just do for_each_online_cpu, yes? Also we probably need
> >> >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
> >> >>> right?
> >> >>>
> >> >>
> >> >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
> >> >> get_online_cpus() taken. But it adds one more lock dependence.
> >> >> This looks worse for me.
> >> > 
> >> > I mean, you need get_online_cpus anyway - cpu_online is just a test
> >> > against the same mask that for_each_online_cpu uses, and without taking
> >> > the lock you can still race with offlining and reset runtime_enabled.
> >> > 
> >> 
> >> Oh, I see. Thanks.
> >
> > But we can check for rq->online, don't we? How about this?
> 
> Yeah, that should work.

We can't rely on it because rq->online is not available in !SMP.
Could you review the message in the [PATCH v3 1/3] topic?

> >
> >     sched/fair: Disable runtime_enabled on dying rq
> >     
> >     We kill rq->rd on the CPU_DOWN_PREPARE stage:
> >     
> >     	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> >     	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> >     
> >     This unthrottles all throttled cfs_rqs.
> >     
> >     But the cpu is still able to call schedule() till
> >     
> >     	take_cpu_down->__cpu_disable()
> >     
> >     is called from stop_machine.
> >     
> >     This case the tasks from just unthrottled cfs_rqs are pickable
> >     in a standard scheduler way, and they are picked by dying cpu.
> >     The cfs_rqs becomes throttled again, and migrate_tasks()
> >     in migration_call skips their tasks (one more unthrottle
> >     in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> >     is already NULL).
> >     
> >     Patch sets runtime_enabled to zero. This guarantees, the runtime
> >     is not accounted, and the cfs_rqs won't exceed given
> >     cfs_rq->runtime_remaining = 1, and tasks will be pickable
> >     in migrate_tasks(). runtime_enabled is recalculated again
> >     when rq becomes online again.
> >     
> >     Ben Segall also noticed, we always enable runtime in
> >     tg_set_cfs_bandwidth(). Actually, we should do that for online
> >     cpus only. To fix that, we check if a cpu is online when
> >     its rq is locked. This guarantees we do not have races with
> >     set_rq_offline(), which also requires rq->lock.
> >     
> >     v2: Fix race with tg_set_cfs_bandwidth().
> >         Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> >     v3: Check for rq->online instead of cpu_active.
> >     
> >     Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> 
> Reviewed-by: Ben Segall <bsegall@google.com>



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-25 17:31               ` Kirill Tkhai
@ 2014-06-25 17:40                 ` bsegall
  2014-06-25 17:44                   ` Kirill Tkhai
  0 siblings, 1 reply; 14+ messages in thread
From: bsegall @ 2014-06-25 17:40 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Srikar Dronamraju,
	Mike Galbraith, khorenko, Paul Turner, tkhai

Kirill Tkhai <ktkhai@parallels.com> writes:

> On Wed, 25/06/2014 at 09:52 -0700, bsegall@google.com wrote:
>> Kirill Tkhai <ktkhai@parallels.com> writes:
>> 
>> > On Tue, 24/06/2014 at 23:26 +0400, Kirill Tkhai wrote:
>> >> On 24.06.2014 23:13, bsegall@google.com wrote:
>> >> > Kirill Tkhai <tkhai@yandex.ru> writes:
>> >> > 
>> >> >> On 24.06.2014 21:03, bsegall@google.com wrote:
>> >> >>> Kirill Tkhai <ktkhai@parallels.com> writes:
>> >> >>>
>> >> >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
>> >> >>>>
>> >> >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
>> >> >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
>> >> >>>>
>> >> >>>> This unthrottles all throttled cfs_rqs.
>> >> >>>>
>> >> >>>> But the cpu is still able to call schedule() till
>> >> >>>>
>> >> >>>> 	take_cpu_down->__cpu_disable()
>> >> >>>>
>> >> >>>> is called from stop_machine.
>> >> >>>>
>> >> >>>> This case the tasks from just unthrottled cfs_rqs are pickable
>> >> >>>> in a standard scheduler way, and they are picked by dying cpu.
>> >> >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
>> >> >>>> in migration_call skips their tasks (one more unthrottle
>> >> >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
>> >> >>>> is already NULL).
>> >> >>>>
>> >> >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
>> >> >>>> is not accounted, and the cfs_rqs won't exceed given
>> >> >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
>> >> >>>> in migrate_tasks(). runtime_enabled is recalculated again
>> >> >>>> when rq becomes online again.
>> >> >>>>
>> >> >>>> Ben Segall also noticed, we always enable runtime in
>> >> >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
>> >> >>>> cpus only. To fix that, we check if a cpu is online when
>> >> >>>> its rq is locked. This guarantees we do not have races with
>> >> >>>> set_rq_offline(), which also requires rq->lock.
>> >> >>>>
>> >> >>>> v2: Fix race with tg_set_cfs_bandwidth().
>> >> >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
>> >> >>>>
>> >> >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
>> >> >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
>> >> >>>> CC: Ben Segall <bsegall@google.com>
>> >> >>>> CC: Paul Turner <pjt@google.com>
>> >> >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> >> >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
>> >> >>>> CC: Peter Zijlstra <peterz@infradead.org>
>> >> >>>> CC: Ingo Molnar <mingo@kernel.org>
>> >> >>>> ---
>> >> >>>>  kernel/sched/core.c |   15 +++++++++++----
>> >> >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
>> >> >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
>> >> >>>>
>> >> >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> >> >>>> index 7f3063c..707a3c5 100644
>> >> >>>> --- a/kernel/sched/core.c
>> >> >>>> +++ b/kernel/sched/core.c
>> >> >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
>> >> >>>>  		struct rq *rq = cfs_rq->rq;
>> >> >>>>  
>> >> >>>>  		raw_spin_lock_irq(&rq->lock);
>> >> >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
>> >> >>>> -		cfs_rq->runtime_remaining = 0;
>> >> >>>> +		/*
>> >> >>>> +		 * Do not enable runtime on offline runqueues. We specially
>> >> >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
>> >> >>>> +		 */
>> >> >>>> +		if (cpu_online(i)) {
>> >> >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
>> >> >>>> +			cfs_rq->runtime_remaining = 0;
>> >> >>>> +
>> >> >>>> +			if (cfs_rq->throttled)
>> >> >>>> +				unthrottle_cfs_rq(cfs_rq);
>> >> >>>> +		}
>> >> >>>
>> >> >>> We can just do for_each_online_cpu, yes? Also we probably need
>> >> >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
>> >> >>> right?
>> >> >>>
>> >> >>
>> >> >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
>> >> >> get_online_cpus() taken. But it adds one more lock dependence.
>> >> >> This looks worse for me.
>> >> > 
>> >> > I mean, you need get_online_cpus anyway - cpu_online is just a test
>> >> > against the same mask that for_each_online_cpu uses, and without taking
>> >> > the lock you can still race with offlining and reset runtime_enabled.
>> >> > 
>> >> 
>> >> Oh, I see. Thanks.
>> >
>> > But we can check rq->online, can't we? How about this?
>> 
>> Yeah, that should work.
>
> > We can't rely on it because rq->online is not available in !SMP.
> > Could you review the message in the [PATCH v3 1/3] thread?

I'm not sure what you mean here. The patch just checking cpu_online
won't work; is there another version you want me to look at?
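
To make the discussion above concrete, here is a rough sketch of the variant
Ben suggests: take the hotplug lock and walk only online cpus, so the per-cpu
update in tg_set_cfs_bandwidth() cannot race with a concurrent offline. The
surrounding quota bookkeeping is elided and this is an assumed shape, not the
final patch:

	get_online_cpus();	/* pin the set of online cpus */

	/* ... update cfs_b->period / cfs_b->quota under cfs_b->lock ... */

	for_each_online_cpu(i) {
		struct cfs_rq *cfs_rq = tg->cfs_rq[i];
		struct rq *rq = cfs_rq->rq;

		raw_spin_lock_irq(&rq->lock);
		cfs_rq->runtime_enabled = runtime_enabled;
		cfs_rq->runtime_remaining = 0;

		if (cfs_rq->throttled)
			unthrottle_cfs_rq(cfs_rq);
		raw_spin_unlock_irq(&rq->lock);
	}

	put_online_cpus();	/* allow hotplug again */

Kirill's concern about the extra lock dependency is the trade-off here; the
rq->online check discussed next avoids taking the hotplug lock at the cost of
relying on an SMP-only field.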

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-25 17:40                 ` bsegall
@ 2014-06-25 17:44                   ` Kirill Tkhai
  0 siblings, 0 replies; 14+ messages in thread
From: Kirill Tkhai @ 2014-06-25 17:44 UTC (permalink / raw)
  To: bsegall
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Srikar Dronamraju,
	Mike Galbraith, khorenko, Paul Turner, tkhai

On Wed, 25/06/2014 at 10:40 -0700, bsegall@google.com wrote:
> Kirill Tkhai <ktkhai@parallels.com> writes:
> 
> > On Wed, 25/06/2014 at 09:52 -0700, bsegall@google.com wrote:
> >> Kirill Tkhai <ktkhai@parallels.com> writes:
> >> 
> >> > On Tue, 24/06/2014 at 23:26 +0400, Kirill Tkhai wrote:
> >> >> On 24.06.2014 23:13, bsegall@google.com wrote:
> >> >> > Kirill Tkhai <tkhai@yandex.ru> writes:
> >> >> > 
> >> >> >> On 24.06.2014 21:03, bsegall@google.com wrote:
> >> >> >>> Kirill Tkhai <ktkhai@parallels.com> writes:
> >> >> >>>
> >> >> >>>> We kill rq->rd on the CPU_DOWN_PREPARE stage:
> >> >> >>>>
> >> >> >>>> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> >> >> >>>> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> >> >> >>>>
> >> >> >>>> This unthrottles all throttled cfs_rqs.
> >> >> >>>>
> >> >> >>>> But the cpu is still able to call schedule() till
> >> >> >>>>
> >> >> >>>> 	take_cpu_down->__cpu_disable()
> >> >> >>>>
> >> >> >>>> is called from stop_machine.
> >> >> >>>>
> >> >> >>>> This case the tasks from just unthrottled cfs_rqs are pickable
> >> >> >>>> in a standard scheduler way, and they are picked by dying cpu.
> >> >> >>>> The cfs_rqs becomes throttled again, and migrate_tasks()
> >> >> >>>> in migration_call skips their tasks (one more unthrottle
> >> >> >>>> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> >> >> >>>> is already NULL).
> >> >> >>>>
> >> >> >>>> Patch sets runtime_enabled to zero. This guarantees, the runtime
> >> >> >>>> is not accounted, and the cfs_rqs won't exceed given
> >> >> >>>> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> >> >> >>>> in migrate_tasks(). runtime_enabled is recalculated again
> >> >> >>>> when rq becomes online again.
> >> >> >>>>
> >> >> >>>> Ben Segall also noticed, we always enable runtime in
> >> >> >>>> tg_set_cfs_bandwidth(). Actually, we should do that for online
> >> >> >>>> cpus only. To fix that, we check if a cpu is online when
> >> >> >>>> its rq is locked. This guarantees we do not have races with
> >> >> >>>> set_rq_offline(), which also requires rq->lock.
> >> >> >>>>
> >> >> >>>> v2: Fix race with tg_set_cfs_bandwidth().
> >> >> >>>>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> >> >> >>>>
> >> >> >>>> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> >> >> >>>> CC: Konstantin Khorenko <khorenko@parallels.com>
> >> >> >>>> CC: Ben Segall <bsegall@google.com>
> >> >> >>>> CC: Paul Turner <pjt@google.com>
> >> >> >>>> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> >> >> >>>> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> >> >> >>>> CC: Peter Zijlstra <peterz@infradead.org>
> >> >> >>>> CC: Ingo Molnar <mingo@kernel.org>
> >> >> >>>> ---
> >> >> >>>>  kernel/sched/core.c |   15 +++++++++++----
> >> >> >>>>  kernel/sched/fair.c |   22 ++++++++++++++++++++++
> >> >> >>>>  2 files changed, 33 insertions(+), 4 deletions(-)
> >> >> >>>>
> >> >> >>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> >> >>>> index 7f3063c..707a3c5 100644
> >> >> >>>> --- a/kernel/sched/core.c
> >> >> >>>> +++ b/kernel/sched/core.c
> >> >> >>>> @@ -7842,11 +7842,18 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
> >> >> >>>>  		struct rq *rq = cfs_rq->rq;
> >> >> >>>>  
> >> >> >>>>  		raw_spin_lock_irq(&rq->lock);
> >> >> >>>> -		cfs_rq->runtime_enabled = runtime_enabled;
> >> >> >>>> -		cfs_rq->runtime_remaining = 0;
> >> >> >>>> +		/*
> >> >> >>>> +		 * Do not enable runtime on offline runqueues. We specially
> >> >> >>>> +		 * make it disabled in unthrottle_offline_cfs_rqs().
> >> >> >>>> +		 */
> >> >> >>>> +		if (cpu_online(i)) {
> >> >> >>>> +			cfs_rq->runtime_enabled = runtime_enabled;
> >> >> >>>> +			cfs_rq->runtime_remaining = 0;
> >> >> >>>> +
> >> >> >>>> +			if (cfs_rq->throttled)
> >> >> >>>> +				unthrottle_cfs_rq(cfs_rq);
> >> >> >>>> +		}
> >> >> >>>
> >> >> >>> We can just do for_each_online_cpu, yes? Also we probably need
> >> >> >>> get_online_cpus/put_online_cpus, and/or want cpu_active_mask instead
> >> >> >>> right?
> >> >> >>>
> >> >> >>
> >> >> >> Yes, we can use for_each_online_cpu/for_each_active_cpu with
> >> >> >> get_online_cpus() taken. But it adds one more lock dependence.
> >> >> >> This looks worse for me.
> >> >> > 
> >> >> > I mean, you need get_online_cpus anyway - cpu_online is just a test
> >> >> > against the same mask that for_each_online_cpu uses, and without taking
> >> >> > the lock you can still race with offlining and reset runtime_enabled.
> >> >> > 
> >> >> 
> >> >> Oh, I see. Thanks.
> >> >
> >> > But we can check rq->online, can't we? How about this?
> >> 
> >> Yeah, that should work.
> >
> > We can't rely on it because rq->online is not available in !SMP.
> > Could you review the message in the [PATCH v3 1/3] thread?
> 
> I'm not sure what you mean here. The patch just checking cpu_online
> won't work; is there another version you want me to look at?

I mean this one: https://lkml.org/lkml/2014/6/25/123

Have you received it? My email client shows you are properly CCed.
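
For completeness, a sketch of the rq->online alternative being discussed, and
of why it needs care: struct rq only carries the online field on CONFIG_SMP
kernels, so a guard based on it has to fall back to something sensible on UP
builds. The helper below is purely hypothetical, named only for this sketch:

#ifdef CONFIG_SMP
static inline bool rq_is_online(struct rq *rq)	/* hypothetical helper */
{
	return rq->online;
}
#else
static inline bool rq_is_online(struct rq *rq)
{
	return true;	/* a UP kernel has a single, always-usable rq */
}
#endif

		raw_spin_lock_irq(&rq->lock);
		/*
		 * rq->lock also serializes set_rq_offline(), so the check
		 * below cannot race with the cpu going down.
		 */
		if (rq_is_online(rq)) {
			cfs_rq->runtime_enabled = runtime_enabled;
			cfs_rq->runtime_remaining = 0;

			if (cfs_rq->throttled)
				unthrottle_cfs_rq(cfs_rq);
		}
		raw_spin_unlock_irq(&rq->lock);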


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq
  2014-06-24  7:53 ` [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq Kirill Tkhai
  2014-06-24 17:03   ` bsegall
@ 2014-06-26 11:08   ` Srikar Dronamraju
  1 sibling, 0 replies; 14+ messages in thread
From: Srikar Dronamraju @ 2014-06-26 11:08 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, tkhai, Mike Galbraith,
	khorenko, Ben Segall, Paul Turner

* Kirill Tkhai <ktkhai@parallels.com> [2014-06-24 11:53:52]:

> We kill rq->rd on the CPU_DOWN_PREPARE stage:
> 
> 	cpuset_cpu_inactive -> cpuset_update_active_cpus -> partition_sched_domains ->
> 	-> cpu_attach_domain -> rq_attach_root -> set_rq_offline
> 
> This unthrottles all throttled cfs_rqs.
> 
> But the cpu is still able to call schedule() till
> 
> 	take_cpu_down->__cpu_disable()
> 
> is called from stop_machine.
> 
> This case the tasks from just unthrottled cfs_rqs are pickable
> in a standard scheduler way, and they are picked by dying cpu.
> The cfs_rqs becomes throttled again, and migrate_tasks()
> in migration_call skips their tasks (one more unthrottle
> in migrate_tasks()->CPU_DYING does not happen, because rq->rd
> is already NULL).
> 
> Patch sets runtime_enabled to zero. This guarantees, the runtime
> is not accounted, and the cfs_rqs won't exceed given
> cfs_rq->runtime_remaining = 1, and tasks will be pickable
> in migrate_tasks(). runtime_enabled is recalculated again
> when rq becomes online again.
> 
> Ben Segall also noticed, we always enable runtime in
> tg_set_cfs_bandwidth(). Actually, we should do that for online
> cpus only. To fix that, we check if a cpu is online when
> its rq is locked. This guarantees we do not have races with
> set_rq_offline(), which also requires rq->lock.
> 
> v2: Fix race with tg_set_cfs_bandwidth().
>     Move cfs_rq->runtime_enabled=0 above unthrottle_cfs_rq().
> 
> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
> CC: Konstantin Khorenko <khorenko@parallels.com>
> CC: Ben Segall <bsegall@google.com>
> CC: Paul Turner <pjt@google.com>
> CC: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> CC: Mike Galbraith <umgwanakikbuti@gmail.com>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: Ingo Molnar <mingo@kernel.org>

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2014-06-26 11:08 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20140624074148.8738.57690.stgit@tkhai>
2014-06-24  7:53 ` [PATCH v2 1/3] sched/fair: Disable runtime_enabled on dying rq Kirill Tkhai
2014-06-24 17:03   ` bsegall
2014-06-24 19:01     ` Kirill Tkhai
2014-06-24 19:13       ` bsegall
2014-06-24 19:26         ` Kirill Tkhai
2014-06-25  7:53           ` Kirill Tkhai
2014-06-25  8:03             ` Kirill Tkhai
2014-06-25 16:52             ` bsegall
2014-06-25 17:31               ` Kirill Tkhai
2014-06-25 17:40                 ` bsegall
2014-06-25 17:44                   ` Kirill Tkhai
2014-06-26 11:08   ` Srikar Dronamraju
2014-06-24  7:53 ` [PATCH v2 2/3] sched/rt: __disable_runtime: Enqueue just unthrottled rt_rq back on the stack Kirill Tkhai
2014-06-24  7:54 ` [PATCH v2 3/3] sched: Rework check_for_tasks() Kirill Tkhai
