linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task()
@ 2014-09-22 18:36 Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

Combine two branches which do the same thing.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/fair.c |   16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2a1e6ac..420bc98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5292,24 +5292,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (!tsk_cache_hot)
 		tsk_cache_hot = migrate_degrades_locality(p, env);
 
-	if (migrate_improves_locality(p, env)) {
-#ifdef CONFIG_SCHEDSTATS
+	if (migrate_improves_locality(p, env) || !tsk_cache_hot ||
+	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (tsk_cache_hot) {
 			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
 			schedstat_inc(p, se.statistics.nr_forced_migrations);
 		}
-#endif
-		return 1;
-	}
-
-	if (!tsk_cache_hot ||
-		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-
-		if (tsk_cache_hot) {
-			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
-			schedstat_inc(p, se.statistics.nr_forced_migrations);
-		}
-
 		return 1;
 	}
 


^ permalink raw reply related	[flat|nested] 15+ messages in thread
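
The claim that the two branches "do the same thing" can be checked with a
small user-space sketch. It models only the return value of the decision;
the schedstat counters are ignored. The function names and the plain
bool/unsigned parameters are stand-ins invented for this illustration, not
kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Pre-patch shape: two separate branches, both ending in "return 1". */
static int can_migrate_old(bool improves, bool cache_hot,
			   unsigned int failed, unsigned int nice_tries)
{
	if (improves)
		return 1;

	if (!cache_hot || failed > nice_tries)
		return 1;

	return 0;
}

/* Post-patch shape: a single branch with the conditions OR-ed together. */
static int can_migrate_new(bool improves, bool cache_hot,
			   unsigned int failed, unsigned int nice_tries)
{
	if (improves || !cache_hot || failed > nice_tries)
		return 1;

	return 0;
}

/* Sweep a small input space and assert that both shapes always agree. */
static void check_forms_agree(void)
{
	for (int improves = 0; improves <= 1; improves++)
		for (int hot = 0; hot <= 1; hot++)
			for (unsigned int failed = 0; failed < 4; failed++)
				for (unsigned int tries = 0; tries < 4; tries++)
					assert(can_migrate_old(improves, hot, failed, tries) ==
					       can_migrate_new(improves, hot, failed, tries));
}
```

Calling check_forms_agree() exercises every combination of the two flags
with a few counter values on both sides of the comparison.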

* [PATCH v3 2/6] sched: Do not pick a task which is switching on other cpu
  2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
@ 2014-09-22 18:36 ` Kirill Tkhai
  2014-09-23 15:06   ` [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW Peter Zijlstra
  2014-09-22 18:36 ` [PATCH v3 3/6] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

Architectures that define __ARCH_WANT_UNLOCKED_CTXSW
may pull a task while it is still in the middle of schedule().

CPU1(task1 calls schedule)            CPU2
...                                   schedule()
...                                      idle_balance()
...                                         load_balance()
...                                            ...
schedule()                                     ...
   prepare_lock_switch()                       ...
      raw_spin_unlock(&rq1->lock)              ...
      ...                                      raw_spin_lock(&rq1->lock)
      ...                                         detach_tasks();
      ...                                            can_migrate_task(task1)
      ...                                         attach_tasks(); <--- move task1 to rq2
      ...                                      raw_spin_unlock(&rq1->lock)
      ...                                context_switch() <--- switch to task1's stack
      ...                                ...
   (using task1's stack)                 (using task1's stack)
   ...                                   ...
   context_switch()                      ...


Parallel use of a single stack is not a good idea.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # Should this go to stable?
---
 kernel/sched/core.c     |   11 +++--------
 kernel/sched/deadline.c |    7 ++++++-
 kernel/sched/fair.c     |    3 +++
 kernel/sched/rt.c       |    7 ++++++-
 kernel/sched/sched.h    |   16 ++++++++++++++++
 5 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a93b87..5b864e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1700,15 +1700,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 
 #ifdef CONFIG_SMP
 	/*
-	 * If the owning (remote) cpu is still in the middle of schedule() with
-	 * this task as prev, wait until its done referencing the task.
+	 * Note, that p is dequeued at the moment. But it still
+	 * may be "prev" in the middle of schedule() on other cpu.
 	 */
-	while (p->on_cpu)
-		cpu_relax();
-	/*
-	 * Pairs with the smp_wmb() in finish_lock_switch().
-	 */
-	smp_rmb();
+	cpu_relax__while_on_cpu(p);
 
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aaa5abb..ea0ba33 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1364,7 +1364,9 @@ static int push_dl_task(struct rq *rq)
 		next_task = task;
 		goto retry;
 	}
-
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	cpu_relax__while_on_cpu(next_task);
+#endif
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, later_rq->cpu);
 	activate_task(later_rq, next_task, 0);
@@ -1451,6 +1453,9 @@ static int pull_dl_task(struct rq *this_rq)
 
 			ret = 1;
 
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+			cpu_relax__while_on_cpu(p);
+#endif
 			deactivate_task(src_rq, p, 0);
 			set_task_cpu(p, this_cpu);
 			activate_task(this_rq, p, 0);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 420bc98..80c5064 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5298,6 +5298,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
 			schedstat_inc(p, se.statistics.nr_forced_migrations);
 		}
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+		cpu_relax__while_on_cpu(p);
+#endif
 		return 1;
 	}
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e6a774..de356b0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1734,7 +1734,9 @@ static int push_rt_task(struct rq *rq)
 		next_task = task;
 		goto retry;
 	}
-
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	cpu_relax__while_on_cpu(next_task);
+#endif
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
@@ -1823,6 +1825,9 @@ static int pull_rt_task(struct rq *this_rq)
 
 			ret = 1;
 
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+			cpu_relax__while_on_cpu(p);
+#endif
 			deactivate_task(src_rq, p, 0);
 			set_task_cpu(p, this_cpu);
 			activate_task(this_rq, p, 0);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1bc6aad..9c07d72 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1034,6 +1034,22 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 #endif /* __ARCH_WANT_UNLOCKED_CTXSW */
 
 /*
+ * If the owning (remote) cpu is still in the middle of schedule() with
+ * this task as prev, wait until its done referencing the task.
+ */
+static inline void cpu_relax__while_on_cpu(struct task_struct *p)
+{
+#ifdef CONFIG_SMP
+	while (p->on_cpu)
+		cpu_relax();
+	/*
+	 * Pairs with the smp_wmb() in finish_lock_switch().
+	 */
+	smp_rmb();
+#endif
+}
+
+/*
  * wake flags
  */
 #define WF_SYNC		0x01		/* waker goes to sleep after wakeup */


^ permalink raw reply related	[flat|nested] 15+ messages in thread
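
The wait that cpu_relax__while_on_cpu() performs can be sketched in user
space with C11 atomics. This is an analogue under stated assumptions, not
the kernel code: struct toy_task and both function names are invented here,
and a release store paired with an acquire load stands in for the
smp_wmb()/smp_rmb() pairing mentioned in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of task_struct that matter here. */
struct toy_task {
	int saved_state;	/* written by the CPU that switches the task out */
	atomic_int on_cpu;	/* nonzero while that CPU still uses the task */
};

/* Switching-out side: publish the state, then clear on_cpu (release). */
static void toy_finish_switch(struct toy_task *p, int state)
{
	assert(p != NULL);
	p->saved_state = state;
	atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
}

/*
 * Pulling side: spin until on_cpu is clear (acquire), and only then touch
 * the task. The acquire load pairs with the release store above, so
 * saved_state is guaranteed to be visible once the loop exits.
 */
static int toy_wait_while_on_cpu(struct toy_task *p)
{
	assert(p != NULL);
	while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
		;	/* the kernel calls cpu_relax() here */
	return p->saved_state;
}
```

In a real scenario the two functions run on different threads; if the
waiter runs first it simply spins until the other side clears on_cpu.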

* [PATCH v3 3/6] sched: Use dl_bw_of() under RCU read lock
  2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
@ 2014-09-22 18:36 ` Kirill Tkhai
  2014-09-24 14:57   ` [tip:sched/core] " tip-bot for Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

dl_bw_of() dereferences rq->rd, which requires the RCU read lock to be
held. Without it, a use-after-free is possible here.

Also add a lockdep assertion to dl_bw_cpus().

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # v3.14+
---
 kernel/sched/core.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b864e9..a300fce 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1989,6 +1989,8 @@ unsigned long to_ratio(u64 period, u64 runtime)
 #ifdef CONFIG_SMP
 inline struct dl_bw *dl_bw_of(int i)
 {
+	rcu_lockdep_assert(rcu_read_lock_sched_held(),
+			   "sched RCU must be held");
 	return &cpu_rq(i)->rd->dl_bw;
 }
 
@@ -1997,6 +1999,8 @@ static inline int dl_bw_cpus(int i)
 	struct root_domain *rd = cpu_rq(i)->rd;
 	int cpus = 0;
 
+	rcu_lockdep_assert(rcu_read_lock_sched_held(),
+			   "sched RCU must be held");
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 
@@ -7623,6 +7627,8 @@ static int sched_dl_global_constraints(void)
 	int cpu, ret = 0;
 	unsigned long flags;
 
+	rcu_read_lock();
+
 	/*
 	 * Here we want to check the bandwidth not being set to some
 	 * value smaller than the currently allocated bandwidth in
@@ -7644,6 +7650,8 @@ static int sched_dl_global_constraints(void)
 			break;
 	}
 
+	rcu_read_unlock();
+
 	return ret;
 }
 
@@ -7659,6 +7667,7 @@ static void sched_dl_do_global(void)
 	if (global_rt_runtime() != RUNTIME_INF)
 		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
 
+	rcu_read_lock();
 	/*
 	 * FIXME: As above...
 	 */
@@ -7669,6 +7678,7 @@ static void sched_dl_do_global(void)
 		dl_b->bw = new_bw;
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 	}
+	rcu_read_unlock();
 }
 
 static int sched_rt_global_validate(void)


^ permalink raw reply related	[flat|nested] 15+ messages in thread
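
The idea of asserting "the read side must be held" inside the accessor can
be illustrated with a toy, single-threaded model. Everything below is
hypothetical: a plain nesting counter stands in for the RCU read-side
state, and the accessor returns NULL where the kernel's assertion would
print a warning. It shows the calling discipline only, not how RCU works.

```c
#include <assert.h>
#include <stddef.h>

struct toy_root_domain { int dl_bw; };
struct toy_rq { struct toy_root_domain *rd; };

/* Nesting depth of the toy read-side critical section (not thread-safe). */
static int toy_read_depth;

static void toy_read_lock(void)
{
	toy_read_depth++;
}

static void toy_read_unlock(void)
{
	assert(toy_read_depth > 0);	/* unlock without a matching lock */
	toy_read_depth--;
}

/* Accessor that refuses to follow rq->rd outside a read-side section. */
static int *toy_dl_bw_of(struct toy_rq *rq)
{
	if (toy_read_depth == 0)
		return NULL;	/* the kernel assertion would warn here */
	return &rq->rd->dl_bw;
}
```

Putting the check in the accessor rather than in each caller means a new
call site that forgets the lock is caught the first time it runs.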

* [PATCH v3 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask
  2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 3/6] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
@ 2014-09-22 18:36 ` Kirill Tkhai
  2014-09-24 14:57   ` [tip:sched/core] sched: cleanup: Rename 'out_unlock' to ' out_free_new_mask' tip-bot for Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

Nothing is locked there, so the label's name only confuses the reader.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/core.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a300fce..3b07710 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4003,14 +4003,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 		rcu_read_lock();
 		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
 			rcu_read_unlock();
-			goto out_unlock;
+			goto out_free_new_mask;
 		}
 		rcu_read_unlock();
 	}
 
 	retval = security_task_setscheduler(p);
 	if (retval)
-		goto out_unlock;
+		goto out_free_new_mask;
 
 
 	cpuset_cpus_allowed(p, cpus_allowed);
@@ -4028,7 +4028,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 		if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
 			retval = -EBUSY;
-			goto out_unlock;
+			goto out_free_new_mask;
 		}
 	}
 #endif
@@ -4047,7 +4047,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 			goto again;
 		}
 	}
-out_unlock:
+out_free_new_mask:
 	free_cpumask_var(new_mask);
 out_free_cpus_allowed:
 	free_cpumask_var(cpus_allowed);


^ permalink raw reply related	[flat|nested] 15+ messages in thread
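
The naming convention this patch moves towards, where each cleanup label
says what it releases, can be shown with a small user-space sketch. The
function and its fail_midway parameter are invented for the illustration;
only the label style mirrors sched_setaffinity().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success and -1 on any failure; frees everything it allocates. */
static int toy_copy_twice(const char *src, size_t n, int fail_midway)
{
	int retval = -1;
	char *first, *second;

	assert(src != NULL && n > 0);

	first = malloc(n);
	if (!first)
		goto out;

	second = malloc(n);
	if (!second)
		goto out_free_first;

	memcpy(first, src, n);
	if (fail_midway)
		goto out_free_second;	/* simulated error after both allocations */

	memcpy(second, first, n);
	retval = 0;

	/* Labels unwind in reverse order of acquisition. */
out_free_second:
	free(second);
out_free_first:
	free(first);
out:
	return retval;
}
```

With labels named this way, a reader can tell from the goto alone which
resources are released on that path.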

* [PATCH v3 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
                   ` (2 preceding siblings ...)
  2014-09-22 18:36 ` [PATCH v3 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
@ 2014-09-22 18:36 ` Kirill Tkhai
  2014-09-24 14:57   ` [tip:sched/core] " tip-bot for Kirill Tkhai
  2014-09-22 18:36 ` [PATCH v3 6/6] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
  2014-09-24 14:56 ` [tip:sched/core] sched/fair: Remove duplicate code from can_migrate_task() tip-bot for Kirill Tkhai
  5 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

task_rq(p)->rd is dereferenced here without the RCU read lock held,
so a use-after-free is possible.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # v3.14+
---
 kernel/sched/core.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3b07710..643ee99 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4023,13 +4023,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	 * root_domain.
 	 */
 #ifdef CONFIG_SMP
-	if (task_has_dl_policy(p)) {
-		const struct cpumask *span = task_rq(p)->rd->span;
-
-		if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
+	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
+		rcu_read_lock();
+		if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
 			retval = -EBUSY;
+			rcu_read_unlock();
 			goto out_free_new_mask;
 		}
+		rcu_read_unlock();
 	}
 #endif
 again:


^ permalink raw reply related	[flat|nested] 15+ messages in thread
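
The shape of the fix, a read-side section that is released on both the
error path and the normal path, can be sketched as follows. This is a toy
model with invented names: a counter replaces the RCU read lock and an
unsigned long bitmask replaces struct cpumask.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy read-side nesting counter; zero means "not inside a section". */
static int toy_read_depth;

static void toy_read_lock(void)   { toy_read_depth++; }
static void toy_read_unlock(void) { assert(toy_read_depth > 0); toy_read_depth--; }

/* True if every bit set in span is also set in mask. */
static bool toy_subset(unsigned long span, unsigned long mask)
{
	return (span & ~mask) == 0;
}

/* Mirrors the patched control flow: unlock before the early exit too. */
static int toy_check_affinity(const unsigned long *span, unsigned long new_mask)
{
	toy_read_lock();
	if (!toy_subset(*span, new_mask)) {
		toy_read_unlock();
		return -EBUSY;
	}
	toy_read_unlock();
	return 0;
}
```

Checking that toy_read_depth is back at zero after each call is a cheap way
to confirm that no exit path leaves the section open.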

* [PATCH v3 6/6] sched/rt: Use resched_curr() in task_tick_rt()
  2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
                   ` (3 preceding siblings ...)
  2014-09-22 18:36 ` [PATCH v3 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
@ 2014-09-22 18:36 ` Kirill Tkhai
  2014-09-24 14:57   ` [tip:sched/core] " tip-bot for Kirill Tkhai
  2014-09-24 14:56 ` [tip:sched/core] sched/fair: Remove duplicate code from can_migrate_task() tip-bot for Kirill Tkhai
  5 siblings, 1 reply; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-22 18:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

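PREEMPT_NEED_RESCHED was implemented some time ago, so requesting a
reschedule is a little more involved now than setting the task flag.
Use resched_curr() instead of calling set_tsk_need_resched() directly.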

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/rt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index de356b0..c322071 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2077,7 +2077,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	for_each_sched_rt_entity(rt_se) {
 		if (rt_se->run_list.prev != rt_se->run_list.next) {
 			requeue_task_rt(rq, p, 0);
-			set_tsk_need_resched(p);
+			resched_curr(rq);
 			return;
 		}
 	}


^ permalink raw reply related	[flat|nested] 15+ messages in thread
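
Why the task flag alone may not be enough can be illustrated with a toy
model. It assumes, as a simplification, that the fast-path check looks only
at a per-CPU bit and that resched_curr() on the local CPU updates both the
task flag and that bit. All names are invented, and the remote-CPU case
(which involves an IPI) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_task { bool tif_need_resched; };

struct toy_cpu {
	struct toy_task *curr;		/* task currently running on this CPU */
	bool preempt_need_resched;	/* per-CPU copy checked on the fast path */
};

/* Old call in task_tick_rt(): marks only the task. */
static void toy_set_tsk_need_resched(struct toy_task *p)
{
	p->tif_need_resched = true;
}

/* New call, local-CPU case only: marks the task and the per-CPU state. */
static void toy_resched_curr(struct toy_cpu *cpu)
{
	assert(cpu != NULL && cpu->curr != NULL);
	if (cpu->curr->tif_need_resched)
		return;			/* a reschedule is already pending */
	cpu->curr->tif_need_resched = true;
	cpu->preempt_need_resched = true;
}

/* Fast-path check in this model: consults the per-CPU state only. */
static bool toy_should_resched(const struct toy_cpu *cpu)
{
	return cpu->preempt_need_resched;
}
```

In this model a request made through toy_set_tsk_need_resched() stays
invisible to toy_should_resched(), while toy_resched_curr() makes it
visible immediately.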

* [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  2014-09-22 18:36 ` [PATCH v3 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
@ 2014-09-23 15:06   ` Peter Zijlstra
  2014-09-23 15:16     ` Kirill Tkhai
                       ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2014-09-23 15:06 UTC (permalink / raw)
  To: Kirill Tkhai
  Cc: linux-kernel, Ingo Molnar, Kirill Tkhai, oleg, linux, ralf,
	tony.luck, linux-mips, linux-ia64

On Mon, Sep 22, 2014 at 10:36:18PM +0400, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@parallels.com>
> 
> Architectures, which define __ARCH_WANT_UNLOCKED_CTXSW,
> may pull a task when it's in the middle of schedule().
> 
> CPU1(task1 calls schedule)            CPU2
> ...                                   schedule()
> ...                                      idle_balance()
> ...                                         load_balance()
> ...                                            ...
> schedule()                                     ...
>    prepare_lock_switch()                       ...
>       raw_spin_unlock(&rq1->lock)              ...
>       ...                                      raw_spin_lock(&rq1->lock)
>       ...                                         detach_tasks();
>       ...                                            can_migrate_task(task1)
>       ...                                         attach_tasks(); <--- move task1 to rq2
>       ...                                      raw_spin_unlock(&rq1->lock)
>       ...                                context_switch() <--- switch to task1's stack
>       ...                                ...
>    (using task1's stack)                 (using task1's stack)
>    ...                                   ...
>    context_switch()                      ...
> 
> 
> Parallel use of a single stack is not a good idea.

Indeed it is, but how about we do this instead?

---
Subject: sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW

Kirill found that there's a subtle race in the
__ARCH_WANT_UNLOCKED_CTXSW code, and instead of fixing it, remove the
entire exception because neither arch that uses it seems to actually
still require it.

Boot tested on mips64el (qemu) only.

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/ia64/include/asm/processor.h |  1 -
 arch/mips/include/asm/processor.h |  6 ------
 kernel/sched/core.c               |  6 ------
 kernel/sched/sched.h              | 30 ------------------------------
 4 files changed, 43 deletions(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index c736713..ce53c50 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -19,7 +19,6 @@
 #include <asm/ptrace.h>
 #include <asm/ustack.h>
 
-#define __ARCH_WANT_UNLOCKED_CTXSW
 #define ARCH_HAS_PREFETCH_SWITCH_STACK
 
 #define IA64_NUM_PHYS_STACK_REG	96
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 05f0843..f1df4cb 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -397,12 +397,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define ARCH_HAS_PREFETCHW
 #define prefetchw(x) __builtin_prefetch((x), 1, 1)
 
-/*
- * See Documentation/scheduler/sched-arch.txt; prevents deadlock on SMP
- * systems.
- */
-#define __ARCH_WANT_UNLOCKED_CTXSW
-
 #endif
 
 #endif /* _ASM_PROCESSOR_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a93b87..ccbafb0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2304,10 +2304,6 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	 */
 	post_schedule(rq);
 
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
-	/* In this case, finish_task_switch does not reenable preemption */
-	preempt_enable();
-#endif
 	if (current->set_child_tid)
 		put_user(task_pid_vnr(current), current->set_child_tid);
 }
@@ -2350,9 +2346,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * of the scheduler it's an obvious special-case), so we
 	 * do an early lockdep release here:
 	 */
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
-#endif
 
 	context_tracking_task_switch(prev, next);
 	/* Here we just switch the register state and the stack. */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1bc6aad..d87f122 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -966,7 +966,6 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 # define finish_arch_post_lock_switch()	do { } while (0)
 #endif
 
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
 #ifdef CONFIG_SMP
@@ -1004,35 +1003,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 	raw_spin_unlock_irq(&rq->lock);
 }
 
-#else /* __ARCH_WANT_UNLOCKED_CTXSW */
-static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
-{
-#ifdef CONFIG_SMP
-	/*
-	 * We can optimise this out completely for !SMP, because the
-	 * SMP rebalancing from interrupt is the only thing that cares
-	 * here.
-	 */
-	next->on_cpu = 1;
-#endif
-	raw_spin_unlock(&rq->lock);
-}
-
-static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
-{
-#ifdef CONFIG_SMP
-	/*
-	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
-	 * We must ensure this doesn't happen until the switch is completely
-	 * finished.
-	 */
-	smp_wmb();
-	prev->on_cpu = 0;
-#endif
-	local_irq_enable();
-}
-#endif /* __ARCH_WANT_UNLOCKED_CTXSW */
-
 /*
  * wake flags
  */

^ permalink raw reply related	[flat|nested] 15+ messages in thread
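
With the unlocked variant removed, the runqueue lock stays held across the
context switch, so a balancer on another CPU cannot detach a task while the
switch is in progress. A minimal user-space sketch of that property
follows. The names are hypothetical, a C11 atomic_flag stands in for
rq->lock, and the balancer uses a trylock here only so the example can run
on a single thread; the real balancer would spin on the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_rq {
	atomic_flag lock;	/* stand-in for rq->lock */
	int nr_running;		/* tasks queued on this runqueue */
};

static bool toy_trylock(struct toy_rq *rq)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire);
}

static void toy_unlock(struct toy_rq *rq)
{
	atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/* Balancer side: a task may be detached only with the source lock held. */
static bool toy_try_pull(struct toy_rq *src)
{
	if (!toy_trylock(src))
		return false;	/* a switch is in progress on the source CPU */
	assert(src->nr_running > 0);
	src->nr_running--;
	toy_unlock(src);
	return true;
}
```

Taking the lock before calling toy_try_pull() models a switch in flight:
the pull fails and nr_running is unchanged until the lock is released.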

* Re: [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  2014-09-23 15:06   ` [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW Peter Zijlstra
@ 2014-09-23 15:16     ` Kirill Tkhai
  2014-09-23 19:45     ` Guenter Roeck
  2014-09-24 14:56     ` [tip:sched/core] sched, mips, ia64: " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: Kirill Tkhai @ 2014-09-23 15:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Kirill Tkhai, oleg, linux, ralf,
	tony.luck, linux-mips, linux-ia64

23.09.2014, 19:06, "Peter Zijlstra" <peterz@infradead.org>:
> On Mon, Sep 22, 2014 at 10:36:18PM +0400, Kirill Tkhai wrote:
>>  From: Kirill Tkhai <ktkhai@parallels.com>
>>
>>  Architectures, which define __ARCH_WANT_UNLOCKED_CTXSW,
>>  may pull a task when it's in the middle of schedule().
>>
>>  CPU1(task1 calls schedule)            CPU2
>>  ...                                   schedule()
>>  ...                                      idle_balance()
>>  ...                                         load_balance()
>>  ...                                            ...
>>  schedule()                                     ...
>>     prepare_lock_switch()                       ...
>>        raw_spin_unlock(&rq1->lock)              ...
>>        ...                                      raw_spin_lock(&rq1->lock)
>>        ...                                         detach_tasks();
>>        ...                                            can_migrate_task(task1)
>>        ...                                         attach_tasks(); <--- move task1 to rq2
>>        ...                                      raw_spin_unlock(&rq1->lock)
>>        ...                                context_switch() <--- switch to task1's stack
>>        ...                                ...
>>     (using task1's stack)                 (using task1's stack)
>>     ...                                   ...
>>     context_switch()                      ...
>>
>>  Parallel use of a single stack is not a good idea.
>
> Indeed it is, but how about we do this instead?

Completely agree, looks good to me.

> ---
> Subject: sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
>
> Kirill found that there's a subtle race in the
> __ARCH_WANT_UNLOCKED_CTXSW code, and instead of fixing it, remove the
> entire exception because neither arch that uses it seems to actually
> still require it.
>
> Boot tested on mips64el (qemu) only.
>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Kirill Tkhai <ktkhai@parallels.com> (one more review of ia64 part)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  2014-09-23 15:06   ` [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW Peter Zijlstra
  2014-09-23 15:16     ` Kirill Tkhai
@ 2014-09-23 19:45     ` Guenter Roeck
  2014-09-24 14:56     ` [tip:sched/core] sched, mips, ia64: " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: Guenter Roeck @ 2014-09-23 19:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill Tkhai, linux-kernel, Ingo Molnar, Kirill Tkhai, oleg,
	ralf, tony.luck, linux-mips, linux-ia64

On Tue, Sep 23, 2014 at 05:06:41PM +0200, Peter Zijlstra wrote:
> On Mon, Sep 22, 2014 at 10:36:18PM +0400, Kirill Tkhai wrote:
> > From: Kirill Tkhai <ktkhai@parallels.com>
> > 
> > Architectures, which define __ARCH_WANT_UNLOCKED_CTXSW,
> > may pull a task when it's in the middle of schedule().
> > 
> > CPU1(task1 calls schedule)            CPU2
> > ...                                   schedule()
> > ...                                      idle_balance()
> > ...                                         load_balance()
> > ...                                            ...
> > schedule()                                     ...
> >    prepare_lock_switch()                       ...
> >       raw_spin_unlock(&rq1->lock)              ...
> >       ...                                      raw_spin_lock(&rq1->lock)
> >       ...                                         detach_tasks();
> >       ...                                            can_migrate_task(task1)
> >       ...                                         attach_tasks(); <--- move task1 to rq2
> >       ...                                      raw_spin_unlock(&rq1->lock)
> >       ...                                context_switch() <--- switch to task1's stack
> >       ...                                ...
> >    (using task1's stack)                 (using task1's stack)
> >    ...                                   ...
> >    context_switch()                      ...
> > 
> > 
> > Parallel use of a single stack is not a good idea.
> 
> Indeed it is, but how about we do this instead?
> 
> ---
> Subject: sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
> 
> Kirill found that there's a subtle race in the
> __ARCH_WANT_UNLOCKED_CTXSW code, and instead of fixing it, remove the
> entire exception because neither arch that uses it seems to actually
> still require it.
> 
> Boot tested on mips64el (qemu) only.
> 
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Guenter Roeck <linux@roeck-us.net>
> Cc: Ralf Baechle <ralf@linux-mips.org>
> Cc: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/ia64/include/asm/processor.h |  1 -
>  arch/mips/include/asm/processor.h |  6 ------
>  kernel/sched/core.c               |  6 ------
>  kernel/sched/sched.h              | 30 ------------------------------

You might also want to clean up any references to the define if you remove it.

Documentation/scheduler/sched-arch.txt:you must `#define __ARCH_WANT_UNLOCKED_CTXSW` in a header file
arch/blackfin/kernel/entry.S:    * since Blackfin does not define __ARCH_WANT_UNLOCKED_CTXSW, so

Guenter

>  4 files changed, 43 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
> index c736713..ce53c50 100644
> --- a/arch/ia64/include/asm/processor.h
> +++ b/arch/ia64/include/asm/processor.h
> @@ -19,7 +19,6 @@
>  #include <asm/ptrace.h>
>  #include <asm/ustack.h>
>  
> -#define __ARCH_WANT_UNLOCKED_CTXSW
>  #define ARCH_HAS_PREFETCH_SWITCH_STACK
>  
>  #define IA64_NUM_PHYS_STACK_REG	96
> diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
> index 05f0843..f1df4cb 100644
> --- a/arch/mips/include/asm/processor.h
> +++ b/arch/mips/include/asm/processor.h
> @@ -397,12 +397,6 @@ unsigned long get_wchan(struct task_struct *p);
>  #define ARCH_HAS_PREFETCHW
>  #define prefetchw(x) __builtin_prefetch((x), 1, 1)
>  
> -/*
> - * See Documentation/scheduler/sched-arch.txt; prevents deadlock on SMP
> - * systems.
> - */
> -#define __ARCH_WANT_UNLOCKED_CTXSW
> -
>  #endif
>  
>  #endif /* _ASM_PROCESSOR_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2a93b87..ccbafb0 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2304,10 +2304,6 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
>  	 */
>  	post_schedule(rq);
>  
> -#ifdef __ARCH_WANT_UNLOCKED_CTXSW
> -	/* In this case, finish_task_switch does not reenable preemption */
> -	preempt_enable();
> -#endif
>  	if (current->set_child_tid)
>  		put_user(task_pid_vnr(current), current->set_child_tid);
>  }
> @@ -2350,9 +2346,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  	 * of the scheduler it's an obvious special-case), so we
>  	 * do an early lockdep release here:
>  	 */
> -#ifndef __ARCH_WANT_UNLOCKED_CTXSW
>  	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
> -#endif
>  
>  	context_tracking_task_switch(prev, next);
>  	/* Here we just switch the register state and the stack. */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 1bc6aad..d87f122 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -966,7 +966,6 @@ static inline int task_on_rq_migrating(struct task_struct *p)
>  # define finish_arch_post_lock_switch()	do { } while (0)
>  #endif
>  
> -#ifndef __ARCH_WANT_UNLOCKED_CTXSW
>  static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
>  {
>  #ifdef CONFIG_SMP
> @@ -1004,35 +1003,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
>  	raw_spin_unlock_irq(&rq->lock);
>  }
>  
> -#else /* __ARCH_WANT_UNLOCKED_CTXSW */
> -static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
> -{
> -#ifdef CONFIG_SMP
> -	/*
> -	 * We can optimise this out completely for !SMP, because the
> -	 * SMP rebalancing from interrupt is the only thing that cares
> -	 * here.
> -	 */
> -	next->on_cpu = 1;
> -#endif
> -	raw_spin_unlock(&rq->lock);
> -}
> -
> -static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
> -{
> -#ifdef CONFIG_SMP
> -	/*
> -	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
> -	 * We must ensure this doesn't happen until the switch is completely
> -	 * finished.
> -	 */
> -	smp_wmb();
> -	prev->on_cpu = 0;
> -#endif
> -	local_irq_enable();
> -}
> -#endif /* __ARCH_WANT_UNLOCKED_CTXSW */
> -
>  /*
>   * wake flags
>   */

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:sched/core] sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
  2014-09-23 15:06   ` [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW Peter Zijlstra
  2014-09-23 15:16     ` Kirill Tkhai
  2014-09-23 19:45     ` Guenter Roeck
@ 2014-09-24 14:56     ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Peter Zijlstra @ 2014-09-24 14:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, torvalds, peterz, tony.luck, ralf, akpm, tglx, davidlohr,
	hpa, linux-kernel, james.hogan, qais.yousef, keescook,
	paul.burton, fenghua.yu, tkhai

Commit-ID:  c55f5158f5606f8a62e694b7e009f59b92ac6258
Gitweb:     http://git.kernel.org/tip/c55f5158f5606f8a62e694b7e009f59b92ac6258
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 23 Sep 2014 17:06:41 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:47:05 +0200

sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW

Kirill found that there's a subtle race in the
__ARCH_WANT_UNLOCKED_CTXSW code, and instead of fixing it, remove the
entire exception because neither arch that uses it seems to actually
still require it.

Boot tested on mips64el (qemu) only.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kirill Tkhai <tkhai@yandex.ru>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Qais Yousef <qais.yousef@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: oleg@redhat.com
Cc: linux@roeck-us.net
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-mips@linux-mips.org
Link: http://lkml.kernel.org/r/20140923150641.GH3312@worktop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/ia64/include/asm/processor.h |  1 -
 arch/mips/include/asm/processor.h |  6 ------
 kernel/sched/core.c               |  6 ------
 kernel/sched/sched.h              | 30 ------------------------------
 4 files changed, 43 deletions(-)

diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index c736713..ce53c50 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -19,7 +19,6 @@
 #include <asm/ptrace.h>
 #include <asm/ustack.h>
 
-#define __ARCH_WANT_UNLOCKED_CTXSW
 #define ARCH_HAS_PREFETCH_SWITCH_STACK
 
 #define IA64_NUM_PHYS_STACK_REG	96
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index 05f0843..f1df4cb 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -397,12 +397,6 @@ unsigned long get_wchan(struct task_struct *p);
 #define ARCH_HAS_PREFETCHW
 #define prefetchw(x) __builtin_prefetch((x), 1, 1)
 
-/*
- * See Documentation/scheduler/sched-arch.txt; prevents deadlock on SMP
- * systems.
- */
-#define __ARCH_WANT_UNLOCKED_CTXSW
-
 #endif
 
 #endif /* _ASM_PROCESSOR_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d65566d..5b0eac9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2331,10 +2331,6 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
 	 */
 	post_schedule(rq);
 
-#ifdef __ARCH_WANT_UNLOCKED_CTXSW
-	/* In this case, finish_task_switch does not reenable preemption */
-	preempt_enable();
-#endif
 	if (current->set_child_tid)
 		put_user(task_pid_vnr(current), current->set_child_tid);
 }
@@ -2377,9 +2373,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	 * of the scheduler it's an obvious special-case), so we
 	 * do an early lockdep release here:
 	 */
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
-#endif
 
 	context_tracking_task_switch(prev, next);
 	/* Here we just switch the register state and the stack. */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 16e1ca9..6130251 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -975,7 +975,6 @@ static inline int task_on_rq_migrating(struct task_struct *p)
 # define finish_arch_post_lock_switch()	do { } while (0)
 #endif
 
-#ifndef __ARCH_WANT_UNLOCKED_CTXSW
 static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
 {
 #ifdef CONFIG_SMP
@@ -1013,35 +1012,6 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 	raw_spin_unlock_irq(&rq->lock);
 }
 
-#else /* __ARCH_WANT_UNLOCKED_CTXSW */
-static inline void prepare_lock_switch(struct rq *rq, struct task_struct *next)
-{
-#ifdef CONFIG_SMP
-	/*
-	 * We can optimise this out completely for !SMP, because the
-	 * SMP rebalancing from interrupt is the only thing that cares
-	 * here.
-	 */
-	next->on_cpu = 1;
-#endif
-	raw_spin_unlock(&rq->lock);
-}
-
-static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
-{
-#ifdef CONFIG_SMP
-	/*
-	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
-	 * We must ensure this doesn't happen until the switch is completely
-	 * finished.
-	 */
-	smp_wmb();
-	prev->on_cpu = 0;
-#endif
-	local_irq_enable();
-}
-#endif /* __ARCH_WANT_UNLOCKED_CTXSW */
-
 /*
  * wake flags
  */


* [tip:sched/core] sched/fair: Remove duplicate code from can_migrate_task()
  2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
                   ` (4 preceding siblings ...)
  2014-09-22 18:36 ` [PATCH v3 6/6] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
@ 2014-09-24 14:56 ` tip-bot for Kirill Tkhai
  5 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Kirill Tkhai @ 2014-09-24 14:56 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, ktkhai, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  7a96c231ca23f0f5622852307df4209afc502ec3
Gitweb:     http://git.kernel.org/tip/7a96c231ca23f0f5622852307df4209afc502ec3
Author:     Kirill Tkhai <ktkhai@parallels.com>
AuthorDate: Mon, 22 Sep 2014 22:36:12 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:47:07 +0200

sched/fair: Remove duplicate code from can_migrate_task()

Combine two branches which do the same thing.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183612.11015.64200.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8cb32f8..10a5a28 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5315,24 +5315,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (!tsk_cache_hot)
 		tsk_cache_hot = migrate_degrades_locality(p, env);
 
-	if (migrate_improves_locality(p, env)) {
-#ifdef CONFIG_SCHEDSTATS
+	if (migrate_improves_locality(p, env) || !tsk_cache_hot ||
+	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (tsk_cache_hot) {
 			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
 			schedstat_inc(p, se.statistics.nr_forced_migrations);
 		}
-#endif
-		return 1;
-	}
-
-	if (!tsk_cache_hot ||
-		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-
-		if (tsk_cache_hot) {
-			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
-			schedstat_inc(p, se.statistics.nr_forced_migrations);
-		}
-
 		return 1;
 	}
 

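The merge above is safe only if the two removed branches really behave the same. A minimal userspace sketch (not kernel code) that compares the old two-branch logic with the combined condition for every input combination follows. The names `old_decision`, `new_decision`, `count_mismatches` and `struct decision` are hypothetical; the three boolean parameters stand in for `migrate_improves_locality(p, env)`, `tsk_cache_hot`, and `env->sd->nr_balance_failed > env->sd->cache_nice_tries`.

```c
#include <stdbool.h>

/* Hypothetical model of the outcome: the return value of the check plus
 * whether the forced-migration statistics would have been incremented. */
struct decision {
	int can_migrate;
	bool stats_bumped;
};

/* Model of the code before the patch: two separate branches. */
struct decision old_decision(bool improves, bool cache_hot, bool too_many_failures)
{
	struct decision d = { 0, false };

	if (improves) {
		if (cache_hot)
			d.stats_bumped = true;
		d.can_migrate = 1;
		return d;
	}

	if (!cache_hot || too_many_failures) {
		if (cache_hot)
			d.stats_bumped = true;
		d.can_migrate = 1;
		return d;
	}

	return d;
}

/* Model of the code after the patch: one combined condition. */
struct decision new_decision(bool improves, bool cache_hot, bool too_many_failures)
{
	struct decision d = { 0, false };

	if (improves || !cache_hot || too_many_failures) {
		if (cache_hot)
			d.stats_bumped = true;
		d.can_migrate = 1;
	}

	return d;
}

/* Walk all 8 input combinations and count disagreements between the models. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool improves = (bits & 1) != 0;
		bool cache_hot = (bits & 2) != 0;
		bool too_many_failures = (bits & 4) != 0;
		struct decision o = old_decision(improves, cache_hot, too_many_failures);
		struct decision n = new_decision(improves, cache_hot, too_many_failures);

		if (o.can_migrate != n.can_migrate || o.stats_bumped != n.stats_bumped)
			mismatches++;
	}

	return mismatches;
}
```

In this model the dropped `#ifdef CONFIG_SCHEDSTATS` guard is not represented; the sketch only covers the control flow of the two branches.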

* [tip:sched/core] sched: Use dl_bw_of() under RCU read lock
  2014-09-22 18:36 ` [PATCH v3 3/6] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
@ 2014-09-24 14:57   ` tip-bot for Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Kirill Tkhai @ 2014-09-24 14:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, ktkhai, hpa, mingo, torvalds, peterz, paulmck, tglx

Commit-ID:  66339c31bc3978d5fff9c4b4cb590a861def4db2
Gitweb:     http://git.kernel.org/tip/66339c31bc3978d5fff9c4b4cb590a861def4db2
Author:     Kirill Tkhai <ktkhai@parallels.com>
AuthorDate: Mon, 22 Sep 2014 22:36:24 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:47:09 +0200

sched: Use dl_bw_of() under RCU read lock

dl_bw_of() dereferences rq->rd, which must only be done with the RCU
read lock held; otherwise a use-after-free is possible.

Also add a lockdep assert to dl_bw_cpus().

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183624.11015.71558.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5b0eac9..f0adb03 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2021,6 +2021,8 @@ unsigned long to_ratio(u64 period, u64 runtime)
 #ifdef CONFIG_SMP
 inline struct dl_bw *dl_bw_of(int i)
 {
+	rcu_lockdep_assert(rcu_read_lock_sched_held(),
+			   "sched RCU must be held");
 	return &cpu_rq(i)->rd->dl_bw;
 }
 
@@ -2029,6 +2031,8 @@ static inline int dl_bw_cpus(int i)
 	struct root_domain *rd = cpu_rq(i)->rd;
 	int cpus = 0;
 
+	rcu_lockdep_assert(rcu_read_lock_sched_held(),
+			   "sched RCU must be held");
 	for_each_cpu_and(i, rd->span, cpu_active_mask)
 		cpus++;
 
@@ -7645,6 +7649,8 @@ static int sched_dl_global_constraints(void)
 	int cpu, ret = 0;
 	unsigned long flags;
 
+	rcu_read_lock();
+
 	/*
 	 * Here we want to check the bandwidth not being set to some
 	 * value smaller than the currently allocated bandwidth in
@@ -7666,6 +7672,8 @@ static int sched_dl_global_constraints(void)
 			break;
 	}
 
+	rcu_read_unlock();
+
 	return ret;
 }
 
@@ -7681,6 +7689,7 @@ static void sched_dl_do_global(void)
 	if (global_rt_runtime() != RUNTIME_INF)
 		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
 
+	rcu_read_lock();
 	/*
 	 * FIXME: As above...
 	 */
@@ -7691,6 +7700,7 @@ static void sched_dl_do_global(void)
 		dl_b->bw = new_bw;
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 	}
+	rcu_read_unlock();
 }
 
 static int sched_rt_global_validate(void)

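The pattern in the patch above is: the accessor asserts that a read-side critical section is active, and every caller wraps its use of the returned pointer in lock/unlock. A rough userspace model of that contract follows. It is not RCU and not kernel code: every `model_` name is hypothetical, a plain counter stands in for the read-side section, and returning NULL stands in for the lockdep warning.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct dl_bw and struct root_domain. */
struct model_dl_bw {
	unsigned long long bw;
};

struct model_root_domain {
	struct model_dl_bw dl_bw;
};

static struct model_root_domain model_boot_rd = { { 42 } };
static struct model_root_domain *model_current_rd = &model_boot_rd;

/* Nesting depth of the modelled read-side critical section. */
static int model_read_depth;

void model_read_lock(void)   { model_read_depth++; }
void model_read_unlock(void) { model_read_depth--; }
int model_read_lock_held(void) { return model_read_depth > 0; }

/* Accessor: refuses to hand out the pointer outside a read-side section.
 * The kernel patch emits a lockdep warning at this point instead. */
struct model_dl_bw *model_dl_bw_of(void)
{
	if (!model_read_lock_held())
		return NULL;
	return &model_current_rd->dl_bw;
}

/* Caller pattern: take the lock, use the pointer, drop the lock.
 * The pointer must not be used after model_read_unlock(). */
int model_read_bw(unsigned long long *out)
{
	struct model_dl_bw *b;
	int ret = -1;

	model_read_lock();
	b = model_dl_bw_of();
	if (b) {
		*out = b->bw;
		ret = 0;
	}
	model_read_unlock();

	return ret;
}
```

The point of the sketch is only the shape of the contract: the check lives in the accessor, so a caller that forgets the read-side section is caught at the dereference rather than at some later crash.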

* [tip:sched/core] sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
  2014-09-22 18:36 ` [PATCH v3 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
@ 2014-09-24 14:57   ` tip-bot for Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Kirill Tkhai @ 2014-09-24 14:57 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, ktkhai, hpa, mingo, peterz, tglx

Commit-ID:  16303ab2fe214635240a8f57cad2cd29792d4e3b
Gitweb:     http://git.kernel.org/tip/16303ab2fe214635240a8f57cad2cd29792d4e3b
Author:     Kirill Tkhai <ktkhai@parallels.com>
AuthorDate: Mon, 22 Sep 2014 22:36:30 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:47:10 +0200

sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'

Nothing is locked there, so the label's name only confuses the reader.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140922183630.11015.59500.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0adb03..316127a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4029,14 +4029,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 		rcu_read_lock();
 		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
 			rcu_read_unlock();
-			goto out_unlock;
+			goto out_free_new_mask;
 		}
 		rcu_read_unlock();
 	}
 
 	retval = security_task_setscheduler(p);
 	if (retval)
-		goto out_unlock;
+		goto out_free_new_mask;
 
 
 	cpuset_cpus_allowed(p, cpus_allowed);
@@ -4054,7 +4054,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 		if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
 			retval = -EBUSY;
-			goto out_unlock;
+			goto out_free_new_mask;
 		}
 	}
 #endif
@@ -4073,7 +4073,7 @@ again:
 			goto again;
 		}
 	}
-out_unlock:
+out_free_new_mask:
 	free_cpumask_var(new_mask);
 out_free_cpus_allowed:
 	free_cpumask_var(cpus_allowed);

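The rename above follows the usual convention of naming each cleanup label after what it releases. A minimal userspace sketch of that convention, assuming two heap buffers in place of the cpumask variables; `model_setaffinity` and its `fail_after_alloc` parameter are hypothetical and only exist to exercise the error path.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model: two allocations, each with a label named after
 * the resource it frees, unwound in reverse order of acquisition. */
int model_setaffinity(int fail_after_alloc)
{
	char *cpus_allowed;
	char *new_mask;
	int retval = 0;

	cpus_allowed = malloc(16);
	if (!cpus_allowed)
		return -ENOMEM;

	new_mask = malloc(16);
	if (!new_mask) {
		retval = -ENOMEM;
		goto out_free_cpus_allowed;
	}

	if (fail_after_alloc) {
		/* Both buffers are live here, so jump to the label that
		 * frees the most recently allocated one first. */
		retval = -EBUSY;
		goto out_free_new_mask;
	}

out_free_new_mask:
	free(new_mask);
out_free_cpus_allowed:
	free(cpus_allowed);
	return retval;
}
```

With labels named this way, a reader can check each `goto` against the resources held at that point without scrolling to the end of the function.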

* [tip:sched/core] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-22 18:36 ` [PATCH v3 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
@ 2014-09-24 14:57   ` tip-bot for Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Kirill Tkhai @ 2014-09-24 14:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, ktkhai, hpa, mingo, torvalds, peterz, paulmck, tglx

Commit-ID:  f1e3a0932f3a9554371792a7daaf1e0eb19f66d5
Gitweb:     http://git.kernel.org/tip/f1e3a0932f3a9554371792a7daaf1e0eb19f66d5
Author:     Kirill Tkhai <ktkhai@parallels.com>
AuthorDate: Mon, 22 Sep 2014 22:36:36 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:47:11 +0200

sched: Use rq->rd in sched_setaffinity() under RCU read lock

A use-after-free is possible here, because rq->rd is dereferenced
without the RCU read lock held.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v3.14+
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183636.11015.83611.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 316127a..b5349fe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4049,13 +4049,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	 * root_domain.
 	 */
 #ifdef CONFIG_SMP
-	if (task_has_dl_policy(p)) {
-		const struct cpumask *span = task_rq(p)->rd->span;
-
-		if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
+	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
+		rcu_read_lock();
+		if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
 			retval = -EBUSY;
+			rcu_read_unlock();
 			goto out_free_new_mask;
 		}
+		rcu_read_unlock();
 	}
 #endif
 again:


* [tip:sched/core] sched/rt: Use resched_curr() in task_tick_rt()
  2014-09-22 18:36 ` [PATCH v3 6/6] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
@ 2014-09-24 14:57   ` tip-bot for Kirill Tkhai
  0 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Kirill Tkhai @ 2014-09-24 14:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, ktkhai, hpa, mingo, torvalds, peterz, tglx

Commit-ID:  8aa6f0ebf41b5fdd186276394bf07e7bd6884d94
Gitweb:     http://git.kernel.org/tip/8aa6f0ebf41b5fdd186276394bf07e7bd6884d94
Author:     Kirill Tkhai <ktkhai@parallels.com>
AuthorDate: Mon, 22 Sep 2014 22:36:43 +0400
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 24 Sep 2014 14:47:12 +0200

sched/rt: Use resched_curr() in task_tick_rt()

PREEMPT_NEED_RESCHED was implemented some time ago, so the
rescheduling technique is a little more involved now.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140922183642.11015.66039.stgit@localhost
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/rt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e6a774..87ea5bf 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2072,7 +2072,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	for_each_sched_rt_entity(rt_se) {
 		if (rt_se->run_list.prev != rt_se->run_list.next) {
 			requeue_task_rt(rq, p, 0);
-			set_tsk_need_resched(p);
+			resched_curr(rq);
 			return;
 		}
 	}

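To illustrate why the one-line change above matters, here is a rough userspace model, assuming that resched_curr() in kernels of this period sets the need-resched flag on rq->curr and, in addition, either folds that request into the local CPU's preempt state or kicks a remote CPU. All `model_` names are hypothetical, and the handling of polling idle CPUs is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for a task and a CPU's run queue state. */
struct model_task {
	bool need_resched;          /* models TIF_NEED_RESCHED */
};

struct model_cpu {
	struct model_task *curr;    /* task currently running on this CPU */
	bool preempt_need_resched;  /* models the per-CPU folded flag */
	int ipis_received;          /* models reschedule IPIs sent to this CPU */
};

/* Model of the old call: only the task flag is set. */
void model_set_tsk_need_resched(struct model_task *p)
{
	p->need_resched = true;
}

/* Simplified model of resched_curr(): set the task flag and also make
 * sure the CPU running the task notices the request. */
void model_resched_curr(struct model_cpu *target, struct model_cpu *this_cpu)
{
	struct model_task *curr = target->curr;

	if (curr->need_resched)
		return;                 /* a reschedule is already pending */

	curr->need_resched = true;

	if (target == this_cpu)
		target->preempt_need_resched = true;  /* local CPU: fold the flag */
	else
		target->ipis_received++;              /* remote CPU: send a kick */
}
```

In task_tick_rt() the run queue belongs to the CPU taking the tick, so in this model the local path is the one that applies: the old call leaves `preempt_need_resched` clear, while the new call sets it.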

end of thread

Thread overview: 15+ messages
2014-09-22 18:36 [PATCH v3 1/6] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
2014-09-22 18:36 ` [PATCH v3 2/6] sched: Do not pick a task which is switching on other cpu Kirill Tkhai
2014-09-23 15:06   ` [RFC][PATCH] sched,mips,ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW Peter Zijlstra
2014-09-23 15:16     ` Kirill Tkhai
2014-09-23 19:45     ` Guenter Roeck
2014-09-24 14:56     ` [tip:sched/core] sched, mips, ia64: " tip-bot for Peter Zijlstra
2014-09-22 18:36 ` [PATCH v3 3/6] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
2014-09-24 14:57   ` [tip:sched/core] " tip-bot for Kirill Tkhai
2014-09-22 18:36 ` [PATCH v3 4/6] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
2014-09-24 14:57   ` [tip:sched/core] sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask' tip-bot for Kirill Tkhai
2014-09-22 18:36 ` [PATCH v3 5/6] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
2014-09-24 14:57   ` [tip:sched/core] " tip-bot for Kirill Tkhai
2014-09-22 18:36 ` [PATCH v3 6/6] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
2014-09-24 14:57   ` [tip:sched/core] " tip-bot for Kirill Tkhai
2014-09-24 14:56 ` [tip:sched/core] sched/fair: Remove duplicate code from can_migrate_task() tip-bot for Kirill Tkhai
