* [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task()
@ 2014-09-20 16:51 Kirill Tkhai
  2014-09-20 16:51 ` [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW) Kirill Tkhai
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

The two branches do the same thing, so we can combine them and get
rid of the duplicated code.
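
Dropping the #ifdef CONFIG_SCHEDSTATS guard around the schedstat calls
is safe because, if I read kernel/sched/stats.h correctly,
schedstat_inc() already compiles away to nothing when schedstats are
disabled:

	/* kernel/sched/stats.h, paraphrased */
	#ifdef CONFIG_SCHEDSTATS
	# define schedstat_inc(rq, field)	do { (rq)->field++; } while (0)
	#else
	# define schedstat_inc(rq, field)	do { } while (0)
	#endif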

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/fair.c |   16 ++--------------
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74fa2c2..fa13f94 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5292,24 +5292,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (!tsk_cache_hot)
 		tsk_cache_hot = migrate_degrades_locality(p, env);
 
-	if (migrate_improves_locality(p, env)) {
-#ifdef CONFIG_SCHEDSTATS
+	if (migrate_improves_locality(p, env) || !tsk_cache_hot ||
+	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (tsk_cache_hot) {
 			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
 			schedstat_inc(p, se.statistics.nr_forced_migrations);
 		}
-#endif
-		return 1;
-	}
-
-	if (!tsk_cache_hot ||
-		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
-
-		if (tsk_cache_hot) {
-			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
-			schedstat_inc(p, se.statistics.nr_forced_migrations);
-		}
-
 		return 1;
 	}
 



* [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW)
  2014-09-20 16:51 [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
@ 2014-09-20 16:51 ` Kirill Tkhai
  2014-09-20 18:33   ` Peter Zijlstra
  2014-09-20 16:51 ` [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

We may pick a task which is in context_switch() on another cpu at the moment.
Parallel use of a single stack by two processes is not a good idea.
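
A sketch of the interleaving as I understand it (with
__ARCH_WANT_UNLOCKED_CTXSW, rq->lock is dropped across the context
switch, so p->on_cpu may still be set while p is visible to the
balancer as a migratable task):

	CPU0 (schedule(), prev == p)      CPU1 (balancer)
	----------------------------      ------------------------------
	context_switch(rq, p, next)       lock the runqueues
	  /* rq->lock released; CPU0      can_migrate_task(p) returns 1
	     still runs on p's stack */   deactivate_task(src_rq, p, 0)
	                                  set_task_cpu(p, this_cpu)
	                                  activate_task(this_rq, p, 0)
	                                  p is picked and starts running
	                                  on CPU1 while CPU0 still uses
	                                  p's stack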

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org>
---
 kernel/sched/core.c     |   11 +----------
 kernel/sched/deadline.c |    7 ++++++-
 kernel/sched/fair.c     |    3 +++
 kernel/sched/rt.c       |    7 ++++++-
 kernel/sched/sched.h    |   16 ++++++++++++++++
 5 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 55d86e7..c8d754f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1699,16 +1699,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		goto stat;
 
 #ifdef CONFIG_SMP
-	/*
-	 * If the owning (remote) cpu is still in the middle of schedule() with
-	 * this task as prev, wait until its done referencing the task.
-	 */
-	while (p->on_cpu)
-		cpu_relax();
-	/*
-	 * Pairs with the smp_wmb() in finish_lock_switch().
-	 */
-	smp_rmb();
+	cpu_relax__while_on_cpu(p);
 
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aaa5abb..ea0ba33 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1364,7 +1364,9 @@ static int push_dl_task(struct rq *rq)
 		next_task = task;
 		goto retry;
 	}
-
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	cpu_relax__while_on_cpu(next_task);
+#endif
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, later_rq->cpu);
 	activate_task(later_rq, next_task, 0);
@@ -1451,6 +1453,9 @@ static int pull_dl_task(struct rq *this_rq)
 
 			ret = 1;
 
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+			cpu_relax__while_on_cpu(p);
+#endif
 			deactivate_task(src_rq, p, 0);
 			set_task_cpu(p, this_cpu);
 			activate_task(this_rq, p, 0);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fa13f94..0ec070e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5298,6 +5298,9 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
 			schedstat_inc(p, se.statistics.nr_forced_migrations);
 		}
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+		cpu_relax__while_on_cpu(p);
+#endif
 		return 1;
 	}
 
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2e6a774..de356b0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1734,7 +1734,9 @@ static int push_rt_task(struct rq *rq)
 		next_task = task;
 		goto retry;
 	}
-
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+	cpu_relax__while_on_cpu(next_task);
+#endif
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
@@ -1823,6 +1825,9 @@ static int pull_rt_task(struct rq *this_rq)
 
 			ret = 1;
 
+#ifdef __ARCH_WANT_UNLOCKED_CTXSW
+			cpu_relax__while_on_cpu(p);
+#endif
 			deactivate_task(src_rq, p, 0);
 			set_task_cpu(p, this_cpu);
 			activate_task(this_rq, p, 0);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index aa0f73b..e2ef9b7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1034,6 +1034,22 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
 #endif /* __ARCH_WANT_UNLOCKED_CTXSW */
 
 /*
+ * If the owning (remote) cpu is still in the middle of schedule() with
+ * this task as prev, wait until its done referencing the task.
+ */
+static inline void cpu_relax__while_on_cpu(struct task_struct *p)
+{
+#ifdef CONFIG_SMP
+	while (p->on_cpu)
+		cpu_relax();
+	/*
+	 * Pairs with the smp_wmb() in finish_lock_switch().
+	 */
+	smp_rmb();
+#endif
+}
+
+/*
  * wake flags
  */
 #define WF_SYNC		0x01		/* waker goes to sleep after wakeup */



* [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock
  2014-09-20 16:51 [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
  2014-09-20 16:51 ` [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW) Kirill Tkhai
@ 2014-09-20 16:51 ` Kirill Tkhai
  2014-09-20 18:57   ` Peter Zijlstra
  2014-09-20 16:51 ` [PATCH 4/7] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

dl_bw_of() dereferences rq->rd, which requires the RCU read lock to be held.
The probability of use-after-free and memory corruption is not zero here.
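
For reference, the write side that makes an unlocked reader racy: the
old root domain is freed via an RCU callback in rq_attach_root()
(paraphrased sketch from memory; the exact RCU flavour may differ
between kernel versions):

	/* kernel/sched/core.c, rq_attach_root(), paraphrased */
	old_rd = rq->rd;
	...
	rq->rd = rd;
	...
	if (old_rd)
		call_rcu_sched(&old_rd->rcu, free_rootdomain);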

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # v3.14+
---
 kernel/sched/core.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c8d754f..00a024c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7619,6 +7619,8 @@ static int sched_dl_global_constraints(void)
 	int cpu, ret = 0;
 	unsigned long flags;
 
+	rcu_read_lock();
+
 	/*
 	 * Here we want to check the bandwidth not being set to some
 	 * value smaller than the currently allocated bandwidth in
@@ -7640,6 +7642,8 @@ static int sched_dl_global_constraints(void)
 			break;
 	}
 
+	rcu_read_unlock();
+
 	return ret;
 }
 
@@ -7655,6 +7659,7 @@ static void sched_dl_do_global(void)
 	if (global_rt_runtime() != RUNTIME_INF)
 		new_bw = to_ratio(global_rt_period(), global_rt_runtime());
 
+	rcu_read_lock();
 	/*
 	 * FIXME: As above...
 	 */
@@ -7665,6 +7670,7 @@ static void sched_dl_do_global(void)
 		dl_b->bw = new_bw;
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 	}
+	rcu_read_unlock();
 }
 
 static int sched_rt_global_validate(void)



* [PATCH 4/7] sched: cleanup: Rename out_unlock to out_free_new_mask
  2014-09-20 16:51 [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
  2014-09-20 16:51 ` [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW) Kirill Tkhai
  2014-09-20 16:51 ` [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
@ 2014-09-20 16:51 ` Kirill Tkhai
  2014-09-20 16:51 ` [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

Nothing is locked there, so the label's name only confuses the reader.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/core.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 00a024c..65655a887 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3995,14 +3995,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 		rcu_read_lock();
 		if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
 			rcu_read_unlock();
-			goto out_unlock;
+			goto out_free_new_mask;
 		}
 		rcu_read_unlock();
 	}
 
 	retval = security_task_setscheduler(p);
 	if (retval)
-		goto out_unlock;
+		goto out_free_new_mask;
 
 
 	cpuset_cpus_allowed(p, cpus_allowed);
@@ -4020,7 +4020,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 
 		if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
 			retval = -EBUSY;
-			goto out_unlock;
+			goto out_free_new_mask;
 		}
 	}
 #endif
@@ -4039,7 +4039,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 			goto again;
 		}
 	}
-out_unlock:
+out_free_new_mask:
 	free_cpumask_var(new_mask);
 out_free_cpus_allowed:
 	free_cpumask_var(cpus_allowed);



* [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-20 16:51 [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
                   ` (2 preceding siblings ...)
  2014-09-20 16:51 ` [PATCH 4/7] sched: cleanup: Rename out_unlock to out_free_new_mask Kirill Tkhai
@ 2014-09-20 16:51 ` Kirill Tkhai
  2014-09-20 18:59   ` Peter Zijlstra
  2014-09-20 16:51 ` [PATCH 6/7] sched: Delete rq::skip_clock_update == -1 Kirill Tkhai
  2014-09-20 16:51 ` [PATCH 7/7] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
  5 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

task_rq(p)->rd and task_rq(p)->rd->span may be used after they are freed here.
The probability of a NULL pointer dereference is not zero in this place.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Cc: <stable@vger.kernel.org> # v3.14+
---
 kernel/sched/core.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 65655a887..a40d6e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4015,13 +4015,14 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	 * root_domain.
 	 */
 #ifdef CONFIG_SMP
-	if (task_has_dl_policy(p)) {
-		const struct cpumask *span = task_rq(p)->rd->span;
-
-		if (dl_bandwidth_enabled() && !cpumask_subset(span, new_mask)) {
+	if (task_has_dl_policy(p) && dl_bandwidth_enabled()) {
+		rcu_read_lock();
+		if (!cpumask_subset(task_rq(p)->rd->span, new_mask)) {
 			retval = -EBUSY;
+			rcu_read_unlock();
 			goto out_free_new_mask;
 		}
+		rcu_read_unlock();
 	}
 #endif
 again:



* [PATCH 6/7] sched: Delete rq::skip_clock_update == -1
  2014-09-20 16:51 [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
                   ` (3 preceding siblings ...)
  2014-09-20 16:51 ` [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
@ 2014-09-20 16:51 ` Kirill Tkhai
  2014-09-20 19:01   ` Peter Zijlstra
  2014-09-20 16:51 ` [PATCH 7/7] sched/rt: Use resched_curr() in task_tick_rt() Kirill Tkhai
  5 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

An idle-class task is always queued, so we can safely remove the "-1" case here.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/core.c |    2 +-
 kernel/sched/rt.c   |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a40d6e1..7d0d023 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2812,7 +2812,7 @@ static void __sched __schedule(void)
 		switch_count = &prev->nvcsw;
 	}
 
-	if (task_on_rq_queued(prev) || rq->skip_clock_update < 0)
+	if (task_on_rq_queued(prev))
 		update_rq_clock(rq);
 
 	next = pick_next_task(rq, prev);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index de356b0..828eda0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -835,7 +835,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 				 * lest wakeup -> unthrottle time accumulate.
 				 */
 				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
-					rq->skip_clock_update = -1;
+					rq->skip_clock_update = 0;
 			}
 			if (rt_rq->rt_time || rt_rq->rt_nr_running)
 				idle = 0;



* [PATCH 7/7] sched/rt: Use resched_curr() in task_tick_rt()
  2014-09-20 16:51 [PATCH 1/7] sched/fair: Remove duplicate code from can_migrate_task() Kirill Tkhai
                   ` (4 preceding siblings ...)
  2014-09-20 16:51 ` [PATCH 6/7] sched: Delete rq::skip_clock_update == -1 Kirill Tkhai
@ 2014-09-20 16:51 ` Kirill Tkhai
  5 siblings, 0 replies; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 16:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Ingo Molnar, Kirill Tkhai

From: Kirill Tkhai <ktkhai@parallels.com>

Some time ago PREEMPT_NEED_RESCHED was implemented, so the
rescheduling machinery is a little more involved now.
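
For context, set_tsk_need_resched() only sets TIF_NEED_RESCHED on the
task, while resched_curr() additionally folds PREEMPT_NEED_RESCHED
into the local cpu's preempt count and kicks remote cpus. A rough
sketch of resched_curr() from memory, not the exact source:

	void resched_curr(struct rq *rq)
	{
		struct task_struct *curr = rq->curr;
		int cpu = cpu_of(rq);

		if (test_tsk_need_resched(curr))
			return;

		if (cpu == smp_processor_id()) {
			set_tsk_need_resched(curr);
			set_preempt_need_resched(); /* what task_tick_rt() missed */
			return;
		}

		if (set_nr_and_not_polling(curr))
			smp_send_reschedule(cpu);
	}

So a bare set_tsk_need_resched(p) on the tick path may never be seen
by the PREEMPT_NEED_RESCHED check.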

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
---
 kernel/sched/rt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 828eda0..8c7ba1e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2077,7 +2077,7 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	for_each_sched_rt_entity(rt_se) {
 		if (rt_se->run_list.prev != rt_se->run_list.next) {
 			requeue_task_rt(rq, p, 0);
-			set_tsk_need_resched(p);
+			resched_curr(rq);
 			return;
 		}
 	}



* Re: [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW)
  2014-09-20 16:51 ` [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW) Kirill Tkhai
@ 2014-09-20 18:33   ` Peter Zijlstra
  2014-09-20 18:54     ` Peter Zijlstra
  2014-09-20 19:03     ` Peter Zijlstra
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 18:33 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 08:51:22PM +0400, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@parallels.com>
> 
> We may pick a task which is in context_switch() on another cpu at the moment.
> Parallel use of a single stack by two processes is not a good idea.

Please elaborate on who exactly that might happen. Its best to have
comprehensive changelogs for issues that fix races.


* Re: [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW)
  2014-09-20 18:33   ` Peter Zijlstra
@ 2014-09-20 18:54     ` Peter Zijlstra
  2014-09-20 20:09       ` Kirill Tkhai
  2014-09-20 19:03     ` Peter Zijlstra
  1 sibling, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 18:54 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai, ralf

On Sat, Sep 20, 2014 at 08:33:26PM +0200, Peter Zijlstra wrote:
> On Sat, Sep 20, 2014 at 08:51:22PM +0400, Kirill Tkhai wrote:
> > From: Kirill Tkhai <ktkhai@parallels.com>
> > 
> > We may pick a task which is in context_switch() on another cpu at the moment.
> > Parallel use of a single stack by two processes is not a good idea.
> 
> Please elaborate on who exactly that might happen. Its best to have
> comprehensive changelogs for issues that fix races.

FWIW IIRC we can remove UNLOCKED_CTXSW from IA64 and I forgot if I
audited MIPS, but I suspect we can (and should) remove it there too.

That would make this exception go away and clean up some of this ugly.


* Re: [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock
  2014-09-20 16:51 ` [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock Kirill Tkhai
@ 2014-09-20 18:57   ` Peter Zijlstra
  2014-09-20 19:25     ` Kirill Tkhai
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 18:57 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 08:51:28PM +0400, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@parallels.com>
> 
> dl_bw_of() dereferences rq->rd, which requires the RCU read lock to be held.
> The probability of use-after-free and memory corruption is not zero here.
> 

Additionally we might want to add something like:
lockdep_assert_held_rcu() and put that in dl_bw_of() and other such
places.
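
Something like this, perhaps (a sketch only; lockdep_assert_held_rcu()
does not exist, the name is just taken from the suggestion above):

	/* hypothetical helper, not an existing kernel API */
	#define lockdep_assert_held_rcu()				\
		WARN_ON_ONCE(debug_lockdep_rcu_enabled() &&		\
			     !rcu_read_lock_held())

	static inline struct dl_bw *dl_bw_of(int i)
	{
		lockdep_assert_held_rcu();
		return &cpu_rq(i)->rd->dl_bw;
	}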


* Re: [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-20 16:51 ` [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock Kirill Tkhai
@ 2014-09-20 18:59   ` Peter Zijlstra
  2014-09-20 19:05     ` Kirill Tkhai
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 18:59 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 08:51:40PM +0400, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@parallels.com>
> 
> task_rq(p)->rd and task_rq(p)->rd->span may be used after they are freed here.
> The probability of a NULL pointer dereference is not zero in this place.

I don't see NULL derefs, just use-after-free.


* Re: [PATCH 6/7] sched: Delete rq::skip_clock_update == -1
  2014-09-20 16:51 ` [PATCH 6/7] sched: Delete rq::skip_clock_update == -1 Kirill Tkhai
@ 2014-09-20 19:01   ` Peter Zijlstra
  2014-09-21  4:37     ` Mike Galbraith
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 19:01 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 08:51:46PM +0400, Kirill Tkhai wrote:
> From: Kirill Tkhai <ktkhai@parallels.com>
> 
> An idle-class task is always queued, so we can safely remove the "-1" case here.

Tag you're it :-)

https://lkml.org/lkml/2014/4/8/295


* Re: [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW)
  2014-09-20 18:33   ` Peter Zijlstra
  2014-09-20 18:54     ` Peter Zijlstra
@ 2014-09-20 19:03     ` Peter Zijlstra
  1 sibling, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 19:03 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 08:33:26PM +0200, Peter Zijlstra wrote:
> On Sat, Sep 20, 2014 at 08:51:22PM +0400, Kirill Tkhai wrote:
> > From: Kirill Tkhai <ktkhai@parallels.com>
> > 
> > We may pick a task which is in context_switch() on another cpu at the moment.
> > Parallel use of a single stack by two processes is not a good idea.
> 
> Please elaborate on who exactly that might happen. Its best to have
> comprehensive changelogs for issues that fix races.

typing hard, ugh. s/who/how/


* Re: [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-20 18:59   ` Peter Zijlstra
@ 2014-09-20 19:05     ` Kirill Tkhai
  2014-09-20 19:18       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 19:05 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On 20.09.2014 22:59, Peter Zijlstra wrote:
> On Sat, Sep 20, 2014 at 08:51:40PM +0400, Kirill Tkhai wrote:
>> From: Kirill Tkhai <ktkhai@parallels.com>
>>
>> task_rq(p)->rd and task_rq(p)->rd->span may be used after they are freed here.
>> The probability of a NULL pointer dereference is not zero in this place.
> 
> I don't see NULL derefs, just use-after-free.
> 

It's a very paranoid case :). Two pointers are involved here:

task_rq(p)->rd (somebody zeroed "rd") ->span


* Re: [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-20 19:05     ` Kirill Tkhai
@ 2014-09-20 19:18       ` Peter Zijlstra
  2014-09-20 19:21         ` Kirill Tkhai
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 19:18 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 11:05:46PM +0400, Kirill Tkhai wrote:
> On 20.09.2014 22:59, Peter Zijlstra wrote:
> > On Sat, Sep 20, 2014 at 08:51:40PM +0400, Kirill Tkhai wrote:
> >> From: Kirill Tkhai <ktkhai@parallels.com>
> >>
> >> task_rq(p)->rd and task_rq(p)->rd->span may be used after they are freed here.
> >> The probability of a NULL pointer dereference is not zero in this place.
> > 
> > I don't see NULL derefs, just use-after-free.
> > 
> 
> It's a very paranoid case :). Two pointers are involved here:
> 
> task_rq(p)->rd (somebody zeroed "rd") ->span

What you're saying is: due to the reuse someone might have put a NULL
in there. Which is fair, but I'd still call it use-after-free because
that is the first order problem. Dereferencing 'unknown' memory can of
course cause all kinds of 'fun' problems :-)


* Re: [PATCH 5/7] sched: Use rq->rd in sched_setaffinity() under RCU read lock
  2014-09-20 19:18       ` Peter Zijlstra
@ 2014-09-20 19:21         ` Kirill Tkhai
  0 siblings, 0 replies; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 19:21 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On 20.09.2014 23:18, Peter Zijlstra wrote:
> On Sat, Sep 20, 2014 at 11:05:46PM +0400, Kirill Tkhai wrote:
>> On 20.09.2014 22:59, Peter Zijlstra wrote:
>>> On Sat, Sep 20, 2014 at 08:51:40PM +0400, Kirill Tkhai wrote:
>>>> From: Kirill Tkhai <ktkhai@parallels.com>
>>>>
>>>> task_rq(p)->rd and task_rq(p)->rd->span may be used after they are freed here.
>>>> The probability of a NULL pointer dereference is not zero in this place.
>>>
>>> I don't see NULL derefs, just use-after-free.
>>>
>>
>> It's a very paranoid case :). Two pointers are involved here:
>>
>> task_rq(p)->rd (somebody zeroed "rd") ->span
> 
> What you're saying is: due to the reuse someone might have put a NULL
> in there. Which is fair, but I'd still call it use-after-free because
> that is the first order problem. Dereferencing 'unknown' memory can of
> course cause all kinds of 'fun' problems :-)

Yeah, that's logical; I'll update the description.



* Re: [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock
  2014-09-20 18:57   ` Peter Zijlstra
@ 2014-09-20 19:25     ` Kirill Tkhai
  2014-09-20 19:32       ` Peter Zijlstra
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 19:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On 20.09.2014 22:57, Peter Zijlstra wrote:
> On Sat, Sep 20, 2014 at 08:51:28PM +0400, Kirill Tkhai wrote:
>> From: Kirill Tkhai <ktkhai@parallels.com>
>>
>> dl_bw_of() dereferences rq->rd, which requires the RCU read lock to be held.
>> The probability of use-after-free and memory corruption is not zero here.
>>
> 
> Additionally we might want to add something like:
> lockdep_assert_held_rcu() and put that in dl_bw_of() and other such
> places.

Should we change (not now, but in general) RCU-related pointers to use
rcu_dereference(), so that unlocked RCU use produces warnings in dmesg?
That would catch problems like this.

This may make the code less readable, though.



* Re: [PATCH 3/7] sched: Use dl_bw_of() under RCU read lock
  2014-09-20 19:25     ` Kirill Tkhai
@ 2014-09-20 19:32       ` Peter Zijlstra
  0 siblings, 0 replies; 21+ messages in thread
From: Peter Zijlstra @ 2014-09-20 19:32 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, Sep 20, 2014 at 11:25:39PM +0400, Kirill Tkhai wrote:
> On 20.09.2014 22:57, Peter Zijlstra wrote:
> > On Sat, Sep 20, 2014 at 08:51:28PM +0400, Kirill Tkhai wrote:
> >> From: Kirill Tkhai <ktkhai@parallels.com>
> >>
> >> dl_bw_of() dereferences rq->rd, which requires the RCU read lock to be held.
> >> The probability of use-after-free and memory corruption is not zero here.
> >>
> > 
> > Additionally we might want to add something like:
> > lockdep_assert_held_rcu() and put that in dl_bw_of() and other such
> > places.
> 
> Should we change (not now, but in general) RCU-related pointers to use
> rcu_dereference(), so that unlocked RCU use produces warnings in dmesg?
> That would catch problems like this.
> 
> This may make the code less readable, though.

Possibly, we should probably use rcu_assign_pointer() and
rcu_dereference() on rq->rd. Sometimes you can avoid that if you're
playing games with static objects, but I don't think that is true here.
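
Roughly this shape, for illustration (untested sketch):

	/* in struct rq, annotate the field for sparse: */
	struct root_domain __rcu *rd;

	/* publisher side, e.g. in rq_attach_root(): */
	rcu_assign_pointer(rq->rd, rd);

	/* reader side, inside rcu_read_lock()/rcu_read_unlock(): */
	struct root_domain *rd = rcu_dereference(rq->rd);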


* Re: [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW)
  2014-09-20 18:54     ` Peter Zijlstra
@ 2014-09-20 20:09       ` Kirill Tkhai
  2014-09-20 20:19         ` Kirill Tkhai
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 20:09 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai, ralf

On Sat, 20/09/2014 at 20:54 +0200, Peter Zijlstra wrote:
> On Sat, Sep 20, 2014 at 08:33:26PM +0200, Peter Zijlstra wrote:
> > On Sat, Sep 20, 2014 at 08:51:22PM +0400, Kirill Tkhai wrote:
> > > From: Kirill Tkhai <ktkhai@parallels.com>
> > > 
> > > We may pick a task which is in context_switch() on another cpu at the moment.
> > > Parallel use of a single stack by two processes is not a good idea.
> > 
> > Please elaborate on who exactly that might happen. Its best to have
> > comprehensive changelogs for issues that fix races.
> 
> FWIW IIRC we can remove UNLOCKED_CTXSW from IA64 and I forgot if I
> audited MIPS, but I suspect we can (and should) remove it there too.
> 
> That would make this exception go away and clean up some of this ugly.

Yeah, you've told me about IA64 before:

http://www.spinics.net/lists/linux-ia64/msg10229.html

It's been about 10 years since the logic that was documented in the
ia64 header was removed. It looks like the ia64 maintainers are not
much interested...

***

So as not to start a new message: I found the above while analysing
whether the optimisation below is OK (assume we have the accessor
cpu_relax__while_on_cpu()):

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7d0d023..8d765ba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1699,8 +1699,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		goto stat;
 
 #ifdef CONFIG_SMP
-	cpu_relax__while_on_cpu(p);
-
 	p->sched_contributes_to_load = !!task_contributes_to_load(p);
 	p->state = TASK_WAKING;
 
@@ -1708,6 +1706,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
+
+	cpu_relax__while_on_cpu(p);
+
 	if (task_cpu(p) != cpu) {
 		wake_flags |= WF_MIGRATED;
 		set_task_cpu(p, cpu);

Looks like, now problem here. Task p is dequeued, we can set sched_contributes_to_load and state
here, also task_waking does not produce problems, only arithmetics is there. select_task_rq()
is R/O function.

Now I'm testing this.



* Re: [PATCH 2/7] sched: Fix picking a task switching on other cpu (__ARCH_WANT_UNLOCKED_CTXSW)
  2014-09-20 20:09       ` Kirill Tkhai
@ 2014-09-20 20:19         ` Kirill Tkhai
  0 siblings, 0 replies; 21+ messages in thread
From: Kirill Tkhai @ 2014-09-20 20:19 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kirill Tkhai, ralf

On Sun, 21/09/2014 at 00:09 +0400, Kirill Tkhai wrote:
> On Sat, 20/09/2014 at 20:54 +0200, Peter Zijlstra wrote:
> > On Sat, Sep 20, 2014 at 08:33:26PM +0200, Peter Zijlstra wrote:
> > > On Sat, Sep 20, 2014 at 08:51:22PM +0400, Kirill Tkhai wrote:
> > > > From: Kirill Tkhai <ktkhai@parallels.com>
> > > > 
> > > > We may pick a task which is in context_switch() on another cpu at the moment.
> > > > Parallel use of a single stack by two processes is not a good idea.
> > > 
> > > Please elaborate on who exactly that might happen. Its best to have
> > > comprehensive changelogs for issues that fix races.
> > 
> > FWIW IIRC we can remove UNLOCKED_CTXSW from IA64 and I forgot if I
> > audited MIPS, but I suspect we can (and should) remove it there too.
> > 
> > That would make this exception go away and clean up some of this ugly.
> 
> Yeah, you've told me about IA64 before:
> 
> http://www.spinics.net/lists/linux-ia64/msg10229.html
> 
> It's been about 10 years since the logic that was documented in the
> ia64 header was removed. It looks like the ia64 maintainers are not
> much interested...
> 
> ***
> 
> So as not to start a new message: I found the above while analysing
> whether the optimisation below is OK (assume we have the accessor
> cpu_relax__while_on_cpu()):
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 7d0d023..8d765ba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1699,8 +1699,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		goto stat;
>  
>  #ifdef CONFIG_SMP
> -	cpu_relax__while_on_cpu(p);
> -
>  	p->sched_contributes_to_load = !!task_contributes_to_load(p);
>  	p->state = TASK_WAKING;
>  
> @@ -1708,6 +1706,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>  		p->sched_class->task_waking(p);
>  
>  	cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
> +
> +	cpu_relax__while_on_cpu(p);
> +
>  	if (task_cpu(p) != cpu) {
>  		wake_flags |= WF_MIGRATED;
>  		set_task_cpu(p, cpu);
> 
> Looks like, now problem here. Task p is dequeued, we can set sched_contributes_to_load and state

s/now/no/

> here, also task_waking does not produce problems, only arithmetics is there. select_task_rq()
> is R/O function.
> 
> Now I'm testing this.



* Re: [PATCH 6/7] sched: Delete rq::skip_clock_update == -1
  2014-09-20 19:01   ` Peter Zijlstra
@ 2014-09-21  4:37     ` Mike Galbraith
  0 siblings, 0 replies; 21+ messages in thread
From: Mike Galbraith @ 2014-09-21  4:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Kirill Tkhai, linux-kernel, Ingo Molnar, Kirill Tkhai

On Sat, 2014-09-20 at 21:01 +0200, Peter Zijlstra wrote: 
> On Sat, Sep 20, 2014 at 08:51:46PM +0400, Kirill Tkhai wrote:
> > From: Kirill Tkhai <ktkhai@parallels.com>
> > 
> > An idle-class task is always queued, so we can safely remove the "-1" case here.
> 
> Tag you're it :-)
> 
> https://lkml.org/lkml/2014/4/8/295

Maybe it should be given the too-annoying-to-live treatment.  Saving
fastpath cycles is great, corner cases less so.

-Mike


