* [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
@ 2011-05-31 17:26 Sergey Senozhatsky
  2011-05-31 19:45 ` Peter Zijlstra
  2011-06-03 15:37 ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2011-05-31 17:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Peter Zijlstra, Andrew Morton, linux-kernel

Wrap __set_task_cpu() in an RCU read-side critical section.
__set_task_cpu() calls task_group(), which performs an RCU dereference check in
task_subsys_state_check(), causing:

 [  152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
 [  152.262795] 
 [  152.262841] stack backtrace:
 [  152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
 [  152.262851] Call Trace:
 [  152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
 [  152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
 [  152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
 [  152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
 [  152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
 [  152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
 [  152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
 [  152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
 [  152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e
 [  152.262923]  [<ffffffff814775e6>] ? schedule+0x353/0xa9e
 [  152.262931]  [<ffffffff8147de58>] ? sub_preempt_count+0x8f/0xa3
 [  152.262938]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
 [  152.262946]  [<ffffffff810072e5>] ? native_sched_clock+0x38/0x65
 [  152.262953]  [<ffffffff81062c0c>] ? cpu_clock+0x4a/0x5f
 [  152.262958]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
 [  152.262965]  [<ffffffff81071a15>] ? trace_hardirqs_on_caller+0x10d/0x131
 [  152.262971]  [<ffffffff81071a46>] ? trace_hardirqs_on+0xd/0xf
 [  152.262977]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
 [  152.262983]  [<ffffffff8109fd94>] watchdog+0x68/0xab
 [  152.262990]  [<ffffffff8105cb82>] kthread+0x9a/0xa2
 [  152.262999]  [<ffffffff81481e24>] kernel_thread_helper+0x4/0x10
 [  152.263005]  [<ffffffff8102d6bf>] ? finish_task_switch+0x76/0xf0
 [  152.263012]  [<ffffffff8147b258>] ? retint_restore_args+0x13/0x13
 [  152.263019]  [<ffffffff8105cae8>] ? __init_kthread_worker+0x53/0x53
 [  152.263024]  [<ffffffff81481e20>] ? gs_change+0x13/0x13


Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---

 kernel/sched.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index cbb3a0e..cf0dd8a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2212,7 +2212,9 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
 	}
 
+	rcu_read_lock();
 	__set_task_cpu(p, new_cpu);
+	rcu_read_unlock();
 }
 
 struct migration_arg {



* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-05-31 17:26 [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu() Sergey Senozhatsky
@ 2011-05-31 19:45 ` Peter Zijlstra
  2011-06-03 15:37 ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2011-05-31 19:45 UTC (permalink / raw)
  To: Sergey Senozhatsky; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> Wrap __set_task_cpu() in an RCU read-side critical section.
> __set_task_cpu() calls task_group(), which performs an RCU dereference check in
> task_subsys_state_check(), causing:
> 
>  [  152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
>  [  152.262795] 
>  [  152.262841] stack backtrace:
>  [  152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
>  [  152.262851] Call Trace:
>  [  152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
>  [  152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
>  [  152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
>  [  152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
>  [  152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
>  [  152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
>  [  152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
>  [  152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
>  [  152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e
>  [  152.262923]  [<ffffffff814775e6>] ? schedule+0x353/0xa9e
>  [  152.262931]  [<ffffffff8147de58>] ? sub_preempt_count+0x8f/0xa3
>  [  152.262938]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
>  [  152.262946]  [<ffffffff810072e5>] ? native_sched_clock+0x38/0x65
>  [  152.262953]  [<ffffffff81062c0c>] ? cpu_clock+0x4a/0x5f
>  [  152.262958]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
>  [  152.262965]  [<ffffffff81071a15>] ? trace_hardirqs_on_caller+0x10d/0x131
>  [  152.262971]  [<ffffffff81071a46>] ? trace_hardirqs_on+0xd/0xf
>  [  152.262977]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
>  [  152.262983]  [<ffffffff8109fd94>] watchdog+0x68/0xab
>  [  152.262990]  [<ffffffff8105cb82>] kthread+0x9a/0xa2
>  [  152.262999]  [<ffffffff81481e24>] kernel_thread_helper+0x4/0x10
>  [  152.263005]  [<ffffffff8102d6bf>] ? finish_task_switch+0x76/0xf0
>  [  152.263012]  [<ffffffff8147b258>] ? retint_restore_args+0x13/0x13
>  [  152.263019]  [<ffffffff8105cae8>] ? __init_kthread_worker+0x53/0x53
>  [  152.263024]  [<ffffffff81481e20>] ? gs_change+0x13/0x13
> 
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Sorry, no. You failed to ask yourself what it is protecting, and how
wrapping it like this ensures the proper thing is done.

What you've done is basically silence the warning for all set_task_cpu()
callers, without proper consideration.
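
For context, a simplified sketch of the check being silenced (illustrative
only; this is not the verbatim 3.0 macro text, just the shape of it):

	/*
	 * Illustrative sketch, simplified.  task_group() (kernel/sched.c:619
	 * in the trace above) does roughly:
	 *
	 *	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
	 *			lockdep_is_held(&p->pi_lock));
	 *
	 * and the underlying rcu_dereference_check() only warns when nothing
	 * like the following holds at the call site:
	 *
	 *	rcu_read_lock_held() || lockdep_is_held(&p->pi_lock)
	 *
	 * So an rcu_read_lock()/rcu_read_unlock() pair around __set_task_cpu()
	 * satisfies the check for every caller, but RCU only keeps the old
	 * cgroup_subsys_state memory around; it does not prevent
	 * cpu_cgroup_attach() from moving the task to another cgroup while its
	 * CPU is being changed.  Pinning the task to its cgroup requires a
	 * lock that the attach path also takes.
	 */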




* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-05-31 17:26 [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu() Sergey Senozhatsky
  2011-05-31 19:45 ` Peter Zijlstra
@ 2011-06-03 15:37 ` Peter Zijlstra
  2011-06-03 18:16   ` Sergey Senozhatsky
                     ` (3 more replies)
  1 sibling, 4 replies; 12+ messages in thread
From: Peter Zijlstra @ 2011-06-03 15:37 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Oleg Nesterov

On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
>  [  152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
>  [  152.262795] 
>  [  152.262841] stack backtrace:
>  [  152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
>  [  152.262851] Call Trace:
>  [  152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
>  [  152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
>  [  152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
>  [  152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
>  [  152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
>  [  152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
>  [  152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
>  [  152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
>  [  152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e 

Does the below cure the issue? (completely untested)

---
Subject: sched: Fix/clarify set_task_cpu() locking rules
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 03 17:28:08 CEST 2011

Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
set_task_cpu() was called with both relevant rq->locks held, which
should be sufficient for running tasks since holding its rq->lock will
serialize against sched_move_task().

Update the comments and fix the task_group() lockdep test.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-k3lie1tjkcp3626dn5r5ihge@git.kernel.org
---
 kernel/sched.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -605,10 +605,10 @@ static inline int cpu_of(struct rq *rq)
 /*
  * Return the group to which this tasks belongs.
  *
- * We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
- * holds that lock for each task it moves into the cgroup. Therefore
- * by holding that lock, we pin the task to the current cgroup.
+ * We use task_subsys_state_check() and extend the RCU verification with
+ * pi->lock and rq->lock because cpu_cgroup_attach() holds those locks for each
+ * task it moves into the cgroup. Therefore by holding either of those locks,
+ * we pin the task to the current cgroup.
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
@@ -616,7 +616,8 @@ static inline struct task_group *task_gr
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
-			lockdep_is_held(&p->pi_lock));
+			lockdep_is_held(&p->pi_lock) ||
+			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
 
 	return autogroup_task_group(p, tg);
@@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
 			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
+	/*
+	 * The caller should hold either p->pi_lock or rq->lock, when changing
+	 * a task's CPU.
+	 *
+	 * sched_move_task() holds both and thus holding either pins the cgroup,
+	 * see set_task_rq().
+	 *
+	 * Furthermore, all task_rq users should acquire both locks, see
+	 * task_rq_lock().
+	 */
 	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
 				      lockdep_is_held(&task_rq(p)->lock)));
 #endif



* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-03 15:37 ` Peter Zijlstra
@ 2011-06-03 18:16   ` Sergey Senozhatsky
  2011-06-03 22:49   ` Sergey Senozhatsky
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2011-06-03 18:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Oleg Nesterov

On (06/03/11 17:37), Peter Zijlstra wrote:
> On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> >  [  152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
> >  [  152.262795] 
> >  [  152.262841] stack backtrace:
> >  [  152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
> >  [  152.262851] Call Trace:
> >  [  152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
> >  [  152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
> >  [  152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
> >  [  152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
> >  [  152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
> >  [  152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
> >  [  152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
> >  [  152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
> >  [  152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e 
> 
> Does the below cure the issue? (completely untested)
>
Hello,

I believe it should; rq->lock is grabbed in post_schedule() and in
find_lock_lowest_rq(). For now I grab task->pi_lock in push_rt_task(),
but your approach is definitely better.

Will test and report back.


	Sergey


> ---
> Subject: sched: Fix/clarify set_task_cpu() locking rules
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Fri Jun 03 17:28:08 CEST 2011
> 
> Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
> set_task_cpu() was called with both relevant rq->locks held, which
> should be sufficient for running tasks since holding its rq->lock will
> serialize against sched_move_task().
> 
> Update the comments and fix the task_group() lockdep test.
> 
> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Link: http://lkml.kernel.org/n/tip-k3lie1tjkcp3626dn5r5ihge@git.kernel.org
> ---
>  kernel/sched.c |   21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -605,10 +605,10 @@ static inline int cpu_of(struct rq *rq)
>  /*
>   * Return the group to which this tasks belongs.
>   *
> - * We use task_subsys_state_check() and extend the RCU verification
> - * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
> - * holds that lock for each task it moves into the cgroup. Therefore
> - * by holding that lock, we pin the task to the current cgroup.
> + * We use task_subsys_state_check() and extend the RCU verification with
> + * pi->lock and rq->lock because cpu_cgroup_attach() holds those locks for each
> + * task it moves into the cgroup. Therefore by holding either of those locks,
> + * we pin the task to the current cgroup.
>   */
>  static inline struct task_group *task_group(struct task_struct *p)
>  {
> @@ -616,7 +616,8 @@ static inline struct task_group *task_gr
>  	struct cgroup_subsys_state *css;
>  
>  	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
> -			lockdep_is_held(&p->pi_lock));
> +			lockdep_is_held(&p->pi_lock) ||
> +			lockdep_is_held(&task_rq(p)->lock));
>  	tg = container_of(css, struct task_group, css);
>  
>  	return autogroup_task_group(p, tg);
> @@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
>  			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
>  
>  #ifdef CONFIG_LOCKDEP
> +	/*
> +	 * The caller should hold either p->pi_lock or rq->lock, when changing
> +	 * a task's CPU.
> +	 *
> +	 * sched_move_task() holds both and thus holding either pins the cgroup,
> +	 * see set_task_rq().
> +	 *
> +	 * Furthermore, all task_rq users should acquire both locks, see
> +	 * task_rq_lock().
> +	 */
>  	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
>  				      lockdep_is_held(&task_rq(p)->lock)));
>  #endif
> 


* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-03 15:37 ` Peter Zijlstra
  2011-06-03 18:16   ` Sergey Senozhatsky
@ 2011-06-03 22:49   ` Sergey Senozhatsky
  2011-06-05 19:12   ` Oleg Nesterov
  2011-06-07 12:03   ` [tip:sched/urgent] sched: Fix/clarify set_task_cpu() locking rules tip-bot for Peter Zijlstra
  3 siblings, 0 replies; 12+ messages in thread
From: Sergey Senozhatsky @ 2011-06-03 22:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Oleg Nesterov

On (06/03/11 17:37), Peter Zijlstra wrote:
> On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> >  [  152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
> >  [  152.262795] 
> >  [  152.262841] stack backtrace:
> >  [  152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
> >  [  152.262851] Call Trace:
> >  [  152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
> >  [  152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
> >  [  152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
> >  [  152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
> >  [  152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
> >  [  152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
> >  [  152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
> >  [  152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
> >  [  152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e 
> 
> Does the below cure the issue? (completely untested)
> 


Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	Sergey



> ---
> Subject: sched: Fix/clarify set_task_cpu() locking rules
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Fri Jun 03 17:28:08 CEST 2011
> 
> Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
> set_task_cpu() was called with both relevant rq->locks held, which
> should be sufficient for running tasks since holding its rq->lock will
> serialize against sched_move_task().
> 
> Update the comments and fix the task_group() lockdep test.
> 
> Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Link: http://lkml.kernel.org/n/tip-k3lie1tjkcp3626dn5r5ihge@git.kernel.org
> ---
>  kernel/sched.c |   21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6/kernel/sched.c
> ===================================================================
> --- linux-2.6.orig/kernel/sched.c
> +++ linux-2.6/kernel/sched.c
> @@ -605,10 +605,10 @@ static inline int cpu_of(struct rq *rq)
>  /*
>   * Return the group to which this tasks belongs.
>   *
> - * We use task_subsys_state_check() and extend the RCU verification
> - * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
> - * holds that lock for each task it moves into the cgroup. Therefore
> - * by holding that lock, we pin the task to the current cgroup.
> + * We use task_subsys_state_check() and extend the RCU verification with
> + * pi->lock and rq->lock because cpu_cgroup_attach() holds those locks for each
> + * task it moves into the cgroup. Therefore by holding either of those locks,
> + * we pin the task to the current cgroup.
>   */
>  static inline struct task_group *task_group(struct task_struct *p)
>  {
> @@ -616,7 +616,8 @@ static inline struct task_group *task_gr
>  	struct cgroup_subsys_state *css;
>  
>  	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
> -			lockdep_is_held(&p->pi_lock));
> +			lockdep_is_held(&p->pi_lock) ||
> +			lockdep_is_held(&task_rq(p)->lock));
>  	tg = container_of(css, struct task_group, css);
>  
>  	return autogroup_task_group(p, tg);
> @@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
>  			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
>  
>  #ifdef CONFIG_LOCKDEP
> +	/*
> +	 * The caller should hold either p->pi_lock or rq->lock, when changing
> +	 * a task's CPU.
> +	 *
> +	 * sched_move_task() holds both and thus holding either pins the cgroup,
> +	 * see set_task_rq().
> +	 *
> +	 * Furthermore, all task_rq users should acquire both locks, see
> +	 * task_rq_lock().
> +	 */
>  	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
>  				      lockdep_is_held(&task_rq(p)->lock)));
>  #endif
> 


* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-03 15:37 ` Peter Zijlstra
  2011-06-03 18:16   ` Sergey Senozhatsky
  2011-06-03 22:49   ` Sergey Senozhatsky
@ 2011-06-05 19:12   ` Oleg Nesterov
  2011-06-06  9:06     ` Peter Zijlstra
  2011-06-06 13:43     ` Peter Zijlstra
  2011-06-07 12:03   ` [tip:sched/urgent] sched: Fix/clarify set_task_cpu() locking rules tip-bot for Peter Zijlstra
  3 siblings, 2 replies; 12+ messages in thread
From: Oleg Nesterov @ 2011-06-05 19:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On 06/03, Peter Zijlstra wrote:
>
> @@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
>  			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
>
>  #ifdef CONFIG_LOCKDEP
> +	/*
> +	 * The caller should hold either p->pi_lock or rq->lock, when changing
> +	 * a task's CPU.

Is it literally true? IIRC, we need ->pi_lock if the task is not active,
and rq->lock if p->on_rq = 1. And that is why we do not clear p->on_rq
between deactivate_task() + activate_task(), correct?

> +	 *
> +	 * sched_move_task() holds both and thus holding either pins the cgroup,
> +	 * see set_task_rq().
> +	 *
> +	 * Furthermore, all task_rq users should acquire both locks, see
> +	 * task_rq_lock().
> +	 */
>  	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
>  				      lockdep_is_held(&task_rq(p)->lock)));

IOW, perhaps this should be

	WARN_ON_ONCE(debug_locks && !lockdep_is_held(p->on_rq ?
					&task_rq(p)->lock : &p->pi_lock))

?

Not that I really suggest to change this WARN_ON(), I am just trying
to recall the new rules.

Oleg.



* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-05 19:12   ` Oleg Nesterov
@ 2011-06-06  9:06     ` Peter Zijlstra
  2011-06-06 16:46       ` Oleg Nesterov
  2011-06-06 13:43     ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2011-06-06  9:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On Sun, 2011-06-05 at 21:12 +0200, Oleg Nesterov wrote:
> On 06/03, Peter Zijlstra wrote:
> >
> > @@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
> >  			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
> >
> >  #ifdef CONFIG_LOCKDEP
> > +	/*
> > +	 * The caller should hold either p->pi_lock or rq->lock, when changing
> > +	 * a task's CPU.
> 
> Is it literally true? IIRC, we need ->pi_lock if the task is not active,
> and rq->lock if p->on_rq = 1. And that is why we do not clear p->on_rq
> between deactivate_task() + activate_task(), correct?
> 
> > +	 *
> > +	 * sched_move_task() holds both and thus holding either pins the cgroup,
> > +	 * see set_task_rq().
> > +	 *
> > +	 * Furthermore, all task_rq users should acquire both locks, see
> > +	 * task_rq_lock().
> > +	 */
> >  	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
> >  				      lockdep_is_held(&task_rq(p)->lock)));
> 
> IOW, perhaps this should be
> 
> 	WARN_ON_ONCE(debug_locks && !lockdep_is_held(p->on_rq ?
> 					&task_rq(p)->lock : &p->pi_lock))
> 
> ?
> 
> Not that I really suggest to change this WARN_ON(), I am just trying
> to recall the new rules.

You're right, p->pi_lock for wakeups, rq->lock for runnable tasks.
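
In other words, the lock that pins a task for set_task_cpu() depends on
whether the task is currently on a runqueue. A hypothetical helper, written
here only to restate that rule (no such function exists in kernel/sched.c):

	/* Hypothetical illustration of the rule above, not kernel code. */
	static inline raw_spinlock_t *task_cpu_pin_lock(struct task_struct *p)
	{
		if (p->on_rq)			/* runnable: pinned by its rq->lock */
			return &task_rq(p)->lock;
		return &p->pi_lock;		/* being woken: pinned by its pi_lock */
	}

which is exactly the condition Oleg's stricter WARN_ON_ONCE() variant above
would assert.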


* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-05 19:12   ` Oleg Nesterov
  2011-06-06  9:06     ` Peter Zijlstra
@ 2011-06-06 13:43     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2011-06-06 13:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On Sun, 2011-06-05 at 21:12 +0200, Oleg Nesterov wrote:
> 
>         WARN_ON_ONCE(debug_locks && !lockdep_is_held(p->on_rq ?
>                                         &task_rq(p)->lock : &p->pi_lock))
> 
> Not that I really suggest to change this WARN_ON(), I am just trying
> to recall the new rules. 

I actually like that, so I tried it; we need to clean up some of the
fork() path hacks, but it should be possible. Will try to remember it for
the next release or so.


* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-06  9:06     ` Peter Zijlstra
@ 2011-06-06 16:46       ` Oleg Nesterov
  2011-06-07  9:31         ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2011-06-06 16:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On 06/06, Peter Zijlstra wrote:
>
> You're right, p->pi_lock for wakeups, rq->lock for runnable tasks.

Good, thanks.

Help! I have another question.

	try_to_wake_up:

		raw_spin_lock_irqsave(&p->pi_lock, flags);
		if (!(p->state & state))
			goto out;

		cpu = task_cpu(p);

		if (p->on_rq && ttwu_remote(p, wake_flags))
			goto stat;

This looks a bit confusing: we can't trust "cpu = task_cpu(p)" before
we check ->on_rq. OK, not a problem, this cpu number can only be used in
ttwu_stat(cpu).

But ttwu_stat(cpu) in turn does

	if (cpu != task_cpu(p))
		schedstat_inc(p, se.statistics.nr_wakeups_migrate);

Ignoring the theoretical races with pull_task/etc., how is it possible
that cpu != task_cpu(p)? The other caller is try_to_wake_up_local(), and it
obviously can't trigger this case.

This looks broken to me. Looking at its name, I guess nr_wakeups_migrate
should be incremented if ttwu does set_task_cpu(), correct?

IOW, don't we need something like the (untested/uncompiled) patch below?
_If_ I am right, I can resend it with a changelog/etc., but please feel
free to make another fix.

Oleg.

--- x/kernel/sched.c
+++ x/kernel/sched.c
@@ -2423,13 +2423,14 @@ static void update_avg(u64 *avg, u64 sam
 #endif
 
 static void
-ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
+ttwu_stat(struct task_struct *p, bool migrate, int wake_flags)
 {
 #ifdef CONFIG_SCHEDSTATS
 	struct rq *rq = this_rq();
 
 #ifdef CONFIG_SMP
 	int this_cpu = smp_processor_id();
+	int cpu = task_cpu(p);
 
 	if (cpu == this_cpu) {
 		schedstat_inc(rq, ttwu_local);
@@ -2455,7 +2456,7 @@ ttwu_stat(struct task_struct *p, int cpu
 	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);
 
-	if (cpu != task_cpu(p))
+	if (migrate)
 		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
 
 #endif /* CONFIG_SCHEDSTATS */
@@ -2630,6 +2631,7 @@ try_to_wake_up(struct task_struct *p, un
 {
 	unsigned long flags;
 	int cpu, success = 0;
+	bool migrate = false;
 
 	smp_wmb();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
@@ -2637,7 +2639,6 @@ try_to_wake_up(struct task_struct *p, un
 		goto out;
 
 	success = 1; /* we're going to change ->state */
-	cpu = task_cpu(p);
 
 	if (p->on_rq && ttwu_remote(p, wake_flags))
 		goto stat;
@@ -2674,13 +2675,15 @@ try_to_wake_up(struct task_struct *p, un
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (task_cpu(p) != cpu)
+	if (task_cpu(p) != cpu) {
 		set_task_cpu(p, cpu);
+		migrate = true;
+	}
 #endif /* CONFIG_SMP */
 
 	ttwu_queue(p, cpu);
 stat:
-	ttwu_stat(p, cpu, wake_flags);
+	ttwu_stat(p, migrate, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
@@ -2716,7 +2719,7 @@ static void try_to_wake_up_local(struct 
 		ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 
 	ttwu_do_wakeup(rq, p, 0);
-	ttwu_stat(p, smp_processor_id(), 0);
+	ttwu_stat(p, false, 0);
 out:
 	raw_spin_unlock(&p->pi_lock);
 }




* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-06 16:46       ` Oleg Nesterov
@ 2011-06-07  9:31         ` Peter Zijlstra
  2011-06-07 14:03           ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2011-06-07  9:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On Mon, 2011-06-06 at 18:46 +0200, Oleg Nesterov wrote:
> On 06/06, Peter Zijlstra wrote:
> >
> > You're right, p->pi_lock for wakeups, rq->lock for runnable tasks.
> 
> Good, thanks.
> 
> Help! I have another question.
> 
> 	try_to_wake_up:
> 
> 		raw_spin_lock_irqsave(&p->pi_lock, flags);
> 		if (!(p->state & state))
> 			goto out;
> 
> 		cpu = task_cpu(p);
> 
> 		if (p->on_rq && ttwu_remote(p, wake_flags))
> 			goto stat;
> 
> This looks a bit confusing: we can't trust "cpu = task_cpu(p)" before
> we check ->on_rq. OK, not a problem, this cpu number can only be used in
> ttwu_stat(cpu).
> 
> But ttwu_stat(cpu) in turn does
> 
> 	if (cpu != task_cpu(p))
> 		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
> 
> Ignoring the theoretical races with pull_task/etc., how is it possible
> that cpu != task_cpu(p)? The other caller is try_to_wake_up_local(), and it
> obviously can't trigger this case.
> 
> This looks broken to me. Looking at its name, I guess nr_wakeups_migrate
> should be incremented if ttwu does set_task_cpu(), correct?
> 
> IOW, don't we need something like the (untested/uncompiled) patch below?
> _If_ I am right, I can resend it with a changelog/etc., but please feel
> free to make another fix.

You're right; I spotted the same thing a few days ago, which resulted in:

---
commit f339b9dc1f03591761d5d930800db24bc0eda1e1
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue May 31 10:49:20 2011 +0200

    sched: Fix schedstat.nr_wakeups_migrate
    
    While looking over the code I found that with the ttwu rework the
    nr_wakeups_migrate test broke since we now switch cpus prior to
    calling ttwu_stat(), hence the test is always true.
    
    Cure this by passing the migration state in wake_flags. Also move the
    whole test under CONFIG_SMP; it's hard to migrate tasks on UP :-)
    
    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Link: http://lkml.kernel.org/n/tip-pwwxl7gdqs5676f1d4cx6pj7@git.kernel.org
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8da84b7..483c1ed 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1063,6 +1063,7 @@ struct sched_domain;
  */
 #define WF_SYNC		0x01		/* waker goes to sleep after wakup */
 #define WF_FORK		0x02		/* child wakeup after fork */
+#define WF_MIGRATED	0x04		/* internal use, task got migrated */
 
 #define ENQUEUE_WAKEUP		1
 #define ENQUEUE_HEAD		2
diff --git a/kernel/sched.c b/kernel/sched.c
index 49cc70b..2fe98ed 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2447,6 +2447,10 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 		}
 		rcu_read_unlock();
 	}
+
+	if (wake_flags & WF_MIGRATED)
+		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
+
 #endif /* CONFIG_SMP */
 
 	schedstat_inc(rq, ttwu_count);
@@ -2455,9 +2459,6 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);
 
-	if (cpu != task_cpu(p))
-		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
-
 #endif /* CONFIG_SCHEDSTATS */
 }
 
@@ -2675,8 +2676,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (task_cpu(p) != cpu)
+	if (task_cpu(p) != cpu) {
+		wake_flags |= WF_MIGRATED;
 		set_task_cpu(p, cpu);
+	}
 #endif /* CONFIG_SMP */
 
 	ttwu_queue(p, cpu);



* [tip:sched/urgent] sched: Fix/clarify set_task_cpu() locking rules
  2011-06-03 15:37 ` Peter Zijlstra
                     ` (2 preceding siblings ...)
  2011-06-05 19:12   ` Oleg Nesterov
@ 2011-06-07 12:03   ` tip-bot for Peter Zijlstra
  3 siblings, 0 replies; 12+ messages in thread
From: tip-bot for Peter Zijlstra @ 2011-06-07 12:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz,
	sergey.senozhatsky, tglx, oleg, mingo

Commit-ID:  6c6c54e1807faf116724451ef2bd14993780470a
Gitweb:     http://git.kernel.org/tip/6c6c54e1807faf116724451ef2bd14993780470a
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 3 Jun 2011 17:37:07 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 7 Jun 2011 12:26:40 +0200

sched: Fix/clarify set_task_cpu() locking rules

Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
set_task_cpu() was called with both relevant rq->locks held, which
should be sufficient for running tasks since holding its rq->lock
will serialize against sched_move_task().

Update the comments and fix the task_group() lockdep test.

Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1307115427.2353.3456.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 2fe98ed..3f2e502 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -605,10 +605,10 @@ static inline int cpu_of(struct rq *rq)
 /*
  * Return the group to which this tasks belongs.
  *
- * We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
- * holds that lock for each task it moves into the cgroup. Therefore
- * by holding that lock, we pin the task to the current cgroup.
+ * We use task_subsys_state_check() and extend the RCU verification with
+ * pi->lock and rq->lock because cpu_cgroup_attach() holds those locks for each
+ * task it moves into the cgroup. Therefore by holding either of those locks,
+ * we pin the task to the current cgroup.
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
@@ -616,7 +616,8 @@ static inline struct task_group *task_group(struct task_struct *p)
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
-			lockdep_is_held(&p->pi_lock));
+			lockdep_is_held(&p->pi_lock) ||
+			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
 
 	return autogroup_task_group(p, tg);
@@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 			!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
+	/*
+	 * The caller should hold either p->pi_lock or rq->lock, when changing
+	 * a task's CPU. ->pi_lock for waking tasks, rq->lock for runnable tasks.
+	 *
+	 * sched_move_task() holds both and thus holding either pins the cgroup,
+	 * see set_task_rq().
+	 *
+	 * Furthermore, all task_rq users should acquire both locks, see
+	 * task_rq_lock().
+	 */
 	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
 				      lockdep_is_held(&task_rq(p)->lock)));
 #endif


* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
  2011-06-07  9:31         ` Peter Zijlstra
@ 2011-06-07 14:03           ` Oleg Nesterov
  0 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2011-06-07 14:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On 06/07, Peter Zijlstra wrote:
>
>     Cure this by passing the migration state in wake_flags. Also move the
>     whole test under CONFIG_SMP; it's hard to migrate tasks on UP :-)


Agreed, WF_MIGRATED looks better, and "move under CONFIG_SMP" is probably
correct ;)

I still think it makes sense to move the task_cpu(p) read from try_to_wake_up()
into ttwu_stat(), simply because the current placement looks confusing. But
this is very minor, please forget it.

Oleg.


