* [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Sergey Senozhatsky @ 2011-05-31 17:26 UTC
To: Ingo Molnar; +Cc: Peter Zijlstra, Andrew Morton, linux-kernel

Wrap __set_task_cpu() with an RCU read-side critical section.
__set_task_cpu() calls task_group(), which performs an RCU dereference
check in task_subsys_state_check(), causing:

[  152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
[  152.262795]
[  152.262841] stack backtrace:
[  152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
[  152.262851] Call Trace:
[  152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
[  152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
[  152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
[  152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
[  152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
[  152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
[  152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
[  152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
[  152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e
[  152.262923]  [<ffffffff814775e6>] ? schedule+0x353/0xa9e
[  152.262931]  [<ffffffff8147de58>] ? sub_preempt_count+0x8f/0xa3
[  152.262938]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
[  152.262946]  [<ffffffff810072e5>] ? native_sched_clock+0x38/0x65
[  152.262953]  [<ffffffff81062c0c>] ? cpu_clock+0x4a/0x5f
[  152.262958]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
[  152.262965]  [<ffffffff81071a15>] ? trace_hardirqs_on_caller+0x10d/0x131
[  152.262971]  [<ffffffff81071a46>] ? trace_hardirqs_on+0xd/0xf
[  152.262977]  [<ffffffff8109fd2c>] ? watchdog_enable+0x195/0x195
[  152.262983]  [<ffffffff8109fd94>] watchdog+0x68/0xab
[  152.262990]  [<ffffffff8105cb82>] kthread+0x9a/0xa2
[  152.262999]  [<ffffffff81481e24>] kernel_thread_helper+0x4/0x10
[  152.263005]  [<ffffffff8102d6bf>] ? finish_task_switch+0x76/0xf0
[  152.263012]  [<ffffffff8147b258>] ? retint_restore_args+0x13/0x13
[  152.263019]  [<ffffffff8105cae8>] ? __init_kthread_worker+0x53/0x53
[  152.263024]  [<ffffffff81481e20>] ? gs_change+0x13/0x13

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

---
 kernel/sched.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index cbb3a0e..cf0dd8a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2212,7 +2212,9 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS, 1, 1, NULL, 0);
 	}
 
+	rcu_read_lock();
 	__set_task_cpu(p, new_cpu);
+	rcu_read_unlock();
 }
 
 struct migration_arg {
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Peter Zijlstra @ 2011-05-31 19:45 UTC
To: Sergey Senozhatsky; +Cc: Ingo Molnar, Andrew Morton, linux-kernel

On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> Wrap __set_task_cpu() with RCU read-side critical section.
> __set_task_cpu() calls task_group() that performs rcu dereference check in
> task_subsys_state_check(), causing:
>
> [ 152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
> [...]
>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

Sorry, no. You failed to ask yourself what it is protecting, and how
wrapping it like this ensures the proper thing is done.

What you've done is basically silence the warning for all
set_task_cpu() callers, without proper consideration.
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Peter Zijlstra @ 2011-06-03 15:37 UTC
To: Sergey Senozhatsky; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Oleg Nesterov

On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> [ 152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
> [ 152.262795]
> [ 152.262841] stack backtrace:
> [ 152.262846] Pid: 16, comm: watchdog/1 Not tainted 3.0.0-rc1-dbg-00441-g1d5f9cc-dirty #599
> [ 152.262851] Call Trace:
> [ 152.262860]  [<ffffffff8106e17b>] lockdep_rcu_dereference+0xa7/0xaf
> [ 152.262868]  [<ffffffff810369f4>] set_task_cpu+0x1ed/0x3ce
> [ 152.262876]  [<ffffffff8123a5d7>] ? plist_check_head+0x94/0x98
> [ 152.262883]  [<ffffffff8123a72d>] ? plist_del+0x82/0x89
> [ 152.262889]  [<ffffffff8102b139>] ? dequeue_task_rt+0x33/0x38
> [ 152.262895]  [<ffffffff8102e3ac>] ? dequeue_task+0x82/0x89
> [ 152.262902]  [<ffffffff81036fc0>] push_rt_task.part.131+0x1bb/0x247
> [ 152.262909]  [<ffffffff81037138>] post_schedule_rt+0x1b/0x24
> [ 152.262918]  [<ffffffff81477c1c>] schedule+0x989/0xa9e

Does the below cure the issue? (completely untested)

---
Subject: sched: Fix/clarify set_task_cpu() locking rules
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Fri Jun 03 17:28:08 CEST 2011

Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
set_task_cpu() was called with both relevant rq->locks held, which
should be sufficient for running tasks since holding its rq->lock will
serialize against sched_move_task().

Update the comments and fix the task_group() lockdep test.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-k3lie1tjkcp3626dn5r5ihge@git.kernel.org
---
 kernel/sched.c |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -605,10 +605,10 @@ static inline int cpu_of(struct rq *rq)
 /*
  * Return the group to which this tasks belongs.
  *
- * We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
- * holds that lock for each task it moves into the cgroup. Therefore
- * by holding that lock, we pin the task to the current cgroup.
+ * We use task_subsys_state_check() and extend the RCU verification with
+ * pi->lock and rq->lock because cpu_cgroup_attach() holds those locks for each
+ * task it moves into the cgroup. Therefore by holding either of those locks,
+ * we pin the task to the current cgroup.
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
@@ -616,7 +616,8 @@ static inline struct task_group *task_gr
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
-			lockdep_is_held(&p->pi_lock));
+			lockdep_is_held(&p->pi_lock) ||
+			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
 
 	return autogroup_task_group(p, tg);
@@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
 	!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
+	/*
+	 * The caller should hold either p->pi_lock or rq->lock, when changing
+	 * a task's CPU.
+	 *
+	 * sched_move_task() holds both and thus holding either pins the cgroup,
+	 * see set_task_rq().
+	 *
+	 * Furthermore, all task_rq users should acquire both locks, see
+	 * task_rq_lock().
+	 */
 	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
 				      lockdep_is_held(&task_rq(p)->lock)));
 #endif
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Sergey Senozhatsky @ 2011-06-03 18:16 UTC
To: Peter Zijlstra; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Oleg Nesterov

On (06/03/11 17:37), Peter Zijlstra wrote:
> On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> > [ 152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
> > [...]
>
> Does the below cure the issue? (completely untested)

Hello,
I believe it should; rq->lock is grabbed in post_schedule() and in
find_lock_lowest_rq(). For now I grab task->pi_lock in push_rt_task(),
but your approach is definitely better.

Will test and report back.

	Sergey

> [...]
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Sergey Senozhatsky @ 2011-06-03 22:49 UTC
To: Peter Zijlstra; +Cc: Ingo Molnar, Andrew Morton, linux-kernel, Oleg Nesterov

On (06/03/11 17:37), Peter Zijlstra wrote:
> On Tue, 2011-05-31 at 20:26 +0300, Sergey Senozhatsky wrote:
> > [ 152.262791] kernel/sched.c:619 invoked rcu_dereference_check() without protection!
> > [...]
>
> Does the below cure the issue? (completely untested)

Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>

	Sergey

> [...]
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Oleg Nesterov @ 2011-06-05 19:12 UTC
To: Peter Zijlstra; Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On 06/03, Peter Zijlstra wrote:
>
> @@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p,
>  	!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
>
>  #ifdef CONFIG_LOCKDEP
> +	/*
> +	 * The caller should hold either p->pi_lock or rq->lock, when changing
> +	 * a task's CPU.

Is it literally true? IIRC, we need ->pi_lock if the task is not active,
and rq->lock if p->on_rq = 1. And that is why we do not clear p->on_rq
between deactivate_task() + activate_task(), correct?

> +	 *
> +	 * sched_move_task() holds both and thus holding either pins the cgroup,
> +	 * see set_task_rq().
> +	 *
> +	 * Furthermore, all task_rq users should acquire both locks, see
> +	 * task_rq_lock().
> +	 */
>  	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
>  				      lockdep_is_held(&task_rq(p)->lock)));

IOW, perhaps this should be

	WARN_ON_ONCE(debug_locks && !lockdep_is_held(p->on_rq ?
				&task_rq(p)->lock : &p->pi_lock))

?

Not that I really suggest to change this WARN_ON(), I am just trying
to recall the new rules.

Oleg.
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Peter Zijlstra @ 2011-06-06 9:06 UTC
To: Oleg Nesterov; Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On Sun, 2011-06-05 at 21:12 +0200, Oleg Nesterov wrote:
> Is it literally true? IIRC, we need ->pi_lock if the task is not active,
> and rq->lock if p->on_rq = 1. And that is why we do not clear p->on_rq
> between deactivate_task() + activate_task(), correct?
>
> [...]
>
> Not that I really suggest to change this WARN_ON(), I am just trying
> to recall the new rules.

You're right, p->pi_lock for wakeups, rq->lock for runnable tasks.
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Oleg Nesterov @ 2011-06-06 16:46 UTC
To: Peter Zijlstra; Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On 06/06, Peter Zijlstra wrote:
>
> You're right, p->pi_lock for wakeups, rq->lock for runnable tasks.

Good, thanks.

Help! I have another question.

try_to_wake_up:

	raw_spin_lock_irqsave(&p->pi_lock, flags);
	if (!(p->state & state))
		goto out;

	cpu = task_cpu(p);

	if (p->on_rq && ttwu_remote(p, wake_flags))
		goto stat;

This looks a bit confusing: we can't trust "cpu = task_cpu(p)" before
we check ->on_rq. OK, not a problem, this cpu number can only be used
in ttwu_stat(cpu). But ttwu_stat(cpu) in turn does

	if (cpu != task_cpu(p))
		schedstat_inc(p, se.statistics.nr_wakeups_migrate);

Ignoring the theoretical races with pull_task/etc, how is it possible
that cpu != task_cpu(p)? The other caller is try_to_wake_up_local(),
which obviously can't trigger this case.

This looks broken to me. Looking at its name, I guess nr_wakeups_migrate
should be incremented if ttwu does set_task_cpu(), correct?

IOW, don't we need something like the (untested/uncompiled) patch below?
_If_ I am right, I can resend it with the changelog/etc, but please feel
free to make another fix.

Oleg.

--- x/kernel/sched.c
+++ x/kernel/sched.c
@@ -2423,13 +2423,14 @@ static void update_avg(u64 *avg, u64 sam
 #endif
 
 static void
-ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
+ttwu_stat(struct task_struct *p, bool migrate, int wake_flags)
 {
 #ifdef CONFIG_SCHEDSTATS
 	struct rq *rq = this_rq();
 
 #ifdef CONFIG_SMP
 	int this_cpu = smp_processor_id();
+	int cpu = task_cpu(p);
 
 	if (cpu == this_cpu) {
 		schedstat_inc(rq, ttwu_local);
@@ -2455,7 +2456,7 @@ ttwu_stat(struct task_struct *p, int cpu
 	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);
 
-	if (cpu != task_cpu(p))
+	if (migrate)
 		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
 
 #endif /* CONFIG_SCHEDSTATS */
@@ -2630,6 +2631,7 @@ try_to_wake_up(struct task_struct *p, un
 {
 	unsigned long flags;
 	int cpu, success = 0;
+	bool migrate = false;
 
 	smp_wmb();
 	raw_spin_lock_irqsave(&p->pi_lock, flags);
@@ -2637,7 +2639,6 @@ try_to_wake_up(struct task_struct *p, un
 		goto out;
 
 	success = 1; /* we're going to change ->state */
-	cpu = task_cpu(p);
 
 	if (p->on_rq && ttwu_remote(p, wake_flags))
 		goto stat;
@@ -2674,13 +2675,15 @@ try_to_wake_up(struct task_struct *p, un
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (task_cpu(p) != cpu)
+	if (task_cpu(p) != cpu) {
 		set_task_cpu(p, cpu);
+		migrate = true;
+	}
 #endif /* CONFIG_SMP */
 
 	ttwu_queue(p, cpu);
 stat:
-	ttwu_stat(p, cpu, wake_flags);
+	ttwu_stat(p, migrate, wake_flags);
 out:
 	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
 
@@ -2716,7 +2719,7 @@ static void try_to_wake_up_local(struct
 	ttwu_activate(rq, p, ENQUEUE_WAKEUP);
 	ttwu_do_wakeup(rq, p, 0);
-	ttwu_stat(p, smp_processor_id(), 0);
+	ttwu_stat(p, false, 0);
 out:
 	raw_spin_unlock(&p->pi_lock);
 }
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Peter Zijlstra @ 2011-06-07 9:31 UTC
To: Oleg Nesterov; Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On Mon, 2011-06-06 at 18:46 +0200, Oleg Nesterov wrote:
> [...]
>
> This looks broken to me. Looking at its name, I guess nr_wakeups_migrate
> should be incremented if ttwu does set_task_cpu(), correct?
>
> IOW, don't we need something like the (untested/uncompiled) patch below?
> _If_ I am right, I can resend it with the changelog/etc, but please feel
> free to make another fix.

You're right, I spotted the same a few days ago which resulted in:

---
commit f339b9dc1f03591761d5d930800db24bc0eda1e1
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue May 31 10:49:20 2011 +0200

    sched: Fix schedstat.nr_wakeups_migrate

    While looking over the code I found that with the ttwu rework the
    nr_wakeups_migrate test broke since we now switch cpus prior to
    calling ttwu_stat(), hence the test is always true.

    Cure this by passing the migration state in wake_flags. Also move the
    whole test under CONFIG_SMP, its hard to migrate tasks on UP :-)

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Link: http://lkml.kernel.org/n/tip-pwwxl7gdqs5676f1d4cx6pj7@git.kernel.org
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8da84b7..483c1ed 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1063,6 +1063,7 @@ struct sched_domain;
  */
 #define WF_SYNC		0x01		/* waker goes to sleep after wakup */
 #define WF_FORK		0x02		/* child wakeup after fork */
+#define WF_MIGRATED	0x04		/* internal use, task got migrated */
 
 #define ENQUEUE_WAKEUP		1
 #define ENQUEUE_HEAD		2
diff --git a/kernel/sched.c b/kernel/sched.c
index 49cc70b..2fe98ed 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2447,6 +2447,10 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 		}
 		rcu_read_unlock();
 	}
+
+	if (wake_flags & WF_MIGRATED)
+		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
+
 #endif /* CONFIG_SMP */
 
 	schedstat_inc(rq, ttwu_count);
@@ -2455,9 +2459,6 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 	if (wake_flags & WF_SYNC)
 		schedstat_inc(p, se.statistics.nr_wakeups_sync);
 
-	if (cpu != task_cpu(p))
-		schedstat_inc(p, se.statistics.nr_wakeups_migrate);
-
 #endif /* CONFIG_SCHEDSTATS */
 }
 
@@ -2675,8 +2676,10 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 		p->sched_class->task_waking(p);
 
 	cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (task_cpu(p) != cpu)
+	if (task_cpu(p) != cpu) {
+		wake_flags |= WF_MIGRATED;
 		set_task_cpu(p, cpu);
+	}
 #endif /* CONFIG_SMP */
 
 	ttwu_queue(p, cpu);
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Oleg Nesterov @ 2011-06-07 14:03 UTC
To: Peter Zijlstra; Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On 06/07, Peter Zijlstra wrote:
>
> Cure this by passing the migration state in wake_flags. Also move the
> whole test under CONFIG_SMP, its hard to migrate tasks on UP :-)

Agreed, WF_MIGRATED looks better, and "move under CONFIG_SMP" is
probably correct ;)

I still think it makes sense to move task_cpu(p) from try_to_wake_up()
to ttwu_stat(), just because it looks confusing. But this is very
minor, please forget.

Oleg.
* Re: [PATCH] sched: RCU-protect __set_task_cpu() in set_task_cpu()
From: Peter Zijlstra @ 2011-06-06 13:43 UTC
To: Oleg Nesterov; Cc: Sergey Senozhatsky, Ingo Molnar, Andrew Morton, linux-kernel

On Sun, 2011-06-05 at 21:12 +0200, Oleg Nesterov wrote:
>	WARN_ON_ONCE(debug_locks && !lockdep_is_held(p->on_rq ?
>				&task_rq(p)->lock : &p->pi_lock))
>
> Not that I really suggest to change this WARN_ON(), I am just trying
> to recall the new rules.

I actually like that, so I tried it; we need to clean up some of the
fork() path hacks, but it should be possible. Will try and remember for
the next release or so.
* [tip:sched/urgent] sched: Fix/clarify set_task_cpu() locking rules
From: tip-bot for Peter Zijlstra @ 2011-06-07 12:03 UTC
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, peterz, sergey.senozhatsky, tglx, oleg

Commit-ID:  6c6c54e1807faf116724451ef2bd14993780470a
Gitweb:     http://git.kernel.org/tip/6c6c54e1807faf116724451ef2bd14993780470a
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 3 Jun 2011 17:37:07 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Tue, 7 Jun 2011 12:26:40 +0200

sched: Fix/clarify set_task_cpu() locking rules

Sergey reported a CONFIG_PROVE_RCU warning in push_rt_task where
set_task_cpu() was called with both relevant rq->locks held, which
should be sufficient for running tasks since holding its rq->lock will
serialize against sched_move_task().

Update the comments and fix the task_group() lockdep test.

Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1307115427.2353.3456.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |   21 ++++++++++++++++-----
 1 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 2fe98ed..3f2e502 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -605,10 +605,10 @@ static inline int cpu_of(struct rq *rq)
 /*
  * Return the group to which this tasks belongs.
  *
- * We use task_subsys_state_check() and extend the RCU verification
- * with lockdep_is_held(&p->pi_lock) because cpu_cgroup_attach()
- * holds that lock for each task it moves into the cgroup. Therefore
- * by holding that lock, we pin the task to the current cgroup.
+ * We use task_subsys_state_check() and extend the RCU verification with
+ * pi->lock and rq->lock because cpu_cgroup_attach() holds those locks for each
+ * task it moves into the cgroup. Therefore by holding either of those locks,
+ * we pin the task to the current cgroup.
  */
 static inline struct task_group *task_group(struct task_struct *p)
 {
@@ -616,7 +616,8 @@ static inline struct task_group *task_group(struct task_struct *p)
 	struct cgroup_subsys_state *css;
 
 	css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
-			lockdep_is_held(&p->pi_lock));
+			lockdep_is_held(&p->pi_lock) ||
+			lockdep_is_held(&task_rq(p)->lock));
 	tg = container_of(css, struct task_group, css);
 
 	return autogroup_task_group(p, tg);
@@ -2200,6 +2201,16 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 	!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
 
 #ifdef CONFIG_LOCKDEP
+	/*
+	 * The caller should hold either p->pi_lock or rq->lock, when changing
+	 * a task's CPU. ->pi_lock for waking tasks, rq->lock for runnable tasks.
+	 *
+	 * sched_move_task() holds both and thus holding either pins the cgroup,
+	 * see set_task_rq().
+	 *
+	 * Furthermore, all task_rq users should acquire both locks, see
+	 * task_rq_lock().
+	 */
 	WARN_ON_ONCE(debug_locks && !(lockdep_is_held(&p->pi_lock) ||
 				      lockdep_is_held(&task_rq(p)->lock)));
 #endif