* syscall_regfunc() && TIF_SYSCALL_TRACEPOINT @ 2012-03-30 18:31 Oleg Nesterov 2012-03-30 19:02 ` Steven Rostedt 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2012-03-30 18:31 UTC (permalink / raw) To: Ingo Molnar, Jason Baron, Steven Rostedt; +Cc: linux-kernel Hello. I've looked at syscall_regfunc/unregfunc by accident, and I am a bit confused... void syscall_regfunc(void) { unsigned long flags; struct task_struct *g, *t; if (!sys_tracepoint_refcount) { read_lock_irqsave(&tasklist_lock, flags); Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_ doesn't. Any subtle reason I missed? do_each_thread(g, t) { /* Skip kernel threads. */ if (t->mm) We should check PF_KTHREAD, not ->mm. set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT); But the main question is, can't we race with clone() and miss the new child? The new task is not "visible" to do_each_thread() until copy_process()->list_add_tail_rcu(thread_group/init_task.tasks). Don't we need something like the patch below? Oleg. --- x/kernel/fork.c +++ x/kernel/fork.c @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process( total_forks++; spin_unlock(¤t->sighand->siglock); +#ifdef CONFIG_TRACEPOINTS + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT); +#endif write_unlock_irq(&tasklist_lock); + proc_fork_connector(p); cgroup_post_fork(p); if (clone_flags & CLONE_THREAD) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT 2012-03-30 18:31 syscall_regfunc() && TIF_SYSCALL_TRACEPOINT Oleg Nesterov @ 2012-03-30 19:02 ` Steven Rostedt 2012-03-30 20:15 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2012-03-30 19:02 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel On Fri, 2012-03-30 at 20:31 +0200, Oleg Nesterov wrote: > Hello. > > I've looked at syscall_regfunc/unregfunc by accident, and I am > a bit confused... > > void syscall_regfunc(void) > { > unsigned long flags; > struct task_struct *g, *t; > > if (!sys_tracepoint_refcount) { > read_lock_irqsave(&tasklist_lock, flags); > > Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_ > doesn't. Any subtle reason I missed? As long as an interrupt doesn't take the tasklist lock as write, we don't. If it doesn't then we should be safe not to disable interrupts here. > > do_each_thread(g, t) { > /* Skip kernel threads. */ > if (t->mm) > > We should check PF_KTHREAD, not ->mm. A lot of places test ->mm for kernel threads. Just search for ->mm in kernel/sched/core.c > > set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT); > > But the main question is, can't we race with clone() and miss the > new child? The new task is not "visible" to do_each_thread() until > copy_process()->list_add_tail_rcu(thread_group/init_task.tasks). > > Don't we need something like the patch below? > > Oleg. > > > --- x/kernel/fork.c > +++ x/kernel/fork.c > @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process( > > total_forks++; > spin_unlock(¤t->sighand->siglock); > +#ifdef CONFIG_TRACEPOINTS > + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) > + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT); I'm not so worried about the set (although that should be done) but it is entirely possible that we need a clear too. Leaving this flag set would cause a task to take the overhead of tracing syscalls without ever tracing them. -- Steve > +#endif > write_unlock_irq(&tasklist_lock); > + > proc_fork_connector(p); > cgroup_post_fork(p); > if (clone_flags & CLONE_THREAD) ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT 2012-03-30 19:02 ` Steven Rostedt @ 2012-03-30 20:15 ` Oleg Nesterov 2012-03-31 0:13 ` Steven Rostedt 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2012-03-30 20:15 UTC (permalink / raw) To: Steven Rostedt; +Cc: Ingo Molnar, Jason Baron, linux-kernel On 03/30, Steven Rostedt wrote: > > On Fri, 2012-03-30 at 20:31 +0200, Oleg Nesterov wrote: > > Hello. > > > > I've looked at syscall_regfunc/unregfunc by accident, and I am > > a bit confused... > > > > void syscall_regfunc(void) > > { > > unsigned long flags; > > struct task_struct *g, *t; > > > > if (!sys_tracepoint_refcount) { > > read_lock_irqsave(&tasklist_lock, flags); > > > > Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_ > > doesn't. Any subtle reason I missed? > > As long as an interrupt doesn't take the tasklist lock as write, No, this is forbidden. > > do_each_thread(g, t) { > > /* Skip kernel threads. */ > > if (t->mm) > > > > We should check PF_KTHREAD, not ->mm. > > A lot of places test ->mm for kernel threads. And this is wrong, use_mm() can set ->mm != NULL. This is the common mistake. > Just search for ->mm in > kernel/sched/core.c Probably normalize_rt_tasks() and __sched_setscheduler() should be fixed. > > Don't we need something like the patch below? > > > > Oleg. > > > > > > --- x/kernel/fork.c > > +++ x/kernel/fork.c > > @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process( > > > > total_forks++; > > spin_unlock(¤t->sighand->siglock); > > +#ifdef CONFIG_TRACEPOINTS > > + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT))) > > + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT); > > I'm not so worried about the set (although that should be done) but it > is entirely possible that we need a clear too. Leaving this flag set > would cause a task to take the overhead of tracing syscalls without ever > tracing them. Agreed. OK, I'll send the patch with "else clear". But I don't really understand why do you think that "clear" is more important. Sure, the wrong TIF_SYSCALL_TRACEPOINT triggers the slow path unnecessary, but this is more or less harmless. But if we do not set the task obviously won't report trace_sys*, this looks like a bug even if nothing bad can happen. Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT 2012-03-30 20:15 ` Oleg Nesterov @ 2012-03-31 0:13 ` Steven Rostedt 2012-03-31 20:45 ` Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2012-03-31 0:13 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote: > > A lot of places test ->mm for kernel threads. > > And this is wrong, use_mm() can set ->mm != NULL. This is the common > mistake. > > > Just search for ->mm in > > kernel/sched/core.c > > Probably normalize_rt_tasks() and __sched_setscheduler() should be fixed. Heh, the __sched_setscheduler() usage is from me. But I'm not sure we had an alternative back in 2005. > But I don't really understand why do you think that "clear" is more > important. They are both important. But as I tend to consider performance when tracing is off as critical, I'm more concerned about that. But both must be fixed, because not reporting traces can confuse a developer. > Sure, the wrong TIF_SYSCALL_TRACEPOINT triggers the slow > path unnecessary, but this is more or less harmless. But if we do > not set the task obviously won't report trace_sys*, this looks like > a bug even if nothing bad can happen. Agreed, both are a bug and both should be fixed. Thanks, -- Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT 2012-03-31 0:13 ` Steven Rostedt @ 2012-03-31 20:45 ` Oleg Nesterov 2012-03-31 21:37 ` Steven Rostedt 0 siblings, 1 reply; 10+ messages in thread From: Oleg Nesterov @ 2012-03-31 20:45 UTC (permalink / raw) To: Steven Rostedt; +Cc: Ingo Molnar, Jason Baron, linux-kernel On 03/30, Steven Rostedt wrote: > > On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote: > > > But I don't really understand why do you think that "clear" is more > > important. > > They are both important. But as I tend to consider performance when > tracing is off as critical, I'm more concerned about that. But both must > be fixed, because not reporting traces can confuse a developer. Ah, got it, thanks. I was going to send the simple patch we discussed, but suddenly I realized that I have another question. Why do we want to filter out the kernel threads in syscall_regfunc? >From cc3b13c1 "tracing: Don't trace kernel thread syscalls" then it has no effect to trace the kernel thread calls to syscalls in that path. Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these. OK, but then it doesn't hurt? Or is there another reason why TIF_SYSCALL_TRACEPOINT is not desirable on kthread? The problem is ____call_usermodehelper() which execs the user-space task. This clears PF_KTHREAD (sets ->mm), but obviously if sys_tracepoint_refcount != 0 this is too late. So what do you think we should do, - keep this check - remove it - remove it in a separate patch - add the "sync with sys_tracepoint_refcount" hook before kernel_execve() ? Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT 2012-03-31 20:45 ` Oleg Nesterov @ 2012-03-31 21:37 ` Steven Rostedt 2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov 0 siblings, 1 reply; 10+ messages in thread From: Steven Rostedt @ 2012-03-31 21:37 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote: > On 03/30, Steven Rostedt wrote: > > > > On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote: > > > > > But I don't really understand why do you think that "clear" is more > > > important. > > > > They are both important. But as I tend to consider performance when > > tracing is off as critical, I'm more concerned about that. But both must > > be fixed, because not reporting traces can confuse a developer. > > Ah, got it, thanks. > > I was going to send the simple patch we discussed, but suddenly I > realized that I have another question. > > Why do we want to filter out the kernel threads in syscall_regfunc? > > >From cc3b13c1 "tracing: Don't trace kernel thread syscalls" > > then it has no effect to trace the kernel thread calls > to syscalls in that path. > Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these. > > OK, but then it doesn't hurt? Or is there another reason why > TIF_SYSCALL_TRACEPOINT is not desirable on kthread? Right, it doesn't hurt. I was about to say that in a previous email. > > The problem is ____call_usermodehelper() which execs the user-space > task. This clears PF_KTHREAD (sets ->mm), but obviously if > sys_tracepoint_refcount != 0 this is too late. > > So what do you think we should do, > > - keep this check > > - remove it > > - remove it in a separate patch I say this one (remove it in a separate patch). That way if something breaks we know exactly what did it ;-) > > - add the "sync with sys_tracepoint_refcount" hook > before kernel_execve() > > ? -- Steve ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) 2012-03-31 21:37 ` Steven Rostedt @ 2012-04-01 21:37 ` Oleg Nesterov 2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Oleg Nesterov @ 2012-04-01 21:37 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner, Frederic Weisbecker On 03/31, Steven Rostedt wrote: > > On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote: > > > > So what do you think we should do, > > > > - keep this check > > > > - remove it > > > > - remove it in a separate patch > > I say this one (remove it in a separate patch). That way if something > breaks we know exactly what did it ;-) OK, agreed. Don't really know how can I test this... but the kernel didn't crash after I enabled the syscall tracer ;) Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() 2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov @ 2012-04-01 21:38 ` Oleg Nesterov 2012-04-01 21:38 ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov 2012-04-20 21:26 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov 2 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2012-04-01 21:38 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner, Frederic Weisbecker syscall_regfunc() and syscall_unregfunc() should set/clear TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race with copy_process() and miss the new child which was not added to init_task.tasks list yet. Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT under tasklist. While at it, - remove _irqsafe from syscall_regfunc/syscall_unregfunc, read_lock(tasklist) doesn't need to disable irqs. - change syscall_unregfunc() to check PF_KTHREAD to skip the kernel threads, ->mm != NULL is the common mistake. Note: probably this check should be simply removed, needs another patch. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- kernel/fork.c | 7 +++++++ kernel/tracepoint.c | 12 +++++------- 2 files changed, 12 insertions(+), 7 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 34d7ed1..772d4a5 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1446,7 +1446,14 @@ static struct task_struct *copy_process(unsigned long clone_flags, total_forks++; spin_unlock(¤t->sighand->siglock); +#ifdef CONFIG_TRACEPOINTS + if (test_thread_flag(TIF_SYSCALL_TRACEPOINT)) + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT); + else + clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT); +#endif write_unlock_irq(&tasklist_lock); + proc_fork_connector(p); cgroup_post_fork(p); if (clone_flags & CLONE_THREAD) diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index f1539de..e2a4523 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -734,33 +734,31 @@ static int sys_tracepoint_refcount; void syscall_regfunc(void) { - unsigned long flags; struct task_struct *g, *t; if (!sys_tracepoint_refcount) { - read_lock_irqsave(&tasklist_lock, flags); + read_lock(&tasklist_lock); do_each_thread(g, t) { /* Skip kernel threads. */ - if (t->mm) + if (!(t->flags & PF_KTHREAD)) set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT); } while_each_thread(g, t); - read_unlock_irqrestore(&tasklist_lock, flags); + read_unlock(&tasklist_lock); } sys_tracepoint_refcount++; } void syscall_unregfunc(void) { - unsigned long flags; struct task_struct *g, *t; sys_tracepoint_refcount--; if (!sys_tracepoint_refcount) { - read_lock_irqsave(&tasklist_lock, flags); + read_lock(&tasklist_lock); do_each_thread(g, t) { clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT); } while_each_thread(g, t); - read_unlock_irqrestore(&tasklist_lock, flags); + read_unlock(&tasklist_lock); } } #endif -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads 2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov 2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov @ 2012-04-01 21:38 ` Oleg Nesterov 2012-04-20 21:26 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov 2 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2012-04-01 21:38 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner, Frederic Weisbecker syscall_regfunc() ignores the kernel thread because "it has no effect", see cc3b13c1 "Don't trace kernel thread syscalls". However, this means that a user-space task spawned by call_usermodehelper() won't report the system calls if kernel_execve() is called when sys_tracepoint_refcount != 0. Remove this check. Hopefully the unnecessary report from ret_from_fork path mentioned by cc3b13c1 is fine. In fact "this is the only case" is not true. Say, kernel_execve() itself does "int 80" on X86_32. Hopefully fine too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> --- kernel/tracepoint.c | 4 +--- 1 files changed, 1 insertions(+), 3 deletions(-) diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index e2a4523..2403e60 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -739,9 +739,7 @@ void syscall_regfunc(void) if (!sys_tracepoint_refcount) { read_lock(&tasklist_lock); do_each_thread(g, t) { - /* Skip kernel threads. */ - if (!(t->flags & PF_KTHREAD)) - set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT); + set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT); } while_each_thread(g, t); read_unlock(&tasklist_lock); } -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) 2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov 2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov 2012-04-01 21:38 ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov @ 2012-04-20 21:26 ` Oleg Nesterov 2 siblings, 0 replies; 10+ messages in thread From: Oleg Nesterov @ 2012-04-20 21:26 UTC (permalink / raw) To: Steven Rostedt Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner, Frederic Weisbecker Should I resend this or we do not really care? The problem is minor, but both patches look like the simple and obvious bugfix to me. On 04/01, Oleg Nesterov wrote: > > On 03/31, Steven Rostedt wrote: > > > > On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote: > > > > > > So what do you think we should do, > > > > > > - keep this check > > > > > > - remove it > > > > > > - remove it in a separate patch > > > > I say this one (remove it in a separate patch). That way if something > > breaks we know exactly what did it ;-) > > OK, agreed. > > Don't really know how can I test this... but the kernel didn't > crash after I enabled the syscall tracer ;) > > Oleg. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-04-20 21:27 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-03-30 18:31 syscall_regfunc() && TIF_SYSCALL_TRACEPOINT Oleg Nesterov 2012-03-30 19:02 ` Steven Rostedt 2012-03-30 20:15 ` Oleg Nesterov 2012-03-31 0:13 ` Steven Rostedt 2012-03-31 20:45 ` Oleg Nesterov 2012-03-31 21:37 ` Steven Rostedt 2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov 2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov 2012-04-01 21:38 ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov 2012-04-20 21:26 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.