* syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
@ 2012-03-30 18:31 Oleg Nesterov
2012-03-30 19:02 ` Steven Rostedt
0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-03-30 18:31 UTC (permalink / raw)
To: Ingo Molnar, Jason Baron, Steven Rostedt; +Cc: linux-kernel
Hello.
I've looked at syscall_regfunc/unregfunc by accident, and I am
a bit confused...
void syscall_regfunc(void)
{
unsigned long flags;
struct task_struct *g, *t;
if (!sys_tracepoint_refcount) {
read_lock_irqsave(&tasklist_lock, flags);
Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_
doesn't. Any subtle reason I missed?
do_each_thread(g, t) {
/* Skip kernel threads. */
if (t->mm)
We should check PF_KTHREAD, not ->mm.
set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
But the main question is, can't we race with clone() and miss the
new child? The new task is not "visible" to do_each_thread() until
copy_process()->list_add_tail_rcu(thread_group/init_task.tasks).
Don't we need something like the patch below?
Oleg.
--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -1446,7 +1446,12 @@ static struct task_struct *copy_process(
total_forks++;
spin_unlock(¤t->sighand->siglock);
+#ifdef CONFIG_TRACEPOINTS
+ if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
+ set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+#endif
write_unlock_irq(&tasklist_lock);
+
proc_fork_connector(p);
cgroup_post_fork(p);
if (clone_flags & CLONE_THREAD)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
2012-03-30 18:31 syscall_regfunc() && TIF_SYSCALL_TRACEPOINT Oleg Nesterov
@ 2012-03-30 19:02 ` Steven Rostedt
2012-03-30 20:15 ` Oleg Nesterov
0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2012-03-30 19:02 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel
On Fri, 2012-03-30 at 20:31 +0200, Oleg Nesterov wrote:
> Hello.
>
> I've looked at syscall_regfunc/unregfunc by accident, and I am
> a bit confused...
>
> void syscall_regfunc(void)
> {
> unsigned long flags;
> struct task_struct *g, *t;
>
> if (!sys_tracepoint_refcount) {
> read_lock_irqsave(&tasklist_lock, flags);
>
> Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_
> doesn't. Any subtle reason I missed?
As long as an interrupt doesn't take the tasklist lock as write, we
don't. If it doesn't then we should be safe not to disable interrupts
here.
>
> do_each_thread(g, t) {
> /* Skip kernel threads. */
> if (t->mm)
>
> We should check PF_KTHREAD, not ->mm.
A lot of places test ->mm for kernel threads. Just search for ->mm in
kernel/sched/core.c
>
> set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
>
> But the main question is, can't we race with clone() and miss the
> new child? The new task is not "visible" to do_each_thread() until
> copy_process()->list_add_tail_rcu(thread_group/init_task.tasks).
>
> Don't we need something like the patch below?
>
> Oleg.
>
>
> --- x/kernel/fork.c
> +++ x/kernel/fork.c
> @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process(
>
> total_forks++;
> spin_unlock(¤t->sighand->siglock);
> +#ifdef CONFIG_TRACEPOINTS
> + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
I'm not so worried about the set (although that should be done) but it
is entirely possible that we need a clear too. Leaving this flag set
would cause a task to take the overhead of tracing syscalls without ever
tracing them.
-- Steve
> +#endif
> write_unlock_irq(&tasklist_lock);
> +
> proc_fork_connector(p);
> cgroup_post_fork(p);
> if (clone_flags & CLONE_THREAD)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
2012-03-30 19:02 ` Steven Rostedt
@ 2012-03-30 20:15 ` Oleg Nesterov
2012-03-31 0:13 ` Steven Rostedt
0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-03-30 20:15 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Ingo Molnar, Jason Baron, linux-kernel
On 03/30, Steven Rostedt wrote:
>
> On Fri, 2012-03-30 at 20:31 +0200, Oleg Nesterov wrote:
> > Hello.
> >
> > I've looked at syscall_regfunc/unregfunc by accident, and I am
> > a bit confused...
> >
> > void syscall_regfunc(void)
> > {
> > unsigned long flags;
> > struct task_struct *g, *t;
> >
> > if (!sys_tracepoint_refcount) {
> > read_lock_irqsave(&tasklist_lock, flags);
> >
> > Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_
> > doesn't. Any subtle reason I missed?
>
> As long as an interrupt doesn't take the tasklist lock as write,
No, this is forbidden.
> > do_each_thread(g, t) {
> > /* Skip kernel threads. */
> > if (t->mm)
> >
> > We should check PF_KTHREAD, not ->mm.
>
> A lot of places test ->mm for kernel threads.
And this is wrong, use_mm() can set ->mm != NULL. This is the common
mistake.
> Just search for ->mm in
> kernel/sched/core.c
Probably normalize_rt_tasks() and __sched_setscheduler() should be fixed.
> > Don't we need something like the patch below?
> >
> > Oleg.
> >
> >
> > --- x/kernel/fork.c
> > +++ x/kernel/fork.c
> > @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process(
> >
> > total_forks++;
> > spin_unlock(¤t->sighand->siglock);
> > +#ifdef CONFIG_TRACEPOINTS
> > + if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> > + set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
>
> I'm not so worried about the set (although that should be done) but it
> is entirely possible that we need a clear too. Leaving this flag set
> would cause a task to take the overhead of tracing syscalls without ever
> tracing them.
Agreed. OK, I'll send the patch with "else clear".
But I don't really understand why do you think that "clear" is more
important. Sure, the wrong TIF_SYSCALL_TRACEPOINT triggers the slow
path unnecessary, but this is more or less harmless. But if we do
not set the task obviously won't report trace_sys*, this looks like
a bug even if nothing bad can happen.
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
2012-03-30 20:15 ` Oleg Nesterov
@ 2012-03-31 0:13 ` Steven Rostedt
2012-03-31 20:45 ` Oleg Nesterov
0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2012-03-31 0:13 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel
On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote:
> > A lot of places test ->mm for kernel threads.
>
> And this is wrong, use_mm() can set ->mm != NULL. This is the common
> mistake.
>
> > Just search for ->mm in
> > kernel/sched/core.c
>
> Probably normalize_rt_tasks() and __sched_setscheduler() should be fixed.
Heh, the __sched_setscheduler() usage is from me. But I'm not sure we
had an alternative back in 2005.
> But I don't really understand why do you think that "clear" is more
> important.
They are both important. But as I tend to consider performance when
tracing is off as critical, I'm more concerned about that. But both must
be fixed, because not reporting traces can confuse a developer.
> Sure, the wrong TIF_SYSCALL_TRACEPOINT triggers the slow
> path unnecessary, but this is more or less harmless. But if we do
> not set the task obviously won't report trace_sys*, this looks like
> a bug even if nothing bad can happen.
Agreed, both are a bug and both should be fixed.
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
2012-03-31 0:13 ` Steven Rostedt
@ 2012-03-31 20:45 ` Oleg Nesterov
2012-03-31 21:37 ` Steven Rostedt
0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-03-31 20:45 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Ingo Molnar, Jason Baron, linux-kernel
On 03/30, Steven Rostedt wrote:
>
> On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote:
>
> > But I don't really understand why do you think that "clear" is more
> > important.
>
> They are both important. But as I tend to consider performance when
> tracing is off as critical, I'm more concerned about that. But both must
> be fixed, because not reporting traces can confuse a developer.
Ah, got it, thanks.
I was going to send the simple patch we discussed, but suddenly I
realized that I have another question.
Why do we want to filter out the kernel threads in syscall_regfunc?
>From cc3b13c1 "tracing: Don't trace kernel thread syscalls"
then it has no effect to trace the kernel thread calls
to syscalls in that path.
Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these.
OK, but then it doesn't hurt? Or is there another reason why
TIF_SYSCALL_TRACEPOINT is not desirable on kthread?
The problem is ____call_usermodehelper() which execs the user-space
task. This clears PF_KTHREAD (sets ->mm), but obviously if
sys_tracepoint_refcount != 0 this is too late.
So what do you think we should do,
- keep this check
- remove it
- remove it in a separate patch
- add the "sync with sys_tracepoint_refcount" hook
before kernel_execve()
?
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
2012-03-31 20:45 ` Oleg Nesterov
@ 2012-03-31 21:37 ` Steven Rostedt
2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2012-03-31 21:37 UTC (permalink / raw)
To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel
On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote:
> On 03/30, Steven Rostedt wrote:
> >
> > On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote:
> >
> > > But I don't really understand why do you think that "clear" is more
> > > important.
> >
> > They are both important. But as I tend to consider performance when
> > tracing is off as critical, I'm more concerned about that. But both must
> > be fixed, because not reporting traces can confuse a developer.
>
> Ah, got it, thanks.
>
> I was going to send the simple patch we discussed, but suddenly I
> realized that I have another question.
>
> Why do we want to filter out the kernel threads in syscall_regfunc?
>
> >From cc3b13c1 "tracing: Don't trace kernel thread syscalls"
>
> then it has no effect to trace the kernel thread calls
> to syscalls in that path.
> Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these.
>
> OK, but then it doesn't hurt? Or is there another reason why
> TIF_SYSCALL_TRACEPOINT is not desirable on kthread?
Right, it doesn't hurt. I was about to say that in a previous email.
>
> The problem is ____call_usermodehelper() which execs the user-space
> task. This clears PF_KTHREAD (sets ->mm), but obviously if
> sys_tracepoint_refcount != 0 this is too late.
>
> So what do you think we should do,
>
> - keep this check
>
> - remove it
>
> - remove it in a separate patch
I say this one (remove it in a separate patch). That way if something
breaks we know exactly what did it ;-)
>
> - add the "sync with sys_tracepoint_refcount" hook
> before kernel_execve()
>
> ?
-- Steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT)
2012-03-31 21:37 ` Steven Rostedt
@ 2012-04-01 21:37 ` Oleg Nesterov
2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-01 21:37 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
Frederic Weisbecker
On 03/31, Steven Rostedt wrote:
>
> On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote:
> >
> > So what do you think we should do,
> >
> > - keep this check
> >
> > - remove it
> >
> > - remove it in a separate patch
>
> I say this one (remove it in a separate patch). That way if something
> breaks we know exactly what did it ;-)
OK, agreed.
Don't really know how can I test this... but the kernel didn't
crash after I enabled the syscall tracer ;)
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process()
2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
@ 2012-04-01 21:38 ` Oleg Nesterov
2012-04-01 21:38 ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
2012-04-20 21:26 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
2 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-01 21:38 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
Frederic Weisbecker
syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
init_task.tasks list yet.
Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.
While at it,
- remove _irqsafe from syscall_regfunc/syscall_unregfunc,
read_lock(tasklist) doesn't need to disable irqs.
- change syscall_unregfunc() to check PF_KTHREAD to skip
the kernel threads, ->mm != NULL is the common mistake.
Note: probably this check should be simply removed, needs
another patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/fork.c | 7 +++++++
kernel/tracepoint.c | 12 +++++-------
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 34d7ed1..772d4a5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1446,7 +1446,14 @@ static struct task_struct *copy_process(unsigned long clone_flags,
total_forks++;
spin_unlock(¤t->sighand->siglock);
+#ifdef CONFIG_TRACEPOINTS
+ if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
+ set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+ else
+ clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+#endif
write_unlock_irq(&tasklist_lock);
+
proc_fork_connector(p);
cgroup_post_fork(p);
if (clone_flags & CLONE_THREAD)
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index f1539de..e2a4523 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -734,33 +734,31 @@ static int sys_tracepoint_refcount;
void syscall_regfunc(void)
{
- unsigned long flags;
struct task_struct *g, *t;
if (!sys_tracepoint_refcount) {
- read_lock_irqsave(&tasklist_lock, flags);
+ read_lock(&tasklist_lock);
do_each_thread(g, t) {
/* Skip kernel threads. */
- if (t->mm)
+ if (!(t->flags & PF_KTHREAD))
set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
} while_each_thread(g, t);
- read_unlock_irqrestore(&tasklist_lock, flags);
+ read_unlock(&tasklist_lock);
}
sys_tracepoint_refcount++;
}
void syscall_unregfunc(void)
{
- unsigned long flags;
struct task_struct *g, *t;
sys_tracepoint_refcount--;
if (!sys_tracepoint_refcount) {
- read_lock_irqsave(&tasklist_lock, flags);
+ read_lock(&tasklist_lock);
do_each_thread(g, t) {
clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
} while_each_thread(g, t);
- read_unlock_irqrestore(&tasklist_lock, flags);
+ read_unlock(&tasklist_lock);
}
}
#endif
--
1.5.5.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads
2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
@ 2012-04-01 21:38 ` Oleg Nesterov
2012-04-20 21:26 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
2 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-01 21:38 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
Frederic Weisbecker
syscall_regfunc() ignores the kernel thread because "it has
no effect", see cc3b13c1 "Don't trace kernel thread syscalls".
However, this means that a user-space task spawned by
call_usermodehelper() won't report the system calls if
kernel_execve() is called when sys_tracepoint_refcount != 0.
Remove this check. Hopefully the unnecessary report from
ret_from_fork path mentioned by cc3b13c1 is fine. In fact
"this is the only case" is not true. Say, kernel_execve()
itself does "int 80" on X86_32. Hopefully fine too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/tracepoint.c | 4 +---
1 files changed, 1 insertions(+), 3 deletions(-)
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index e2a4523..2403e60 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -739,9 +739,7 @@ void syscall_regfunc(void)
if (!sys_tracepoint_refcount) {
read_lock(&tasklist_lock);
do_each_thread(g, t) {
- /* Skip kernel threads. */
- if (!(t->flags & PF_KTHREAD))
- set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
+ set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
} while_each_thread(g, t);
read_unlock(&tasklist_lock);
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT)
2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
2012-04-01 21:38 ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
@ 2012-04-20 21:26 ` Oleg Nesterov
2 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-20 21:26 UTC (permalink / raw)
To: Steven Rostedt
Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
Frederic Weisbecker
Should I resend this or we do not really care?
The problem is minor, but both patches look like the simple
and obvious bugfix to me.
On 04/01, Oleg Nesterov wrote:
>
> On 03/31, Steven Rostedt wrote:
> >
> > On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote:
> > >
> > > So what do you think we should do,
> > >
> > > - keep this check
> > >
> > > - remove it
> > >
> > > - remove it in a separate patch
> >
> > I say this one (remove it in a separate patch). That way if something
> > breaks we know exactly what did it ;-)
>
> OK, agreed.
>
> Don't really know how can I test this... but the kernel didn't
> crash after I enabled the syscall tracer ;)
>
> Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-04-20 21:27 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-30 18:31 syscall_regfunc() && TIF_SYSCALL_TRACEPOINT Oleg Nesterov
2012-03-30 19:02 ` Steven Rostedt
2012-03-30 20:15 ` Oleg Nesterov
2012-03-31 0:13 ` Steven Rostedt
2012-03-31 20:45 ` Oleg Nesterov
2012-03-31 21:37 ` Steven Rostedt
2012-04-01 21:37 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
2012-04-01 21:38 ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
2012-04-01 21:38 ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
2012-04-20 21:26 ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.