All of lore.kernel.org
 help / color / mirror / Atom feed
* syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
@ 2012-03-30 18:31 Oleg Nesterov
  2012-03-30 19:02 ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-03-30 18:31 UTC (permalink / raw)
  To: Ingo Molnar, Jason Baron, Steven Rostedt; +Cc: linux-kernel

Hello.

I've looked at syscall_regfunc/unregfunc by accident, and I am
a bit confused...

	void syscall_regfunc(void)
	{
		unsigned long flags;
		struct task_struct *g, *t;

		if (!sys_tracepoint_refcount) {
			read_lock_irqsave(&tasklist_lock, flags);

Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_
doesn't. Any subtle reason I missed?

			do_each_thread(g, t) {
				/* Skip kernel threads. */
				if (t->mm)

We should check PF_KTHREAD, not ->mm.

					set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);

But the main question is, can't we race with clone() and miss the
new child? The new task is not "visible" to do_each_thread() until
copy_process()->list_add_tail_rcu(thread_group/init_task.tasks).

Don't we need something like the patch below?

Oleg.


--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -1446,7 +1446,12 @@ static struct task_struct *copy_process(
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
+#ifdef CONFIG_TRACEPOINTS
+	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
+		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+#endif
 	write_unlock_irq(&tasklist_lock);
+
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	if (clone_flags & CLONE_THREAD)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
  2012-03-30 18:31 syscall_regfunc() && TIF_SYSCALL_TRACEPOINT Oleg Nesterov
@ 2012-03-30 19:02 ` Steven Rostedt
  2012-03-30 20:15   ` Oleg Nesterov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2012-03-30 19:02 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel

On Fri, 2012-03-30 at 20:31 +0200, Oleg Nesterov wrote:
> Hello.
> 
> I've looked at syscall_regfunc/unregfunc by accident, and I am
> a bit confused...
> 
> 	void syscall_regfunc(void)
> 	{
> 		unsigned long flags;
> 		struct task_struct *g, *t;
> 
> 		if (!sys_tracepoint_refcount) {
> 			read_lock_irqsave(&tasklist_lock, flags);
> 
> Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_
> doesn't. Any subtle reason I missed?

As long as an interrupt doesn't take the tasklist lock as write, we
don't. If it doesn't then we should be safe not to disable interrupts
here.

> 
> 			do_each_thread(g, t) {
> 				/* Skip kernel threads. */
> 				if (t->mm)
> 
> We should check PF_KTHREAD, not ->mm.

A lot of places test ->mm for kernel threads. Just search for ->mm in
kernel/sched/core.c


> 
> 					set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> 
> But the main question is, can't we race with clone() and miss the
> new child? The new task is not "visible" to do_each_thread() until
> copy_process()->list_add_tail_rcu(thread_group/init_task.tasks).
> 
> Don't we need something like the patch below?
> 
> Oleg.
> 
> 
> --- x/kernel/fork.c
> +++ x/kernel/fork.c
> @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process(
>  
>  	total_forks++;
>  	spin_unlock(&current->sighand->siglock);
> +#ifdef CONFIG_TRACEPOINTS
> +	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> +		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);

I'm not so worried about the set (although that should be done) but it
is entirely possible that we need a clear too. Leaving this flag set
would cause a task to take the overhead of tracing syscalls without ever
tracing them.

-- Steve


> +#endif
>  	write_unlock_irq(&tasklist_lock);
> +
>  	proc_fork_connector(p);
>  	cgroup_post_fork(p);
>  	if (clone_flags & CLONE_THREAD)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
  2012-03-30 19:02 ` Steven Rostedt
@ 2012-03-30 20:15   ` Oleg Nesterov
  2012-03-31  0:13     ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-03-30 20:15 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, Jason Baron, linux-kernel

On 03/30, Steven Rostedt wrote:
>
> On Fri, 2012-03-30 at 20:31 +0200, Oleg Nesterov wrote:
> > Hello.
> >
> > I've looked at syscall_regfunc/unregfunc by accident, and I am
> > a bit confused...
> >
> > 	void syscall_regfunc(void)
> > 	{
> > 		unsigned long flags;
> > 		struct task_struct *g, *t;
> >
> > 		if (!sys_tracepoint_refcount) {
> > 			read_lock_irqsave(&tasklist_lock, flags);
> >
> > Why _irqsave? write_lock(tasklist) needs to disable irqs, but read_
> > doesn't. Any subtle reason I missed?
>
> As long as an interrupt doesn't take the tasklist lock as write,

No, this is forbidden.

> > 			do_each_thread(g, t) {
> > 				/* Skip kernel threads. */
> > 				if (t->mm)
> >
> > We should check PF_KTHREAD, not ->mm.
>
> A lot of places test ->mm for kernel threads.

And this is wrong, use_mm() can set ->mm != NULL. This is the common
mistake.

> Just search for ->mm in
> kernel/sched/core.c

Probably normalize_rt_tasks() and __sched_setscheduler() should be fixed.

> > Don't we need something like the patch below?
> >
> > Oleg.
> >
> >
> > --- x/kernel/fork.c
> > +++ x/kernel/fork.c
> > @@ -1446,7 +1446,12 @@ static struct task_struct *copy_process(
> >
> >  	total_forks++;
> >  	spin_unlock(&current->sighand->siglock);
> > +#ifdef CONFIG_TRACEPOINTS
> > +	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> > +		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
>
> I'm not so worried about the set (although that should be done) but it
> is entirely possible that we need a clear too. Leaving this flag set
> would cause a task to take the overhead of tracing syscalls without ever
> tracing them.

Agreed. OK, I'll send the patch with "else clear".

But I don't really understand why do you think that "clear" is more
important. Sure, the wrong TIF_SYSCALL_TRACEPOINT triggers the slow
path unnecessary, but this is more or less harmless. But if we do
not set the task obviously won't report trace_sys*, this looks like
a bug even if nothing bad can happen.

Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
  2012-03-30 20:15   ` Oleg Nesterov
@ 2012-03-31  0:13     ` Steven Rostedt
  2012-03-31 20:45       ` Oleg Nesterov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2012-03-31  0:13 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel

On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote:

> > A lot of places test ->mm for kernel threads.
> 
> And this is wrong, use_mm() can set ->mm != NULL. This is the common
> mistake.
> 
> > Just search for ->mm in
> > kernel/sched/core.c
> 
> Probably normalize_rt_tasks() and __sched_setscheduler() should be fixed.

Heh, the __sched_setscheduler() usage is from me. But I'm not sure we
had an alternative back in 2005.


> But I don't really understand why do you think that "clear" is more
> important.

They are both important. But as I tend to consider performance when
tracing is off as critical, I'm more concerned about that. But both must
be fixed, because not reporting traces can confuse a developer.



>  Sure, the wrong TIF_SYSCALL_TRACEPOINT triggers the slow
> path unnecessary, but this is more or less harmless. But if we do
> not set the task obviously won't report trace_sys*, this looks like
> a bug even if nothing bad can happen.

Agreed, both are a bug and both should be fixed.

Thanks,

-- Steve



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
  2012-03-31  0:13     ` Steven Rostedt
@ 2012-03-31 20:45       ` Oleg Nesterov
  2012-03-31 21:37         ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2012-03-31 20:45 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Ingo Molnar, Jason Baron, linux-kernel

On 03/30, Steven Rostedt wrote:
>
> On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote:
>
> > But I don't really understand why do you think that "clear" is more
> > important.
>
> They are both important. But as I tend to consider performance when
> tracing is off as critical, I'm more concerned about that. But both must
> be fixed, because not reporting traces can confuse a developer.

Ah, got it, thanks.

I was going to send the simple patch we discussed, but suddenly I
realized that I have another question.

Why do we want to filter out the kernel threads in syscall_regfunc?

>From cc3b13c1 "tracing: Don't trace kernel thread syscalls"

	 then it has no effect to trace the kernel thread calls
	 to syscalls in that path.
	 Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these.

OK, but then it doesn't hurt? Or is there another reason why
TIF_SYSCALL_TRACEPOINT is not desirable on kthread?

The problem is ____call_usermodehelper() which execs the user-space
task. This clears PF_KTHREAD (sets ->mm), but obviously if
sys_tracepoint_refcount != 0 this is too late.

So what do you think we should do,

	- keep this check

	- remove it

	- remove it in a separate patch

	- add the "sync with sys_tracepoint_refcount" hook
	  before kernel_execve()

?

Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT
  2012-03-31 20:45       ` Oleg Nesterov
@ 2012-03-31 21:37         ` Steven Rostedt
  2012-04-01 21:37           ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2012-03-31 21:37 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Ingo Molnar, Jason Baron, linux-kernel

On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote:
> On 03/30, Steven Rostedt wrote:
> >
> > On Fri, 2012-03-30 at 22:15 +0200, Oleg Nesterov wrote:
> >
> > > But I don't really understand why do you think that "clear" is more
> > > important.
> >
> > They are both important. But as I tend to consider performance when
> > tracing is off as critical, I'm more concerned about that. But both must
> > be fixed, because not reporting traces can confuse a developer.
> 
> Ah, got it, thanks.
> 
> I was going to send the simple patch we discussed, but suddenly I
> realized that I have another question.
> 
> Why do we want to filter out the kernel threads in syscall_regfunc?
> 
> >From cc3b13c1 "tracing: Don't trace kernel thread syscalls"
> 
> 	 then it has no effect to trace the kernel thread calls
> 	 to syscalls in that path.
> 	 Setting the TIF_SYSCALL_TRACEPOINT flag is then useless for these.
> 
> OK, but then it doesn't hurt? Or is there another reason why
> TIF_SYSCALL_TRACEPOINT is not desirable on kthread?

Right, it doesn't hurt. I was about to say that in a previous email.

> 
> The problem is ____call_usermodehelper() which execs the user-space
> task. This clears PF_KTHREAD (sets ->mm), but obviously if
> sys_tracepoint_refcount != 0 this is too late.
> 
> So what do you think we should do,
> 
> 	- keep this check
> 
> 	- remove it
> 
> 	- remove it in a separate patch

I say this one (remove it in a separate patch). That way if something
breaks we know exactly what did it ;-)

> 
> 	- add the "sync with sys_tracepoint_refcount" hook
> 	  before kernel_execve()
> 
> ?

-- Steve



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT)
  2012-03-31 21:37         ` Steven Rostedt
@ 2012-04-01 21:37           ` Oleg Nesterov
  2012-04-01 21:38             ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
                               ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-01 21:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
	Frederic Weisbecker

On 03/31, Steven Rostedt wrote:
>
> On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote:
> >
> > So what do you think we should do,
> >
> > 	- keep this check
> >
> > 	- remove it
> >
> > 	- remove it in a separate patch
>
> I say this one (remove it in a separate patch). That way if something
> breaks we know exactly what did it ;-)

OK, agreed.

Don't really know how can I test this... but the kernel didn't
crash after I enabled the syscall tracer ;)

Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2012-04-01 21:37           ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
@ 2012-04-01 21:38             ` Oleg Nesterov
  2012-04-01 21:38             ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
  2012-04-20 21:26             ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
  2 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-01 21:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
	Frederic Weisbecker

syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
init_task.tasks list yet.

Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.

While at it,

	- remove _irqsafe from syscall_regfunc/syscall_unregfunc,
	  read_lock(tasklist) doesn't need to disable irqs.

	- change syscall_unregfunc() to check PF_KTHREAD to skip
	  the kernel threads, ->mm != NULL is the common mistake.

	  Note: probably this check should be simply removed, needs
	  another patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/fork.c       |    7 +++++++
 kernel/tracepoint.c |   12 +++++-------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 34d7ed1..772d4a5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1446,7 +1446,14 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
+#ifdef CONFIG_TRACEPOINTS
+	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
+		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+	else
+		clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+#endif
 	write_unlock_irq(&tasklist_lock);
+
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	if (clone_flags & CLONE_THREAD)
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index f1539de..e2a4523 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -734,33 +734,31 @@ static int sys_tracepoint_refcount;
 
 void syscall_regfunc(void)
 {
-	unsigned long flags;
 	struct task_struct *g, *t;
 
 	if (!sys_tracepoint_refcount) {
-		read_lock_irqsave(&tasklist_lock, flags);
+		read_lock(&tasklist_lock);
 		do_each_thread(g, t) {
 			/* Skip kernel threads. */
-			if (t->mm)
+			if (!(t->flags & PF_KTHREAD))
 				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		} while_each_thread(g, t);
-		read_unlock_irqrestore(&tasklist_lock, flags);
+		read_unlock(&tasklist_lock);
 	}
 	sys_tracepoint_refcount++;
 }
 
 void syscall_unregfunc(void)
 {
-	unsigned long flags;
 	struct task_struct *g, *t;
 
 	sys_tracepoint_refcount--;
 	if (!sys_tracepoint_refcount) {
-		read_lock_irqsave(&tasklist_lock, flags);
+		read_lock(&tasklist_lock);
 		do_each_thread(g, t) {
 			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		} while_each_thread(g, t);
-		read_unlock_irqrestore(&tasklist_lock, flags);
+		read_unlock(&tasklist_lock);
 	}
 }
 #endif
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2012-04-01 21:37           ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
  2012-04-01 21:38             ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
@ 2012-04-01 21:38             ` Oleg Nesterov
  2012-04-20 21:26             ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
  2 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-01 21:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
	Frederic Weisbecker

syscall_regfunc() ignores the kernel thread because "it has
no effect", see cc3b13c1 "Don't trace kernel thread syscalls".

However, this means that a user-space task spawned by
call_usermodehelper() won't report the system calls if
kernel_execve() is called when sys_tracepoint_refcount != 0.

Remove this check. Hopefully the unnecessary report from
ret_from_fork path mentioned by cc3b13c1 is fine. In fact
"this is the only case" is not true. Say, kernel_execve()
itself does "int 80" on X86_32. Hopefully fine too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/tracepoint.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index e2a4523..2403e60 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -739,9 +739,7 @@ void syscall_regfunc(void)
 	if (!sys_tracepoint_refcount) {
 		read_lock(&tasklist_lock);
 		do_each_thread(g, t) {
-			/* Skip kernel threads. */
-			if (!(t->flags & PF_KTHREAD))
-				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
+			set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		} while_each_thread(g, t);
 		read_unlock(&tasklist_lock);
 	}
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT)
  2012-04-01 21:37           ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
  2012-04-01 21:38             ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
  2012-04-01 21:38             ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
@ 2012-04-20 21:26             ` Oleg Nesterov
  2 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2012-04-20 21:26 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Jason Baron, linux-kernel, Hendrik Brueckner,
	Frederic Weisbecker

Should I resend this or we do not really care?

The problem is minor, but both patches look like the simple
and obvious bugfix to me.

On 04/01, Oleg Nesterov wrote:
>
> On 03/31, Steven Rostedt wrote:
> >
> > On Sat, 2012-03-31 at 22:45 +0200, Oleg Nesterov wrote:
> > >
> > > So what do you think we should do,
> > >
> > > 	- keep this check
> > >
> > > 	- remove it
> > >
> > > 	- remove it in a separate patch
> >
> > I say this one (remove it in a separate patch). That way if something
> > breaks we know exactly what did it ;-)
>
> OK, agreed.
>
> Don't really know how can I test this... but the kernel didn't
> crash after I enabled the syscall tracer ;)
>
> Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-04-20 21:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-30 18:31 syscall_regfunc() && TIF_SYSCALL_TRACEPOINT Oleg Nesterov
2012-03-30 19:02 ` Steven Rostedt
2012-03-30 20:15   ` Oleg Nesterov
2012-03-31  0:13     ` Steven Rostedt
2012-03-31 20:45       ` Oleg Nesterov
2012-03-31 21:37         ` Steven Rostedt
2012-04-01 21:37           ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov
2012-04-01 21:38             ` [PATCH 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
2012-04-01 21:38             ` [PATCH 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
2012-04-20 21:26             ` [PATCH 0/2] (Was: syscall_regfunc() && TIF_SYSCALL_TRACEPOINT) Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.