All of lore.kernel.org
 help / color / mirror / Atom feed
From: Heiko Carstens <heiko.carstens@de.ibm.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Hendrik Brueckner <brueckner@linux.vnet.ibm.com>,
	Jason Baron <jbaron@redhat.com>,
	linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, rostedt@goodmis.org, peterz@infradead.org,
	jiayingz@google.com, mbligh@google.com, lizf@cn.fujitsu.com
Subject: Re: [PATCH 08/12] add trace events for each syscall entry/exit
Date: Wed, 26 Aug 2009 09:38:20 +0200	[thread overview]
Message-ID: <20090826073819.GA4749@osiris.boeblingen.de.ibm.com> (raw)
In-Reply-To: <20090826000426.0332cbf4@skybase>

On Wed, Aug 26, 2009 at 12:04:26AM +0200, Martin Schwidefsky wrote:
> On Tue, 25 Aug 2009 14:31:19 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > The design proposal for this kthread behavior wrt syscalls is based on a
> > very specific and current kernel behavior, that may happen to change and
> > that I have actually seen proven incorrect. For instance, some
> > proprietary Linux driver does very odd things with system calls within
> > kernel threads, like invoking them with int 0x80.

That's broken.. some proprietary drivers even change the system call table.
Do you want to be able to deal with that as well?

> > Yes, this is odd, but do we really want to tie the tracer that much to
> > the actual OS implementation specificities ?
> > 
> > That sounds like a recipe for endless breakages and missing bits of
> > instrumentation.
> > 
> > So my advice would be: if we want to trace the syscall entry/exit paths,
> > let's trace them for the _whole_ system, and find ways to make it work
> > for corner-cases rather than finding clever ways to diminish
> > instrumentation coverage.
> 
> I guess that the real reason for the crash is hidden in the initialization
> of the pt_regs structure of the kernel thread.

On s390 the reason is that the scvnr in the pt_regs structure of the initial
kernel thread is initialized to 0. svcnr contains the system call number
and system call number 0 does not exist.
That's why we have

static inline long syscall_get_nr(struct task_struct *task,
				  struct pt_regs *regs)
{
	return regs->svcnr ? regs->svcnr : -1;
}

Now, if you fork a kernel thread from the initial task the pt_regs structure
gets copied. Upon ret_from_fork the trace exit path will get -1 for
syscall_get_nr().
 
> > Given the ret from fork example happens to be the first event fired
> > after the thread is created, we should be able to deal with this problem
> > by initializing the thread structure used by syscall exit tracing to an
> > initial "ret from fork" value.
> 
> That is my best guess as well.

What would that value be? __NR_fork?

Syscall tracing of kernel threads seems to be wrong. If somebody would do
a "modprobe" and the init function of the module would create a kernel thread
then syscall_get_nr() at the ret_from_fork path of the kernel thread would
return __NR_init_module. That is of course only true if the old kernel_thread()
API would be used. For kthread_create() it would return the syscall of the
thread from which the kthread daemon was forked (the initial process I would
guess, which was initialized to 0).

So skipping kernel threads at the exit path seems so be the best fix, IMHO ;)

---
 kernel/trace/trace_syscalls.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-next/kernel/trace/trace_syscalls.c
===================================================================
--- linux-next.orig/kernel/trace/trace_syscalls.c
+++ linux-next/kernel/trace/trace_syscalls.c
@@ -253,6 +253,8 @@ void ftrace_syscall_exit(struct pt_regs 
 	struct ring_buffer_event *event;
 	int syscall_nr;
 
+	if (!current->mm)
+		return;
 	syscall_nr = syscall_get_nr(current, regs);
 	if (!test_bit(syscall_nr, enabled_exit_syscalls))
 		return;

  reply	other threads:[~2009-08-26  7:38 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-10 20:52 [PATCH 00/12] add syscall tracepoints V3 Jason Baron
2009-08-10 20:52 ` [PATCH 01/12] map syscall name to number Jason Baron
2009-08-10 20:52 ` [PATCH 02/12] call arch_init_ftrace_syscalls at boot Jason Baron
2009-08-10 20:52 ` [PATCH 03/12] add DECLARE_TRACE_WITH_CALLBACK() macro Jason Baron
2009-08-10 20:52 ` [PATCH 04/12] add syscall tracepoints Jason Baron
2009-08-10 20:52 ` [PATCH 05/12] update FTRACE_SYSCALL_MAX Jason Baron
2009-08-11 11:00   ` Frederic Weisbecker
2009-08-11 19:39     ` Matt Fleming
2009-08-24 13:41     ` Paul Mundt
2009-08-24 14:06       ` Jason Baron
2009-08-24 14:15         ` Paul Mundt
2009-08-24 14:34           ` Frederic Weisbecker
2009-08-24 14:37             ` Paul Mundt
2009-08-24 14:42           ` Jason Baron
2009-08-24 14:50             ` Paul Mundt
2009-08-24 18:34               ` Ingo Molnar
2009-08-10 20:52 ` [PATCH 06/12] trace_event - raw_init bailout Jason Baron
2009-08-10 20:52 ` [PATCH 07/12] add ftrace_event_call void * 'data' field Jason Baron
2009-08-11 10:09   ` Frederic Weisbecker
2009-08-17 22:19     ` Steven Rostedt
2009-08-17 23:09       ` Frederic Weisbecker
2009-08-18  0:06         ` Steven Rostedt
2009-08-10 20:52 ` [PATCH 08/12] add trace events for each syscall entry/exit Jason Baron
2009-08-11 10:50   ` Frederic Weisbecker
2009-08-11 11:45     ` Ingo Molnar
2009-08-11 12:01       ` Frederic Weisbecker
2009-08-25 12:50   ` Hendrik Brueckner
2009-08-25 14:15     ` Frederic Weisbecker
2009-08-25 16:02       ` Hendrik Brueckner
2009-08-25 16:20         ` Mathieu Desnoyers
2009-08-25 16:59           ` Frederic Weisbecker
2009-08-25 17:31             ` Frederic Weisbecker
2009-08-25 18:31               ` Mathieu Desnoyers
2009-08-25 19:42                 ` Frederic Weisbecker
2009-08-25 19:51                   ` Mathieu Desnoyers
2009-08-26  0:19                     ` Frederic Weisbecker
2009-08-26  0:42                       ` Mathieu Desnoyers
2009-08-26  7:28                         ` Ingo Molnar
2009-08-26 17:11                           ` Mathieu Desnoyers
2009-08-26  6:48                   ` Peter Zijlstra
2009-08-25 22:04                 ` Martin Schwidefsky
2009-08-26  7:38                   ` Heiko Carstens [this message]
2009-08-26 12:32                     ` Frederic Weisbecker
2009-08-26  6:21                 ` Peter Zijlstra
2009-08-26 17:08                   ` Mathieu Desnoyers
2009-08-26 18:41                     ` Christoph Hellwig
2009-08-26 18:42                       ` Christoph Hellwig
2009-08-26 19:01                         ` Mathieu Desnoyers
2009-08-26  7:10                 ` Peter Zijlstra
2009-08-26 17:10                   ` Mathieu Desnoyers
2009-08-26 17:24                   ` H. Peter Anvin
2009-08-25 17:04           ` Jason Baron
2009-08-25 18:15             ` Mathieu Desnoyers
2009-08-26 12:35         ` Frederic Weisbecker
2009-08-26 12:59           ` Heiko Carstens
2009-08-26 13:30             ` Frederic Weisbecker
2009-08-26 13:48               ` Steven Rostedt
2009-08-26 13:53                 ` Frederic Weisbecker
2009-08-26 14:44                   ` Steven Rostedt
2009-08-26 13:56                 ` Peter Zijlstra
2009-08-26 14:41                   ` Steven Rostedt
2009-08-26 14:10               ` Heiko Carstens
2009-08-26 14:27                 ` Frederic Weisbecker
2009-08-26 14:43                   ` Steven Rostedt
2009-08-26 16:14                     ` Frederic Weisbecker
2009-08-26 14:43                 ` Steven Rostedt
2009-08-26 14:41           ` Hendrik Brueckner
2009-08-28 12:28         ` [tip:tracing/core] tracing: Don't trace kernel thread syscalls tip-bot for Hendrik Brueckner
2009-08-25 21:40     ` [PATCH 08/12] add trace events for each syscall entry/exit Frederic Weisbecker
2009-08-25 22:09       ` Frederic Weisbecker
2009-08-26  7:47         ` Heiko Carstens
2009-08-28 12:27     ` [tip:tracing/core] tracing: Check invalid syscall nr while tracing syscalls tip-bot for Hendrik Brueckner
2009-08-10 20:52 ` [PATCH 09/12] add support traceopint ids Jason Baron
2009-08-11 11:28   ` Frederic Weisbecker
2009-08-10 20:53 ` [PATCH 10/12] add perf counter support Jason Baron
2009-08-11 12:12   ` Frederic Weisbecker
2009-08-11 12:17     ` Ingo Molnar
2009-08-11 12:25       ` Frederic Weisbecker
2009-08-10 20:53 ` [PATCH 11/12] add more namespace area to 'perf list' output Jason Baron
2009-08-10 20:53 ` [PATCH 12/12] convert x86_64 mmap and uname to use DEFINE_SYSCALL Jason Baron
2009-08-25 12:31 ` [PATCH 00/12] add syscall tracepoints V3 - s390 arch update Hendrik Brueckner
2009-08-25 13:52   ` Frederic Weisbecker
2009-08-25 14:39     ` Heiko Carstens
2009-08-25 19:52       ` Frederic Weisbecker
2009-08-25 15:38     ` Hendrik Brueckner
2009-08-26 16:53   ` Frederic Weisbecker
2009-08-27  7:27     ` [PATCH]: tracing: s390 arch updates for tracing syscalls Hendrik Brueckner
2009-08-28 12:27   ` [tip:tracing/core] tracing: Add syscall tracepoints - s390 arch update tip-bot for Hendrik Brueckner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20090826073819.GA4749@osiris.boeblingen.de.ibm.com \
    --to=heiko.carstens@de.ibm.com \
    --cc=brueckner@linux.vnet.ibm.com \
    --cc=fweisbec@gmail.com \
    --cc=jbaron@redhat.com \
    --cc=jiayingz@google.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizf@cn.fujitsu.com \
    --cc=mathieu.desnoyers@polymtl.ca \
    --cc=mbligh@google.com \
    --cc=mingo@elte.hu \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=schwidefsky@de.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.