linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] tracing: make signal tracepoints more useful
@ 2012-01-10 17:45 Oleg Nesterov
  2012-01-10 17:45 ` [PATCH 1/2] tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info Oleg Nesterov
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-10 17:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Masami Hiramatsu, Seiji Aguchi, Steven Rostedt,
	linux-kernel

Hello.

Linus, I am asking you to review (and hopefully apply) these
changes, I do not know who else can do this. I do not mean the
implementation, the patches are simple. Just the behavioural
change.

2/2 looks like a bugfix to me, but 1/2 changes the output from
trace_signal_generate() and removes trace_signal_overflow_fail.
In essence the change is:

	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",

where
	- grp=0/1 means private or shared

	- res is enum {
			TRACE_SIGNAL_DELIVERED,
			TRACE_SIGNAL_IGNORED,
			TRACE_SIGNAL_ALREADY_PENDING,
			TRACE_SIGNAL_OVERFLOW_FAIL,
			TRACE_SIGNAL_LOSE_INFO,
		};

Obviously this is the user visible change. But personally I do
agree with Seiji who requested this feature. Currently
trace_signal_generate() just records the fact that __send_signal()
was called, you can't know whether the signal was actually sent
or not.

 include/trace/events/signal.h |   85 +++++++++++------------------------------
 kernel/signal.c               |   28 +++++++++----
 2 files changed, 41 insertions(+), 72 deletions(-)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info
  2012-01-10 17:45 [PATCH 0/2] tracing: make signal tracepoints more useful Oleg Nesterov
@ 2012-01-10 17:45 ` Oleg Nesterov
  2012-01-10 17:45 ` [PATCH 2/2] tracing: send_sigqueue() needs trace_signal_generate() too Oleg Nesterov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-10 17:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Masami Hiramatsu, Seiji Aguchi, Steven Rostedt,
	linux-kernel

__send_signal()->trace_signal_generate() doesn't report enough info.
The users want to know was the signal actually delivered or not, and
they also need the shared/private info.

The patch moves trace_signal_generate() at the end of __send_signal()
and adds the 2 additional arguments.

This also allows us to kill trace_signal_overflow_fail/lose_info, we
can simply add the appropriate TRACE_SIGNAL_ "result" codes.

Reported-by: Seiji Aguchi <saguchi@redhat.com>
Reviewed-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/events/signal.h |   85 +++++++++++------------------------------
 kernel/signal.c               |   22 +++++++----
 2 files changed, 36 insertions(+), 71 deletions(-)

diff --git a/include/trace/events/signal.h b/include/trace/events/signal.h
index 17df434..39a8a43 100644
--- a/include/trace/events/signal.h
+++ b/include/trace/events/signal.h
@@ -23,11 +23,23 @@
 		}						\
 	} while (0)
 
+#ifndef TRACE_HEADER_MULTI_READ
+enum {
+	TRACE_SIGNAL_DELIVERED,
+	TRACE_SIGNAL_IGNORED,
+	TRACE_SIGNAL_ALREADY_PENDING,
+	TRACE_SIGNAL_OVERFLOW_FAIL,
+	TRACE_SIGNAL_LOSE_INFO,
+};
+#endif
+
 /**
  * signal_generate - called when a signal is generated
  * @sig: signal number
  * @info: pointer to struct siginfo
  * @task: pointer to struct task_struct
+ * @group: shared or private
+ * @result: TRACE_SIGNAL_*
  *
  * Current process sends a 'sig' signal to 'task' process with
  * 'info' siginfo. If 'info' is SEND_SIG_NOINFO or SEND_SIG_PRIV,
@@ -37,9 +49,10 @@
  */
 TRACE_EVENT(signal_generate,
 
-	TP_PROTO(int sig, struct siginfo *info, struct task_struct *task),
+	TP_PROTO(int sig, struct siginfo *info, struct task_struct *task,
+			int group, int result),
 
-	TP_ARGS(sig, info, task),
+	TP_ARGS(sig, info, task, group, result),
 
 	TP_STRUCT__entry(
 		__field(	int,	sig			)
@@ -47,6 +60,8 @@ TRACE_EVENT(signal_generate,
 		__field(	int,	code			)
 		__array(	char,	comm,	TASK_COMM_LEN	)
 		__field(	pid_t,	pid			)
+		__field(	int,	group			)
+		__field(	int,	result			)
 	),
 
 	TP_fast_assign(
@@ -54,11 +69,14 @@ TRACE_EVENT(signal_generate,
 		TP_STORE_SIGINFO(__entry, info);
 		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
 		__entry->pid	= task->pid;
+		__entry->group	= group;
+		__entry->result	= result;
 	),
 
-	TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
+	TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
 		  __entry->sig, __entry->errno, __entry->code,
-		  __entry->comm, __entry->pid)
+		  __entry->comm, __entry->pid, __entry->group,
+		  __entry->result)
 );
 
 /**
@@ -101,65 +119,6 @@ TRACE_EVENT(signal_deliver,
 		  __entry->sa_handler, __entry->sa_flags)
 );
 
-DECLARE_EVENT_CLASS(signal_queue_overflow,
-
-	TP_PROTO(int sig, int group, struct siginfo *info),
-
-	TP_ARGS(sig, group, info),
-
-	TP_STRUCT__entry(
-		__field(	int,	sig	)
-		__field(	int,	group	)
-		__field(	int,	errno	)
-		__field(	int,	code	)
-	),
-
-	TP_fast_assign(
-		__entry->sig	= sig;
-		__entry->group	= group;
-		TP_STORE_SIGINFO(__entry, info);
-	),
-
-	TP_printk("sig=%d group=%d errno=%d code=%d",
-		  __entry->sig, __entry->group, __entry->errno, __entry->code)
-);
-
-/**
- * signal_overflow_fail - called when signal queue is overflow
- * @sig: signal number
- * @group: signal to process group or not (bool)
- * @info: pointer to struct siginfo
- *
- * Kernel fails to generate 'sig' signal with 'info' siginfo, because
- * siginfo queue is overflow, and the signal is dropped.
- * 'group' is not 0 if the signal will be sent to a process group.
- * 'sig' is always one of RT signals.
- */
-DEFINE_EVENT(signal_queue_overflow, signal_overflow_fail,
-
-	TP_PROTO(int sig, int group, struct siginfo *info),
-
-	TP_ARGS(sig, group, info)
-);
-
-/**
- * signal_lose_info - called when siginfo is lost
- * @sig: signal number
- * @group: signal to process group or not (bool)
- * @info: pointer to struct siginfo
- *
- * Kernel generates 'sig' signal but loses 'info' siginfo, because siginfo
- * queue is overflow.
- * 'group' is not 0 if the signal will be sent to a process group.
- * 'sig' is always one of non-RT signals.
- */
-DEFINE_EVENT(signal_queue_overflow, signal_lose_info,
-
-	TP_PROTO(int sig, int group, struct siginfo *info),
-
-	TP_ARGS(sig, group, info)
-);
-
 #endif /* _TRACE_SIGNAL_H */
 
 /* This part must be outside protection */
diff --git a/kernel/signal.c b/kernel/signal.c
index bb0efa5..4515114 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1025,13 +1025,13 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	struct sigpending *pending;
 	struct sigqueue *q;
 	int override_rlimit;
-
-	trace_signal_generate(sig, info, t);
+	int ret = 0, result;
 
 	assert_spin_locked(&t->sighand->siglock);
 
+	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, from_ancestor_ns))
-		return 0;
+		goto ret;
 
 	pending = group ? &t->signal->shared_pending : &t->pending;
 	/*
@@ -1039,8 +1039,11 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 	 * exactly one non-rt signal, so that we can get more
 	 * detailed information about the cause of the signal.
 	 */
+	result = TRACE_SIGNAL_ALREADY_PENDING;
 	if (legacy_queue(pending, sig))
-		return 0;
+		goto ret;
+
+	result = TRACE_SIGNAL_DELIVERED;
 	/*
 	 * fast-pathed signals for kernel-internal things like SIGSTOP
 	 * or SIGKILL.
@@ -1095,14 +1098,15 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			 * signal was rt and sent by user using something
 			 * other than kill().
 			 */
-			trace_signal_overflow_fail(sig, group, info);
-			return -EAGAIN;
+			result = TRACE_SIGNAL_OVERFLOW_FAIL;
+			ret = -EAGAIN;
+			goto ret;
 		} else {
 			/*
 			 * This is a silent loss of information.  We still
 			 * send the signal, but the *info bits are lost.
 			 */
-			trace_signal_lose_info(sig, group, info);
+			result = TRACE_SIGNAL_LOSE_INFO;
 		}
 	}
 
@@ -1110,7 +1114,9 @@ out_set:
 	signalfd_notify(t, sig);
 	sigaddset(&pending->signal, sig);
 	complete_signal(sig, t, group);
-	return 0;
+ret:
+	trace_signal_generate(sig, info, t, group, result);
+	return ret;
 }
 
 static int send_signal(int sig, struct siginfo *info, struct task_struct *t,
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] tracing: send_sigqueue() needs trace_signal_generate() too
  2012-01-10 17:45 [PATCH 0/2] tracing: make signal tracepoints more useful Oleg Nesterov
  2012-01-10 17:45 ` [PATCH 1/2] tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info Oleg Nesterov
@ 2012-01-10 17:45 ` Oleg Nesterov
  2012-01-10 18:59 ` [PATCH 0/2] tracing: make signal tracepoints more useful Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-10 17:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Masami Hiramatsu, Seiji Aguchi, Steven Rostedt,
	linux-kernel

Add trace_signal_generate() into send_sigqueue().

send_sigqueue() is very similar to __send_signal(), just it uses
the preallocated info. It should do the same wrt tracing.

Reported-by: Seiji Aguchi <saguchi@redhat.com>
Reviewed-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/signal.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4515114..a6ff776 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1559,7 +1559,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 	int sig = q->info.si_signo;
 	struct sigpending *pending;
 	unsigned long flags;
-	int ret;
+	int ret, result;
 
 	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
 
@@ -1568,6 +1568,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 		goto ret;
 
 	ret = 1; /* the signal is ignored */
+	result = TRACE_SIGNAL_IGNORED;
 	if (!prepare_signal(sig, t, 0))
 		goto out;
 
@@ -1579,6 +1580,7 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 		 */
 		BUG_ON(q->info.si_code != SI_TIMER);
 		q->info.si_overrun++;
+		result = TRACE_SIGNAL_ALREADY_PENDING;
 		goto out;
 	}
 	q->info.si_overrun = 0;
@@ -1588,7 +1590,9 @@ int send_sigqueue(struct sigqueue *q, struct task_struct *t, int group)
 	list_add_tail(&q->list, &pending->list);
 	sigaddset(&pending->signal, sig);
 	complete_signal(sig, t, group);
+	result = TRACE_SIGNAL_DELIVERED;
 out:
+	trace_signal_generate(sig, &q->info, t, group, result);
 	unlock_task_sighand(t, &flags);
 ret:
 	return ret;
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/2] tracing: make signal tracepoints more useful
  2012-01-10 17:45 [PATCH 0/2] tracing: make signal tracepoints more useful Oleg Nesterov
  2012-01-10 17:45 ` [PATCH 1/2] tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info Oleg Nesterov
  2012-01-10 17:45 ` [PATCH 2/2] tracing: send_sigqueue() needs trace_signal_generate() too Oleg Nesterov
@ 2012-01-10 18:59 ` Steven Rostedt
  2012-01-10 21:09 ` Seiji Aguchi
  2012-01-13 18:20 ` [GIT PULL] " Oleg Nesterov
  4 siblings, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2012-01-10 18:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Ingo Molnar, Masami Hiramatsu, Seiji Aguchi,
	linux-kernel

On Tue, 2012-01-10 at 18:45 +0100, Oleg Nesterov wrote:

> Obviously this is the user visible change. But personally I do
> agree with Seiji who requested this feature. Currently
> trace_signal_generate() just records the fact that __send_signal()
> was called, you can't know whether the signal was actually sent
> or not.

Adding more to a tracepoint is never an issue, even without a library to
parse the data correctly (which we still need in the distros). Thus this
change should have no issues.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 0/2] tracing: make signal tracepoints more useful
  2012-01-10 17:45 [PATCH 0/2] tracing: make signal tracepoints more useful Oleg Nesterov
                   ` (2 preceding siblings ...)
  2012-01-10 18:59 ` [PATCH 0/2] tracing: make signal tracepoints more useful Steven Rostedt
@ 2012-01-10 21:09 ` Seiji Aguchi
  2012-01-13 18:20 ` [GIT PULL] " Oleg Nesterov
  4 siblings, 0 replies; 27+ messages in thread
From: Seiji Aguchi @ 2012-01-10 21:09 UTC (permalink / raw)
  To: Oleg Nesterov, Linus Torvalds
  Cc: Ingo Molnar, Masami Hiramatsu, Seiji Aguchi, Steven Rostedt,
	linux-kernel


>Obviously this is the user visible change. But personally I do
>agree with Seiji who requested this feature. Currently
>trace_signal_generate() just records the fact that __send_signal()
>was called, you can't know whether the signal was actually sent
>or not.

We can know behaviors of signal event in detail with this patchset.

Thanks, Oleg.

Seiji




^ permalink raw reply	[flat|nested] 27+ messages in thread

* [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-10 17:45 [PATCH 0/2] tracing: make signal tracepoints more useful Oleg Nesterov
                   ` (3 preceding siblings ...)
  2012-01-10 21:09 ` Seiji Aguchi
@ 2012-01-13 18:20 ` Oleg Nesterov
  2012-01-15 18:24   ` Oleg Nesterov
  2012-01-26 10:10   ` Ingo Molnar
  4 siblings, 2 replies; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-13 18:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Masami Hiramatsu, Seiji Aguchi, Steven Rostedt,
	linux-kernel

Hello,

Please pull from

	git://github.com/utrace/linux sigtrace


Another (4th) attempt to push these simple changes, now in the form
of a pull request (yes, github, I still can't restore my korg account).

2/2 looks like a bugfix to me, but 1/2 changes the output from
trace_signal_generate() and removes trace_signal_overflow_fail.
In essence the change is:

	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",

where
	- grp=0/1 means private or shared

	- res is enum {
			TRACE_SIGNAL_DELIVERED,
			TRACE_SIGNAL_IGNORED,
			TRACE_SIGNAL_ALREADY_PENDING,
			TRACE_SIGNAL_OVERFLOW_FAIL,
			TRACE_SIGNAL_LOSE_INFO,
		};

Obviously this is the user visible change. But personally I do
agree with Seiji who requested this feature. Currently
trace_signal_generate() just records the fact that __send_signal()
was called, you can't know whether the signal was actually sent
or not.

Steven Rostedt says:

	Adding more to a tracepoint is never an issue, even without a library to
	parse the data correctly (which we still need in the distros). Thus this
	change should have no issues.



Oleg Nesterov (2):
      tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info
      tracing: send_sigqueue() needs trace_signal_generate() too

 include/trace/events/signal.h |   85 +++++++++++------------------------------
 kernel/signal.c               |   28 +++++++++----
 2 files changed, 41 insertions(+), 72 deletions(-)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-13 18:20 ` [GIT PULL] " Oleg Nesterov
@ 2012-01-15 18:24   ` Oleg Nesterov
  2012-01-16  7:45     ` Ingo Molnar
  2012-01-26 10:10   ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-15 18:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Masami Hiramatsu, Seiji Aguchi, Steven Rostedt,
	linux-kernel

On 01/13, Oleg Nesterov wrote:
>
> Hello,

ping ;)

> Please pull from
>
> 	git://github.com/utrace/linux sigtrace
>
>
> Another (4th) attempt to push these simple changes, now in the form
> of a pull request (yes, github, I still can't restore my korg account).
>
> 2/2 looks like a bugfix to me, but 1/2 changes the output from
> trace_signal_generate() and removes trace_signal_overflow_fail.
> In essence the change is:
>
> 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
>
> where
> 	- grp=0/1 means private or shared
>
> 	- res is enum {
> 			TRACE_SIGNAL_DELIVERED,
> 			TRACE_SIGNAL_IGNORED,
> 			TRACE_SIGNAL_ALREADY_PENDING,
> 			TRACE_SIGNAL_OVERFLOW_FAIL,
> 			TRACE_SIGNAL_LOSE_INFO,
> 		};
>
> Obviously this is the user visible change. But personally I do
> agree with Seiji who requested this feature. Currently
> trace_signal_generate() just records the fact that __send_signal()
> was called, you can't know whether the signal was actually sent
> or not.
>
> Steven Rostedt says:
>
> 	Adding more to a tracepoint is never an issue, even without a library to
> 	parse the data correctly (which we still need in the distros). Thus this
> 	change should have no issues.
>
>
>
> Oleg Nesterov (2):
>       tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info
>       tracing: send_sigqueue() needs trace_signal_generate() too
>
>  include/trace/events/signal.h |   85 +++++++++++------------------------------
>  kernel/signal.c               |   28 +++++++++----
>  2 files changed, 41 insertions(+), 72 deletions(-)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-15 18:24   ` Oleg Nesterov
@ 2012-01-16  7:45     ` Ingo Molnar
  2012-01-16 12:31       ` Steven Rostedt
  2012-01-16 15:03       ` Oleg Nesterov
  0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-16  7:45 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Ingo Molnar, Masami Hiramatsu, Seiji Aguchi,
	Steven Rostedt, linux-kernel, Masami Hiramatsu


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 01/13, Oleg Nesterov wrote:
> >
> > Hello,
> 
> ping ;)
> 
> > Please pull from
> >
> > 	git://github.com/utrace/linux sigtrace
> >
> >
> > Another (4th) attempt to push these simple changes, now in the form
> > of a pull request (yes, github, I still can't restore my korg account).
> >
> > 2/2 looks like a bugfix to me, but 1/2 changes the output from
> > trace_signal_generate() and removes trace_signal_overflow_fail.
> > In essence the change is:
> >
> > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
> >
> > where
> > 	- grp=0/1 means private or shared
> >
> > 	- res is enum {
> > 			TRACE_SIGNAL_DELIVERED,
> > 			TRACE_SIGNAL_IGNORED,
> > 			TRACE_SIGNAL_ALREADY_PENDING,
> > 			TRACE_SIGNAL_OVERFLOW_FAIL,
> > 			TRACE_SIGNAL_LOSE_INFO,
> > 		};
> >
> > Obviously this is the user visible change. But personally I do
> > agree with Seiji who requested this feature. Currently
> > trace_signal_generate() just records the fact that __send_signal()
> > was called, you can't know whether the signal was actually sent
> > or not.
> >
> > Steven Rostedt says:
> >
> > 	Adding more to a tracepoint is never an issue, even without a library to
> > 	parse the data correctly (which we still need in the distros). Thus this
> > 	change should have no issues.
> >
> >
> >
> > Oleg Nesterov (2):
> >       tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info
> >       tracing: send_sigqueue() needs trace_signal_generate() too
> >
> >  include/trace/events/signal.h |   85 +++++++++++------------------------------
> >  kernel/signal.c               |   28 +++++++++----
> >  2 files changed, 41 insertions(+), 72 deletions(-)

I've also Cc:-ed Masami-san who appears to have introduced most 
of this trace information.

Looks good to me at a first (quick) sight, except this bit 
which changes the ABI:

> > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",

That's not how we change tracepoints generally - we add a new 
one and eventually phase out the old one. Which apps/tools rely 
on the old tracepoint? If it's exactly zero apps then we might 
be able to change it, but this needs to be investigated.

Note, it might make sense to send these as two patches to lkml 
with me Cc:-ed to avoid any github trust issues, i can apply 
them and push them to Linus.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16  7:45     ` Ingo Molnar
@ 2012-01-16 12:31       ` Steven Rostedt
  2012-01-16 12:53         ` Ingo Molnar
  2012-01-16 15:03       ` Oleg Nesterov
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2012-01-16 12:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Mon, 2012-01-16 at 08:45 +0100, Ingo Molnar wrote:

> Looks good to me at a first (quick) sight, except this bit 
> which changes the ABI:
> 
> > > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
> 
> That's not how we change tracepoints generally - we add a new 
> one and eventually phase out the old one. Which apps/tools rely 
> on the old tracepoint? If it's exactly zero apps then we might 
> be able to change it, but this needs to be investigated.

But this tracepoint wasn't changed, it was added on to. There's a
difference. Any tool that uses this (including something like powertop)
should be able to handle it. It should be no different than adding
to /proc/stat. We don't create a new /proc file when adding to it. The
original structure is still intact here.

We really need to get a parsing library out to the public. That would
avoid all these issues as the TRACE_EVENT() was originally designed to.

-- Steve




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 12:31       ` Steven Rostedt
@ 2012-01-16 12:53         ` Ingo Molnar
  2012-01-16 15:10           ` Oleg Nesterov
  2012-01-16 15:52           ` Steven Rostedt
  0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-16 12:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2012-01-16 at 08:45 +0100, Ingo Molnar wrote:
> 
> > Looks good to me at a first (quick) sight, except this bit 
> > which changes the ABI:
> > 
> > > > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > > > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
> > 
> > That's not how we change tracepoints generally - we add a new 
> > one and eventually phase out the old one. Which apps/tools rely 
> > on the old tracepoint? If it's exactly zero apps then we might 
> > be able to change it, but this needs to be investigated.
> 
> But this tracepoint wasn't changed, it was added on to. 
> There's a difference. Any tool that uses this (including 
> something like powertop) should be able to handle it. [...]

That's mostly true in theory - the question is, is it true in 
practice?

Say if an app relies on the smaller data structure, it sure 
might get surprised by the kernel writing a wider record ...

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16  7:45     ` Ingo Molnar
  2012-01-16 12:31       ` Steven Rostedt
@ 2012-01-16 15:03       ` Oleg Nesterov
  2012-01-16 15:42         ` Seiji Aguchi
  2012-01-16 15:58         ` Steven Rostedt
  1 sibling, 2 replies; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-16 15:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Ingo Molnar, Masami Hiramatsu, Seiji Aguchi,
	Steven Rostedt, linux-kernel, Masami Hiramatsu

On 01/16, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > 2/2 looks like a bugfix to me, but 1/2 changes the output from
> > > trace_signal_generate() and removes trace_signal_overflow_fail.
> > > In essence the change is:
> > >
> > > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
> > >
>
> I've also Cc:-ed Masami-san who appears to have introduced most
> of this trace information.

Thanks... although he is already cc'ed, may be I used the wrong
email.

> Looks good to me at a first (quick) sight, except this bit
> which changes the ABI:
>
> > > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
>
> That's not how we change tracepoints generally - we add a new
> one and eventually phase out the old one.

Well, "vmscan/trace: Add 'file' info to trace_mm_vmscan_lru_isolate()"
ea4d349f adds the new field too. In fact this patch (floating in -mm)
plus the previous discussion convinced me we should go this way.

> Which apps/tools rely
> on the old tracepoint? If it's exactly zero apps then we might
> be able to change it, but this needs to be investigated.

If only I knew. How can we investigate this? Hopefully nothing
relies on the old tracepoint, but who knows.

OK. So we should add the new tracepoint. Looks a bit ugly, but
I understand your concerns.


Say, trace_send_signal(sig, info, t, group, result), OK?


Seiji, please double check this is all you need, it won't be
simply to change this tracepoint again. While we are adding the
new one, we can add/change something in TP_STRUCT__entry if you
think this is needed.

> Note, it might make sense to send these as two patches to lkml
> with me Cc:-ed to avoid any github trust issues, i can apply
> them and push them to Linus.

Ingo, I sent them 3 times and you were cc'ed ;)

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 12:53         ` Ingo Molnar
@ 2012-01-16 15:10           ` Oleg Nesterov
  2012-01-17 18:50             ` Linus Torvalds
  2012-01-16 15:52           ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-16 15:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On 01/16, Ingo Molnar wrote:
>
> * Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Mon, 2012-01-16 at 08:45 +0100, Ingo Molnar wrote:
> >
> > > Looks good to me at a first (quick) sight, except this bit
> > > which changes the ABI:
> > >
> > > > > 	-       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d",
> > > > > 	+       TP_printk("sig=%d errno=%d code=%d comm=%s pid=%d grp=%d res=%d",
> > >
> > > That's not how we change tracepoints generally - we add a new
> > > one and eventually phase out the old one. Which apps/tools rely
> > > on the old tracepoint? If it's exactly zero apps then we might
> > > be able to change it, but this needs to be investigated.
> >
> > But this tracepoint wasn't changed, it was added on to.
> > There's a difference. Any tool that uses this (including
> > something like powertop) should be able to handle it. [...]
>
> That's mostly true in theory - the question is, is it true in
> practice?
>
> Say if an app relies on the smaller data structure, it sure
> might get surprised by the kernel writing a wider record ...

OK, I am not arguing, I'll resend the patch which adds the new
tracepoint...

But do we really need to keep the old tracepoint? IOW, what if
we simply rename it and add more info?

I am looking at "git log include/trace/events/", for example
"mm-tracepoint: rename page-free events" b413d48a.

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 15:03       ` Oleg Nesterov
@ 2012-01-16 15:42         ` Seiji Aguchi
  2012-01-16 15:58         ` Steven Rostedt
  1 sibling, 0 replies; 27+ messages in thread
From: Seiji Aguchi @ 2012-01-16 15:42 UTC (permalink / raw)
  To: Oleg Nesterov, Ingo Molnar
  Cc: Linus Torvalds, Ingo Molnar, Masami Hiramatsu, Seiji Aguchi,
	Steven Rostedt, linux-kernel, Masami Hiramatsu

>OK. So we should add the new tracepoint. Looks a bit ugly, but
>I understand your concerns.
>
>
>Say, trace_send_signal(sig, info, t, group, result), OK?
>
>
>Seiji, please double check this is all you need, it won't be
>simply to change this tracepoint again. While we are adding the
>new one, we can add/change something in TP_STRUCT__entry if you
>think this is needed.

I agree to add the new tracepoint as well.
I will review updated patch if you send it.

Seiji

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 12:53         ` Ingo Molnar
  2012-01-16 15:10           ` Oleg Nesterov
@ 2012-01-16 15:52           ` Steven Rostedt
  2012-01-17 10:02             ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2012-01-16 15:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Mon, 2012-01-16 at 13:53 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:

> Say if an app relies on the smaller data structure, it sure 
> might get surprised by the kernel writing a wider record ...

Ingo,

The kernel does this all the time. We have syscalls that may extend the
data structure. This is a common practice. Any app that depends on a
data structure remaining the same size for no good reason is broken by
design.

Now we are making tracepoints stricter than system calls?

I'm starting to regret adding the whole TRACE_EVENT() interface.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 15:03       ` Oleg Nesterov
  2012-01-16 15:42         ` Seiji Aguchi
@ 2012-01-16 15:58         ` Steven Rostedt
  1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2012-01-16 15:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Mon, 2012-01-16 at 16:03 +0100, Oleg Nesterov wrote:

> > Which apps/tools rely
> > on the old tracepoint? If it's exactly zero apps then we might
> > be able to change it, but this needs to be investigated.
> 
> If only I knew. How can we investigate this? Hopefully nothing
> relies on the old tracepoint, but who knows.

I remember Linus saying that we break ABI all the time. It's when
someone complains about it that we revert the breakage.

The chances that an app is using this tracepoint is slim. The chances
that an app is using this tracepoint *and* requires the data structure
size to remain the same is miniscule.

> 
> OK. So we should add the new tracepoint. Looks a bit ugly, but
> I understand your concerns.

And it can bloat the kernel. Each tracepoint can add up to 5K in size.
I'm still working on fixing that.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 15:52           ` Steven Rostedt
@ 2012-01-17 10:02             ` Ingo Molnar
  2012-01-17 12:03               ` Steven Rostedt
  2012-01-17 19:52               ` Steven Rostedt
  0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-17 10:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Mon, 2012-01-16 at 13:53 +0100, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Say if an app relies on the smaller data structure, it sure 
> > might get surprised by the kernel writing a wider record ...
> 
> Ingo,
> 
> The kernel does this all the time. We have syscalls that may 
> extend the data structure. [...]

That is not true *AT ALL* in such an unqualified manner. Steve, 
stop being stupid.

The kernel syscall ABI may indeed sometimes expand *INPUT* 
structures (if via some mechanism it's possible to make sure 
that old ABI uses don't cause the kernel to read undefined 
data), but the trace events are *OUTPUT* structures.

The kernel ABI never ever expands output structures, unless the 
ABI itself is expanded in a safe way that makes it impossible 
for old apps to trigger the new output logic, i.e. which makes 
sure that all old apps work as well. We never ever write beyond 
an already existing 40 byte ABI structure and corrupt the 41th 
byte for an existing app!

In this case we simply don't know whether such an app exists. It 
might be OK to do it, or it might be not so ok.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 10:02             ` Ingo Molnar
@ 2012-01-17 12:03               ` Steven Rostedt
  2012-01-17 12:40                 ` Ingo Molnar
  2012-01-17 19:52               ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Steven Rostedt @ 2012-01-17 12:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Tue, 2012-01-17 at 11:02 +0100, Ingo Molnar wrote:

> That is not true *AT ALL* in such an unqualified manner. Steve, 
> stop being stupid.
> 
> The kernel syscall ABI may indeed sometimes expand *INPUT* 
> structures (if via some mechanism it's possible to make sure 
> that old ABI uses don't cause the kernel to read undefined 
> data), but the trace events are *OUTPUT* structures.

The difference between syscalls and tracepoints is that a tracepoint
always reports the size of the structure that was read, where a syscall
does not. So I do consider this similar to reading the /proc/stat file
as the user can see how much was read. The backwards compatibility
should be easy to write. Old tools should not break, because it wont be
reading the new fields, and new tools can determine which tracepoint is
there because it is trivial to see which version of the tracepoint is
there because of the size read.

But as I'm stupid, I'll shut up now.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 12:03               ` Steven Rostedt
@ 2012-01-17 12:40                 ` Ingo Molnar
  2012-01-17 14:04                   ` Oleg Nesterov
  2012-01-17 14:37                   ` Steven Rostedt
  0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-17 12:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu


* Steven Rostedt <rostedt@goodmis.org> wrote:

> On Tue, 2012-01-17 at 11:02 +0100, Ingo Molnar wrote:
> 
> > That is not true *AT ALL* in such an unqualified manner. Steve, 
> > stop being stupid.
> > 
> > The kernel syscall ABI may indeed sometimes expand *INPUT* 
> > structures (if via some mechanism it's possible to make sure 
> > that old ABI uses don't cause the kernel to read undefined 
> > data), but the trace events are *OUTPUT* structures.
> 
> The difference between syscalls and tracepoints is that a 
> tracepoint always reports the size of the structure that was 
> read, where a syscall does not. So I do consider this similar 
> to reading the /proc/stat file as the user can see how much 
> was read. The backwards compatibility should be easy to write. 
> Old tools should not break, because it wont be reading the new 
> fields, and new tools can determine which tracepoint is there 
> because it is trivial to see which version of the tracepoint 
> is there because of the size read.

Any tool that requests the signal trace event, and copies the 
full (and now larger) record it got in the ring-buffer, without 
expanding the target record's size accordingly will *BREAK*. 

I do not claim that tools will break in practice - i'm raising 
the *possibility* out of caution and i'm frustrated that you 
*STILL* don't understand how ABIs are maintained in Linux. 

You arguing about defined semantics is *MEANINGLESS*. What 
matters is what the apps do in practice. If the apps we know 
about do it robustly and adapt (or don't care) about the 
expansion, and if no-one reports a regression in tools we don't 
know about, then it's probably fine.

But your argument that expansion is somehow part of the ABI is 
patently false and misses the point. Seeing your arguments make 
me *very* nervous about applying any ABI affecting patch from 
you.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 12:40                 ` Ingo Molnar
@ 2012-01-17 14:04                   ` Oleg Nesterov
  2012-01-18 11:59                     ` Ingo Molnar
  2012-01-17 14:37                   ` Steven Rostedt
  1 sibling, 1 reply; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-17 14:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On 01/17, Ingo Molnar wrote:
>
> Any tool that requests the signal trace event, and copies the
> full (and now larger) record it got in the ring-buffer, without
> expanding the target record's size accordingly will *BREAK*.
>
> I do not claim that tools will break in practice - i'm raising
> the *possibility* out of caution and i'm frustrated that you
> *STILL* don't understand how ABIs are maintained in Linux.

OK, but what if we rename the tracepoint?

IOW, add the new tracepoint and remove the old one. Of course,
this can confuse the users of the current "signal_generate", but
this is safe. b413d48a does this...

Or this is not allowed too?

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 12:40                 ` Ingo Molnar
  2012-01-17 14:04                   ` Oleg Nesterov
@ 2012-01-17 14:37                   ` Steven Rostedt
  2012-01-17 14:55                     ` Oleg Nesterov
  2012-01-20 18:01                     ` Jason Baron
  1 sibling, 2 replies; 27+ messages in thread
From: Steven Rostedt @ 2012-01-17 14:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Tue, 2012-01-17 at 13:40 +0100, Ingo Molnar wrote:
> * Steven Rostedt <rostedt@goodmis.org> wrote:

> Any tool that requests the signal trace event, and copies the 
> full (and now larger) record it got in the ring-buffer, without 
> expanding the target record's size accordingly will *BREAK*. 

I'm curious to where it gets the size?

This is not like the kernel writing to a pointer in userspace memory,
where it can indeed break code by writing too much. This is the
userspace program writing from a shared memory location.


> 
> I do not claim that tools will break in practice - i'm raising 
> the *possibility* out of caution and i'm frustrated that you 
> *STILL* don't understand how ABIs are maintained in Linux. 

You are defending code that would do:

	size = read_size(ring_buffer_event);
	memcpy(data, buffer, size);

over code that would most likely do:

	memcpy(data, buffer, sizeof(*data));

???

According to this logic, we should never increase the size
of /proc/stat, because someone might do:

	i = 0;
	fd = open("/proc/stat", O_RDONLY);
	do {
		r = read(fd, buff+i, BUFSIZ);
		i += r;
	} while (r > 0);



> 
> You arguing about defined semantics is *MEANINGLESS*. What 
> matters is what the apps do in practice.

Exactly, to depend on the ring buffer size to do all copies to fixed
size data structures seems to be backwards to what would be done in
practice.


>  If the apps we know 
> about do it robustly and adapt (or don't care) about the 
> expansion, and if no-one reports a regression in tools we don't 
> know about, then it's probably fine.

It's not about robustness, it's about the easy way to copy.

	memcpy(data, buffer, sizeof(*data));

wont break.


> But your argument that expansion is somehow part of the ABI is 
> patently false and misses the point. Seeing your arguments make 
> me *very* nervous about applying any ABI affecting patch from 
> you.

Well you already think I'm stupid, I wont change the ABI anymore.
Obviously I know nothing, because I created a flexible interface that's
not used by anything except perf and trace-cmd, but because there's no
library, we are stuck with fixed tracepoints, which will come to haunt
us in the not so distant future.

This will bloat the kernel. Tracepoints are not free. They bloat the
kernel's text section. Every tracepoint still adds a bit of code in the
"unlikely" part inlined where they are called. So they do have an affect
on icache, as well as the code to process the tracepoint (around 5k per
tracepoint).

People are adding tracepoints all over the kernel without much thought.
We take much more care when we add a system call. By making tracepoints
have as strict a ABI as system calls will cause a maintenance nightmare
in the future. I guarantee that!


           compiled in my box       in include/trace/events  in rest of kernel
In v3.0:         250                        259                  330
In v3.1:         311                        349                  338
in v3.2:         335                        393                  351
latest Linus:    340                        401                  345

(Interesting that tracepoints have disappeared outside the events
directory)

Note, these do not include the syscall tracepoints.

There's already more tracepoints (compiled for my box) than syscalls.

Tracepoints are being added like the US deficit. We need to set some
rules somewhere. Either by making a library that can handle small
changes (like the one we are discussing, even though a memcpy should
cope), or we need to put a kabosh to adding new tracepoints like they
are the new fad app. Perhaps we should put the same requirements on new
tracepoints as we do with new syscalls.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 14:37                   ` Steven Rostedt
@ 2012-01-17 14:55                     ` Oleg Nesterov
  2012-01-20 18:01                     ` Jason Baron
  1 sibling, 0 replies; 27+ messages in thread
From: Oleg Nesterov @ 2012-01-17 14:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On 01/17, Steven Rostedt wrote:
>
> People are adding tracepoints all over the kernel without much thought.

Including the tracepoints in __send_signal(). They are stupid and limited.

Yes, I should blame myself in the first place. I was cc'ed, I participated
in the discussion when they were added.

> We take much more care when we add a system call. By making tracepoints
> have as strict a ABI as system calls will cause a maintenance nightmare
> in the future. I guarantee that!

Personally I agree.

That said, I am going to send the patch which adds the new tracepoint
unless Ingo changes his mind. The new info was requested by users who
actually use these events.

Oleg.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-16 15:10           ` Oleg Nesterov
@ 2012-01-17 18:50             ` Linus Torvalds
  2012-01-18  9:39               ` Ingo Molnar
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2012-01-17 18:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Steven Rostedt, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Mon, Jan 16, 2012 at 7:10 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> But do we really need to keep the old tracepoint? IOW, what if
> we simply rename it and add more info?

Quite frankly, unless somebody can point to something that breaks, I'd
rather just change the existing one.

Nobody outside of a few special cases uses tracepoints. *nobody*.  The
only apps I have ever seen that matters to anybody ends up being
latencytop and powertop. If those two have been tested and don't care,
I don't think we should care.

                       Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 10:02             ` Ingo Molnar
  2012-01-17 12:03               ` Steven Rostedt
@ 2012-01-17 19:52               ` Steven Rostedt
  1 sibling, 0 replies; 27+ messages in thread
From: Steven Rostedt @ 2012-01-17 19:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Tue, 2012-01-17 at 11:02 +0100, Ingo Molnar wrote:
>  
> > The kernel does this all the time. We have syscalls that may 
> > extend the data structure. [...]
> 
> That is not true *AT ALL* in such an unqualified manner. Steve, 
> stop being stupid.

OK, Ingo I think we had a little miscommunication here.

I forgot what I originally wrote, and you cut off too much in your
reply. What I originally said:


"The kernel does this all the time. We have syscalls that may extend the
data structure. This is a common practice. Any app that depends on a
data structure remaining the same size for no good reason is broken by
design."

If you took the "Any app that depends on a data structure remaining the
same size for no good reason is broken by design" was suppose to mean,
any app that depends on syscall data structures is broken by design, I
would agree, that statement is stupid. I didn't write that well, as my
thought process switched back to reading tracepoints and wasn't meant to
be about syscalls there. I didn't articulate that properly.

As you can tell, I was a little upset at being called stupid this
morning. If you took that statement to mean about syscalls, I understand
your calling me stupid, and I apologize for being a bit snippy. I like
to keep LKML discussions technical and not something for name calling.

-- Steve



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 18:50             ` Linus Torvalds
@ 2012-01-18  9:39               ` Ingo Molnar
  0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-18  9:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Steven Rostedt, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Jan 16, 2012 at 7:10 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > But do we really need to keep the old tracepoint? IOW, what 
> > if we simply rename it and add more info?
> 
> Quite frankly, unless somebody can point to something that 
> breaks, I'd rather just change the existing one.
> 
> Nobody outside of a few special cases uses tracepoints. 
> *nobody*.  The only apps I have ever seen that matters to 
> anybody ends up being latencytop and powertop. If those two 
> have been tested and don't care, I don't think we should care.

Correct. (There's also sysprof and perf - both should be fine.)

As i said in my very first mail:

> [...] Which apps/tools rely on the old tracepoint? If it's 
> exactly zero apps then we might be able to change it, but this 
> needs to be investigated.

I resisted Steve's "this ABI change is safe by design" notion 
which is somewhat of a disease. It is probably fine but not by 
definition.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 14:04                   ` Oleg Nesterov
@ 2012-01-18 11:59                     ` Ingo Molnar
  0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-18 11:59 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Linus Torvalds, Ingo Molnar, Masami Hiramatsu,
	Seiji Aguchi, linux-kernel, Masami Hiramatsu


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 01/17, Ingo Molnar wrote:
> >
> > Any tool that requests the signal trace event, and copies 
> > the full (and now larger) record it got in the ring-buffer, 
> > without expanding the target record's size accordingly will 
> > *BREAK*.
> >
> > I do not claim that tools will break in practice - i'm 
> > raising the *possibility* out of caution and i'm frustrated 
> > that you *STILL* don't understand how ABIs are maintained in 
> > Linux.
> 
> OK, but what if we rename the tracepoint?
> 
> IOW, add the new tracepoint and remove the old one. Of course, 
> this can confuse the users of the current "signal_generate", 
> but this is safe. b413d48a does this...
> 
> Or this is not allowed too?

Everything is allowed that makes sense and does not break apps, 
with a strong preference towards the simplest possible variant.

I.e. your patch.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-17 14:37                   ` Steven Rostedt
  2012-01-17 14:55                     ` Oleg Nesterov
@ 2012-01-20 18:01                     ` Jason Baron
  1 sibling, 0 replies; 27+ messages in thread
From: Jason Baron @ 2012-01-20 18:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ingo Molnar, Oleg Nesterov, Linus Torvalds, Ingo Molnar,
	Masami Hiramatsu, Seiji Aguchi, linux-kernel, Masami Hiramatsu

On Tue, Jan 17, 2012 at 09:37:49AM -0500, Steven Rostedt wrote:
> On Tue, 2012-01-17 at 13:40 +0100, Ingo Molnar wrote:
> > * Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Any tool that requests the signal trace event, and copies the 
> > full (and now larger) record it got in the ring-buffer, without 
> > expanding the target record's size accordingly will *BREAK*. 
> 
> I'm curious to where it gets the size?
> 
> This is not like the kernel writing to a pointer in userspace memory,
> where it can indeed break code by writing too much. This is the
> userspace program writing from a shared memory location.
> 
> 
> > 
> > I do not claim that tools will break in practice - i'm raising 
> > the *possibility* out of caution and i'm frustrated that you 
> > *STILL* don't understand how ABIs are maintained in Linux. 
> 
> You are defending code that would do:
> 
> 	size = read_size(ring_buffer_event);
> 	memcpy(data, buffer, size);
> 
> over code that would most likely do:
> 
> 	memcpy(data, buffer, sizeof(*data));
> 
> ???
> 
> According to this logic, we should never increase the size
> of /proc/stat, because someone might do:
> 
> 	i = 0;
> 	fd = open("/proc/stat", O_RDONLY);
> 	do {
> 		r = read(fd, buff+i, BUFSIZ);
> 		i += r;
> 	} while (r > 0);
> 
> 
> 
> > 
> > You arguing about defined semantics is *MEANINGLESS*. What 
> > matters is what the apps do in practice.
> 
> Exactly, to depend on the ring buffer size to do all copies to fixed
> size data structures seems to be backwards to what would be done in
> practice.
> 
> 
> >  If the apps we know 
> > about do it robustly and adapt (or don't care) about the 
> > expansion, and if no-one reports a regression in tools we don't 
> > know about, then it's probably fine.
> 
> It's not about robustness, it's about the easy way to copy.
> 
> 	memcpy(data, buffer, sizeof(*data));
> 
> wont break.
> 
> 
> > But your argument that expansion is somehow part of the ABI is 
> > patently false and misses the point. Seeing your arguments make 
> > me *very* nervous about applying any ABI affecting patch from 
> > you.
> 
> Well you already think I'm stupid, I wont change the ABI anymore.
> Obviously I know nothing, because I created a flexible interface that's
> not used by anything except perf and trace-cmd, but because there's no
> library, we are stuck with fixed tracepoints, which will come to haunt
> us in the not so distant future.
> 
> This will bloat the kernel. Tracepoints are not free. They bloat the
> kernel's text section. Every tracepoint still adds a bit of code in the
> "unlikely" part inlined where they are called. So they do have an affect
> on icache, as well as the code to process the tracepoint (around 5k per
> tracepoint).
> 

Right, with the jump label optimization, the 'unlikely' branch is
usually moved to the end of the function, with only a single no-op in
the hot-path. However, with gcc enhancement the unlikely label could
be labeled something like 'cold', and moved either further out-of-line.
Its a potential improvement for jump labels, that I need to look into.

Thanks,

-Jason


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] tracing: make signal tracepoints more useful
  2012-01-13 18:20 ` [GIT PULL] " Oleg Nesterov
  2012-01-15 18:24   ` Oleg Nesterov
@ 2012-01-26 10:10   ` Ingo Molnar
  1 sibling, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2012-01-26 10:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Linus Torvalds, Ingo Molnar, Masami Hiramatsu, Seiji Aguchi,
	Steven Rostedt, linux-kernel


* Oleg Nesterov <oleg@redhat.com> wrote:

> Hello,
> 
> Please pull from
> 
> 	git://github.com/utrace/linux sigtrace
...
> Oleg Nesterov (2):
>       tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info
>       tracing: send_sigqueue() needs trace_signal_generate() too
> 
>  include/trace/events/signal.h |   85 +++++++++++------------------------------
>  kernel/signal.c               |   28 +++++++++----
>  2 files changed, 41 insertions(+), 72 deletions(-)

Pulled into perf/core, thanks Oleg!

(Btw., you might want to set up a tree on korg, it's generally 
more trusted than github.com.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-01-26 10:10 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-10 17:45 [PATCH 0/2] tracing: make signal tracepoints more useful Oleg Nesterov
2012-01-10 17:45 ` [PATCH 1/2] tracing: let trace_signal_generate() report more info, kill overflow_fail/lose_info Oleg Nesterov
2012-01-10 17:45 ` [PATCH 2/2] tracing: send_sigqueue() needs trace_signal_generate() too Oleg Nesterov
2012-01-10 18:59 ` [PATCH 0/2] tracing: make signal tracepoints more useful Steven Rostedt
2012-01-10 21:09 ` Seiji Aguchi
2012-01-13 18:20 ` [GIT PULL] " Oleg Nesterov
2012-01-15 18:24   ` Oleg Nesterov
2012-01-16  7:45     ` Ingo Molnar
2012-01-16 12:31       ` Steven Rostedt
2012-01-16 12:53         ` Ingo Molnar
2012-01-16 15:10           ` Oleg Nesterov
2012-01-17 18:50             ` Linus Torvalds
2012-01-18  9:39               ` Ingo Molnar
2012-01-16 15:52           ` Steven Rostedt
2012-01-17 10:02             ` Ingo Molnar
2012-01-17 12:03               ` Steven Rostedt
2012-01-17 12:40                 ` Ingo Molnar
2012-01-17 14:04                   ` Oleg Nesterov
2012-01-18 11:59                     ` Ingo Molnar
2012-01-17 14:37                   ` Steven Rostedt
2012-01-17 14:55                     ` Oleg Nesterov
2012-01-20 18:01                     ` Jason Baron
2012-01-17 19:52               ` Steven Rostedt
2012-01-16 15:03       ` Oleg Nesterov
2012-01-16 15:42         ` Seiji Aguchi
2012-01-16 15:58         ` Steven Rostedt
2012-01-26 10:10   ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).