linux-kernel.vger.kernel.org archive mirror
* Status of tip/x86/apic
@ 2014-12-12 20:35 Thomas Gleixner
  2014-12-12 21:45 ` Borislav Petkov
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Thomas Gleixner @ 2014-12-12 20:35 UTC (permalink / raw)
  To: LKML
  Cc: Jiang Liu, x86, Linus Torvalds, Andrew Morton, Bjorn Helgaas,
	Tony Luck, Borislav Petkov, Joerg Roedel, Marc Zyngier,
	Steven Rostedt, Yinghai Lu, Alex Williamson

Folks,

after mulling this in my head for quite some time, I'm going to
postpone the whole thing for 3.20.

That said, I need to say, that I'm really happy with the outcome of
this massive overhaul. I really want to thank all involved people,
especially Jiang, for their great work and help so far!!!

The hierarchical irq domains really improve the code by disentangling
the various subsystems and the arm[64] use cases just prove that it
was the right decision.

We're almost there with x86 but my gut feeling tells me that pushing
it now is too risky. I'd rather have quiet holidays for all of us than
the nagging fear that the post holiday inbox will be full of obscure
bug reports and we then start a chase and bandaid race which will kill
the well earned recreation in an instant.

This will block other things in that area for a while, but it's the
only sane decision at the moment, unless Linus insists on pulling the
lot and promises to deal with the fallout. :)

The reasons why I decided to do so are:

 - The bugs we found in the last week. That tells me that there is
   some more stuff lurking.

 - The already existing mess in some areas which got unearthed by
   this work in the last week. That definitely needs a thorough
   cleanup and not some more bandaids.

 - Lack of proper debugging features. Sending out per issue debug
   patches simply does not scale.

 - It's not bisectable and unfortunately there are too many fixes to
   various places to make manual bisection feasible.

For 3.20 I want to proceed in the following way:

 - Apply all bug fixes to x86/apic

 - Address the issues with the resource management (and elsewhere)
   proper on top

 - Add a proper debugging mechanism (the existing irqdomain debugfs
   interface is completely useless).

   For the hierarchical domains we really want two things:

   1) A debugfs interface which lets us introspect the hierarchy.

      I was working on that before I got dragged into bug chasing and
      merge window frenzy.

      For proper introspection down to the hardware level this
      requires either domain/irq_chip specific callbacks or some
      unified way to track the current state. The latter is painful as
      it requires storing information redundantly.

      So having domain/chip callbacks to retrieve the state is the
      right solution. Most chip/domain implementations cache their
      [hardware] state already, so providing an accessor to convert
      that into a common data format is the best way. If the callback
      is not implemented then the information is not available or
      maybe not relevant.

      I'm not going to have a per domain/chip seqfile print function
      as this is just a complete waste. Pretty printing obscure
      hardware information does not help much for the general user. We'd
      rather have the raw data and proper post processing tools which
      can provide that pretty print information than bloating the
      kernel binary with randomized and possibly useless seq_print
      functions.

      Another reason why I want just raw binary data is that I want to
      use exactly the same mechanism for tracing. See below.

      After looking at the various new domain/chip implementations it's
      sufficient to have 16 bytes of storage space for this, but
      that's a minor detail.

      To provide a proper translation into pretty printed values we
      can do the following:

       Create a new section for storing such data and have a data
       structure there which describes the content of the buffer. That
       section goes into a separate file and is not linked into the
       kernel binary. Simple enough for tools to pick up and for bug
       reporters to use/provide. If the stupid file is not available
       we still can recreate it from source and translate the hex
       dump. And in most cases the pure hexdump will be sufficient
       for the people who actually need to look at this.

   2) Proper trace point support so we can actually track allocation
      and the hardware access at the various domain levels because
      some of these issues cannot be decoded by looking at a state
      snapshot in debugfs. With some of them we can't even access
      debugfs at all.

      Though one issue with that is, that for the early boot process
      there is no way to store that information as the tracer gets
      enabled way after init_IRQ(). But there is no reason why the
      tracer could not be enabled before that. All it needs is a
      working memory allocator. Steven?

      Now there is another class of problems which might be hard to
      debug. When the machine just boots into a hang, we don't get
      ftrace output from either an oops or a console. It would
      be nice if we could have a command line option which prints
      enabled trace points via (early_)printk. That would avoid
      sending out ad hoc printk debug patches which will basically
      provide the same information as the trace_points. That would be
      useful for other hard to debug boot hangs as well. Steven?
      
      I think the above can be solved, so we need to agree on a proper
      set of tracepoints. I came up with the following list:

      - trace_irqdomain_create(domain->id, domain->name, ...)
      - trace_irqdomain_destroy(domain->id)
      
      - trace_irqdomain_alloc(irq_data)

      	struct irq_data contains all relevant information for
      	assigning the tracepoint data.
	
		__entry->virq		= irq_data->virq;
		__entry->domainid	= irq_data->domain;
		__entry->hwirq		= irq_data->hwirq;
		TP_STORE_DATA(__entry->data, irq_data);

	Where TP_STORE_DATA checks for the above callback and uses it
	if available, otherwise we just clear the data field.
      
        So this reuses the callback which we want for debugfs
        anyway. The print format is just hexdump. See my above
        rationale for that.

      - trace_irqdomain_free(virq, domain->id)
	  
      - trace_irqdomain_hw_access(irqdata)

	Same "data" and pretty printing argument as for
	trace_irqdomain_alloc()

	The obvious place to put such a trace point is
	e.g. irq_chip_write_msi_msg() where the callback records the
	currently written msi msg.
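
To make the TP_STORE_DATA idea above concrete, here is a minimal
userspace sketch in C. It assumes a hypothetical per-chip accessor
named get_state and stand-in structures (sketch_irq_data,
sketch_irq_chip); none of these names exist in the kernel and the real
callback signature is not settled in this mail. It only shows the
fallback rule described above: use the callback if the chip provides
one, otherwise clear the 16-byte data field, and print the result as a
plain hexdump.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IRQ_STATE_BYTES 16	/* buffer size suggested in the mail */

struct sketch_irq_data;

/* Hypothetical stand-in for an irq_chip with a state accessor. */
struct sketch_irq_chip {
	/* Copies the chip's cached hardware state into buf (may be NULL). */
	void (*get_state)(const struct sketch_irq_data *d,
			  uint8_t buf[IRQ_STATE_BYTES]);
};

/* Hypothetical stand-in for struct irq_data; not the real layout. */
struct sketch_irq_data {
	unsigned int virq;
	unsigned long hwirq;
	const struct sketch_irq_chip *chip;
	uint32_t cached_msg[3];	/* e.g. a cached MSI message */
};

/* Fallback rule: callback if available, otherwise an all-zero buffer. */
static void sketch_store_data(uint8_t dst[IRQ_STATE_BYTES],
			      const struct sketch_irq_data *d)
{
	memset(dst, 0, IRQ_STATE_BYTES);
	if (d->chip && d->chip->get_state)
		d->chip->get_state(d, dst);
}

/* Example accessor: exports the cached message as raw bytes. */
static void sketch_msi_get_state(const struct sketch_irq_data *d,
				 uint8_t buf[IRQ_STATE_BYTES])
{
	assert(sizeof(d->cached_msg) <= IRQ_STATE_BYTES);
	memcpy(buf, d->cached_msg, sizeof(d->cached_msg));
}

/* Print format is just a hexdump: two hex digits per byte. */
static void sketch_hexdump(const uint8_t src[IRQ_STATE_BYTES],
			   char out[2 * IRQ_STATE_BYTES + 1])
{
	for (int i = 0; i < IRQ_STATE_BYTES; i++)
		snprintf(out + 2 * i, 3, "%02x", src[i]);
}
```

The same accessor could feed both a debugfs reader and a tracepoint,
which is the reuse argued for above; a chip without the callback simply
shows up as all zeroes.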

Once we have sorted that, I'll push x86/apic into a separate git
repository so the history is preserved.

After that I'll redo x86/apic from scratch with proper ordering and
all fixes folded to the right places so the whole thing becomes
bisectable.

Thoughts?

Thanks,

	Thomas


* Re: Status of tip/x86/apic
  2014-12-12 20:35 Status of tip/x86/apic Thomas Gleixner
@ 2014-12-12 21:45 ` Borislav Petkov
  2014-12-12 23:02 ` Linus Torvalds
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2014-12-12 21:45 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, x86, Linus Torvalds, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Joerg Roedel, Marc Zyngier,
	Steven Rostedt, Yinghai Lu, Alex Williamson

On Fri, Dec 12, 2014 at 09:35:14PM +0100, Thomas Gleixner wrote:
>       To provide a proper translation into pretty printed values we
>       can do the following:
> 
>        Create a new section for storing such data and have a data
>        structure there which describes the content of the buffer. That
>        section goes into a separate file and not linked into the
>        kernel binary. Simple enough for tools to pick up and for bug
>        reporters to use/provide. If the stupid file is not available
>        we still can recreate it from source and translate the hex
>        dump.

So maybe we don't need to add that section at all, but the script which
parses the buffer should simply recreate that file.
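
As an illustration of what such a post-processing tool might do, here
is a small userspace sketch in C. The descriptor format
(sketch_field_desc), the example MSI-like layout and the little-endian
byte order are all assumptions made for this example; the thread does
not define any of them. The sketch takes the raw 16-byte buffer plus a
table describing its fields and produces a pretty-printed line.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define IRQ_STATE_BYTES 16

/* Hypothetical descriptor: one named field inside the raw buffer. */
struct sketch_field_desc {
	const char *name;
	uint8_t offset;		/* byte offset into the buffer */
	uint8_t size;		/* field width in bytes, at most 4 */
};

/* Hypothetical layout for an MSI-like chip. */
static const struct sketch_field_desc sketch_msi_layout[] = {
	{ "address_lo", 0, 4 },
	{ "address_hi", 4, 4 },
	{ "data",       8, 4 },
};

/* Assemble a field value, assuming little-endian storage. */
static uint32_t sketch_read_field(const uint8_t *buf,
				  const struct sketch_field_desc *f)
{
	uint32_t v = 0;

	for (unsigned int i = 0; i < f->size; i++)
		v |= (uint32_t)buf[f->offset + i] << (8 * i);
	return v;
}

/*
 * Write "name=0x... name=0x..." into out.
 * Returns the string length, or -1 on a bad descriptor or short buffer.
 */
static int sketch_decode(const uint8_t buf[IRQ_STATE_BYTES],
			 const struct sketch_field_desc *layout, size_t n,
			 char *out, size_t outlen)
{
	size_t used = 0;

	if (outlen == 0)
		return -1;
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		const struct sketch_field_desc *f = &layout[i];
		int w;

		if (f->size > 4 || f->offset + f->size > IRQ_STATE_BYTES)
			return -1;
		w = snprintf(out + used, outlen - used, "%s%s=0x%" PRIx32,
			     i ? " " : "", f->name,
			     sketch_read_field(buf, f));
		if (w < 0 || (size_t)w >= outlen - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

Whether the descriptor table comes from a dedicated section or is
regenerated from source by the script, the decoding step stays the
same.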

> 	And in the most cases the pure hexdump will be sufficient
>        for the people who need actually to look at this.
> 
>    2) Proper trace point support so we can actually track allocation
>       and the hardware access at the various domain levels because
>       some of these issues cannot be decoded by looking at a state
>       snapshot in debugfs. With some of them we even can't access
>       debugfs at all.
> 
>       Though one issue with that is, that for the early boot process
>       there is no way to store that information as the tracer gets
>       enabled way after init_IRQ(). But there is no reason why the
>       tracer could not be enabled before that. All it needs is a
>       working memory allocator. Steven?
> 
>       Now there is another class of problems which might be hard to
>       debug. When the machine just boots into a hang, so we don't get a
>       ftrace output neither from an oops nor from a console. It would
>       be nice if we could have a command line option which prints
>       enabled trace points via (early_)printk. That would avoid
>       sending out ad hoc printk debug patches which will basically
>       provide the same information as the trace_points. That would be
>       useful for other hard to debug boot hangs as well. Steven?

Actually, I've been thinking about how such functionality would be
very useful for tracing other stuff too. Enable tracepoints on the
cmdline and then catch the output on the serial console. This would be a
very cool feature for general debugging too.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: Status of tip/x86/apic
  2014-12-12 20:35 Status of tip/x86/apic Thomas Gleixner
  2014-12-12 21:45 ` Borislav Petkov
@ 2014-12-12 23:02 ` Linus Torvalds
  2014-12-12 23:14 ` Steven Rostedt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Linus Torvalds @ 2014-12-12 23:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, the arch/x86 maintainers, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Borislav Petkov, Joerg Roedel,
	Marc Zyngier, Steven Rostedt, Yinghai Lu, Alex Williamson

On Fri, Dec 12, 2014 at 12:35 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> This will block other things in that area for a while, but it's the
> only sane decision at the moment, unless Linus insists on pulling the
> lot and promises to deal with the fallout. :)

Heh, no. I'll happily vote for a calm xmas season.

                    Linus


* Re: Status of tip/x86/apic
  2014-12-12 20:35 Status of tip/x86/apic Thomas Gleixner
  2014-12-12 21:45 ` Borislav Petkov
  2014-12-12 23:02 ` Linus Torvalds
@ 2014-12-12 23:14 ` Steven Rostedt
  2014-12-13  5:48   ` [RFC PATCH 0/2] tracing: The Grinch who stole the stealing of Christmas Steven Rostedt
  2014-12-14 10:57 ` Status of tip/x86/apic Jiang Liu
  2014-12-15 15:52 ` Steven Rostedt
  4 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2014-12-12 23:14 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, x86, Linus Torvalds, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Borislav Petkov, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson

On Fri, 12 Dec 2014 21:35:14 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:


> We're almost there with x86 but my gut feeling tells me that pushing
> it now is too risky. I rather prefer quiet holidays for all of us than
> the nagging fear that the post holiday inbox will be full of obscure
> bug reports and we then start a chase and bandaid race which will kill
> the well earned recreation in an instant.


> 
>       Though one issue with that is, that for the early boot process
>       there is no way to store that information as the tracer gets
>       enabled way after init_IRQ(). But there is no reason why the
>       tracer could not be enabled before that. All it needs is a
>       working memory allocator. Steven?
> 
>       Now there is another class of problems which might be hard to
>       debug. When the machine just boots into a hang, so we don't get a
>       ftrace output neither from an oops nor from a console. It would
>       be nice if we could have a command line option which prints
>       enabled trace points via (early_)printk. That would avoid
>       sending out ad hoc printk debug patches which will basically
>       provide the same information as the trace_points. That would be
>       useful for other hard to debug boot hangs as well. Steven?

Sure sure, everyone gets a nice calm xmas except for poor Steven who
has to hack on early tracepoints such that this will be ready for 3.20!

-- Steve (The Grinch who Hacked on Christmas)


* [RFC PATCH 0/2] tracing: The Grinch who stole the stealing of Christmas
  2014-12-12 23:14 ` Steven Rostedt
@ 2014-12-13  5:48   ` Steven Rostedt
  2014-12-13  5:49     ` [RFC PATCH 1/2] tracing: Move enabling tracepoints to just after mm_init() Steven Rostedt
  2014-12-13  5:50     ` [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2014-12-13  5:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, x86, Linus Torvalds, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Borislav Petkov, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson

On Fri, 12 Dec 2014 18:14:20 -0500
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Fri, 12 Dec 2014 21:35:14 +0100 (CET)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> 
> > We're almost there with x86 but my gut feeling tells me that pushing
> > it now is too risky. I rather prefer quiet holidays for all of us than
> > the nagging fear that the post holiday inbox will be full of obscure
> > bug reports and we then start a chase and bandaid race which will kill
> > the well earned recreation in an instant.
> 
> 
> > 
> >       Though one issue with that is, that for the early boot process
> >       there is no way to store that information as the tracer gets
> >       enabled way after init_IRQ(). But there is no reason why the
> >       tracer could not be enabled before that. All it needs is a
> >       working memory allocator. Steven?
> > 
> >       Now there is another class of problems which might be hard to
> >       debug. When the machine just boots into a hang, so we don't get a
> >       ftrace output neither from an oops nor from a console. It would
> >       be nice if we could have a command line option which prints
> >       enabled trace points via (early_)printk. That would avoid
> >       sending out ad hoc printk debug patches which will basically
> >       provide the same information as the trace_points. That would be
> >       useful for other hard to debug boot hangs as well. Steven?
> 
> Sure sure, everyone gets a nice calm xmas except for poor Steven who
> has to hack on early tracepoints such that this will be ready for 3.20!
> 
> -- Steve (The Grinch who Hacked on Christmas)

I guess I can enjoy my Holiday.

-- Steve


* [RFC PATCH 1/2] tracing: Move enabling tracepoints to just after mm_init()
  2014-12-13  5:48   ` [RFC PATCH 0/2] tracing: The Grinch who stole the stealing of Christmas Steven Rostedt
@ 2014-12-13  5:49     ` Steven Rostedt
  2014-12-13  5:50     ` [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline Steven Rostedt
  1 sibling, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2014-12-13  5:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, x86, Linus Torvalds, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Borislav Petkov, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson


Enabling tracepoints at boot up can be very useful. Tracepoints
can be initialized right after memory has been. There's no need to
wait for the early_initcall() to be called. That's too late for some
things that can use tracepoints for debugging. Move the logic to
enable tracepoints out of the initcalls and into init/main.c to
right after mm_init().

This also allows trace_printk() to be used early too.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 include/linux/ftrace.h        |  6 ++++++
 init/main.c                   |  3 +++
 kernel/trace/trace.c          |  8 +++++++-
 kernel/trace/trace.h          | 13 +++++++++++++
 kernel/trace/trace_events.c   | 10 ++++++++--
 kernel/trace/trace_syscalls.c |  3 +--
 6 files changed, 38 insertions(+), 5 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index ed501953f0b2..0fc3e720d4fd 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -39,6 +39,12 @@
 # define FTRACE_FORCE_LIST_FUNC 0
 #endif
 
+/* Main tracing buffer and events set up */
+#ifdef CONFIG_TRACING
+void trace_init(void);
+#else
+static inline void trace_init(void) { }
+#endif
 
 struct module;
 struct ftrace_hash;
diff --git a/init/main.c b/init/main.c
index 800a0daede7e..060e60b6aa59 100644
--- a/init/main.c
+++ b/init/main.c
@@ -561,6 +561,9 @@ asmlinkage __visible void __init start_kernel(void)
 	trap_init();
 	mm_init();
 
+	/* trace_printk() and trace points may be used after this */
+	trace_init();
+
 	/*
 	 * Set up the scheduler prior starting any interrupts (such as the
 	 * timer interrupt). Full topology setup happens at smp_init()
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4ceb2546c7ef..ec3ca694665f 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6876,6 +6876,13 @@ out:
 	return ret;
 }
 
+void __init trace_init(void)
+{
+	tracer_alloc_buffers();
+	init_ftrace_syscalls();
+	trace_event_init();	
+}
+
 __init static int clear_boot_tracer(void)
 {
 	/*
@@ -6895,6 +6902,5 @@ __init static int clear_boot_tracer(void)
 	return 0;
 }
 
-early_initcall(tracer_alloc_buffers);
 fs_initcall(tracer_init_debugfs);
 late_initcall(clear_boot_tracer);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 3255dfb054a0..c138c149d6ef 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1301,4 +1301,17 @@ int perf_ftrace_event_register(struct ftrace_event_call *call,
 #define perf_ftrace_event_register NULL
 #endif
 
+#ifdef CONFIG_FTRACE_SYSCALLS
+void init_ftrace_syscalls(void);
+#else
+static inline void init_ftrace_syscalls(void) { }
+#endif
+
+#ifdef CONFIG_EVENT_TRACING
+void trace_event_init(void);
+#else
+static inline void __init trace_event_init(void) { }
+#endif
+
+
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f9d0cbe014b7..fd9deb0e03f0 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2477,8 +2477,14 @@ static __init int event_trace_init(void)
 #endif
 	return 0;
 }
-early_initcall(event_trace_memsetup);
-core_initcall(event_trace_enable);
+
+void __init trace_event_init(void)
+{
+	event_trace_memsetup();
+	init_ftrace_syscalls();
+	event_trace_enable();
+}
+
 fs_initcall(event_trace_init);
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index a72f3d8d813e..542219ea33ed 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -514,7 +514,7 @@ unsigned long __init __weak arch_syscall_addr(int nr)
 	return (unsigned long)sys_call_table[nr];
 }
 
-static int __init init_ftrace_syscalls(void)
+void __init init_ftrace_syscalls(void)
 {
 	struct syscall_metadata *meta;
 	unsigned long addr;
@@ -539,7 +539,6 @@ static int __init init_ftrace_syscalls(void)
 
 	return 0;
 }
-early_initcall(init_ftrace_syscalls);
 
 #ifdef CONFIG_PERF_EVENTS
 
-- 
1.8.1.4

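The effect of the reordering in this patch can be shown with a toy
userspace model in C. Everything here is a simplification invented for
illustration: sketch_boot() stands in for start_kernel(), and a single
flag stands in for "ring buffer allocated and events enabled". It
assumes that an event hit before tracing is set up is simply not
recorded.

```c
#include <assert.h>

/* Toy state: is tracing set up, and how many events were recorded? */
static int sketch_buffers_ready;
static int sketch_recorded;

/* Stands in for trace_init(): buffers allocated, events enabled. */
static void sketch_trace_init(void)
{
	sketch_buffers_ready = 1;
}

/* Stands in for a tracepoint being hit; lost if tracing is not set up. */
static void sketch_tracepoint(void)
{
	if (sketch_buffers_ready)
		sketch_recorded++;
}

enum sketch_order {
	SKETCH_VIA_INITCALL,	/* old: set up from early_initcall() */
	SKETCH_AFTER_MM_INIT,	/* new: set up right after mm_init() */
};

/* Returns how many of the two boot-time events end up recorded. */
static int sketch_boot(enum sketch_order order)
{
	sketch_buffers_ready = 0;
	sketch_recorded = 0;

	/* mm_init() would run here: the allocator now works. */
	if (order == SKETCH_AFTER_MM_INIT)
		sketch_trace_init();

	sketch_tracepoint();	/* event around init_IRQ() time */

	/* Much later: the initcall levels run. */
	if (order == SKETCH_VIA_INITCALL)
		sketch_trace_init();

	sketch_tracepoint();	/* event after the initcalls */

	return sketch_recorded;
}
```

With SKETCH_VIA_INITCALL only the late event is recorded; with
SKETCH_AFTER_MM_INIT both are, which is the point of moving the setup
into init/main.c.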


* [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline
  2014-12-13  5:48   ` [RFC PATCH 0/2] tracing: The Grinch who stole the stealing of Christmas Steven Rostedt
  2014-12-13  5:49     ` [RFC PATCH 1/2] tracing: Move enabling tracepoints to just after mm_init() Steven Rostedt
@ 2014-12-13  5:50     ` Steven Rostedt
  2014-12-13 10:59       ` Borislav Petkov
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2014-12-13  5:50 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, x86, Linus Torvalds, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Borislav Petkov, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson


Add the kernel command line tracepoint_printk option that will
have tracepoints that are active sent to printk().

Passing "tracepoint_printk" will activate this. To turn it off
the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed
into it. Note, this only works if the cmdline option is used.
Echoing 1 into the sysctl file without the cmdline option will
have no effect.

Note, this is a dangerous option. Having high frequency
tracepoints send their data to printk() can possibly cause
a live lock.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 Documentation/kernel-parameters.txt | 18 ++++++++++++++++++
 include/linux/ftrace.h              |  1 +
 kernel/sysctl.c                     |  7 +++++++
 kernel/trace/trace.c                | 17 +++++++++++++++++
 kernel/trace/trace.h                |  1 +
 kernel/trace/trace_events.c         | 32 ++++++++++++++++++++++++++++++++
 6 files changed, 76 insertions(+)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d09eb37c562..d81f464a7358 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3500,6 +3500,24 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			See also Documentation/trace/ftrace.txt "trace options"
 			section.
 
+	tracepoint_printk[FTRACE]
+			Have the tracepoints sent to printk as well as the
+			tracing ring buffer. This is useful for early boot up
+			where the system hangs or reboots and does not give the
+			option for reading the tracing buffer or performing a
+			ftrace_dump_on_oops.
+
+			To turn off having tracepoints sent to printk,
+			 echo 0 > /proc/sys/kernel/tracepoint_printk
+			Note, echoing 1 into this file without the
+			tracepoint_printk kernel cmdline option has no effect.
+
+			** CAUTION **
+
+			Having tracepoints sent to printk() and activating high
+			frequency tracepoints such as irq or sched, can cause
+			the system to live lock.
+
 	traceoff_on_warning
 			[FTRACE] enable this option to disable tracing when a
 			warning is hit. This turns off "tracing_on". Tracing can
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0fc3e720d4fd..9e20e7c28aab 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -879,6 +879,7 @@ static inline int test_tsk_trace_graph(struct task_struct *tsk)
 enum ftrace_dump_mode;
 
 extern enum ftrace_dump_mode ftrace_dump_on_oops;
+extern int tracepoint_printk;
 
 extern void disable_trace_on_warning(void);
 extern int __disable_trace_on_warning;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4aada6d9fe74..bb50c2187194 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -622,6 +622,13 @@ static struct ctl_table kern_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+	{
+		.procname	= "tracepoint_printk",
+		.data		= &tracepoint_printk,
+		.maxlen		= sizeof(tracepoint_printk),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 #endif
 #ifdef CONFIG_KEXEC
 	{
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ec3ca694665f..18a00ab4427e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -63,6 +63,10 @@ static bool __read_mostly tracing_selftest_running;
  */
 bool __read_mostly tracing_selftest_disabled;
 
+/* Pipe tracepoints to printk */
+struct trace_iterator *tracepoint_print_iter;
+int tracepoint_printk;
+
 /* For tracers that don't implement custom flags */
 static struct tracer_opt dummy_tracer_opt[] = {
 	{ }
@@ -193,6 +197,13 @@ static int __init set_trace_boot_clock(char *str)
 }
 __setup("trace_clock=", set_trace_boot_clock);
 
+static int __init set_tracepoint_printk(char *str)
+{
+	if ((strcmp(str, "=0") != 0 && strcmp(str, "=off") != 0))
+		tracepoint_printk = 1;
+	return 1;
+}
+__setup("tracepoint_printk", set_tracepoint_printk);
 
 unsigned long long ns2usecs(cycle_t nsec)
 {
@@ -6878,6 +6889,12 @@ out:
 
 void __init trace_init(void)
 {
+	if (tracepoint_printk) {
+		tracepoint_print_iter =
+			kmalloc(sizeof(*tracepoint_print_iter), GFP_KERNEL);
+		if (WARN_ON(!tracepoint_print_iter))
+			tracepoint_printk = 0;
+	}
 	tracer_alloc_buffers();
 	init_ftrace_syscalls();
 	trace_event_init();	
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c138c149d6ef..8de48bac1ce2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1313,5 +1313,6 @@ void trace_event_init(void);
 static inline void __init trace_event_init(void) { }
 #endif
 
+extern struct trace_iterator *tracepoint_print_iter;
 
 #endif /* _LINUX_KERNEL_TRACE_H */
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index fd9deb0e03f0..9f7175a3df71 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -212,8 +212,40 @@ void *ftrace_event_buffer_reserve(struct ftrace_event_buffer *fbuffer,
 }
 EXPORT_SYMBOL_GPL(ftrace_event_buffer_reserve);
 
+static DEFINE_SPINLOCK(tracepoint_iter_lock);
+
+static void output_printk(struct ftrace_event_buffer *fbuffer)
+{
+	struct ftrace_event_call *event_call;
+	struct trace_event *event;
+	unsigned long flags;
+	struct trace_iterator *iter = tracepoint_print_iter;
+
+	if (!iter)
+		return;
+
+	event_call = fbuffer->ftrace_file->event_call;
+	if (!event_call || !event_call->event.funcs ||
+	    !event_call->event.funcs->trace)
+		return;
+
+	event = &fbuffer->ftrace_file->event_call->event;
+
+	spin_lock_irqsave(&tracepoint_iter_lock, flags);
+	trace_seq_init(&iter->seq);
+	iter->ent = fbuffer->entry;
+	event_call->event.funcs->trace(iter, 0, event);
+	trace_seq_putc(&iter->seq, 0);
+	printk("%s", iter->seq.buffer);
+
+	spin_unlock_irqrestore(&tracepoint_iter_lock, flags);
+}
+
 void ftrace_event_buffer_commit(struct ftrace_event_buffer *fbuffer)
 {
+	if (tracepoint_printk)
+		output_printk(fbuffer);
+
 	event_trigger_unlock_commit(fbuffer->ftrace_file, fbuffer->buffer,
 				    fbuffer->event, fbuffer->entry,
 				    fbuffer->flags, fbuffer->pc);
-- 
1.8.1.4

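The interaction between the boot option and the sysctl in this patch
can be modelled in a few lines of userspace C. The sketch_ names are
invented for this example, malloc() stands in for kmalloc(), and the
parser assumes that the handler receives whatever follows
"tracepoint_printk" on the command line (so "" for the bare option and
"=0" for "tracepoint_printk=0"). It is a model of the logic in the
diff above, not kernel code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct trace_iterator; only its existence matters here. */
struct sketch_iter {
	char buf[64];
};

static int sketch_tracepoint_printk;		/* flag, also the sysctl value */
static struct sketch_iter *sketch_print_iter;	/* allocated at boot only */

/* Boot option handler: anything except "=0" or "=off" enables the flag. */
static int sketch_set_tracepoint_printk(const char *str)
{
	if (strcmp(str, "=0") != 0 && strcmp(str, "=off") != 0)
		sketch_tracepoint_printk = 1;
	return 1;
}

/* Boot-time setup: the iterator is allocated only if the flag is set. */
static void sketch_trace_init(void)
{
	if (sketch_tracepoint_printk) {
		sketch_print_iter = malloc(sizeof(*sketch_print_iter));
		if (!sketch_print_iter)
			sketch_tracepoint_printk = 0;
	}
}

/* Mirrors the two checks on the commit path: flag first, then iterator. */
static int sketch_would_print(void)
{
	if (!sketch_tracepoint_printk)
		return 0;
	if (!sketch_print_iter)
		return 0;
	return 1;
}
```

This shows why writing 1 to the sysctl without the boot option does
nothing: the flag is set, but no iterator was ever allocated, so the
print path bails out early. Writing 0 after booting with the option
turns printing off because the flag check comes first.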


* Re: [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline
  2014-12-13  5:50     ` [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline Steven Rostedt
@ 2014-12-13 10:59       ` Borislav Petkov
  2014-12-13 13:18         ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2014-12-13 10:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, Jiang Liu, x86, Linus Torvalds,
	Andrew Morton, Bjorn Helgaas, Tony Luck, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson

On Sat, Dec 13, 2014 at 12:50:37AM -0500, Steven Rostedt wrote:
> 
> Add the kernel command line tracepoint_printk option that will
> have tracepoints that are active sent to printk().
> 
> Passing "tracepoint_printk" will activate this. To turn it off
> the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed
> into it. Note, this only works if the cmdline option is used.
> Echoing 1 into the sysctl file without the cmdline option will
> have no effect.
> 
> Note, this is a dangerous option. Having high frequency
> tracepoints send their data to printk() can possibly cause
> a live lock.
> 
> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> ---
>  Documentation/kernel-parameters.txt | 18 ++++++++++++++++++
>  include/linux/ftrace.h              |  1 +
>  kernel/sysctl.c                     |  7 +++++++
>  kernel/trace/trace.c                | 17 +++++++++++++++++
>  kernel/trace/trace.h                |  1 +
>  kernel/trace/trace_events.c         | 32 ++++++++++++++++++++++++++++++++
>  6 files changed, 76 insertions(+)
> 
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 1d09eb37c562..d81f464a7358 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3500,6 +3500,24 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
>  			See also Documentation/trace/ftrace.txt "trace options"
>  			section.
>  
> +	tracepoint_printk[FTRACE]

Damn, this is long and most likely nasty to type on some dingy box when
you're trying to debug stuff. Can we shorten it?

trace_printk
tp_printk
tp_print
tp_pr
...

Last one is my favourite - we should call it ToiletPaper_Print :-)

> +			Have the tracepoints sent to printk as well as the
> +			tracing ring buffer. This is useful for early boot up
> +			where the system hangs or reboots and does not give the
> +			option for reading the tracing buffer or performing a
> +			ftrace_dump_on_oops.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline
  2014-12-13 10:59       ` Borislav Petkov
@ 2014-12-13 13:18         ` Steven Rostedt
  2014-12-13 13:33           ` Borislav Petkov
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2014-12-13 13:18 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, LKML, Jiang Liu, x86, Linus Torvalds,
	Andrew Morton, Bjorn Helgaas, Tony Luck, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson

On Sat, 13 Dec 2014 11:59:25 +0100
Borislav Petkov <bp@alien8.de> wrote:

> On Sat, Dec 13, 2014 at 12:50:37AM -0500, Steven Rostedt wrote:
> > 
> > Add the kernel command line tracepoint_printk option that will
> > have tracepoints that are active sent to printk().
> > 
> > Passing "tracepoint_printk" will activate this. To turn it off
> > the sysctl /proc/sys/kernel/tracepoint_printk can have '0' echoed
> > into it. Note, this only works if the cmdline option is used.
> > Echoing 1 into the sysctl file without the cmdline option will
> > have no effect.
> > 
> > Note, this is a dangerous option. Having high frequency
> > tracepoints send their data to printk() can possibly cause
> > a live lock.
> > 
> > Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
> > 
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> > ---
> >  Documentation/kernel-parameters.txt | 18 ++++++++++++++++++
> >  include/linux/ftrace.h              |  1 +
> >  kernel/sysctl.c                     |  7 +++++++
> >  kernel/trace/trace.c                | 17 +++++++++++++++++
> >  kernel/trace/trace.h                |  1 +
> >  kernel/trace/trace_events.c         | 32 ++++++++++++++++++++++++++++++++
> >  6 files changed, 76 insertions(+)
> > 
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 1d09eb37c562..d81f464a7358 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -3500,6 +3500,24 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >  			See also Documentation/trace/ftrace.txt "trace options"
> >  			section.
> >  
> > +	tracepoint_printk[FTRACE]
> 
> Damn, this is long and most likely nasty to type on some dingy box when
> you're trying to debug stuff. Can we shorten it?
> 
> trace_printk

No this is already taken.

> tp_printk

Actually, I was thinking about shortening it to tp_printk.

> tp_print
> tp_pr
> ...
> 
> Last one is my favourite - we should call it ToiletPaper_Print :-)

What you need when your kernel is in the crapper.

-- Steve

> 
> > +			Have the tracepoints sent to printk as well as the
> > +			tracing ring buffer. This is useful for early boot up
> > +			where the system hangs or reboots and does not give the
> > +			option for reading the tracing buffer or performing a
> > +			ftrace_dump_on_oops.
> 
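The gating rule in the quoted changelog (the sysctl only does anything if the option was given at boot) can be modelled in a few lines of user-space C. This is only a sketch of the described behaviour, not the patch's code: the struct, the helper names and the whole-word command line match are all assumptions made for illustration, and it uses the option name as posted (tracepoint_printk), before any rename.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical state; the real patch keeps this in kernel/trace/. */
struct tp_printk_state {
	bool boot_opt;	/* option was present on the command line */
	bool enabled;	/* events are currently mirrored to printk */
};

/* Return true if opt appears as a whole space-separated word. */
bool cmdline_has(const char *cmdline, const char *opt)
{
	size_t n = strlen(opt);
	const char *p = cmdline;

	assert(n > 0);
	while ((p = strstr(p, opt)) != NULL) {
		bool starts = (p == cmdline) || (p[-1] == ' ');
		bool ends = (p[n] == '\0') || (p[n] == ' ');

		if (starts && ends)
			return true;
		p += n;
	}
	return false;
}

/* Boot time: the option both arms the feature and switches it on. */
void tp_printk_boot(struct tp_printk_state *s, const char *cmdline)
{
	s->boot_opt = cmdline_has(cmdline, "tracepoint_printk");
	s->enabled = s->boot_opt;
}

/* Models a write to the sysctl; ignored unless armed at boot. */
void tp_printk_sysctl_write(struct tp_printk_state *s, int val)
{
	if (!s->boot_opt)
		return;
	s->enabled = (val != 0);
}
```

Whether writing 1 re-enables the output after a 0 was written is not spelled out in the changelog; the sketch assumes it does.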



* Re: [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline
  2014-12-13 13:18         ` Steven Rostedt
@ 2014-12-13 13:33           ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2014-12-13 13:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, Jiang Liu, x86, Linus Torvalds,
	Andrew Morton, Bjorn Helgaas, Tony Luck, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson

On Sat, Dec 13, 2014 at 08:18:32AM -0500, Steven Rostedt wrote:
> > tp_printk
> 
> Actually, I was thinking about shortening it to tp_printk.

Yeah, this is probably the best alternative.

> What you need when your kernel is in the crapper.

Haha, yeah.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: Status of tip/x86/apic
  2014-12-12 20:35 Status of tip/x86/apic Thomas Gleixner
                   ` (2 preceding siblings ...)
  2014-12-12 23:14 ` Steven Rostedt
@ 2014-12-14 10:57 ` Jiang Liu
  2014-12-15 15:52 ` Steven Rostedt
  4 siblings, 0 replies; 13+ messages in thread
From: Jiang Liu @ 2014-12-14 10:57 UTC (permalink / raw)
  To: Thomas Gleixner, LKML
  Cc: x86, Linus Torvalds, Andrew Morton, Bjorn Helgaas, Tony Luck,
	Borislav Petkov, Joerg Roedel, Marc Zyngier, Steven Rostedt,
	Yinghai Lu, Alex Williamson

On 2014/12/13 4:35, Thomas Gleixner wrote:
> Folks,
> 
> after mulling this in my head for quite some time, I'm going to
> postpone the whole thing for 3.20.
> 
> That said, I need to say, that I'm really happy with the outcome of
> this massive overhaul. I really want to thank all involved people,
> especially Jiang, for their great work and help so far!!!
> 
> The hierarchical irq domains really improve the code by disentangling
> the various subsystems and the arm[64] use cases just prove that it
> was the right decision.
> 
> We're almost there with x86 but my gut feeling tells me that pushing
> it now is too risky. I rather prefer quiet holidays for all of us than
> the nagging fear that the post holiday inbox will be full of obscure
> bug reports and we then start a chase and bandaid race which will kill
> the well earned recreation in an instant.
Hi Thomas,
	It's safer to let it mature for another merge window
in the tip tree :)

> 
> This will block other things in that area for a while, but it's the
> only sane decision at the moment, unless Linus insists on pulling the
> lot and promises to deal with the fallout. :)
> 
> The reasons why I decided to do so are:
> 
>  - The bugs we found in the last week. That tells me that there is
>    some more stuff lurking.
> 
>  - The already existing mess in some areas which got unearthed by
>    this work in the last week. That definitely needs a thorough
>    cleanup and not some more bandaids.
> 
>  - Lack of proper debugging features. Sending out per issue debug
>    patches simply does not scale.
> 
>  - It's not bisectable and unfortunately there are too many fixes to
>    various places to make manual bisection feasible.
> 
> For 3.20 I want to proceed in the following way:
> 
>  - Apply all bug fixes to x86/apic
> 
>  - Address the issues with the resource management (and elsewhere)
>    proper on top
> 
>  - Add a proper debugging mechanism (the existing irqdomain debugfs
>    interface is completely useless).
> 
>    For the hierarchical domains we really want two things:
> 
>    1) A debugfs interface which lets us introspect the hierarchy.
> 
>       I was working on that before I got dragged into bug chasing and
>       merge window frenzy.
> 
>       For proper introspection down to the hardware level this
>       requires either domain/irq_chip specific callbacks or some
>       unified way to track the current state. The latter is painful as
>       it requires to store information redundantly.
> 
>       So having domain/chip callbacks to retrieve the state is the
>       right solution. Most chip/domain implementations cache their
>       [hardware] state already, so providing an accessor to convert
>       that into a common data format is the best way. If the callback
>       is not implemented then the information is not available or
>       maybe not relevant.
> 
>       I'm not going to have a per domain/chip seqfile print function
>       as this is just a complete waste. Pretty printing obscure
>       hardware information does not help much for the general user. We
>       rather have the raw data and proper post processing tools which
>       can provide that pretty print information than bloating the
>       kernel binary with randomized and possibly useless seq_print
>       functions.
> 
>       Another reason why I want just raw binary data is that I want to
>       use exactly the same mechanism for tracing. See below.
> 
>       After looking at the various new domain/chip implementations it's
>       sufficient to have 16 bytes of storage space for this, but
>       that's a minor detail.
> 
>       To provide a proper translation into pretty printed values we
>       can do the following:
> 
>        Create a new section for storing such data and have a data
>        structure there which describes the content of the buffer. That
>        section goes into a separate file and is not linked into the
>        kernel binary. Simple enough for tools to pick up and for bug
>        reporters to use/provide. If the stupid file is not available
>        we still can recreate it from source and translate the hex
>        dump. And in most cases the pure hexdump will be sufficient
>        for the people who actually need to look at this.
> 
>    2) Proper trace point support so we can actually track allocation
>       and the hardware access at the various domain levels because
>       some of these issues cannot be decoded by looking at a state
>       snapshot in debugfs. With some of them we even can't access
>       debugfs at all.
> 
>       Though one issue with that is, that for the early boot process
>       there is no way to store that information as the tracer gets
>       enabled way after init_IRQ(). But there is no reason why the
>       tracer could not be enabled before that. All it needs is a
>       working memory allocator. Steven?
> 
>       Now there is another class of problems which might be hard to
>       debug. When the machine just boots into a hang, so we don't get a
>       ftrace output neither from an oops nor from a console. It would
>       be nice if we could have a command line option which prints
>       enabled trace points via (early_)printk. That would avoid
>       sending out ad hoc printk debug patches which will basically
>       provide the same information as the trace_points. That would be
>       useful for other hard to debug boot hangs as well. Steven?
>       
>       I think the above can be solved, so we need to agree on a proper
>       set of tracepoints. I came up with the following list:
> 
>       - trace_irqdomain_create(domain->id, domain->name, ...)
>       - trace_irqdomain_destroy(domain->id)
>       
>       - trace_irqdomain_alloc(irq_data)
> 
>       	struct irq_data contains all relevant information for
>       	assigning the tracepoint data.
> 	
> 		__entry->virq		= irq_data->virq;
> 		__entry->domainid	= irq_data->domain;
> 		__entry->hwirq		= irq_data->hwirq;
> 		TP_STORE_DATA(__entry->data, irq_data);
> 
> 	Where TP_STORE_DATA checks for the above callback and uses it
> 	if available, otherwise we just clear the data field.
>       
>         So this reuses the callback which we want for debugfs
>         anyway. The print format is just hexdump. See my above
>         rationale for that.
> 
>       - trace_irqdomain_free(virq, domain->id)
> 	  
>       - trace_irqdomain_hw_access(irqdata)
> 
> 	Same "data" and pretty printing argument as for
> 	trace_irqdomain_alloc()
> 
> 	The obvious place to put such a trace point is
> 	e.g. irq_chip_write_msi_msg() where the callback records the
> 	currently written msi msg.
> 
> Once we have sorted that, I'll push x86/apic into a separate git
> repository so the history is preserved.
> 
> After that I'll redo x86/apic from scratch with proper ordering and
> all fixes folded to the right places so the whole thing becomes
> bisectable.
> 
> Thoughts?
This really sounds like a good way to debug interrupts.

So I will work on the following items for 3.20:
1) Continue to convert PCI MSI code into generic MSI code
as much as possible.
2) Simplify interrupt remapping initialization on x86; the first
version has been posted at: https://lkml.org/lkml/2014/12/10/20.
3) Fix new bugs, if any :)
Thanks!
Gerry

> 
> Thanks,
> 
> 	Thomas
> 


* Re: Status of tip/x86/apic
  2014-12-12 20:35 Status of tip/x86/apic Thomas Gleixner
                   ` (3 preceding siblings ...)
  2014-12-14 10:57 ` Status of tip/x86/apic Jiang Liu
@ 2014-12-15 15:52 ` Steven Rostedt
  2015-01-02 17:29   ` Mathieu Desnoyers
  4 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2014-12-15 15:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Jiang Liu, x86, Linus Torvalds, Andrew Morton,
	Bjorn Helgaas, Tony Luck, Borislav Petkov, Joerg Roedel,
	Marc Zyngier, Yinghai Lu, Alex Williamson, Mathieu Desnoyers,
	Frederic Weisbecker

On Fri, 12 Dec 2014 21:35:14 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

>    2) Proper trace point support so we can actually track allocation
>       and the hardware access at the various domain levels because
>       some of these issues cannot be decoded by looking at a state
>       snapshot in debugfs. With some of them we even can't access
>       debugfs at all.
> 
>       Though one issue with that is, that for the early boot process
>       there is no way to store that information as the tracer gets
>       enabled way after init_IRQ(). But there is no reason why the
>       tracer could not be enabled before that. All it needs is a
>       working memory allocator. Steven?

And as we found out, we also need working RCU ;-) (but that still
happens before init_IRQ() which is what we want here).


> 
>       Now there is another class of problems which might be hard to
>       debug. When the machine just boots into a hang, so we don't get a
>       ftrace output neither from an oops nor from a console. It would
>       be nice if we could have a command line option which prints
>       enabled trace points via (early_)printk. That would avoid
>       sending out ad hoc printk debug patches which will basically
>       provide the same information as the trace_points. That would be
>       useful for other hard to debug boot hangs as well. Steven?

Agreed and patches have been sent to Linus.

>       
>       I think the above can be solved, so we need to agree on a proper
>       set of tracepoints. I came up with the following list:
> 
>       - trace_irqdomain_create(domain->id, domain->name, ...)

Is that supposed to be a variable number of args? Tracepoints do not
support a variable number of args passed in. I guess we could
add that, but it won't be for this merge window.

I've added Mathieu and Frederic to the Cc list here.

If we do support this (and if it is needed) we could make it use the
bprintf() infrastructure. It already supports saving just a format and
args directly to the buffer, and a way to print them again.

tools/lib/traceevent/event-parse.c will need to deal with this. But it
already handles trace_bprintk().
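
To illustrate the "save a format and args, print them again later" idea, here is a minimal user-space sketch. It is not the kernel's bprintf code: the record layout, the four-argument limit and the restriction to %ld conversions are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BREC_MAX_ARGS 4	/* arbitrary limit for this sketch */

/* Hypothetical record: pointer to a static format plus raw arguments. */
struct brec {
	const char *fmt;
	int nargs;
	long args[BREC_MAX_ARGS];
};

/* Fast path: copy the arguments only, no string formatting. */
void brec_store(struct brec *r, const char *fmt, int nargs, const long *args)
{
	assert(nargs >= 0 && nargs <= BREC_MAX_ARGS);
	memset(r, 0, sizeof(*r));
	r->fmt = fmt;
	r->nargs = nargs;
	if (nargs)
		memcpy(r->args, args, (size_t)nargs * sizeof(long));
}

/*
 * Slow path, run when the buffer is read. Only %ld conversions are
 * supported here; surplus arguments are ignored by snprintf().
 */
int brec_print(const struct brec *r, char *out, size_t len)
{
	return snprintf(out, len, r->fmt,
			r->args[0], r->args[1], r->args[2], r->args[3]);
}
```

The point of the split is that the expensive formatting happens at read time, so a disabled or unread trace costs little more than a few word copies.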


>       - trace_irqdomain_destroy(domain->id)
>       
>       - trace_irqdomain_alloc(irq_data)
> 
>       	struct irq_data contains all relevant information for
>       	assigning the tracepoint data.
> 	
> 		__entry->virq		= irq_data->virq;
> 		__entry->domainid	= irq_data->domain;
> 		__entry->hwirq		= irq_data->hwirq;
> 		TP_STORE_DATA(__entry->data, irq_data);
> 
> 	Where TP_STORE_DATA checks for the above callback and uses it
> 	if available, otherwise we just clear the data field.
>       
>         So this reuses the callback which we want for debugfs
>         anyway. The print format is just hexdump. See my above
>         rationale for that.

We could also create a plugin in tools/lib/traceevent that can give us
more than just a hexdump. That is, we have the code in the kernel
source tree but not in the kernel binary.

-- Steve

> 
>       - trace_irqdomain_free(virq, domain->id)
> 	  
>       - trace_irqdomain_hw_access(irqdata)
> 
> 	Same "data" and pretty printing argument as for
> 	trace_irqdomain_alloc()
> 
> 	The obvious place to put such a trace point is
> 	e.g. irq_chip_write_msi_msg() where the callback records the
> 	currently written msi msg.
> 
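
The TP_STORE_DATA idea quoted above (use the state callback if the chip provides one, otherwise clear the field, and print the result as a plain hexdump) could look roughly like the following user-space sketch. The struct and function names are invented for illustration and are not the kernel's; only the 16 byte size and the virq/hwirq/domain fields come from the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define IRQ_STATE_BYTES 16	/* size suggested in the thread */

/* Hypothetical stand-in for the kernel's struct irq_data. */
struct irq_data_sketch {
	unsigned int virq;
	unsigned long hwirq;
	int domain_id;
	/* Optional callback: copy cached hardware state into buf. */
	void (*get_state)(const struct irq_data_sketch *d,
			  unsigned char buf[IRQ_STATE_BYTES]);
};

/* Hypothetical trace record mirroring the __entry fields. */
struct irq_trace_entry {
	unsigned int virq;
	unsigned long hwirq;
	int domain_id;
	unsigned char data[IRQ_STATE_BYTES];
};

/* Models TP_STORE_DATA: callback if present, else a cleared field. */
void tp_store_data(unsigned char *dst, const struct irq_data_sketch *d)
{
	memset(dst, 0, IRQ_STATE_BYTES);
	if (d->get_state)
		d->get_state(d, dst);
}

void trace_fill(struct irq_trace_entry *e, const struct irq_data_sketch *d)
{
	e->virq = d->virq;
	e->hwirq = d->hwirq;
	e->domain_id = d->domain_id;
	tp_store_data(e->data, d);
}

/* Raw hexdump of the data field; out needs 2 * 16 + 1 bytes. */
void entry_hexdump(const struct irq_trace_entry *e, char *out, size_t len)
{
	int i;

	assert(len >= 2 * IRQ_STATE_BYTES + 1);
	for (i = 0; i < IRQ_STATE_BYTES; i++)
		snprintf(out + 2 * i, 3, "%02x", (unsigned int)e->data[i]);
}

/* Example callback: pretend the chip caches two bytes of state. */
void msi_get_state(const struct irq_data_sketch *d,
		   unsigned char buf[IRQ_STATE_BYTES])
{
	(void)d;
	buf[0] = 0xde;
	buf[1] = 0xad;
}
```

A traceevent plugin or other post-processing tool would then turn the 32 hex digits into something readable, keeping that code out of the kernel binary.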


* Re: Status of tip/x86/apic
  2014-12-15 15:52 ` Steven Rostedt
@ 2015-01-02 17:29   ` Mathieu Desnoyers
  0 siblings, 0 replies; 13+ messages in thread
From: Mathieu Desnoyers @ 2015-01-02 17:29 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Thomas Gleixner, LKML, Jiang Liu, x86, Linus Torvalds,
	Andrew Morton, Bjorn Helgaas, Tony Luck, Borislav Petkov,
	Joerg Roedel, Marc Zyngier, Yinghai Lu, Alex Williamson,
	Frederic Weisbecker

----- Original Message -----
> From: "Steven Rostedt" <rostedt@goodmis.org>
> To: "Thomas Gleixner" <tglx@linutronix.de>
> Cc: "LKML" <linux-kernel@vger.kernel.org>, "Jiang Liu" <jiang.liu@linux.intel.com>, x86@kernel.org, "Linus Torvalds"
> <torvalds@linux-foundation.org>, "Andrew Morton" <akpm@linux-foundation.org>, "Bjorn Helgaas" <bhelgaas@google.com>,
> "Tony Luck" <tony.luck@intel.com>, "Borislav Petkov" <bp@alien8.de>, "Joerg Roedel" <joro@8bytes.org>, "Marc
> Zyngier" <marc.zyngier@arm.com>, "Yinghai Lu" <yinghai@kernel.org>, "Alex Williamson" <alex.williamson@redhat.com>,
> "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>, "Frederic Weisbecker" <fweisbec@gmail.com>
> Sent: Monday, December 15, 2014 10:52:01 AM
> Subject: Re: Status of tip/x86/apic
> 
[...]
> >       
> >       I think the above can be solved, so we need to agree on a proper
> >       set of tracepoints. I came up with the following list:
> > 
> >       - trace_irqdomain_create(domain->id, domain->name, ...)
> 
> Is that supposed to be a variable number of args? Tracepoints do not
> support a variable number of args passed in. I guess we could
> add that, but it won't be for this merge window.
> 
> I've added Mathieu and Frederic to the Cc list here.

Hi Steven,

Let's wait and see if it's really required first.

FWIW, at the user-space level in LTTng-UST, we have two distinct ways to
do static instrumentation:

  * tracepoint(): similar to those within the Linux kernel, except that the
                  tracepoint is wrapped in a define, so rather than calling:
                    trace_foo(arg1, arg2);
                  users call:
                    tracepoint(foo, arg1, arg2);

                  This allows skipping evaluation of "arg1" and "arg2" when
                  the tracepoint is disabled, even if they have side effects.

  * tracef(): I also added a "tracef()" macro, which provides a programmer
              interface very similar to printf() but writes the formatted
              output into the trace buffers. It can be enabled dynamically,
              like tracepoints, but has no per-site event names attached:
              tracef() calls are either all enabled or all disabled. It is
              meant mainly for adding temporary debugging trace statements.

So far, the feedback I got from end users seemed to split static
instrumentation use-cases in two major categories:

1) Instrumentation added into the code base, well structured (tracepoints),
   meant to be deployed with the application for in-production use.
   They need to be low-overhead.
2) Very quick (and dirty) instrumentation, meant for one-off use while
   in development. IOW, a replacement for printf(), with which people are
   already familiar. Low overhead still matters, but not as much as it does
   for (1).

This is why we only implemented var arg support in tracef() so far.
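
As a rough illustration of the two styles, a minimal sketch follows. It is not LTTng-UST's implementation: the sketch_ names, the global enable flags and the single static buffer are assumptions made for the example, and sketch_tracef() is a plain function here rather than a macro.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-event enable flag and probe for an event "foo". */
bool foo_enabled;
int foo_hits;
int foo_last_sum;

void probe_foo(int a, int b)
{
	foo_hits++;
	foo_last_sum = a + b;
}

/*
 * Wrapper macro: the arguments sit inside the if body, so they are
 * not evaluated at all while the event is disabled.
 */
#define sketch_tracepoint(name, ...)			\
	do {						\
		if (name##_enabled)			\
			probe_##name(__VA_ARGS__);	\
	} while (0)

/* printf-like variant with one global switch and no event names. */
bool tracef_enabled;
char tracef_buf[128];

int sketch_tracef(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	if (!tracef_enabled)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(tracef_buf, sizeof(tracef_buf), fmt, ap);
	va_end(ap);
	return n;
}

/* Argument with a side effect, to show when evaluation happens. */
int calls;

int expensive(void)
{
	return ++calls;
}
```

With foo_enabled false, sketch_tracepoint(foo, expensive(), 1) leaves calls at 0; once the flag is set, the same statement calls expensive() once and runs the probe.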

> 
> If we do support this (and if it is needed) we could make it use the
> bprintf() infrastructure. It already supports saving just a format and
> args directly to the buffer, and a way to print them again.

Happy new year :)

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Thread overview: 13+ messages
2014-12-12 20:35 Status of tip/x86/apic Thomas Gleixner
2014-12-12 21:45 ` Borislav Petkov
2014-12-12 23:02 ` Linus Torvalds
2014-12-12 23:14 ` Steven Rostedt
2014-12-13  5:48   ` [RFC PATCH 0/2] tracing: The Grinch who stole the stealing of Christmas Steven Rostedt
2014-12-13  5:49     ` [RFC PATCH 1/2] tracing: Move enabling tracepoints to just after mm_init() Steven Rostedt
2014-12-13  5:50     ` [RFC PATCH 2/2] tracing: Add tracepoint_printk cmdline Steven Rostedt
2014-12-13 10:59       ` Borislav Petkov
2014-12-13 13:18         ` Steven Rostedt
2014-12-13 13:33           ` Borislav Petkov
2014-12-14 10:57 ` Status of tip/x86/apic Jiang Liu
2014-12-15 15:52 ` Steven Rostedt
2015-01-02 17:29   ` Mathieu Desnoyers
