linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] trace,x86: add x86 irq vector tracepoints
@ 2012-09-22  0:40 Seiji Aguchi
  2012-09-25  3:38 ` H. Peter Anvin
  0 siblings, 1 reply; 15+ messages in thread
From: Seiji Aguchi @ 2012-09-22  0:40 UTC (permalink / raw)
  To: H. Peter Anvin, Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, rostedt, 'mingo@elte.hu' (mingo@elte.hu),
	x86
  Cc: dle-develop, Satoru Moriya

Change log 
 v3 -> v4
 - Add a latency measurement of each tracepoint
 - Rebased to 3.6-rc6

 v2 -> v3
 - Remove an invalidate_tlb_vector event because it was replaced by a call function vector
   in a following commit.
   http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=52aec3308db85f4e9f5c8b9f5dc4fbd0138c6fa4

 v1 -> v2
 - Modify variable name from irq to vector.
 - Merge arch-specific tracepoints below to an arch_irq_vector_entry/exit.
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.

http://www.spinics.net/lists/mm-commits/msg85707.html

<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled.  They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.

There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers.  Tracing such events gives
us the information about IRQ interaction with other system events.

The trace also tells where the system is spending its time.  We want to
know which cores are handling interrupts and how they are affecting other
processes in the system.  Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>

On the other hand, my usecase is tracing just local timer event and 
getting a value of instruction pointer.

  I suggested to add an argument local timer event to get instruction pointer before.
  But there is another way to get it with external module like systemtap.
  So, I don't need to add any argument to irq vector tracepoints now.

Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.

This patch modifies Vaibhav's one as follows.
 - Separate generic, and across-architecture tracepoints to enable independently.
   - nmi_vector
   - local_timer_vector
   - reschedule_vector
   - call_function_vector
   - call_function_single_vector 
   - irq_work_entry_vector

 - Rename architecture-specific tracepoints from irq_vector_entry/exit to 
   arch_irq_vector_entry/exit.
   - error_apic_vector
   - thermal_apic_vector
   - threshold_apic_vector
   - spurious_apic_vector
   - x86_platform_ipi_vector

   Those x86 specific ones are not really frequently raised vectors, so
   enabling them all won't affect performance and readability of the
   traces too much.

 Latencies of each event are measured by enabling a latency tracer of ftrace because some of tracepoints are 
 in performace critical pathes. 
 In my machine, all tracepoints are an usec order in case where they are enabled.

 CPU: Xeon X3430  @ 2.40GHz
 kernel: 3.6-rc6
 arch: x86_64

(1) local timer

(1-1) local_timer_entry

 - 3.6-rc6 original
   	<...>-2788    2dNh. 894834337us : exit_idle <-smp_apic_timer_interrupt
   	<...>-2788    2dNh. 894834337us : hrtimer_interrupt <-smp_apic_timer_interrupt

 - 3.6-rc6 + this patch + trace off
   	<...>-1981    0d.h. 210538us : exit_idle <-smp_apic_timer_interrupt
   	<...>-1981    0d.h. 210538us : hrtimer_interrupt <-smp_apic_timer_interrupt

 - 3.6-rc6 + this patch + trace on
   	cat-3158    0d.h. 169673us : exit_idle <-smp_apic_timer_interrupt
    	*cat-3158    0d.h. 169674us : local_timer_entry: vector=239*
     	cat-3158    0d.h. 169674us : hrtimer_interrupt <-smp_apic_timer_interrupt

(1-2) local_timer_exit

 - 3.6-rc6 original
   	<...>-2788    2dNh. 894834361us : lapic_next_event <-clockevents_program_event
   	<...>-2788    2dNh. 894834361us : irq_exit <-smp_apic_timer_interrupt

 - 3.6-rc6 + this patch + trace off
   	<...>-1981    0d.h. 210550us : lapic_next_event <-clockevents_program_event
   	<...>-1981    0d.h. 210550us : irq_exit <-smp_apic_timer_interrupt

 - 3.6-rc6 + this patch + trace on
     	cat-3158    0d.h. 169695us : lapic_next_event <-clockevents_program_event
    	*cat-3158    0d.h. 169695us : local_timer_exit: vector=239*
     	cat-3158    0d.h. 169695us : irq_exit <-smp_apic_timer_interrupt

(2) reschedule
(2-1) reschedule_entry

 - 3.6-rc6 original
  	<idle>-0       2d... 897118426us : smp_reschedule_interrupt <-reschedule_interrupt
  	<idle>-0       2d... 897118426us : scheduler_ipi <-smp_reschedule_interrupt

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d... 212565us : smp_reschedule_interrupt <-reschedule_interrupt
  	<idle>-0       0d... 212565us : scheduler_ipi <-smp_reschedule_interrupt

 - 3.6-rc6 + this patch + trace on
  	<idle>-0       0d... 181656us : smp_reschedule_interrupt <-reschedule_interrupt
 	*<idle>-0       0d... 181656us : reschedule_entry: vector=253*
  	<idle>-0       0d... 181657us : scheduler_ipi <-smp_reschedule_interrupt

(2-2) reschedule_exit

 - 3.6-rc6 original
  	<idle>-0       2d... 897118464us : rcu_idle_enter_common <-rcu_irq_exit
  	<idle>-0       2d... 897118465us : trace_hardirqs_on_thunk <-restore_args

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d... 212703us : rcu_idle_enter_common <-rcu_irq_exit
  	<idle>-0       0d... 212703us : trace_hardirqs_on_thunk <-restore_args

 - 3.6-rc6 + this patch + trace on
  	<idle>-0       0d... 182711us : rcu_idle_enter_common <-rcu_irq_exit
 	*<idle>-0       0d... 182712us : reschedule_exit: vector=253*
  	<idle>-0       0d... 182712us : trace_hardirqs_on_thunk <-restore_args

(3) call_function 

 - 3.6-rc6 original
  	<idle>-0       0dNs. 158960us : __local_bh_enable <-_local_bh_enable
  	<idle>-0       0dNh. 158960us : generic_smp_call_function_interrupt <-smp_call_function_interrupt
  	<idle>-0       0dNh. 158961us : irq_exit <-smp_call_function_interrupt

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d.s. 209117us : __local_bh_enable <-_local_bh_enable
  	<idle>-0       0d.h. 209117us : generic_smp_call_function_interrupt <-smp_call_function_interrupt
  	<idle>-0       0d.h. 209117us : irq_exit <-smp_call_function_interrupt

 - 3.6-rc6 + trace on
	<idle>-0       0d.s. 213068us : __local_bh_enable <-_local_bh_enable
	*<idle>-0       0d.h. 213069us : call_function_entry: vector=252*
	<idle>-0       0d.h. 213069us : generic_smp_call_function_interrupt <-smp_call_function_interrupt
	*<idle>-0       0d.h. 213069us : call_function_exit: vector=252*
	<idle>-0       0d.h. 213070us : irq_exit <-smp_call_function_interrupt

(4) call_function_single

 - 3.6-rc6 original
  	<idle>-0       0dNs. 158963us : __local_bh_enable <-_local_bh_enable
  	<idle>-0       0dNh. 158963us : generic_smp_call_function_single_interrupt <-smp_call_function_single_interrupt
  	<idle>-0       0dNh. 158963us : _raw_spin_lock <-generic_smp_call_function_single_interrupt
  	<idle>-0       0dNh. 158963us : irq_exit <-smp_call_function_single_interrupt

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d.s. 209117us : __local_bh_enable <-_local_bh_enable
  	<idle>-0       0d.h. 209117us : generic_smp_call_function_interrupt <-smp_call_function_interrupt
  	<idle>-0       0d.h. 209117us : irq_exit <-smp_call_function_interrupt

 - 3.6-rc6 + this patch + trace on
  	<idle>-0       0d.s. 213095us : __local_bh_enable <-_local_bh_enable
  	*<idle>-0       0d.h. 213095us : call_function_single_entry: vector=251*
  	<idle>-0       0d.h. 213095us : generic_smp_call_function_single_interrupt <-smp_call_function_single_interrupt
  	<idle>-0       0d.h. 213095us : _raw_spin_lock <-generic_smp_call_function_single_interrupt
  	*<idle>-0       0d.h. 213095us : call_function_single_exit: vector=251*
  	<idle>-0       0d.h. 213096us : irq_exit <-smp_call_function_single_interrupt

(5) irq_work

 - 3.6-rc6 original
  	<idle>-0       0dNs. 158976us : __local_bh_enable <-_local_bh_enable
  	<idle>-0       0dNh. 158976us : irq_exit <-smp_irq_work_interrupt

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d.s. 209163us : __local_bh_enable <-_local_bh_enable
  	<idle>-0       0d.h. 209163us : irq_exit <-smp_irq_work_interrupt

 - 3.6-rc6 + this patch + trace on
  	<idle>-0       0dNs. 181414us : __local_bh_enable <-_local_bh_enable
 	*<idle>-0       0dNh. 181415us : irq_work_entry: vector=246*
 	*<idle>-0       0dNh. 181415us : irq_work_exit: vector=246*
  	<idle>-0       0dNh. 181416us : irq_exit <-smp_irq_work_interrupt

(6) nmi

 - 3.6-rc6 original
   	<...>-2584    0d... 54448291us : trace_hardirqs_on_caller <-sysret_check
   	<...>-2584    0d.h. 54448303us : local_clock <-perf_event_update_userpage
   	<...>-2584    0d.h. 54448303us : intel_pmu_enable_all <-intel_pmu_handle_irq
   	<...>-2584    0d.h. 54448304us : intel_pmu_pebs_enable_all <-intel_pmu_enable_all
   	<...>-2584    0d.h. 54448304us : intel_pmu_lbr_enable_all <-intel_pmu_enable_all
   	<...>-2584    0d.h. 54448304us : arch_trigger_all_cpu_backtrace_handler <-nmi_handle
   	<...>-2584    0d.h. 54448304us : rcu_nmi_exit <-do_nmi

 - 3.6-rc6 + this patch + trace off
	<...>-2581    1d.h. 46380362us : local_clock <-perf_event_update_userpage
	<...>-2581    1d.h. 46380363us : intel_pmu_enable_all <-intel_pmu_handle_irq
	<...>-2581    1d.h. 46380363us : intel_pmu_pebs_enable_all <-intel_pmu_enable_all
	<...>-2581    1d.h. 46380363us : intel_pmu_lbr_enable_all <-intel_pmu_enable_all
	<...>-2581    1d.h. 46380364us : arch_trigger_all_cpu_backtrace_handler <-nmi_handle
	<...>-2581    1d.h. 46380364us : rcu_nmi_exit <-do_nmi


 - 3.6-rc6 + this patch + trace on
	grep-2619    1d... 176854482us : trace_hardirqs_on_caller <-sysret_check
   	*grep-2619    1d.h. 176854528us : nmi_entry: vector=2*
    	grep-2619    1d.h. 176854531us : local_clock <-perf_event_update_userpage
   	grep-2619    1d.h. 176854531us : intel_pmu_enable_all <-intel_pmu_handle_irq
    	grep-2619    1d.h. 176854531us : intel_pmu_pebs_enable_all <-intel_pmu_enable_all
    	grep-2619    1d.h. 176854532us : intel_pmu_lbr_enable_all <-intel_pmu_enable_all
    	grep-2619    1d.h. 176854532us : arch_trigger_all_cpu_backtrace_handler <-nmi_handle
   	*grep-2619    1d.h. 176854532us : nmi_exit: vector=2*
    	grep-2619    1d.h. 176854532us : rcu_nmi_exit <-do_nmi

(7) arch_irq_vector
(7-1) arch_irq_vector_entry

 - 3.6-rc6 original
	<idle>-0       0dNh. 158949us : notifier_call_chain <-atomic_notifier_call_chain
	<idle>-0       0dNh. 158949us : printk <-smp_spurious_interrupt

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d.h. 209125us : notifier_call_chain <-atomic_notifier_call_chain
  	<idle>-0       0d.h. 209125us : printk <-smp_spurious_interrupt

 - 3.6-rc6 + this patch + trace on
  	<idle>-0       0d.h. 213076us : notifier_call_chain <-atomic_notifier_call_chain
 	*<idle>-0       0d.h. 213076us : arch_irq_vector_entry: vector=255 name=SPURIOUS_APIC_VECTOR*
  	<idle>-0       0d.h. 213076us : printk <-smp_spurious_interrupt

(7-2) arch_irq_vector_exit

 - 3.6-rc6 original
	<idle>-0       0dNh. 158955us : _raw_spin_unlock_irqrestore <-console_unlock
  	<idle>-0       0dNh. 158955us : irq_exit <-smp_spurious_interrupt

 - 3.6-rc6 + this patch + trace off
  	<idle>-0       0d.h. 209131us : _raw_spin_unlock_irqrestore <-console_unlock
  	<idle>-0       0d.h. 209131us : irq_exit <-smp_spurious_interrupt

 - 3.6-rc6 + this patch + trace on
  	<idle>-0       0d.h. 213082us : _raw_spin_unlock_irqrestore <-console_unlock
 	*<idle>-0       0d.h. 213082us : arch_irq_vector_exit: vector=255 name=SPURIOUS_APIC_VECTOR*
  	<idle>-0       0d.h. 213083us : irq_exit <-smp_spurious_interrupt

 Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>

---
 arch/x86/include/asm/irq_vectors.h       |    9 ++
 arch/x86/kernel/apic/apic.c              |    7 +
 arch/x86/kernel/cpu/mcheck/therm_throt.c |    3 +
 arch/x86/kernel/cpu/mcheck/threshold.c   |    3 +
 arch/x86/kernel/irq.c                    |    5 +
 arch/x86/kernel/irq_work.c               |    3 +
 arch/x86/kernel/nmi.c                    |    3 +
 arch/x86/kernel/smp.c                    |    7 +
 include/trace/events/irq_vectors.h       |  209 ++++++++++++++++++++++++++++++
 9 files changed, 249 insertions(+), 0 deletions(-)
 create mode 100644 include/trace/events/irq_vectors.h

diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
index 1508e51..510ced5 100644
--- a/arch/x86/include/asm/irq_vectors.h
+++ b/arch/x86/include/asm/irq_vectors.h
@@ -158,4 +158,13 @@ static inline int invalid_vm86_irq(int irq)
 # define NR_IRQS			NR_IRQS_LEGACY
 #endif
 
+#define irq_vector_name(vector) { vector, #vector }
+
+#define irq_vector_name_table						\
+			irq_vector_name(ERROR_APIC_VECTOR),		\
+			irq_vector_name(THERMAL_APIC_VECTOR),		\
+			irq_vector_name(THRESHOLD_APIC_VECTOR),		\
+			irq_vector_name(SPURIOUS_APIC_VECTOR),		\
+			irq_vector_name(X86_PLATFORM_IPI_VECTOR)
+
 #endif /* _ASM_X86_IRQ_VECTORS_H */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 24deb30..b9cdd8f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -34,6 +34,7 @@
 #include <linux/dmi.h>
 #include <linux/smp.h>
 #include <linux/mm.h>
+#include <trace/events/irq_vectors.h>
 
 #include <asm/irq_remapping.h>
 #include <asm/perf_event.h>
@@ -895,7 +896,9 @@ void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
 	 */
 	irq_enter();
 	exit_idle();
+	trace_local_timer_entry(LOCAL_TIMER_VECTOR);
 	local_apic_timer_interrupt();
+	trace_local_timer_exit(LOCAL_TIMER_VECTOR);
 	irq_exit();
 
 	set_irq_regs(old_regs);
@@ -1881,6 +1884,7 @@ void smp_spurious_interrupt(struct pt_regs *regs)
 
 	irq_enter();
 	exit_idle();
+	trace_arch_irq_vector_entry(SPURIOUS_APIC_VECTOR);
 	/*
 	 * Check if this really is a spurious interrupt and ACK it
 	 * if it is a vectored one.  Just in case...
@@ -1895,6 +1899,7 @@ void smp_spurious_interrupt(struct pt_regs *regs)
 	/* see sw-dev-man vol 3, chapter 7.4.13.5 */
 	pr_info("spurious APIC interrupt on CPU#%d, "
 		"should never happen.\n", smp_processor_id());
+	trace_arch_irq_vector_exit(SPURIOUS_APIC_VECTOR);
 	irq_exit();
 }
 
@@ -1918,6 +1923,7 @@ void smp_error_interrupt(struct pt_regs *regs)
 
 	irq_enter();
 	exit_idle();
+	trace_arch_irq_vector_entry(ERROR_APIC_VECTOR);
 	/* First tickle the hardware, only then report what went on. -- REW */
 	v0 = apic_read(APIC_ESR);
 	apic_write(APIC_ESR, 0);
@@ -1938,6 +1944,7 @@ void smp_error_interrupt(struct pt_regs *regs)
 
 	apic_printk(APIC_DEBUG, KERN_CONT "\n");
 
+	trace_arch_irq_vector_exit(ERROR_APIC_VECTOR);
 	irq_exit();
 }
 
diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c
index 47a1870..63c2cc8 100644
--- a/arch/x86/kernel/cpu/mcheck/therm_throt.c
+++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -23,6 +23,7 @@
 #include <linux/init.h>
 #include <linux/smp.h>
 #include <linux/cpu.h>
+#include <trace/events/irq_vectors.h>
 
 #include <asm/processor.h>
 #include <asm/apic.h>
@@ -382,8 +383,10 @@ asmlinkage void smp_thermal_interrupt(struct pt_regs *regs)
 {
 	irq_enter();
 	exit_idle();
+	trace_arch_irq_vector_entry(THERMAL_APIC_VECTOR);
 	inc_irq_stat(irq_thermal_count);
 	smp_thermal_vector();
+	trace_arch_irq_vector_exit(THERMAL_APIC_VECTOR);
 	irq_exit();
 	/* Ack only at the end to avoid potential reentry */
 	ack_APIC_irq();
diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c
index aa578ca..de74768 100644
--- a/arch/x86/kernel/cpu/mcheck/threshold.c
+++ b/arch/x86/kernel/cpu/mcheck/threshold.c
@@ -3,6 +3,7 @@
  */
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
+#include <trace/events/irq_vectors.h>
 
 #include <asm/irq_vectors.h>
 #include <asm/apic.h>
@@ -21,8 +22,10 @@ asmlinkage void smp_threshold_interrupt(void)
 {
 	irq_enter();
 	exit_idle();
+	trace_arch_irq_vector_entry(THRESHOLD_APIC_VECTOR);
 	inc_irq_stat(irq_threshold_count);
 	mce_threshold_vector();
+	trace_arch_irq_vector_exit(THRESHOLD_APIC_VECTOR);
 	irq_exit();
 	/* Ack only at the end to avoid potential reentry */
 	ack_APIC_irq();
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 1f5f1d5..f4d7344 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -18,6 +18,9 @@
 #include <asm/mce.h>
 #include <asm/hw_irq.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/irq_vectors.h>
+
 atomic_t irq_err_count;
 
 /* Function pointer for generic interrupt vector handling */
@@ -218,11 +221,13 @@ void smp_x86_platform_ipi(struct pt_regs *regs)
 
 	exit_idle();
 
+	trace_arch_irq_vector_entry(X86_PLATFORM_IPI_VECTOR);
 	inc_irq_stat(x86_platform_ipis);
 
 	if (x86_platform_ipi_callback)
 		x86_platform_ipi_callback();
 
+	trace_arch_irq_vector_exit(X86_PLATFORM_IPI_VECTOR);
 	irq_exit();
 
 	set_irq_regs(old_regs);
diff --git a/arch/x86/kernel/irq_work.c b/arch/x86/kernel/irq_work.c
index ca8f703..2cf7505 100644
--- a/arch/x86/kernel/irq_work.c
+++ b/arch/x86/kernel/irq_work.c
@@ -8,13 +8,16 @@
 #include <linux/irq_work.h>
 #include <linux/hardirq.h>
 #include <asm/apic.h>
+#include <trace/events/irq_vectors.h>
 
 void smp_irq_work_interrupt(struct pt_regs *regs)
 {
 	irq_enter();
 	ack_APIC_irq();
+	trace_irq_work_entry(IRQ_WORK_VECTOR);
 	inc_irq_stat(apic_irq_work_irqs);
 	irq_work_run();
+	trace_irq_work_exit(IRQ_WORK_VECTOR);
 	irq_exit();
 }
 
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index f84f5c5..cc57aba 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -28,6 +28,7 @@
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
+#include <trace/events/irq_vectors.h>
 
 struct nmi_desc {
 	spinlock_t lock;
@@ -482,12 +483,14 @@ do_nmi(struct pt_regs *regs, long error_code)
 	nmi_nesting_preprocess(regs);
 
 	nmi_enter();
+	trace_nmi_entry(NMI_VECTOR);
 
 	inc_irq_stat(__nmi_count);
 
 	if (!ignore_nmis)
 		default_do_nmi(regs);
 
+	trace_nmi_exit(NMI_VECTOR);
 	nmi_exit();
 
 	/* On i386, may loop back to preprocess */
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 48d2b7d..5b2d6de 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -23,6 +23,7 @@
 #include <linux/interrupt.h>
 #include <linux/cpu.h>
 #include <linux/gfp.h>
+#include <trace/events/irq_vectors.h>
 
 #include <asm/mtrr.h>
 #include <asm/tlbflush.h>
@@ -252,8 +253,10 @@ finish:
 void smp_reschedule_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
+	trace_reschedule_entry(RESCHEDULE_VECTOR);
 	inc_irq_stat(irq_resched_count);
 	scheduler_ipi();
+	trace_reschedule_exit(RESCHEDULE_VECTOR);
 	/*
 	 * KVM uses this interrupt to force a cpu out of guest mode
 	 */
@@ -263,8 +266,10 @@ void smp_call_function_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	irq_enter();
+	trace_call_function_entry(CALL_FUNCTION_VECTOR);
 	generic_smp_call_function_interrupt();
 	inc_irq_stat(irq_call_count);
+	trace_call_function_exit(CALL_FUNCTION_VECTOR);
 	irq_exit();
 }
 
@@ -272,8 +277,10 @@ void smp_call_function_single_interrupt(struct pt_regs *regs)
 {
 	ack_APIC_irq();
 	irq_enter();
+	trace_call_function_single_entry(CALL_FUNCTION_SINGLE_VECTOR);
 	generic_smp_call_function_single_interrupt();
 	inc_irq_stat(irq_call_count);
+	trace_call_function_single_exit(CALL_FUNCTION_SINGLE_VECTOR);
 	irq_exit();
 }
 
diff --git a/include/trace/events/irq_vectors.h b/include/trace/events/irq_vectors.h
new file mode 100644
index 0000000..fffe0c0
--- /dev/null
+++ b/include/trace/events/irq_vectors.h
@@ -0,0 +1,209 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM irq_vectors
+
+#if !defined(_TRACE_IRQ_VECTORS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_IRQ_VECTORS_H
+
+#include <linux/tracepoint.h>
+#include <asm/irq.h>
+
+#ifndef irq_vector_name_table
+#define irq_vector_name_table { -1, NULL }
+#endif
+
+
+/*
+ * This class is used by generic ,cross-architecture tracepoints.
+ */
+DECLARE_EVENT_CLASS(irq_vector,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector),
+
+	TP_STRUCT__entry(
+		__field(	int,	vector	)
+	),
+
+	TP_fast_assign(
+		__entry->vector = vector;
+	),
+
+	TP_printk("vector=%d", __entry->vector)
+);
+
+/*
+ * nmi_entry - called before enterring a nmi vector handler
+ */
+DEFINE_EVENT(irq_vector, nmi_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * nmi_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(irq_vector, nmi_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * local_timer_entry - called before enterring a local timer interrupt
+ * vector handler
+ */
+DEFINE_EVENT(irq_vector, local_timer_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * local_timer_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(irq_vector, local_timer_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * reschedule_entry - called before enterring a reschedule vector handler
+ */
+DEFINE_EVENT(irq_vector, reschedule_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * reschedule_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(irq_vector, reschedule_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * call_function_entry - called before enterring a call function
+ *			 vector handler
+ */
+DEFINE_EVENT(irq_vector, call_function_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * call_function_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(irq_vector, call_function_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * call_function_single_entry - called before enterring a call function
+ * single vector handler
+ */
+DEFINE_EVENT(irq_vector, call_function_single_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * call_function_single_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(irq_vector, call_function_single_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * irq_work_entry - called before enterring an irq work vector handler
+ */
+DEFINE_EVENT(irq_vector, irq_work_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * irq_work_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(irq_vector, irq_work_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * This class is used by arch-specific tracepoints.
+ */
+DECLARE_EVENT_CLASS(arch_irq_vector,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector),
+
+	TP_STRUCT__entry(
+		__field(	int,	vector	)
+	),
+
+	TP_fast_assign(
+		__entry->vector = vector;
+	),
+
+	TP_printk("vector=%d name=%s", __entry->vector,
+		__print_symbolic(__entry->vector, irq_vector_name_table))
+);
+
+/*
+ * arch_irq_vector_entry - called before enterring a interrupt vector handler
+ */
+DEFINE_EVENT(arch_irq_vector, arch_irq_vector_entry,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+/*
+ * arch_irq_vector_exit - called immediately after the interrupt vector
+ * handler returns
+ */
+DEFINE_EVENT(arch_irq_vector, arch_irq_vector_exit,
+
+	TP_PROTO(int vector),
+
+	TP_ARGS(vector)
+);
+
+#endif /*  _TRACE_IRQ_VECTORS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
-- 1.7.1 


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-09-22  0:40 [PATCH v4] trace,x86: add x86 irq vector tracepoints Seiji Aguchi
@ 2012-09-25  3:38 ` H. Peter Anvin
  2012-09-27 22:33   ` Seiji Aguchi
  0 siblings, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2012-09-25  3:38 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, rostedt, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On 09/21/2012 05:40 PM, Seiji Aguchi wrote:
>
>     Those x86 specific ones are not really frequently raised vectors, so
>     enabling them all won't affect performance and readability of the
>     traces too much.
>

... except the cost can be reduced to zero *AND* be made into a more 
general mechanism by simply hooking the IDT.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-09-25  3:38 ` H. Peter Anvin
@ 2012-09-27 22:33   ` Seiji Aguchi
  2012-09-28  2:25     ` H. Peter Anvin
  0 siblings, 1 reply; 15+ messages in thread
From: Seiji Aguchi @ 2012-09-27 22:33 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, rostedt, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1123 bytes --]

Hi,

> ... except the cost can be reduced to zero *AND* be made into a more general mechanism by simply hooking the IDT.

Thank you for giving me the comment.
In my understanding,  we can introduce a more general mechanism by sandwiching an existing handler between tracepoints.
The pseudo code is like this:

@@ -17,7 +18,7 @@ static void default_threshold_interrupt(void)
 
 void (*mce_threshold_vector)(void) = default_threshold_interrupt;
 
-asmlinkage void smp_threshold_interrupt(void)
+static void do_smp_threshold_interrupt(void)
 {
 	irq_enter();
 	exit_idle();
@@ -27,3 +28,10 @@ asmlinkage void smp_threshold_interrupt(void)
 	/* Ack only at the end to avoid potential reentry */
 	ack_APIC_irq();
 }
+
+asmlinkage void smp_threshold_interrupt(void) {
+	trace_arch_irq_vector_entry(THRESHOLD_APIC_VECTOR);
+	do_smp_threshold_interrupt();
+	trace_arch_irq_vector_exit(THRESHOLD_APIC_VECTOR);
+}

If I misunderstand something, please let me know.

Seiji
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-09-27 22:33   ` Seiji Aguchi
@ 2012-09-28  2:25     ` H. Peter Anvin
  2012-10-02 19:10       ` Seiji Aguchi
  0 siblings, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2012-09-28  2:25 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, rostedt, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On 09/27/2012 03:33 PM, Seiji Aguchi wrote:
> Hi,
> 
>> ... except the cost can be reduced to zero *AND* be made into a more general mechanism by simply hooking the IDT.
> 
> Thank you for giving me the comment.
> In my understanding,  we can introduce a more general mechanism by sandwiching an existing handler between tracepoints.
> The pseudo code is like this:
> 
> @@ -17,7 +18,7 @@ static void default_threshold_interrupt(void)
>  
>  void (*mce_threshold_vector)(void) = default_threshold_interrupt;
>  
> -asmlinkage void smp_threshold_interrupt(void)
> +static void do_smp_threshold_interrupt(void)
>  {
>  	irq_enter();
>  	exit_idle();
> @@ -27,3 +28,10 @@ asmlinkage void smp_threshold_interrupt(void)
>  	/* Ack only at the end to avoid potential reentry */
>  	ack_APIC_irq();
>  }
> +
> +asmlinkage void smp_threshold_interrupt(void) {
> +	trace_arch_irq_vector_entry(THRESHOLD_APIC_VECTOR);
> +	do_smp_threshold_interrupt();
> +	trace_arch_irq_vector_exit(THRESHOLD_APIC_VECTOR);
> +}
> 
> If I misunderstand something, please let me know.
> 

Quite.

These functions are being invoked from the IDT, which is an indirect
pointer structure.  When not being traced, there is absolutely no reason
why it should go through a thunk with tracepoints.

	-hpa




^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-09-28  2:25     ` H. Peter Anvin
@ 2012-10-02 19:10       ` Seiji Aguchi
  2012-10-05 14:13         ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Seiji Aguchi @ 2012-10-02 19:10 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, rostedt, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2542 bytes --]

> >
> > If I misunderstand something, please let me know.
> >
> 
> Quite.
> 
> These functions are being invoked from the IDT, which is an indirect pointer structure.  When not being traced, there is absolutely no
> reason why it should go through a thunk with tracepoints.

I agree that the cost can be absolutely zero by switching  each interrupt hander when turning on/off the tracepoint.

But I would like to talk about a time penalty of a tracepoint more.

When not being traced, the tracepoint is just nop.
And it is described in the documentation below.
So, the tracepoint seems to be designed to add to performance critical paths.

Documentation/trace/tracepoints.txt
<snip>
* Purpose of tracepoints

A tracepoint placed in code provides a hook to call a function (probe) that you can provide at runtime. A tracepoint can be "on" (a probe is connected to it) or "off" (no probe is attached). When a tracepoint is "off" it has no effect, except for adding a tiny time penalty (checking a condition for a branch) and space penalty (adding a few bytes for the function call at the end of the instrumented function and adds a data structure in a separate section).  When a tracepoint is "on", the function you provide is called each time the tracepoint is executed, in the execution context of the caller. When the function provided ends its execution, it returns to the caller (continuing from the tracepoint site).

You can put tracepoints at important locations in the code. They are lightweight hooks that can pass an arbitrary number of parameters, which prototypes are described in a tracepoint declaration placed in a header file.
<snip>

Also, as I submitted an actual latency measurement, the time penalty is almost zero.

<snip>
(1-1) local_timer_entry

 - 3.6-rc6 original
   	<...>-2788    2dNh. 894834337us : exit_idle <-smp_apic_timer_interrupt
   	<...>-2788    2dNh. 894834337us : hrtimer_interrupt <-smp_apic_timer_interrupt

 - 3.6-rc6 + this patch + trace off
   	<...>-1981    0d.h. 210538us : exit_idle <-smp_apic_timer_interrupt
   	<...>-1981    0d.h. 210538us : hrtimer_interrupt <-smp_apic_timer_interrupt
<snip>

When switching each interrupt handler, all cpus have to be interrupt-disable with smp_call_funciton() or something like that.

IMO, rather than doing such a complex thing, just adding a tracepoint is reasonable.

What do you think?

Seiji
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-02 19:10       ` Seiji Aguchi
@ 2012-10-05 14:13         ` Steven Rostedt
  2012-10-06  0:16           ` H. Peter Anvin
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2012-10-05 14:13 UTC (permalink / raw)
  To: Seiji Aguchi
  Cc: H. Peter Anvin, Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Tue, 2012-10-02 at 19:10 +0000, Seiji Aguchi wrote:
> > >
> > > If I misunderstand something, please let me know.
> > >
> > 
> > Quite.
> > 
> > These functions are being invoked from the IDT, which is an indirect pointer structure.  When not being traced, there is absolutely no
> > reason why it should go through a thunk with tracepoints.
> 
> I agree that the cost can be absolutely zero by switching  each interrupt hander when turning on/off the tracepoint.
> 

Peter,

I agree that the IDT version is a zero cost in performance, where as the
tracepoint version is a negligible cost in performance. But my worry is
the complexity (read error prone and possible openings of security
exploits) worth it?

Switching of the IDT is not that trivial, and to make it something for
common activities such as reading tracepoints by tools like ftrace and
perf, that do it often, even on production machines, may lead to issues
if its not done right.

You are the maintainer and are responsible for the outcome of changes to
the x86 arch, thus you do have final say. And if you think there's
nothing to worry about with an IDT change then Seiji should implement
it.

I just want to point out some possible repercussions of doing it in a
more complex way. As tracepoints use nops, and I may be pushing to even
out-of-line the tracepoint unlikely part even more, I'm not sure the
complexity is worth the amount of savings it would be against just
adding the tracepoint in the code.

-- Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-05 14:13         ` Steven Rostedt
@ 2012-10-06  0:16           ` H. Peter Anvin
  2012-10-06  2:57             ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2012-10-06  0:16 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Seiji Aguchi, Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On 10/05/2012 07:13 AM, Steven Rostedt wrote:
> 
> Peter,
> 
> I agree that the IDT version is a zero cost in performance, where as the
> tracepoint version is a negligible cost in performance. But my worry is
> the complexity (read error prone and possible openings of security
> exploits) worth it?
> 
> Switching of the IDT is not that trivial, and to make it something for
> common activities such as reading tracepoints by tools like ftrace and
> perf, that do it often, even on production machines, may lead to issues
> if its not done right.
> 

It's a table of pointers... there really isn't anything magic about it
(except perhaps the slightly weird format.)

> You are the maintainer and are responsible for the outcome of changes to
> the x86 arch, thus you do have final say. And if you think there's
> nothing to worry about with an IDT change then Seiji should implement
> it.
> 
> I just want to point out some possible repercussions of doing it in a
> more complex way. As tracepoints use nops, and I may be pushing to even
> out-of-line the tracepoint unlikely part even more, I'm not sure the
> complexity is worth the amount of savings it would be against just
> adding the tracepoint in the code.

The problem I'm seeing is the constant "oh, just a little bit more."  My
experience over the years is that there is always demand for "just one
more debug feature", each of which has negible cost, because they always
use the previous thing as a baseline... noone ever looks at the grand
total cost of the package (and by the time that happens, it is too late.)

tracepoints in particular are something I'm getting concerned about,
because they are one of those things that turn kernel internals into an
ABI, which means misdesigned tracepoints can actually make kernel
internal changes very hard to do.  The cost of those kinds of issues go
up over time as the strain between where we'd like the code to be vs.
where the code is increases.

	-hpa



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06  0:16           ` H. Peter Anvin
@ 2012-10-06  2:57             ` Steven Rostedt
  2012-10-06 13:05               ` Borislav Petkov
  2012-10-07 15:25               ` H. Peter Anvin
  0 siblings, 2 replies; 15+ messages in thread
From: Steven Rostedt @ 2012-10-06  2:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Seiji Aguchi, Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Fri, 2012-10-05 at 17:16 -0700, H. Peter Anvin wrote:
> On 10/05/2012 07:13 AM, Steven Rostedt wrote:
> > 
> > Peter,
> > 
> > I agree that the IDT version is a zero cost in performance, where as the
> > tracepoint version is a negligible cost in performance. But my worry is
> > the complexity (read error prone and possible openings of security
> > exploits) worth it?
> > 
> > Switching of the IDT is not that trivial, and to make it something for
> > common activities such as reading tracepoints by tools like ftrace and
> > perf, that do it often, even on production machines, may lead to issues
> > if its not done right.
> > 
> 
> It's a table of pointers... there really isn't anything magic about it
> (except perhaps the slightly weird format.)

I didn't say anything magic, but a table of pointers that are very
critical for the system running. Should we implement it with a single
switch, like we discussed in San Diego to do with the system call table?

That is, have a "normal" table, and a "trace" table. The trace table
points to functions that have tracepoints. The first enabler of tracing
switches the table to use the tracepoints, and the last disabler
switches it back?

> 
> > You are the maintainer and are responsible for the outcome of changes to
> > the x86 arch, thus you do have final say. And if you think there's
> > nothing to worry about with an IDT change then Seiji should implement
> > it.
> > 
> > I just want to point out some possible repercussions of doing it in a
> > more complex way. As tracepoints use nops, and I may be pushing to even
> > out-of-line the tracepoint unlikely part even more, I'm not sure the
> > complexity is worth the amount of savings it would be against just
> > adding the tracepoint in the code.
> 
> The problem I'm seeing is the constant "oh, just a little bit more."  My
> experience over the years is that there is always demand for "just one
> more debug feature", each of which has negible cost, because they always
> use the previous thing as a baseline... noone ever looks at the grand
> total cost of the package (and by the time that happens, it is too late.)

Now I can turn this back at you ;-)  We can implement the simple "just
add the tracepoints in the code" first, and then later implement the
table swap version and we can say "hey! we just made the code faster!".

> 
> tracepoints in particular are something I'm getting concerned about,
> because they are one of those things that turn kernel internals into an
> ABI, which means misdesigned tracepoints can actually make kernel
> internal changes very hard to do.  The cost of those kinds of issues go
> up over time as the strain between where we'd like the code to be vs.
> where the code is increases.

Honestly, I'm extremely concerned about this too. In fact, I've bitched
about this so many times in the past, but it just fell to deaf ears:

http://lwn.net/Articles/412685/
http://lwn.net/Articles/415591/
http://lwn.net/Articles/416665/
http://lwn.net/Articles/416684/

-- Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06  2:57             ` Steven Rostedt
@ 2012-10-06 13:05               ` Borislav Petkov
  2012-10-06 14:51                 ` Steven Rostedt
  2012-10-07 15:25               ` H. Peter Anvin
  1 sibling, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2012-10-06 13:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Seiji Aguchi,
	Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Fri, Oct 05, 2012 at 10:57:41PM -0400, Steven Rostedt wrote:
> > The problem I'm seeing is the constant "oh, just a little bit more."  My
> > experience over the years is that there is always demand for "just one
> > more debug feature", each of which has negible cost, because they always
> > use the previous thing as a baseline... noone ever looks at the grand
> > total cost of the package (and by the time that happens, it is too late.)
> 
> Now I can turn this back at you ;-)  We can implement the simple "just
> add the tracepoints in the code" first, and then later implement the
> table swap version and we can say "hey! we just made the code faster!".

I absolutely agree with hpa here - it's like he's reading my mind. The
sum of the total cost of all those features simply and surely slows down
the kernel with time and if we don't pay attention, we might get bogged
down with fat no matter the IPC improvements hw guys give us. Which are
not endless, btw, in case someone wonders.

What I'm missing with all those patches on LKML is: here's a patch that
doesn't add a new feature but gives us n% improv with this and that
workload. I wish we had more of those instead of the gazillion new
features each 3 months.

> > tracepoints in particular are something I'm getting concerned about,
> > because they are one of those things that turn kernel internals into an
> > ABI, which means misdesigned tracepoints can actually make kernel
> > internal changes very hard to do.  The cost of those kinds of issues go
> > up over time as the strain between where we'd like the code to be vs.
> > where the code is increases.
> 
> Honestly, I'm extremely concerned about this too. In fact, I've bitched
> about this so many times in the past, but it just fell to deaf ears:
> 
> http://lwn.net/Articles/412685/
> http://lwn.net/Articles/415591/
> http://lwn.net/Articles/416665/
> http://lwn.net/Articles/416684/

Absolutely agreed too. This is why we had such a long discussion about
the RAS tracepoint format recently, for example.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06 13:05               ` Borislav Petkov
@ 2012-10-06 14:51                 ` Steven Rostedt
  2012-10-06 17:32                   ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2012-10-06 14:51 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, Seiji Aguchi,
	Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Sat, 2012-10-06 at 15:05 +0200, Borislav Petkov wrote:

> What I'm missing with all those patches on LKML is: here's a patch that
> doesn't add a new feature but gives us n% improv with this and that
> workload. I wish we had more of those instead of the gazillion new
> features each 3 months.

That would be nice too. But we can also add a patch that gives us
negligible improvement that makes things a little more complex and also
opens the possibility of a security hole.

Thus my question is, is the swap IDT really worth it? I'm fine if
someone goes ahead and implements it. Heck, I'd love to implement it
when I have time, as it refreshes my knowledge of how intel archs do
interrupt processing.

I'm just worried that we are adding more complexity to code for very
little gain.

I think we need to take another look at this.

1) Are the tracepoints that Seiji worth adding? If not then we can stop
the discussion here.

2) Are the tracepoints done in a way that it's not going to cause "ABI"
issues. If not then we need to redesign the tracepoints.

3) If we are here, then we have tracepoints that are worth adding and
are done in a way that they should be stable for the long term. OK, how
to implement them?  The question really is, should we keep it 0% impact
when not active by the IDT switch or allow for the negligible impact by
adding the tracepoints into the code directly and not worrying about it.

  a) The tracepoints are going in the code regardless. Even with a 
  switch we need to have a duplicate of the calls, one with and one
  without the tracepoints.

  b) It can be done with one big change: add the tracepoints and do the
  duplicate with and without versions for the IDT switch. Or we can
  break it into two parts. First, add the tracepoints, then add the
  switch with the duplicates. I prefer this method if we are doing
  the switch.

I expect that if we do the switch we would have something like this:

void __irq_entry smp_apic_timer_interrupt(struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	/*
	 * NOTE! We'd better ACK the irq immediately,
	 * because timer handling can be slow.
	 */
	ack_APIC_irq();
	/*
	 * update_process_times() expects us to have done irq_enter().
	 * Besides, if we don't timer interrupts ignore the global
	 * interrupt lock, which is the WrongThing (tm) to do.
	 */
	irq_enter();
	exit_idle();
	local_apic_timer_interrupt();
	irq_exit();

	set_irq_regs(old_regs);
}

void __irq_entry trace_smp_apic_timer_interrupt(struct pt_regs *regs)
{
	struct pt_regs *old_regs = set_irq_regs(regs);

	/*
	 * NOTE! We'd better ACK the irq immediately,
	 * because timer handling can be slow.
	 */
	ack_APIC_irq();
	/*
	 * update_process_times() expects us to have done irq_enter().
	 * Besides, if we don't timer interrupts ignore the global
	 * interrupt lock, which is the WrongThing (tm) to do.
	 */
	irq_enter();
	exit_idle();
	trace_local_timer_entry(LOCAL_TIMER_VECTOR);
	local_apic_timer_interrupt();
	trace_local_timer_exit(LOCAL_TIMER_VECTOR);
	irq_exit();

	set_irq_regs(old_regs);
}

Now we have two functions accomplishing the same task. Any change to one
must be done to change the other. Due to rcu idle, the tracepoint needs
to be after the exit_idle() and before irq_exit().

We could force the two to be in step by using ugly macro magic:

#define APIC_TIMER_INTERRUPT(trace, trace_enter, trace_exit)	\
void __irq_entry trace##_smp_apic_timer_interrupt(struct pt_regs *regs) \
{ \
	struct pt_regs *old_regs = set_irq_regs(regs); \
\
	/* \
	 * NOTE! We'd better ACK the irq immediately, \
	 * because timer handling can be slow. \
	 */ \
	ack_APIC_irq(); \
	/* \
	 * update_process_times() expects us to have done irq_enter(). \
	 * Besides, if we don't timer interrupts ignore the global \
	 * interrupt lock, which is the WrongThing (tm) to do. \
	 */ \
	irq_enter(); \
	exit_idle(); \
	trace_enter; \
	local_apic_timer_interrupt(); \
	trace_exit; \
	irq_exit(); \
 \
	set_irq_regs(old_regs); \
}

APIC_TIMER_INTERRUPT(,,)
APIC_TIMER_INTERRUPT(trace,trace_local_timer_entry(LOCAL_TIMER_VECTOR), trace_local_timer_exit(LOCAL_TIMER_VECTOR))
 

But I'm not sure we want to go there.

-- Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06 14:51                 ` Steven Rostedt
@ 2012-10-06 17:32                   ` Borislav Petkov
  2012-10-06 18:26                     ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Borislav Petkov @ 2012-10-06 17:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Seiji Aguchi,
	Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Sat, Oct 06, 2012 at 10:51:45AM -0400, Steven Rostedt wrote:
> On Sat, 2012-10-06 at 15:05 +0200, Borislav Petkov wrote:
> > What I'm missing with all those patches on LKML is: here's a patch that
> > doesn't add a new feature but gives us n% improv with this and that
> > workload. I wish we had more of those instead of the gazillion new
> > features each 3 months.
> 
> That would be nice too. But we can also add a patch that gives us
> negligible improvement that makes things a little more complex and also
> opens the possibility of a security hole.
> 
> Thus my question is, is the swap IDT really worth it? I'm fine if
> someone goes ahead and implements it. Heck, I'd love to implement it
> when I have time, as it refreshes my knowledge of how intel archs do
> interrupt processing.
> 
> I'm just worried that we are adding more complexity to code for very
> little gain.
> 
> I think we need to take another look at this.
> 
> 1) Are the tracepoints that Seiji worth adding? If not then we can stop
> the discussion here.

I like straight talk - it saves everybody a lot of time :-)

But seriously, I was adressing the general observation how a lot of new
features get added because "it would be cool if we could do X and Y" and
how we're progressively getting fatter and slowing down over time.

And I like how you're giving that feature a hard look - something we
should be doing always, btw.

So http://marc.info/?l=linux-kernel&m=134827445716419 talks about how it
is good to know which cores handle IRQs and how this affects the system,
and IRQ interaction and yadda yadda... But frankly speaking, that still
doesn't give me a hard-on; I gotta say - it sounds more like a debugging
feature which people can enable, with certain overhead like most of
those in "Kernel hacking" but the general public doesn't need it.

So, do we really really need this or are we better off with a debugging
design where we don't care about overhead?

Hmm, I'd say make it off by default and let people who want it enable it
and go crazy.

> 2) Are the tracepoints done in a way that it's not going to cause "ABI"
> issues. If not then we need to redesign the tracepoints.

Btw, this we should be asking ourselves about *all* TPs, especially if
they're in generic code.

[ … ]

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06 17:32                   ` Borislav Petkov
@ 2012-10-06 18:26                     ` Steven Rostedt
  2012-10-06 23:33                       ` Borislav Petkov
  0 siblings, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2012-10-06 18:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: H. Peter Anvin, Seiji Aguchi,
	Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Sat, 2012-10-06 at 19:32 +0200, Borislav Petkov wrote:

> > 2) Are the tracepoints done in a way that it's not going to cause "ABI"
> > issues. If not then we need to redesign the tracepoints.
> 
> Btw, this we should be asking ourselves about *all* TPs, especially if
> they're in generic code.

I agree, and I'm starting to think I shouldn't have given free reign
over the TPs to system maintainers. That is, I should have pushed harder
to understand all tracepoints added to code to make sure the maintainer
knows that it can become an ABI.

Some maintainers don't worry about it. But I can see it coming back to
haunt them. In the end, it will hurt the maintainer of the code, which
is why I gave the ownership of tracepoints to locations where they are
at (instead of a "joint" ownership). But I probably should have been a
TP cop for a while to allow them to understand the consequences first.

-- Steve



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06 18:26                     ` Steven Rostedt
@ 2012-10-06 23:33                       ` Borislav Petkov
  0 siblings, 0 replies; 15+ messages in thread
From: Borislav Petkov @ 2012-10-06 23:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, Seiji Aguchi,
	Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On Sat, Oct 06, 2012 at 02:26:17PM -0400, Steven Rostedt wrote:
> On Sat, 2012-10-06 at 19:32 +0200, Borislav Petkov wrote:
> 
> > > 2) Are the tracepoints done in a way that it's not going to cause "ABI"
> > > issues. If not then we need to redesign the tracepoints.
> > 
> > Btw, this we should be asking ourselves about *all* TPs, especially if
> > they're in generic code.
> 
> I agree, and I'm starting to think I shouldn't have given free reign
> over the TPs to system maintainers. That is, I should have pushed harder
> to understand all tracepoints added to code to make sure the maintainer
> knows that it can become an ABI.
> 
> Some maintainers don't worry about it. But I can see it coming back to
> haunt them. In the end, it will hurt the maintainer of the code, which
> is why I gave the ownership of tracepoints to locations where they are
> at (instead of a "joint" ownership). But I probably should have been a
> TP cop for a while to allow them to understand the consequences first.

Yeah, even if you were the TP cop and had a shiny uniform with a badge
8-), do you think you'd have the time to review all the code adding TPs?

I think maybe it would've been better to add some text to Documentation
explaining with what care TPs should be designed, have checkpatch warn
on all new tracepoints, hope for the best and prepare for the worst. In
addition maybe review all TPs added to generic or arch-you-care-about
code. Maybe...

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-06  2:57             ` Steven Rostedt
  2012-10-06 13:05               ` Borislav Petkov
@ 2012-10-07 15:25               ` H. Peter Anvin
  2012-10-11 17:04                 ` Seiji Aguchi
  1 sibling, 1 reply; 15+ messages in thread
From: H. Peter Anvin @ 2012-10-07 15:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Seiji Aguchi, Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

On 10/06/2012 10:57 AM, Steven Rostedt wrote:
> 
> I didn't say anything magic, but a table of pointers that are very
> critical for the system running. Should we implement it with a single
> switch, like we discussed in San Diego to do with the system call table?
> 
> That is, have a "normal" table, and a "trace" table. The trace table
> points to functions that have tracepoints. The first enabler of tracing
> switches the table to use the tracepoints, and the last disabler
> switches it back?
> 

That is certainly a reasonable implementation option.  It is slightly
less usable than it is for system calls, though, because the vectors in
the IDTs are somewhat scrambled and so you can't just do an indirect
jump to the original vector content.  This does get messy because you
also want to preserve registers...

>>
>>> You are the maintainer and are responsible for the outcome of changes to
>>> the x86 arch, thus you do have final say. And if you think there's
>>> nothing to worry about with an IDT change then Seiji should implement
>>> it.
>>>
>>> I just want to point out some possible repercussions of doing it in a
>>> more complex way. As tracepoints use nops, and I may be pushing to even
>>> out-of-line the tracepoint unlikely part even more, I'm not sure the
>>> complexity is worth the amount of savings it would be against just
>>> adding the tracepoint in the code.
>>
>> The problem I'm seeing is the constant "oh, just a little bit more."  My
>> experience over the years is that there is always demand for "just one
>> more debug feature", each of which has negible cost, because they always
>> use the previous thing as a baseline... noone ever looks at the grand
>> total cost of the package (and by the time that happens, it is too late.)
> 
> Now I can turn this back at you ;-)  We can implement the simple "just
> add the tracepoints in the code" first, and then later implement the
> table swap version and we can say "hey! we just made the code faster!".

Can we?  My understanding how tracepoints are is that they export a
bunch of data structures via debugfs (a.k.a. sploitfs -- any system with
debugfs mounted really should be considered compromised from the start)
and that they are intimately tied to how they are implemented.

>>
>> tracepoints in particular are something I'm getting concerned about,
>> because they are one of those things that turn kernel internals into an
>> ABI, which means misdesigned tracepoints can actually make kernel
>> internal changes very hard to do.  The cost of those kinds of issues go
>> up over time as the strain between where we'd like the code to be vs.
>> where the code is increases.
> 
> Honestly, I'm extremely concerned about this too. In fact, I've bitched
> about this so many times in the past, but it just fell to deaf ears:
> 
> http://lwn.net/Articles/412685/
> http://lwn.net/Articles/415591/
> http://lwn.net/Articles/416665/
> http://lwn.net/Articles/416684/
> 

Slightly different complaint, but in the same general vein, yes.

	-hpa




^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH v4] trace,x86: add x86 irq vector tracepoints
  2012-10-07 15:25               ` H. Peter Anvin
@ 2012-10-11 17:04                 ` Seiji Aguchi
  0 siblings, 0 replies; 15+ messages in thread
From: Seiji Aguchi @ 2012-10-11 17:04 UTC (permalink / raw)
  To: H. Peter Anvin, Steven Rostedt
  Cc: Thomas Gleixner (tglx@linutronix.de),
	linux-kernel, 'mingo@elte.hu' (mingo@elte.hu),
	x86, dle-develop, Satoru Moriya

> >
> > I didn't say anything magic, but a table of pointers that are very
> > critical for the system running. Should we implement it with a single
> > switch, like we discussed in San Diego to do with the system call table?
> >
> > That is, have a "normal" table, and a "trace" table. The trace table
> > points to functions that have tracepoints. The first enabler of
> > tracing switches the table to use the tracepoints, and the last
> > disabler switches it back?
> >
> 
> That is certainly a reasonable implementation option.  It is slightly less usable than it is for system calls, though, because the vectors in
> the IDTs are somewhat scrambled and so you can't just do an indirect jump to the original vector content.  This does get messy
> because you also want to preserve registers...
> 

Peter, Steven, 

Thank you for explaining the reason why you think a time penalty should be zero
and discussing its implementation.

I will update my patch so that a time penalty makes zero and submit it shortly.

Seiji

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-10-11 17:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-22  0:40 [PATCH v4] trace,x86: add x86 irq vector tracepoints Seiji Aguchi
2012-09-25  3:38 ` H. Peter Anvin
2012-09-27 22:33   ` Seiji Aguchi
2012-09-28  2:25     ` H. Peter Anvin
2012-10-02 19:10       ` Seiji Aguchi
2012-10-05 14:13         ` Steven Rostedt
2012-10-06  0:16           ` H. Peter Anvin
2012-10-06  2:57             ` Steven Rostedt
2012-10-06 13:05               ` Borislav Petkov
2012-10-06 14:51                 ` Steven Rostedt
2012-10-06 17:32                   ` Borislav Petkov
2012-10-06 18:26                     ` Steven Rostedt
2012-10-06 23:33                       ` Borislav Petkov
2012-10-07 15:25               ` H. Peter Anvin
2012-10-11 17:04                 ` Seiji Aguchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).