All of lore.kernel.org
 help / color / mirror / Atom feed
* Cleanup and fixes for power trace events
@ 2010-10-26 23:43 Thomas Renninger
  2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users


[PATCH 1/4] PERF: Do not export power_frequency, but power_start event
[PATCH 2/4] PERF(kernel): Cleanup power events V3
[PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
[PATCH 4/4] PERF: fix power:cpu_idle double end events


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/4] PERF: Do not export power_frequency, but power_start event
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
  2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` [PATCH 2/4] PERF(kernel): Cleanup power events V3 Thomas Renninger
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

power_frequency moved to drivers/cpufreq/cpufreq.c which has
to be compiled in, no need to export it.

intel_idle can a be module though...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 drivers/idle/intel_idle.c   |    2 --
 kernel/trace/power-traces.c |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c37ef64..21ac077 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,9 +201,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-#ifndef MODULE
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-#endif
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index a22582a..0e0497d 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
+EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
 
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/4] PERF: Do not export power_frequency, but power_start event
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger

power_frequency moved to drivers/cpufreq/cpufreq.c which has
to be compiled in, no need to export it.

intel_idle can a be module though...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 drivers/idle/intel_idle.c   |    2 --
 kernel/trace/power-traces.c |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c37ef64..21ac077 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,9 +201,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-#ifndef MODULE
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-#endif
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index a22582a..0e0497d 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
+EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
 
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] PERF(kernel): Cleanup power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
  2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events
  - Fixed a copy and paste bug (poll_idle should throw and event
    state of zero, not 1)

Changes in V2:
  - Introduce PWR_EVENT_EXIT instead of 0 to mark non-power state
  - Use u32 instead of u64 for cpuid, state which is by far enough

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced, a first implementation
comes from the ARM side, but it's easy to add these events
in X86 as well if needed.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   14 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declartion */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..0b5c841 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,20 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/4] PERF(kernel): Cleanup power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (2 preceding siblings ...)
  2010-10-26 23:43 ` [PATCH 2/4] PERF(kernel): Cleanup power events V3 Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events
  - Fixed a copy and paste bug (poll_idle should throw and event
    state of zero, not 1)

Changes in V2:
  - Introduce PWR_EVENT_EXIT instead of 0 to mark non-power state
  - Use u32 instead of u64 for cpuid, state which is by far enough

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced, a first implementation
comes from the ARM side, but it's easy to add these events
in X86 as well if needed.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   14 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declartion */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..0b5c841 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,20 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (4 preceding siblings ...)
  2010-10-26 23:43 ` [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events

Changes in V2:
  - Hanlde PWR_EVENT_EXIT instead of 0 to recon non-power state

The transition was rather smooth, only part I had to fiddle
some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 tools/perf/builtin-timechart.c |   89 ++++++++++++++++++++++++++++++++-------
 tools/perf/util/parse-events.c |   43 +++++++++++++++++++-
 tools/perf/util/parse-events.h |    1 +
 3 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..bcc16b0 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT 0xFFFFFFFF
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,25 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u64	state;
+	u64	cpu_id;
+};
+
+struct power_suspend_entry {
+	struct trace_entry te;
+	u64	state;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +506,46 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		else if (strcmp(event_str, "power:power_start") == 0)
+			c_state_start(peo->cpu_id, data.time, peo->value);
+
+		else if (strcmp(event_str, "power:power_end") == 0)
+			c_state_end(peo->cpu_id, data.time);
+
+		else if (strcmp(event_str, "power:power_frequency") == 0)
+			p_state_change(peo->cpu_id, data.time, peo->value);
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1002,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1015,38 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
 
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (is_valid_tracepoint("power:power_start")) {
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+	
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..d706dcb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -824,7 +824,7 @@ int parse_events(const struct option *opt __used, const char *str, int unset __u
 		if (ret != EVT_HANDLED_ALL) {
 			attrs[nr_counters] = attr;
 			nr_counters++;
-		}
+	}
 
 		if (*str == 0)
 			break;
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (3 preceding siblings ...)
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events

Changes in V2:
  - Hanlde PWR_EVENT_EXIT instead of 0 to recon non-power state

The transition was rather smooth, only part I had to fiddle
some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 tools/perf/builtin-timechart.c |   89 ++++++++++++++++++++++++++++++++-------
 tools/perf/util/parse-events.c |   43 +++++++++++++++++++-
 tools/perf/util/parse-events.h |    1 +
 3 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..bcc16b0 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT 0xFFFFFFFF
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,25 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u64	state;
+	u64	cpu_id;
+};
+
+struct power_suspend_entry {
+	struct trace_entry te;
+	u64	state;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +506,46 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		else if (strcmp(event_str, "power:power_start") == 0)
+			c_state_start(peo->cpu_id, data.time, peo->value);
+
+		else if (strcmp(event_str, "power:power_end") == 0)
+			c_state_end(peo->cpu_id, data.time);
+
+		else if (strcmp(event_str, "power:power_frequency") == 0)
+			p_state_change(peo->cpu_id, data.time, peo->value);
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1002,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1015,38 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
 
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (is_valid_tracepoint("power:power_start")) {
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+	
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..d706dcb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -824,7 +824,7 @@ int parse_events(const struct option *opt __used, const char *str, int unset __u
 		if (ret != EVT_HANDLED_ALL) {
 			attrs[nr_counters] = attr;
 			nr_counters++;
-		}
+	}
 
 		if (*str == 0)
 			break;
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (5 preceding siblings ...)
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
  7 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Robert Schoene

cpu_idle events (transition into sleep state and exiting) are
both fired in pm_idle().

Entering sleep state and exiting should always get fired inside
pm_idle() already.

This is a revert of commit c882e0feb937af4e5b991cbd1c

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |    4 ----
 arch/x86/kernel/process_64.c |    6 ------
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (6 preceding siblings ...)
  2010-10-26 23:43 ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
  7 siblings, 2 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger, Robert Schoene

cpu_idle events (transition into sleep state and exiting) are
both fired in pm_idle().

Entering sleep state and exiting should always get fired inside
pm_idle() already.

This is a revert of commit c882e0feb937af4e5b991cbd1c

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |    4 ----
 arch/x86/kernel/process_64.c |    6 ------
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-27 15:42   ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-27 15:42 UTC (permalink / raw)
  To: arjan
  Cc: jean.pihet, Robert Schoene, linux-trace-users, mingo, linux-omap,
	linux-pm

On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> cpu_idle events (transition into sleep state and exiting) are
> both fired in pm_idle().
> 
> Entering sleep state and exiting should always get fired inside
> pm_idle() already.
> 
> This is a revert of commit c882e0feb937af4e5b991cbd1c
Robert: I expect you tested this on a machine with no cpuidle
driver registered?

I should have had a deeper look at this at once, done so now:

Current cpu_idle, power_start/end (same before my changes) behavior
on X86 is rather weird (without this patch):

if pm_idle is:
   poll_idle -> should throw double end events

   default_idle -> only throws power_start, your patch
                   fixed that, but in the generic cpu_idle
                   thread function which always gets executed
                   also if pm_idle != default_idle
                   So Robert fixed this case, but at the wrong place.

   cpuidle_idle_call -> depends whether intel_idle or acpi_idle
                        driver registered:

        intel_idle -> throws a cpu idle state event, still double end
                      events (one from cpuidle, one from
                      process_{32,64}.c, due to Robert's patch

        acpi_idle  -> whether a power_start event is thrown at all
                      depends on (cmp with acpi_idle_do_entry()):
                      cx->entry_method == ACPI_CSTATE_FFH
                      will end up in a power_start event via:
                      acpi_processor_ffh_cstate_enter(cx)
                           mwait_idle_with_hints()
                      but it will not in case of:
                      cx->entry_method == ACPI_CSTATE_HALT
                      or IO based switching (the else path there):
                      /* IO port based C-state */
                      

Again without this(my) patch you get:
   poll_idle                  -> double end events
   default_idle               -> all is fine (with your patch)
   cpuidle_idle_call
        intel_idle registered -> double end events
        acpi_idle registered  -> double end events
                                 start events may not be thrown
                                 at all.

perf timechart can handle double end events, this may be
the reason this was overseen.

Argh, I tried to come up with patches, but run out of
time. I will send something soon.
I also found a bug in my userspace stuff: I forgot to convert
u64 to u32..., sorry about that.
I try to resend everything tomorrow.

     Thomas

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
@ 2010-10-27 15:42   ` Thomas Renninger
  2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
                       ` (3 more replies)
  1 sibling, 4 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-27 15:42 UTC (permalink / raw)
  To: arjan
  Cc: jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users,
	Robert Schoene

On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> cpu_idle events (transition into sleep state and exiting) are
> both fired in pm_idle().
> 
> Entering sleep state and exiting should always get fired inside
> pm_idle() already.
> 
> This is a revert of commit c882e0feb937af4e5b991cbd1c
Robert: I expect you tested this on a machine with no cpuidle
driver registered?

I should have had a deeper look at this at once, done so now:

Current cpu_idle, power_start/end (same before my changes) behavior
on X86 is rather weird (without this patch):

if pm_idle is:
   poll_idle -> should throw double end events

   default_idle -> only throws power_start, your patch
                   fixed that, but in the generic cpu_idle
                   thread function which always gets executed
                   also if pm_idle != default_idle
                   So Robert fixed this case, but at the wrong place.

   cpuidle_idle_call -> depends whether intel_idle or acpi_idle
                        driver registered:

        intel_idle -> throws a cpu idle state event, still double end
                      events (one from cpuidle, one from
                      process_{32,64}.c, due to Robert's patch

        acpi_idle  -> whether a power_start event is thrown at all
                      depends on (cmp with acpi_idle_do_entry()):
                      cx->entry_method == ACPI_CSTATE_FFH
                      will end up in a power_start event via:
                      acpi_processor_ffh_cstate_enter(cx)
                           mwait_idle_with_hints()
                      but it will not in case of:
                      cx->entry_method == ACPI_CSTATE_HALT
                      or IO based switching (the else path there):
                      /* IO port based C-state */
                      

Again without this(my) patch you get:
   poll_idle                  -> double end events
   default_idle               -> all is fine (with your patch)
   cpuidle_idle_call
        intel_idle registered -> double end events
        acpi_idle registered  -> double end events
                                 start events may not be thrown
                                 at all.

perf timechart can handle double end events, this may be
the reason this was overseen.

Argh, I tried to come up with patches, but run out of
time. I will send something soon.
I also found a bug in my userspace stuff: I forgot to convert
u64 to u32..., sorry about that.
I try to resend everything tomorrow.

     Thomas



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform
  2010-10-27 15:42   ` Thomas Renninger
  2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
@ 2010-10-28  0:46     ` Thomas Renninger
  2010-11-01  8:11     ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Robert Schöne
  2010-11-01  8:11     ` Robert Schöne
  3 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-28  0:46 UTC (permalink / raw)
  To: arjan
  Cc: jean.pihet, mingo, rjw, linux-pm, linux-trace-users,
	Robert Schoene, Len Brown, linux-acpi

On Wednesday 27 October 2010 05:42:04 pm Thomas Renninger wrote:
> On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> > cpu_idle events (transition into sleep state and exiting) are
> > both fired in pm_idle().
> > 
> > Entering sleep state and exiting should always get fired inside
> > pm_idle() already.
> > 
> > This is a revert of commit c882e0feb937af4e5b991cbd1c
..
> Argh, I tried to come up with patches, but run out of
> time. I will send something soon.

Below patch should fix and clean up current cpu_idle (former power_start/end)
event throwing policy.

But it introduces a change which state number is thrown in acpi_idle
mwait case. This should be done anyway, but needs further adjustings
in userspace.

If a cpuidle driver is registered, now the event throws the state which
maps to the registered cpuidle state.
This is an important fix, so that the whole cpu_idle event (state) 
interface gets architecture independent.
For the intel_idle driver it currently does not matter because the cpuidle
and mwait C1/2/3 states by luck exactly match.

But on my machine with acpi_idle I only get 2 states: C1/C3.
ls /sys/devices/system/cpu/cpu0/cpuidle/state
state0/     state1/ state2/
(poll idle)   (C1)    (C3)
So with this change it would throw state=2, but it is C3, which has to be
grabbed from:
/sys/devices/system/cpu/cpu0/cpuidle/state2/name

But in intel_idle case the state names are bit long to be displayed nicely
in perf timechart:
        { /* MWAIT C1 */
                .name = "NHM-C1",
                .desc = "MWAIT 0x00",
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",

Therefore I suggest to:
Shorten .name (also by cutting the char array down) to 2 characters
1) Either introduce convention that first word in .desc is the longer name,
   arbitrary flags (as strings) follow, like:
        { /* MWAIT C3 */
                .name = "C3",
                .desc = "NHM-C6 MWAIT 0x20",
2) Or introduce another sysfs file, however it's named:
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",
                .abbr = "C3"
The latter is fully userspace (sysfs) compatible, I'd go for that.

If nobody objects, I may have time to implement this (will take more than a week) which includes:
 - place cpu_idle events correctly, so that they always get thrown when idle
   and no double end marker events can happen
 - Make above cpuidle name/sysfs changes
 - Let perf timechart grab the correct idle state name from sysfs.
   -> Mapping between the cpu_idle event and the sysfs cpuidle state.

Thanks,

    Thomas

--------
PERF: fix power:cpu_idle double end events - always throw events with acpi_idle

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

This is not always possible, only exception is:
c1e_idle -> calls default_idle which throws events -> easy and obvious.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle

-> this is really easy is now (if I haven't overseen something).

This also introduces another cleanup which may affect userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific numbers.
Fortunately this change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-trace-users@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: Len Brown <lenb@kernel.org>
CC: linux-acpi@vger.kernel.org

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 155d975..b618548 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -387,6 +387,8 @@ void default_idle(void)
 		else
 			local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else {
 		local_irq_enable();
 		/* loop is done by the caller */
@@ -444,8 +446,6 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
  */
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
-	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
-	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -472,6 +472,8 @@ static void mwait_idle(void)
 			__sti_mwait(0, 0);
 		else
 			local_irq_enable();
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else
 		local_irq_enable();
 }
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 08d5f05..491a4af 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -96,7 +96,15 @@ static void cpuidle_idle_call(void)
 
 	/* enter the state and update stats */
 	dev->last_state = target_state;
+
+	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
+	trace_cpu_idle(next_state, dev->cpu);
+
 	dev->last_residency = target_state->enter(dev, target_state);
+
+	trace_power_end(dev->cpu);
+	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
+
 	if (dev->last_state)
 		target_state = dev->last_state;
 
@@ -106,8 +114,6 @@ static void cpuidle_idle_call(void)
 	/* give the governor an opportunity to reflect on the outcome */
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
-	trace_power_end(smp_processor_id());
-	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d3701bf..5539cff 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,8 +201,6 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform
  2010-10-27 15:42   ` Thomas Renninger
@ 2010-10-28  0:46     ` Thomas Renninger
  2010-10-28  0:46     ` Thomas Renninger
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-28  0:46 UTC (permalink / raw)
  To: arjan
  Cc: linux-acpi, jean.pihet, Robert Schoene, linux-trace-users, mingo,
	linux-pm

On Wednesday 27 October 2010 05:42:04 pm Thomas Renninger wrote:
> On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> > cpu_idle events (transition into sleep state and exiting) are
> > both fired in pm_idle().
> > 
> > Entering sleep state and exiting should always get fired inside
> > pm_idle() already.
> > 
> > This is a revert of commit c882e0feb937af4e5b991cbd1c
..
> Argh, I tried to come up with patches, but run out of
> time. I will send something soon.

Below patch should fix and clean up current cpu_idle (former power_start/end)
event throwing policy.

But it introduces a change which state number is thrown in acpi_idle
mwait case. This should be done anyway, but needs further adjustings
in userspace.

If a cpuidle driver is registered, now the event throws the state which
maps to the registered cpuidle state.
This is an important fix, so that the whole cpu_idle event (state) 
interface gets architecture independent.
For the intel_idle driver it currently does not matter because the cpuidle
and mwait C1/2/3 states by luck exactly match.

But on my machine with acpi_idle I only get 2 states: C1/C3.
ls /sys/devices/system/cpu/cpu0/cpuidle/state
state0/     state1/ state2/
(poll idle)   (C1)    (C3)
So with this change it would throw state=2, but it is C3, which has to be
grabbed from:
/sys/devices/system/cpu/cpu0/cpuidle/state2/name

But in intel_idle case the state names are bit long to be displayed nicely
in perf timechart:
        { /* MWAIT C1 */
                .name = "NHM-C1",
                .desc = "MWAIT 0x00",
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",

Therefore I suggest to:
Shorten .name (also by cutting the char array down) to 2 characters
1) Either introduce convention that first word in .desc is the longer name,
   arbitrary flags (as strings) follow, like:
        { /* MWAIT C3 */
                .name = "C3",
                .desc = "NHM-C6 MWAIT 0x20",
2) Or introduce another sysfs file, however it's named:
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",
                .abbr = "C3"
The latter is fully userspace (sysfs) compatible, I'd go for that.

If nobody objects, I may have time to implement this (will take more than a week) which includes:
 - place cpu_idle events correctly, so that they always get thrown when idle
   and no double end marker events can happen
 - Make above cpuidle name/sysfs changes
 - Let perf timechart grab the correct idle state name from sysfs.
   -> Mapping between the cpu_idle event and the sysfs cpuidle state.

Thanks,

    Thomas

--------
PERF: fix power:cpu_idle double end events - always throw events with acpi_idle

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

This is not always possible, only exception is:
c1e_idle -> calls default_idle which throws events -> easy and obvious.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle

-> this is really easy is now (if I haven't overseen something).

This also introduces another cleanup which may affect userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific numbers.
Fortunately this change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-trace-users@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: Len Brown <lenb@kernel.org>
CC: linux-acpi@vger.kernel.org

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 155d975..b618548 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -387,6 +387,8 @@ void default_idle(void)
 		else
 			local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else {
 		local_irq_enable();
 		/* loop is done by the caller */
@@ -444,8 +446,6 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
  */
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
-	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
-	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -472,6 +472,8 @@ static void mwait_idle(void)
 			__sti_mwait(0, 0);
 		else
 			local_irq_enable();
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else
 		local_irq_enable();
 }
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 08d5f05..491a4af 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -96,7 +96,15 @@ static void cpuidle_idle_call(void)
 
 	/* enter the state and update stats */
 	dev->last_state = target_state;
+
+	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
+	trace_cpu_idle(next_state, dev->cpu);
+
 	dev->last_residency = target_state->enter(dev, target_state);
+
+	trace_power_end(dev->cpu);
+	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
+
 	if (dev->last_state)
 		target_state = dev->last_state;
 
@@ -106,8 +114,6 @@ static void cpuidle_idle_call(void)
 	/* give the governor an opportunity to reflect on the outcome */
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
-	trace_power_end(smp_processor_id());
-	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d3701bf..5539cff 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,8 +201,6 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-27 15:42   ` Thomas Renninger
  2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
  2010-10-28  0:46     ` Thomas Renninger
@ 2010-11-01  8:11     ` Robert Schöne
  2010-11-01  8:11     ` Robert Schöne
  3 siblings, 0 replies; 18+ messages in thread
From: Robert Schöne @ 2010-11-01  8:11 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: jean.pihet, linux-trace-users, mingo, linux-omap, arjan, linux-pm

Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:

> Robert: I expect you tested this on a machine with no cpuidle
> driver registered?

You're right, there was no idle driver, but the idle process from
process_64.c which called the idle routine.
I reported my thoughts on this on 14th of May this year 2010, mostly
claiming for a standard on where to report these events.

You're also missing the other idle routines from x86/kernel/process.c
mwait_idle_with_hints and mwait_idle only throw start events, so they
should behave like default_idle. poll_idle on the other hand reports the
end event itself.

Robert

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-27 15:42   ` Thomas Renninger
                       ` (2 preceding siblings ...)
  2010-11-01  8:11     ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Robert Schöne
@ 2010-11-01  8:11     ` Robert Schöne
  2010-11-04  8:57       ` Thomas Renninger
  2010-11-04  8:57       ` Thomas Renninger
  3 siblings, 2 replies; 18+ messages in thread
From: Robert Schöne @ 2010-11-01  8:11 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:

> Robert: I expect you tested this on a machine with no cpuidle
> driver registered?

You're right, there was no idle driver, but the idle process from
process_64.c which called the idle routine.
I reported my thoughts on this on 14th of May this year 2010, mostly
claiming for a standard on where to report these events.

You're also missing the other idle routines from x86/kernel/process.c
mwait_idle_with_hints and mwait_idle only throw start events, so they
should behave like default_idle. poll_idle on the other hand reports the
end event itself.

Robert


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-11-01  8:11     ` Robert Schöne
  2010-11-04  8:57       ` Thomas Renninger
@ 2010-11-04  8:57       ` Thomas Renninger
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-11-04  8:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: jean.pihet, linux-trace-users, mingo, linux-omap, arjan, linux-pm

Hi Robert,

On Monday 01 November 2010 09:11:28 Robert Schöne wrote:
> Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:
> 
> > Robert: I expect you tested this on a machine with no cpuidle
> > driver registered?
> 
> You're right, there was no idle driver, but the idle process from
> process_64.c which called the idle routine.
> I reported my thoughts on this on 14th of May this year 2010, mostly
> claiming for a standard on where to report these events.
> 
> You're also missing the other idle routines from x86/kernel/process.c
> mwait_idle_with_hints and mwait_idle only throw start events, so they
> should behave like default_idle. poll_idle on the other hand reports
> the end event itself.
I added you to CC of a patch that fixes the issue (and all other double 
or missing (for acpi_idle driver) events) in a nice generic way.
It needs further userspace adjustings and I still wait for this
separate patch series to go into some branch.

Thanks,

   Thomas


_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-11-01  8:11     ` Robert Schöne
@ 2010-11-04  8:57       ` Thomas Renninger
  2010-11-04  8:57       ` Thomas Renninger
  1 sibling, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-11-04  8:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Hi Robert,

On Monday 01 November 2010 09:11:28 Robert Schöne wrote:
> Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:
> 
> > Robert: I expect you tested this on a machine with no cpuidle
> > driver registered?
> 
> You're right, there was no idle driver, but the idle process from
> process_64.c which called the idle routine.
> I reported my thoughts on this on 14th of May this year 2010, mostly
> claiming for a standard on where to report these events.
> 
> You're also missing the other idle routines from x86/kernel/process.c
> mwait_idle_with_hints and mwait_idle only throw start events, so they
> should behave like default_idle. poll_idle on the other hand reports
> the end event itself.
I added you to CC of a patch that fixes the issue (and all other double 
or missing (for acpi_idle driver) events) in a nice generic way.
It needs further userspace adjustings and I still wait for this
separate patch series to go into some branch.

Thanks,

   Thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Cleanup and fixes for power trace events
@ 2010-10-26 23:43 Thomas Renninger
  0 siblings, 0 replies; 18+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users


[PATCH 1/4] PERF: Do not export power_frequency, but power_start event
[PATCH 2/4] PERF(kernel): Cleanup power events V3
[PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
[PATCH 4/4] PERF: fix power:cpu_idle double end events

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2010-11-04  8:57 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-26 23:43 ` [PATCH 2/4] PERF(kernel): Cleanup power events V3 Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-26 23:43 ` [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-26 23:43 ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-27 15:42   ` Thomas Renninger
2010-10-27 15:42   ` Thomas Renninger
2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
2010-10-28  0:46     ` Thomas Renninger
2010-11-01  8:11     ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Robert Schöne
2010-11-01  8:11     ` Robert Schöne
2010-11-04  8:57       ` Thomas Renninger
2010-11-04  8:57       ` Thomas Renninger
2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.