All of lore.kernel.org
 help / color / mirror / Atom feed
* Cleanup and fixes for power trace events
@ 2010-10-26 23:43 Thomas Renninger
  2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users


[PATCH 1/4] PERF: Do not export power_frequency, but power_start event
[PATCH 2/4] PERF(kernel): Cleanup power events V3
[PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
[PATCH 4/4] PERF: fix power:cpu_idle double end events


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] PERF: Do not export power_frequency, but power_start event
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
  2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` [PATCH 2/4] PERF(kernel): Cleanup power events V3 Thomas Renninger
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

power_frequency moved to drivers/cpufreq/cpufreq.c which has
to be compiled in, no need to export it.

intel_idle can a be module though...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 drivers/idle/intel_idle.c   |    2 --
 kernel/trace/power-traces.c |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c37ef64..21ac077 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,9 +201,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-#ifndef MODULE
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-#endif
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index a22582a..0e0497d 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
+EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
 
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/4] PERF: Do not export power_frequency, but power_start event
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger

power_frequency moved to drivers/cpufreq/cpufreq.c which has
to be compiled in, no need to export it.

intel_idle can a be module though...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 drivers/idle/intel_idle.c   |    2 --
 kernel/trace/power-traces.c |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c37ef64..21ac077 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,9 +201,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-#ifndef MODULE
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-#endif
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index a22582a..0e0497d 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
+EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
 
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] PERF(kernel): Cleanup power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
  2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events
  - Fixed a copy and paste bug (poll_idle should throw and event
    state of zero, not 1)

Changes in V2:
  - Introduce PWR_EVENT_EXIT instead of 0 to mark non-power state
  - Use u32 instead of u64 for cpuid, state which is by far enough

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced, a first implementation
comes from the ARM side, but it's easy to add these events
in X86 as well if needed.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   14 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declartion */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..0b5c841 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,20 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] PERF(kernel): Cleanup power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (2 preceding siblings ...)
  2010-10-26 23:43 ` [PATCH 2/4] PERF(kernel): Cleanup power events V3 Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events
  - Fixed a copy and paste bug (poll_idle should throw and event
    state of zero, not 1)

Changes in V2:
  - Introduce PWR_EVENT_EXIT instead of 0 to mark non-power state
  - Use u32 instead of u64 for cpuid, state which is by far enough

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced, a first implementation
comes from the ARM side, but it's easy to add these events
in X86 as well if needed.

the type= field got removed from both, it was never
used and the type is differed by the event type itself.

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   14 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declartion */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..0b5c841 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,20 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (4 preceding siblings ...)
  2010-10-26 23:43 ` [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events

Changes in V2:
  - Hanlde PWR_EVENT_EXIT instead of 0 to recon non-power state

The transition was rather smooth, only part I had to fiddle
some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 tools/perf/builtin-timechart.c |   89 ++++++++++++++++++++++++++++++++-------
 tools/perf/util/parse-events.c |   43 +++++++++++++++++++-
 tools/perf/util/parse-events.h |    1 +
 3 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..bcc16b0 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT 0xFFFFFFFF
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,25 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u64	state;
+	u64	cpu_id;
+};
+
+struct power_suspend_entry {
+	struct trace_entry te;
+	u64	state;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +506,46 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		else if (strcmp(event_str, "power:power_start") == 0)
+			c_state_start(peo->cpu_id, data.time, peo->value);
+
+		else if (strcmp(event_str, "power:power_end") == 0)
+			c_state_end(peo->cpu_id, data.time);
+
+		else if (strcmp(event_str, "power:power_frequency") == 0)
+			p_state_change(peo->cpu_id, data.time, peo->value);
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1002,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1015,38 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
 
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (is_valid_tracepoint("power:power_start")) {
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+	
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..d706dcb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -824,7 +824,7 @@ int parse_events(const struct option *opt __used, const char *str, int unset __u
 		if (ret != EVT_HANDLED_ALL) {
 			attrs[nr_counters] = attr;
 			nr_counters++;
-		}
+	}
 
 		if (*str == 0)
 			break;
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new power events V3
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (3 preceding siblings ...)
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger

Changes in V3:
  - PWR_EVENT_EXIT is -1 now instead of 0xFFFFFFFF
  - Use cpu_{idle,frequency} instead of processor_{idle,frequency}
    events

Changes in V2:
  - Hanlde PWR_EVENT_EXIT instead of 0 to recon non-power state

The transition was rather smooth, only part I had to fiddle
some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 tools/perf/builtin-timechart.c |   89 ++++++++++++++++++++++++++++++++-------
 tools/perf/util/parse-events.c |   43 +++++++++++++++++++-
 tools/perf/util/parse-events.h |    1 +
 3 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..bcc16b0 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT 0xFFFFFFFF
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,25 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u64	state;
+	u64	cpu_id;
+};
+
+struct power_suspend_entry {
+	struct trace_entry te;
+	u64	state;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +506,46 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		else if (strcmp(event_str, "power:power_start") == 0)
+			c_state_start(peo->cpu_id, data.time, peo->value);
+
+		else if (strcmp(event_str, "power:power_end") == 0)
+			c_state_end(peo->cpu_id, data.time);
+
+		else if (strcmp(event_str, "power:power_frequency") == 0)
+			p_state_change(peo->cpu_id, data.time, peo->value);
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1002,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1015,38 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
 
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (is_valid_tracepoint("power:power_start")) {
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+	
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..d706dcb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -824,7 +824,7 @@ int parse_events(const struct option *opt __used, const char *str, int unset __u
 		if (ret != EVT_HANDLED_ALL) {
 			attrs[nr_counters] = attr;
 			nr_counters++;
-		}
+	}
 
 		if (*str == 0)
 			break;
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (5 preceding siblings ...)
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-26 23:43 ` Thomas Renninger
  7 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Robert Schoene

cpu_idle events (transition into sleep state and exiting) are
both fired in pm_idle().

Entering sleep state and exiting should always get fired inside
pm_idle() already.

This is a revert of commit c882e0feb937af4e5b991cbd1c

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |    4 ----
 arch/x86/kernel/process_64.c |    6 ------
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
-- 
1.6.3

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
                   ` (6 preceding siblings ...)
  2010-10-26 23:43 ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Thomas Renninger
@ 2010-10-26 23:43 ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
  7 siblings, 2 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-26 23:43 UTC (permalink / raw)
  To: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users
  Cc: Thomas Renninger, Robert Schoene

cpu_idle events (transition into sleep state and exiting) are
both fired in pm_idle().

Entering sleep state and exiting should always get fired inside
pm_idle() already.

This is a revert of commit c882e0feb937af4e5b991cbd1c

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process_32.c |    4 ----
 arch/x86/kernel/process_64.c |    6 ------
 2 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 ` Thomas Renninger
@ 2010-10-27 15:42   ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-27 15:42 UTC (permalink / raw)
  To: arjan
  Cc: jean.pihet, Robert Schoene, linux-trace-users, mingo, linux-omap,
	linux-pm

On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> cpu_idle events (transition into sleep state and exiting) are
> both fired in pm_idle().
> 
> Entering sleep state and exiting should always get fired inside
> pm_idle() already.
> 
> This is a revert of commit c882e0feb937af4e5b991cbd1c
Robert: I expect you tested this on a machine with no cpuidle
driver registered?

I should have had a deeper look at this at once, done so now:

Current cpu_idle, power_start/end (same before my changes) behavior
on X86 is rather weird (without this patch):

if pm_idle is:
   poll_idle -> should throw double end events

   default_idle -> only throws power_start, your patch
                   fixed that, but in the generic cpu_idle
                   thread function which always gets executed
                   also if pm_idle != default_idle
                   So Robert fixed this case, but at the wrong place.

   cpuidle_idle_call -> depends whether intel_idle or acpi_idle
                        driver registered:

        intel_idle -> throws a cpu idle state event, still double end
                      events (one from cpuidle, one from
                      process_{32,64}.c, due to Robert's patch

        acpi_idle  -> whether a power_start event is thrown at all
                      depends on (cmp with acpi_idle_do_entry()):
                      cx->entry_method == ACPI_CSTATE_FFH
                      will end up in a power_start event via:
                      acpi_processor_ffh_cstate_enter(cx)
                           mwait_idle_with_hints()
                      but it will not in case of:
                      cx->entry_method == ACPI_CSTATE_HALT
                      or IO based switching (the else path there):
                      /* IO port based C-state */
                      

Again without this(my) patch you get:
   poll_idle                  -> double end events
   default_idle               -> all is fine (with your patch)
   cpuidle_idle_call
        intel_idle registered -> double end events
        acpi_idle registered  -> double end events
                                 start events may not be thrown
                                 at all.

perf timechart can handle double end events, this may be
the reason this was overseen.

Argh, I tried to come up with patches, but run out of
time. I will send something soon.
I also found a bug in my userspace stuff: I forgot to convert
u64 to u32..., sorry about that.
I try to resend everything tomorrow.

     Thomas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-26 23:43 ` Thomas Renninger
  2010-10-27 15:42   ` Thomas Renninger
@ 2010-10-27 15:42   ` Thomas Renninger
  2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
                       ` (3 more replies)
  1 sibling, 4 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-27 15:42 UTC (permalink / raw)
  To: arjan
  Cc: jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users,
	Robert Schoene

On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> cpu_idle events (transition into sleep state and exiting) are
> both fired in pm_idle().
> 
> Entering sleep state and exiting should always get fired inside
> pm_idle() already.
> 
> This is a revert of commit c882e0feb937af4e5b991cbd1c
Robert: I expect you tested this on a machine with no cpuidle
driver registered?

I should have had a deeper look at this at once, done so now:

Current cpu_idle, power_start/end (same before my changes) behavior
on X86 is rather weird (without this patch):

if pm_idle is:
   poll_idle -> should throw double end events

   default_idle -> only throws power_start, your patch
                   fixed that, but in the generic cpu_idle
                   thread function which always gets executed
                   also if pm_idle != default_idle
                   So Robert fixed this case, but at the wrong place.

   cpuidle_idle_call -> depends whether intel_idle or acpi_idle
                        driver registered:

        intel_idle -> throws a cpu idle state event, still double end
                      events (one from cpuidle, one from
                      process_{32,64}.c, due to Robert's patch

        acpi_idle  -> whether a power_start event is thrown at all
                      depends on (cmp with acpi_idle_do_entry()):
                      cx->entry_method == ACPI_CSTATE_FFH
                      will end up in a power_start event via:
                      acpi_processor_ffh_cstate_enter(cx)
                           mwait_idle_with_hints()
                      but it will not in case of:
                      cx->entry_method == ACPI_CSTATE_HALT
                      or IO based switching (the else path there):
                      /* IO port based C-state */
                      

Again without this(my) patch you get:
   poll_idle                  -> double end events
   default_idle               -> all is fine (with your patch)
   cpuidle_idle_call
        intel_idle registered -> double end events
        acpi_idle registered  -> double end events
                                 start events may not be thrown
                                 at all.

perf timechart can handle double end events, this may be
the reason this was overseen.

Argh, I tried to come up with patches, but run out of
time. I will send something soon.
I also found a bug in my userspace stuff: I forgot to convert
u64 to u32..., sorry about that.
I try to resend everything tomorrow.

     Thomas



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform
  2010-10-27 15:42   ` Thomas Renninger
  2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
@ 2010-10-28  0:46     ` Thomas Renninger
  2010-11-01  8:11     ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Robert Schöne
  2010-11-01  8:11     ` Robert Schöne
  3 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-28  0:46 UTC (permalink / raw)
  To: arjan
  Cc: jean.pihet, mingo, rjw, linux-pm, linux-trace-users,
	Robert Schoene, Len Brown, linux-acpi

On Wednesday 27 October 2010 05:42:04 pm Thomas Renninger wrote:
> On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> > cpu_idle events (transition into sleep state and exiting) are
> > both fired in pm_idle().
> > 
> > Entering sleep state and exiting should always get fired inside
> > pm_idle() already.
> > 
> > This is a revert of commit c882e0feb937af4e5b991cbd1c
..
> Argh, I tried to come up with patches, but run out of
> time. I will send something soon.

Below patch should fix and clean up current cpu_idle (former power_start/end)
event throwing policy.

But it introduces a change which state number is thrown in acpi_idle
mwait case. This should be done anyway, but needs further adjustings
in userspace.

If a cpuidle driver is registered, now the event throws the state which
maps to the registered cpuidle state.
This is an important fix, so that the whole cpu_idle event (state) 
interface gets architecture independent.
For the intel_idle driver it currently does not matter because the cpuidle
and mwait C1/2/3 states by luck exactly match.

But on my machine with acpi_idle I only get 2 states: C1/C3.
ls /sys/devices/system/cpu/cpu0/cpuidle/state
state0/     state1/ state2/
(poll idle)   (C1)    (C3)
So with this change it would throw state=2, but it is C3, which has to be
grabbed from:
/sys/devices/system/cpu/cpu0/cpuidle/state2/name

But in intel_idle case the state names are bit long to be displayed nicely
in perf timechart:
        { /* MWAIT C1 */
                .name = "NHM-C1",
                .desc = "MWAIT 0x00",
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",

Therefore I suggest to:
Shorten .name (also by cutting the char array down) to 2 characters
1) Either introduce convention that first word in .desc is the longer name,
   arbitrary flags (as strings) follow, like:
        { /* MWAIT C3 */
                .name = "C3",
                .desc = "NHM-C6 MWAIT 0x20",
2) Or introduce another sysfs file, however it's named:
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",
                .abbr = "C3"
The latter is fully userspace (sysfs) compatible, I'd go for that.

If nobody objects, I may have time to implement this (will take more than a week) which includes:
 - place cpu_idle events correctly, so that they always get thrown when idle
   and no double end marker events can happen
 - Make above cpuidle name/sysfs changes
 - Let perf timechart grab the correct idle state name from sysfs.
   -> Mapping between the cpu_idle event and the sysfs cpuidle state.

Thanks,

    Thomas

--------
PERF: fix power:cpu_idle double end events - always throw events with acpi_idle

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

This is not always possible, only exception is:
c1e_idle -> calls default_idle which throws events -> easy and obvious.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle

-> this is really easy is now (if I haven't overseen something).

This also introduces another cleanup which may affect userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific numbers.
Fortunately this change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-trace-users@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: Len Brown <lenb@kernel.org>
CC: linux-acpi@vger.kernel.org

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 155d975..b618548 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -387,6 +387,8 @@ void default_idle(void)
 		else
 			local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else {
 		local_irq_enable();
 		/* loop is done by the caller */
@@ -444,8 +446,6 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
  */
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
-	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
-	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -472,6 +472,8 @@ static void mwait_idle(void)
 			__sti_mwait(0, 0);
 		else
 			local_irq_enable();
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else
 		local_irq_enable();
 }
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 08d5f05..491a4af 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -96,7 +96,15 @@ static void cpuidle_idle_call(void)
 
 	/* enter the state and update stats */
 	dev->last_state = target_state;
+
+	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
+	trace_cpu_idle(next_state, dev->cpu);
+
 	dev->last_residency = target_state->enter(dev, target_state);
+
+	trace_power_end(dev->cpu);
+	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
+
 	if (dev->last_state)
 		target_state = dev->last_state;
 
@@ -106,8 +114,6 @@ static void cpuidle_idle_call(void)
 	/* give the governor an opportunity to reflect on the outcome */
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
-	trace_power_end(smp_processor_id());
-	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d3701bf..5539cff 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,8 +201,6 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform
  2010-10-27 15:42   ` Thomas Renninger
@ 2010-10-28  0:46     ` Thomas Renninger
  2010-10-28  0:46     ` Thomas Renninger
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-10-28  0:46 UTC (permalink / raw)
  To: arjan
  Cc: linux-acpi, jean.pihet, Robert Schoene, linux-trace-users, mingo,
	linux-pm

On Wednesday 27 October 2010 05:42:04 pm Thomas Renninger wrote:
> On Wednesday 27 October 2010 01:43:25 Thomas Renninger wrote:
> > cpu_idle events (transition into sleep state and exiting) are
> > both fired in pm_idle().
> > 
> > Entering sleep state and exiting should always get fired inside
> > pm_idle() already.
> > 
> > This is a revert of commit c882e0feb937af4e5b991cbd1c
..
> Argh, I tried to come up with patches, but run out of
> time. I will send something soon.

Below patch should fix and clean up current cpu_idle (former power_start/end)
event throwing policy.

But it introduces a change which state number is thrown in acpi_idle
mwait case. This should be done anyway, but needs further adjustings
in userspace.

If a cpuidle driver is registered, now the event throws the state which
maps to the registered cpuidle state.
This is an important fix, so that the whole cpu_idle event (state) 
interface gets architecture independent.
For the intel_idle driver it currently does not matter because the cpuidle
and mwait C1/2/3 states by luck exactly match.

But on my machine with acpi_idle I only get 2 states: C1/C3.
ls /sys/devices/system/cpu/cpu0/cpuidle/state
state0/     state1/ state2/
(poll idle)   (C1)    (C3)
So with this change it would throw state=2, but it is C3, which has to be
grabbed from:
/sys/devices/system/cpu/cpu0/cpuidle/state2/name

But in intel_idle case the state names are bit long to be displayed nicely
in perf timechart:
        { /* MWAIT C1 */
                .name = "NHM-C1",
                .desc = "MWAIT 0x00",
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",

Therefore I suggest to:
Shorten .name (also by cutting the char array down) to 2 characters
1) Either introduce convention that first word in .desc is the longer name,
   arbitrary flags (as strings) follow, like:
        { /* MWAIT C3 */
                .name = "C3",
                .desc = "NHM-C6 MWAIT 0x20",
2) Or introduce another sysfs file, however it's named:
        { /* MWAIT C3 */
                .name = "NHM-C6",
                .desc = "MWAIT 0x20",
                .abbr = "C3"
The latter is fully userspace (sysfs) compatible, I'd go for that.

If nobody objects, I may have time to implement this (will take more than a week) which includes:
 - place cpu_idle events correctly, so that they always get thrown when idle
   and no double end marker events can happen
 - Make above cpuidle name/sysfs changes
 - Let perf timechart grab the correct idle state name from sysfs.
   -> Mapping between the cpu_idle event and the sysfs cpuidle state.

Thanks,

    Thomas

--------
PERF: fix power:cpu_idle double end events - always throw events with acpi_idle

Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.

This is not always possible, only exception is:
c1e_idle -> calls default_idle which throws events -> easy and obvious.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle

-> this is really easy is now (if I haven't overseen something).

This also introduces another cleanup which may affect userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific numbers.
Fortunately this change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: linux-trace-users@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: Len Brown <lenb@kernel.org>
CC: linux-acpi@vger.kernel.org

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 155d975..b618548 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -387,6 +387,8 @@ void default_idle(void)
 		else
 			local_irq_enable();
 		current_thread_info()->status |= TS_POLLING;
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else {
 		local_irq_enable();
 		/* loop is done by the caller */
@@ -444,8 +446,6 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
  */
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
-	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
-	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -472,6 +472,8 @@ static void mwait_idle(void)
 			__sti_mwait(0, 0);
 		else
 			local_irq_enable();
+		trace_power_end(smp_processor_id());
+		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	} else
 		local_irq_enable();
 }
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 4b9befa..8d12878 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -57,8 +57,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
 
 /*
@@ -113,8 +111,6 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 28153a9..d7b3e95 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -51,8 +51,6 @@
 #include <asm/syscalls.h>
 #include <asm/debugreg.h>
 
-#include <trace/events/power.h>
-
 asmlinkage extern void ret_from_fork(void);
 
 DEFINE_PER_CPU(unsigned long, old_rsp);
@@ -141,10 +139,6 @@ void cpu_idle(void)
 			pm_idle();
 			start_critical_timings();
 
-			trace_power_end(smp_processor_id());
-			trace_cpu_idle(PWR_EVENT_EXIT,
-				       smp_processor_id());
-
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
 			   loops can be woken up without interrupt. */
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 08d5f05..491a4af 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -96,7 +96,15 @@ static void cpuidle_idle_call(void)
 
 	/* enter the state and update stats */
 	dev->last_state = target_state;
+
+	trace_power_start(POWER_CSTATE, next_state, dev->cpu);
+	trace_cpu_idle(next_state, dev->cpu);
+
 	dev->last_residency = target_state->enter(dev, target_state);
+
+	trace_power_end(dev->cpu);
+	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
+
 	if (dev->last_state)
 		target_state = dev->last_state;
 
@@ -106,8 +114,6 @@ static void cpuidle_idle_call(void)
 	/* give the governor an opportunity to reflect on the outcome */
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
-	trace_power_end(smp_processor_id());
-	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index d3701bf..5539cff 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,8 +201,6 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-27 15:42   ` Thomas Renninger
  2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
  2010-10-28  0:46     ` Thomas Renninger
@ 2010-11-01  8:11     ` Robert Schöne
  2010-11-01  8:11     ` Robert Schöne
  3 siblings, 0 replies; 17+ messages in thread
From: Robert Schöne @ 2010-11-01  8:11 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: jean.pihet, linux-trace-users, mingo, linux-omap, arjan, linux-pm

Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:

> Robert: I expect you tested this on a machine with no cpuidle
> driver registered?

You're right, there was no idle driver, but the idle process from
process_64.c which called the idle routine.
I reported my thoughts on this on 14th of May this year 2010, mostly
claiming for a standard on where to report these events.

You're also missing the other idle routines from x86/kernel/process.c
mwait_idle_with_hints and mwait_idle only throw start events, so they
should behave like default_idle. poll_idle on the other hand reports the
end event itself.

Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-10-27 15:42   ` Thomas Renninger
                       ` (2 preceding siblings ...)
  2010-11-01  8:11     ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Robert Schöne
@ 2010-11-01  8:11     ` Robert Schöne
  2010-11-04  8:57       ` Thomas Renninger
  2010-11-04  8:57       ` Thomas Renninger
  3 siblings, 2 replies; 17+ messages in thread
From: Robert Schöne @ 2010-11-01  8:11 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:

> Robert: I expect you tested this on a machine with no cpuidle
> driver registered?

You're right, there was no idle driver, but the idle process from
process_64.c which called the idle routine.
I reported my thoughts on this on 14th of May this year 2010, mostly
claiming for a standard on where to report these events.

You're also missing the other idle routines from x86/kernel/process.c
mwait_idle_with_hints and mwait_idle only throw start events, so they
should behave like default_idle. poll_idle on the other hand reports the
end event itself.

Robert


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-11-01  8:11     ` Robert Schöne
  2010-11-04  8:57       ` Thomas Renninger
@ 2010-11-04  8:57       ` Thomas Renninger
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-11-04  8:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: jean.pihet, linux-trace-users, mingo, linux-omap, arjan, linux-pm

Hi Robert,

On Monday 01 November 2010 09:11:28 Robert Schöne wrote:
> Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:
> 
> > Robert: I expect you tested this on a machine with no cpuidle
> > driver registered?
> 
> You're right, there was no idle driver, but the idle process from
> process_64.c which called the idle routine.
> I reported my thoughts on this on 14th of May this year 2010, mostly
> claiming for a standard on where to report these events.
> 
> You're also missing the other idle routines from x86/kernel/process.c
> mwait_idle_with_hints and mwait_idle only throw start events, so they
> should behave like default_idle. poll_idle on the other hand reports
> the end event itself.
I added you to CC of a patch that fixes the issue (and all other double 
or missing (for acpi_idle driver) events) in a nice generic way.
It needs further userspace adjustings and I still wait for this
separate patch series to go into some branch.

Thanks,

   Thomas


_______________________________________________
linux-pm mailing list
linux-pm@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/linux-pm

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/4] PERF: fix power:cpu_idle double end events
  2010-11-01  8:11     ` Robert Schöne
@ 2010-11-04  8:57       ` Thomas Renninger
  2010-11-04  8:57       ` Thomas Renninger
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Renninger @ 2010-11-04  8:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: arjan, jean.pihet, mingo, rjw, linux-omap, linux-pm, linux-trace-users

Hi Robert,

On Monday 01 November 2010 09:11:28 Robert Schöne wrote:
> Am Mittwoch, den 27.10.2010, 17:42 +0200 schrieb Thomas Renninger:
> 
> > Robert: I expect you tested this on a machine with no cpuidle
> > driver registered?
> 
> You're right, there was no idle driver, but the idle process from
> process_64.c which called the idle routine.
> I reported my thoughts on this on 14th of May this year 2010, mostly
> claiming for a standard on where to report these events.
> 
> You're also missing the other idle routines from x86/kernel/process.c
> mwait_idle_with_hints and mwait_idle only throw start events, so they
> should behave like default_idle. poll_idle on the other hand reports
> the end event itself.
I added you to CC of a patch that fixes the issue (and all other double 
or missing (for acpi_idle driver) events) in a nice generic way.
It needs further userspace adjustings and I still wait for this
separate patch series to go into some branch.

Thanks,

   Thomas


--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-11-04  8:57 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-10-26 23:43 Cleanup and fixes for power trace events Thomas Renninger
2010-10-26 23:43 ` [PATCH 1/4] PERF: Do not export power_frequency, but power_start event Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-26 23:43 ` [PATCH 2/4] PERF(kernel): Cleanup power events V3 Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-26 23:43 ` [PATCH 3/4] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-26 23:43 ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Thomas Renninger
2010-10-26 23:43 ` Thomas Renninger
2010-10-27 15:42   ` Thomas Renninger
2010-10-27 15:42   ` Thomas Renninger
2010-10-28  0:46     ` [RFC] PERF: fix power:cpu_idle double end events and missing acpi_idle events - make cpu_idle power events cpuidle sysfs conform Thomas Renninger
2010-10-28  0:46     ` Thomas Renninger
2010-11-01  8:11     ` [PATCH 4/4] PERF: fix power:cpu_idle double end events Robert Schöne
2010-11-01  8:11     ` Robert Schöne
2010-11-04  8:57       ` Thomas Renninger
2010-11-04  8:57       ` Thomas Renninger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.