* Cleanup and enhance power trace events
@ 2010-10-28  9:02 Thomas Renninger
  2010-10-28  9:02 ` [PATCH 1/3] PERF: Do not export power_frequency, but power_start event Thomas Renninger
                   ` (5 more replies)
  0 siblings, 6 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28  9:02 UTC (permalink / raw)
  To: trenn, linux-omap, linux-pm, linux-trace-users, jean.pihet,
	arjan, mingo, rjw

Tested with acpi_idle and intel_idle as cpuidle drivers.
Also tested the perf timechart userspace tool without the new interface;
it still works fine with the new kernel changes.

The cpu_idle enter/leave events for a sleep state are still not
properly balanced in several places, but this was the case already.
perf timechart handles double "leave idle" events gracefully (this
should still get fixed at some point).
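
For illustration only, a consumer of the events could tolerate a doubled
"leave idle" (or "enter idle") roughly as sketched here. This is a minimal
sketch and not the perf timechart code; struct idle_tracker, idle_enter()
and idle_leave() are made-up names, and the timestamps are plain integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping; not taken from perf timechart. */
struct idle_tracker {
	int      in_idle;     /* 1 while an idle period is open */
	uint64_t idle_start;  /* timestamp of the open "enter idle" */
	uint64_t total_idle;  /* accumulated idle time */
	unsigned ignored;     /* unbalanced events that were dropped */
};

static void idle_enter(struct idle_tracker *t, uint64_t now)
{
	assert(t);
	if (t->in_idle) {     /* double enter: keep the first start time */
		t->ignored++;
		return;
	}
	t->in_idle = 1;
	t->idle_start = now;
}

static void idle_leave(struct idle_tracker *t, uint64_t now)
{
	assert(t);
	if (!t->in_idle) {    /* double leave: nothing is open, drop it */
		t->ignored++;
		return;
	}
	t->total_idle += now - t->idle_start;
	t->in_idle = 0;
}
```

With this approach a second "leave idle" only bumps the ignored counter
and does not distort the accumulated idle time.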

The first patch makes the intel_idle cpuidle driver traceable, even
if compiled as a module.

Ingo: Can you please push these into an appropriate git tree/branch?

Thanks,

    Thomas




* [PATCH 1/3] PERF: Do not export power_frequency, but power_start event
  2010-10-28  9:02 Cleanup and enhance power trace events Thomas Renninger
  2010-10-28  9:02 ` [PATCH 1/3] PERF: Do not export power_frequency, but power_start event Thomas Renninger
@ 2010-10-28  9:02 ` Thomas Renninger
  2010-10-28  9:02 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28  9:02 UTC (permalink / raw)
  To: trenn, linux-omap, linux-pm, linux-trace-users, jean.pihet, arjan, mingo

power_frequency moved to drivers/cpufreq/cpufreq.c, which is always
built in, so there is no need to export it.

intel_idle can be a module though, so power_start has to be exported.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 drivers/idle/intel_idle.c   |    2 --
 kernel/trace/power-traces.c |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index c37ef64..21ac077 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -201,9 +201,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-#ifndef MODULE
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-#endif
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index a22582a..0e0497d 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
+EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
 
-- 
1.6.3


* [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-28  9:02 Cleanup and enhance power trace events Thomas Renninger
                   ` (2 preceding siblings ...)
  2010-10-28  9:02 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
@ 2010-10-28  9:02 ` Thomas Renninger
  2010-10-28  9:02 ` [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
  2010-10-28  9:02 ` Thomas Renninger
  5 siblings, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28  9:02 UTC (permalink / raw)
  To: trenn, linux-omap, linux-pm, linux-trace-users, jean.pihet, arjan, mingo

Recent changes:
  - Enable EVENT_POWER_TRACING_DEPRECATED by default

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency
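
For userspace that has to translate event names, the replacement list
above could be captured in a small lookup table. The following is only a
sketch under that assumption; power_event_map and new_power_event_name()
are hypothetical names and not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old event name -> replacement, as listed in the changelog above. */
static const struct {
	const char *old_name;
	const char *new_name;
} power_event_map[] = {
	{ "power:power_start",     "power:cpu_idle" },
	{ "power:power_end",       "power:cpu_idle" },
	{ "power:power_frequency", "power:cpu_frequency" },
};

/* Hypothetical helper: new name, or NULL if old_name is not deprecated. */
static const char *new_power_event_name(const char *old_name)
{
	size_t i;

	assert(old_name);
	for (i = 0; i < sizeof(power_event_map) / sizeof(power_event_map[0]); i++)
		if (strcmp(power_event_map[i].old_name, old_name) == 0)
			return power_event_map[i].new_name;
	return NULL;
}
```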

power:machine_suspend
is newly introduced. A first implementation
comes from the ARM side, but it is easy to add these events
on x86 as well if needed.

The type= field got removed from both cpu_idle and cpu_frequency; it was
never used and the type is distinguished by the event type itself.
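
Since a single cpu_idle event now covers both entering and leaving idle,
a consumer has to look at the state value: the diff defines
PWR_EVENT_EXIT as -1 and stores state as a u32. A rough sketch of the
check, assuming the two-u32 payload of the "cpu" event class (struct
cpu_event, enum idle_action and classify_cpu_idle() are made-up names
for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define PWR_EVENT_EXIT -1	/* same value as in the patch */

/* Assumed payload layout: the two u32 fields of the "cpu" event class. */
struct cpu_event {
	uint32_t state;
	uint32_t cpu_id;
};

enum idle_action { IDLE_ENTER, IDLE_EXIT };

/* Hypothetical helper: decide whether a cpu_idle event enters or leaves idle. */
static enum idle_action classify_cpu_idle(const struct cpu_event *ev)
{
	assert(ev);
	/* state is unsigned, so -1 is stored as 0xffffffff; compare after a cast */
	return ev->state == (uint32_t)PWR_EVENT_EXIT ? IDLE_EXIT : IDLE_ENTER;
}
```

The cast matters: comparing the u32 field against a plain -1 without it
relies on implicit conversion and is easy to get wrong when the field
type changes.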

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   15 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {}
+static inline void trace_power_end(u64 cpuid) {}
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {}
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..8ccbedd 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3


* [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new power events
  2010-10-28  9:02 Cleanup and enhance power trace events Thomas Renninger
                   ` (3 preceding siblings ...)
  2010-10-28  9:02 ` Thomas Renninger
@ 2010-10-28  9:02 ` Thomas Renninger
  2010-10-28  9:02 ` Thomas Renninger
  5 siblings, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28  9:02 UTC (permalink / raw)
  To: trenn, linux-omap, linux-pm, linux-trace-users, jean.pihet, arjan, mingo

Recent changes:
   - Adjust state/cpuid to u32 as done in the kernel

The transition was rather smooth; the only part I had to fiddle with
for some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...
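
To illustrate the idea outside of perf: the check boils down to asking
whether <events_root>/<subsys>/<event> exists, where events_root stands
for <debugfs_mount_point>/tracing/events. The sketch below uses stat()
on the composed path instead of the directory walk done in the patch;
tracepoint_dir_exists() is a hypothetical name, not the helper added
here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not the patch's is_valid_tracepoint(): returns 1 if
 * "<events_root>/<subsys>/<event>" is a directory, 0 otherwise.
 * event_string must have the form "subsys:event".
 */
static int tracepoint_dir_exists(const char *events_root,
				 const char *event_string)
{
	char path[4096];
	const char *colon;
	struct stat st;
	int n;

	assert(events_root && event_string);

	colon = strchr(event_string, ':');
	if (!colon || colon == event_string || colon[1] == '\0')
		return 0;	/* not of the form "subsys:event" */

	n = snprintf(path, sizeof(path), "%s/%.*s/%s", events_root,
		     (int)(colon - event_string), event_string, colon + 1);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path would have been truncated */

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would then fall back to the old events only if the check fails
for power:cpu_idle but succeeds for power:power_start, which is the
decision __cmd_record() makes in the diff below.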

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 tools/perf/builtin-timechart.c |   91 ++++++++++++++++++++++++++++++++-------
 tools/perf/util/parse-events.c |   41 ++++++++++++++++++
 tools/perf/util/parse-events.h |    1 +
 3 files changed, 116 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..1d15228 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT -1
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,21 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static int use_old_power_events;
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u32	state;
+	u32	cpu_id;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +502,48 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == (u32)PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		if (use_old_power_events) {
+			if (strcmp(event_str, "power:power_start") == 0)
+				c_state_start(peo->cpu_id, data.time, peo->value);
+
+			else if (strcmp(event_str, "power:power_end") == 0)
+				c_state_end(peo->cpu_id, data.time);
+
+			else if (strcmp(event_str, "power:power_frequency") == 0)
+				p_state_change(peo->cpu_id, data.time, peo->value);
+		}
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1000,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1013,40 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
-
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (!is_valid_tracepoint("power:cpu_idle") &&
+	    is_valid_tracepoint("power:power_start")) {
+		use_old_power_events = 1;
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..35e3dea 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
-- 
1.6.3


* [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new power events
  2010-10-28  9:02 Cleanup and enhance power trace events Thomas Renninger
                   ` (4 preceding siblings ...)
  2010-10-28  9:02 ` [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
@ 2010-10-28  9:02 ` Thomas Renninger
  2010-10-28  9:19   ` Thomas Renninger
  2010-10-28  9:19   ` Thomas Renninger
  5 siblings, 2 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28  9:02 UTC (permalink / raw)
  To: trenn, linux-omap, linux-pm, linux-trace-users, jean.pihet,
	arjan, mingo, rjw

Recent changes:
   - Adjust state/cpuid to u32 as done in the kernel

The transition was rather smooth; the only part I had to fiddle
with for some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 tools/perf/builtin-timechart.c |   91 ++++++++++++++++++++++++++++++++-------
 tools/perf/util/parse-events.c |   41 ++++++++++++++++++
 tools/perf/util/parse-events.h |    1 +
 3 files changed, 116 insertions(+), 17 deletions(-)
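
The tracepoint check described in the changelog above walks every
subsystem directory. As a rough illustration only, the same question can
be answered with a single path lookup. This is a minimal sketch, not the
code in this patch: tracepoint_dir_exists() and its events_dir parameter
are hypothetical names, and it assumes each event shows up as a
directory <events_dir>/<subsystem>/<event>.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of this patch: return 1 if
 * <events_dir>/<subsystem>/<event> is a directory, else 0.
 * event_string uses the "subsystem:event" form, e.g. "power:cpu_idle".
 */
int tracepoint_dir_exists(const char *events_dir, const char *event_string)
{
	char path[4096];
	const char *colon = strchr(event_string, ':');
	struct stat st;
	int n;

	/* Reject strings without a subsystem or without an event name. */
	if (!colon || colon == event_string || colon[1] == '\0')
		return 0;

	n = snprintf(path, sizeof(path), "%s/%.*s/%s", events_dir,
		     (int)(colon - event_string), event_string, colon + 1);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* path would have been truncated */

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

A caller would pass the events directory below its debugfs mount point,
for example tracepoint_dir_exists("<debugfs_mount_point>/tracing/events",
"power:cpu_idle"), and fall back to the old power:power_start events
when that returns 0.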

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..1d15228 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT -1
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,21 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static int use_old_power_events;
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u32	state;
+	u32	cpu_id;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +502,48 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == (u32)PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		if (use_old_power_events) {
+			if (strcmp(event_str, "power:power_start") == 0)
+				c_state_start(peo->cpu_id, data.time, peo->value);
+
+			else if (strcmp(event_str, "power:power_end") == 0)
+				c_state_end(peo->cpu_id, data.time);
+
+			else if (strcmp(event_str, "power:power_frequency") == 0)
+				p_state_change(peo->cpu_id, data.time, peo->value);
+		}
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1000,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1013,40 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
-
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (!is_valid_tracepoint("power:cpu_idle") &&
+	    is_valid_tracepoint("power:power_start")) {
+		use_old_power_events = 1;
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+	
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..35e3dea 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new power events
  2010-10-28  9:02 ` Thomas Renninger
@ 2010-10-28  9:19   ` Thomas Renninger
  2010-10-28  9:19   ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28  9:19 UTC (permalink / raw)
  To: linux-omap; +Cc: jean.pihet, linux-trace-users, linux-pm, arjan, mingo

On Thursday 28 October 2010 11:02:59 Thomas Renninger wrote:
> Recent changes:
>    - Adjust state/cpuid to u32 as done in the kernel
> 
Argh, I forgot a guilt refresh...
Below final :) patch also fixes a bug that got introduced
in 2.6.36 already. Will also send a version for stable
inclusion.

    Thomas

----
PERF(userspace): Adjust perf timechart to the new power events

Recent changes:
   - Adjust state/cpuid to u32 as done in the kernel

The transition was rather smooth; the only part I had to fiddle
with for some time was the check whether a tracepoint/event is
supported by the running kernel.

builtin-timechart must only pass -e power:xy events which
are supported by the running kernel.
For this I added the tiny helper function:
int is_valid_tracepoint(const char *event_string)
to parse-events.[hc]
which could be more generic as an interface and support
hardware/software/... events, not only tracepoints, but someone
else could extend that if needed...

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
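
The u32 adjustment mentioned under "Recent changes" matters because the
tool overlays a struct on the raw sample: the new events carry two
32-bit fields after the common trace header, whereas the old layout had
three 64-bit fields. A minimal sketch of reading the new payload
follows. The names cpu_event and decode_cpu_event() are hypothetical,
and it assumes the caller has already skipped the common header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The two fields that follow the common trace header in the new events. */
struct cpu_event {
	uint32_t state;
	uint32_t cpu_id;
};

/*
 * Hypothetical decoder: payload points just past the common trace
 * header. Returns 0 on success, -1 if the buffer is too short.
 */
int decode_cpu_event(const void *payload, size_t len, struct cpu_event *out)
{
	const unsigned char *p = payload;

	if (len < 2 * sizeof(uint32_t))
		return -1;

	/* memcpy avoids unaligned loads; fields are in host byte order. */
	memcpy(&out->state, p, sizeof(uint32_t));
	memcpy(&out->cpu_id, p + sizeof(uint32_t), sizeof(uint32_t));
	return 0;
}
```

Reading the same bytes through the old three-u64 struct would pick up
unrelated data, which is why the patch keeps a separate
power_entry_old for the deprecated events.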

diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c
index 9bcc38f..299e488 100644
--- a/tools/perf/builtin-timechart.c
+++ b/tools/perf/builtin-timechart.c
@@ -32,6 +32,10 @@
 #include "util/session.h"
 #include "util/svghelper.h"
 
+#define SUPPORT_OLD_POWER_EVENTS 1
+#define PWR_EVENT_EXIT -1
+
+
 static char		const *input_name = "perf.data";
 static char		const *output_name = "output.svg";
 
@@ -298,12 +302,21 @@ struct trace_entry {
 	int			lock_depth;
 };
 
-struct power_entry {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static int use_old_power_events;
+struct power_entry_old {
 	struct trace_entry te;
 	u64	type;
 	u64	value;
 	u64	cpu_id;
 };
+#endif
+
+struct power_processor_entry {
+	struct trace_entry te;
+	u32	state;
+	u32	cpu_id;
+};
 
 #define TASK_COMM_LEN 16
 struct wakeup_entry {
@@ -489,29 +502,48 @@ static int process_sample_event(event_t *event, struct perf_session *session)
 	te = (void *)data.raw_data;
 	if (session->sample_type & PERF_SAMPLE_RAW && data.raw_size > 0) {
 		char *event_str;
-		struct power_entry *pe;
-
-		pe = (void *)te;
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		struct power_entry_old *peo;
+		peo = (void *)te;
+#endif
 
 		event_str = perf_header__find_event(te->type);
 
 		if (!event_str)
 			return 0;
 
-		if (strcmp(event_str, "power:power_start") == 0)
-			c_state_start(pe->cpu_id, data.time, pe->value);
-
-		if (strcmp(event_str, "power:power_end") == 0)
-			c_state_end(pe->cpu_id, data.time);
+		if (strcmp(event_str, "power:cpu_idle") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			if (ppe->state == (u32)PWR_EVENT_EXIT)
+				c_state_end(ppe->cpu_id, data.time);
+			else
+				c_state_start(ppe->cpu_id, data.time,
+					      ppe->state);
+		}
 
-		if (strcmp(event_str, "power:power_frequency") == 0)
-			p_state_change(pe->cpu_id, data.time, pe->value);
+		else if (strcmp(event_str, "power:cpu_frequency") == 0) {
+			struct power_processor_entry *ppe = (void *)te;
+			p_state_change(ppe->cpu_id, data.time, ppe->state);
+		}
 
-		if (strcmp(event_str, "sched:sched_wakeup") == 0)
+		else if (strcmp(event_str, "sched:sched_wakeup") == 0)
 			sched_wakeup(data.cpu, data.time, data.pid, te);
 
-		if (strcmp(event_str, "sched:sched_switch") == 0)
+		else if (strcmp(event_str, "sched:sched_switch") == 0)
 			sched_switch(data.cpu, data.time, te);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+		if (use_old_power_events) {
+			if (strcmp(event_str, "power:power_start") == 0)
+				c_state_start(peo->cpu_id, data.time, peo->value);
+
+			else if (strcmp(event_str, "power:power_end") == 0)
+				c_state_end(data.cpu, data.time);
+
+			else if (strcmp(event_str, "power:power_frequency") == 0)
+				p_state_change(peo->cpu_id, data.time, peo->value);
+		}
+#endif
 	}
 	return 0;
 }
@@ -968,7 +1000,8 @@ static const char * const timechart_usage[] = {
 	NULL
 };
 
-static const char *record_args[] = {
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+static const char *record_old_args[] = {
 	"record",
 	"-a",
 	"-R",
@@ -980,16 +1013,40 @@ static const char *record_args[] = {
 	"-e", "sched:sched_wakeup",
 	"-e", "sched:sched_switch",
 };
+#endif
+
+static const char *record_new_args[] = {
+	"record",
+	"-a",
+	"-R",
+	"-f",
+	"-c", "1",
+	"-e", "power:cpu_frequency",
+	"-e", "power:cpu_idle",
+	"-e", "sched:sched_wakeup",
+	"-e", "sched:sched_switch",
+};
 
 static int __cmd_record(int argc, const char **argv)
 {
 	unsigned int rec_argc, i, j;
 	const char **rec_argv;
-
-	rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+	const char **record_args = record_new_args;
+	unsigned int record_elems = ARRAY_SIZE(record_new_args);
+
+#if defined(SUPPORT_OLD_POWER_EVENTS)
+	if (!is_valid_tracepoint("power:cpu_idle") &&
+	    is_valid_tracepoint("power:power_start")) {
+		use_old_power_events = 1;
+		record_args = record_old_args;
+		record_elems = ARRAY_SIZE(record_old_args);
+	}
+#endif
+	
+	rec_argc = record_elems + argc - 1;
 	rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
-	for (i = 0; i < ARRAY_SIZE(record_args); i++)
+	for (i = 0; i < record_elems; i++)
 		rec_argv[i] = strdup(record_args[i]);
 
 	for (j = 1; j < (unsigned int)argc; j++, i++)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4af5bd5..35e3dea 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -906,6 +906,47 @@ static void print_tracepoint_events(void)
 }
 
 /*
+ * Check whether event is in <debugfs_mount_point>/tracing/events
+ */
+
+int is_valid_tracepoint(const char *event_string)
+{
+	DIR *sys_dir, *evt_dir;
+	struct dirent *sys_next, *evt_next, sys_dirent, evt_dirent;
+	char evt_path[MAXPATHLEN];
+	char dir_path[MAXPATHLEN];
+
+	if (debugfs_valid_mountpoint(debugfs_path))
+		return 0;
+
+	sys_dir = opendir(debugfs_path);
+	if (!sys_dir)
+		return 0;
+
+	for_each_subsystem(sys_dir, sys_dirent, sys_next) {
+
+		snprintf(dir_path, MAXPATHLEN, "%s/%s", debugfs_path,
+			 sys_dirent.d_name);
+		evt_dir = opendir(dir_path);
+		if (!evt_dir)
+			continue;
+
+		for_each_event(sys_dirent, evt_dir, evt_dirent, evt_next) {
+			snprintf(evt_path, MAXPATHLEN, "%s:%s",
+				 sys_dirent.d_name, evt_dirent.d_name);
+			if (!strcmp(evt_path, event_string)) {
+				closedir(evt_dir);
+				closedir(sys_dir);
+				return 1;
+			}
+		}
+		closedir(evt_dir);
+	}
+	closedir(sys_dir);
+	return 0;
+}
+
+/*
  * Print the help text for the event symbols:
  */
 void print_events(void)
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index fc4ab3f..7ab4685 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -29,6 +29,7 @@ extern int parse_filter(const struct option *opt, const char *str, int unset);
 #define EVENTS_HELP_MAX (128*1024)
 
 extern void print_events(void);
+extern int is_valid_tracepoint(const char *event_string);
 
 extern char debugfs_path[];
 extern int valid_debugfs_mount(const char *debugfs);

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-28  9:02 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
  2010-10-28 11:17   ` Rafael J. Wysocki
@ 2010-10-28 11:17   ` Rafael J. Wysocki
  2010-10-28 11:31     ` Rafael J. Wysocki
  2010-10-28 11:31     ` [linux-pm] " Rafael J. Wysocki
  1 sibling, 2 replies; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-10-28 11:17 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-omap, linux-pm, linux-trace-users, jean.pihet, arjan, mingo

On Thursday, October 28, 2010, Thomas Renninger wrote:
> Recent changes:
>   - Enable EVENT_POWER_TRACING_DEPRECATED by default
> 
> New power trace events:
> power:cpu_idle
> power:cpu_frequency
> power:machine_suspend
> 
> 
> C-state/idle accounting events:
>   power:power_start
>   power:power_end
> are replaced with:
>   power:cpu_idle
> 
> and
>   power:power_frequency
> is replaced with:
>   power:cpu_frequency
> 
> power:machine_suspend
> is newly introduced, a first implementation
> comes from the ARM side, but it's easy to add these events
> in X86 as well if needed.

Can you please check that changelog, please?

I've asked you for that already once.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [linux-pm] [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-28 11:17   ` Rafael J. Wysocki
  2010-10-28 11:31     ` Rafael J. Wysocki
@ 2010-10-28 11:31     ` Rafael J. Wysocki
  2010-10-28 11:37       ` Thomas Renninger
  2010-10-28 11:37       ` Thomas Renninger
  1 sibling, 2 replies; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-10-28 11:31 UTC (permalink / raw)
  To: linux-pm
  Cc: Thomas Renninger, jean.pihet, linux-trace-users, linux-omap,
	arjan, mingo

On Thursday, October 28, 2010, Rafael J. Wysocki wrote:
> On Thursday, October 28, 2010, Thomas Renninger wrote:
> > Recent changes:
> >   - Enable EVENT_POWER_TRACING_DEPRECATED by default
> > 
> > New power trace events:
> > power:cpu_idle
> > power:cpu_frequency
> > power:machine_suspend
> > 
> > 
> > C-state/idle accounting events:
> >   power:power_start
> >   power:power_end
> > are replaced with:
> >   power:cpu_idle
> > 
> > and
> >   power:power_frequency
> > is replaced with:
> >   power:cpu_frequency
> > 
> > power:machine_suspend
> > is newly introduced, a first implementation
> > comes from the ARM side, but it's easy to add these events
> > in X86 as well if needed.
> 
> Can you please check that changelog, please?

Sorry s/check/modify/

In fact, there won't be any ARM implementation, because it's going to be
added at the core level.

> I've asked you for that already once.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-28 11:31     ` [linux-pm] " Rafael J. Wysocki
  2010-10-28 11:37       ` Thomas Renninger
@ 2010-10-28 11:37       ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28 11:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: jean.pihet, linux-trace-users, linux-pm, linux-omap, arjan, mingo

Recent changes:
  - Enable EVENT_POWER_TRACING_DEPRECATED by default

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field got removed from both; it was never
used, and the events are told apart by the event type itself.

The perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
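
With power_start/power_end folded into a single cpu_idle event, a
consumer tells idle entry from idle exit by the state value alone. The
pairing can be sketched as below. This is only an illustration under
stated assumptions, not perf timechart's code: idle_tracker,
idle_tracker_feed() and MAX_CPUS are hypothetical names. An exit
without a pending entry is ignored, since unbalanced "leave idle"
events do occur.

```c
#include <stdint.h>

#define PWR_EVENT_EXIT	-1	/* same sentinel as in the patch */
#define MAX_CPUS	64	/* arbitrary bound for this sketch */

struct idle_tracker {
	uint64_t start[MAX_CPUS];	/* timestamp of the pending idle entry */
	int	 in_idle[MAX_CPUS];	/* 1 while an entry awaits its exit */
};

/*
 * Feed one cpu_idle event. Returns the idle duration when an exit
 * closes a pending entry, and 0 for entries, unmatched exits, and
 * out-of-range CPUs.
 */
uint64_t idle_tracker_feed(struct idle_tracker *t, uint32_t state,
			   uint32_t cpu_id, uint64_t timestamp)
{
	if (cpu_id >= MAX_CPUS)
		return 0;

	if (state == (uint32_t)PWR_EVENT_EXIT) {
		if (!t->in_idle[cpu_id])
			return 0;	/* double "leave idle": ignore */
		t->in_idle[cpu_id] = 0;
		return timestamp - t->start[cpu_id];
	}

	/* Any other value is the state being entered. */
	t->start[cpu_id] = timestamp;
	t->in_idle[cpu_id] = 1;
	return 0;
}
```

The tracker must start zero-initialized (struct idle_tracker t = {0};).
The cast to a 32-bit unsigned type mirrors the (u32)PWR_EVENT_EXIT
comparison used on the userspace side.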

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..8ccbedd 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-28 11:31     ` [linux-pm] " Rafael J. Wysocki
@ 2010-10-28 11:37       ` Thomas Renninger
  2010-10-28 11:37       ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-28 11:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, jean.pihet, linux-trace-users, linux-omap, arjan, mingo

Recent changes:
  - Enable EVENT_POWER_TRACING_DEPRECATED by default

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field was removed from both; it was never
used, and the type is distinguished by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
CC: linux-omap@vger.kernel.org
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..28153a9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..ed4919e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..d3701bf 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..f10de41 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +130,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..8ccbedd 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 

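For anyone reading the new events from userspace: cpu_idle and cpu_frequency in the patch above share one print format, "state=%lu cpu_id=%lu", and because PWR_EVENT_EXIT (-1) ends up in a u32 field, an idle exit is printed as state=4294967295. Below is a minimal sketch of decoding that payload. All demo_* names are hypothetical; they are not part of the patch or of perf, and the sketch assumes the payload text has already been cut out of the trace line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Value of "#define PWR_EVENT_EXIT -1" once stored in the u32 state field. */
#define DEMO_PWR_EVENT_EXIT ((uint32_t)-1)

/* Hypothetical userspace mirror of the two fields of the "cpu" event class. */
struct demo_cpu_event {
	uint32_t state;   /* idle state, frequency, or the exit marker */
	uint32_t cpu_id;
};

/*
 * Parse a payload in the format "state=%lu cpu_id=%lu".
 * Returns 0 on success, -1 if the text does not match.
 */
static int demo_parse_cpu_event(const char *payload, struct demo_cpu_event *ev)
{
	unsigned long state, cpu_id;

	assert(payload != NULL && ev != NULL);
	if (sscanf(payload, "state=%lu cpu_id=%lu", &state, &cpu_id) != 2)
		return -1;
	ev->state = (uint32_t)state;
	ev->cpu_id = (uint32_t)cpu_id;
	return 0;
}

/* Non-zero when a cpu_idle event marks the end of an idle period. */
static int demo_is_idle_exit(const struct demo_cpu_event *ev)
{
	return ev->state == DEMO_PWR_EVENT_EXIT;
}
```

A consumer would call demo_parse_cpu_event() on each cpu_idle payload and use demo_is_idle_exit() to tell "leave idle" from "enter state N". The same parser works for cpu_frequency, where state carries the frequency instead.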
^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18 16:34                   ` Jean Pihet
@ 2010-11-19  0:14                     ` Thomas Renninger
  0 siblings, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-11-19  0:14 UTC (permalink / raw)
  To: Jean Pihet; +Cc: Ingo Molnar, rjw, linux-kernel, arjan

On Thursday 18 November 2010 05:34:15 pm Jean Pihet wrote:
> On Thu, Nov 18, 2010 at 11:52 AM, Ingo Molnar <mingo@elte.hu> wrote:
> >
...
> The problem is because power.h gets included multiple times, and so
> the POWER_ enum and the empty deprecated functions need to be
> protected from that.
> 
> Here is a patch below that fixes it, compile tested with and without
> CONFIG_EVENT_POWER_TRACING_DEPRECATED set.
Yep, that should be the correct fix.
I tested both options earlier, but after the preprocessor mess-ups it
looks like I only tested a lot of different archs/flavors through our
build service, and all of those .configs defaulted to
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y.
Stupid, sorry about that.
 
> Ingo, Thomas, please let me know if you want me to refresh the patches
> with that fix.
I'll add it to the end, based on the one Ingo sent.
Ingo: As you started fiddling with it, is that enough or do you prefer
another whole patch series resend?

...

> > From b989c51b6f1989a834eecd9a64a7bd52ed230ea0 Mon Sep 17 00:00:00 2001
> > From: Thomas Renninger <trenn@suse.de>
> > Date: Thu, 18 Nov 2010 10:25:12 +0100
> > Subject: [PATCH] perf: Do not export power_frequency, but power_start
> > event 
This is an independent fix, you can just push it.

Thanks Jean/Ingo, hope that was the last remaining issue...

          Thomas

--------
perf: Clean up power events

Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

now have a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field was removed from both; it was never
used, and the type is distinguished by the event type itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3d7a3a..4c818a7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c63a438..1109f68 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3c95325..ba5134f 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 286784d..46596ad 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,16 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
-#ifndef _TRACE_POWER_ENUM_
-#define _TRACE_POWER_ENUM_
-enum {
-	POWER_NONE	= 0,
-	POWER_CSTATE	= 1,	/* C-State */
-	POWER_PSTATE	= 2,	/* Fequency change or DVFS */
-	POWER_SSTATE	= 3,	/* Suspend */
-};
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
 #endif
 
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+/* This code will be removed after deprecation time exceeded (2.6.41) */
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 /*
  * The power events are used for cpuidle & suspend (power_start, power_end)
  *  and for cpufreq (power_frequency)
@@ -75,6 +126,35 @@ TRACE_EVENT(power_end,
 
 );
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
+
+#else /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+
+/* These dummy declaration have to be ripped out when the deprecated
+   events get removed */
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
 /*
  * The clock events are used for clock enable/disable and for
  *  clock rate change
@@ -153,7 +233,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
 
 	TP_ARGS(name, state, cpu_id)
 );
-
 #endif /* _TRACE_POWER_H */
 
 /* This part must be outside protection */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index ea37e2f..14674dc 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -69,6 +69,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool "Deprecated power event trace API, to be removed"
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 

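A note on pairing the events: in the patch above, trace_cpu_idle(PWR_EVENT_EXIT, ...) is emitted from several places (poll_idle(), cpuidle_idle_call() and the cpu_idle() loops), so a consumer can see more than one exit for a single idle period. The following is a rough sketch of per-CPU idle accounting that tolerates such repeated exits. It assumes the tracer supplies a timestamp with each event; the demo_* names and the nanosecond unit are illustrative only and do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* PWR_EVENT_EXIT (-1) as it appears in the u32 state field. */
#define DEMO_PWR_EVENT_EXIT ((uint32_t)-1)
/* Marker meaning "this CPU is currently not idle". */
#define DEMO_NOT_IDLE UINT64_MAX

/* Hypothetical per-CPU bookkeeping kept by a trace consumer. */
struct demo_cpu_track {
	uint64_t idle_since_ns;  /* DEMO_NOT_IDLE while the CPU is running */
	uint64_t total_idle_ns;  /* accumulated idle time */
};

static void demo_track_init(struct demo_cpu_track *t)
{
	assert(t != NULL);
	t->idle_since_ns = DEMO_NOT_IDLE;
	t->total_idle_ns = 0;
}

/* Feed one cpu_idle event (state plus timestamp) into the tracker. */
static void demo_track_event(struct demo_cpu_track *t, uint32_t state,
			     uint64_t ts_ns)
{
	assert(t != NULL);
	if (state != DEMO_PWR_EVENT_EXIT) {
		/* Idle entry: keep the earliest entry if entries repeat. */
		if (t->idle_since_ns == DEMO_NOT_IDLE)
			t->idle_since_ns = ts_ns;
		return;
	}
	/* Exit with no open idle period (e.g. a second exit): ignore it. */
	if (t->idle_since_ns == DEMO_NOT_IDLE)
		return;
	t->total_idle_ns += ts_ns - t->idle_since_ns;
	t->idle_since_ns = DEMO_NOT_IDLE;
}
```

Ignoring the unmatched exit is one possible policy; a stricter consumer could count such events to report how unbalanced the enter/leave stream is.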
^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18 10:52                 ` Ingo Molnar
@ 2010-11-18 16:34                   ` Jean Pihet
  2010-11-19  0:14                     ` Thomas Renninger
  0 siblings, 1 reply; 72+ messages in thread
From: Jean Pihet @ 2010-11-18 16:34 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Renninger; +Cc: rjw, linux-kernel, arjan

On Thu, Nov 18, 2010 at 11:52 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> I am also getting build failures:
>
> drivers/cpufreq/cpufreq.c:357: error: 'POWER_PSTATE' undeclared (first use in this function)
> drivers/cpufreq/cpufreq.c:357: error: (Each undeclared identifier is reported only once
> drivers/cpufreq/cpufreq.c:357: error: for each function it appears in.)
> arch/x86/kernel/process.c:375: error: 'POWER_CSTATE' undeclared (first use in this function)
> arch/x86/kernel/process.c:375: error: (Each undeclared identifier is reported only once
> arch/x86/kernel/process.c:375: error: for each function it appears in.)
> arch/x86/kernel/process.c:446: error: 'POWER_CSTATE' undeclared (first use in this function)
> arch/x86/kernel/process.c:463: error: 'POWER_CSTATE' undeclared (first use in this function)
> arch/x86/kernel/process.c:485: error: 'POWER_CSTATE' undeclared (first use in this function)
> include/trace/events/power.h:142: error: redefinition of 'trace_power_start'
>
> Config attached.

The problem is because power.h gets included multiple times, and so
the POWER_ enum and the empty deprecated functions need to be
protected from that.

Here is a patch below that fixes it, compile tested with and without
CONFIG_EVENT_POWER_TRACING_DEPRECATED set.

Ingo, Thomas, please let me know if you want me to refresh the patches
with that fix.

diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 00d9819..89db5a1 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -136,12 +136,24 @@ enum {
        POWER_PSTATE = 2,
 };
 #endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
-#else
+
+#else /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+enum {
+       POWER_NONE = 0,
+       POWER_CSTATE = 1,
+       POWER_PSTATE = 2,
+};
+
 /* These dummy declaration have to be ripped out when the deprecated
    events get removed */
 static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
 static inline void trace_power_end(u64 cpuid) {};
 static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
+
 #endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */

 /*

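For illustration, the guard arrangement from the patch above can be reduced to a standalone sketch. The same "header text" is written out twice to imitate power.h being read more than once; the second copy is skipped by the guard, so neither the enum nor the empty stub is defined twice. The DEMO_/demo_ names are made up for this example and are not the kernel's.

```c
#include <assert.h>

/* ---- first pass over the "header" text ---- */
#ifndef DEMO_PWR_EVENT_AVOID_DOUBLE_DEFINING
#define DEMO_PWR_EVENT_AVOID_DOUBLE_DEFINING
enum {
	DEMO_POWER_NONE = 0,
	DEMO_POWER_CSTATE = 1,
	DEMO_POWER_PSTATE = 2,
};

/* Empty stand-in for a deprecated tracepoint that is compiled out. */
static inline void demo_trace_power_end(unsigned long cpuid)
{
	(void)cpuid;
}
#endif /* DEMO_PWR_EVENT_AVOID_DOUBLE_DEFINING */

/* ---- second pass: identical text, skipped because the guard is set ---- */
#ifndef DEMO_PWR_EVENT_AVOID_DOUBLE_DEFINING
#define DEMO_PWR_EVENT_AVOID_DOUBLE_DEFINING
enum {
	DEMO_POWER_NONE = 0,
	DEMO_POWER_CSTATE = 1,
	DEMO_POWER_PSTATE = 2,
};

static inline void demo_trace_power_end(unsigned long cpuid)
{
	(void)cpuid;
}
#endif /* DEMO_PWR_EVENT_AVOID_DOUBLE_DEFINING */

/* A caller that needs both the enum and the stub, like the idle paths do. */
static inline int demo_idle_type(void)
{
	assert(DEMO_POWER_CSTATE == 1);
	demo_trace_power_end(0);
	return DEMO_POWER_CSTATE;
}
```

Removing the #ifndef/#endif pair around the second copy reproduces the kind of "redefinition" error quoted in the build log, which is a quick way to check that the guard is doing the work.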
Thanks,
Jean

>
> Note: please reuse the two commits from below for further work, i did some small
> cleanups to the commit text and to the patches.
>
> Thanks,
>
>        Ingo
>
> ---------------->
> From 87a2cfbda3f53c3bf00c424ce18d97b03b0c3aa0 Mon Sep 17 00:00:00 2001
> From: Thomas Renninger <trenn@suse.de>
> Date: Thu, 18 Nov 2010 10:25:13 +0100
> Subject: [PATCH] perf: Clean up power events
>
> Add these new power trace events:
>
>  power:cpu_idle
>  power:cpu_frequency
>  power:machine_suspend
>
> The old C-state/idle accounting events:
>  power:power_start
>  power:power_end
>
> now have a replacement (but we are still keeping the old
> tracepoints for compatibility):
>
>  power:cpu_idle
>
> and
>  power:power_frequency
>
> is replaced with:
>  power:cpu_frequency
>
> power:machine_suspend is newly introduced.
>
> Jean Pihet has a patch integrated into the generic layer
> (kernel/power/suspend.c) which will make use of it.
>
> The type= field was removed from both; it was never
> used, and the type is distinguished by the event type itself.
>
> perf timechart userspace tool gets adjusted in a separate patch.
>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> Acked-by: Arjan van de Ven <arjan@linux.intel.com>
> Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: rjw@sisk.pl
> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/kernel/process.c    |    7 +++-
>  arch/x86/kernel/process_32.c |    2 +-
>  arch/x86/kernel/process_64.c |    2 +
>  drivers/cpufreq/cpufreq.c    |    1 +
>  drivers/cpuidle/cpuidle.c    |    1 +
>  drivers/idle/intel_idle.c    |    1 +
>  include/trace/events/power.h |   86 +++++++++++++++++++++++++++++++++++++----
>  kernel/trace/Kconfig         |   15 +++++++
>  kernel/trace/power-traces.c  |    3 +
>  9 files changed, 107 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 57d1868..155d975 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -374,6 +374,7 @@ void default_idle(void)
>  {
>        if (hlt_use_halt()) {
>                trace_power_start(POWER_CSTATE, 1, smp_processor_id());
> +               trace_cpu_idle(1, smp_processor_id());
>                current_thread_info()->status &= ~TS_POLLING;
>                /*
>                 * TS_POLLING-cleared state must be visible before we
> @@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
>  void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
>  {
>        trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
> +       trace_cpu_idle((ax>>4)+1, smp_processor_id());
>        if (!need_resched()) {
>                if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
>                        clflush((void *)&current_thread_info()->flags);
> @@ -460,6 +462,7 @@ static void mwait_idle(void)
>  {
>        if (!need_resched()) {
>                trace_power_start(POWER_CSTATE, 1, smp_processor_id());
> +               trace_cpu_idle(1, smp_processor_id());
>                if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
>                        clflush((void *)&current_thread_info()->flags);
>
> @@ -481,10 +484,12 @@ static void mwait_idle(void)
>  static void poll_idle(void)
>  {
>        trace_power_start(POWER_CSTATE, 0, smp_processor_id());
> +       trace_cpu_idle(0, smp_processor_id());
>        local_irq_enable();
>        while (!need_resched())
>                cpu_relax();
> -       trace_power_end(0);
> +       trace_power_end(smp_processor_id());
> +       trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>  }
>
>  /*
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 96586c3..4b9befa 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -113,8 +113,8 @@ void cpu_idle(void)
>                        stop_critical_timings();
>                        pm_idle();
>                        start_critical_timings();
> -
>                        trace_power_end(smp_processor_id());
> +                       trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>                }
>                tick_nohz_restart_sched_tick();
>                preempt_enable_no_resched();
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index b3d7a3a..4c818a7 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -142,6 +142,8 @@ void cpu_idle(void)
>                        start_critical_timings();
>
>                        trace_power_end(smp_processor_id());
> +                       trace_cpu_idle(PWR_EVENT_EXIT,
> +                                      smp_processor_id());
>
>                        /* In many cases the interrupt that ended idle
>                           has already called exit_idle. But some idle
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index c63a438..1109f68 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
>                dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
>                        (unsigned long)freqs->cpu);
>                trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
> +               trace_cpu_frequency(freqs->new, freqs->cpu);
>                srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
>                                CPUFREQ_POSTCHANGE, freqs);
>                if (likely(policy) && likely(policy->cpu == freqs->cpu))
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a507108..08d5f05 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
>        if (cpuidle_curr_governor->reflect)
>                cpuidle_curr_governor->reflect(dev);
>        trace_power_end(smp_processor_id());
> +       trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>  }
>
>  /**
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 3c95325..ba5134f 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
>
>        stop_critical_timings();
>        trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
> +       trace_cpu_idle((eax >> 4) + 1, cpu);
>        if (!need_resched()) {
>
>                __monitor((void *)&current_thread_info()->flags, 0, 0);
> diff --git a/include/trace/events/power.h b/include/trace/events/power.h
> index 286784d..00d9819 100644
> --- a/include/trace/events/power.h
> +++ b/include/trace/events/power.h
> @@ -7,16 +7,67 @@
>  #include <linux/ktime.h>
>  #include <linux/tracepoint.h>
>
> -#ifndef _TRACE_POWER_ENUM_
> -#define _TRACE_POWER_ENUM_
> -enum {
> -       POWER_NONE      = 0,
> -       POWER_CSTATE    = 1,    /* C-State */
> -       POWER_PSTATE    = 2,    /* Fequency change or DVFS */
> -       POWER_SSTATE    = 3,    /* Suspend */
> -};
> +DECLARE_EVENT_CLASS(cpu,
> +
> +       TP_PROTO(unsigned int state, unsigned int cpu_id),
> +
> +       TP_ARGS(state, cpu_id),
> +
> +       TP_STRUCT__entry(
> +               __field(        u32,            state           )
> +               __field(        u32,            cpu_id          )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->state = state;
> +               __entry->cpu_id = cpu_id;
> +       ),
> +
> +       TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
> +                 (unsigned long)__entry->cpu_id)
> +);
> +
> +DEFINE_EVENT(cpu, cpu_idle,
> +
> +       TP_PROTO(unsigned int state, unsigned int cpu_id),
> +
> +       TP_ARGS(state, cpu_id)
> +);
> +
> +/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
> +#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
> +#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
> +
> +#define PWR_EVENT_EXIT -1
>  #endif
>
> +DEFINE_EVENT(cpu, cpu_frequency,
> +
> +       TP_PROTO(unsigned int frequency, unsigned int cpu_id),
> +
> +       TP_ARGS(frequency, cpu_id)
> +);
> +
> +TRACE_EVENT(machine_suspend,
> +
> +       TP_PROTO(unsigned int state),
> +
> +       TP_ARGS(state),
> +
> +       TP_STRUCT__entry(
> +               __field(        u32,            state           )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->state = state;
> +       ),
> +
> +       TP_printk("state=%lu", (unsigned long)__entry->state)
> +);
> +
> +/* This code will be removed after deprecation time exceeded (2.6.41) */
> +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> +
>  /*
>  * The power events are used for cpuidle & suspend (power_start, power_end)
>  *  and for cpufreq (power_frequency)
> @@ -75,6 +126,24 @@ TRACE_EVENT(power_end,
>
>  );
>
> +/* Deprecated dummy functions must be protected against multi-declaration */
> +#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
> +#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
> +
> +enum {
> +       POWER_NONE = 0,
> +       POWER_CSTATE = 1,
> +       POWER_PSTATE = 2,
> +};
> +#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
> +#else
> +/* These dummy declarations have to be ripped out when the deprecated
> +   events get removed */
> +static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
> +static inline void trace_power_end(u64 cpuid) {};
> +static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
> +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
> +
>  /*
>  * The clock events are used for clock enable/disable and for
>  *  clock rate change
> @@ -153,7 +222,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
>
>        TP_ARGS(name, state, cpu_id)
>  );
> -
>  #endif /* _TRACE_POWER_H */
>
>  /* This part must be outside protection */
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index e04b8bc..59b44a1 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -69,6 +69,21 @@ config EVENT_TRACING
>        select CONTEXT_SWITCH_TRACER
>        bool
>
> +config EVENT_POWER_TRACING_DEPRECATED
> +       depends on EVENT_TRACING
> +       bool "Deprecated power event trace API, to be removed"
> +       default y
> +       help
> +         Provides old power event types:
> +         C-state/idle accounting events:
> +         power:power_start
> +         power:power_end
> +         and old cpufreq accounting event:
> +         power:power_frequency
> +         This is for userspace compatibility
> +         and will vanish after 5 kernel iterations,
> +         namely 2.6.41.
> +
>  config CONTEXT_SWITCH_TRACER
>        bool
>
> diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
> index 0e0497d..f55fcf6 100644
> --- a/kernel/trace/power-traces.c
> +++ b/kernel/trace/power-traces.c
> @@ -13,5 +13,8 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/power.h>
>
> +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
>  EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
> +#endif
> +EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
>
>
> From b989c51b6f1989a834eecd9a64a7bd52ed230ea0 Mon Sep 17 00:00:00 2001
> From: Thomas Renninger <trenn@suse.de>
> Date: Thu, 18 Nov 2010 10:25:12 +0100
> Subject: [PATCH] perf: Do not export power_frequency, but power_start event
>
> power_frequency moved to drivers/cpufreq/cpufreq.c which has
> to be compiled in, no need to export it.
>
> intel_idle can be a module though...
>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
> CC: Arjan van de Ven <arjan@linux.intel.com>
> Cc: rjw@sisk.pl
> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  drivers/idle/intel_idle.c   |    2 --
>  kernel/trace/power-traces.c |    2 +-
>  2 files changed, 1 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 41665d2..3c95325 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -220,9 +220,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
>        kt_before = ktime_get_real();
>
>        stop_critical_timings();
> -#ifndef MODULE
>        trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
> -#endif
>        if (!need_resched()) {
>
>                __monitor((void *)&current_thread_info()->flags, 0, 0);
> diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
> index a22582a..0e0497d 100644
> --- a/kernel/trace/power-traces.c
> +++ b/kernel/trace/power-traces.c
> @@ -13,5 +13,5 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/power.h>
>
> -EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
> +EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
>
>


* [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18 13:01 Power trace event cleanup by still providing old interface for some time Thomas Renninger
@ 2010-11-18 13:01 ` Thomas Renninger
  0 siblings, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-11-18 13:01 UTC (permalink / raw)
  To: j-pihet, arjan, mingo, linux-kernel, trenn; +Cc: rjw

Recent changes:
  - Enable EVENT_POWER_TRACING_DEPRECATED by default

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field was removed from both events; it was never
used, and the type is already implied by the event itself.

The perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   86 +++++++++++++++++++++++++++++++++++++----
 kernel/trace/Kconfig         |   15 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 107 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3d7a3a..4c818a7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c63a438..1109f68 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3c95325..ba5134f 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 286784d..00d9819 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,16 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
-#ifndef _TRACE_POWER_ENUM_
-#define _TRACE_POWER_ENUM_
-enum {
-	POWER_NONE	= 0,
-	POWER_CSTATE	= 1,	/* C-State */
-	POWER_PSTATE	= 2,	/* Fequency change or DVFS */
-	POWER_SSTATE	= 3,	/* Suspend */
-};
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
 #endif
 
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+/* This code will be removed after deprecation time exceeded (2.6.41) */
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 /*
  * The power events are used for cpuidle & suspend (power_start, power_end)
  *  and for cpufreq (power_frequency)
@@ -75,6 +126,24 @@ TRACE_EVENT(power_end,
 
 );
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
+#else
+/* These dummy declarations have to be ripped out when the deprecated
+   events get removed */
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
 /*
  * The clock events are used for clock enable/disable and for
  *  clock rate change
@@ -153,7 +222,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
 
 	TP_ARGS(name, state, cpu_id)
 );
-
 #endif /* _TRACE_POWER_H */
 
 /* This part must be outside protection */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e04b8bc..59b44a1 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -69,6 +69,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool "Deprecated power event trace API, to be removed"
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3



* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18  9:36               ` Ingo Molnar
  2010-11-18  9:44                 ` Jean Pihet
@ 2010-11-18 10:52                 ` Ingo Molnar
  2010-11-18 16:34                   ` Jean Pihet
  1 sibling, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2010-11-18 10:52 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: Jean Pihet, rjw, linux-kernel, arjan

[-- Attachment #1: Type: text/plain, Size: 12091 bytes --]


I am also getting build failures:

drivers/cpufreq/cpufreq.c:357: error: 'POWER_PSTATE' undeclared (first use in this function)
drivers/cpufreq/cpufreq.c:357: error: (Each undeclared identifier is reported only once
drivers/cpufreq/cpufreq.c:357: error: for each function it appears in.)
arch/x86/kernel/process.c:375: error: 'POWER_CSTATE' undeclared (first use in this function)
arch/x86/kernel/process.c:375: error: (Each undeclared identifier is reported only once
arch/x86/kernel/process.c:375: error: for each function it appears in.)
arch/x86/kernel/process.c:446: error: 'POWER_CSTATE' undeclared (first use in this function)
arch/x86/kernel/process.c:463: error: 'POWER_CSTATE' undeclared (first use in this function)
arch/x86/kernel/process.c:485: error: 'POWER_CSTATE' undeclared (first use in this function)
include/trace/events/power.h:142: error: redefinition of 'trace_power_start'

Config attached.

Note: please reuse the two commits from below for further work, I did some small
cleanups to the commit text and to the patches.

Thanks,

	Ingo

---------------->
From 87a2cfbda3f53c3bf00c424ce18d97b03b0c3aa0 Mon Sep 17 00:00:00 2001
From: Thomas Renninger <trenn@suse.de>
Date: Thu, 18 Nov 2010 10:25:13 +0100
Subject: [PATCH] perf: Clean up power events

Add these new power trace events:

 power:cpu_idle
 power:cpu_frequency
 power:machine_suspend

The old C-state/idle accounting events:
  power:power_start
  power:power_end

now have a replacement (but we are still keeping the old
tracepoints for compatibility):

  power:cpu_idle

and
  power:power_frequency

is replaced with:
  power:cpu_frequency

power:machine_suspend is newly introduced.

Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field was removed from both events; it was never
used, and the type is already implied by the event itself.

perf timechart userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   86 +++++++++++++++++++++++++++++++++++++----
 kernel/trace/Kconfig         |   15 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 107 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3d7a3a..4c818a7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c63a438..1109f68 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3c95325..ba5134f 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 286784d..00d9819 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,16 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
-#ifndef _TRACE_POWER_ENUM_
-#define _TRACE_POWER_ENUM_
-enum {
-	POWER_NONE	= 0,
-	POWER_CSTATE	= 1,	/* C-State */
-	POWER_PSTATE	= 2,	/* Fequency change or DVFS */
-	POWER_SSTATE	= 3,	/* Suspend */
-};
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
 #endif
 
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+/* This code will be removed after deprecation time exceeded (2.6.41) */
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 /*
  * The power events are used for cpuidle & suspend (power_start, power_end)
  *  and for cpufreq (power_frequency)
@@ -75,6 +126,24 @@ TRACE_EVENT(power_end,
 
 );
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
+#else
+/* These dummy declarations have to be ripped out when the deprecated
+   events get removed */
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
 /*
  * The clock events are used for clock enable/disable and for
  *  clock rate change
@@ -153,7 +222,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
 
 	TP_ARGS(name, state, cpu_id)
 );
-
 #endif /* _TRACE_POWER_H */
 
 /* This part must be outside protection */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e04b8bc..59b44a1 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -69,6 +69,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool "Deprecated power event trace API, to be removed"
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 

From b989c51b6f1989a834eecd9a64a7bd52ed230ea0 Mon Sep 17 00:00:00 2001
From: Thomas Renninger <trenn@suse.de>
Date: Thu, 18 Nov 2010 10:25:12 +0100
Subject: [PATCH] perf: Do not export power_frequency, but power_start event

power_frequency moved to drivers/cpufreq/cpufreq.c which has
to be compiled in, no need to export it.

intel_idle can be a module though...

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
Cc: rjw@sisk.pl
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/idle/intel_idle.c   |    2 --
 kernel/trace/power-traces.c |    2 +-
 2 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 41665d2..3c95325 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -220,9 +220,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 	kt_before = ktime_get_real();
 
 	stop_critical_timings();
-#ifndef MODULE
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
-#endif
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index a22582a..0e0497d 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,5 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
+EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
 

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 39692 bytes --]

#
# Automatically generated make config: don't edit
# Linux/i386 2.6.37-rc2 Kernel Configuration
# Thu Nov 18 12:29:46 2010
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
# CONFIG_NEED_DMA_MAP_STATE is not set
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_GPIO=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_GENERIC_TIME_VSYSCALL is not set
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
# CONFIG_HAVE_CPUMASK_OF_CPU_MAP is not set
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
# CONFIG_ZONE_DMA32 is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
# CONFIG_AUDIT_ARCH is not set
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_32_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_KTIME_SCALAR=y
CONFIG_BOOTPARAM_SUPPORT_NOT_WANTED=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_HAVE_IRQ_WORK=y
CONFIG_IRQ_WORK=y

#
# General setup
#
# CONFIG_EXPERIMENTAL is not set
CONFIG_BROKEN_BOOT_ALLOWED4=y
CONFIG_BROKEN_BOOT_ALLOWED3=y
CONFIG_BROKEN_BOOT_ALLOWED2=y
CONFIG_BROKEN_BOOT_ALLOWED=y
CONFIG_BROKEN_BOOT_EUROPE=y
CONFIG_BROKEN_BOOT_TITAN=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_LZO=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_KERNEL_LZO=y
# CONFIG_SWAP is not set
# CONFIG_SYSVIPC is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y
CONFIG_HAVE_GENERIC_HARDIRQS=y

#
# IRQ subsystem
#
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
# CONFIG_GENERIC_HARDIRQS_NO_DEPRECATED is not set
CONFIG_HAVE_SPARSE_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
# CONFIG_AUTO_IRQ_AFFINITY is not set
# CONFIG_IRQ_PER_CPU is not set
# CONFIG_HARDIRQS_SW_RESEND is not set
CONFIG_SPARSE_IRQ=y

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_RCU_TRACE=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_EXACT=y
CONFIG_TREE_RCU_TRACE=y
CONFIG_IKCONFIG=m
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=20
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
CONFIG_PID_NS=y
# CONFIG_NET_NS is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_EMBEDDED=y
CONFIG_UID16=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
# CONFIG_PRINTK is not set
# CONFIG_BUG is not set
# CONFIG_ELF_CORE is not set
# CONFIG_PCSPKR_PLATFORM is not set
CONFIG_BASE_FULL=y
# CONFIG_FUTEX is not set
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_PERF_COUNTERS is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
# CONFIG_JUMP_LABEL is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_ATTRS=y
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_HAVE_HW_BREAKPOINT=y
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y
CONFIG_HAVE_USER_RETURN_NOTIFIER=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_ARCH_JUMP_LABEL=y

#
# GCOV-based kernel profiling
#
CONFIG_HAVE_GENERIC_DMA_COHERENT=y
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_LBDAF is not set
CONFIG_BLK_DEV_BSG=y
CONFIG_BLK_DEV_INTEGRITY=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=m
CONFIG_DEFAULT_NOOP=y
CONFIG_DEFAULT_IOSCHED="noop"
# CONFIG_INLINE_SPIN_TRYLOCK is not set
# CONFIG_INLINE_SPIN_TRYLOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK is not set
# CONFIG_INLINE_SPIN_LOCK_BH is not set
# CONFIG_INLINE_SPIN_LOCK_IRQ is not set
# CONFIG_INLINE_SPIN_LOCK_IRQSAVE is not set
CONFIG_INLINE_SPIN_UNLOCK=y
# CONFIG_INLINE_SPIN_UNLOCK_BH is not set
CONFIG_INLINE_SPIN_UNLOCK_IRQ=y
# CONFIG_INLINE_SPIN_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_READ_TRYLOCK is not set
# CONFIG_INLINE_READ_LOCK is not set
# CONFIG_INLINE_READ_LOCK_BH is not set
# CONFIG_INLINE_READ_LOCK_IRQ is not set
# CONFIG_INLINE_READ_LOCK_IRQSAVE is not set
CONFIG_INLINE_READ_UNLOCK=y
# CONFIG_INLINE_READ_UNLOCK_BH is not set
CONFIG_INLINE_READ_UNLOCK_IRQ=y
# CONFIG_INLINE_READ_UNLOCK_IRQRESTORE is not set
# CONFIG_INLINE_WRITE_TRYLOCK is not set
# CONFIG_INLINE_WRITE_LOCK is not set
# CONFIG_INLINE_WRITE_LOCK_BH is not set
# CONFIG_INLINE_WRITE_LOCK_IRQ is not set
# CONFIG_INLINE_WRITE_LOCK_IRQSAVE is not set
CONFIG_INLINE_WRITE_UNLOCK=y
# CONFIG_INLINE_WRITE_UNLOCK_BH is not set
CONFIG_INLINE_WRITE_UNLOCK_IRQ=y
# CONFIG_INLINE_WRITE_UNLOCK_IRQRESTORE is not set
CONFIG_MUTEX_SPIN_ON_OWNER=y
# CONFIG_FREEZER is not set

#
# Processor type and features
#
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP_SUPPORT=y
CONFIG_X86_MPPARSE=y
CONFIG_X86_BIGSMP=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_ELAN is not set
CONFIG_X86_RDC321X=y
# CONFIG_X86_32_NON_STANDARD is not set
CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_PARAVIRT_GUEST=y
# CONFIG_XEN_PRIVILEGED_GUEST is not set
CONFIG_KVM_CLOCK=y
# CONFIG_KVM_GUEST is not set
# CONFIG_LGUEST_GUEST is not set
CONFIG_PARAVIRT=y
CONFIG_PARAVIRT_CLOCK=y
CONFIG_NO_BOOTMEM=y
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
CONFIG_M686=y
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MCORE2 is not set
# CONFIG_MATOM is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CPU=y
CONFIG_X86_INTERNODE_CACHE_SHIFT=5
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_XADD=y
# CONFIG_X86_PPRO_FENCE is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=5
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_PROCESSOR_SELECT=y
CONFIG_CPU_SUP_INTEL=y
# CONFIG_CPU_SUP_CYRIX_32 is not set
# CONFIG_CPU_SUP_AMD is not set
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_CPU_SUP_TRANSMETA_32=y
# CONFIG_CPU_SUP_UMC_32 is not set
# CONFIG_HPET_TIMER is not set
CONFIG_DMI=y
# CONFIG_IOMMU_HELPER is not set
# CONFIG_IOMMU_API is not set
CONFIG_NR_CPUS=32
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS=y
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
# CONFIG_X86_MCE_AMD is not set
CONFIG_X86_ANCIENT_MCE=y
CONFIG_X86_MCE_INJECT=m
CONFIG_VM86=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_X86_REBOOTFIXUPS=y
CONFIG_MICROCODE=m
CONFIG_MICROCODE_INTEL=y
# CONFIG_MICROCODE_AMD is not set
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=m
CONFIG_X86_CPUID=m
# CONFIG_UP_WANTED_1 is not set
CONFIG_SMP=y
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HIGHMEM=y
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
# CONFIG_ARCH_DMA_ADDR_T_64BIT is not set
CONFIG_ILLEGAL_POINTER_VALUE=0
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_HAVE_MEMBLOCK=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_KSM=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
# CONFIG_MEMORY_FAILURE is not set
# CONFIG_HIGHPTE is not set
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW=64
CONFIG_MATH_EMULATION=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
CONFIG_SECCOMP=y
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
CONFIG_SCHED_HRTICK=y
# CONFIG_KEXEC is not set
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x1000000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x1000000
# CONFIG_HOTPLUG_CPU is not set
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SUSPEND is not set
# CONFIG_PM_RUNTIME is not set
# CONFIG_SFI is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=m
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=m
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=m
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
CONFIG_X86_POWERNOW_K6=m
CONFIG_X86_POWERNOW_K7=m
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
CONFIG_X86_SPEEDSTEP_ICH=m
# CONFIG_X86_P4_CLOCKMOD is not set
CONFIG_X86_LONGRUN=m

#
# shared options
#
CONFIG_X86_SPEEDSTEP_LIB=m
# CONFIG_X86_SPEEDSTEP_RELAXED_CAP_CHECK is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_INTEL_IDLE=y

#
# Bus options (PCI etc.)
#
# CONFIG_PCI is not set
CONFIG_PCI_BIOS=y
# CONFIG_ARCH_SUPPORTS_MSI is not set
CONFIG_ISA_DMA_API=y
# CONFIG_ISA is not set
CONFIG_MCA=y
CONFIG_MCA_LEGACY=y
# CONFIG_MCA_PROC_FS is not set
# CONFIG_SCx200 is not set
CONFIG_OLPC=y
CONFIG_OLPC_OPENFIRMWARE=y
CONFIG_PCCARD=m
# CONFIG_PCMCIA is not set

#
# PC-card bridges
#

#
# Executable file formats / Emulations
#
# CONFIG_BINFMT_ELF is not set
CONFIG_HAVE_AOUT=y
# CONFIG_BINFMT_AOUT is not set
CONFIG_BINFMT_MISC=m
CONFIG_HAVE_ATOMIC_IOMAP=y
CONFIG_HAVE_TEXT_POKE_SMP=y
CONFIG_NET=y

#
# Networking options
#
# CONFIG_PACKET is not set
CONFIG_UNIX=m
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
CONFIG_XFRM_IPCOMP=m
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_MULTIPLE_TABLES is not set
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
CONFIG_NET_IPGRE_DEMUX=m
# CONFIG_NET_IPGRE is not set
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
# CONFIG_IP_PIMSM_V1 is not set
# CONFIG_IP_PIMSM_V2 is not set
CONFIG_ARPD=y
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=m
# CONFIG_INET_LRO is not set
# CONFIG_INET_DIAG is not set
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
CONFIG_IPV6=m
# CONFIG_IPV6_PRIVACY is not set
CONFIG_IPV6_ROUTER_PREF=y
# CONFIG_INET6_AH is not set
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
CONFIG_IPV6_SIT=m
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_NETLABEL=y
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
CONFIG_ATM=m
# CONFIG_ATM_CLIP is not set
# CONFIG_ATM_LANE is not set
CONFIG_ATM_BR2684=m
CONFIG_ATM_BR2684_IPFILTER=y
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
# CONFIG_BRIDGE is not set
CONFIG_VLAN_8021Q=m
# CONFIG_VLAN_8021Q_GVRP is not set
CONFIG_DECNET=m
CONFIG_LLC=m
CONFIG_LLC2=m
# CONFIG_PHONET is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_MULTIQ=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
CONFIG_NET_SCH_GRED=m
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
CONFIG_NET_SCH_DRR=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
CONFIG_NET_CLS_FW=m
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
CONFIG_NET_CLS_RSVP6=m
CONFIG_NET_CLS_FLOW=m
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
# CONFIG_NET_CLS_ACT is not set
# CONFIG_NET_CLS_IND is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set
CONFIG_RPS=y

#
# Network testing
#
CONFIG_NET_PKTGEN=m
CONFIG_HAMRADIO=y

#
# Packet Radio protocols
#
# CONFIG_AX25 is not set
CONFIG_CAN=m
CONFIG_CAN_RAW=m
CONFIG_CAN_BCM=m

#
# CAN Device Drivers
#
# CONFIG_CAN_VCAN is not set
CONFIG_CAN_DEV=m
# CONFIG_CAN_CALC_BITTIMING is not set
# CONFIG_CAN_SJA1000 is not set
CONFIG_CAN_DEBUG_DEVICES=y
CONFIG_IRDA=m

#
# IrDA protocols
#
CONFIG_IRLAN=m
# CONFIG_IRNET is not set
# CONFIG_IRCOMM is not set
CONFIG_IRDA_ULTRA=y

#
# IrDA options
#
# CONFIG_IRDA_CACHE_LAST_LSAP is not set
# CONFIG_IRDA_FAST_RR is not set
CONFIG_IRDA_DEBUG=y

#
# Infrared-port device drivers
#

#
# SIR device drivers
#
# CONFIG_IRTTY_SIR is not set

#
# Dongle support
#

#
# FIR device drivers
#
# CONFIG_NSC_FIR is not set
# CONFIG_WINBOND_FIR is not set
CONFIG_VIA_FIR=m
CONFIG_BT=m
# CONFIG_BT_L2CAP is not set
# CONFIG_BT_SCO is not set

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIUART is not set
CONFIG_BT_HCIVHCI=m
# CONFIG_BT_MRVL is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_CAIF is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH=""
CONFIG_DEVTMPFS=y
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=m
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_CONNECTOR is not set
CONFIG_MTD=m
CONFIG_MTD_DEBUG=y
CONFIG_MTD_DEBUG_VERBOSE=0
# CONFIG_MTD_TESTS is not set
# CONFIG_MTD_CONCAT is not set
CONFIG_MTD_PARTITIONS=y
# CONFIG_MTD_REDBOOT_PARTS is not set
CONFIG_MTD_AR7_PARTS=m

#
# User Modules And Translation Layers
#
CONFIG_MTD_CHAR=m
CONFIG_MTD_BLKDEVS=m
# CONFIG_MTD_BLOCK is not set
# CONFIG_MTD_BLOCK_RO is not set
# CONFIG_FTL is not set
CONFIG_NFTL=m
# CONFIG_NFTL_RW is not set
# CONFIG_INFTL is not set
# CONFIG_RFD_FTL is not set
# CONFIG_SSFDC is not set
CONFIG_MTD_OOPS=m

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=m
# CONFIG_MTD_JEDECPROBE is not set
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
# CONFIG_MTD_CFI_INTELEXT is not set
# CONFIG_MTD_CFI_AMDSTD is not set
# CONFIG_MTD_CFI_STAA is not set
CONFIG_MTD_CFI_UTIL=m
# CONFIG_MTD_RAM is not set
CONFIG_MTD_ROM=m
# CONFIG_MTD_ABSENT is not set

#
# Mapping drivers for chip access
#
CONFIG_MTD_COMPLEX_MAPPINGS=y
CONFIG_MTD_PHYSMAP=m
CONFIG_MTD_PHYSMAP_COMPAT=y
CONFIG_MTD_PHYSMAP_START=0x8000000
CONFIG_MTD_PHYSMAP_LEN=0
CONFIG_MTD_PHYSMAP_BANKWIDTH=2
# CONFIG_MTD_NETSC520 is not set
# CONFIG_MTD_TS5500 is not set
# CONFIG_MTD_GPIO_ADDR is not set
# CONFIG_MTD_PLATRAM is not set

#
# Self-contained MTD device drivers
#
CONFIG_MTD_SLRAM=m
CONFIG_MTD_PHRAM=m
CONFIG_MTD_MTDRAM=m
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOC2000 is not set
# CONFIG_MTD_DOC2001 is not set
CONFIG_MTD_DOC2001PLUS=m
CONFIG_MTD_DOCPROBE=m
CONFIG_MTD_DOCECC=m
CONFIG_MTD_DOCPROBE_ADVANCED=y
CONFIG_MTD_DOCPROBE_ADDRESS=0x0000
# CONFIG_MTD_DOCPROBE_HIGH is not set
# CONFIG_MTD_DOCPROBE_55AA is not set
# CONFIG_MTD_NAND is not set
CONFIG_MTD_NAND_IDS=m
# CONFIG_MTD_ONENAND is not set

#
# LPDDR flash memory drivers
#
CONFIG_MTD_LPDDR=m
CONFIG_MTD_QINFO_PROBE=m
CONFIG_MTD_UBI=m
CONFIG_MTD_UBI_WL_THRESHOLD=4096
CONFIG_MTD_UBI_BEB_RESERVE=1
# CONFIG_MTD_UBI_GLUEBI is not set

#
# UBI debugging options
#
# CONFIG_MTD_UBI_DEBUG is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
# CONFIG_PARPORT_GSC is not set
CONFIG_PARPORT_AX88796=m
# CONFIG_PARPORT_1284 is not set
CONFIG_PARPORT_NOT_PC=y
# CONFIG_BLK_DEV is not set
CONFIG_MISC_DEVICES=y
CONFIG_AD525X_DPOT=m
# CONFIG_AD525X_DPOT_I2C is not set
CONFIG_ENCLOSURE_SERVICES=m
CONFIG_APDS9802ALS=m
CONFIG_ISL29003=m
CONFIG_ISL29020=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_SENSORS_BH1780 is not set
CONFIG_SENSORS_BH1770=m
CONFIG_SENSORS_APDS990X=m
CONFIG_HMC6352=m
# CONFIG_VMWARE_BALLOON is not set
# CONFIG_BMP085 is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
CONFIG_EEPROM_LEGACY=m
# CONFIG_EEPROM_93CX6 is not set

#
# Texas Instruments shared transport line discipline
#
CONFIG_HAVE_IDE=y
# CONFIG_IDE is not set

#
# SCSI device support
#
CONFIG_SCSI_MOD=m
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
# CONFIG_BLK_DEV_SD is not set
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
# CONFIG_CHR_DEV_SG is not set
CONFIG_CHR_DEV_SCH=m
CONFIG_SCSI_ENCLOSURE=m
CONFIG_SCSI_MULTI_LUN=y
# CONFIG_SCSI_CONSTANTS is not set
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
# CONFIG_SCSI_SAS_HOST_SMP is not set
CONFIG_SCSI_SAS_LIBSAS_DEBUG=y
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
CONFIG_ISCSI_BOOT_SYSFS=m
CONFIG_SCSI_BUSLOGIC=m
CONFIG_LIBFC=m
CONFIG_LIBFCOE=m
# CONFIG_SCSI_FD_MCS is not set
# CONFIG_SCSI_IBMMCA is not set
# CONFIG_SCSI_PPA is not set
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
CONFIG_SCSI_IZIP_SLOW_CTR=y
CONFIG_SCSI_NCR_D700=m
# CONFIG_SCSI_NCR_Q720 is not set
# CONFIG_SCSI_SIM710 is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
# CONFIG_ATA is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
# CONFIG_MD_LINEAR is not set
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID456 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_MD_FAULTY is not set
# CONFIG_BLK_DEV_DM is not set
CONFIG_MACINTOSH_DRIVERS=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
CONFIG_EQUALIZER=m
# CONFIG_TUN is not set
CONFIG_VETH=m
# CONFIG_MII is not set
CONFIG_PHYLIB=m

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=m
CONFIG_CICADA_PHY=m
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=m
CONFIG_BROADCOM_PHY=m
# CONFIG_BCM63XX_PHY is not set
CONFIG_ICPLUS_PHY=m
CONFIG_REALTEK_PHY=m
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
CONFIG_LSI_ET1011C_PHY=m
CONFIG_MICREL_PHY=m
CONFIG_MDIO_BITBANG=m
CONFIG_MDIO_GPIO=m
# CONFIG_NET_ETHERNET is not set
CONFIG_NETDEV_1000=y
# CONFIG_STMMAC_ETH is not set
# CONFIG_NETDEV_10000 is not set
CONFIG_TR=m
# CONFIG_IBMTR is not set
CONFIG_TMS380TR=m
# CONFIG_MADGEMC is not set
# CONFIG_SMCTR is not set
# CONFIG_WLAN is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
CONFIG_WAN=y
# CONFIG_HDLC is not set
# CONFIG_DLCI is not set
# CONFIG_SBNI is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
# CONFIG_ATM_TCP is not set

#
# CAIF transport drivers
#
# CONFIG_PLIP is not set
CONFIG_PPP=m
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
# CONFIG_PPP_SYNC_TTY is not set
CONFIG_PPP_DEFLATE=m
CONFIG_PPP_BSDCOMP=m
CONFIG_PPPOATM=m
# CONFIG_SLIP is not set
CONFIG_SLHC=m
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_ISDN is not set
CONFIG_PHONE=m

#
# Input device support
#
# CONFIG_INPUT is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=m
# CONFIG_SERIO_I8042 is not set
CONFIG_SERIO_SERPORT=m
# CONFIG_SERIO_CT82C710 is not set
CONFIG_SERIO_PARKBD=m
# CONFIG_SERIO_LIBPS2 is not set
# CONFIG_SERIO_RAW is not set
# CONFIG_SERIO_ALTERA_PS2 is not set
CONFIG_SERIO_PS2MULT=m
# CONFIG_GAMEPORT is not set

#
# Character devices
#
# CONFIG_VT is not set
# CONFIG_DEVKMEM is not set
CONFIG_SERIAL_NONSTANDARD=y
CONFIG_N_HDLC=m
CONFIG_RISCOM8=m
# CONFIG_SPECIALIX is not set
CONFIG_STALDRV=y

#
# Serial drivers
#
CONFIG_SERIAL_8250=m
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
# CONFIG_SERIAL_8250_RSA is not set
CONFIG_SERIAL_8250_MCA=m

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=m
CONFIG_SERIAL_TIMBERDALE=m
# CONFIG_SERIAL_ALTERA_JTAGUART is not set
CONFIG_SERIAL_ALTERA_UART=m
CONFIG_SERIAL_ALTERA_UART_MAXPORTS=4
CONFIG_SERIAL_ALTERA_UART_BAUDRATE=115200
CONFIG_UNIX98_PTYS=y
CONFIG_DEVPTS_MULTIPLE_INSTANCES=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_TTY_PRINTK=y
# CONFIG_PRINTER is not set
# CONFIG_PPDEV is not set
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
# CONFIG_IPMI_DEVICE_INTERFACE is not set
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=m
CONFIG_NVRAM=m
# CONFIG_RTC is not set
# CONFIG_GEN_RTC is not set
# CONFIG_R3964 is not set
# CONFIG_MWAVE is not set
# CONFIG_PC8736x_GPIO is not set
CONFIG_NSC_GPIO=m
CONFIG_CS5535_GPIO=m
CONFIG_RAW_DRIVER=m
CONFIG_MAX_RAW_DEVS=256
# CONFIG_HANGCHECK_TIMER is not set
# CONFIG_RAMOOPS is not set
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_COMPAT=y
# CONFIG_I2C_CHARDEV is not set
# CONFIG_I2C_HELPER_AUTO is not set
CONFIG_I2C_SMBUS=m

#
# I2C Algorithms
#
CONFIG_I2C_ALGOBIT=m
CONFIG_I2C_ALGOPCF=m
CONFIG_I2C_ALGOPCA=m

#
# I2C Hardware Bus support
#

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
CONFIG_I2C_GPIO=m
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_PARPORT is not set
CONFIG_I2C_PARPORT_LIGHT=m

#
# Other I2C/SMBus bus drivers
#
CONFIG_I2C_DEBUG_CORE=y
CONFIG_I2C_DEBUG_ALGO=y
CONFIG_I2C_DEBUG_BUS=y
# CONFIG_SPI is not set

#
# PPS support
#
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
CONFIG_GPIOLIB=y

#
# Memory mapped GPIO expanders:
#
CONFIG_GPIO_BASIC_MMIO=m
# CONFIG_GPIO_IT8761E is not set
CONFIG_GPIO_VX855=m

#
# I2C GPIO expanders:
#
# CONFIG_GPIO_MAX7300 is not set
CONFIG_GPIO_MAX732X=m
# CONFIG_GPIO_PCA953X is not set
# CONFIG_GPIO_PCF857X is not set
CONFIG_GPIO_ADP5588=m

#
# PCI GPIO expanders:
#

#
# SPI GPIO expanders:
#

#
# AC97 GPIO expanders:
#

#
# MODULbus GPIO expanders:
#
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_HWMON_DEBUG_CHIP=y

#
# Native drivers
#
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
# CONFIG_SENSORS_ADM1026 is not set
CONFIG_SENSORS_ADM1029=m
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_ASC7621 is not set
# CONFIG_SENSORS_DS1621 is not set
CONFIG_SENSORS_F71805F=m
# CONFIG_SENSORS_F71882FG is not set
CONFIG_SENSORS_F75375S=m
CONFIG_SENSORS_FSCHMD=m
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_GPIO_FAN is not set
# CONFIG_SENSORS_IBMAEM is not set
CONFIG_SENSORS_IBMPEX=m
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_JC42=m
CONFIG_SENSORS_LM63=m
# CONFIG_SENSORS_LM73 is not set
CONFIG_SENSORS_LM75=m
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
# CONFIG_SENSORS_LM87 is not set
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
CONFIG_SENSORS_LM93=m
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
CONFIG_SENSORS_PC87360=m
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
CONFIG_SENSORS_SHT15=m
CONFIG_SENSORS_EMC1403=m
CONFIG_SENSORS_EMC2103=m
CONFIG_SENSORS_SMSC47M1=m
# CONFIG_SENSORS_SMSC47M192 is not set
CONFIG_SENSORS_ADS7828=m
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA_CPUTEMP=m
CONFIG_SENSORS_VT1211=m
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
CONFIG_SENSORS_W83792D=m
CONFIG_SENSORS_W83627HF=m
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_THERMAL is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set
CONFIG_MFD_SUPPORT=y
CONFIG_MFD_CORE=m
CONFIG_MFD_SM501=m
# CONFIG_MFD_SM501_GPIO is not set
CONFIG_HTC_PASIC3=m
# CONFIG_TPS65010 is not set
# CONFIG_TPS6507X is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_PCF50633 is not set
CONFIG_ABX500_CORE=y
CONFIG_MFD_VX855=m
CONFIG_REGULATOR=y
# CONFIG_REGULATOR_DEBUG is not set
# CONFIG_REGULATOR_DUMMY is not set
# CONFIG_REGULATOR_FIXED_VOLTAGE is not set
CONFIG_REGULATOR_VIRTUAL_CONSUMER=m
# CONFIG_REGULATOR_USERSPACE_CONSUMER is not set
# CONFIG_REGULATOR_BQ24022 is not set
# CONFIG_REGULATOR_MAX1586 is not set
# CONFIG_REGULATOR_MAX8649 is not set
# CONFIG_REGULATOR_MAX8660 is not set
CONFIG_REGULATOR_MAX8952=m
CONFIG_REGULATOR_LP3971=m
# CONFIG_REGULATOR_LP3972 is not set
# CONFIG_REGULATOR_TPS65023 is not set
CONFIG_REGULATOR_TPS6507X=m
CONFIG_REGULATOR_ISL6271A=m
# CONFIG_REGULATOR_AD5398 is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_DRM is not set
CONFIG_VGASTATE=m
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=m
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
# CONFIG_FB_BOOT_VESA_SUPPORT is not set
CONFIG_FB_CFB_FILLRECT=m
CONFIG_FB_CFB_COPYAREA=m
CONFIG_FB_CFB_IMAGEBLIT=m
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
CONFIG_FB_SYS_FILLRECT=m
CONFIG_FB_SYS_COPYAREA=m
CONFIG_FB_SYS_IMAGEBLIT=m
# CONFIG_FB_FOREIGN_ENDIAN is not set
CONFIG_FB_SYS_FOPS=m
CONFIG_FB_DEFERRED_IO=y
CONFIG_FB_HECUBA=m
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
# CONFIG_FB_MODE_HELPERS is not set
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_ARC=m
CONFIG_FB_VGA16=m
CONFIG_FB_N411=m
CONFIG_FB_HGA=m
CONFIG_FB_S1D13XXX=m
# CONFIG_FB_TMIO is not set
# CONFIG_FB_SM501 is not set
# CONFIG_FB_VIRTUAL is not set
CONFIG_FB_METRONOME=m
CONFIG_FB_MB862XX=m
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
# CONFIG_LCD_CLASS_DEVICE is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=m
CONFIG_BACKLIGHT_GENERIC=m
CONFIG_BACKLIGHT_MBP_NVIDIA=m
CONFIG_BACKLIGHT_SAHARA=m
CONFIG_BACKLIGHT_ADP8860=m

#
# Display device support
#
CONFIG_DISPLAY_SUPPORT=m

#
# Display hardware drivers
#
# CONFIG_LOGO is not set
# CONFIG_SOUND is not set
CONFIG_USB_SUPPORT=y
# CONFIG_USB_ARCH_HAS_HCD is not set
# CONFIG_USB_ARCH_HAS_OHCI is not set
# CONFIG_USB_ARCH_HAS_EHCI is not set
# CONFIG_USB_OTG_WHITELIST is not set
CONFIG_USB_OTG_BLACKLIST_HUB=y

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#
CONFIG_USB_GADGET=m
# CONFIG_USB_GADGET_DEBUG_FILES is not set
# CONFIG_USB_GADGET_DEBUG_FS is not set
CONFIG_USB_GADGET_VBUS_DRAW=2
CONFIG_USB_GADGET_SELECTED=y
CONFIG_USB_GADGET_R8A66597=y
CONFIG_USB_R8A66597=m
# CONFIG_USB_GADGET_M66592 is not set
CONFIG_USB_GADGET_DUALSPEED=y
# CONFIG_USB_ZERO is not set
CONFIG_USB_ETH=m
CONFIG_USB_ETH_RNDIS=y
CONFIG_USB_ETH_EEM=y
# CONFIG_USB_FILE_STORAGE is not set
# CONFIG_USB_MASS_STORAGE is not set
CONFIG_USB_G_SERIAL=m
# CONFIG_USB_G_PRINTER is not set
# CONFIG_USB_CDC_COMPOSITE is not set
# CONFIG_USB_G_MULTI is not set
CONFIG_USB_G_HID=m
CONFIG_USB_G_DBGP=m
# CONFIG_USB_G_DBGP_PRINTK is not set
CONFIG_USB_G_DBGP_SERIAL=y

#
# OTG and related infrastructure
#
CONFIG_USB_OTG_UTILS=y
CONFIG_USB_GPIO_VBUS=m
CONFIG_NOP_USB_XCEIV=m
# CONFIG_MMC is not set
CONFIG_MEMSTICK=m
CONFIG_MEMSTICK_DEBUG=y

#
# MemoryStick drivers
#
CONFIG_MEMSTICK_UNSAFE_RESUME=y
CONFIG_MSPRO_BLOCK=m

#
# MemoryStick Host Controller Drivers
#
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_ALIX2 is not set
CONFIG_LEDS_GPIO=m
CONFIG_LEDS_GPIO_PLATFORM=y
CONFIG_LEDS_LP3944=m
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_LP5523 is not set
# CONFIG_LEDS_PCA955X is not set
CONFIG_LEDS_REGULATOR=m
# CONFIG_LEDS_BD2802 is not set
CONFIG_LEDS_LT3593=m
CONFIG_LEDS_TRIGGERS=y

#
# LED Triggers
#
# CONFIG_LEDS_TRIGGER_TIMER is not set
# CONFIG_LEDS_TRIGGER_HEARTBEAT is not set
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_GPIO is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
CONFIG_ACCESSIBILITY=y
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
# CONFIG_EDAC_MM_EDAC is not set
# CONFIG_RTC_CLASS is not set
CONFIG_DMADEVICES=y
CONFIG_DMADEVICES_DEBUG=y
# CONFIG_DMADEVICES_VDEBUG is not set

#
# DMA Devices
#
# CONFIG_TIMB_DMA is not set
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=m
CONFIG_UIO_PDRV=m
CONFIG_UIO_PDRV_GENIRQ=m
CONFIG_X86_PLATFORM_DEVICES=y

#
# Firmware Drivers
#
CONFIG_EDD=m
CONFIG_EDD_OFF=y
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=m
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT2_FS_XIP=y
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4_FS is not set
CONFIG_FS_XIP=y
CONFIG_REISERFS_FS=m
CONFIG_REISERFS_CHECK=y
# CONFIG_REISERFS_PROC_INFO is not set
CONFIG_REISERFS_FS_XATTR=y
# CONFIG_REISERFS_FS_POSIX_ACL is not set
# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_JFS_FS is not set
# CONFIG_FS_POSIX_ACL is not set
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
# CONFIG_XFS_POSIX_ACL is not set
CONFIG_XFS_RT=y
# CONFIG_OCFS2_FS is not set
CONFIG_EXPORTFS=m
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY_USER is not set
# CONFIG_FANOTIFY is not set
# CONFIG_QUOTA is not set
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_QUOTACTL=y
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set

#
# Caches
#
CONFIG_FSCACHE=m
# CONFIG_FSCACHE_STATS is not set
# CONFIG_FSCACHE_HISTOGRAM is not set
# CONFIG_FSCACHE_DEBUG is not set
CONFIG_FSCACHE_OBJECT_LIST=y
CONFIG_CACHEFILES=m
# CONFIG_CACHEFILES_DEBUG is not set
# CONFIG_CACHEFILES_HISTOGRAM is not set

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
# CONFIG_PROC_PAGE_MONITOR is not set
CONFIG_SYSFS=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_HFSPLUS_FS is not set
# CONFIG_JFFS2_FS is not set
CONFIG_UBIFS_FS=m
# CONFIG_UBIFS_FS_XATTR is not set
CONFIG_UBIFS_FS_ADVANCED_COMPR=y
CONFIG_UBIFS_FS_LZO=y
CONFIG_UBIFS_FS_ZLIB=y
CONFIG_UBIFS_FS_DEBUG=y
CONFIG_UBIFS_FS_DEBUG_MSG_LVL=0
# CONFIG_UBIFS_FS_DEBUG_CHKS is not set
CONFIG_CRAMFS=m
CONFIG_SQUASHFS=m
# CONFIG_SQUASHFS_XATTR is not set
# CONFIG_SQUASHFS_LZO is not set
CONFIG_SQUASHFS_EMBEDDED=y
CONFIG_SQUASHFS_FRAGMENT_CACHE_SIZE=3
# CONFIG_VXFS_FS is not set
CONFIG_MINIX_FS=m
CONFIG_OMFS_FS=m
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
CONFIG_SYSV_FS=m
# CONFIG_NETWORK_FILESYSTEMS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_NLS is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
CONFIG_MAGIC_SYSRQ=y
CONFIG_STRIP_ASM_SYMS=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
# CONFIG_DEBUG_KERNEL is not set
# CONFIG_HARDLOCKUP_DETECTOR is not set
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_SLUB_STATS=y
# CONFIG_BKL is not set
# CONFIG_SPARSE_RCU_POINTER is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_MEMORY_INIT is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_LKDTM is not set
# CONFIG_SYSCTL_SYSCALL_CHECK is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
CONFIG_HAVE_C_RECORDMCOUNT=y
CONFIG_RING_BUFFER=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_EVENT_TRACING=y
# CONFIG_EVENT_POWER_TRACING_DEPRECATED is not set
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_GENERIC_TRACER=y
CONFIG_TRACING_SUPPORT=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
CONFIG_FTRACE_SYSCALLS=y
CONFIG_BRANCH_PROFILE_NONE=y
# CONFIG_PROFILE_ANNOTATED_BRANCHES is not set
# CONFIG_PROFILE_ALL_BRANCHES is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE_SELFTEST=y
CONFIG_FTRACE_STARTUP_TEST=y
# CONFIG_EVENT_TRACE_TEST_SYSCALLS is not set
# CONFIG_RING_BUFFER_BENCHMARK is not set
# CONFIG_DMA_API_DEBUG is not set
CONFIG_ATOMIC64_SELFTEST=y
CONFIG_SAMPLES=y
# CONFIG_SAMPLE_TRACEPOINTS is not set
CONFIG_SAMPLE_TRACE_EVENTS=m
CONFIG_SAMPLE_KOBJECT=m
CONFIG_SAMPLE_HW_BREAKPOINT=m
CONFIG_SAMPLE_KFIFO=m
CONFIG_HAVE_ARCH_KGDB=y
CONFIG_HAVE_ARCH_KMEMCHECK=y
# CONFIG_STRICT_DEVMEM is not set
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_EARLY_PRINTK=y
CONFIG_DOUBLEFAULT=y
# CONFIG_IOMMU_STRESS is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY_DMESG_RESTRICT is not set
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
# CONFIG_SECURITY_NETWORK is not set
CONFIG_SECURITY_PATH=y
CONFIG_SECURITY_TOMOYO=y
# CONFIG_SECURITY_APPARMOR is not set
# CONFIG_IMA is not set
CONFIG_DEFAULT_SECURITY_TOMOYO=y
# CONFIG_DEFAULT_SECURITY_DAC is not set
CONFIG_DEFAULT_SECURITY="tomoyo"
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=m
CONFIG_CRYPTO_ALGAPI2=m
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=m
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=m
CONFIG_CRYPTO_HASH=m
CONFIG_CRYPTO_HASH2=m
CONFIG_CRYPTO_RNG2=m
CONFIG_CRYPTO_PCOMP=m
CONFIG_CRYPTO_PCOMP2=m
CONFIG_CRYPTO_MANAGER=m
CONFIG_CRYPTO_MANAGER2=m
CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
CONFIG_CRYPTO_GF128MUL=m
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_WORKQUEUE=m
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
CONFIG_CRYPTO_CTS=m
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_PCBC is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=m
CONFIG_CRYPTO_CRC32C_INTEL=m
CONFIG_CRYPTO_GHASH=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_MICHAEL_MIC=m
# CONFIG_CRYPTO_RMD128 is not set
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_RMD256=m
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=m
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
# CONFIG_CRYPTO_TGR192 is not set
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_586=m
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SEED is not set
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
CONFIG_CRYPTO_TWOFISH_586=m

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
CONFIG_CRYPTO_ZLIB=m
CONFIG_CRYPTO_LZO=m

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
CONFIG_HAVE_KVM=y
CONFIG_VIRTUALIZATION=y
CONFIG_VIRTIO=m
CONFIG_VIRTIO_RING=m
CONFIG_VIRTIO_BALLOON=m
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=m
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
CONFIG_CRC_T10DIF=m
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=m
CONFIG_CRC7=m
CONFIG_LIBCRC32C=m
CONFIG_AUDIT_GENERIC=y
CONFIG_ZLIB_INFLATE=m
CONFIG_ZLIB_DEFLATE=m
CONFIG_LZO_COMPRESS=m
CONFIG_LZO_DECOMPRESS=m
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y
CONFIG_FORCE_SUCCESSFUL_BUILD=y
CONFIG_X86_32_ALWAYS_ON=y

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18  9:36               ` Ingo Molnar
@ 2010-11-18  9:44                 ` Jean Pihet
  2010-11-18 10:52                 ` Ingo Molnar
  1 sibling, 0 replies; 72+ messages in thread
From: Jean Pihet @ 2010-11-18  9:44 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Renninger, rjw, linux-kernel, arjan

On Thu, Nov 18, 2010 at 10:36 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Thomas Renninger <trenn@suse.de> wrote:
>
>> On Thursday 18 November 2010 09:01:32 Ingo Molnar wrote:
>> ...
>> > > @Ingo: If this does not go into x86/tip, but perf or whatever tree, it would
>> > > be great if you can ping me as soon as this stuff is in.
>> >
>> > Mind sending the latest version which has been adjusted/fixed and all acks added?
>> Done.
>> This time with lkml excluded as there were only cleanups/fixes
>> due to a messed merge.
>
> Please do not exclude lkml from such iterations of patches in the future - every
> modification to patches is relevant - often pure resends get resent to lkml as well.

Ok for me!

Acked-by: Jean Pihet <j-pihet@ti.com>

Note the ti.com email address to be used for Sign-offs and Acks.

Thanks,
Jean

>
> Thanks,
>
>        Ingo
>

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18  9:27             ` Thomas Renninger
@ 2010-11-18  9:36               ` Ingo Molnar
  2010-11-18  9:44                 ` Jean Pihet
  2010-11-18 10:52                 ` Ingo Molnar
  0 siblings, 2 replies; 72+ messages in thread
From: Ingo Molnar @ 2010-11-18  9:36 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: Jean Pihet, rjw, linux-kernel, arjan


* Thomas Renninger <trenn@suse.de> wrote:

> On Thursday 18 November 2010 09:01:32 Ingo Molnar wrote:
> ...
> > > @Ingo: If this does not go into x86/tip, but perf or whatever tree, it would
> > > be great if you can ping me as soon as this stuff is in.
> > 
> > Mind sending the latest version which has been adjusted/fixed and all acks added?
> Done.
> This time with lkml excluded as there were only cleanups/fixes
> due to a messed merge.

Please do not exclude lkml from such iterations of patches in the future - every 
modification to patches is relevant - often pure resends get resent to lkml as well.

Thanks,

	Ingo

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-18  8:01           ` Ingo Molnar
@ 2010-11-18  9:27             ` Thomas Renninger
  2010-11-18  9:36               ` Ingo Molnar
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Renninger @ 2010-11-18  9:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Jean Pihet, rjw, linux-kernel, arjan

On Thursday 18 November 2010 09:01:32 Ingo Molnar wrote:
...
> > @Ingo: If this does not go into x86/tip, but perf or whatever tree, it would
> > be great if you can ping me as soon as this stuff is in.
> 
> Mind sending the latest version which has been adjusted/fixed and all acks added?
Done.
This time with lkml excluded as there were only cleanups/fixes
due to a messed merge.

Thanks,

  Thomas

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-14 13:34         ` Thomas Renninger
@ 2010-11-18  8:01           ` Ingo Molnar
  2010-11-18  9:27             ` Thomas Renninger
  0 siblings, 1 reply; 72+ messages in thread
From: Ingo Molnar @ 2010-11-18  8:01 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: Jean Pihet, rjw, linux-kernel, arjan


* Thomas Renninger <trenn@suse.de> wrote:

> On Friday 12 November 2010 03:50:21 pm Jean Pihet wrote:
> > On Fri, Nov 12, 2010 at 7:17 PM, Thomas Renninger <trenn@suse.de> wrote:
> ...
> > >> > +
> > >> > +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> > >> > +
> > >> >  #ifndef _TRACE_POWER_ENUM_
> > >> >  #define _TRACE_POWER_ENUM_
> > >> >  enum {
> > >> > @@ -153,8 +214,32 @@ DEFINE_EVENT(power_domain, power_domain_target,
> > >> >
> > >> >        TP_ARGS(name, state, cpu_id)
> > >> >  );
> > >> > -
> > >> > +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
> > >> The clock and power_domain events have been recently introduced and so
> > >> must be part of the new API. Can this #endif be moved right after the
> > >> definition of power_end?
> > > Oops, I pulled again meanwhile and the patches still patched without fuzz,
> > > but probably with some offset.
> > > I'll look at that and resend this one.
> > Ok
> Thanks for pointing this out. Because only pre-processor conditionals were
> moved around, my test build after pulling still succeeded even though the
> #ifdefs/#endifs were rather messed up.
> 
> I adjusted these parts and successfully test-built quite a lot of .config
> flavors on i386, x86_64, different ppc, ia64 and s390.
> 
> > >> A string is needed here. Without it it is impossible to have the option
> > >> unset.
> > >> This does the trick: +bool "Deprecated power event trace API, to be
> > >> removed" 
> Adjusted, thanks.
> 
> > > I am currently rebuilding on several archs/flavors and hope to be able
> > > to re-send this one today or on Tue.
> Done.
> 
> @Ingo: If this does not go into x86/tip, but perf or whatever tree, it would
> be great if you can ping me as soon as this stuff is in.

Mind sending the latest version which has been adjusted/fixed and all acks added?

Thanks,

	Ingo

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-14 13:22   ` Thomas Renninger
@ 2010-11-15 15:49     ` Jean Pihet
  0 siblings, 0 replies; 72+ messages in thread
From: Jean Pihet @ 2010-11-15 15:49 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: mingo, rjw, linux-kernel, arjan

Acked-by: Jean Pihet <j-pihet@ti.com>

On Sun, Nov 14, 2010 at 2:22 PM, Thomas Renninger <trenn@suse.de> wrote:
> PERF(kernel): Cleanup power events
>
> Recent changes:
>  - Fix pre-processor conditionals which got messed up silently by a recent merge/pull
>  - Add a comment to EVENT_POWER_TRACING_DEPRECATED .config option
>
> New power trace events:
> power:cpu_idle
> power:cpu_frequency
> power:machine_suspend
>
>
> C-state/idle accounting events:
>  power:power_start
>  power:power_end
> are replaced with:
>  power:cpu_idle
>
> and
>  power:power_frequency
> is replaced with:
>  power:cpu_frequency
>
> power:machine_suspend
> is newly introduced.
> Jean Pihet has a patch integrated into the generic layer
> (kernel/power/suspend.c) which will make use of it.
>
> The type= field got removed from both; it was never
> used, and the type is already distinguished by the event itself.
>
> perf timechart
> userspace tool gets adjusted in a separate patch.
>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> Acked-by: Arjan van de Ven <arjan@linux.intel.com>
> Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
> CC: Arjan van de Ven <arjan@linux.intel.com>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: rjw@sisk.pl
> CC: linux-kernel@vger.kernel.org
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 57d1868..155d975 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -374,6 +374,7 @@ void default_idle(void)
>  {
>        if (hlt_use_halt()) {
>                trace_power_start(POWER_CSTATE, 1, smp_processor_id());
> +               trace_cpu_idle(1, smp_processor_id());
>                current_thread_info()->status &= ~TS_POLLING;
>                /*
>                 * TS_POLLING-cleared state must be visible before we
> @@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
>  void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
>  {
>        trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
> +       trace_cpu_idle((ax>>4)+1, smp_processor_id());
>        if (!need_resched()) {
>                if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
>                        clflush((void *)&current_thread_info()->flags);
> @@ -460,6 +462,7 @@ static void mwait_idle(void)
>  {
>        if (!need_resched()) {
>                trace_power_start(POWER_CSTATE, 1, smp_processor_id());
> +               trace_cpu_idle(1, smp_processor_id());
>                if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
>                        clflush((void *)&current_thread_info()->flags);
>
> @@ -481,10 +484,12 @@ static void mwait_idle(void)
>  static void poll_idle(void)
>  {
>        trace_power_start(POWER_CSTATE, 0, smp_processor_id());
> +       trace_cpu_idle(0, smp_processor_id());
>        local_irq_enable();
>        while (!need_resched())
>                cpu_relax();
> -       trace_power_end(0);
> +       trace_power_end(smp_processor_id());
> +       trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>  }
>
>  /*
> diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> index 96586c3..4b9befa 100644
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -113,8 +113,8 @@ void cpu_idle(void)
>                        stop_critical_timings();
>                        pm_idle();
>                        start_critical_timings();
> -
>                        trace_power_end(smp_processor_id());
> +                       trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>                }
>                tick_nohz_restart_sched_tick();
>                preempt_enable_no_resched();
> diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
> index b3d7a3a..4c818a7 100644
> --- a/arch/x86/kernel/process_64.c
> +++ b/arch/x86/kernel/process_64.c
> @@ -142,6 +142,8 @@ void cpu_idle(void)
>                        start_critical_timings();
>
>                        trace_power_end(smp_processor_id());
> +                       trace_cpu_idle(PWR_EVENT_EXIT,
> +                                      smp_processor_id());
>
>                        /* In many cases the interrupt that ended idle
>                           has already called exit_idle. But some idle
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index c63a438..1109f68 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
>                dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
>                        (unsigned long)freqs->cpu);
>                trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
> +               trace_cpu_frequency(freqs->new, freqs->cpu);
>                srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
>                                CPUFREQ_POSTCHANGE, freqs);
>                if (likely(policy) && likely(policy->cpu == freqs->cpu))
> diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
> index a507108..08d5f05 100644
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
>        if (cpuidle_curr_governor->reflect)
>                cpuidle_curr_governor->reflect(dev);
>        trace_power_end(smp_processor_id());
> +       trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
>  }
>
>  /**
> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 3c95325..ba5134f 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
>
>        stop_critical_timings();
>        trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
> +       trace_cpu_idle((eax >> 4) + 1, cpu);
>        if (!need_resched()) {
>
>                __monitor((void *)&current_thread_info()->flags, 0, 0);
> diff --git a/include/trace/events/power.h b/include/trace/events/power.h
> index 286784d..00d9819 100644
> --- a/include/trace/events/power.h
> +++ b/include/trace/events/power.h
> @@ -7,16 +7,67 @@
>  #include <linux/ktime.h>
>  #include <linux/tracepoint.h>
>
> -#ifndef _TRACE_POWER_ENUM_
> -#define _TRACE_POWER_ENUM_
> -enum {
> -       POWER_NONE      = 0,
> -       POWER_CSTATE    = 1,    /* C-State */
> -       POWER_PSTATE    = 2,    /* Fequency change or DVFS */
> -       POWER_SSTATE    = 3,    /* Suspend */
> -};
> +DECLARE_EVENT_CLASS(cpu,
> +
> +       TP_PROTO(unsigned int state, unsigned int cpu_id),
> +
> +       TP_ARGS(state, cpu_id),
> +
> +       TP_STRUCT__entry(
> +               __field(        u32,            state           )
> +               __field(        u32,            cpu_id          )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->state = state;
> +               __entry->cpu_id = cpu_id;
> +       ),
> +
> +       TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
> +                 (unsigned long)__entry->cpu_id)
> +);
> +
> +DEFINE_EVENT(cpu, cpu_idle,
> +
> +       TP_PROTO(unsigned int state, unsigned int cpu_id),
> +
> +       TP_ARGS(state, cpu_id)
> +);
> +
> +/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
> +#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
> +#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
> +
> +#define PWR_EVENT_EXIT -1
>  #endif
>
> +DEFINE_EVENT(cpu, cpu_frequency,
> +
> +       TP_PROTO(unsigned int frequency, unsigned int cpu_id),
> +
> +       TP_ARGS(frequency, cpu_id)
> +);
> +
> +TRACE_EVENT(machine_suspend,
> +
> +       TP_PROTO(unsigned int state),
> +
> +       TP_ARGS(state),
> +
> +       TP_STRUCT__entry(
> +               __field(        u32,            state           )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->state = state;
> +       ),
> +
> +       TP_printk("state=%lu", (unsigned long)__entry->state)
> +);
> +
> +/* This code will be removed once the deprecation period ends (2.6.41) */
> +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> +
>  /*
>  * The power events are used for cpuidle & suspend (power_start, power_end)
>  *  and for cpufreq (power_frequency)
> @@ -75,6 +126,24 @@ TRACE_EVENT(power_end,
>
>  );
>
> +/* Deprecated dummy functions must be protected against multi-declaration */
> +#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
> +#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
> +
> +enum {
> +       POWER_NONE = 0,
> +       POWER_CSTATE = 1,
> +       POWER_PSTATE = 2,
> +};
> +#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
> +#else
> +/* These dummy declarations have to be ripped out when the deprecated
> +   events get removed */
> +static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {}
> +static inline void trace_power_end(u64 cpuid) {}
> +static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {}
> +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
> +
>  /*
>  * The clock events are used for clock enable/disable and for
>  *  clock rate change
> @@ -153,7 +222,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
>
>        TP_ARGS(name, state, cpu_id)
>  );
> -
>  #endif /* _TRACE_POWER_H */
>
>  /* This part must be outside protection */
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index e04b8bc..59b44a1 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -69,6 +69,21 @@ config EVENT_TRACING
>        select CONTEXT_SWITCH_TRACER
>        bool
>
> +config EVENT_POWER_TRACING_DEPRECATED
> +       depends on EVENT_TRACING
> +       bool "Deprecated power event trace API, to be removed"
> +       default y
> +       help
> +         Provides old power event types:
> +         C-state/idle accounting events:
> +         power:power_start
> +         power:power_end
> +         and old cpufreq accounting event:
> +         power:power_frequency
> +         This is for userspace compatibility
> +         and will vanish after 5 kernel iterations,
> +         namely 2.6.41.
> +
>  config CONTEXT_SWITCH_TRACER
>        bool
>
> diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
> index 0e0497d..f55fcf6 100644
> --- a/kernel/trace/power-traces.c
> +++ b/kernel/trace/power-traces.c
> @@ -13,5 +13,8 @@
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/power.h>
>
> +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
>  EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
> +#endif
> +EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
>
>

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-12 21:50       ` Jean Pihet
@ 2010-11-14 13:34         ` Thomas Renninger
  2010-11-18  8:01           ` Ingo Molnar
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Renninger @ 2010-11-14 13:34 UTC (permalink / raw)
  To: Jean Pihet; +Cc: mingo, rjw, linux-kernel, arjan

On Friday 12 November 2010 03:50:21 pm Jean Pihet wrote:
> On Fri, Nov 12, 2010 at 7:17 PM, Thomas Renninger <trenn@suse.de> wrote:
...
> >> > +
> >> > +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> >> > +
> >> >  #ifndef _TRACE_POWER_ENUM_
> >> >  #define _TRACE_POWER_ENUM_
> >> >  enum {
> >> > @@ -153,8 +214,32 @@ DEFINE_EVENT(power_domain, power_domain_target,
> >> >
> >> >        TP_ARGS(name, state, cpu_id)
> >> >  );
> >> > -
> >> > +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
> >> The clock and power_domain events have been recently introduced and so
> >> must be part of the new API. Can this #endif be moved right after the
> >> definition of power_end?
> > Oops, I pulled again meanwhile and the patches still patched without fuzz,
> > but probably with some offset.
> > I'll look at that and resend this one.
> Ok
Thanks for pointing this out. Because only pre-processor conditionals were
moved around, my test build after pulling still succeeded even though the
#ifdefs/#endifs were rather messed up.

I adjusted these parts and successfully test-built quite a lot of .config
flavors on i386, x86_64, different ppc, ia64 and s390.
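
The intended layout of the conditionals can be sketched in plain C as
follows. This is an illustration only: DEMO_POWER_TRACING_DEPRECATED is a
hypothetical stand-in for CONFIG_EVENT_POWER_TRACING_DEPRECATED, and the
demo_* functions are simple counters standing in for tracepoints, not the
real TRACE_EVENT machinery. The point is that only the old events sit
inside the #ifdef, with empty stubs in the #else branch so that call sites
build unchanged, while the new API stays outside the conditional.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the Kconfig option; left undefined here. */
/* #define DEMO_POWER_TRACING_DEPRECATED 1 */

#define DEMO_NR_CPUS	4		/* illustrative bound only */
#define DEMO_EVENT_EXIT	UINT_MAX	/* mirrors -1 stored in an unsigned field */

static unsigned int demo_new_events;	/* hits on the new API */
static unsigned int demo_old_events;	/* hits on the deprecated API */

/* New API: always built, so it must stay outside the #ifdef below. */
static inline void demo_trace_cpu_idle(unsigned int state, unsigned int cpu)
{
	(void)state;
	(void)cpu;
	demo_new_events++;
}

#ifdef DEMO_POWER_TRACING_DEPRECATED
/* Deprecated API: only present while the option is enabled. */
static inline void demo_trace_power_start(unsigned int state, unsigned int cpu)
{
	(void)state;
	(void)cpu;
	demo_old_events++;
}

static inline void demo_trace_power_end(unsigned int cpu)
{
	(void)cpu;
	demo_old_events++;
}
#else
/* Empty stubs keep existing call sites compiling when the option is off. */
static inline void demo_trace_power_start(unsigned int state, unsigned int cpu)
{
	(void)state;
	(void)cpu;
}

static inline void demo_trace_power_end(unsigned int cpu)
{
	(void)cpu;
}
#endif /* DEMO_POWER_TRACING_DEPRECATED */

/* A call site shaped like poll_idle(): old and new events side by side. */
static void demo_poll_idle(unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	demo_trace_power_start(0, cpu);
	demo_trace_cpu_idle(0, cpu);
	/* ... the idle loop would run here ... */
	demo_trace_power_end(cpu);
	demo_trace_cpu_idle(DEMO_EVENT_EXIT, cpu);
}
```

With the stand-in macro undefined, one call to demo_poll_idle() records two
new-API events and none on the deprecated side; defining the macro brings
the old events back without touching the call site.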

> >> A string is needed here. Without it it is impossible to have the option
> >> unset.
> >> This does the trick: +bool "Deprecated power event trace API, to be
> >> removed" 
Adjusted, thanks.

> > I am currently rebuilding on several archs/flavors and hope to be able
> > to re-send this one today or on Tue.
Done.

@Ingo: If this does not go into x86/tip, but perf or whatever tree, it would
be great if you can ping me as soon as this stuff is in.
I want to clean up the "double cpu_idle events" issues on top and make this
more architecture independent (throw cpu_idle events from the cpuidle framework
instead of throwing very x86-specific mwait states, etc.).

Thanks,

       Thomas

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-11 18:03 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
  2010-11-12 14:20   ` Jean Pihet
@ 2010-11-14 13:22   ` Thomas Renninger
  2010-11-15 15:49     ` Jean Pihet
  1 sibling, 1 reply; 72+ messages in thread
From: Thomas Renninger @ 2010-11-14 13:22 UTC (permalink / raw)
  To: mingo; +Cc: rjw, linux-kernel, arjan, jean.pihet

PERF(kernel): Cleanup power events

Recent changes:
  - Fix pre-processor conditionals which got messed up silently by a recent merge/pull
  - Add a comment to EVENT_POWER_TRACING_DEPRECATED .config option

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field got removed from both; it was never
used, and the type is already distinguished by the event itself.

perf timechart
userspace tool gets adjusted in a separate patch.
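
A minimal sketch of how a userspace consumer might account idle time from
the new power:cpu_idle event. It assumes that PWR_EVENT_EXIT (-1) stored in
the u32 state field reads back as 4294967295, and that the payload arrives
as text in the "state=%lu cpu_id=%lu" form printed by the event class. The
struct and the demo_* names are hypothetical and not part of perf or the
kernel. Unbalanced "leave idle" events are simply ignored, which is one
possible way to tolerate double exit events.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: PWR_EVENT_EXIT is -1 and is stored in a u32 field. */
#define DEMO_IDLE_EXIT	UINT32_MAX
#define DEMO_MAX_CPUS	8	/* hypothetical limit for this sketch */

struct demo_idle_tracker {
	int      in_idle[DEMO_MAX_CPUS];	/* 1 while the CPU is idle */
	uint64_t enter_ns[DEMO_MAX_CPUS];	/* timestamp of the idle entry */
	uint64_t idle_ns[DEMO_MAX_CPUS];	/* accumulated idle time */
};

/* Parse the "state=%lu cpu_id=%lu" payload; returns 0 on success. */
static int demo_parse_cpu_idle(const char *payload, uint32_t *state,
			       uint32_t *cpu_id)
{
	unsigned long s, c;

	if (sscanf(payload, "state=%lu cpu_id=%lu", &s, &c) != 2)
		return -1;
	*state = (uint32_t)s;
	*cpu_id = (uint32_t)c;
	return 0;
}

/* Feed one cpu_idle event that was recorded at time ts_ns. */
static void demo_feed(struct demo_idle_tracker *t, uint64_t ts_ns,
		      uint32_t state, uint32_t cpu_id)
{
	assert(t != NULL);
	if (cpu_id >= DEMO_MAX_CPUS)
		return;

	if (state != DEMO_IDLE_EXIT) {
		/* Enter idle; a repeated enter keeps the first timestamp. */
		if (!t->in_idle[cpu_id]) {
			t->in_idle[cpu_id] = 1;
			t->enter_ns[cpu_id] = ts_ns;
		}
		return;
	}

	/* Leave idle; an exit without a matching enter is ignored. */
	if (t->in_idle[cpu_id]) {
		t->idle_ns[cpu_id] += ts_ns - t->enter_ns[cpu_id];
		t->in_idle[cpu_id] = 0;
	}
}
```

Feeding an enter at 100 ns, an exit at 250 ns and a duplicate exit at
300 ns for CPU 0 leaves idle_ns[0] at 150.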

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: linux-kernel@vger.kernel.org

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3d7a3a..4c818a7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c63a438..1109f68 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3c95325..ba5134f 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 286784d..00d9819 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,16 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
-#ifndef _TRACE_POWER_ENUM_
-#define _TRACE_POWER_ENUM_
-enum {
-	POWER_NONE	= 0,
-	POWER_CSTATE	= 1,	/* C-State */
-	POWER_PSTATE	= 2,	/* Fequency change or DVFS */
-	POWER_SSTATE	= 3,	/* Suspend */
-};
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
 #endif
 
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+/* This code will be removed once the deprecation period ends (2.6.41) */
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 /*
  * The power events are used for cpuidle & suspend (power_start, power_end)
  *  and for cpufreq (power_frequency)
@@ -75,6 +126,24 @@ TRACE_EVENT(power_end,
 
 );
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED
+
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif /* _PWR_EVENT_AVOID_DOUBLE_DEFINING_DEPRECATED */
+#else
+/* These dummy declarations have to be ripped out when the deprecated
+   events get removed */
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {}
+static inline void trace_power_end(u64 cpuid) {}
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {}
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
 /*
  * The clock events are used for clock enable/disable and for
  *  clock rate change
@@ -153,7 +222,6 @@ DEFINE_EVENT(power_domain, power_domain_target,
 
 	TP_ARGS(name, state, cpu_id)
 );
-
 #endif /* _TRACE_POWER_H */
 
 /* This part must be outside protection */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e04b8bc..59b44a1 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -69,6 +69,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool "Deprecated power event trace API, to be removed"
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 

^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-12 18:17     ` Thomas Renninger
@ 2010-11-12 21:50       ` Jean Pihet
  2010-11-14 13:34         ` Thomas Renninger
  0 siblings, 1 reply; 72+ messages in thread
From: Jean Pihet @ 2010-11-12 21:50 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: mingo, rjw, linux-kernel, arjan

On Fri, Nov 12, 2010 at 7:17 PM, Thomas Renninger <trenn@suse.de> wrote:
> On Friday 12 November 2010 08:20:47 am Jean Pihet wrote:
>> Thomas,
> ...
>> > +
>> > +       TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
>> > +                 (unsigned long)__entry->cpu_id)
>> Using %lu for the state field causes PWR_EVENT_EXIT to appear as
>> 4294967295 instead of -1. Can the field be of a signed type?
> This is intended, what exactly is the problem?
There is no problem, I just wanted to warn about it. I am fine with it.

>
> ...
>> > +       TP_printk("state=%lu", (unsigned long)__entry->state)
>> Same remark about the unsigned type for the state field.
> Same.
>>
>> > +);
>> > +
>> > +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
>> > +
>> >  #ifndef _TRACE_POWER_ENUM_
>> >  #define _TRACE_POWER_ENUM_
>> >  enum {
>> > @@ -153,8 +214,32 @@ DEFINE_EVENT(power_domain, power_domain_target,
>> >
>> >        TP_ARGS(name, state, cpu_id)
>> >  );
>> > -
>> > +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
>> The clock and power_domain events have been recently introduced and so
>> must be part of the new API. Can this #endif be moved right after the
>> definition of power_end?
> Oops, I pulled again meanwhile and the patches still patched without fuzz,
> but probably with some offset.
> I'll look at that and resend this one.
Ok

>
>> >  #endif /* _TRACE_POWER_H */
>> Should this be at the very end of the file?
> Not sure whether this also came from merge issues, but yes, several
> #ifdef conditions need to get corrected.
Ok

>
> ...
>
>> A string is needed here. Without it it is impossible to have the option
>> unset.
>> This does the trick: +bool "Deprecated power event trace API, to be removed"
> Ok, thanks.
>
> I am currently rebuilding on several archs/flavors and hope to be able
> to re-send this one today or on Tue.
>
> Thanks,
>
>    Thomas
>
Thanks!

Jean

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-12 14:20   ` Jean Pihet
@ 2010-11-12 18:17     ` Thomas Renninger
  2010-11-12 21:50       ` Jean Pihet
  0 siblings, 1 reply; 72+ messages in thread
From: Thomas Renninger @ 2010-11-12 18:17 UTC (permalink / raw)
  To: Jean Pihet; +Cc: mingo, rjw, linux-kernel, arjan

On Friday 12 November 2010 08:20:47 am Jean Pihet wrote:
> Thomas,
...
> > +
> > +       TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
> > +                 (unsigned long)__entry->cpu_id)
> Using %lu for the state field causes PWR_EVENT_EXIT to appear as
> 4294967295 instead of -1. Can the field be of a signed type?
This is intended, what exactly is the problem?
 
...
> > +       TP_printk("state=%lu", (unsigned long)__entry->state)
> Same remark about the unsigned type for the state field.
Same.
> 
> > +);
> > +
> > +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> > +
> >  #ifndef _TRACE_POWER_ENUM_
> >  #define _TRACE_POWER_ENUM_
> >  enum {
> > @@ -153,8 +214,32 @@ DEFINE_EVENT(power_domain, power_domain_target,
> >
> >        TP_ARGS(name, state, cpu_id)
> >  );
> > -
> > +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
> The clock and power_domain events have been recently introduced and so
> must be part of the new API. Can this #endif be moved right after the
> definition of power_end?
Oops, I pulled again meanwhile and the patches still patched without fuzz,
but probably with some offset.
I'll look at that and resend this one.

> >  #endif /* _TRACE_POWER_H */
> Should this be at the very end of the file?
Not sure whether this also came from merge issues, but yes, several
#ifdef conditions need to get corrected.

...

> A string is needed here. Without it it is impossible to have the option
> unset. 
> This does the trick: +bool "Deprecated power event trace API, to be removed"
Ok, thanks.

I am currently rebuilding on several archs/flavors and hope to be able
to re-send this one today or on Tue.

Thanks,

    Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-11 18:03 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
@ 2010-11-12 14:20   ` Jean Pihet
  2010-11-12 18:17     ` Thomas Renninger
  2010-11-14 13:22   ` Thomas Renninger
  1 sibling, 1 reply; 72+ messages in thread
From: Jean Pihet @ 2010-11-12 14:20 UTC (permalink / raw)
  To: Thomas Renninger; +Cc: mingo, rjw, linux-kernel, arjan

Thomas,

Thanks for the patches re-spin!

Here are my comments inlined.

On Thu, Nov 11, 2010 at 7:03 PM, Thomas Renninger <trenn@suse.de> wrote:
> Recent changes:
>  - Enable EVENT_POWER_TRACING_DEPRECATED by default
>
> New power trace events:
> power:cpu_idle
> power:cpu_frequency
> power:machine_suspend
>
>
> C-state/idle accounting events:
>  power:power_start
>  power:power_end
> are replaced with:
>  power:cpu_idle
>
> and
>  power:power_frequency
> is replaced with:
>  power:cpu_frequency
>
> power:machine_suspend
> is newly introduced.
> Jean Pihet has a patch integrated into the generic layer
> (kernel/power/suspend.c) which will make use of it.
>
> the type= field got removed from both, it was never
> used and the type is differed by the event type itself.
>
> perf timechart
> userspace tool gets adjusted in a separate patch.
>
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> Acked-by: Arjan van de Ven <arjan@linux.intel.com>
> Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
> CC: Arjan van de Ven <arjan@linux.intel.com>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: rjw@sisk.pl
> CC: linux-kernel@vger.kernel.org
> ---
>  arch/x86/kernel/process.c    |    7 +++-
>  arch/x86/kernel/process_32.c |    2 +-
>  arch/x86/kernel/process_64.c |    2 +
>  drivers/cpufreq/cpufreq.c    |    1 +
>  drivers/cpuidle/cpuidle.c    |    1 +
>  drivers/idle/intel_idle.c    |    1 +
>  include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
>  kernel/trace/Kconfig         |   15 +++++++
>  kernel/trace/power-traces.c  |    3 +
>  9 files changed, 116 insertions(+), 3 deletions(-)
>
...
> diff --git a/include/trace/events/power.h b/include/trace/events/power.h
> index 286784d..ab26d8e 100644
> --- a/include/trace/events/power.h
> +++ b/include/trace/events/power.h
> @@ -7,6 +7,67 @@
>  #include <linux/ktime.h>
>  #include <linux/tracepoint.h>
>
> +DECLARE_EVENT_CLASS(cpu,
> +
> +       TP_PROTO(unsigned int state, unsigned int cpu_id),
> +
> +       TP_ARGS(state, cpu_id),
> +
> +       TP_STRUCT__entry(
> +               __field(        u32,            state           )
> +               __field(        u32,            cpu_id          )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->state = state;
> +               __entry->cpu_id = cpu_id;
> +       ),
> +
> +       TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
> +                 (unsigned long)__entry->cpu_id)
Using %lu for the state field causes PWR_EVENT_EXIT to appear as
4294967295 instead of -1. Can the field be of a signed type?

> +);
> +
> +DEFINE_EVENT(cpu, cpu_idle,
> +
> +       TP_PROTO(unsigned int state, unsigned int cpu_id),
> +
> +       TP_ARGS(state, cpu_id)
> +);
> +
> +/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
> +#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
> +#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
> +
> +#define PWR_EVENT_EXIT -1
> +
> +#endif
> +
> +DEFINE_EVENT(cpu, cpu_frequency,
> +
> +       TP_PROTO(unsigned int frequency, unsigned int cpu_id),
> +
> +       TP_ARGS(frequency, cpu_id)
> +);
> +
> +TRACE_EVENT(machine_suspend,
> +
> +       TP_PROTO(unsigned int state),
> +
> +       TP_ARGS(state),
> +
> +       TP_STRUCT__entry(
> +               __field(        u32,            state           )
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->state = state;
> +       ),
> +
> +       TP_printk("state=%lu", (unsigned long)__entry->state)
Same remark about the unsigned type for the state field.

> +);
> +
> +#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> +
>  #ifndef _TRACE_POWER_ENUM_
>  #define _TRACE_POWER_ENUM_
>  enum {
> @@ -153,8 +214,32 @@ DEFINE_EVENT(power_domain, power_domain_target,
>
>        TP_ARGS(name, state, cpu_id)
>  );
> -
> +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
The clock and power_domain events have been recently introduced and so
must be part of the new API. Can this #endif be moved right after the
definition of power_end?

>  #endif /* _TRACE_POWER_H */
Should this be at the very end of the file?

>
> +/* Deprecated dummy functions must be protected against multi-declaration */
> +#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
> +#define EVENT_POWER_TRACING_DEPRECATED_PART_H
> +
> +#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
> +
> +#ifndef _TRACE_POWER_ENUM_
> +#define _TRACE_POWER_ENUM_
> +enum {
> +       POWER_NONE = 0,
> +       POWER_CSTATE = 1,
> +       POWER_PSTATE = 2,
> +};
> +#endif
> +
> +static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
> +static inline void trace_power_end(u64 cpuid) {};
> +static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
> +#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
> +
> +#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
> +
> +
> +
>  /* This part must be outside protection */
>  #include <trace/define_trace.h>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index e04b8bc..0be2e7f 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -69,6 +69,21 @@ config EVENT_TRACING
>        select CONTEXT_SWITCH_TRACER
>        bool
>
> +config EVENT_POWER_TRACING_DEPRECATED
> +       depends on EVENT_TRACING
> +       bool
A string is needed here. Without it it is impossible to have the option unset.
This does the trick: +bool "Deprecated power event trace API, to be removed"

> +       default y
> +       help
> +         Provides old power event types:
> +         C-state/idle accounting events:
> +         power:power_start
> +         power:power_end
> +         and old cpufreq accounting event:
> +         power:power_frequency
> +         This is for userspace compatibility
> +         and will vanish after 5 kernel iterations,
> +         namely 2.6.41.
> +
>  config CONTEXT_SWITCH_TRACER
>        bool
>
...

Thanks,
Jean

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-11-11 18:03 [RESEND] Power trace event cleanup by still providing old interface for some time Thomas Renninger
@ 2010-11-11 18:03 ` Thomas Renninger
  2010-11-12 14:20   ` Jean Pihet
  2010-11-14 13:22   ` Thomas Renninger
  0 siblings, 2 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-11-11 18:03 UTC (permalink / raw)
  To: mingo; +Cc: trenn, rjw, linux-kernel, arjan, jean.pihet

Recent changes:
  - Enable EVENT_POWER_TRACING_DEPRECATED by default

New power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:cpu_idle

and
  power:power_frequency
is replaced with:
  power:cpu_frequency

power:machine_suspend
is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.

The type= field got removed from both; it was never
used, and the type is already distinguished by the event itself.

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/process.c    |    7 +++-
 arch/x86/kernel/process_32.c |    2 +-
 arch/x86/kernel/process_64.c |    2 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   87 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   15 +++++++
 kernel/trace/power-traces.c  |    3 +
 9 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..155d975 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_cpu_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_cpu_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -481,10 +484,12 @@ static void mwait_idle(void)
 static void poll_idle(void)
 {
 	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
+	trace_cpu_idle(0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
+	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /*
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 96586c3..4b9befa 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -113,8 +113,8 @@ void cpu_idle(void)
 			stop_critical_timings();
 			pm_idle();
 			start_critical_timings();
-
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 		}
 		tick_nohz_restart_sched_tick();
 		preempt_enable_no_resched();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index b3d7a3a..4c818a7 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,8 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_cpu_idle(PWR_EVENT_EXIT,
+				       smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index c63a438..1109f68 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_cpu_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..08d5f05 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 3c95325..ba5134f 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -221,6 +221,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_cpu_idle((eax >> 4) + 1, cpu);
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 286784d..ab26d8e 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,67 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(cpu,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+		__field(	u32,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(cpu, cpu_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id)
+);
+
+/* This file can get included multiple times, TRACE_HEADER_MULTI_READ at top */
+#ifndef _PWR_EVENT_AVOID_DOUBLE_DEFINING
+#define _PWR_EVENT_AVOID_DOUBLE_DEFINING
+
+#define PWR_EVENT_EXIT -1
+
+#endif
+
+DEFINE_EVENT(cpu, cpu_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u32,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -153,8 +214,32 @@ DEFINE_EVENT(power_domain, power_domain_target,
 
 	TP_ARGS(name, state, cpu_id)
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {}
+static inline void trace_power_end(u64 cpuid) {}
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {}
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e04b8bc..0be2e7f 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -69,6 +69,21 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	default y
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..f55fcf6 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(cpu_idle);
 
-- 
1.6.3


^ permalink raw reply related	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 16:00                 ` Arjan van de Ven
  2010-10-25 23:32                   ` Thomas Renninger
@ 2010-10-25 23:32                   ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 23:32 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Thomas Gleixner,
	linux-pm, Masami Hiramatsu, Tejun Heo, Ingo Molnar, linux-omap,
	Linus Torvalds, Andrew Morton, Mathieu Desnoyers

@Ingo: Can you queue up 1/3, it's an independent fix.

On Monday 25 October 2010 06:00:17 pm Arjan van de Ven wrote:
> On 10/25/2010 8:48 AM, Thomas Renninger wrote:
> 
> sure naming is one thing
Yes it should get renamed to not show:
cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
C0
This is wrong and confusing

> >>> and
> >>> "C0 no longer idle"
> >>>
> >>> I'd propose using the number 0 for the first one (it makes the most
> >>> logical sense, it's the least deep idle state etc etc)
> > I would use a special number for the "Linux only" state.
> 
> that special number is 0 though..
> it makes sense in ordering, 0 < 1, 1 < 2 etc
As long as it stays a kernel and perf processor_idle internal number
it does not hurt.
But userspace tools catching the perf idle event of state 0 should never
refer to it as processor idle state 0 (or even worse C0).
Instead they should try to get the name/description of:
/sys/../state0/name
or directly refer to it as "poll idle" state.

Processor idle state C0 is not only defined as "not being idle" in the
specs; turbostat and cpufreq-aperf also use it correctly and refer to C0
when they show accounted "not idle" time.

Encouraged by your suggestions I send another version.
It's not a big deal to send 0xFFFFFFFF instead of 0 as "non power saving" 
state. If you can handle compatibility with it in powertop, it doesn't make 
things more complicated in kernel and perf timechart as I first thought it 
does.

      Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 13:58       ` Arjan van de Ven
@ 2010-10-25 20:33         ` Rafael J. Wysocki
  2010-10-25 20:33         ` Rafael J. Wysocki
  1 sibling, 0 replies; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-10-25 20:33 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, linux-pm, Masami Hiramatsu,
	Tejun Heo, Thomas Gleixner, linux-omap, Linus Torvalds,
	Ingo Molnar

On Monday, October 25, 2010, Arjan van de Ven wrote:
> On 10/25/2010 4:03 AM, Thomas Renninger wrote:
> > On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> >> * Thomas Renninger<trenn@suse.de>  wrote:
> >>
> >>> New power trace events:
> >>> power:processor_idle
> >>> power:processor_frequency
> >>> power:machine_suspend
> >>>
> >>>
> >>> C-state/idle accounting events:
> >>>    power:power_start
> >>>    power:power_end
> >>> are replaced with:
> >>>    power:processor_idle
> >> Well, most power saving hw models (and the code implementing them) have this kind of
> >> model:
> >>
> >>   enter power saving mode X
> >>   exit power saving mode
> >>
> >> Where X is some sort of 'power saving deepness' attribute, right?
> > Sure.
> > But ACPI and afaik this model got picked up for PCI and other (sub-)archs
> > as well, defines state 0 as the non-power saving mode.
> 
> correct ,... "C0" is not power efficient... but it's still a valid OS 
> idle state!
> Also tracking processor_idle_{start,end} as a separate event!
> 
> same for "S0"... S0 as standby state is still valid... sure it doesn't 
> save you much power... but that does not mean it's not valid.

If you mean ACPI S0, it is not a standby state.  It actually is the full-power
state.

> (as indication, the Intel Moorestown platform, which is currently in 
> production and available to OEMs, has such a S0 standby state)

Another naming confusion.  How smart.

> > makes no sense and there is no need to introduce:
> > processor_idle_start/processor_idle_end
> > machine_suspend_start/machine_suspend_end
> > device_power_mode_start/device_power_mode_end
> > events.
> > Using state 0 as "exit/end", is much nicer for kernel/
> > userspace implementations/code and the user.
> actually no; having written a few of these in userspace so far, having a 
> separate end event is easier to deal with;
> the actions you take on entry and exit are completely separate code paths.

That's correct, unless you go directly from one low-power state to another
(which is possible for example for PCI).  We don't do that at the moment,
but it's possible in principle and we may want to start doing that at one
point.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 12:58         ` Mathieu Desnoyers
  2010-10-25 20:29           ` Rafael J. Wysocki
@ 2010-10-25 20:29           ` Rafael J. Wysocki
  1 sibling, 0 replies; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-10-25 20:29 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andrew Morton, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Arjan van de Ven, linux-pm, Masami Hiramatsu,
	Tejun Heo, Thomas Gleixner, linux-omap, Linus Torvalds,
	Ingo Molnar

On Monday, October 25, 2010, Mathieu Desnoyers wrote:
> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Thomas Renninger <trenn@suse.de> wrote:
> > 
> > > On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> > > > 
> > > > * Thomas Renninger <trenn@suse.de> wrote:
> > > > 
> > > > > New power trace events:
> > > > > power:processor_idle
> > > > > power:processor_frequency
> > > > > power:machine_suspend
> > > > > 
> > > > > 
> > > > > C-state/idle accounting events:
> > > > >   power:power_start
> > > > >   power:power_end
> > > > > are replaced with:
> > > > >   power:processor_idle
> > > > 
> > > > Well, most power saving hw models (and the code implementing them) have this kind of 
> > > > model:
> > > > 
> > > >  enter power saving mode X
> > > >  exit power saving mode
> > > > 
> > > > Where X is some sort of 'power saving deepness' attribute, right?
> > >
> > > Sure.
> > 
> > Which is the 'saner' model?
> > 
> > > But ACPI and afaik this model got picked up for PCI and other (sub-)archs as well, 
> > > defines state 0 as the non-power saving mode.
> > 
> > But the actual code does not actually deal with any 'state 0', does it? It enters an 
> > idle function and then exits it, right?
> > 
> > 'power state' might be what is used for devices - but even there, we have:
> > 
> >   - enter power state X
> >   - exit power state
> > 
> > right?
> > 
> > > Same as done here with machine suspend state (S0 is back from suspend) and
> > > this model should get picked up when device sleep states get tracked at
> > > some time.
> > >
> > > It's consistent and applies to some well known specifications.
> > 
> > What we want it to be is for it to be the nicest, most understandable, most logical 
> > model - not one matching random hardware specifications.
> > 
> > ( Hardware specifications only matter in so far that it should be possible to 
> >   express all the known hardware state transitions via these events efficiently. )
> > 
> > > Also tracking processor_idle_{start,end} as a separate event makes no sense and 
> > > there is no need to introduce: processor_idle_start/processor_idle_end 
> > > machine_suspend_start/machine_suspend_end 
> > > device_power_mode_start/device_power_mode_end events.
> > 
> > What do you mean by "makes no sense"?
> > 
> > Are they superfluous? Inefficient? Illogical?
> 
> I think it would require deep understanding of specific power modes of each
> architecture to split into this topology. On the bright side, it would bring
> clear understanding of which HW resource is being put to sleep, which would make
> automated analysis much easier to do. But maybe it's too much pain compared to
> the benefit. The related question is also: where is it best to put this logic ?
> In the kernel code ? In per-arch TRACE_EVENT() handlers or in external trace
> analysis plugins ?
> 
> > 
> > > Using state 0 as "exit/end", is much nicer for kernel/ userspace 
> > > implementations/code and the user.
> > 
> > By that argument we should not have separate fork() and exit() syscalls either, but 
> > a set_process_state(1) and set_process_state(0) interface?
> 
> I'm by no means an expert on power saving hardware specs, but if it is possible for
> hardware to switch between two power saving states without passing through power
> state 0, then using a "set state" rather than an enter/exit would be more
> appropriate; even if we go for a scheme introducing
> 
> processor_idle_start/processor_idle_end,
> machine_suspend_start/machine_suspend_end,
> device_power_mode_start/device_power_mode_end.
> 
> I must defer to you guys to figure out if some hardware actually does that for
> either of CPU idle, suspend or device power modes.

Yes, you can go directly from PCI_D1 to PCI_D2, for one example.

Apart from this, attempting to put system suspend to the same bag as cpuidle
is not going to work in the long run.  They are _fundamentally_ different things
even though the power state we get into as a result of suspend is approximately
the same as we can get into via cpuidle (even in that case the energy savings
will generally be different in both cases due to wakeup events).
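
To illustrate the direct-transition case: with a single "set state" event, a
D1 -> D2 switch needs no intermediate exit event, and residency per state
falls out of consecutive timestamps. This is only a minimal sketch in Python;
the (timestamp, state) tuple layout, the function name and the sample trace
are assumptions made up for illustration, not an existing trace format.

```python
from collections import defaultdict


def residency(events, end_time):
    """Sum time spent in each state from (timestamp, state) 'set state' events."""
    totals = defaultdict(float)
    # Pair every event with its successor; the last one runs until end_time.
    following = events[1:] + [(end_time, None)]
    for (t, state), (t_next, _) in zip(events, following):
        totals[state] += t_next - t
    return dict(totals)


# Hypothetical device trace: D0 -> D1 -> D2 (directly, no return to D0) -> D0
trace = [(0.0, "D0"), (1.0, "D1"), (3.0, "D2"), (6.0, "D0")]
result = residency(trace, end_time=7.0)
```

With an enter/exit pair the same D1 -> D2 step would need either a synthetic
exit event or a consumer that tolerates two enters in a row.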

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 12:58         ` Mathieu Desnoyers
@ 2010-10-25 20:29           ` Rafael J. Wysocki
  2010-10-25 20:29           ` Rafael J. Wysocki
  1 sibling, 0 replies; 72+ messages in thread
From: Rafael J. Wysocki @ 2010-10-25 20:29 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Thomas Renninger, Linus Torvalds, Andrew Morton,
	Thomas Gleixner, Masami Hiramatsu, Frank Eigler, Steven Rostedt,
	Kevin Hilman, Peter Zijlstra, linux-omap, linux-pm,
	linux-trace-users, Jean Pihet, Pierre Tardy, Frederic Weisbecker,
	Tejun Heo, Arjan van de Ven

On Monday, October 25, 2010, Mathieu Desnoyers wrote:
> * Ingo Molnar (mingo@elte.hu) wrote:
> > 
> > * Thomas Renninger <trenn@suse.de> wrote:
> > 
> > > On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> > > > 
> > > > * Thomas Renninger <trenn@suse.de> wrote:
> > > > 
> > > > > New power trace events:
> > > > > power:processor_idle
> > > > > power:processor_frequency
> > > > > power:machine_suspend
> > > > > 
> > > > > 
> > > > > C-state/idle accounting events:
> > > > >   power:power_start
> > > > >   power:power_end
> > > > > are replaced with:
> > > > >   power:processor_idle
> > > > 
> > > > Well, most power saving hw models (and the code implementing them) have this kind of 
> > > > model:
> > > > 
> > > >  enter power saving mode X
> > > >  exit power saving mode
> > > > 
> > > > Where X is some sort of 'power saving deepness' attribute, right?
> > >
> > > Sure.
> > 
> > Which is the 'saner' model?
> > 
> > > But ACPI and afaik this model got picked up for PCI and other (sub-)archs as well, 
> > > defines state 0 as the non-power saving mode.
> > 
> > But the actual code does not actually deal with any 'state 0', does it? It enters an 
> > idle function and then exits it, right?
> > 
> > 'power state' might be what is used for devices - but even there, we have:
> > 
> >   - enter power state X
> >   - exit power state
> > 
> > right?
> > 
> > > Same as done here with machine suspend state (S0 is back from suspend) and
> > > this model should get picked up when device sleep states get tracked at
> > > some time.
> > >
> > > It's consistent and applies to some well known specifications.
> > 
> > What we want it to be is for it to be the nicest, most understandable, most logical 
> > model - not one matching random hardware specifications.
> > 
> > ( Hardware specifications only matter in so far that it should be possible to 
> >   express all the known hardware state transitions via these events efficiently. )
> > 
> > > Also tracking processor_idle_{start,end} as a separate event makes no sense and 
> > > there is no need to introduce: processor_idle_start/processor_idle_end 
> > > machine_suspend_start/machine_suspend_end 
> > > device_power_mode_start/device_power_mode_end events.
> > 
> > What do you mean by "makes no sense"?
> > 
> > Are they superfluous? Inefficient? Illogical?
> 
> I think it would require deep understanding of specific power modes of each
> architecture to split into this topology. On the bright side, it would bring
> clear understanding of which HW resource is being put to sleep, which would make
> automated analysis much easier to do. But maybe it's too much pain compared to
> the benefit. The related question is also: where is it best to put this logic ?
> In the kernel code ? In per-arch TRACE_EVENT() handlers or in external trace
> analysis plugins ?
> 
> > 
> > > Using state 0 as "exit/end", is much nicer for kernel/ userspace 
> > > implementations/code and the user.
> > 
> > By that argument we should not have separate fork() and exit() syscalls either, but 
> > a set_process_state(1) and set_process_state(0) interface?
> 
> I'm by no means an expert on power saving hardware specs, but if it is possible for
> hardware to switch between two power saving states without passing through power
> state 0, then using a "set state" rather than an enter/exit would be more
> appropriate; even if we go for a scheme introducing
> 
> processor_idle_start/processor_idle_end,
> machine_suspend_start/machine_suspend_end,
> device_power_mode_start/device_power_mode_end.
> 
> I must defer to you guys to figure out if some hardware actually does that for
> either of CPU idle, suspend or device power modes.

Yes, you can go directly from PCI_D1 to PCI_D2, for one example.

Apart from this, attempting to put system suspend to the same bag as cpuidle
is not going to work in the long run.  They are _fundamentally_ different things
even though the power state we get into as a result of suspend is approximately
the same as we can get into via cpuidle (even in that case the energy savings
will generally be different in both cases due to wakeup events).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 15:48               ` Thomas Renninger
  2010-10-25 16:00                 ` Arjan van de Ven
@ 2010-10-25 16:00                 ` Arjan van de Ven
  1 sibling, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 16:00 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Thomas Gleixner,
	linux-pm, Masami Hiramatsu, Tejun Heo, Ingo Molnar, linux-omap,
	Linus Torvalds, Andrew Morton, Mathieu Desnoyers

On 10/25/2010 8:48 AM, Thomas Renninger wrote:
> On Monday 25 October 2010 16:56:04 Ingo Molnar wrote:
>> * Arjan van de Ven<arjan@linux.intel.com>  wrote:
>>> On 10/25/2010 7:36 AM, Thomas Renninger wrote:
>>> ok so we have
>>>
>>> "C0 idle"
> Ideally this should not be called C0, but expressed
> as (#define) POLL_IDLE wherever possible.
>
> In all documentation/specs/white papers about other OSes
> C0 is referred to as not being idle.
> Linux mis-uses it as a self-defined idle state which
> is really confusing.

sure naming is one thing
>>> and
>>> "C0 no longer idle"
>>>
>>> I'd propose using the number 0 for the first one (it makes the most
>>> logical sense, it's the least deep idle state etc etc)
> I would use a special number for the "Linux only" state.

that special number is 0 though..
it makes sense in ordering, 0 < 1, 1 < 2 etc



0 makes for a really bad special number for the exit marker; not just here,
but also for your suspend hook, that one definitely needs to change
(since current commercially available SOCs already reuse 0 for this for 
standby level states)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 15:48               ` Thomas Renninger
@ 2010-10-25 16:00                 ` Arjan van de Ven
  2010-10-25 23:32                   ` Thomas Renninger
  2010-10-25 23:32                   ` Thomas Renninger
  2010-10-25 16:00                 ` Arjan van de Ven
  1 sibling, 2 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 16:00 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Mathieu Desnoyers

On 10/25/2010 8:48 AM, Thomas Renninger wrote:
> On Monday 25 October 2010 16:56:04 Ingo Molnar wrote:
>> * Arjan van de Ven<arjan@linux.intel.com>  wrote:
>>> On 10/25/2010 7:36 AM, Thomas Renninger wrote:
>>> ok so we have
>>>
>>> "C0 idle"
> Ideally this should not be called C0, but expressed
> as (#define) POLL_IDLE wherever possible.
>
> In all documentation/specs/white papers about other OSes
> C0 is referred to as not being idle.
> Linux mis-uses it as a self-defined idle state which
> is really confusing.

sure naming is one thing
>>> and
>>> "C0 no longer idle"
>>>
>>> I'd propose using the number 0 for the first one (it makes the most
>>> logical sense, it's the least deep idle state etc etc)
> I would use a special number for the "Linux only" state.

that special number is 0 though..
it makes sense in ordering, 0 < 1, 1 < 2 etc



0 makes for a really bad special number for the exit marker; not just here,
but also for your suspend hook, that one definitely needs to change
(since current commercially available SOCs already reuse 0 for this for 
standby level states)



^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:56             ` Ingo Molnar
  2010-10-25 15:48               ` Thomas Renninger
@ 2010-10-25 15:48               ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 15:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, linux-trace-users, Arjan van de Ven,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, linux-pm, Masami Hiramatsu,
	Tejun Heo, Andrew Morton, linux-omap, Linus Torvalds,
	Thomas Gleixner

On Monday 25 October 2010 16:56:04 Ingo Molnar wrote:
> 
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> > On 10/25/2010 7:36 AM, Thomas Renninger wrote:
> > ok so we have
> > 
> > "C0 idle"
Ideally this should not be called C0, but expressed
as (#define) POLL_IDLE wherever possible.

In all documentation/specs/white papers about other OSes
C0 is referred to as not being idle.
Linux mis-uses it as a self-defined idle state which
is really confusing.

> > and
> > "C0 no longer idle"
> > 
> > I'd propose using the number 0 for the first one (it makes the most
> > logical sense, it's the least deep idle state etc etc)
I would use a special number for the "Linux only" state.

> > we could use "-1" or "INT_MAX" for the latter
> > but as a user of the API I rather like a separate "we're no longer idle" event...
> > but if not, as long as things aren't ambiguous I'll find a way to code around it.
> >
> > basically with a separate event, I demultiplex based on event number between entry 
> > and exit.... with a special exit value I would just need a double demultiplex,
> 
> Hm, does not sound particularly smart.
> 
> > one on "idle" and then a second one on the state number to split between 
> > entry/exit.
> 
> The thing is, in terms of CPU idle state, if the old tracepoints give us all the
> information that the new tracepoints do, why don't we simply add the tracepoints to ARM
> and be done with it? No app needs to be changed in that case, etc.
> 
> Plus, let's express the suspend/resume tracepoints as suspend_enter(X)/suspend_exit()
> events as well, to keep it symmetric and consistent with the other enter/exit
> events.
> 
> The rename alone isn't a strong enough reason really. 'entering idle state X' and
> 'exiting idle' is pretty much synonymous to 'enter idle state X'.
It's not only that, my patch also:
  - eliminates the never ever used type= field
  - uses a better name, currently it's power:power_{start,end}
    How would you name another power event...

Altogether, it should justify the proposed cleanup(s).
But with this C0 clash, I am not sure whether:
  1) as Ingo said any clean up
  2) a minimal cleanup:
       - rename power:power_{start,end} to power:processor_idle{start,end}
       - get rid of type= field
  3) or a maximum cleanup:
       - plus not use start/end events, but use one state transition
         event.
should be done.
I think best is Jean goes with current definitions.
2. is far less intrusive and if you like to have it, I can
still send another patch.

     Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:56             ` Ingo Molnar
@ 2010-10-25 15:48               ` Thomas Renninger
  2010-10-25 16:00                 ` Arjan van de Ven
  2010-10-25 16:00                 ` Arjan van de Ven
  2010-10-25 15:48               ` Thomas Renninger
  1 sibling, 2 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 15:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Mathieu Desnoyers

On Monday 25 October 2010 16:56:04 Ingo Molnar wrote:
> 
> * Arjan van de Ven <arjan@linux.intel.com> wrote:
> > On 10/25/2010 7:36 AM, Thomas Renninger wrote:
> > ok so we have
> > 
> > "C0 idle"
Ideally this should not be called C0, but expressed
as (#define) POLL_IDLE wherever possible.

In all documentation/specs/white papers about other OSes
C0 is referred to as not being idle.
Linux mis-uses it as a self-defined idle state which
is really confusing.

> > and
> > "C0 no longer idle"
> > 
> > I'd propose using the number 0 for the first one (it makes the most
> > logical sense, it's the least deep idle state etc etc)
I would use a special number for the "Linux only" state.

> > we could use "-1" or "INT_MAX" for the latter
> > but as a user of the API I rather like a separate "we're no longer idle" event...
> > but if not, as long as things aren't ambiguous I'll find a way to code around it.
> >
> > basically with a separate event, I demultiplex based on event number between entry 
> > and exit.... with a special exit value I would just need a double demultiplex,
> 
> Hm, does not sound particularly smart.
> 
> > one on "idle" and then a second one on the state number to split between 
> > entry/exit.
> 
> The thing is, in terms of CPU idle state, if the old tracepoints give us all the
> information that the new tracepoints do, why don't we simply add the tracepoints to ARM
> and be done with it? No app needs to be changed in that case, etc.
> 
> Plus, let's express the suspend/resume tracepoints as suspend_enter(X)/suspend_exit()
> events as well, to keep it symmetric and consistent with the other enter/exit
> events.
> 
> The rename alone isn't a strong enough reason really. 'entering idle state X' and
> 'exiting idle' is pretty much synonymous to 'enter idle state X'.
It's not only that, my patch also:
  - eliminates the never ever used type= field
  - uses a better name, currently it's power:power_{start,end}
    How would you name another power event...

Altogether, it should justify the proposed cleanup(s).
But with this C0 clash, I am not sure whether:
  1) as Ingo said any clean up
  2) a minimal cleanup:
       - rename power:power_{start,end} to power:processor_idle{start,end}
       - get rid of type= field
  3) or a maximum cleanup:
       - plus not use start/end events, but use one state transition
         event.
should be done.
I think best is Jean goes with current definitions.
2. is far less intrusive and if you like to have it, I can
still send another patch.

     Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:45           ` Arjan van de Ven
@ 2010-10-25 14:56             ` Ingo Molnar
  2010-10-25 14:56             ` Ingo Molnar
  1 sibling, 0 replies; 72+ messages in thread
From: Ingo Molnar @ 2010-10-25 14:56 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Thomas Gleixner,
	linux-pm, Masami Hiramatsu, Tejun Heo, Andrew Morton, linux-omap,
	Linus Torvalds, Mathieu Desnoyers


* Arjan van de Ven <arjan@linux.intel.com> wrote:

> On 10/25/2010 7:36 AM, Thomas Renninger wrote:
> >>I know that your new API tries to use "0" as exit, but 0 is already
> >>taken (in all power terminology at least on x86 it is) for this.
> >cpuidle indeed misuses C0 as "poll idle" state.
> >That's really bad/misleading, but nothing that can be changed easily.
> >
> >I agree shifting C0 (cpuidle)<->  POLL_IDLE event
> >and              "not idle"<->  real C0 (executing instructions)
> >or however this gets mapped makes things even worse.
> >
> >Damn, it could be that easy and straight forward, but I agree that
> >this kills the approach to trigger state 0 event if C0 is entered
> >(C0 as defined as operational mode executing instructions).
> 
> ok so we have
> 
> "C0 idle"
> and
> "C0 no longer idle"
> 
> I'd propose using the number 0 for the first one (it makes the most
> logical sense, it's the least deep idle state etc etc)
> 
> we could use "-1" or "INT_MAX" for the latter
> 
> but as a user of the API I rather like a separate "we're no longer idle" event...
> but if not, as long as things aren't ambiguous I'll find a way to code around it.
>
> basically with a separate event, I demultiplex based on event number between entry 
> and exit.... with a special exit value I would just need a double demultiplex,

Hm, does not sound particularly smart.

> one on "idle" and then a second one on the state number to split between 
> entry/exit.

The thing is, in terms of CPU idle state, if the old tracepoints give us all the
information that the new tracepoints do, why don't we simply add the tracepoints to ARM
and be done with it? No app needs to be changed in that case, etc.

Plus, let's express the suspend/resume tracepoints as suspend_enter(X)/suspend_exit()
events as well, to keep it symmetric and consistent with the other enter/exit
events.

The rename alone isn't a strong enough reason really. 'entering idle state X' and
'exiting idle' is pretty much synonymous to 'enter idle state X'.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:45           ` Arjan van de Ven
  2010-10-25 14:56             ` Ingo Molnar
@ 2010-10-25 14:56             ` Ingo Molnar
  2010-10-25 15:48               ` Thomas Renninger
  2010-10-25 15:48               ` Thomas Renninger
  1 sibling, 2 replies; 72+ messages in thread
From: Ingo Molnar @ 2010-10-25 14:56 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Mathieu Desnoyers


* Arjan van de Ven <arjan@linux.intel.com> wrote:

> On 10/25/2010 7:36 AM, Thomas Renninger wrote:
> >>I know that your new API tries to use "0" as exit, but 0 is already
> >>taken (in all power terminology at least on x86 it is) for this.
> >cpuidle indeed misuses C0 as "poll idle" state.
> >That's really bad/misleading, but nothing that can be changed easily.
> >
> >I agree shifting C0 (cpuidle)<->  POLL_IDLE event
> >and              "not idle"<->  real C0 (executing instructions)
> >or however this gets mapped makes things even worse.
> >
> >Damn, it could be that easy and straight forward, but I agree that
> >this kills the approach to trigger state 0 event if C0 is entered
> >(C0 as defined as operational mode executing instructions).
> 
> ok so we have
> 
> "C0 idle"
> and
> "C0 no longer idle"
> 
> I'd propose using the number 0 for the first one (it makes the most
> logical sense, it's the least deep idle state etc etc)
> 
> we could use "-1" or "INT_MAX" for the latter
> 
> but as a user of the API I rather like a separate "we're no longer idle" event...
> but if not, as long as things aren't ambiguous I'll find a way to code around it.
>
> basically with a separate event, I demultiplex based on event number between entry 
> and exit.... with a special exit value I would just need a double demultiplex,

Hm, does not sound particularly smart.

> one on "idle" and then a second one on the state number to split between 
> entry/exit.

The thing is, in terms of CPU idle state, if the old tracepoints give us all the
information that the new tracepoints do, why don't we simply add the tracepoints to ARM
and be done with it? No app needs to be changed in that case, etc.

Plus, let's express the suspend/resume tracepoints as suspend_enter(X)/suspend_exit()
events as well, to keep it symmetric and consistent with the other enter/exit
events.

The rename alone isn't a strong enough reason really. 'entering idle state X' and
'exiting idle' is pretty much synonymous to 'enter idle state X'.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:11           ` Arjan van de Ven
  2010-10-25 14:51             ` Thomas Renninger
@ 2010-10-25 14:51             ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 14:51 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Thomas Gleixner,
	linux-pm, Masami Hiramatsu, Tejun Heo, Ingo Molnar, linux-omap,
	Linus Torvalds, Andrew Morton, Mathieu Desnoyers

On Monday 25 October 2010 16:11:10 Arjan van de Ven wrote:
> On 10/25/2010 5:55 AM, Thomas Renninger wrote:
> 
> 
> >> But the actual code does not actually deal with any 'state 0', does it?
> > It does. Not being idle is tracked by cpuidle driver as state 0
> > (arch independent):
> > /sys/devices/system/cpu/cpu0/cpuidle/state0/
> > halt/C1 on X86 is:
> > /sys/devices/system/cpu/cpu0/cpuidle/state1/
> > ...
> state0 is still OS idle!
Yes, I just realized that.
Which is very unfortunate.
The whole cpuidle stuff is based on ACPI C-states and
cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
C0
is plain wrong if it's used as "poll idle" time.
C0 is defined as (in the ACPI spec):
----------
2.5 Processor Power State Definitions
C0 Processor Power State
While the processor is in this state, it executes instructions.
----------

> the API is just weird for this, from a userspace perspective
> 
> if the kernel picks this state 0 for the idle handler, the userspace app 
> gets
> two events
> 
> one for going to state 0 to enter the idle state
> one for going to state 0 to exit idle
> 
> but they're the exact same event in your API.
> 
> rather unpleasant from a userspace program perspective....
Yeah. But the re-definition of C0 being "Linux poll idle"
will confuse users as well. Not sure whether this should get
touched, though.
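
The ambiguity can be sketched in a few lines of Python. This is an
illustration only: the transition tuples, the encode() helper and the -1
sentinel are hypothetical and do not come from the patch. It shows that with
0 as the exit marker a "poll idle, then leave" history and a doubled "leave
idle" produce the same event stream, while a distinct exit value keeps them
apart.

```python
EXIT = -1  # hypothetical sentinel for "no longer idle"


def encode(transitions, exit_value):
    """Flatten ('enter', n) / ('exit',) transitions into single-event state values."""
    return [t[1] if t[0] == "enter" else exit_value for t in transitions]


poll_then_leave = [("enter", 0), ("exit",)]  # cpuidle state0, then leave idle
double_leave = [("exit",), ("exit",)]        # unbalanced double "leave idle"

# With 0 doubling as the exit marker, userspace sees the same stream twice.
same_with_zero = encode(poll_then_leave, 0) == encode(double_leave, 0)
# With a distinct sentinel the two histories stay distinguishable.
same_with_sentinel = encode(poll_then_leave, EXIT) == encode(double_leave, EXIT)
```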

Thanks for clarification, I wasn't aware of that...

    Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:11           ` Arjan van de Ven
@ 2010-10-25 14:51             ` Thomas Renninger
  2010-10-25 14:51             ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 14:51 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Mathieu Desnoyers

On Monday 25 October 2010 16:11:10 Arjan van de Ven wrote:
> On 10/25/2010 5:55 AM, Thomas Renninger wrote:
> 
> 
> >> But the actual code does not actually deal with any 'state 0', does it?
> > It does. Not being idle is tracked by cpuidle driver as state 0
> > (arch independent):
> > /sys/devices/system/cpu/cpu0/cpuidle/state0/
> > halt/C1 on X86 is:
> > /sys/devices/system/cpu/cpu0/cpuidle/state1/
> > ...
> state0 is still OS idle!
Yes, I just realized that.
Which is very unfortunate.
The whole cpuidle stuff is based on ACPI C-states and
cat /sys/devices/system/cpu/cpu0/cpuidle/state0/name
C0
is plain wrong if it's used as "poll idle" time.
C0 is defined as (in the ACPI spec):
----------
2.5 Processor Power State Definitions
C0 Processor Power State
While the processor is in this state, it executes instructions.
----------

> the API is just weird for this, from a userspace perspective
> 
> if the kernel picks this state 0 for the idle handler, the userspace app 
> gets
> two events
> 
> one for going to state 0 to enter the idle state
> one for going to state 0 to exit idle
> 
> but they're the exact same event in your API.
> 
> rather unpleasant from a userspace program perspective....
Yeah. But the re-definition of C0 being "Linux poll idle"
will confuse users as well. Not sure whether this should get
touched, though.

Thanks for clarification, I wasn't aware of that...

    Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:36         ` Thomas Renninger
@ 2010-10-25 14:45           ` Arjan van de Ven
  2010-10-25 14:45           ` Arjan van de Ven
  1 sibling, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 14:45 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Mathieu Desnoyers,
	Ingo Molnar, linux-pm, Masami Hiramatsu, Tejun Heo,
	Andrew Morton, linux-omap, Linus Torvalds, Thomas Gleixner

On 10/25/2010 7:36 AM, Thomas Renninger wrote:
>> I know that your new API tries to use "0" as exit, but 0 is already
>> taken (in all power terminology at least on x86 it is) for this.
> cpuidle indeed misuses C0 as "poll idle" state.
> That's really bad/misleading, but nothing that can be changed easily.
>
> I agree shifting C0 (cpuidle)<->  POLL_IDLE event
> and              "not idle"<->  real C0 (executing instructions)
> or however this gets mapped makes things even worse.
>
> Damn, it could be that easy and straight forward, but I agree that
> this kills the approach to trigger state 0 event if C0 is entered
> (C0 as defined as operational mode executing instructions).

ok so we have

"C0 idle"
and
"C0 no longer idle"

I'd propose using the number 0 for the first one (it makes the most 
logical sense, it's the least deep idle state etc etc)

we could use "-1" or "INT_MAX" for the latter

but as a user of the API I rather like a separate "we're no longer idle" 
event... but if not, as long as things aren't ambiguous I'll find a way 
to code around it.
basically with a separate event, I demultiplex based on event number 
between entry and exit.... with a special exit value I would just need a 
double demultiplex,
one on "idle" and then a second one on the state number to split between 
entry/exit.
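
A rough sketch of the single versus double demultiplex, in Python for
brevity. The event names "idle_start" and "idle_end", the EXIT_STATE value
and both handler functions are assumptions for illustration; they are not
the names used by the kernel or by any existing tool.

```python
EXIT_STATE = -1  # hypothetical special exit value


def handle_separate(event, state=None):
    """Separate events: one demultiplex on the event name selects the code path."""
    handlers = {
        "idle_start": lambda: f"enter state {state}",
        "idle_end": lambda: "exit idle",
    }
    return handlers[event]()


def handle_single(event, state):
    """Single event: demultiplex on the event name, then again on the state value."""
    if event != "processor_idle":
        raise KeyError(event)
    if state == EXIT_STATE:
        return "exit idle"
    return f"enter state {state}"
```

Both variants carry the same information; the difference is only where the
entry/exit split happens in the consumer.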

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 14:36         ` Thomas Renninger
  2010-10-25 14:45           ` Arjan van de Ven
@ 2010-10-25 14:45           ` Arjan van de Ven
  2010-10-25 14:56             ` Ingo Molnar
  2010-10-25 14:56             ` Ingo Molnar
  1 sibling, 2 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 14:45 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Linus Torvalds, Andrew Morton, Thomas Gleixner, Masami Hiramatsu,
	Frank Eigler, Steven Rostedt, Kevin Hilman, Peter Zijlstra,
	linux-omap, rjw, linux-pm, linux-trace-users, Jean Pihet,
	Pierre Tardy, Frederic Weisbecker, Tejun Heo, Mathieu Desnoyers,
	Ingo Molnar

On 10/25/2010 7:36 AM, Thomas Renninger wrote:
>> I know that your new API tries to use "0" as exit, but 0 is already
>> taken (in all power terminology at least on x86 it is) for this.
> cpuidle indeed misuses C0 as "poll idle" state.
> That's really bad/misleading, but nothing that can be changed easily.
>
> I agree shifting C0 (cpuidle)<->  POLL_IDLE event
> and              "not idle"<->  real C0 (executing instructions)
> or however this gets mapped makes things even worse.
>
> Damn, it could be that easy and straight forward, but I agree that
> this kills the approach to trigger state 0 event if C0 is entered
> (C0 as defined as operational mode executing instructions).

ok so we have

"C0 idle"
and
"C0 no longer idle"

I'd propose using the number 0 for the first one (it makes the most 
logical sense, it's the least deep idle state etc etc)

we could use "-1" or "INT_MAX" for the latter

but as a user of the API I rather like a separate "we're no longer idle" 
event... but if not, as long as things aren't ambiguous I'll find a way
to code around it.
basically with a separate event, I demultiplex based on event number 
between entry and exit.... with a special exit value I would just need a 
double demultiplex,
one on "idle" and then a second one on the state number to split between 
entry/exit.
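
A minimal sketch of the two demultiplexing styles described above. The
event IDs, the IDLE_EXIT marker and the handler names are hypothetical
and only illustrate the shape of the userspace code; they are not taken
from the kernel or from powertop.

```c
/* Hypothetical event IDs; the real tracepoint IDs are assigned at runtime. */
enum { EV_IDLE_START, EV_IDLE_END, EV_IDLE, EV_OTHER };

/* Hypothetical special value marking "no longer idle" in the combined event. */
#define IDLE_EXIT (-1)

enum { ACT_NONE, ACT_ENTER, ACT_EXIT };

struct event {
	int id;		/* which tracepoint fired */
	int state;	/* idle state number carried by the event */
};

/* Separate entry/exit events: a single demultiplex on the event ID. */
static int handle_separate(const struct event *ev)
{
	switch (ev->id) {
	case EV_IDLE_START:
		return ACT_ENTER;
	case EV_IDLE_END:
		return ACT_EXIT;
	default:
		return ACT_NONE;
	}
}

/*
 * One combined event with a special exit value: first demultiplex on
 * the event ID, then a second time on the state number.
 */
static int handle_combined(const struct event *ev)
{
	if (ev->id != EV_IDLE)
		return ACT_NONE;
	return ev->state == IDLE_EXIT ? ACT_EXIT : ACT_ENTER;
}
```

With a dedicated exit value, state 0 stays available for "C0 idle" and
the second test is cheap; the cost is only the extra branch.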


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 13:55       ` Arjan van de Ven
  2010-10-25 14:36         ` Thomas Renninger
@ 2010-10-25 14:36         ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 14:36 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Mathieu Desnoyers,
	Ingo Molnar, linux-pm, Masami Hiramatsu, Tejun Heo,
	Andrew Morton, linux-omap, Linus Torvalds, Thomas Gleixner

On Monday 25 October 2010 15:55:08 Arjan van de Ven wrote:
> On 10/25/2010 2:41 AM, Thomas Renninger wrote:
> > On Monday 25 October 2010 08:54:34 Arjan van de Ven wrote:
> >> On 10/19/2010 4:36 AM, Thomas Renninger wrote:
> >>>    static void poll_idle(void)
> >>>    {
> >>> -	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
> >>>    	local_irq_enable();
> >>>    	while (!need_resched())
> >>>    		cpu_relax();
> >>> -	trace_power_end(0);
> >>>    }
> >> why did you remove the idle tracepoints from this one ???
> > Because no idle/sleep state is entered here.
> > State 0 does not exist or say, it means the machine is not idle.
> > The new event uses idle state 0 spec conform as "exit sleep state".
> >
> > If this should still be trackable some kind of dummy sleep state:
> > #define IDLE_BUSY_LOOP 0xFE
> > (or similar) must get defined and passed like this:
> > trace_processor_idle(IDLE_BUSY_LOOP, smp_processor_id());
> >      cpu_relax()
> > trace_processor_idle(0, smp_processor_id());
> >
> > I could imagine this is somewhat worth it to compare idle results
> > to "no idle state at all" is used.
> > But nobody should ever use idle=poll, comparing deep sleep states
> > with C1 with (idle=halt) should be sufficient?
> 
> this is not idle=poll on the command line only.
> this also gets used normally, in two cases
> 1) during real time operations, for some short periods of time
>      (think wallstreet trading)
> 2) by the menu governor when the next event is less than a few 
> microseconds away, so short that even C1 is too much
> 
> I know that your new API tries to use "0" as exit, but 0 is already 
> taken (in all power terminology at least on x86 it is) for this.
cpuidle indeed misuses C0 as "poll idle" state.
That's really bad/misleading, but nothing that can be changed easily.

I agree shifting C0 (cpuidle) <-> POLL_IDLE event
and              "not idle"   <-> real C0 (executing instructions)
or however this gets mapped makes things even worse.

Damn, it could be that easy and straightforward, but I agree that
this kills the approach to trigger state 0 event if C0 is entered
(C0 as defined as operational mode executing instructions).

     Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 13:55       ` Arjan van de Ven
@ 2010-10-25 14:36         ` Thomas Renninger
  2010-10-25 14:45           ` Arjan van de Ven
  2010-10-25 14:45           ` Arjan van de Ven
  2010-10-25 14:36         ` Thomas Renninger
  1 sibling, 2 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 14:36 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, Andrew Morton, Thomas Gleixner, Masami Hiramatsu,
	Frank Eigler, Steven Rostedt, Kevin Hilman, Peter Zijlstra,
	linux-omap, rjw, linux-pm, linux-trace-users, Jean Pihet,
	Pierre Tardy, Frederic Weisbecker, Tejun Heo, Mathieu Desnoyers,
	Ingo Molnar

On Monday 25 October 2010 15:55:08 Arjan van de Ven wrote:
> On 10/25/2010 2:41 AM, Thomas Renninger wrote:
> > On Monday 25 October 2010 08:54:34 Arjan van de Ven wrote:
> >> On 10/19/2010 4:36 AM, Thomas Renninger wrote:
> >>>    static void poll_idle(void)
> >>>    {
> >>> -	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
> >>>    	local_irq_enable();
> >>>    	while (!need_resched())
> >>>    		cpu_relax();
> >>> -	trace_power_end(0);
> >>>    }
> >> why did you remove the idle tracepoints from this one ???
> > Because no idle/sleep state is entered here.
> > State 0 does not exist or say, it means the machine is not idle.
> > The new event uses idle state 0 spec conform as "exit sleep state".
> >
> > If this should still be trackable some kind of dummy sleep state:
> > #define IDLE_BUSY_LOOP 0xFE
> > (or similar) must get defined and passed like this:
> > trace_processor_idle(IDLE_BUSY_LOOP, smp_processor_id());
> >      cpu_relax()
> > trace_processor_idle(0, smp_processor_id());
> >
> > I could imagine this is somewhat worth it to compare idle results
> > to "no idle state at all" is used.
> > But nobody should ever use idle=poll, comparing deep sleep states
> > with C1 with (idle=halt) should be sufficient?
> 
> this is not idle=poll on the command line only.
> this also gets used normally, in two cases
> 1) during real time operations, for some short periods of time
>      (think wallstreet trading)
> 2) by the menu governor when the next event is less than a few 
> microseconds away, so short that even C1 is too much
> 
> I know that your new API tries to use "0" as exit, but 0 is already 
> taken (in all power terminology at least on x86 it is) for this.
cpuidle indeed misuses C0 as "poll idle" state.
That's really bad/misleading, but nothing that can be changed easily.

I agree shifting C0 (cpuidle) <-> POLL_IDLE event
and              "not idle"   <-> real C0 (executing instructions)
or however this gets mapped makes things even worse.

Damn, it could be that easy and straightforward, but I agree that
this kills the approach to trigger state 0 event if C0 is entered
(C0 as defined as operational mode executing instructions).

     Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 12:55         ` Thomas Renninger
@ 2010-10-25 14:11           ` Arjan van de Ven
  2010-10-25 14:11           ` Arjan van de Ven
  1 sibling, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 14:11 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Thomas Gleixner,
	linux-pm, Masami Hiramatsu, Tejun Heo, Ingo Molnar, linux-omap,
	Linus Torvalds, Andrew Morton, Mathieu Desnoyers

On 10/25/2010 5:55 AM, Thomas Renninger wrote:


>> But the actual code does not actually deal with any 'state 0', does it?
> It does. Not being idle is tracked by cpuidle driver as state 0
> (arch independent):
> /sys/devices/system/cpu/cpu0/cpuidle/state0/
> halt/C1 on X86 is:
> /sys/devices/system/cpu/cpu0/cpuidle/state1/
> ...
state0 is still OS idle!


the API is just weird for this, from a userspace perspective

if the kernel picks this state 0 for the idle handler, the userspace app
gets two events

one for going to state 0 to enter the idle state
one for going to state 0 to exit idle

but they're the exact same event in your API.

rather unpleasant from a userspace program perspective....
now I need to start tracking even more state on top in powertop to be 
able to make a guess at which of the two meanings a state 0 entry has.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 12:55         ` Thomas Renninger
  2010-10-25 14:11           ` Arjan van de Ven
@ 2010-10-25 14:11           ` Arjan van de Ven
  2010-10-25 14:51             ` Thomas Renninger
  2010-10-25 14:51             ` Thomas Renninger
  1 sibling, 2 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 14:11 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Mathieu Desnoyers

On 10/25/2010 5:55 AM, Thomas Renninger wrote:


>> But the actual code does not actually deal with any 'state 0', does it?
> It does. Not being idle is tracked by cpuidle driver as state 0
> (arch independent):
> /sys/devices/system/cpu/cpu0/cpuidle/state0/
> halt/C1 on X86 is:
> /sys/devices/system/cpu/cpu0/cpuidle/state1/
> ...
state0 is still OS idle!


the API is just weird for this, from a userspace perspective

if the kernel picks this state 0 for the idle handler, the userspace app
gets two events

one for going to state 0 to enter the idle state
one for going to state 0 to exit idle

but they're the exact same event in your API.

rather unpleasant from a userspace program perspective....
now I need to start tracking even more state on top in powertop to be 
able to make a guess at which of the two meanings a state 0 entry has.
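
A rough sketch of the extra tracking this implies, assuming a single
idle event that carries only (state, cpu) and uses 0 both for entering
the poll state and for leaving idle. MAX_CPUS, in_idle and
classify_idle_event() are made-up names for illustration, not powertop
code.

```c
#define MAX_CPUS 64	/* arbitrary bound for this sketch */

/* Per-CPU guess: did the last event on this CPU enter an idle state? */
static int in_idle[MAX_CPUS];

/*
 * Returns 1 if the event is taken as "enter idle", 0 if it is taken as
 * "exit idle", and -1 for an out-of-range CPU number.
 */
static int classify_idle_event(int state, int cpu)
{
	if (cpu < 0 || cpu >= MAX_CPUS)
		return -1;

	if (state != 0) {
		/* Non-zero states are unambiguous: always an entry. */
		in_idle[cpu] = 1;
		return 1;
	}

	/* State 0 is ambiguous: guess from the previous event on this CPU. */
	if (in_idle[cpu]) {
		in_idle[cpu] = 0;
		return 0;
	}
	in_idle[cpu] = 1;
	return 1;
}
```

The guess is only as good as the event stream: a single lost or doubled
event flips the meaning of every later state 0 event on that CPU.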


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 11:03     ` Thomas Renninger
  2010-10-25 11:55       ` Ingo Molnar
  2010-10-25 11:55       ` Ingo Molnar
@ 2010-10-25 13:58       ` Arjan van de Ven
  2010-10-25 13:58       ` Arjan van de Ven
  3 siblings, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 13:58 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Thomas Gleixner,
	linux-pm, Masami Hiramatsu, Tejun Heo, Ingo Molnar, linux-omap,
	Linus Torvalds, Andrew Morton, Mathieu Desnoyers

On 10/25/2010 4:03 AM, Thomas Renninger wrote:
> On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
>> * Thomas Renninger<trenn@suse.de>  wrote:
>>
>>> New power trace events:
>>> power:processor_idle
>>> power:processor_frequency
>>> power:machine_suspend
>>>
>>>
>>> C-state/idle accounting events:
>>>    power:power_start
>>>    power:power_end
>>> are replaced with:
>>>    power:processor_idle
>> Well, most power saving hw models (and the code implementing them) have this kind of
>> model:
>>
>>   enter power saving mode X
>>   exit power saving mode
>>
>> Where X is some sort of 'power saving deepness' attribute, right?
> Sure.
> But ACPI and afaik this model got picked up for PCI and other (sub-)archs
> as well, defines state 0 as the non-power saving mode.

correct... "C0" is not power efficient... but it's still a valid OS
idle state!

same for "S0"... S0 as standby state is still valid... sure it doesn't
save you much power... but that does not mean it's not valid.
(as indication, the Intel Moorestown platform, which is currently in
production and available to OEMs, has such an S0 standby state)


> Also tracking processor_idle_{start,end} as a separate event
> makes no sense and there is no need to introduce:
> processor_idle_start/processor_idle_end
> machine_suspend_start/machine_suspend_end
> device_power_mode_start/device_power_mode_end
> events.
> Using state 0 as "exit/end", is much nicer for kernel/
> userspace implementations/code and the user.
actually no; having written a few of these in userspace so far, having a 
separate end event is easier to deal with;
the actions you take on entry and exit are completely separate code paths.

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 11:03     ` Thomas Renninger
                         ` (2 preceding siblings ...)
  2010-10-25 13:58       ` Arjan van de Ven
@ 2010-10-25 13:58       ` Arjan van de Ven
  2010-10-25 20:33         ` Rafael J. Wysocki
  2010-10-25 20:33         ` Rafael J. Wysocki
  3 siblings, 2 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 13:58 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Mathieu Desnoyers

On 10/25/2010 4:03 AM, Thomas Renninger wrote:
> On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
>> * Thomas Renninger<trenn@suse.de>  wrote:
>>
>>> New power trace events:
>>> power:processor_idle
>>> power:processor_frequency
>>> power:machine_suspend
>>>
>>>
>>> C-state/idle accounting events:
>>>    power:power_start
>>>    power:power_end
>>> are replaced with:
>>>    power:processor_idle
>> Well, most power saving hw models (and the code implementing them) have this kind of
>> model:
>>
>>   enter power saving mode X
>>   exit power saving mode
>>
>> Where X is some sort of 'power saving deepness' attribute, right?
> Sure.
> But ACPI and afaik this model got picked up for PCI and other (sub-)archs
> as well, defines state 0 as the non-power saving mode.

correct... "C0" is not power efficient... but it's still a valid OS
idle state!

same for "S0"... S0 as standby state is still valid... sure it doesn't
save you much power... but that does not mean it's not valid.
(as indication, the Intel Moorestown platform, which is currently in
production and available to OEMs, has such an S0 standby state)


> Also tracking processor_idle_{start,end} as a separate event
> makes no sense and there is no need to introduce:
> processor_idle_start/processor_idle_end
> machine_suspend_start/machine_suspend_end
> device_power_mode_start/device_power_mode_end
> events.
> Using state 0 as "exit/end", is much nicer for kernel/
> userspace implementations/code and the user.
actually no; having written a few of these in userspace so far, having a 
separate end event is easier to deal with;
the actions you take on entry and exit are completely separate code paths.


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25  9:41     ` Thomas Renninger
@ 2010-10-25 13:55       ` Arjan van de Ven
  2010-10-25 13:55       ` Arjan van de Ven
  1 sibling, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 13:55 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Mathieu Desnoyers,
	Ingo Molnar, linux-pm, Masami Hiramatsu, Tejun Heo,
	Andrew Morton, linux-omap, Linus Torvalds, Thomas Gleixner

On 10/25/2010 2:41 AM, Thomas Renninger wrote:
> On Monday 25 October 2010 08:54:34 Arjan van de Ven wrote:
>> On 10/19/2010 4:36 AM, Thomas Renninger wrote:
>>>    static void poll_idle(void)
>>>    {
>>> -	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
>>>    	local_irq_enable();
>>>    	while (!need_resched())
>>>    		cpu_relax();
>>> -	trace_power_end(0);
>>>    }
>> why did you remove the idle tracepoints from this one ???
> Because no idle/sleep state is entered here.
> State 0 does not exist or say, it means the machine is not idle.
> The new event uses idle state 0 spec conform as "exit sleep state".
>
> If this should still be trackable some kind of dummy sleep state:
> #define IDLE_BUSY_LOOP 0xFE
> (or similar) must get defined and passed like this:
> trace_processor_idle(IDLE_BUSY_LOOP, smp_processor_id());
>      cpu_relax()
> trace_processor_idle(0, smp_processor_id());
>
> I could imagine this is somewhat worth it to compare idle results
> to "no idle state at all" is used.
> But nobody should ever use idle=poll, comparing deep sleep states
> with C1 with (idle=halt) should be sufficient?

this is not idle=poll on the command line only.
this also gets used normally, in two cases
1) during real time operations, for some short periods of time
     (think wallstreet trading)
2) by the menu governor when the next event is less than a few 
microseconds away, so short that even C1 is too much

I know that your new API tries to use "0" as exit, but 0 is already 
taken (in all power terminology at least on x86 it is) for this.

why isn't your "exit" a special define?


also, if you look at many other similar perf events, they have separate
entry/exit points:

process/do_process.cpp:         perf_events->add_event("irq:irq_handler_entry");
process/do_process.cpp:         perf_events->add_event("irq:irq_handler_exit");
process/do_process.cpp:         perf_events->add_event("irq:softirq_entry");
process/do_process.cpp:         perf_events->add_event("irq:softirq_exit");
process/do_process.cpp:         perf_events->add_event("timer:timer_expire_entry");
process/do_process.cpp:         perf_events->add_event("timer:timer_expire_exit");
process/do_process.cpp:         perf_events->add_event("timer:hrtimer_expire_entry");
process/do_process.cpp:         perf_events->add_event("timer:hrtimer_expire_exit");
process/do_process.cpp:         perf_events->add_event("power:power_start");
process/do_process.cpp:         perf_events->add_event("power:power_end");
process/do_process.cpp:         perf_events->add_event("workqueue:workqueue_execute_start");
process/do_process.cpp:         perf_events->add_event("workqueue:workqueue_execute_end");

so there is already an API consistency precedent
(and frankly, trying to multiplex in "exit" via a magic value is asking 
for trouble API wise)

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25  9:41     ` Thomas Renninger
  2010-10-25 13:55       ` Arjan van de Ven
@ 2010-10-25 13:55       ` Arjan van de Ven
  2010-10-25 14:36         ` Thomas Renninger
  2010-10-25 14:36         ` Thomas Renninger
  1 sibling, 2 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25 13:55 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Linus Torvalds, Andrew Morton, Thomas Gleixner, Masami Hiramatsu,
	Frank Eigler, Steven Rostedt, Kevin Hilman, Peter Zijlstra,
	linux-omap, rjw, linux-pm, linux-trace-users, Jean Pihet,
	Pierre Tardy, Frederic Weisbecker, Tejun Heo, Mathieu Desnoyers,
	Ingo Molnar

On 10/25/2010 2:41 AM, Thomas Renninger wrote:
> On Monday 25 October 2010 08:54:34 Arjan van de Ven wrote:
>> On 10/19/2010 4:36 AM, Thomas Renninger wrote:
>>>    static void poll_idle(void)
>>>    {
>>> -	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
>>>    	local_irq_enable();
>>>    	while (!need_resched())
>>>    		cpu_relax();
>>> -	trace_power_end(0);
>>>    }
>> why did you remove the idle tracepoints from this one ???
> Because no idle/sleep state is entered here.
> State 0 does not exist or say, it means the machine is not idle.
> The new event uses idle state 0 spec conform as "exit sleep state".
>
> If this should still be trackable some kind of dummy sleep state:
> #define IDLE_BUSY_LOOP 0xFE
> (or similar) must get defined and passed like this:
> trace_processor_idle(IDLE_BUSY_LOOP, smp_processor_id());
>      cpu_relax()
> trace_processor_idle(0, smp_processor_id());
>
> I could imagine this is somewhat worth it to compare idle results
> to "no idle state at all" is used.
> But nobody should ever use idle=poll, comparing deep sleep states
> with C1 with (idle=halt) should be sufficient?

this is not idle=poll on the command line only.
this also gets used normally, in two cases
1) during real time operations, for some short periods of time
     (think wallstreet trading)
2) by the menu governor when the next event is less than a few 
microseconds away, so short that even C1 is too much

I know that your new API tries to use "0" as exit, but 0 is already 
taken (in all power terminology at least on x86 it is) for this.

why isn't your "exit" a special define?


also, if you look at many other similar perf events, they have separate
entry/exit points:

process/do_process.cpp:         perf_events->add_event("irq:irq_handler_entry");
process/do_process.cpp:         perf_events->add_event("irq:irq_handler_exit");
process/do_process.cpp:         perf_events->add_event("irq:softirq_entry");
process/do_process.cpp:         perf_events->add_event("irq:softirq_exit");
process/do_process.cpp:         perf_events->add_event("timer:timer_expire_entry");
process/do_process.cpp:         perf_events->add_event("timer:timer_expire_exit");
process/do_process.cpp:         perf_events->add_event("timer:hrtimer_expire_entry");
process/do_process.cpp:         perf_events->add_event("timer:hrtimer_expire_exit");
process/do_process.cpp:         perf_events->add_event("power:power_start");
process/do_process.cpp:         perf_events->add_event("power:power_end");
process/do_process.cpp:         perf_events->add_event("workqueue:workqueue_execute_start");
process/do_process.cpp:         perf_events->add_event("workqueue:workqueue_execute_end");

so there is already an API consistency precedent
(and frankly, trying to multiplex in "exit" via a magic value is asking 
for trouble API wise)


^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 11:55       ` Ingo Molnar
  2010-10-25 12:55         ` Thomas Renninger
  2010-10-25 12:55         ` Thomas Renninger
@ 2010-10-25 12:58         ` Mathieu Desnoyers
  2010-10-25 12:58         ` Mathieu Desnoyers
  3 siblings, 0 replies; 72+ messages in thread
From: Mathieu Desnoyers @ 2010-10-25 12:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Thomas Gleixner, linux-pm, Masami Hiramatsu,
	Tejun Heo, Andrew Morton, linux-omap, Linus Torvalds

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Thomas Renninger <trenn@suse.de> wrote:
> 
> > On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> > > 
> > > * Thomas Renninger <trenn@suse.de> wrote:
> > > 
> > > > New power trace events:
> > > > power:processor_idle
> > > > power:processor_frequency
> > > > power:machine_suspend
> > > > 
> > > > 
> > > > C-state/idle accounting events:
> > > >   power:power_start
> > > >   power:power_end
> > > > are replaced with:
> > > >   power:processor_idle
> > > 
> > > Well, most power saving hw models (and the code implementing them) have this kind of 
> > > model:
> > > 
> > >  enter power saving mode X
> > >  exit power saving mode
> > > 
> > > Where X is some sort of 'power saving deepness' attribute, right?
> >
> > Sure.
> 
> Which is is the 'saner' model?
> 
> > But ACPI and afaik this model got picked up for PCI and other (sub-)archs as well, 
> > defines state 0 as the non-power saving mode.
> 
> But the actual code does not actually deal with any 'state 0', does it? It enters an 
> idle function and then exits it, right?
> 
> 'power state' might be what is used for devices - but even there, we have:
> 
>   - enter power state X
>   - exit power state
> 
> right?
> 
> > Same as done here with machine suspend state (S0 is back from suspend) and
> > this model should get picked up when device sleep states get tracked at
> > some time.
> >
> > It's consistent and applies to some well known specifications.
> 
> What we want it to be is for it to be the nicest, most understandable, most logical 
> model - not one matching random hardware specifications.
> 
> ( Hardware specifications only matter in so far that it should be possible to 
>   express all the known hardware state transitions via these events efficiently. )
> 
> > Also tracking processor_idle_{start,end} as a separate event makes no sense and 
> > there is no need to introduce: processor_idle_start/processor_idle_end 
> > machine_suspend_start/machine_suspend_end 
> > device_power_mode_start/device_power_mode_end events.
> 
> What do you mean by "makes no sense"?
> 
> Are they superfluous? Inefficient? Illogical?

I think it would require deep understanding of specific power modes of each
architecture to split into this topology. On the bright side, it would bring
clear understanding of which HW resource is being put to sleep, which would make
automated analysis much easier to do. But maybe it's too much pain compared to
the benefit. The related question is also: where is it best to put this logic?
In the kernel code? In per-arch TRACE_EVENT() handlers or in external trace
analysis plugins?

> 
> > Using state 0 as "exit/end", is much nicer for kernel/ userspace 
> > implementations/code and the user.
> 
> By that argument we should not have separate fork() and exit() syscalls either, but 
> a set_process_state(1) and set_process_state(0) interface?

I'm by no means an expert on power saving hardware specs, but if it is possible for
hardware to switch between two power saving states without passing through power
state 0, then using a "set state" rather than an enter/exit would be more
appropriate; even if we go for a scheme introducing

processor_idle_start/processor_idle_end,
machine_suspend_start/machine_suspend_end,
device_power_mode_start/device_power_mode_end.

I must defer to you guys to figure out if some hardware actually does that for
either of CPU idle, suspend or device power modes.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 11:55       ` Ingo Molnar
                           ` (2 preceding siblings ...)
  2010-10-25 12:58         ` Mathieu Desnoyers
@ 2010-10-25 12:58         ` Mathieu Desnoyers
  2010-10-25 20:29           ` Rafael J. Wysocki
  2010-10-25 20:29           ` Rafael J. Wysocki
  3 siblings, 2 replies; 72+ messages in thread
From: Mathieu Desnoyers @ 2010-10-25 12:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Renninger, Linus Torvalds, Andrew Morton, Thomas Gleixner,
	Masami Hiramatsu, Frank Eigler, Steven Rostedt, Kevin Hilman,
	Peter Zijlstra, linux-omap, rjw, linux-pm, linux-trace-users,
	Jean Pihet, Pierre Tardy, Frederic Weisbecker, Tejun Heo,
	Arjan van de Ven

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Thomas Renninger <trenn@suse.de> wrote:
> 
> > On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> > > 
> > > * Thomas Renninger <trenn@suse.de> wrote:
> > > 
> > > > New power trace events:
> > > > power:processor_idle
> > > > power:processor_frequency
> > > > power:machine_suspend
> > > > 
> > > > 
> > > > C-state/idle accounting events:
> > > >   power:power_start
> > > >   power:power_end
> > > > are replaced with:
> > > >   power:processor_idle
> > > 
> > > Well, most power saving hw models (and the code implementing them) have this kind of 
> > > model:
> > > 
> > >  enter power saving mode X
> > >  exit power saving mode
> > > 
> > > Where X is some sort of 'power saving deepness' attribute, right?
> >
> > Sure.
> 
> Which is is the 'saner' model?
> 
> > But ACPI and afaik this model got picked up for PCI and other (sub-)archs as well, 
> > defines state 0 as the non-power saving mode.
> 
> But the actual code does not actually deal with any 'state 0', does it? It enters an 
> idle function and then exits it, right?
> 
> 'power state' might be what is used for devices - but even there, we have:
> 
>   - enter power state X
>   - exit power state
> 
> right?
> 
> > Same as done here with machine suspend state (S0 is back from suspend) and
> > this model should get picked up when device sleep states get tracked at
> > some time.
> >
> > It's consistent and applies to some well known specifications.
> 
> What we want it to be is for it to be the nicest, most understandable, most logical 
> model - not one matching random hardware specifications.
> 
> ( Hardware specifications only matter in so far that it should be possible to 
>   express all the known hardware state transitions via these events efficiently. )
> 
> > Also tracking processor_idle_{start,end} as a separate event makes no sense and 
> > there is no need to introduce: processor_idle_start/processor_idle_end 
> > machine_suspend_start/machine_suspend_end 
> > device_power_mode_start/device_power_mode_end events.
> 
> What do you mean by "makes no sense"?
> 
> Are they superfluous? Inefficient? Illogical?

I think it would require deep understanding of specific power modes of each
architecture to split into this topology. On the bright side, it would bring
clear understanding of which HW resource is being put to sleep, which would make
automated analysis much easier to do. But maybe it's too much pain compared to
the benefit. The related question is also: where is it best to put this logic?
In the kernel code? In per-arch TRACE_EVENT() handlers or in external trace
analysis plugins?

> 
> > Using state 0 as "exit/end", is much nicer for kernel/ userspace 
> > implementations/code and the user.
> 
> By that argument we should not have separate fork() and exit() syscalls either, but 
> a set_process_state(1) and set_process_state(0) interface?

I'm by no means an expert on power saving hardware specs, but if it is possible for
hardware to switch between two power saving states without passing through power
state 0, then using a "set state" rather than an enter/exit would be more
appropriate; even if we go for a scheme introducing

processor_idle_start/processor_idle_end,
machine_suspend_start/machine_suspend_end,
device_power_mode_start/device_power_mode_end.

I must defer to you guys to figure out if some hardware actually does that for
either of CPU idle, suspend or device power modes.
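
To illustrate why a "set state" event copes with direct transitions
between two power saving states, here is a small sketch of per-state
residency accounting. The struct, NR_STATES and set_state() are
hypothetical names for this example, and the timestamps are arbitrary
units.

```c
#define NR_STATES 4	/* arbitrary number of states for this sketch */

struct residency {
	int cur_state;				/* state currently occupied */
	unsigned long long last_ts;		/* timestamp of the last transition */
	unsigned long long time_in[NR_STATES];	/* accumulated time per state */
};

/*
 * Account the time spent in the state being left, then switch.
 * Works for any pair of states, so a direct transition (for example
 * state 2 to state 3) needs no intermediate "exit" event.
 * Returns 0 on success, -1 if new_state is out of range.
 */
static int set_state(struct residency *r, int new_state,
		     unsigned long long ts)
{
	if (new_state < 0 || new_state >= NR_STATES)
		return -1;

	r->time_in[r->cur_state] += ts - r->last_ts;
	r->cur_state = new_state;
	r->last_ts = ts;
	return 0;
}
```

With an enter/exit pair the same accounting only stays correct if every
entry is preceded by an exit, which is the hardware question raised
above.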

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 11:55       ` Ingo Molnar
  2010-10-25 12:55         ` Thomas Renninger
@ 2010-10-25 12:55         ` Thomas Renninger
  2010-10-25 12:58         ` Mathieu Desnoyers
  2010-10-25 12:58         ` Mathieu Desnoyers
  3 siblings, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 12:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, linux-pm, Masami Hiramatsu,
	Tejun Heo, Andrew Morton, linux-omap, Linus Torvalds,
	Thomas Gleixner

On Monday 25 October 2010 13:55:25 Ingo Molnar wrote:
> 
> * Thomas Renninger <trenn@suse.de> wrote:
> 
> > On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> > > 
> > > * Thomas Renninger <trenn@suse.de> wrote:
> > > 
> > > > New power trace events:
> > > > power:processor_idle
> > > > power:processor_frequency
> > > > power:machine_suspend
> > > > 
> > > > 
> > > > C-state/idle accounting events:
> > > >   power:power_start
> > > >   power:power_end
> > > > are replaced with:
> > > >   power:processor_idle
> > > 
> > > Well, most power saving hw models (and the code implementing them) have this kind of 
> > > model:
> > > 
> > >  enter power saving mode X
> > >  exit power saving mode
> > > 
> > > Where X is some sort of 'power saving deepness' attribute, right?
> >
> > Sure.
> 
> Which is the 'saner' model?
> 
> > But ACPI and afaik this model got picked up for PCI and other (sub-)archs as well, 
> > defines state 0 as the non-power saving mode.
> 
> But the actual code does not actually deal with any 'state 0', does it?
It does. Not being idle is tracked by cpuidle driver as state 0
(arch independent):
/sys/devices/system/cpu/cpu0/cpuidle/state0/
halt/C1 on X86 is:
/sys/devices/system/cpu/cpu0/cpuidle/state1/
...

> It enters an idle function and then exits it, right?
> 'power state' might be what is used for devices - but even there, we have:
> 
>   - enter power state X
>   - exit power state
> 
> right?
That is not true for PCI, probably others as well.
There you have D0 (being the maximum powered state) up to D3.
Same for PCI Bus Power States (B0, B1, B2, and B3).

Look at drivers/pci/pci.c:pci_raw_set_power_state()
To "exit" a power state you call:
pci_raw_set_power_state(dev, PCI_D0);

Same for suspend. "Exit" suspend is:
#define PM_SUSPEND_ON           ((__force suspend_state_t) 0)
so on resume we enter suspend_state_t 0.

> > Same as done here with machine suspend state (S0 is back from suspend) and
> > this model should get picked up when device sleep states get tracked at
> > some time.
> >
> > It's consistent and applies to some well known specifications.
> 
> What we want is for it to be the nicest, most understandable, most logical 
> model - not one matching random hardware specifications.
> 
> ( Hardware specifications only matter in so far that it should be possible to 
>   express all the known hardware state transitions via these events efficiently. )
> 
> > Also tracking processor_idle_{start,end} as a separate event makes no sense and 
> > there is no need to introduce: processor_idle_start/processor_idle_end 
> > machine_suspend_start/machine_suspend_end 
> > device_power_mode_start/device_power_mode_end events.
> 
> What do you mean by "makes no sense"?
> 
> Are they superfluous?
Yes, you do not need two different events to track one thing.

> Illogical?
Yes. A user who wants to enable processor idle tracking
wants to enable it via:
echo power:processor_idle >/sys/kernel/debug/tracing/set_event
What do you intend to track with a:
power:power_start
event?

    Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 11:03     ` Thomas Renninger
  2010-10-25 11:55       ` Ingo Molnar
@ 2010-10-25 11:55       ` Ingo Molnar
  2010-10-25 13:58       ` Arjan van de Ven
  2010-10-25 13:58       ` Arjan van de Ven
  3 siblings, 0 replies; 72+ messages in thread
From: Ingo Molnar @ 2010-10-25 11:55 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, linux-pm, Masami Hiramatsu,
	Tejun Heo, Andrew Morton, linux-omap, Linus Torvalds,
	Thomas Gleixner


* Thomas Renninger <trenn@suse.de> wrote:

> On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> > 
> > * Thomas Renninger <trenn@suse.de> wrote:
> > 
> > > New power trace events:
> > > power:processor_idle
> > > power:processor_frequency
> > > power:machine_suspend
> > > 
> > > 
> > > C-state/idle accounting events:
> > >   power:power_start
> > >   power:power_end
> > > are replaced with:
> > >   power:processor_idle
> > 
> > Well, most power saving hw models (and the code implementing them) have this kind of 
> > model:
> > 
> >  enter power saving mode X
> >  exit power saving mode
> > 
> > Where X is some sort of 'power saving deepness' attribute, right?
>
> Sure.

Which is the 'saner' model?

> But ACPI and afaik this model got picked up for PCI and other (sub-)archs as well, 
> defines state 0 as the non-power saving mode.

But the actual code does not actually deal with any 'state 0', does it? It enters an 
idle function and then exits it, right?

'power state' might be what is used for devices - but even there, we have:

  - enter power state X
  - exit power state

right?

> Same as done here with machine suspend state (S0 is back from suspend) and
> this model should get picked up when device sleep states get tracked at
> some time.
>
> It's consistent and applies to some well known specifications.

What we want is for it to be the nicest, most understandable, most logical 
model - not one matching random hardware specifications.

( Hardware specifications only matter in so far that it should be possible to 
  express all the known hardware state transitions via these events efficiently. )

> Also tracking processor_idle_{start,end} as a separate event makes no sense and 
> there is no need to introduce: processor_idle_start/processor_idle_end 
> machine_suspend_start/machine_suspend_end 
> device_power_mode_start/device_power_mode_end events.

What do you mean by "makes no sense"?

Are they superfluous? Inefficient? Illogical?

> Using state 0 as "exit/end", is much nicer for kernel/ userspace 
> implementations/code and the user.

By that argument we should not have separate fork() and exit() syscalls either, but 
a set_process_state(1) and set_process_state(0) interface?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25 10:04   ` Ingo Molnar
@ 2010-10-25 11:03     ` Thomas Renninger
  2010-10-25 11:03     ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25 11:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, linux-pm, Masami Hiramatsu,
	Tejun Heo, Andrew Morton, linux-omap, Linus Torvalds,
	Thomas Gleixner

On Monday 25 October 2010 12:04:28 Ingo Molnar wrote:
> 
> * Thomas Renninger <trenn@suse.de> wrote:
> 
> > New power trace events:
> > power:processor_idle
> > power:processor_frequency
> > power:machine_suspend
> > 
> > 
> > C-state/idle accounting events:
> >   power:power_start
> >   power:power_end
> > are replaced with:
> >   power:processor_idle
> 
> Well, most power saving hw models (and the code implementing them) have this kind of 
> model:
> 
>  enter power saving mode X
>  exit power saving mode
> 
> Where X is some sort of 'power saving deepness' attribute, right?
Sure.
But ACPI defines state 0 as the non-power-saving mode, and afaik this model
got picked up for PCI and other (sub-)archs as well.
The same is done here with the machine suspend state (S0 is back from suspend),
and this model should get picked up when device sleep states get tracked at
some point.
It's consistent and matches some well-known specifications.

Also, tracking processor_idle_{start,end} as separate events
makes no sense, and there is no need to introduce:
processor_idle_start/processor_idle_end
machine_suspend_start/machine_suspend_end
device_power_mode_start/device_power_mode_end
events.
Using state 0 as "exit/end" is much nicer for kernel/userspace
implementations/code and the user.

     Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-19 11:36 ` Thomas Renninger
                     ` (3 preceding siblings ...)
  2010-10-25  6:58   ` Arjan van de Ven
@ 2010-10-25 10:04   ` Ingo Molnar
  2010-10-25 10:04   ` Ingo Molnar
  5 siblings, 0 replies; 72+ messages in thread
From: Ingo Molnar @ 2010-10-25 10:04 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, linux-pm, Masami Hiramatsu,
	Tejun Heo, Andrew Morton, linux-omap, Linus Torvalds,
	Thomas Gleixner


* Thomas Renninger <trenn@suse.de> wrote:

> New power trace events:
> power:processor_idle
> power:processor_frequency
> power:machine_suspend
> 
> 
> C-state/idle accounting events:
>   power:power_start
>   power:power_end
> are replaced with:
>   power:processor_idle

Well, most power saving hw models (and the code implementing them) have this kind of 
model:

 enter power saving mode X
 exit power saving mode

Where X is some sort of 'power saving deepness' attribute, right?

 	Ingo

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-25  6:54   ` Arjan van de Ven
  2010-10-25  9:41     ` Thomas Renninger
@ 2010-10-25  9:41     ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-25  9:41 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Mathieu Desnoyers,
	Ingo Molnar, linux-pm, Masami Hiramatsu, Tejun Heo,
	Andrew Morton, linux-omap, Linus Torvalds, Thomas Gleixner

On Monday 25 October 2010 08:54:34 Arjan van de Ven wrote:
> On 10/19/2010 4:36 AM, Thomas Renninger wrote:
> >   static void poll_idle(void)
> >   {
> > -	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
> >   	local_irq_enable();
> >   	while (!need_resched())
> >   		cpu_relax();
> > -	trace_power_end(0);
> >   }
> 
> why did you remove the idle tracepoints from this one ???
Because no idle/sleep state is entered here.
State 0 does not exist as a sleep state; rather, it means the machine is not idle.
The new event uses idle state 0, in conformance with the spec, as "exit sleep state".

If this should still be trackable, some kind of dummy sleep state:
#define IDLE_BUSY_LOOP 0xFE
(or similar) must get defined and passed like this:
trace_processor_idle(IDLE_BUSY_LOOP, smp_processor_id());
    cpu_relax()
trace_processor_idle(0, smp_processor_id());
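
The fragment above can be modelled in userspace roughly as follows. This is
only a sketch under stated assumptions: trace_processor_idle() here is a
stand-in that appends to an in-memory log rather than the real tracepoint,
poll_idle_model() and its spin budget are hypothetical, and 0xFE is just the
placeholder value suggested above.

```c
#include <stddef.h>

#define IDLE_BUSY_LOOP	0xFE	/* dummy state from the mail; value is arbitrary */
#define LOG_MAX		16	/* hypothetical log capacity */

struct idle_record {
	unsigned int state;
	unsigned int cpu;
};

static struct idle_record idle_log[LOG_MAX];
static size_t idle_log_len;

/* Stand-in for the kernel tracepoint: appends to an in-memory log. */
static void trace_processor_idle(unsigned int state, unsigned int cpu)
{
	if (idle_log_len < LOG_MAX) {
		idle_log[idle_log_len].state = state;
		idle_log[idle_log_len].cpu = cpu;
		idle_log_len++;
	}
}

/*
 * Model of a traced poll_idle(): the busy loop is bracketed by a
 * dummy "busy loop" state on entry and state 0 on exit. The counter
 * replaces the need_resched()/cpu_relax() loop so the model terminates.
 */
static unsigned long poll_idle_model(unsigned int cpu,
				     unsigned long spins_until_resched)
{
	unsigned long spins = 0;

	trace_processor_idle(IDLE_BUSY_LOOP, cpu);
	while (spins < spins_until_resched)
		spins++;	/* stands in for cpu_relax() */
	trace_processor_idle(0, cpu);

	return spins;
}
```

Because the entry event carries a nonzero value, a consumer can tell a polling
period apart from "not idle", while state 0 keeps its single meaning of
"exit sleep state".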

I could imagine this being somewhat worthwhile for comparing idle results
against the case where no idle state at all is used.
But nobody should ever use idle=poll; comparing deep sleep states
with C1 (idle=halt) should be sufficient?

    Thomas

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-19 11:36 ` Thomas Renninger
  2010-10-25  6:54   ` Arjan van de Ven
  2010-10-25  6:54   ` Arjan van de Ven
@ 2010-10-25  6:58   ` Arjan van de Ven
  2010-10-25  6:58   ` Arjan van de Ven
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25  6:58 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Mathieu Desnoyers,
	Ingo Molnar, linux-pm, Masami Hiramatsu, Tejun Heo,
	Andrew Morton, linux-omap, Linus Torvalds, Thomas Gleixner

On 10/19/2010 4:36 AM, Thomas Renninger wrote:
> New power trace events:
> power:processor_idle
> power:processor_frequency
> power:machine_suspend
>
>
> C-state/idle accounting events:
>    power:power_start
>    power:power_end
> are replaced with:
>    power:processor_idle
>
I think you need two trace points for this
one to enter idle
one to exit

because using magic encoding games to encode "exit" is a mistake, as can
be seen in this patch.
You're currently trying to use "0" to signal "end of idle", but "0" is 
also a valid idle state (namely that of polling)
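
To make the ambiguity concrete, here is a small userspace sketch in C (not
kernel code). It assumes a hypothetical consumer that sees only the state
values of a single combined idle event and treats any nonzero value as
"enter" and 0 as "exit"; count_idle_periods() is an illustrative name, not
something from the patch or from perf.

```c
#include <stddef.h>

/*
 * Count completed idle periods in a stream of state values, treating
 * any nonzero state as "enter idle" and 0 as "exit idle".
 */
static size_t count_idle_periods(const unsigned int *states, size_t n)
{
	size_t periods = 0;
	int idle = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (states[i] != 0) {
			idle = 1;
		} else if (idle) {
			idle = 0;
			periods++;
		}
		/*
		 * A 0 seen while not idle is dropped: it cannot be told
		 * apart from a redundant "exit", so a polling period
		 * entered with state 0 is invisible to this consumer.
		 */
	}

	return periods;
}
```

Fed the sequence 1, 0, 0, 0 (a C1 period followed by a polling period that
enters with state 0), this consumer reports one idle period instead of two.
Two separate events, or a distinct nonzero value for polling, would remove
the ambiguity.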

^ permalink raw reply	[flat|nested] 72+ messages in thread

* Re: [PATCH 2/3] PERF(kernel): Cleanup power events
  2010-10-19 11:36 ` Thomas Renninger
@ 2010-10-25  6:54   ` Arjan van de Ven
  2010-10-25  6:54   ` Arjan van de Ven
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 72+ messages in thread
From: Arjan van de Ven @ 2010-10-25  6:54 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: linux-trace-users, Frederic Weisbecker, Pierre Tardy, Jean Pihet,
	Steven Rostedt, Peter Zijlstra, Frank Eigler, Mathieu Desnoyers,
	Ingo Molnar, linux-pm, Masami Hiramatsu, Tejun Heo,
	Andrew Morton, linux-omap, Linus Torvalds, Thomas Gleixner

On 10/19/2010 4:36 AM, Thomas Renninger wrote:
>   static void poll_idle(void)
>   {
> -	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
>   	local_irq_enable();
>   	while (!need_resched())
>   		cpu_relax();
> -	trace_power_end(0);
>   }

why did you remove the idle tracepoints from this one ???

^ permalink raw reply	[flat|nested] 72+ messages in thread

* [PATCH 2/3] PERF(kernel): Cleanup power events
       [not found] <1287488171-25303-1-git-send-email-trenn@suse.de>
  2010-10-19 11:36 ` Thomas Renninger
@ 2010-10-19 11:36 ` Thomas Renninger
  1 sibling, 0 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-19 11:36 UTC (permalink / raw)
  To: trenn
  Cc: Arjan van de Ven, linux-trace-users, Frederic Weisbecker,
	Pierre Tardy, Jean Pihet, Steven Rostedt, Peter Zijlstra,
	Frank Eigler, Mathieu Desnoyers, Ingo Molnar, linux-pm,
	Masami Hiramatsu, Tejun Heo, Andrew Morton, linux-omap,
	Linus Torvalds, Thomas Gleixner

New power trace events:
power:processor_idle
power:processor_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:processor_idle

and
  power:power_frequency
is replaced with:
  power:processor_frequency

power:machine_suspend
is newly introduced. A first implementation
comes from the ARM side, but it is easy to add these events
on X86 as well if needed.

The type= field got removed from both; it was never
used, and the type is distinguished by the event itself.

perf timechart
userspace tool gets adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Frank Eigler <fche@redhat.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Kevin Hilman <khilman@deeprootsystems.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: linux-omap@vger.kernel.org
CC: rjw@sisk.pl
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Pierre Tardy <tardyp@gmail.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Tejun Heo <tj@kernel.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process.c    |    5 ++-
 arch/x86/kernel/process_64.c |    1 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   80 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   14 +++++++
 kernel/trace/power-traces.c  |    3 ++
 8 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..b6b1578 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_processor_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_processor_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_processor_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -480,11 +483,9 @@ static void mwait_idle(void)
  */
 static void poll_idle(void)
 {
-	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
 }
 
 /*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..2c3254c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,7 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_processor_idle(0, smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..33bdc41 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_processor_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..f79de04 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_processor_idle(0, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..c78e496 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_processor_idle((eax >> 4) + 1, smp_processor_id());
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..d5cecd9 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,60 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(processor,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u64,		state		)
+		__field(	u64,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(processor, processor_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	     TP_ARGS(state, cpu_id)
+);
+
+DEFINE_EVENT(processor, processor_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u64,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +123,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..0b5c841 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,20 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..6b6da42 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(processor_idle);
 
-- 
1.6.0.2

* [PATCH 2/3] PERF(kernel): Cleanup power events
       [not found] <1287488171-25303-1-git-send-email-trenn@suse.de>
@ 2010-10-19 11:36 ` Thomas Renninger
  2010-10-25  6:54   ` Arjan van de Ven
                     ` (5 more replies)
  2010-10-19 11:36 ` Thomas Renninger
  1 sibling, 6 replies; 72+ messages in thread
From: Thomas Renninger @ 2010-10-19 11:36 UTC (permalink / raw)
  To: trenn
  Cc: Linus Torvalds, Andrew Morton, Thomas Gleixner, Masami Hiramatsu,
	Frank Eigler, Steven Rostedt, Kevin Hilman, Peter Zijlstra,
	linux-omap, rjw, linux-pm, linux-trace-users, Jean Pihet,
	Pierre Tardy, Frederic Weisbecker, Tejun Heo, Mathieu Desnoyers,
	Arjan van de Ven, Ingo Molnar

New power trace events:
power:processor_idle
power:processor_frequency
power:machine_suspend


C-state/idle accounting events:
  power:power_start
  power:power_end
are replaced with:
  power:processor_idle

and
  power:power_frequency
is replaced with:
  power:processor_frequency

power:machine_suspend
is newly introduced. A first implementation
comes from the ARM side, but it is easy to add these events
on x86 as well if needed.

The type= field was removed from both new events; it was never
used, and the type is now distinguished by the event name itself.
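
As a rough illustration of what this means for a consumer of the new
events (a sketch only, not part of this patch): both events print
"state=%lu cpu_id=%lu" through the shared processor event class, and the
call sites in this patch emit processor_idle with state 0 where power_end
used to be called. The struct idle_sample type and the helpers
parse_processor_event() and is_idle_exit() are hypothetical names made up
for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type for this sketch; not part of the patch. */
struct idle_sample {
	unsigned long state;	/* 0 = CPU left idle, >0 = C-state entered */
	unsigned long cpu_id;
};

/*
 * Parse the text produced by the TP_printk() format of the "processor"
 * event class: "state=%lu cpu_id=%lu".
 * Returns 0 on success, -1 if the text does not match.
 */
static int parse_processor_event(const char *text, struct idle_sample *out)
{
	assert(text != NULL && out != NULL);
	if (sscanf(text, "state=%lu cpu_id=%lu", &out->state, &out->cpu_id) != 2)
		return -1;
	return 0;
}

/* With the new event, leaving idle is reported as state 0. */
static int is_idle_exit(const struct idle_sample *s)
{
	return s->state == 0;
}
```

A tool would call parse_processor_event() on the payload of each
power:processor_idle line and use is_idle_exit() where it previously
waited for a separate power:power_end event.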

The perf timechart
userspace tool is adjusted in a separate patch.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
CC: Frank Eigler <fche@redhat.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Kevin Hilman <khilman@deeprootsystems.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: linux-omap@vger.kernel.org
CC: rjw@sisk.pl
CC: linux-pm@lists.linux-foundation.org
CC: linux-trace-users@vger.kernel.org
CC: Jean Pihet <jean.pihet@newoldbits.com>
CC: Pierre Tardy <tardyp@gmail.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Tejun Heo <tj@kernel.org>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/process.c    |    5 ++-
 arch/x86/kernel/process_64.c |    1 +
 drivers/cpufreq/cpufreq.c    |    1 +
 drivers/cpuidle/cpuidle.c    |    1 +
 drivers/idle/intel_idle.c    |    1 +
 include/trace/events/power.h |   80 +++++++++++++++++++++++++++++++++++++++++-
 kernel/trace/Kconfig         |   14 +++++++
 kernel/trace/power-traces.c  |    3 ++
 8 files changed, 103 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 57d1868..b6b1578 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -374,6 +374,7 @@ void default_idle(void)
 {
 	if (hlt_use_halt()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_processor_idle(1, smp_processor_id());
 		current_thread_info()->status &= ~TS_POLLING;
 		/*
 		 * TS_POLLING-cleared state must be visible before we
@@ -444,6 +445,7 @@ EXPORT_SYMBOL_GPL(cpu_idle_wait);
 void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
 {
 	trace_power_start(POWER_CSTATE, (ax>>4)+1, smp_processor_id());
+	trace_processor_idle((ax>>4)+1, smp_processor_id());
 	if (!need_resched()) {
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
@@ -460,6 +462,7 @@ static void mwait_idle(void)
 {
 	if (!need_resched()) {
 		trace_power_start(POWER_CSTATE, 1, smp_processor_id());
+		trace_processor_idle(1, smp_processor_id());
 		if (cpu_has(&current_cpu_data, X86_FEATURE_CLFLUSH_MONITOR))
 			clflush((void *)&current_thread_info()->flags);
 
@@ -480,11 +483,9 @@ static void mwait_idle(void)
  */
 static void poll_idle(void)
 {
-	trace_power_start(POWER_CSTATE, 0, smp_processor_id());
 	local_irq_enable();
 	while (!need_resched())
 		cpu_relax();
-	trace_power_end(0);
 }
 
 /*
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 3d9ea53..2c3254c 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -142,6 +142,7 @@ void cpu_idle(void)
 			start_critical_timings();
 
 			trace_power_end(smp_processor_id());
+			trace_processor_idle(0, smp_processor_id());
 
 			/* In many cases the interrupt that ended idle
 			   has already called exit_idle. But some idle
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 199dcb9..33bdc41 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -355,6 +355,7 @@ void cpufreq_notify_transition(struct cpufreq_freqs *freqs, unsigned int state)
 		dprintk("FREQ: %lu - CPU: %lu", (unsigned long)freqs->new,
 			(unsigned long)freqs->cpu);
 		trace_power_frequency(POWER_PSTATE, freqs->new, freqs->cpu);
+		trace_processor_frequency(freqs->new, freqs->cpu);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a507108..f79de04 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -107,6 +107,7 @@ static void cpuidle_idle_call(void)
 	if (cpuidle_curr_governor->reflect)
 		cpuidle_curr_governor->reflect(dev);
 	trace_power_end(smp_processor_id());
+	trace_processor_idle(0, smp_processor_id());
 }
 
 /**
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index 21ac077..c78e496 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -202,6 +202,7 @@ static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state)
 
 	stop_critical_timings();
 	trace_power_start(POWER_CSTATE, (eax >> 4) + 1, cpu);
+	trace_processor_idle((eax >> 4) + 1, smp_processor_id());
 	if (!need_resched()) {
 
 		__monitor((void *)&current_thread_info()->flags, 0, 0);
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 35a2a6e..d5cecd9 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -7,6 +7,60 @@
 #include <linux/ktime.h>
 #include <linux/tracepoint.h>
 
+DECLARE_EVENT_CLASS(processor,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	TP_ARGS(state, cpu_id),
+
+	TP_STRUCT__entry(
+		__field(	u64,		state		)
+		__field(	u64,		cpu_id		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+		__entry->cpu_id = cpu_id;
+	),
+
+	TP_printk("state=%lu cpu_id=%lu", (unsigned long)__entry->state,
+		  (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(processor, processor_idle,
+
+	TP_PROTO(unsigned int state, unsigned int cpu_id),
+
+	     TP_ARGS(state, cpu_id)
+);
+
+DEFINE_EVENT(processor, processor_frequency,
+
+	TP_PROTO(unsigned int frequency, unsigned int cpu_id),
+
+	TP_ARGS(frequency, cpu_id)
+);
+
+TRACE_EVENT(machine_suspend,
+
+	TP_PROTO(unsigned int state),
+
+	TP_ARGS(state),
+
+	TP_STRUCT__entry(
+		__field(	u64,		state		)
+	),
+
+	TP_fast_assign(
+		__entry->state = state;
+	),
+
+	TP_printk("state=%lu", (unsigned long)__entry->state)
+
+);
+
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
 #ifndef _TRACE_POWER_ENUM_
 #define _TRACE_POWER_ENUM_
 enum {
@@ -69,8 +123,32 @@ TRACE_EVENT(power_end,
 	TP_printk("cpu_id=%lu", (unsigned long)__entry->cpu_id)
 
 );
-
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
 #endif /* _TRACE_POWER_H */
 
+/* Deprecated dummy functions must be protected against multi-declaration */
+#ifndef EVENT_POWER_TRACING_DEPRECATED_PART_H
+#define EVENT_POWER_TRACING_DEPRECATED_PART_H
+
+#ifndef CONFIG_EVENT_POWER_TRACING_DEPRECATED
+
+#ifndef _TRACE_POWER_ENUM_
+#define _TRACE_POWER_ENUM_
+enum {
+	POWER_NONE = 0,
+	POWER_CSTATE = 1,
+	POWER_PSTATE = 2,
+};
+#endif
+
+static inline void trace_power_start(u64 type, u64 state, u64 cpuid) {};
+static inline void trace_power_end(u64 cpuid) {};
+static inline void trace_power_frequency(u64 type, u64 state, u64 cpuid) {};
+#endif /* CONFIG_EVENT_POWER_TRACING_DEPRECATED */
+
+#endif /* EVENT_POWER_TRACING_DEPRECATED_PART_H */
+
+
+
 /* This part must be outside protection */
 #include <trace/define_trace.h>
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 538501c..0b5c841 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -64,6 +64,20 @@ config EVENT_TRACING
 	select CONTEXT_SWITCH_TRACER
 	bool
 
+config EVENT_POWER_TRACING_DEPRECATED
+	depends on EVENT_TRACING
+	bool
+	help
+	  Provides old power event types:
+	  C-state/idle accounting events:
+	  power:power_start
+	  power:power_end
+	  and old cpufreq accounting event:
+	  power:power_frequency
+	  This is for userspace compatibility
+	  and will vanish after 5 kernel iterations,
+	  namely 2.6.41.
+
 config CONTEXT_SWITCH_TRACER
 	bool
 
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 0e0497d..6b6da42 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,5 +13,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
 
+#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
 EXPORT_TRACEPOINT_SYMBOL_GPL(power_start);
+#endif
+EXPORT_TRACEPOINT_SYMBOL_GPL(processor_idle);
 
-- 
1.6.0.2
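
The compatibility scheme in include/trace/events/power.h can be modelled
in plain userspace C roughly as follows. This is a simplified sketch under
assumptions, not the kernel code: the counters, MAX_CPUS and leave_idle()
are invented for illustration, and the real tracepoints are generated by
TRACE_EVENT() rather than written by hand. With
CONFIG_EVENT_POWER_TRACING_DEPRECATED undefined, the old call compiles to
an empty inline stub, so call sites can keep both calls unchanged.

```c
#include <assert.h>

#define MAX_CPUS 4	/* hypothetical limit for this sketch */

/* Uncomment to model a kernel built with the deprecated events enabled. */
/* #define CONFIG_EVENT_POWER_TRACING_DEPRECATED 1 */

static unsigned long old_events[MAX_CPUS];	/* deprecated power_end hits */
static unsigned long new_events[MAX_CPUS];	/* processor_idle hits */

#ifdef CONFIG_EVENT_POWER_TRACING_DEPRECATED
static inline void trace_power_end(unsigned int cpu)
{
	old_events[cpu]++;
}
#else
/* Empty stub: callers still compile, but nothing is recorded. */
static inline void trace_power_end(unsigned int cpu)
{
	(void)cpu;
}
#endif

static inline void trace_processor_idle(unsigned int state, unsigned int cpu)
{
	(void)state;
	new_events[cpu]++;
}

/* Mirrors the idle-exit call sites: old call first, then the new one. */
static void leave_idle(unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	trace_power_end(cpu);
	trace_processor_idle(0, cpu);
}
```

Built as shown, leave_idle() only bumps new_events[]; defining the macro
at the top makes the same call sites record the old event as well.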


end of thread, other threads:[~2010-11-19  0:14 UTC | newest]

Thread overview: 72+ messages
2010-10-28  9:02 Cleanup and enhance power trace events Thomas Renninger
2010-10-28  9:02 ` [PATCH 1/3] PERF: Do not export power_frequency, but power_start event Thomas Renninger
2010-10-28  9:02 ` Thomas Renninger
2010-10-28  9:02 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
2010-10-28 11:17   ` Rafael J. Wysocki
2010-10-28 11:17   ` Rafael J. Wysocki
2010-10-28 11:31     ` Rafael J. Wysocki
2010-10-28 11:31     ` [linux-pm] " Rafael J. Wysocki
2010-10-28 11:37       ` Thomas Renninger
2010-10-28 11:37       ` Thomas Renninger
2010-10-28  9:02 ` Thomas Renninger
2010-10-28  9:02 ` [PATCH 3/3] PERF(userspace): Adjust perf timechart to the new " Thomas Renninger
2010-10-28  9:02 ` Thomas Renninger
2010-10-28  9:19   ` Thomas Renninger
2010-10-28  9:19   ` Thomas Renninger
  -- strict thread matches above, loose matches on Subject: below --
2010-11-18 13:01 Power trace event cleanup by still providing old interface for some time Thomas Renninger
2010-11-18 13:01 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
2010-11-11 18:03 [RESEND] Power trace event cleanup by still providing old interface for some time Thomas Renninger
2010-11-11 18:03 ` [PATCH 2/3] PERF(kernel): Cleanup power events Thomas Renninger
2010-11-12 14:20   ` Jean Pihet
2010-11-12 18:17     ` Thomas Renninger
2010-11-12 21:50       ` Jean Pihet
2010-11-14 13:34         ` Thomas Renninger
2010-11-18  8:01           ` Ingo Molnar
2010-11-18  9:27             ` Thomas Renninger
2010-11-18  9:36               ` Ingo Molnar
2010-11-18  9:44                 ` Jean Pihet
2010-11-18 10:52                 ` Ingo Molnar
2010-11-18 16:34                   ` Jean Pihet
2010-11-19  0:14                     ` Thomas Renninger
2010-11-14 13:22   ` Thomas Renninger
2010-11-15 15:49     ` Jean Pihet
     [not found] <1287488171-25303-1-git-send-email-trenn@suse.de>
2010-10-19 11:36 ` Thomas Renninger
2010-10-25  6:54   ` Arjan van de Ven
2010-10-25  6:54   ` Arjan van de Ven
2010-10-25  9:41     ` Thomas Renninger
2010-10-25 13:55       ` Arjan van de Ven
2010-10-25 13:55       ` Arjan van de Ven
2010-10-25 14:36         ` Thomas Renninger
2010-10-25 14:45           ` Arjan van de Ven
2010-10-25 14:45           ` Arjan van de Ven
2010-10-25 14:56             ` Ingo Molnar
2010-10-25 14:56             ` Ingo Molnar
2010-10-25 15:48               ` Thomas Renninger
2010-10-25 16:00                 ` Arjan van de Ven
2010-10-25 23:32                   ` Thomas Renninger
2010-10-25 23:32                   ` Thomas Renninger
2010-10-25 16:00                 ` Arjan van de Ven
2010-10-25 15:48               ` Thomas Renninger
2010-10-25 14:36         ` Thomas Renninger
2010-10-25  9:41     ` Thomas Renninger
2010-10-25  6:58   ` Arjan van de Ven
2010-10-25  6:58   ` Arjan van de Ven
2010-10-25 10:04   ` Ingo Molnar
2010-10-25 10:04   ` Ingo Molnar
2010-10-25 11:03     ` Thomas Renninger
2010-10-25 11:03     ` Thomas Renninger
2010-10-25 11:55       ` Ingo Molnar
2010-10-25 12:55         ` Thomas Renninger
2010-10-25 14:11           ` Arjan van de Ven
2010-10-25 14:11           ` Arjan van de Ven
2010-10-25 14:51             ` Thomas Renninger
2010-10-25 14:51             ` Thomas Renninger
2010-10-25 12:55         ` Thomas Renninger
2010-10-25 12:58         ` Mathieu Desnoyers
2010-10-25 12:58         ` Mathieu Desnoyers
2010-10-25 20:29           ` Rafael J. Wysocki
2010-10-25 20:29           ` Rafael J. Wysocki
2010-10-25 11:55       ` Ingo Molnar
2010-10-25 13:58       ` Arjan van de Ven
2010-10-25 13:58       ` Arjan van de Ven
2010-10-25 20:33         ` Rafael J. Wysocki
2010-10-25 20:33         ` Rafael J. Wysocki
2010-10-19 11:36 ` Thomas Renninger
