All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
@ 2010-03-12 13:17 Robert Schöne
  2010-03-12 14:52 ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-12 13:17 UTC (permalink / raw)
  To: Dave Jones
  Cc: Thomas Gleixner, Ingo Molnar, Arjan van de Ven, linux-kernel,
	cpufreq, x86

This patch fixes the following behaviour:
Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
It should be reported for the cpu that actually changes its frequency.

Example: when using 
 taskset -c 0 echo <new_frequency> > /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
cpu 0 is traced, instead of cpu 1

Signed of by Robert Schoene <robert.schoene@tu-dresden.de>


diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..0a47f10 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
 
        switch (cmd->type) {
        case SYSTEM_INTEL_MSR_CAPABLE:
+               trace_power_frequency(POWER_PSTATE, cmd->val);
                rdmsr(cmd->addr.msr.reg, lo, hi);
                lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
                wrmsr(cmd->addr.msr.reg, lo, hi);
@@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
                }
        }
 
-       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
 
        switch (data->cpu_feature) {
        case SYSTEM_INTEL_MSR_CAPABLE:



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-12 13:17 [PATCH] trace power_frequency events on the correct cpu (for Intel x86 CPUs) Robert Schöne
@ 2010-03-12 14:52 ` Arjan van de Ven
  2010-03-12 15:41   ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-12 14:52 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Dave Jones, Thomas Gleixner, Ingo Molnar, linux-kernel, cpufreq, x86

On 3/12/2010 5:17, Robert Schöne wrote:
> This patch fixes the following behaviour:
> Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
> It should be reported for the cpu that actually changes its frequency.
>
> Example: when using
>   taskset -c 0 echo<new_frequency>  >  /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
> cpu 0 is traced, instead of cpu 1
>
> Signed of by Robert Schoene<robert.schoene@tu-dresden.de>
>
>
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 1b1920f..0a47f10 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
>
>          switch (cmd->type) {
>          case SYSTEM_INTEL_MSR_CAPABLE:
> +               trace_power_frequency(POWER_PSTATE, cmd->val);
>                  rdmsr(cmd->addr.msr.reg, lo, hi);
>                  lo = (lo&  ~INTEL_MSR_RANGE) | (cmd->val&  INTEL_MSR_RANGE);
>                  wrmsr(cmd->addr.msr.reg, lo, hi);
> @@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
>                  }
>          }
>
> -       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
>
>          switch (data->cpu_feature) {
>          case SYSTEM_INTEL_MSR_CAPABLE:
>
>

are you sure this is right?
it's moving something from outside a switch statement to inside only one prong of a switch statement...


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-12 14:52 ` Arjan van de Ven
@ 2010-03-12 15:41   ` Robert Schöne
  2010-03-15 10:51     ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-12 15:41 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Dave Jones, Thomas Gleixner, Ingo Molnar, linux-kernel, cpufreq, x86

Am Freitag, den 12.03.2010, 06:52 -0800 schrieb Arjan van de Ven:
> On 3/12/2010 5:17, Robert Schöne wrote:
> > This patch fixes the following behaviour:
> > Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
> > It should be reported for the cpu that actually changes its frequency.
> >
> > Example: when using
> >   taskset -c 0 echo<new_frequency>  >  /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
> > cpu 0 is traced, instead of cpu 1
> >
> > Signed of by Robert Schoene<robert.schoene@tu-dresden.de>
> >
> >
> > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > index 1b1920f..0a47f10 100644
> > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > @@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
> >
> >          switch (cmd->type) {
> >          case SYSTEM_INTEL_MSR_CAPABLE:
> > +               trace_power_frequency(POWER_PSTATE, cmd->val);
> >                  rdmsr(cmd->addr.msr.reg, lo, hi);
> >                  lo = (lo&  ~INTEL_MSR_RANGE) | (cmd->val&  INTEL_MSR_RANGE);
> >                  wrmsr(cmd->addr.msr.reg, lo, hi);
> > @@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> >                  }
> >          }
> >
> > -       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
> >
> >          switch (data->cpu_feature) {
> >          case SYSTEM_INTEL_MSR_CAPABLE:
> >
> >
> 
> are you sure this is right?
> it's moving something from outside a switch statement to inside only one prong of a switch statement...

I'm pretty sure, since I'm moving it from function acpi_cpufreq_target(...) to do_drv_write(...)

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden

Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert.Schoene@tu-dresden.de


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-12 15:41   ` Robert Schöne
@ 2010-03-15 10:51     ` Thomas Renninger
  2010-03-16  7:13       ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-15 10:51 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Friday 12 March 2010 16:41:46 Robert Schöne wrote:
> Am Freitag, den 12.03.2010, 06:52 -0800 schrieb Arjan van de Ven:
> > On 3/12/2010 5:17, Robert Schöne wrote:
> > > This patch fixes the following behaviour:
> > > Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
> > > It should be reported for the cpu that actually changes its frequency.
> > >
> > > Example: when using
> > >   taskset -c 0 echo<new_frequency>  >  /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
> > > cpu 0 is traced, instead of cpu 1
> > >
> > > Signed of by Robert Schoene<robert.schoene@tu-dresden.de>
> > >
> > >
> > > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > index 1b1920f..0a47f10 100644
> > > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > @@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
> > >
> > >          switch (cmd->type) {
> > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > > +               trace_power_frequency(POWER_PSTATE, cmd->val);
> > >                  rdmsr(cmd->addr.msr.reg, lo, hi);
> > >                  lo = (lo&  ~INTEL_MSR_RANGE) | (cmd->val&  INTEL_MSR_RANGE);
> > >                  wrmsr(cmd->addr.msr.reg, lo, hi);
> > > @@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> > >                  }
> > >          }
> > >
> > > -       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
> > >
> > >          switch (data->cpu_feature) {
> > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > >
> > >
> > 
> > are you sure this is right?
> > it's moving something from outside a switch statement to inside only one prong of a switch statement...
> 
> I'm pretty sure, since I'm moving it from function acpi_cpufreq_target(...) to do_drv_write(...)
What exactly is the argument you are pretty sure this is correct?

I expect Arjan is right.
You now only trace MSR based and not IO based frequency switching.

I don't know the tracing stuff, but it seems the cpu that executes
trace_power_frequency shows up in the statistics as the one on which the
frequency change happened which currently is wrong and you try to fix this?

What exactly is the reason you do not add
trace_power_frequency(..);
also in the
SYSTEM_IO_CAPABLE:
branch in do_drv_write()?

    Thomas



   Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-15 10:51     ` Thomas Renninger
@ 2010-03-16  7:13       ` Robert Schöne
  2010-03-16  9:57         ` Ingo Molnar
  2010-03-16 10:59         ` Thomas Renninger
  0 siblings, 2 replies; 26+ messages in thread
From: Robert Schöne @ 2010-03-16  7:13 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Montag, den 15.03.2010, 11:51 +0100 schrieb Thomas Renninger:
> On Friday 12 March 2010 16:41:46 Robert Schöne wrote:
> > Am Freitag, den 12.03.2010, 06:52 -0800 schrieb Arjan van de Ven:
> > > On 3/12/2010 5:17, Robert Schöne wrote:
> > > > This patch fixes the following behaviour:
> > > > Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
> > > > It should be reported for the cpu that actually changes its frequency.
> > > >
> > > > Example: when using
> > > >   taskset -c 0 echo<new_frequency>  >  /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
> > > > cpu 0 is traced, instead of cpu 1
> > > >
> > > > Signed of by Robert Schoene<robert.schoene@tu-dresden.de>
> > > >
> > > >
> > > > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > index 1b1920f..0a47f10 100644
> > > > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > @@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
> > > >
> > > >          switch (cmd->type) {
> > > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > > > +               trace_power_frequency(POWER_PSTATE, cmd->val);
> > > >                  rdmsr(cmd->addr.msr.reg, lo, hi);
> > > >                  lo = (lo&  ~INTEL_MSR_RANGE) | (cmd->val&  INTEL_MSR_RANGE);
> > > >                  wrmsr(cmd->addr.msr.reg, lo, hi);
> > > > @@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> > > >                  }
> > > >          }
> > > >
> > > > -       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
> > > >
> > > >          switch (data->cpu_feature) {
> > > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > > >
> > > >
> > > 
> > > are you sure this is right?
> > > it's moving something from outside a switch statement to inside only one prong of a switch statement...
You are right, it should be in all cases, which execute a frequency change.
> > 
> > I'm pretty sure, since I'm moving it from function acpi_cpufreq_target(...) to do_drv_write(...)
> What exactly is the argument you are pretty sure this is correct?
> 
> I expect Arjan is right.
> You now only trace MSR based and not IO based frequency switching.
> 
> I don't know the tracing stuff, but it seems the cpu that executes
> trace_power_frequency shows up in the statistics as the one on which the
> frequency change happened which currently is wrong and you try to fix this?
Yes
> 
> What exactly is the reason you do not add
> trace_power_frequency(..);
> also in the
> SYSTEM_IO_CAPABLE:
> branch in do_drv_write()?
I don't know system io capable systems and what they are doing, so I ignored it to prevent reporting wrong "frequencies". 
> 
>     Thomas
> 
> 
> 
>    Thomas
I stand corrected and appended the new patch (with an additional trace command for io capable systems)
Robert


diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..4803883 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -174,11 +174,13 @@ static void do_drv_write(void *_cmd)
 
        switch (cmd->type) {
        case SYSTEM_INTEL_MSR_CAPABLE:
+               trace_power_frequency(POWER_PSTATE, cmd->val);
                rdmsr(cmd->addr.msr.reg, lo, hi);
                lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
                wrmsr(cmd->addr.msr.reg, lo, hi);
                break;
        case SYSTEM_IO_CAPABLE:
+               trace_power_frequency(POWER_PSTATE, cmd->val);
                acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
                                cmd->val,
                                (u32)cmd->addr.io.bit_width);
@@ -363,7 +365,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
                }
        }
 
-       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
 
        switch (data->cpu_feature) {
        case SYSTEM_INTEL_MSR_CAPABLE:



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-16  7:13       ` Robert Schöne
@ 2010-03-16  9:57         ` Ingo Molnar
  2010-03-16 10:59         ` Thomas Renninger
  1 sibling, 0 replies; 26+ messages in thread
From: Ingo Molnar @ 2010-03-16  9:57 UTC (permalink / raw)
  To: Robert Sch?ne
  Cc: Thomas Renninger, Arjan van de Ven, Dave Jones, Thomas Gleixner,
	Ingo Molnar, linux-kernel, cpufreq, x86


* Robert Sch?ne <robert.schoene@tu-dresden.de> wrote:

> Am Montag, den 15.03.2010, 11:51 +0100 schrieb Thomas Renninger:
> > On Friday 12 March 2010 16:41:46 Robert Sch??ne wrote:
> > > Am Freitag, den 12.03.2010, 06:52 -0800 schrieb Arjan van de Ven:
> > > > On 3/12/2010 5:17, Robert Sch??ne wrote:
> > > > > This patch fixes the following behaviour:
> > > > > Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
> > > > > It should be reported for the cpu that actually changes its frequency.
> > > > >
> > > > > Example: when using
> > > > >   taskset -c 0 echo<new_frequency>  >  /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
> > > > > cpu 0 is traced, instead of cpu 1
> > > > >
> > > > > Signed of by Robert Schoene<robert.schoene@tu-dresden.de>
> > > > >
> > > > >
> > > > > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > > index 1b1920f..0a47f10 100644
> > > > > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > > @@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
> > > > >
> > > > >          switch (cmd->type) {
> > > > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > > > > +               trace_power_frequency(POWER_PSTATE, cmd->val);
> > > > >                  rdmsr(cmd->addr.msr.reg, lo, hi);
> > > > >                  lo = (lo&  ~INTEL_MSR_RANGE) | (cmd->val&  INTEL_MSR_RANGE);
> > > > >                  wrmsr(cmd->addr.msr.reg, lo, hi);
> > > > > @@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> > > > >                  }
> > > > >          }
> > > > >
> > > > > -       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
> > > > >
> > > > >          switch (data->cpu_feature) {
> > > > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > > > >
> > > > >
> > > > 
> > > > are you sure this is right?
> > > > it's moving something from outside a switch statement to inside only one prong of a switch statement...
> You are right, it should be in all cases, which execute a frequency change.
> > > 
> > > I'm pretty sure, since I'm moving it from function acpi_cpufreq_target(...) to do_drv_write(...)
> > What exactly is the argument you are pretty sure this is correct?
> > 
> > I expect Arjan is right.
> > You now only trace MSR based and not IO based frequency switching.
> > 
> > I don't know the tracing stuff, but it seems the cpu that executes
> > trace_power_frequency shows up in the statistics as the one on which the
> > frequency change happened which currently is wrong and you try to fix this?
> Yes
> > 
> > What exactly is the reason you do not add
> > trace_power_frequency(..);
> > also in the
> > SYSTEM_IO_CAPABLE:
> > branch in do_drv_write()?
> I don't know system io capable systems and what they are doing, so I ignored it to prevent reporting wrong "frequencies". 
> > 
> >     Thomas
> > 
> > 
> > 
> >    Thomas
>
> I stand corrected and appended the new patch (with an additional trace 
> command for io capable systems) Robert

Please send a changelogged version with everyone Cc:-ed once the dust settles 
and the acks are in.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-16  7:13       ` Robert Schöne
  2010-03-16  9:57         ` Ingo Molnar
@ 2010-03-16 10:59         ` Thomas Renninger
  2010-03-16 14:19           ` Arjan van de Ven
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-16 10:59 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Tuesday 16 March 2010 08:13:48 Robert Schöne wrote:
> Am Montag, den 15.03.2010, 11:51 +0100 schrieb Thomas Renninger:
> > On Friday 12 March 2010 16:41:46 Robert Schöne wrote:
> > > Am Freitag, den 12.03.2010, 06:52 -0800 schrieb Arjan van de Ven:
> > > > On 3/12/2010 5:17, Robert Schöne wrote:
> > > > > This patch fixes the following behaviour:
> > > > > Currently, the power_frequency event is reported for the cpu (core) which initiated the frequency change.
> > > > > It should be reported for the cpu that actually changes its frequency.
> > > > >
> > > > > Example: when using
> > > > >   taskset -c 0 echo<new_frequency>  >  /sys/devices/system/cpu/cpu1/cpufreq/scaling_setspeed
> > > > > cpu 0 is traced, instead of cpu 1
> > > > >
> > > > > Signed of by Robert Schoene<robert.schoene@tu-dresden.de>
> > > > >
> > > > >
> > > > > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > > index 1b1920f..0a47f10 100644
> > > > > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > > > > @@ -174,6 +174,7 @@ static void do_drv_write(void *_cmd)
> > > > >
> > > > >          switch (cmd->type) {
> > > > >          case SYSTEM_INTEL_MSR_CAPABLE:
> > > > > +               trace_power_frequency(POWER_PSTATE, cmd->val);
> > > > >                  rdmsr(cmd->addr.msr.reg, lo, hi);
> > > > >                  lo = (lo&  ~INTEL_MSR_RANGE) | (cmd->val&  INTEL_MSR_RANGE);
> > > > >                  wrmsr(cmd->addr.msr.reg, lo, hi);
> > > > > @@ -363,7 +364,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> > > > >                  }
> > > > >          }
> > > > >
> > > > > -       trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
This is still wrong:
Before the frequency:
   data->freq_table[next_state].frequency
now the control field is traced. This is an arbitrary value which must be
written to the HW (IO or MSR), it's pure luck that in MSR case it seem to
be identical to the frequency (on this HW), but this needs not to be
the case.
   cmd.val = (u32) perf->states[next_perf_state].control


But something else...:
What exactly is the power tracer good for and what is it
capable of which cpufreq_stats is not capable to do?

Beside the fact that it is an ugly macro you cannot grep for,
acpi-cpufreq really seem to be the only place it gets used in
the whole kernel:
grep trace_power_frequency * -rl
arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c

Robert: If you want to get proper cpufreq tracing/statistics,
compile with:
CONFIG_CPU_FREQ_STAT=y
and do:
modprobe cpufreq_stats
cat /sys/devices/system/cpu/cpu*/cpufreq/stats/*

Below patch fixes the problem.
This time submitted on the right mailing list,
it looks like the trace_power_frequency stuff never hit
the cpufreq list, even the maintainer wasn't CC'ed on
any trace_power_frequency submission.

For the trace people: To do it right, you have to hook
your trace function into cpufreq_stats. You also have
to pass the cpu on which the frequency change happened.

---
cpufreq: Remove broken trace_power_frequency

cpufreq_stats is used for frequency statistics and supports *all*
frequency switching drivers/HW.

The trace_power_frequency interface:
  - only supports one cpufreq driver (acpi-cpufreq)
  - has no additional capabilities compared to cpufreq_stats
  - is broken and traces wrong CPUs on frequency switches
    (cmp. with mail thread:
    trace power_frequency events on the correct cpu
    on the cpufreq@vger.kernel.org list)

Signed-off-by: Thomas Renninger <trenn@suse.de>

diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..1808284 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -33,7 +33,6 @@
 #include <linux/cpufreq.h>
 #include <linux/compiler.h>
 #include <linux/dmi.h>
-#include <trace/events/power.h>
 
 #include <linux/acpi.h>
 #include <linux/io.h>
@@ -363,8 +362,6 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
 		}
 	}
 
-	trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
-
 	switch (data->cpu_feature) {
 	case SYSTEM_INTEL_MSR_CAPABLE:
 		cmd.type = SYSTEM_INTEL_MSR_CAPABLE;
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index c4efe9b..82b2b99 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -42,13 +42,6 @@ DEFINE_EVENT(power, power_start,
 	TP_ARGS(type, state)
 );
 
-DEFINE_EVENT(power, power_frequency,
-
-	TP_PROTO(unsigned int type, unsigned int state),
-
-	TP_ARGS(type, state)
-);
-
 TRACE_EVENT(power_end,
 
 	TP_PROTO(int dummy),
diff --git a/kernel/trace/power-traces.c b/kernel/trace/power-traces.c
index 9f4f565..705d926 100644
--- a/kernel/trace/power-traces.c
+++ b/kernel/trace/power-traces.c
@@ -13,6 +13,3 @@
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/power.h>
-
-EXPORT_TRACEPOINT_SYMBOL_GPL(power_frequency);
-

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-16 10:59         ` Thomas Renninger
@ 2010-03-16 14:19           ` Arjan van de Ven
  2010-03-16 14:50             ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-16 14:19 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86


> But something else...:
> What exactly is the power tracer good for and what is it
> capable of which cpufreq_stats is not capable to do?

look at timechart for example.....
it's extremely useful to have this for us that do power tuning...
cpufreq_stats is nice but not nearly good enough since you only get averages,
not time behavior.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-16 14:19           ` Arjan van de Ven
@ 2010-03-16 14:50             ` Thomas Renninger
  2010-03-16 16:40               ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-16 14:50 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Tuesday 16 March 2010 15:19:13 Arjan van de Ven wrote:
> 
> > But something else...:
> > What exactly is the power tracer good for and what is it
> > capable of which cpufreq_stats is not capable to do?
> 
> look at timechart for example.....
> it's extremely useful to have this for us that do power tuning...
Currently even only a small subset of x86 ia supported, I can't see how
useful this is.

> cpufreq_stats is nice but not nearly good enough since you only get
> averages, not time behavior.
As said, try to hook it into cpufreq_stats.
Mark the cpufreq_stats sysfs interface deprecated, etc.

I like the idea of having one central trace utility, which probably
makes it easier for people to find such things.
But please do it properly and CC the cpufreq list in the future.

Still, as this is totally broken:
   - by design -> only one of a dozen cpufreq drivers is supported
   - by implementation -> wrong CPUs are tracked
please submit my patch and remove this again until a proper 
implementation is provided.

Thanks,

    Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-16 14:50             ` Thomas Renninger
@ 2010-03-16 16:40               ` Arjan van de Ven
  2010-03-17 16:36                 ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-16 16:40 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On 3/16/2010 7:50, Thomas Renninger wrote:
> Still, as this is totally broken:
>     - by design ->  only one of a dozen cpufreq drivers is supported

the one I care about is supported. The others... anyone can add support for their favorite driver.


> please submit my patch and remove this again until a proper
> implementation is provided.


how about not. it works right now, and is in active use right now.
feel free to add support to other drivers if you care about those....

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-16 16:40               ` Arjan van de Ven
@ 2010-03-17 16:36                 ` Thomas Renninger
  2010-03-17 16:41                   ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-17 16:36 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Tuesday 16 March 2010 17:40:18 Arjan van de Ven wrote:
> On 3/16/2010 7:50, Thomas Renninger wrote:
> > Still, as this is totally broken:
> >     - by design ->  only one of a dozen cpufreq drivers is supported
> 
> the one I care about is supported.
And that's the problem, before it's not removed, you do not care to
provide/suggest a proper solution that could fit others as well.

> The others... anyone can add
> support for their favorite driver.
Or everyone can write his own debug facility for every driver...

As said, there already is a debug facility. If you need timings,
improve the existing interface, suggest how it could get connected
to the tracing facility or why you see problems doing so.

> > please submit my patch and remove this again until a proper
> > implementation is provided.

> how about not. it works right now,
It does not. Robert's fix (if he had taken the frequency) is correct.
That means you could get totally wrong values
when you put some load on the machine and the scheduler moves around
processes. You'd better compared your results with cpufreq_stats...
> and is in active use right now.
It's obviously not powertop, there I get some statistics on an AMD
machine as well. So where does it get used?

> feel free to add support to other drivers if you care about those....
It's easy to fix up acpi-cpufreq itself and implement this into
powernow-k8. Not sure about possible cpufreq drivers which can switch
frequency on cores without executing the code on this core. I expect
trace(_power_frequency) is not made for such?

I assume everybody who did work on cpufreq interfaces would have given
you a not-ack'ed or at least had asked the questions I did above.
People know you well and x86 maintainers just pushed it, I also doubt
you intentionally did not CC the cpufreq list.
But now things are broken and should get reverted.

Grmpfl, I can have a look at it, but not the next couple of days...

    Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-17 16:36                 ` Thomas Renninger
@ 2010-03-17 16:41                   ` Arjan van de Ven
  2010-03-18 20:43                     ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-17 16:41 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On 3/17/2010 9:36, Thomas Renninger wrote:
> On Tuesday 16 March 2010 17:40:18 Arjan van de Ven wrote:
>> On 3/16/2010 7:50, Thomas Renninger wrote:
>>> Still, as this is totally broken:
>>>      - by design ->   only one of a dozen cpufreq drivers is supported
>>
>> the one I care about is supported.
> And that's the problem, before it's not removed, you do not care to
> provide/suggest a proper solution that could fit others as well.

why don't you provide the others then?


>> how about not. it works right now,
> It does not. Robert's fix (if he had taken the frequency) is correct.
> That means you could get totally wrong values
> when you put some load on the machine and the scheduler moves around
> processes.

the only case where the bug can hit is the userspace governor, yes.
it works correct for everything else. Yes it wants fixing. No it does not make
all data from it worthless.


>> and is in active use right now.
> It's obviously not powertop, there I get some statistics on an AMD
> machine as well. So where does it get used?

as I wrote in my previous mails... timechart.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-17 16:41                   ` Arjan van de Ven
@ 2010-03-18 20:43                     ` Thomas Renninger
  2010-03-19  8:01                       ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-18 20:43 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Wednesday 17 March 2010 05:41:38 pm Arjan van de Ven wrote:
> On 3/17/2010 9:36, Thomas Renninger wrote:
> > On Tuesday 16 March 2010 17:40:18 Arjan van de Ven wrote:
> >> On 3/16/2010 7:50, Thomas Renninger wrote:
> >>> Still, as this is totally broken:
> >>>      - by design ->   only one of a dozen cpufreq drivers is supported
> >>
> >> the one I care about is supported.
> >
> > And that's the problem, before it's not removed, you do not care to
> > provide/suggest a proper solution that could fit others as well.
>
> why don't you provide the others then?
Is it possible somehow to provide a kind of wrapper/backup function:
trace_power_frequency_cpu(POWER_PSTATE, frequency, cpu);
or another trace interface with eventually some more overhead?
Then it should be easy to support all drivers.
Otherwise you'll for example miss the pcc driver which supports
latest/upcoming Intel CPUs and which does IO based switching which needs not
to run on the cpu that gets switched.

     Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-18 20:43                     ` Thomas Renninger
@ 2010-03-19  8:01                       ` Robert Schöne
  2010-03-20 21:37                         ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-19  8:01 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Donnerstag, den 18.03.2010, 21:43 +0100 schrieb Thomas Renninger:
> On Wednesday 17 March 2010 05:41:38 pm Arjan van de Ven wrote:
> > On 3/17/2010 9:36, Thomas Renninger wrote:
> > > On Tuesday 16 March 2010 17:40:18 Arjan van de Ven wrote:
> > >> On 3/16/2010 7:50, Thomas Renninger wrote:
> > >>> Still, as this is totally broken:
> > >>>      - by design ->   only one of a dozen cpufreq drivers is supported
> > >>
> > >> the one I care about is supported.
> > >
> > > And that's the problem, before it's not removed, you do not care to
> > > provide/suggest a proper solution that could fit others as well.
> >
> > why don't you provide the others then?
> Is it possible somehow to provide a kind of wrapper/backup function:
> trace_power_frequency_cpu(POWER_PSTATE, frequency, cpu);
As I understood the tracing correctly this would imply the usage of
smp_call_function_single which should be aoided.
> or another trace interface with eventually some more overhead?
> Then it should be easy to support all drivers.
> Otherwise you'll for example miss the pcc driver which supports
> latest/upcoming Intel CPUs and which does IO based switching which needs not
> to run on the cpu that gets switched.
> 
>      Thomas
This wrapper would make things easier of course, since one might add the
same line for every driver. Or even call it from
cpufreq_notify_transition (POSTCHANGE) which would add this event for
other architectures then x86 (which I would approve).
But - and here's the problem - the tracing infrastructure should provide
as less overhead as possible. It should be called directly after writing
the new frequency for the specific cpu and by this optimally in a
funtion wich is already executed on the cpu.

Otherwise the following might happen (4 cpu system):
cpu 3 wants to change freq of cpu 0:
...
switch to cpu 0 (some overhead)
write new frequency
switch back (some overhead)
switch to cpu 0 (some overhead)
write trace
switch back (some overhead)
...
It's easy to see, that writing of the frequency and writing the trace
should be called directly one after another. It increases the accuracy
of the trace and decreases the overhead.

Robert


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-19  8:01                       ` Robert Schöne
@ 2010-03-20 21:37                         ` Thomas Renninger
  2010-03-22  0:42                           ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-20 21:37 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

> This wrapper would make things easier of course, since one might add the
> same line for every driver
...
It's simply broken and can't be fixed without chaninging the whole concept.
power_frequency_events() must not only pass the cpu, but a cpumask also for 
acpi-cpufreq.
If you look at SW_ANY SW_ALL (or similar, I can't look at code right now) 
parts, acpi-cpufreq can change the frequency for several cores with one MSR 
write.
AFAIK this is for example the recommended way for Dunnington (6 cores).
So the collected data can be totally wrong.
Better remove this before others also start to use it.
It also seem to be (hopefully) a minor feature for timechart, so this should 
not hurt that much (yet).

     Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-20 21:37                         ` Thomas Renninger
@ 2010-03-22  0:42                           ` Arjan van de Ven
  2010-03-22  7:04                             ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-22  0:42 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On 3/20/2010 14:37, Thomas Renninger wrote:

> It also seem to be (hopefully) a minor feature for timechart, so this should
> not hurt that much (yet).

It's actually a major feature for timechart, and one of the key things I and a bunch of others
inside Intel use timechart for.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-22  0:42                           ` Arjan van de Ven
@ 2010-03-22  7:04                             ` Robert Schöne
  2010-03-22 13:57                               ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-22  7:04 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
> On 3/20/2010 14:37, Thomas Renninger wrote:
> 
> > It also seem to be (hopefully) a minor feature for timechart, so this should
> > not hurt that much (yet).
> 
> It's actually a major feature for timechart, and one of the key things I and a bunch of others
> inside Intel use timechart for.
> 
It's a major feature for us too.
I suppose, the cpufreq_notify_transition calls are correct (meaning
being called for all related cpus) for every driver. So there's still
the option to include it in the POST_CHANGE section of this function.
Could this be okay for the both of you?


-- 
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden

Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert.Schoene@tu-dresden.de


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-22  7:04                             ` Robert Schöne
@ 2010-03-22 13:57                               ` Arjan van de Ven
  2010-03-23 16:28                                 ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-22 13:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Thomas Renninger, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On 3/22/2010 0:04, Robert Schöne wrote:
> Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
>> On 3/20/2010 14:37, Thomas Renninger wrote:
>>
>>> It also seem to be (hopefully) a minor feature for timechart, so this should
>>> not hurt that much (yet).
>>
>> It's actually a major feature for timechart, and one of the key things I and a bunch of others
>> inside Intel use timechart for.
>>
> It's a major feature for us too.
> I suppose, the cpufreq_notify_transition calls are correct (meaning
> being called for all related cpus) for every driver. So there's still
> the option to include it in the POST_CHANGE section of this function.
> Could this be okay for the both of you?

post change would work... that gets frequency afaik..

>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-22 13:57                               ` Arjan van de Ven
@ 2010-03-23 16:28                                 ` Robert Schöne
  2010-03-23 16:57                                   ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-23 16:28 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Montag, den 22.03.2010, 06:57 -0700 schrieb Arjan van de Ven:
> On 3/22/2010 0:04, Robert Schöne wrote:
> > Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
> >> On 3/20/2010 14:37, Thomas Renninger wrote:
> >>
> >>> It also seem to be (hopefully) a minor feature for timechart, so this should
> >>> not hurt that much (yet).
> >>
> >> It's actually a major feature for timechart, and one of the key things I and a bunch of others
> >> inside Intel use timechart for.
> >>
> > It's a major feature for us too.
> > I suppose, the cpufreq_notify_transition calls are correct (meaning
> > being called for all related cpus) for every driver. So there's still
> > the option to include it in the POST_CHANGE section of this function.
> > Could this be okay for the both of you?
> 
> post change would work... that gets frequency afaik..
Are you ok with this too, Thomas?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-23 16:28                                 ` Robert Schöne
@ 2010-03-23 16:57                                   ` Thomas Renninger
  2010-03-23 16:58                                     ` Arjan van de Ven
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-23 16:57 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Tuesday 23 March 2010 17:28:36 Robert Schöne wrote:
> Am Montag, den 22.03.2010, 06:57 -0700 schrieb Arjan van de Ven:
> > On 3/22/2010 0:04, Robert Schöne wrote:
> > > Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
> > >> On 3/20/2010 14:37, Thomas Renninger wrote:
> > >>
> > >>> It also seem to be (hopefully) a minor feature for timechart, so this should
> > >>> not hurt that much (yet).
> > >>
> > >> It's actually a major feature for timechart, and one of the key things I and a bunch of others
> > >> inside Intel use timechart for.
> > >>
> > > It's a major feature for us too.
> > > I suppose, the cpufreq_notify_transition calls are correct (meaning
> > > being called for all related cpus) for every driver. So there's still
> > > the option to include it in the POST_CHANGE section of this function.
> > > Could this be okay for the both of you?
> > 
> > post change would work... that gets frequency afaik..
> Are you ok with this too, Thomas?
You mean hooking it into cpufreq_stat_notifier_trans() in
drivers/cpufreq/cpufreq_stats.c?
That sounds like the right approach.

But what assures that this is executed on the correct cpu, which seem
to be necessary with the current trace function?

   Thomas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-23 16:57                                   ` Thomas Renninger
@ 2010-03-23 16:58                                     ` Arjan van de Ven
  2010-03-24  7:07                                       ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Arjan van de Ven @ 2010-03-23 16:58 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Robert Schöne, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On 3/23/2010 9:57, Thomas Renninger wrote:
> On Tuesday 23 March 2010 17:28:36 Robert Schöne wrote:
>> Am Montag, den 22.03.2010, 06:57 -0700 schrieb Arjan van de Ven:
>>> On 3/22/2010 0:04, Robert Schöne wrote:
>>>> Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
>>>>> On 3/20/2010 14:37, Thomas Renninger wrote:
>>>>>
>>>>>> It also seem to be (hopefully) a minor feature for timechart, so this should
>>>>>> not hurt that much (yet).
>>>>>
>>>>> It's actually a major feature for timechart, and one of the key things I and a bunch of others
>>>>> inside Intel use timechart for.
>>>>>
>>>> It's a major feature for us too.
>>>> I suppose, the cpufreq_notify_transition calls are correct (meaning
>>>> being called for all related cpus) for every driver. So there's still
>>>> the option to include it in the POST_CHANGE section of this function.
>>>> Could this be okay for the both of you?
>>>
>>> post change would work... that gets frequency afaik..
>> Are you ok with this too, Thomas?
> You mean hooking it into cpufreq_stat_notifier_trans() in
> drivers/cpufreq/cpufreq_stats.c?

no


hooking into the post frequency change callback that gets done..
which is guaranteed to be on the right cpu afaics.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-23 16:58                                     ` Arjan van de Ven
@ 2010-03-24  7:07                                       ` Robert Schöne
  2010-03-30  5:46                                         ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-24  7:07 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Dienstag, den 23.03.2010, 09:58 -0700 schrieb Arjan van de Ven:
> On 3/23/2010 9:57, Thomas Renninger wrote:
> > On Tuesday 23 March 2010 17:28:36 Robert Schöne wrote:
> >> Am Montag, den 22.03.2010, 06:57 -0700 schrieb Arjan van de Ven:
> >>> On 3/22/2010 0:04, Robert Schöne wrote:
> >>>> Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
> >>>>> On 3/20/2010 14:37, Thomas Renninger wrote:
> >>>>>
> >>>>>> It also seem to be (hopefully) a minor feature for timechart, so this should
> >>>>>> not hurt that much (yet).
> >>>>>
> >>>>> It's actually a major feature for timechart, and one of the key things I and a bunch of others
> >>>>> inside Intel use timechart for.
> >>>>>
> >>>> It's a major feature for us too.
> >>>> I suppose, the cpufreq_notify_transition calls are correct (meaning
> >>>> being called for all related cpus) for every driver. So there's still
> >>>> the option to include it in the POST_CHANGE section of this function.
> >>>> Could this be okay for the both of you?
> >>>
> >>> post change would work... that gets frequency afaik..
> >> Are you ok with this too, Thomas?
> > You mean hooking it into cpufreq_stat_notifier_trans() in
> > drivers/cpufreq/cpufreq_stats.c?
> 
> no
> 
> 
> hooking into the post frequency change callback that gets done..
> which is guaranteed to be on the right cpu afaics.
> 
I don't see where this would be guaranteed. So I'd be fine with
a) adding it to
cpufreq.c/cpufreq_notify_transition/cpufreq_notify_transition

b) adding an item to the cpufreq_transition_notifier_list

c) adding it to cpufreq_stats.c/cpufreq_stat_notifier_trans

which would imply the usage of smp_call_function_single(...)



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-24  7:07                                       ` Robert Schöne
@ 2010-03-30  5:46                                         ` Robert Schöne
  2010-03-30  8:56                                           ` Thomas Renninger
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-30  5:46 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Mittwoch, den 24.03.2010, 08:07 +0100 schrieb Robert Schöne:
> Am Dienstag, den 23.03.2010, 09:58 -0700 schrieb Arjan van de Ven:
> > On 3/23/2010 9:57, Thomas Renninger wrote:
> > > On Tuesday 23 March 2010 17:28:36 Robert Schöne wrote:
> > >> Am Montag, den 22.03.2010, 06:57 -0700 schrieb Arjan van de Ven:
> > >>> On 3/22/2010 0:04, Robert Schöne wrote:
> > >>>> Am Sonntag, den 21.03.2010, 17:42 -0700 schrieb Arjan van de Ven:
> > >>>>> On 3/20/2010 14:37, Thomas Renninger wrote:
> > >>>>>
> > >>>>>> It also seem to be (hopefully) a minor feature for timechart, so this should
> > >>>>>> not hurt that much (yet).
> > >>>>>
> > >>>>> It's actually a major feature for timechart, and one of the key things I and a bunch of others
> > >>>>> inside Intel use timechart for.
> > >>>>>
> > >>>> It's a major feature for us too.
> > >>>> I suppose, the cpufreq_notify_transition calls are correct (meaning
> > >>>> being called for all related cpus) for every driver. So there's still
> > >>>> the option to include it in the POST_CHANGE section of this function.
> > >>>> Could this be okay for the both of you?
> > >>>
> > >>> post change would work... that gets frequency afaik..
> > >> Are you ok with this too, Thomas?
> > > You mean hooking it into cpufreq_stat_notifier_trans() in
> > > drivers/cpufreq/cpufreq_stats.c?
> > 
> > no
> > 
> > 
> > hooking into the post frequency change callback that gets done..
> > which is guaranteed to be on the right cpu afaics.
> > 
> I don't see where this would be guaranteed. So I'd be fine with
> a) adding it to
> cpufreq.c/cpufreq_notify_transition/cpufreq_notify_transition
> 
> b) adding an item to the cpufreq_transition_notifier_list
> 
> c) adding it to cpufreq_stats.c/cpufreq_stat_notifier_trans
> 
> which would imply the usage of smp_call_function_single(...)
> 
I really want to keep this diskussion alive until there's a soultion we
can all agree.
So Arjan and Thomas, are there any comments/preferences to the proposed
options?



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-30  5:46                                         ` Robert Schöne
@ 2010-03-30  8:56                                           ` Thomas Renninger
  2010-03-31  6:40                                             ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Renninger @ 2010-03-30  8:56 UTC (permalink / raw)
  To: Robert Schöne
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

On Tuesday 30 March 2010 07:46:55 Robert Schöne wrote:
...
> I really want to keep this diskussion alive until there's a soultion we
> can all agree.
> So Arjan and Thomas, are there any comments/preferences to the proposed
> options?
I'd like to extend the powertracer and pass the cpu.
This is the only possibility I see to be able to support IO driven
frequency switching drivers where the switching code must not be executed
on the CPU that gets switched (without executing the tracer on each
CPU explicitly which does not make sense).

The next problem where current implementation is unfixable broken with
the tracer just passing the frequency is the fact that several CPUs
could get switched with one MSR write to a depending CPU (SW_ANY).
The same btw applies to C-states for which the tracer is used in
the same way (compare with 8.4.2.2 _CSD (C-State Dependency) of a
current ACPI spec).

No idea what the impact on userspace tools is, if I find some time
I can have a look at timechart how trace data gets read/used.
But I fear the Cstate tracing is used in some more tools already?
It would be great to get feedback/suggestions from people making use
of it already.

Below is still broken, but should make things at least a bit better:

---
X86 cpufreq: Fix powertracer in acpi-cpufreq and exec it on the correct cpu(s)

Several things are broken with the tracer currently.
This patch fixes:
  - With the userspace governor the wrong cpu could get tracked if the target
    function is executed on a CPU which does not get switched
  - In SW_ALL CPU dependency case (CPUFREQ_SHARED_TYPE_ALL) only one CPU got
    tracked. Now all CPUs that depend on each other are tracked.

What this patch does not fix:
  - In SW_ANY case (CPUFREQ_SHARED_TYPE_ANY) it's enough to write to a MSR of
    one of the depending CPUs. The power trace macro misses the ability
    to pass the cpu. Thus only one of the depending CPUs gets tracked correctly.
    To be able to fix this the power trace macro must get extended.


Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schöne <robert.schoene@tu-dresden.de>
CC: x86@kernel.org
CC: cpufreq <cpufreq@vger.kernel.org>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: stable@kernel.org

---
 arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
index 1b1920f..259c49e 100644
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
+++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -144,6 +144,7 @@ struct drv_cmd {
 		struct io_addr io;
 	} addr;
 	u32 val;
+	unsigned int frequency;
 };
 
 /* Called via smp_call_function_single(), on the target CPU */
@@ -177,11 +178,13 @@ static void do_drv_write(void *_cmd)
 		rdmsr(cmd->addr.msr.reg, lo, hi);
 		lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
 		wrmsr(cmd->addr.msr.reg, lo, hi);
+		trace_power_frequency(POWER_PSTATE, cmd->frequency);
 		break;
 	case SYSTEM_IO_CAPABLE:
 		acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
 				cmd->val,
 				(u32)cmd->addr.io.bit_width);
+		trace_power_frequency(POWER_PSTATE, cmd->frequency);
 		break;
 	default:
 		break;
@@ -363,7 +366,7 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
 		}
 	}
 
-	trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
+	cmd.frequency = data->freq_table[next_state].frequency;
 
 	switch (data->cpu_feature) {
 	case SYSTEM_INTEL_MSR_CAPABLE:
  

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-30  8:56                                           ` Thomas Renninger
@ 2010-03-31  6:40                                             ` Robert Schöne
  2010-04-12  6:53                                               ` Robert Schöne
  0 siblings, 1 reply; 26+ messages in thread
From: Robert Schöne @ 2010-03-31  6:40 UTC (permalink / raw)
  To: Thomas Renninger
  Cc: Arjan van de Ven, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Am Dienstag, den 30.03.2010, 09:56 +0100 schrieb Thomas Renninger:
> On Tuesday 30 March 2010 07:46:55 Robert Schöne wrote:
> ...
> > I really want to keep this diskussion alive until there's a soultion we
> > can all agree.
> > So Arjan and Thomas, are there any comments/preferences to the proposed
> > options?
> I'd like to extend the powertracer and pass the cpu.
> This is the only possibility I see to be able to support IO driven
> frequency switching drivers where the switching code must not be executed
> on the CPU that gets switched (without executing the tracer on each
> CPU explicitly which does not make sense).
As I understand you, you want to extend the event data of each power
trace event, which would be fine for me.
However, I think this would need some resorting of the events for the
perf tools.
@Arjan would this be feasible? 
> 
> The next problem where current implementation is unfixable broken with
> the tracer just passing the frequency is the fact that several CPUs
> could get switched with one MSR write to a depending CPU (SW_ANY).
> The same btw applies to C-states for which the tracer is used in
> the same way (compare with 8.4.2.2 _CSD (C-State Dependency) of a
> current ACPI spec).
> 
> No idea what the impact on userspace tools is, if I find some time
> I can have a look at timechart how trace data gets read/used.
> But I fear the Cstate tracing is used in some more tools already?
> It would be great to get feedback/suggestions from people making use
> of it already.
> 
> Below is still broken, but should make things at least a bit better:
> 
> ---
> X86 cpufreq: Fix powertracer in acpi-cpufreq and exec it on the correct cpu(s)
> 
> Several things are broken with the tracer currently.
> This patch fixes:
>   - With the userspace governor the wrong cpu could get tracked if the target
>     function is executed on a CPU which does not get switched
>   - In SW_ALL CPU dependency case (CPUFREQ_SHARED_TYPE_ALL) only one CPU got
>     tracked. Now all CPUs that depend on each other are tracked.
> 
> What this patch does not fix:
>   - In SW_ANY case (CPUFREQ_SHARED_TYPE_ANY) it's enough to write to a MSR of
>     one of the depending CPUs. The power trace macro misses the ability
>     to pass the cpu. Thus only one of the depending CPUs gets tracked correctly.
>     To be able to fix this the power trace macro must get extended.
> 
> 
> Signed-off-by: Thomas Renninger <trenn@suse.de>
> CC: Robert Schöne <robert.schoene@tu-dresden.de>
> CC: x86@kernel.org
> CC: cpufreq <cpufreq@vger.kernel.org>
> CC: Arjan van de Ven <arjan@linux.intel.com>
> CC: stable@kernel.org
> 
> ---
>  arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> index 1b1920f..259c49e 100644
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -144,6 +144,7 @@ struct drv_cmd {
>  		struct io_addr io;
>  	} addr;
>  	u32 val;
> +	unsigned int frequency;
>  };
>  
>  /* Called via smp_call_function_single(), on the target CPU */
> @@ -177,11 +178,13 @@ static void do_drv_write(void *_cmd)
>  		rdmsr(cmd->addr.msr.reg, lo, hi);
>  		lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
>  		wrmsr(cmd->addr.msr.reg, lo, hi);
> +		trace_power_frequency(POWER_PSTATE, cmd->frequency);
>  		break;
>  	case SYSTEM_IO_CAPABLE:
>  		acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
>  				cmd->val,
>  				(u32)cmd->addr.io.bit_width);
> +		trace_power_frequency(POWER_PSTATE, cmd->frequency);
>  		break;
>  	default:
>  		break;
> @@ -363,7 +366,7 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
>  		}
>  	}
>  
> -	trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
> +	cmd.frequency = data->freq_table[next_state].frequency;
>  
>  	switch (data->cpu_feature) {
>  	case SYSTEM_INTEL_MSR_CAPABLE:
>   
> 


-- 
Robert Schoene
Technische Universitaet Dresden
Zentrum fuer Informationsdienste und Hochleistungsrechnen
01062 Dresden

Tel.: (0351) 463-42483, Fax: (0351) 463-37773
E-Mail: Robert.Schoene@tu-dresden.de


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] trace power_frequency events on the correct cpu  (for Intel x86 CPUs)
  2010-03-31  6:40                                             ` Robert Schöne
@ 2010-04-12  6:53                                               ` Robert Schöne
  0 siblings, 0 replies; 26+ messages in thread
From: Robert Schöne @ 2010-04-12  6:53 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Thomas Renninger, Dave Jones, Thomas Gleixner, Ingo Molnar,
	linux-kernel, cpufreq, x86

Resend, to keep the diskussion alive
Am Mittwoch, den 31.03.2010, 08:40 +0200 schrieb Robert Schöne:
> Am Dienstag, den 30.03.2010, 09:56 +0100 schrieb Thomas Renninger:
> > On Tuesday 30 March 2010 07:46:55 Robert Schöne wrote:
> > ...
> > > I really want to keep this diskussion alive until there's a soultion we
> > > can all agree.
> > > So Arjan and Thomas, are there any comments/preferences to the proposed
> > > options?
> > I'd like to extend the powertracer and pass the cpu.
> > This is the only possibility I see to be able to support IO driven
> > frequency switching drivers where the switching code must not be executed
> > on the CPU that gets switched (without executing the tracer on each
> > CPU explicitly which does not make sense).
> As I understand you, you want to extend the event data of each power
> trace event, which would be fine for me.
> However, I think this would need some resorting of the events for the
> perf tools.
> @Arjan would this be feasible?
> > 
> > The next problem where current implementation is unfixable broken with
> > the tracer just passing the frequency is the fact that several CPUs
> > could get switched with one MSR write to a depending CPU (SW_ANY).
> > The same btw applies to C-states for which the tracer is used in
> > the same way (compare with 8.4.2.2 _CSD (C-State Dependency) of a
> > current ACPI spec).
> > 
> > No idea what the impact on userspace tools is, if I find some time
> > I can have a look at timechart how trace data gets read/used.
> > But I fear the Cstate tracing is used in some more tools already?
> > It would be great to get feedback/suggestions from people making use
> > of it already.
> > 
> > Below is still broken, but should make things at least a bit better:
> > 
> > ---
> > X86 cpufreq: Fix powertracer in acpi-cpufreq and exec it on the correct cpu(s)
> > 
> > Several things are broken with the tracer currently.
> > This patch fixes:
> >   - With the userspace governor the wrong cpu could get tracked if the target
> >     function is executed on a CPU which does not get switched
> >   - In SW_ALL CPU dependency case (CPUFREQ_SHARED_TYPE_ALL) only one CPU got
> >     tracked. Now all CPUs that depend on each other are tracked.
> > 
> > What this patch does not fix:
> >   - In SW_ANY case (CPUFREQ_SHARED_TYPE_ANY) it's enough to write to a MSR of
> >     one of the depending CPUs. The power trace macro misses the ability
> >     to pass the cpu. Thus only one of the depending CPUs gets tracked correctly.
> >     To be able to fix this the power trace macro must get extended.
> > 
> > 
> > Signed-off-by: Thomas Renninger <trenn@suse.de>
> > CC: Robert Schöne <robert.schoene@tu-dresden.de>
> > CC: x86@kernel.org
> > CC: cpufreq <cpufreq@vger.kernel.org>
> > CC: Arjan van de Ven <arjan@linux.intel.com>
> > CC: stable@kernel.org
> > 
> > ---
> >  arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c |    5 ++++-
> >  1 files changed, 4 insertions(+), 1 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > index 1b1920f..259c49e 100644
> > --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > +++ b/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> > @@ -144,6 +144,7 @@ struct drv_cmd {
> >  		struct io_addr io;
> >  	} addr;
> >  	u32 val;
> > +	unsigned int frequency;
> >  };
> >  
> >  /* Called via smp_call_function_single(), on the target CPU */
> > @@ -177,11 +178,13 @@ static void do_drv_write(void *_cmd)
> >  		rdmsr(cmd->addr.msr.reg, lo, hi);
> >  		lo = (lo & ~INTEL_MSR_RANGE) | (cmd->val & INTEL_MSR_RANGE);
> >  		wrmsr(cmd->addr.msr.reg, lo, hi);
> > +		trace_power_frequency(POWER_PSTATE, cmd->frequency);
> >  		break;
> >  	case SYSTEM_IO_CAPABLE:
> >  		acpi_os_write_port((acpi_io_address)cmd->addr.io.port,
> >  				cmd->val,
> >  				(u32)cmd->addr.io.bit_width);
> > +		trace_power_frequency(POWER_PSTATE, cmd->frequency);
> >  		break;
> >  	default:
> >  		break;
> > @@ -363,7 +366,7 @@ static int acpi_cpufreq_target(struct cpufreq_policy *policy,
> >  		}
> >  	}
> >  
> > -	trace_power_frequency(POWER_PSTATE, data->freq_table[next_state].frequency);
> > +	cmd.frequency = data->freq_table[next_state].frequency;
> >  
> >  	switch (data->cpu_feature) {
> >  	case SYSTEM_INTEL_MSR_CAPABLE:
> >   
> > 
> 
> 
> -- 
> Robert Schoene
> Technische Universitaet Dresden
> Zentrum fuer Informationsdienste und Hochleistungsrechnen
> 01062 Dresden
> 
> Tel.: (0351) 463-42483, Fax: (0351) 463-37773
> E-Mail: Robert.Schoene@tu-dresden.de
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2010-04-12  6:53 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-12 13:17 [PATCH] trace power_frequency events on the correct cpu (for Intel x86 CPUs) Robert Schöne
2010-03-12 14:52 ` Arjan van de Ven
2010-03-12 15:41   ` Robert Schöne
2010-03-15 10:51     ` Thomas Renninger
2010-03-16  7:13       ` Robert Schöne
2010-03-16  9:57         ` Ingo Molnar
2010-03-16 10:59         ` Thomas Renninger
2010-03-16 14:19           ` Arjan van de Ven
2010-03-16 14:50             ` Thomas Renninger
2010-03-16 16:40               ` Arjan van de Ven
2010-03-17 16:36                 ` Thomas Renninger
2010-03-17 16:41                   ` Arjan van de Ven
2010-03-18 20:43                     ` Thomas Renninger
2010-03-19  8:01                       ` Robert Schöne
2010-03-20 21:37                         ` Thomas Renninger
2010-03-22  0:42                           ` Arjan van de Ven
2010-03-22  7:04                             ` Robert Schöne
2010-03-22 13:57                               ` Arjan van de Ven
2010-03-23 16:28                                 ` Robert Schöne
2010-03-23 16:57                                   ` Thomas Renninger
2010-03-23 16:58                                     ` Arjan van de Ven
2010-03-24  7:07                                       ` Robert Schöne
2010-03-30  5:46                                         ` Robert Schöne
2010-03-30  8:56                                           ` Thomas Renninger
2010-03-31  6:40                                             ` Robert Schöne
2010-04-12  6:53                                               ` Robert Schöne

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.