linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] x86,cpufreq: unify APERF/MPERF computation
@ 2017-06-17  3:03 Len Brown
  2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
  2017-06-19 12:45 ` [GIT PULL] x86,cpufreq: unify APERF/MPERF computation Rafael J. Wysocki
  0 siblings, 2 replies; 9+ messages in thread
From: Len Brown @ 2017-06-17  3:03 UTC (permalink / raw)
  To: rafael; +Cc: x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel


In-Reply-To: 

Hi Rafael,

This patch series has 3 goals:

1. Make "cpu MHz" in /proc/cpuinfo supportable.

2. Make /sys/.../cpufreq/scaling_cur_freq meaningful
   and consistent on modern x86 systems.

3. Use 1. and 2. to remove scheduler and cpufreq overhead

There are 3 main changes since this series was proposed
about a year ago:

This update responds to distro feedback to make /proc/cpuinfo
"cpu MHz" constant.  Originally, we had proposed making it return
the same dynamic value as cpufreq sysfs.

Some community members suggested that sysfs MHz values should
be meaninful, even down to 10ms intervals.  So this has been
changed, versus the original proposal to not re-compute
at intervals shorter than 100ms.

(For those who really care about observing frequency, the
 recommendation remains to use turbostat(8) or equivalent utility,
 which can reliably measure concurrent intervals of arbitrary length)

The intel_pstate sampling mechanism has changed.
Originally this series removed an intel_pstate timer in HWP mode.
Now it removes the analogous scheduler call-back.

Most recently, in response to posting this patch on the list
about 10-days ago, the patch to remove frequency calculation
from inside intel_pstate was dropped, in order to maintain compatibility
with tracing scripts.  Also, the order of the last two patches
has been exchanged.

Please let me know if you see any issues with this series.

thanks!
Len Brown, Intel Open Source Technology Center

The following changes since commit 3c2993b8c6143d8a5793746a54eba8f86f95240f:

  Linux 4.12-rc4 (2017-06-04 16:47:43 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git x86

for you to fetch changes up to d020eed98440faa4a529c621f881aa9fda296956:

  intel_pstate: skip scheduler hook when in "performance" mode. (2017-06-16 19:11:13 -0700)

----------------------------------------------------------------
Len Brown (4):
      x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
      x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
      intel_pstate: delete scheduler hook in HWP mode
      intel_pstate: skip scheduler hook when in "performance" mode.

 arch/x86/kernel/cpu/Makefile     |  1 +
 arch/x86/kernel/cpu/aperfmperf.c | 82 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/proc.c       | 10 +----
 drivers/cpufreq/cpufreq.c        |  7 +++-
 drivers/cpufreq/intel_pstate.c   | 18 +++------
 include/linux/cpufreq.h          | 13 +++++++
 6 files changed, 109 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/aperfmperf.c

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
  2017-06-17  3:03 [GIT PULL] x86,cpufreq: unify APERF/MPERF computation Len Brown
@ 2017-06-17  3:03 ` Len Brown
  2017-06-17  3:03   ` [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF Len Brown
                     ` (3 more replies)
  2017-06-19 12:45 ` [GIT PULL] x86,cpufreq: unify APERF/MPERF computation Rafael J. Wysocki
  1 sibling, 4 replies; 9+ messages in thread
From: Len Brown @ 2017-06-17  3:03 UTC (permalink / raw)
  To: rafael
  Cc: x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel, Len Brown

From: Len Brown <len.brown@intel.com>

cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz
that is otherwise reported in x86 /proc/cpuinfo "cpu MHz".

There are four problems with this scheme,
any of them is sufficient justification to delete it.

1. Depending on which cpufreq driver is loaded, the behavior
   of this field is different.

2. Distros complain that they have to explain to users
   why and how this field changes.  Distros have requested a constant.

3. The two major providers of this information, acpi_cpufreq
   and intel_pstate, both "get it wrong" in different ways.

   acpi_cpufreq lies to the user by telling them that
   they are running at whatever frequency was last
   requested by software.

   intel_pstate lies to the user by telling them that
   they are running at the average frequency computed
   over an undefined measurement.  But an average computed
   over an undefined interval, is itself, undefined...

4. On modern processors, user space utilities, such as
   turbostat(1), are more accurate and more precise, while
   supporing concurrent measurement over arbitrary intervals.

Users who have been consulting /proc/cpuinfo to
track changing CPU frequency will be dissapointed that
it no longer wiggles -- perhaps being unaware of the
limitations of the information they have been consuming.

Yes, they can change their scripts to look in sysfs
cpufreq/scaling_cur_frequency.  Here they will find the same
data of dubious quality here removed from /proc/cpuinfo.
The value in sysfs will be addressed in a subsequent patch
to address issues 1-3, above.

Issue 4 will remain -- users that really care about
accurate frequency information should not be using either
proc or sysfs kernel interfaces.
They should be using using turbostat(8), or a similar
purpose-built analysis tool.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 arch/x86/kernel/cpu/proc.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 6df621a..218f798 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -2,7 +2,6 @@
 #include <linux/timex.h>
 #include <linux/string.h>
 #include <linux/seq_file.h>
-#include <linux/cpufreq.h>
 
 /*
  *	Get CPU information for use by the procfs.
@@ -76,14 +75,9 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 	if (c->microcode)
 		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
 
-	if (cpu_has(c, X86_FEATURE_TSC)) {
-		unsigned int freq = cpufreq_quick_get(cpu);
-
-		if (!freq)
-			freq = cpu_khz;
+	if (cpu_has(c, X86_FEATURE_TSC))
 		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
-			   freq / 1000, (freq % 1000));
-	}
+			   cpu_khz / 1000, (cpu_khz % 1000));
 
 	/* Cache size */
 	if (c->x86_cache_size >= 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
  2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
@ 2017-06-17  3:03   ` Len Brown
  2017-06-20 10:15     ` Thomas Gleixner
  2017-06-17  3:03   ` [PATCH 3/4] intel_pstate: delete scheduler hook in HWP mode Len Brown
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Len Brown @ 2017-06-17  3:03 UTC (permalink / raw)
  To: rafael
  Cc: x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel, Len Brown

From: Len Brown <len.brown@intel.com>

The goal of this change is to give users a uniform and meaningful
result when they read /sys/...cpufreq/scaling_cur_freq
on modern x86 hardware, as compared to what they get today.

Modern x86 processors include the hardware needed
to accurately calculate frequency over an interval --
APERF, MPERF, and the TSC.

Here we provide an x86 routine to make this calculation
on supported hardware, and use it in preference to any
driver driver-specific cpufreq_driver.get() routine.

MHz is computed like so:

MHz = base_MHz * delta_APERF / delta_MPERF

MHz is the average frequency of the busy processor
over a measurement interval.  The interval is
defined to be the time between successive invocations
of aperfmperf_khz_on_cpu(), which are expected to to
happen on-demand when users read sysfs attribute
cpufreq/scaling_cur_freq.

As with previous methods of calculating MHz,
idle time is excluded.

base_MHz above is from TSC calibration global "cpu_khz".

This x86 native method to calculate MHz returns a meaningful result
no matter if P-states are controlled by hardware or firmware
and/or if the Linux cpufreq sub-system is or is-not installed.

When this routine is invoked more frequently, the measurement
interval becomes shorter.  However, the code limits re-computation
to 10ms intervals so that average frequency remains meaningful.

Discerning users are encouraged to take advantage of
the turbostat(8) utility, which can gracefully handle
concurrent measurement intervals of arbitrary length.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 arch/x86/kernel/cpu/Makefile     |  1 +
 arch/x86/kernel/cpu/aperfmperf.c | 82 ++++++++++++++++++++++++++++++++++++++++
 drivers/cpufreq/cpufreq.c        |  7 +++-
 include/linux/cpufreq.h          | 13 +++++++
 4 files changed, 102 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/aperfmperf.c

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 5200001..cdf8249 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -21,6 +21,7 @@ obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
 obj-y			+= bugs.o
+obj-$(CONFIG_CPU_FREQ)	+= aperfmperf.o
 
 obj-$(CONFIG_PROC_FS)	+= proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
new file mode 100644
index 0000000..5ccf63a
--- /dev/null
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -0,0 +1,82 @@
+/*
+ * x86 APERF/MPERF KHz calculation for
+ * /sys/.../cpufreq/scaling_cur_freq
+ *
+ * Copyright (C) 2017 Intel Corp.
+ * Author: Len Brown <len.brown@intel.com>
+ *
+ * This file is licensed under GPLv2.
+ */
+
+#include <linux/jiffies.h>
+#include <linux/math64.h>
+#include <linux/percpu.h>
+#include <linux/smp.h>
+
+struct aperfmperf_sample {
+	unsigned int khz;
+	unsigned long jiffies;
+	u64 aperf;
+	u64 mperf;
+};
+
+static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
+
+/*
+ * aperfmperf_snapshot_khz()
+ * On the current CPU, snapshot APERF, MPERF, and jiffies
+ * unless we already did it within 10ms
+ * calculate kHz, save snapshot
+ */
+static void aperfmperf_snapshot_khz(void *dummy)
+{
+	u64 aperf, aperf_delta;
+	u64 mperf, mperf_delta;
+	struct aperfmperf_sample *s = &get_cpu_var(samples);
+
+	/* Don't bother re-computing within 10 ms */
+	if (time_before(jiffies, s->jiffies + HZ/100))
+		goto out;
+
+	rdmsrl(MSR_IA32_APERF, aperf);
+	rdmsrl(MSR_IA32_MPERF, mperf);
+
+	aperf_delta = aperf - s->aperf;
+	mperf_delta = mperf - s->mperf;
+
+	/*
+	 * There is no architectural guarantee that MPERF
+	 * increments faster than we can read it.
+	 */
+	if (mperf_delta == 0)
+		goto out;
+
+	/*
+	 * if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then
+	 *	khz = (cpu_khz * aperf_delta) / mperf_delta
+	 */
+	if (div64_u64(ULLONG_MAX, cpu_khz) > aperf_delta)
+		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+	else	/* khz = aperf_delta / (mperf_delta / cpu_khz) */
+		s->khz = div64_u64(aperf_delta,
+			div64_u64(mperf_delta, cpu_khz));
+	s->jiffies = jiffies;
+	s->aperf = aperf;
+	s->mperf = mperf;
+
+out:
+	put_cpu_var(samples);
+}
+
+unsigned int aperfmperf_khz_on_cpu(unsigned int cpu)
+{
+	if (!cpu_khz)
+		return 0;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return 0;
+
+	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
+
+	return per_cpu(samples.khz, cpu);
+}
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 26b643d..a667fac 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -635,8 +635,13 @@ show_one(scaling_max_freq, max);
 static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 {
 	ssize_t ret;
+	unsigned int freq;
 
-	if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get)
+	freq = arch_freq_get_on_cpu(policy->cpu);
+	if (freq)
+		ret = sprintf(buf, "%u\n", freq);
+	else if (cpufreq_driver && cpufreq_driver->setpolicy &&
+			cpufreq_driver->get)
 		ret = sprintf(buf, "%u\n", cpufreq_driver->get(policy->cpu));
 	else
 		ret = sprintf(buf, "%u\n", policy->cur);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a5ce0bbe..cfc6220 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -883,6 +883,19 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
 }
 #endif
 
+#ifdef CONFIG_X86
+extern unsigned int aperfmperf_khz_on_cpu(unsigned int cpu);
+static inline unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	return aperfmperf_khz_on_cpu(cpu);
+}
+#else
+static inline unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	return 0;
+}
+#endif
+
 /* the following are really really optional */
 extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
 extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] intel_pstate: delete scheduler hook in HWP mode
  2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
  2017-06-17  3:03   ` [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF Len Brown
@ 2017-06-17  3:03   ` Len Brown
  2017-06-17  3:03   ` [PATCH 4/4] intel_pstate: skip scheduler hook when in "performance" mode Len Brown
  2017-06-20 10:15   ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Thomas Gleixner
  3 siblings, 0 replies; 9+ messages in thread
From: Len Brown @ 2017-06-17  3:03 UTC (permalink / raw)
  To: rafael
  Cc: x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel, Len Brown

From: Len Brown <len.brown@intel.com>

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

In HWP mode, maintaining that value was the sole purpose of
the scheduler hook, intel_pstate_update_util_hwp(),
so it can now be removed.

Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index b7de5bd..4ec5668 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1732,16 +1732,6 @@ static void intel_pstate_adjust_pstate(struct cpudata *cpu, int target_pstate)
 		fp_toint(cpu->iowait_boost * 100));
 }
 
-static void intel_pstate_update_util_hwp(struct update_util_data *data,
-					 u64 time, unsigned int flags)
-{
-	struct cpudata *cpu = container_of(data, struct cpudata, update_util);
-	u64 delta_ns = time - cpu->sample.time;
-
-	if ((s64)delta_ns >= INTEL_PSTATE_HWP_SAMPLING_INTERVAL)
-		intel_pstate_sample(cpu, time);
-}
-
 static void intel_pstate_update_util_pid(struct update_util_data *data,
 					 u64 time, unsigned int flags)
 {
@@ -1933,6 +1923,9 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
 {
 	struct cpudata *cpu = all_cpu_data[cpu_num];
 
+	if (hwp_active)
+		return;
+
 	if (cpu->update_util_set)
 		return;
 
@@ -2557,7 +2550,6 @@ static int __init intel_pstate_init(void)
 		} else {
 			hwp_active++;
 			intel_pstate.attr = hwp_cpufreq_attrs;
-			pstate_funcs.update_util = intel_pstate_update_util_hwp;
 			goto hwp_cpu_matched;
 		}
 	} else {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] intel_pstate: skip scheduler hook when in "performance" mode.
  2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
  2017-06-17  3:03   ` [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF Len Brown
  2017-06-17  3:03   ` [PATCH 3/4] intel_pstate: delete scheduler hook in HWP mode Len Brown
@ 2017-06-17  3:03   ` Len Brown
  2017-06-20 10:15   ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Thomas Gleixner
  3 siblings, 0 replies; 9+ messages in thread
From: Len Brown @ 2017-06-17  3:03 UTC (permalink / raw)
  To: rafael
  Cc: x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel, Len Brown

From: Len Brown <len.brown@intel.com>

When the governor is set to "performance", intel_pstate does not
need the scheduler hook for doing any calculations.  Under these
conditions, its only purpose is to continue to maintain
cpufreq/scaling_cur_freq.

The cpufreq/scaling_cur_freq sysfs attribute is now provided by
shared x86 cpufreq code on modern x86 systems, including
all systems supported by the intel_pstate driver.

So in "performance" governor mode, the scheduler hook can be skipped.
This applies to both in Software and Hardware P-state control modes.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4ec5668..4538182 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2031,10 +2031,10 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 		 */
 		intel_pstate_clear_update_util_hook(policy->cpu);
 		intel_pstate_max_within_limits(cpu);
+	} else {
+		intel_pstate_set_update_util_hook(policy->cpu);
 	}
 
-	intel_pstate_set_update_util_hook(policy->cpu);
-
 	if (hwp_active)
 		intel_pstate_hwp_set(policy->cpu);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] x86,cpufreq: unify APERF/MPERF computation
  2017-06-17  3:03 [GIT PULL] x86,cpufreq: unify APERF/MPERF computation Len Brown
  2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
@ 2017-06-19 12:45 ` Rafael J. Wysocki
  1 sibling, 0 replies; 9+ messages in thread
From: Rafael J. Wysocki @ 2017-06-19 12:45 UTC (permalink / raw)
  To: Len Brown
  Cc: Rafael J. Wysocki, the arch/x86 maintainers, Srinivas Pandruvada,
	Peter Zijlstra, Linux PM, Linux Kernel Mailing List, Ingo Molnar,
	Thomas Gleixner

On Sat, Jun 17, 2017 at 5:03 AM, Len Brown <lenb@kernel.org> wrote:
>
> In-Reply-To:
>
> Hi Rafael,
>
> This patch series has 3 goals:
>
> 1. Make "cpu MHz" in /proc/cpuinfo supportable.
>
> 2. Make /sys/.../cpufreq/scaling_cur_freq meaningful
>    and consistent on modern x86 systems.
>
> 3. Use 1. and 2. to remove scheduler and cpufreq overhead
>
> There are 3 main changes since this series was proposed
> about a year ago:
>
> This update responds to distro feedback to make /proc/cpuinfo
> "cpu MHz" constant.  Originally, we had proposed making it return
> the same dynamic value as cpufreq sysfs.
>
> Some community members suggested that sysfs MHz values should
> be meaninful, even down to 10ms intervals.  So this has been
> changed, versus the original proposal to not re-compute
> at intervals shorter than 100ms.
>
> (For those who really care about observing frequency, the
>  recommendation remains to use turbostat(8) or equivalent utility,
>  which can reliably measure concurrent intervals of arbitrary length)
>
> The intel_pstate sampling mechanism has changed.
> Originally this series removed an intel_pstate timer in HWP mode.
> Now it removes the analogous scheduler call-back.
>
> Most recently, in response to posting this patch on the list
> about 10-days ago, the patch to remove frequency calculation
> from inside intel_pstate was dropped, in order to maintain compatibility
> with tracing scripts.  Also, the order of the last two patches
> has been exchanged.
>
> Please let me know if you see any issues with this series.
>
> thanks!
> Len Brown, Intel Open Source Technology Center
>
> The following changes since commit 3c2993b8c6143d8a5793746a54eba8f86f95240f:
>
>   Linux 4.12-rc4 (2017-06-04 16:47:43 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git x86
>
> for you to fetch changes up to d020eed98440faa4a529c621f881aa9fda296956:
>
>   intel_pstate: skip scheduler hook when in "performance" mode. (2017-06-16 19:11:13 -0700)
>
> ----------------------------------------------------------------
> Len Brown (4):
>       x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
>       x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
>       intel_pstate: delete scheduler hook in HWP mode
>       intel_pstate: skip scheduler hook when in "performance" mode.

I'd like to hear from the x86 maintainers about the first two patches.
At least I'd like to know that there are no objections there.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
  2017-06-17  3:03   ` [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF Len Brown
@ 2017-06-20 10:15     ` Thomas Gleixner
  2017-06-24  5:03       ` Len Brown
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2017-06-20 10:15 UTC (permalink / raw)
  To: Len Brown
  Cc: rafael, x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel,
	Len Brown

On Fri, 16 Jun 2017, Len Brown wrote:
> +#include <linux/jiffies.h>
> +#include <linux/math64.h>
> +#include <linux/percpu.h>
> +#include <linux/smp.h>
> +
> +struct aperfmperf_sample {
> +	unsigned int khz;
> +	unsigned long jiffies;
> +	u64 aperf;
> +	u64 mperf;

Nit. Please write these in tabular fashion:

	unsigned	int khz;
	unsigned long	jiffies;
	u64		aperf;
	u64		mperf;

> +};
> +
> +static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
> +
> +/*
> + * aperfmperf_snapshot_khz()
> + * On the current CPU, snapshot APERF, MPERF, and jiffies
> + * unless we already did it within 10ms
> + * calculate kHz, save snapshot
> + */
> +static void aperfmperf_snapshot_khz(void *dummy)
> +{
> +	u64 aperf, aperf_delta;
> +	u64 mperf, mperf_delta;
> +	struct aperfmperf_sample *s = &get_cpu_var(samples);

This is invoked via a smp function call, so you want

     this_cpu_ptr(samples)

here.

> +	/* Don't bother re-computing within 10 ms */
> +	if (time_before(jiffies, s->jiffies + HZ/100))
> +		goto out;

That way you can spare the gotos and simply return

> index a5ce0bbe..cfc6220 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -883,6 +883,19 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
>  }
>  #endif
>  
> +#ifdef CONFIG_X86
> +extern unsigned int aperfmperf_khz_on_cpu(unsigned int cpu);
> +static inline unsigned int arch_freq_get_on_cpu(int cpu)

Please don't add arch specific crap in general headers.

Also having cpu as int and unsigned int is not really consistent.

The simple way to avoid that is to have a weak function and have an arch
override for it. If that does not work because the cpufreq stuff must be
built as a module, then you still can avoid CONFIG_X86 and have something
like CONFIG_ARCH_HAS_FOO.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"
  2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
                     ` (2 preceding siblings ...)
  2017-06-17  3:03   ` [PATCH 4/4] intel_pstate: skip scheduler hook when in "performance" mode Len Brown
@ 2017-06-20 10:15   ` Thomas Gleixner
  3 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2017-06-20 10:15 UTC (permalink / raw)
  To: Len Brown
  Cc: rafael, x86, srinivas.pandruvada, peterz, linux-pm, linux-kernel,
	Len Brown

On Fri, 16 Jun 2017, Len Brown wrote:
> Issue 4 will remain -- users that really care about
> accurate frequency information should not be using either
> proc or sysfs kernel interfaces.
> They should be using using turbostat(8), or a similar
> purpose-built analysis tool.
> 
> Signed-off-by: Len Brown <len.brown@intel.com>

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF
  2017-06-20 10:15     ` Thomas Gleixner
@ 2017-06-24  5:03       ` Len Brown
  0 siblings, 0 replies; 9+ messages in thread
From: Len Brown @ 2017-06-24  5:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, X86 ML, Srinivas Pandruvada, Peter Zijlstra,
	Linux PM list, linux-kernel, Len Brown

[-- Attachment #1: Type: text/plain, Size: 2792 bytes --]

On Tue, Jun 20, 2017 at 6:15 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 16 Jun 2017, Len Brown wrote:
>> +#include <linux/jiffies.h>
>> +#include <linux/math64.h>
>> +#include <linux/percpu.h>
>> +#include <linux/smp.h>
>> +
>> +struct aperfmperf_sample {
>> +     unsigned int khz;
>> +     unsigned long jiffies;
>> +     u64 aperf;
>> +     u64 mperf;
>
> Nit. Please write these in tabular fashion:
>         unsigned        int khz;
>         unsigned long   jiffies;
>         u64             aperf;
>         u64             mperf;

sure, no problem -- I agreed that looks better.

But it seems there is some inconsistency about this style nit --
even in this directory.  If there is consensus going forward,
would it make sense for coding-style.rst and checkpatch.pl
to squawk about this, so you don't have to?


>> +};
>> +
>> +static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
>> +
>> +/*
>> + * aperfmperf_snapshot_khz()
>> + * On the current CPU, snapshot APERF, MPERF, and jiffies
>> + * unless we already did it within 10ms
>> + * calculate kHz, save snapshot
>> + */
>> +static void aperfmperf_snapshot_khz(void *dummy)
>> +{
>> +     u64 aperf, aperf_delta;
>> +     u64 mperf, mperf_delta;
>> +     struct aperfmperf_sample *s = &get_cpu_var(samples);
>
> This is invoked via a smp function call, so you want
>
>      this_cpu_ptr(samples)
>
> here.

done. thanks!

>> +     /* Don't bother re-computing within 10 ms */
>> +     if (time_before(jiffies, s->jiffies + HZ/100))
>> +             goto out;
>
> That way you can spare the gotos and simply return

happy to remove the gotos, thanks!

>> index a5ce0bbe..cfc6220 100644
>> --- a/include/linux/cpufreq.h
>> +++ b/include/linux/cpufreq.h
>> @@ -883,6 +883,19 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_X86
>> +extern unsigned int aperfmperf_khz_on_cpu(unsigned int cpu);
>> +static inline unsigned int arch_freq_get_on_cpu(int cpu)

> ... having cpu as int and unsigned int is not really consistent.

True.  the SMP code uses int cpu, while cpufreq_policy.cpu is an unsigned int.
I'll use "int cpu".

> Please don't add arch specific crap in general headers.
>
> The simple way to avoid that is to have a weak function and have an arch
> override for it. If that does not work because the cpufreq stuff must be
> built as a module, then you still can avoid CONFIG_X86 and have something
> like CONFIG_ARCH_HAS_FOO.

Thanks -- this is not modular code, and so yes, __weak works -- hadn't
had occasion to use it before...
Attached is the incremental diff responding to your comments, in case
that is convenient.
I'll re-send this series with this patch updated in a sec.

thanks,
Len Brown, Intel Open Source Technology Center

[-- Attachment #2: git.diff --]
[-- Type: text/plain, Size: 2614 bytes --]

diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c
index 5ccf63a..d869c86 100644
--- a/arch/x86/kernel/cpu/aperfmperf.c
+++ b/arch/x86/kernel/cpu/aperfmperf.c
@@ -14,10 +14,10 @@
 #include <linux/smp.h>
 
 struct aperfmperf_sample {
-	unsigned int khz;
-	unsigned long jiffies;
-	u64 aperf;
-	u64 mperf;
+	unsigned int	khz;
+	unsigned long	jiffies;
+	u64	aperf;
+	u64	mperf;
 };
 
 static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
@@ -32,11 +32,11 @@ static void aperfmperf_snapshot_khz(void *dummy)
 {
 	u64 aperf, aperf_delta;
 	u64 mperf, mperf_delta;
-	struct aperfmperf_sample *s = &get_cpu_var(samples);
+	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
 
 	/* Don't bother re-computing within 10 ms */
 	if (time_before(jiffies, s->jiffies + HZ/100))
-		goto out;
+		return;
 
 	rdmsrl(MSR_IA32_APERF, aperf);
 	rdmsrl(MSR_IA32_MPERF, mperf);
@@ -49,7 +49,7 @@ static void aperfmperf_snapshot_khz(void *dummy)
 	 * increments faster than we can read it.
 	 */
 	if (mperf_delta == 0)
-		goto out;
+		return;
 
 	/*
 	 * if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then
@@ -63,12 +63,9 @@ static void aperfmperf_snapshot_khz(void *dummy)
 	s->jiffies = jiffies;
 	s->aperf = aperf;
 	s->mperf = mperf;
-
-out:
-	put_cpu_var(samples);
 }
 
-unsigned int aperfmperf_khz_on_cpu(unsigned int cpu)
+unsigned int arch_freq_get_on_cpu(int cpu)
 {
 	if (!cpu_khz)
 		return 0;
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index a667fac..6e7424d 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -632,6 +632,11 @@ show_one(cpuinfo_transition_latency, cpuinfo.transition_latency);
 show_one(scaling_min_freq, min);
 show_one(scaling_max_freq, max);
 
+__weak unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	return 0;
+}
+
 static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 {
 	ssize_t ret;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index cfc6220..905117b 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -883,18 +883,7 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
 }
 #endif
 
-#ifdef CONFIG_X86
-extern unsigned int aperfmperf_khz_on_cpu(unsigned int cpu);
-static inline unsigned int arch_freq_get_on_cpu(int cpu)
-{
-	return aperfmperf_khz_on_cpu(cpu);
-}
-#else
-static inline unsigned int arch_freq_get_on_cpu(int cpu)
-{
-	return 0;
-}
-#endif
+extern unsigned int arch_freq_get_on_cpu(int cpu);
 
 /* the following are really really optional */
 extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-06-24  5:03 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-17  3:03 [GIT PULL] x86,cpufreq: unify APERF/MPERF computation Len Brown
2017-06-17  3:03 ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Len Brown
2017-06-17  3:03   ` [PATCH 2/4] x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF Len Brown
2017-06-20 10:15     ` Thomas Gleixner
2017-06-24  5:03       ` Len Brown
2017-06-17  3:03   ` [PATCH 3/4] intel_pstate: delete scheduler hook in HWP mode Len Brown
2017-06-17  3:03   ` [PATCH 4/4] intel_pstate: skip scheduler hook when in "performance" mode Len Brown
2017-06-20 10:15   ` [PATCH 1/4] x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" Thomas Gleixner
2017-06-19 12:45 ` [GIT PULL] x86,cpufreq: unify APERF/MPERF computation Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).