linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
@ 2017-11-09 10:38 WANG Chao
  2017-11-09 16:06 ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: WANG Chao @ 2017-11-09 10:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vikas Shivappa,
	Rafael J. Wysocki, Kate Stewart, Len Brown, Greg Kroah-Hartman,
	Philippe Ombredanne, Mathias Krause, x86, stable

Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
a serious performance issue when reading from /proc/cpuinfo on system
with aperfmperf.

For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
On a system with 64 cpus, it takes 1.5s to finish running `cat
/proc/cpuinfo`, while it previously was done in 15ms.

Some programs use /proc/cpuinfo during startup or runtime, eg. QEMU.
This patch will restore the previous behavior and let user space happy.

Fixes: 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo)
Cc: stable@kernel.org   # 4.13
Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
---
 arch/x86/kernel/cpu/Makefile | 2 +-
 arch/x86/kernel/cpu/proc.c   | 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 236999c54edc..c60922a66385 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -22,7 +22,7 @@ obj-y			+= common.o
 obj-y			+= rdrand.o
 obj-y			+= match.o
 obj-y			+= bugs.o
-obj-y			+= aperfmperf.o
+obj-$(CONFIG_CPU_FREQ)	+= aperfmperf.o
 
 obj-$(CONFIG_PROC_FS)	+= proc.o
 obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 4378a729b933..6b7e17bf0b71 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -78,10 +78,8 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
 
 	if (cpu_has(c, X86_FEATURE_TSC)) {
-		unsigned int freq = arch_freq_get_on_cpu(cpu);
+		unsigned int freq = cpufreq_quick_get(cpu);
 
-		if (!freq)
-			freq = cpufreq_quick_get(cpu);
 		if (!freq)
 			freq = cpu_khz;
 		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-09 10:38 [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again WANG Chao
@ 2017-11-09 16:06 ` Rafael J. Wysocki
  2017-11-09 22:30   ` Rafael J. Wysocki
  2017-11-10  7:25   ` Ingo Molnar
  0 siblings, 2 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-09 16:06 UTC (permalink / raw)
  To: WANG Chao, Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Vikas Shivappa, Kate Stewart, Len Brown, Greg Kroah-Hartman,
	Philippe Ombredanne, Mathias Krause, x86, Rafael J. Wysocki,
	Linux PM

Hi Linus,

On 11/9/2017 11:38 AM, WANG Chao wrote:
> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> a serious performance issue when reading from /proc/cpuinfo on system
> with aperfmperf.
>
> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> On a system with 64 cpus, it takes 1.5s to finish running `cat
> /proc/cpuinfo`, while it previously was done in 15ms.

Honestly, I'm not sure what to do to address this ATM.

The last requested frequency is only available in the non-HWP case, so 
it cannot be used universally.

> Some programs use /proc/cpuinfo during startup or runtime, eg. QEMU.
> This patch will restore the previous behavior and let user space happy.
>
> Fixes: 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo)
> Cc: stable@kernel.org   # 4.13
> Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
> ---
>   arch/x86/kernel/cpu/Makefile | 2 +-
>   arch/x86/kernel/cpu/proc.c   | 4 +---
>   2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index 236999c54edc..c60922a66385 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -22,7 +22,7 @@ obj-y			+= common.o
>   obj-y			+= rdrand.o
>   obj-y			+= match.o
>   obj-y			+= bugs.o
> -obj-y			+= aperfmperf.o
> +obj-$(CONFIG_CPU_FREQ)	+= aperfmperf.o
>   
>   obj-$(CONFIG_PROC_FS)	+= proc.o
>   obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o
> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
> index 4378a729b933..6b7e17bf0b71 100644
> --- a/arch/x86/kernel/cpu/proc.c
> +++ b/arch/x86/kernel/cpu/proc.c
> @@ -78,10 +78,8 @@ static int show_cpuinfo(struct seq_file *m, void *v)
>   		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>   
>   	if (cpu_has(c, X86_FEATURE_TSC)) {
> -		unsigned int freq = arch_freq_get_on_cpu(cpu);
> +		unsigned int freq = cpufreq_quick_get(cpu);
>   
> -		if (!freq)
> -			freq = cpufreq_quick_get(cpu);
>   		if (!freq)
>   			freq = cpu_khz;
>   		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-09 16:06 ` Rafael J. Wysocki
@ 2017-11-09 22:30   ` Rafael J. Wysocki
  2017-11-10  0:06     ` Rafael J. Wysocki
  2017-11-10 19:11     ` Linus Torvalds
  2017-11-10  7:25   ` Ingo Molnar
  1 sibling, 2 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-09 22:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: WANG Chao, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Rafael J. Wysocki,
	Linux PM, Rafael J. Wysocki

On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki
<rafael.j.wysocki@intel.com> wrote:
> Hi Linus,
>
> On 11/9/2017 11:38 AM, WANG Chao wrote:
>>
>> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
>> a serious performance issue when reading from /proc/cpuinfo on system
>> with aperfmperf.
>>
>> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
>> On a system with 64 cpus, it takes 1.5s to finish running `cat
>> /proc/cpuinfo`, while it previously was done in 15ms.
>
> Honestly, I'm not sure what to do to address this ATM.
>
> The last requested frequency is only available in the non-HWP case, so it
> cannot be used universally.

OK, here's an idea.

c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
in parallel), then wait for a while (say 5 ms; the current 20 ms wait
is overkill) and then aperfmperf_snapshot_khz() can be run once on
each CPU in show_cpuinfo() without taking the "stale cache" threshold
into account.

I'm going to try that and see how far I can get with it.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-09 22:30   ` Rafael J. Wysocki
@ 2017-11-10  0:06     ` Rafael J. Wysocki
  2017-11-10  4:04       ` WANG Chao
  2017-11-10 19:11     ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-10  0:06 UTC (permalink / raw)
  To: WANG Chao
  Cc: Rafael J. Wysocki, Linus Torvalds, Linux Kernel Mailing List,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vikas Shivappa,
	Kate Stewart, Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote:
> On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki
> <rafael.j.wysocki@intel.com> wrote:
> > Hi Linus,
> >
> > On 11/9/2017 11:38 AM, WANG Chao wrote:
> >>
> >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> >> a serious performance issue when reading from /proc/cpuinfo on system
> >> with aperfmperf.
> >>
> >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> >> On a system with 64 cpus, it takes 1.5s to finish running `cat
> >> /proc/cpuinfo`, while it previously was done in 15ms.
> >
> > Honestly, I'm not sure what to do to address this ATM.
> >
> > The last requested frequency is only available in the non-HWP case, so it
> > cannot be used universally.
> 
> OK, here's an idea.
> 
> c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> is overkill) and then aperfmperf_snapshot_khz() can be run once on
> each CPU in show_cpuinfo() without taking the "stale cache" threshold
> into account.
> 
> I'm going to try that and see how far I can get with it.

Below is what I have.

I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in
aperfmperf_snapshot_all(), because 5 ms tended to add too much
variation to the results on my test box.

I think it may be reduced to 10 ms, though.

Chao, can you please try this one and report back?


---
 arch/x86/kernel/cpu/aperfmperf.c |   42 ++++++++++++++++++++++++++++-----------
 arch/x86/kernel/cpu/cpu.h        |    4 +++
 arch/x86/kernel/cpu/proc.c       |    5 +++-
 3 files changed, 39 insertions(+), 12 deletions(-)

Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c
+++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c
@@ -14,6 +14,8 @@
 #include <linux/percpu.h>
 #include <linux/smp.h>
 
+#include "cpu.h"
+
 struct aperfmperf_sample {
 	unsigned int	khz;
 	ktime_t	time;
@@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
 	u64 aperf, aperf_delta;
 	u64 mperf, mperf_delta;
 	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
-	ktime_t now = ktime_get();
-	s64 time_delta = ktime_ms_delta(now, s->time);
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -57,15 +57,10 @@ static void aperfmperf_snapshot_khz(void
 	if (mperf_delta == 0)
 		return;
 
-	s->time = now;
+	s->time = ktime_get();
 	s->aperf = aperf;
 	s->mperf = mperf;
-
-	/* If the previous iteration was too long ago, discard it. */
-	if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
-		s->khz = 0;
-	else
-		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+	s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
 }
 
 unsigned int arch_freq_get_on_cpu(int cpu)
@@ -82,16 +77,41 @@ unsigned int arch_freq_get_on_cpu(int cp
 	/* Don't bother re-computing within the cache threshold time. */
 	time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
 	khz = per_cpu(samples.khz, cpu);
-	if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
+	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
 		return khz;
 
 	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
 	khz = per_cpu(samples.khz, cpu);
-	if (khz)
+	if (time_delta <= APERFMPERF_STALE_THRESHOLD_MS)
 		return khz;
 
+	/* If the previous iteration was too long ago, take a new data point. */
 	msleep(APERFMPERF_REFRESH_DELAY_MS);
 	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
 
 	return per_cpu(samples.khz, cpu);
 }
+
+void aperfmperf_snapshot_all(void)
+{
+	if (!cpu_khz)
+		return;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return;
+
+	smp_call_function_many(cpu_online_mask, aperfmperf_snapshot_khz, NULL, 1);
+	msleep(APERFMPERF_REFRESH_DELAY_MS);
+}
+
+unsigned int aperfmperf_snapshot_cpu(int cpu)
+{
+	if (!cpu_khz)
+		return 0;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return 0;
+
+	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
+	return per_cpu(samples.khz, cpu);
+}
Index: linux-pm/arch/x86/kernel/cpu/cpu.h
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/cpu.h
+++ linux-pm/arch/x86/kernel/cpu/cpu.h
@@ -47,4 +47,8 @@ extern const struct cpu_dev *const __x86
 
 extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+
+extern unsigned int aperfmperf_snapshot_cpu(int cpu);
+extern void aperfmperf_snapshot_all(void);
+
 #endif /* ARCH_X86_CPU_H */
Index: linux-pm/arch/x86/kernel/cpu/proc.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/proc.c
+++ linux-pm/arch/x86/kernel/cpu/proc.c
@@ -5,6 +5,8 @@
 #include <linux/seq_file.h>
 #include <linux/cpufreq.h>
 
+#include "cpu.h"
+
 /*
  *	Get CPU information for use by the procfs.
  */
@@ -78,7 +80,7 @@ static int show_cpuinfo(struct seq_file
 		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
 
 	if (cpu_has(c, X86_FEATURE_TSC)) {
-		unsigned int freq = arch_freq_get_on_cpu(cpu);
+		unsigned int freq = aperfmperf_snapshot_cpu(cpu);
 
 		if (!freq)
 			freq = cpufreq_quick_get(cpu);
@@ -141,6 +143,7 @@ static int show_cpuinfo(struct seq_file
 
 static void *c_start(struct seq_file *m, loff_t *pos)
 {
+	aperfmperf_snapshot_all();
 	*pos = cpumask_next(*pos - 1, cpu_online_mask);
 	if ((*pos) < nr_cpu_ids)
 		return &cpu_data(*pos);

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-10  0:06     ` Rafael J. Wysocki
@ 2017-11-10  4:04       ` WANG Chao
  2017-11-10  4:11         ` WANG Chao
  0 siblings, 1 reply; 31+ messages in thread
From: WANG Chao @ 2017-11-10  4:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linus Torvalds, Linux Kernel Mailing List,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vikas Shivappa,
	Kate Stewart, Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On 11/10/17 at 01:06P, Rafael J. Wysocki wrote:
> On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote:
> > On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki
> > <rafael.j.wysocki@intel.com> wrote:
> > > Hi Linus,
> > >
> > > On 11/9/2017 11:38 AM, WANG Chao wrote:
> > >>
> > >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > >> a serious performance issue when reading from /proc/cpuinfo on system
> > >> with aperfmperf.
> > >>
> > >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > >> On a system with 64 cpus, it takes 1.5s to finish running `cat
> > >> /proc/cpuinfo`, while it previously was done in 15ms.
> > >
> > > Honestly, I'm not sure what to do to address this ATM.
> > >
> > > The last requested frequency is only available in the non-HWP case, so it
> > > cannot be used universally.
> > 
> > OK, here's an idea.
> > 
> > c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> > in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> > is overkill) and then aperfmperf_snapshot_khz() can be run once on
> > each CPU in show_cpuinfo() without taking the "stale cache" threshold
> > into account.
> > 
> > I'm going to try that and see how far I can get with it.
> 
> Below is what I have.
> 
> I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in
> aperfmperf_snapshot_all(), because 5 ms tended to add too much
> variation to the results on my test box.
> 
> I think it may be reduced to 10 ms, though.
> 
> Chao, can you please try this one and report back?

Hi, Rafael

Thanks for the patch. But it doesn't work for me. lscpu takes 1.5s to
finish on a 64 cpus AMD box with aperfmperf.

You missed the fact that c_start() will also be called by c_next().

But I don't think the overall idea is good enough. I think /proc/cpuinfo
is too general for usespace too be delayed, no matter it's 10ms or 20ms.

My point is cpu MHz is best to use a cached value for quick access. If
people are looking for reliable and accurate cpu frequency,
/proc/cpuinfo is probably a bad idae.

What do you think?

WANG Chao

> 
> 
> ---
>  arch/x86/kernel/cpu/aperfmperf.c |   42 ++++++++++++++++++++++++++++-----------
>  arch/x86/kernel/cpu/cpu.h        |    4 +++
>  arch/x86/kernel/cpu/proc.c       |    5 +++-
>  3 files changed, 39 insertions(+), 12 deletions(-)
> 
> Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c
> +++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c
> @@ -14,6 +14,8 @@
>  #include <linux/percpu.h>
>  #include <linux/smp.h>
>  
> +#include "cpu.h"
> +
>  struct aperfmperf_sample {
>  	unsigned int	khz;
>  	ktime_t	time;
> @@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
>  	u64 aperf, aperf_delta;
>  	u64 mperf, mperf_delta;
>  	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
> -	ktime_t now = ktime_get();
> -	s64 time_delta = ktime_ms_delta(now, s->time);
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> @@ -57,15 +57,10 @@ static void aperfmperf_snapshot_khz(void
>  	if (mperf_delta == 0)
>  		return;
>  
> -	s->time = now;
> +	s->time = ktime_get();
>  	s->aperf = aperf;
>  	s->mperf = mperf;
> -
> -	/* If the previous iteration was too long ago, discard it. */
> -	if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
> -		s->khz = 0;
> -	else
> -		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
> +	s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
>  }
>  
>  unsigned int arch_freq_get_on_cpu(int cpu)
> @@ -82,16 +77,41 @@ unsigned int arch_freq_get_on_cpu(int cp
>  	/* Don't bother re-computing within the cache threshold time. */
>  	time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
>  	khz = per_cpu(samples.khz, cpu);
> -	if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
> +	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
>  		return khz;
>  
>  	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
>  	khz = per_cpu(samples.khz, cpu);
> -	if (khz)
> +	if (time_delta <= APERFMPERF_STALE_THRESHOLD_MS)
>  		return khz;
>  
> +	/* If the previous iteration was too long ago, take a new data point. */
>  	msleep(APERFMPERF_REFRESH_DELAY_MS);
>  	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
>  
>  	return per_cpu(samples.khz, cpu);
>  }
> +
> +void aperfmperf_snapshot_all(void)
> +{
> +	if (!cpu_khz)
> +		return;
> +
> +	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
> +		return;
> +
> +	smp_call_function_many(cpu_online_mask, aperfmperf_snapshot_khz, NULL, 1);
> +	msleep(APERFMPERF_REFRESH_DELAY_MS);
> +}
> +
> +unsigned int aperfmperf_snapshot_cpu(int cpu)
> +{
> +	if (!cpu_khz)
> +		return 0;
> +
> +	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
> +		return 0;
> +
> +	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
> +	return per_cpu(samples.khz, cpu);
> +}
> Index: linux-pm/arch/x86/kernel/cpu/cpu.h
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/cpu/cpu.h
> +++ linux-pm/arch/x86/kernel/cpu/cpu.h
> @@ -47,4 +47,8 @@ extern const struct cpu_dev *const __x86
>  
>  extern void get_cpu_cap(struct cpuinfo_x86 *c);
>  extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
> +
> +extern unsigned int aperfmperf_snapshot_cpu(int cpu);
> +extern void aperfmperf_snapshot_all(void);
> +
>  #endif /* ARCH_X86_CPU_H */
> Index: linux-pm/arch/x86/kernel/cpu/proc.c
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/cpu/proc.c
> +++ linux-pm/arch/x86/kernel/cpu/proc.c
> @@ -5,6 +5,8 @@
>  #include <linux/seq_file.h>
>  #include <linux/cpufreq.h>
>  
> +#include "cpu.h"
> +
>  /*
>   *	Get CPU information for use by the procfs.
>   */
> @@ -78,7 +80,7 @@ static int show_cpuinfo(struct seq_file
>  		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>  
>  	if (cpu_has(c, X86_FEATURE_TSC)) {
> -		unsigned int freq = arch_freq_get_on_cpu(cpu);
> +		unsigned int freq = aperfmperf_snapshot_cpu(cpu);
>  
>  		if (!freq)
>  			freq = cpufreq_quick_get(cpu);
> @@ -141,6 +143,7 @@ static int show_cpuinfo(struct seq_file
>  
>  static void *c_start(struct seq_file *m, loff_t *pos)
>  {
> +	aperfmperf_snapshot_all();
>  	*pos = cpumask_next(*pos - 1, cpu_online_mask);
>  	if ((*pos) < nr_cpu_ids)
>  		return &cpu_data(*pos);
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-10  4:04       ` WANG Chao
@ 2017-11-10  4:11         ` WANG Chao
  0 siblings, 0 replies; 31+ messages in thread
From: WANG Chao @ 2017-11-10  4:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linus Torvalds, Linux Kernel Mailing List,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vikas Shivappa,
	Kate Stewart, Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On 11/10/17 at 12:04P, WANG Chao wrote:
> On 11/10/17 at 01:06P, Rafael J. Wysocki wrote:
> > On Thursday, November 9, 2017 11:30:54 PM CET Rafael J. Wysocki wrote:
> > > On Thu, Nov 9, 2017 at 5:06 PM, Rafael J. Wysocki
> > > <rafael.j.wysocki@intel.com> wrote:
> > > > Hi Linus,
> > > >
> > > > On 11/9/2017 11:38 AM, WANG Chao wrote:
> > > >>
> > > >> Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > > >> a serious performance issue when reading from /proc/cpuinfo on system
> > > >> with aperfmperf.
> > > >>
> > > >> For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > > >> On a system with 64 cpus, it takes 1.5s to finish running `cat
> > > >> /proc/cpuinfo`, while it previously was done in 15ms.
> > > >
> > > > Honestly, I'm not sure what to do to address this ATM.
> > > >
> > > > The last requested frequency is only available in the non-HWP case, so it
> > > > cannot be used universally.
> > > 
> > > OK, here's an idea.
> > > 
> > > c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> > > in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> > > is overkill) and then aperfmperf_snapshot_khz() can be run once on
> > > each CPU in show_cpuinfo() without taking the "stale cache" threshold
> > > into account.
> > > 
> > > I'm going to try that and see how far I can get with it.
> > 
> > Below is what I have.
> > 
> > I ended up using APERFMPERF_REFRESH_DELAY_MS for the delay in
> > aperfmperf_snapshot_all(), because 5 ms tended to add too much
> > variation to the results on my test box.
> > 
> > I think it may be reduced to 10 ms, though.
> > 
> > Chao, can you please try this one and report back?
> 
> Hi, Rafael
> 
> Thanks for the patch. But it doesn't work for me. lscpu takes 1.5s to
> finish on a 64 cpus AMD box with aperfmperf.
> 
> You missed the fact that c_start() will also be called by c_next().
> 
> But I don't think the overall idea is good enough. I think /proc/cpuinfo
> is too general for usespace too be delayed, no matter it's 10ms or 20ms.
> 
> My point is cpu MHz is best to use a cached value for quick access. If
> people are looking for reliable and accurate cpu frequency,
> /proc/cpuinfo is probably a bad idae.
> 
> What do you think?

Could you also explain 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in
/proc/cpuinfo) please? The commit message is not clear for me.

Are there any upstream disscutions? I wasn't following this change in
upstream. Now I can't find any.

Thanks,
WANG Chao

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-09 16:06 ` Rafael J. Wysocki
  2017-11-09 22:30   ` Rafael J. Wysocki
@ 2017-11-10  7:25   ` Ingo Molnar
  2017-11-10  9:21     ` WANG Chao
  1 sibling, 1 reply; 31+ messages in thread
From: Ingo Molnar @ 2017-11-10  7:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: WANG Chao, Linus Torvalds, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, x86, Rafael J. Wysocki, Linux PM


* Rafael J. Wysocki <rafael.j.wysocki@intel.com> wrote:

> Hi Linus,
> 
> On 11/9/2017 11:38 AM, WANG Chao wrote:
> > Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > a serious performance issue when reading from /proc/cpuinfo on system
> > with aperfmperf.
> > 
> > For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > On a system with 64 cpus, it takes 1.5s to finish running `cat
> > /proc/cpuinfo`, while it previously was done in 15ms.
> 
> Honestly, I'm not sure what to do to address this ATM.
> 
> The last requested frequency is only available in the non-HWP case, so it
> cannot be used universally.

This is a serious regression that needs to be fixed ASAP, because the slowdown is 
utterly ridiculous on a 120 CPU system:

  fomalhaut:~> time cat /proc/cpuinfo  >/dev/null

  real    0m2.689s
  user    0m0.001s
  sys     0m0.007s

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-10  7:25   ` Ingo Molnar
@ 2017-11-10  9:21     ` WANG Chao
  0 siblings, 0 replies; 31+ messages in thread
From: WANG Chao @ 2017-11-10  9:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linus Torvalds, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, x86, Rafael J. Wysocki, Linux PM

On 11/10/17 at 08:25P, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rafael.j.wysocki@intel.com> wrote:
> 
> > Hi Linus,
> > 
> > On 11/9/2017 11:38 AM, WANG Chao wrote:
> > > Commit 941f5f0f6ef5 (x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo) caused
> > > a serious performance issue when reading from /proc/cpuinfo on system
> > > with aperfmperf.
> > > 
> > > For each cpu, arch_freq_get_on_cpu() sleeps 20ms to get its frequency.
> > > On a system with 64 cpus, it takes 1.5s to finish running `cat
> > > /proc/cpuinfo`, while it previously was done in 15ms.
> > 
> > Honestly, I'm not sure what to do to address this ATM.
> > 
> > The last requested frequency is only available in the non-HWP case, so it
> > cannot be used universally.
> 
> This is a serious regression that needs to be fixed ASAP, because the slowdown is 
> utterly ridiculous on a 120 CPU system:
> 
>   fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
> 
>   real    0m2.689s
>   user    0m0.001s
>   sys     0m0.007s

Because 4.14 is about to release, how about adding this patch to 4.14
now? We can work on a more sophisticated solution later.

Thanks,
WANG Chao

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-09 22:30   ` Rafael J. Wysocki
  2017-11-10  0:06     ` Rafael J. Wysocki
@ 2017-11-10 19:11     ` Linus Torvalds
  2017-11-10 23:09       ` Rafael J. Wysocki
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2017-11-10 19:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: WANG Chao, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Thu, Nov 9, 2017 at 2:30 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> is overkill) and then aperfmperf_snapshot_khz() can be run once on
> each CPU in show_cpuinfo() without taking the "stale cache" threshold
> into account.

Yeah, that won't work.

What could work is to do that "smp_call_function_many()" at open time,
and *not* set the "wait" flag, but do it entirely asynchronously.

But I don't think that's an option for 4.14 ;(

So I guess I'll have to revert.

            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-10 19:11     ` Linus Torvalds
@ 2017-11-10 23:09       ` Rafael J. Wysocki
  2017-11-14 22:47         ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-10 23:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, WANG Chao, Linux Kernel Mailing List,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Vikas Shivappa,
	Kate Stewart, Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Fri, Nov 10, 2017 at 8:11 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Nov 9, 2017 at 2:30 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
>> in parallel), then wait for a while (say 5 ms; the current 20 ms wait
>> is overkill) and then aperfmperf_snapshot_khz() can be run once on
>> each CPU in show_cpuinfo() without taking the "stale cache" threshold
>> into account.
>
> Yeah, that won't work.

Indeed.

> What could work is to do that "smp_call_function_many()" at open time,
> and *not* set the "wait" flag, but do it entirely asynchronously.

Right.

> But I don't think that's an option for 4.14 ;(

Agreed.

> So I guess I'll have to revert.

Sure thing (and I see that you've reverted it already).

The reason why I wanted to fix this up before the final 4.14 is that
the "cpu MHz" behavior is kind of inconsistent now (generally, it is
either constant or the last requested frequency depending on the
cpufreq configuration), but that's not a blocker by any measure IMO.

Anyway, I'll try to come up with something better next week.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-10 23:09       ` Rafael J. Wysocki
@ 2017-11-14 22:47         ` Rafael J. Wysocki
  2017-11-14 23:02           ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-14 22:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: WANG Chao, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Saturday, November 11, 2017 12:09:18 AM CET Rafael J. Wysocki wrote:
> On Fri, Nov 10, 2017 at 8:11 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Thu, Nov 9, 2017 at 2:30 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>
> >> c_start() can run aperfmperf_snapshot_khz() on all CPUs upfront (say
> >> in parallel), then wait for a while (say 5 ms; the current 20 ms wait
> >> is overkill) and then aperfmperf_snapshot_khz() can be run once on
> >> each CPU in show_cpuinfo() without taking the "stale cache" threshold
> >> into account.
> >
> > Yeah, that won't work.
> 
> Indeed.
> 
> > What could work is to do that "smp_call_function_many()" at open time,
> > and *not* set the "wait" flag, but do it entirely asynchronously.
> 
> Right.
> 
> > But I don't think that's an option for 4.14 ;(
> 
> Agreed.
> 
> > So I guess I'll have to revert.
> 
> Sure thing (and I see that you've reverted it already).
> 
> The reason why I wanted to fix this up before the final 4.14 is that
> the "cpu MHz" behavior is kind of inconsistent now (generally, it is
> either constant or the last requested frequency depending on the
> cpufreq configuration), but that's not a blocker by any measure IMO.
> 
> Anyway, I'll try to come up with something better next week.

So what about the one below?  It works for me as expected.

I'm not super-happy with the __weak thing, but I kind of prefer it to
adding an extra callback to struct seq_operations just for cpuinfo on x86.

---
 arch/x86/kernel/cpu/aperfmperf.c |   74 +++++++++++++++++++++++++++------------
 arch/x86/kernel/cpu/cpu.h        |    3 +
 arch/x86/kernel/cpu/proc.c       |    6 ++-
 fs/proc/cpuinfo.c                |    6 +++
 include/linux/cpufreq.h          |    1 
 5 files changed, 67 insertions(+), 23 deletions(-)

Index: linux-pm/arch/x86/kernel/cpu/proc.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/proc.c
+++ linux-pm/arch/x86/kernel/cpu/proc.c
@@ -5,6 +5,8 @@
 #include <linux/seq_file.h>
 #include <linux/cpufreq.h>
 
+#include "cpu.h"
+
 /*
  *	Get CPU information for use by the procfs.
  */
@@ -78,9 +80,11 @@ static int show_cpuinfo(struct seq_file
 		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
 
 	if (cpu_has(c, X86_FEATURE_TSC)) {
-		unsigned int freq = cpufreq_quick_get(cpu);
+		unsigned int freq = aperfmperf_get_khz(cpu);
 
 		if (!freq)
+			freq = cpufreq_quick_get(cpu);
+		if (!freq)
 			freq = cpu_khz;
 		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
 			   freq / 1000, (freq % 1000));
Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c
+++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c
@@ -14,6 +14,8 @@
 #include <linux/percpu.h>
 #include <linux/smp.h>
 
+#include "cpu.h"
+
 struct aperfmperf_sample {
 	unsigned int	khz;
 	ktime_t	time;
@@ -24,7 +26,7 @@ struct aperfmperf_sample {
 static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
 
 #define APERFMPERF_CACHE_THRESHOLD_MS	10
-#define APERFMPERF_REFRESH_DELAY_MS	20
+#define APERFMPERF_REFRESH_DELAY_MS	10
 #define APERFMPERF_STALE_THRESHOLD_MS	1000
 
 /*
@@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
 	u64 aperf, aperf_delta;
 	u64 mperf, mperf_delta;
 	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
-	ktime_t now = ktime_get();
-	s64 time_delta = ktime_ms_delta(now, s->time);
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -57,38 +57,68 @@ static void aperfmperf_snapshot_khz(void
 	if (mperf_delta == 0)
 		return;
 
-	s->time = now;
+	s->time = ktime_get();
 	s->aperf = aperf;
 	s->mperf = mperf;
-
-	/* If the previous iteration was too long ago, discard it. */
-	if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
-		s->khz = 0;
-	else
-		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+	s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
 }
 
-unsigned int arch_freq_get_on_cpu(int cpu)
+static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 {
-	s64 time_delta;
-	unsigned int khz;
+	s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
+
+	/* Don't bother re-computing within the cache threshold time. */
+	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
+		return true;
+
+	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+
+	/* Return false if the previous iteration was too long ago. */
+	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
+}
 
+unsigned int aperfmperf_get_khz(int cpu)
+{
 	if (!cpu_khz)
 		return 0;
 
 	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
 		return 0;
 
-	/* Don't bother re-computing within the cache threshold time. */
-	time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
-	khz = per_cpu(samples.khz, cpu);
-	if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
-		return khz;
+	aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
+	return per_cpu(samples.khz, cpu);
+}
 
-	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
-	khz = per_cpu(samples.khz, cpu);
-	if (khz)
-		return khz;
+void arch_freq_prepare_all(void)
+{
+	ktime_t now = ktime_get();
+	bool wait = false;
+	int cpu;
+
+	if (!cpu_khz)
+		return;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return;
+
+	for_each_online_cpu(cpu)
+		if (!aperfmperf_snapshot_cpu(cpu, now, false))
+			wait = true;
+
+	if (wait)
+		msleep(APERFMPERF_REFRESH_DELAY_MS);
+}
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	if (!cpu_khz)
+		return 0;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return 0;
+
+	if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
+		return per_cpu(samples.khz, cpu);
 
 	msleep(APERFMPERF_REFRESH_DELAY_MS);
 	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
Index: linux-pm/arch/x86/kernel/cpu/cpu.h
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/cpu.h
+++ linux-pm/arch/x86/kernel/cpu/cpu.h
@@ -47,4 +47,7 @@ extern const struct cpu_dev *const __x86
 
 extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+
+unsigned int aperfmperf_get_khz(int cpu);
+
 #endif /* ARCH_X86_CPU_H */
Index: linux-pm/fs/proc/cpuinfo.c
===================================================================
--- linux-pm.orig/fs/proc/cpuinfo.c
+++ linux-pm/fs/proc/cpuinfo.c
@@ -1,12 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/cpufreq.h>
 #include <linux/fs.h>
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 
+__weak void arch_freq_prepare_all(void)
+{
+}
+
 extern const struct seq_operations cpuinfo_op;
 static int cpuinfo_open(struct inode *inode, struct file *file)
 {
+	arch_freq_prepare_all();
 	return seq_open(file, &cpuinfo_op);
 }
 
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -917,6 +917,7 @@ static inline bool policy_has_boost_freq
 }
 #endif
 
+extern void arch_freq_prepare_all(void);
 extern unsigned int arch_freq_get_on_cpu(int cpu);
 
 extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-14 22:47         ` Rafael J. Wysocki
@ 2017-11-14 23:02           ` Linus Torvalds
  2017-11-14 23:53             ` Thomas Gleixner
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2017-11-14 23:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: WANG Chao, Linux Kernel Mailing List, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Tue, Nov 14, 2017 at 2:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> So what about the one below?  It works for me as expected.

Can somebody with 100+ cores test this? Ingo? You had a 160 core
machine that took forever, didn't you..

            Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-14 23:02           ` Linus Torvalds
@ 2017-11-14 23:53             ` Thomas Gleixner
  2017-11-15  0:04               ` Linus Torvalds
  2017-11-15  0:06               ` Rafael J. Wysocki
  0 siblings, 2 replies; 31+ messages in thread
From: Thomas Gleixner @ 2017-11-14 23:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Tue, 14 Nov 2017, Linus Torvalds wrote:

> On Tue, Nov 14, 2017 at 2:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > So what about the one below?  It works for me as expected.
> 
> Can somebody with 100+ cores test this? Ingo? You had a 160 core
> machine that took forever, didn't you..

On a 144 CPUs machine:

time cat /proc/cpuinfo >/dev/null

Current head:

real	0m0.003s
user	0m0.000s
sys	0m0.002s

Current head + Raphaels patch:

real	0m0.029s
user	0m0.000s
sys	0m0.010s

So that patch is actually slower.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-14 23:53             ` Thomas Gleixner
@ 2017-11-15  0:04               ` Linus Torvalds
  2017-11-15  0:06                 ` Linus Torvalds
  2017-11-15  8:47                 ` Thomas Gleixner
  2017-11-15  0:06               ` Rafael J. Wysocki
  1 sibling, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2017-11-15  0:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Current head + Raphaels patch:
>
> real    0m0.029s
> user    0m0.000s
> sys     0m0.010s
>
> So that patch is actually slower.

Oh it definitely is expected to be slower, because it does the IPI to
all the cores and actually gets their frequency right.

It was the old one that we had to revert (because it did so
sequentially) that was really bad, and took something like 2+ seconds
on Ingo's 160-core thing, iirc.

It sounds like the current patch is slower, but likely acceptable
considering that you get the right results now ..

          Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  0:04               ` Linus Torvalds
@ 2017-11-15  0:06                 ` Linus Torvalds
  2017-11-15  0:30                   ` Rafael J. Wysocki
  2017-11-15  8:47                 ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2017-11-15  0:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Tue, Nov 14, 2017 at 4:04 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> Current head + Raphaels patch:
>>
>> real    0m0.029s
>> user    0m0.000s
>> sys     0m0.010s
>>
>> So that patch is actually slower.
>
> Oh it definitely is expected to be slower, because it does the IPI to
> all the cores and actually gets their frequency right.
>
> It was the old one that we had to revert (because it did so
> sequentially) that was really bad, and took something like 2+ seconds
> on Ingo's 160-core thing, iirc.

Looked it up. Ingo's machine "only" had 120 cores, and he said

    fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
    real    0m2.689s

for the bad serial case, so yeah, it looks "a bit" better than it was ;)

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-14 23:53             ` Thomas Gleixner
  2017-11-15  0:04               ` Linus Torvalds
@ 2017-11-15  0:06               ` Rafael J. Wysocki
  1 sibling, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-15  0:06 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Wednesday, November 15, 2017 12:53:24 AM CET Thomas Gleixner wrote:
> On Tue, 14 Nov 2017, Linus Torvalds wrote:
> 
> > On Tue, Nov 14, 2017 at 2:47 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > So what about the one below?  It works for me as expected.
> > 
> > Can somebody with 100+ cores test this? Ingo? You had a 160 core
> > machine that took forever, didn't you..
> 
> On a 144 CPUs machine:
> 
> time cat /proc/cpuinfo >/dev/null
> 
> Current head:
> 
> real	0m0.003s
> user	0m0.000s
> sys	0m0.002s
> 
> Current head + Raphaels patch:
> 
> real	0m0.029s
> user	0m0.000s
> sys	0m0.010s
> 
> So that patch is actually slower.

Thanks for testing!

It is adding a delay (mostly because it has to allow APERF and MPERF to grow
somewhat for the frequency computation to produce a useful result), so it will
be slower even in theory, but really the question is whether or not the slow
down is acceptable.

It doesn't look horrible to me, but that's my patch after all. :-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  0:06                 ` Linus Torvalds
@ 2017-11-15  0:30                   ` Rafael J. Wysocki
  2017-11-15  0:34                     ` Linus Torvalds
  2017-11-15  7:43                     ` [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again Ingo Molnar
  0 siblings, 2 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-15  0:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Wednesday, November 15, 2017 1:06:12 AM CET Linus Torvalds wrote:
> On Tue, Nov 14, 2017 at 4:04 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >> Current head + Raphaels patch:
> >>
> >> real    0m0.029s
> >> user    0m0.000s
> >> sys     0m0.010s
> >>
> >> So that patch is actually slower.
> >
> > Oh it definitely is expected to be slower, because it does the IPI to
> > all the cores and actually gets their frequency right.
> >
> > It was the old one that we had to revert (because it did so
> > sequentially) that was really bad, and took something like 2+ seconds
> > on Ingo's 160-core thing, iirc.
> 
> Looked it up. Ingo's machine "only" had 120 cores, and he said
> 
>     fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
>     real    0m2.689s
> 
> for the bad serial case, so yeah, it looks "a bit" better than it was ;)

OK, so may I queue it up?

I don't think I can get that to work substantially faster anyway ...

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  0:30                   ` Rafael J. Wysocki
@ 2017-11-15  0:34                     ` Linus Torvalds
  2017-11-15  1:13                       ` [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo Rafael J. Wysocki
  2017-11-15  7:43                     ` [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again Ingo Molnar
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2017-11-15  0:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Tue, Nov 14, 2017 at 4:30 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> OK, so may I queue it up?
>
> I don't think I can get that to work substantially faster anyway ...

Yeah, I think it's ok.

If somebody is really doing that /proc/cpuinfo in a loop, the rate
limiting should kick in. And if you do it only once, getting the right
frequency is worth it, I think.

             Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-15  0:34                     ` Linus Torvalds
@ 2017-11-15  1:13                       ` Rafael J. Wysocki
  2017-11-15  8:47                         ` Thomas Gleixner
  2017-11-15  9:33                         ` WANG Chao
  0 siblings, 2 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-15  1:13 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Linus Torvalds, Thomas Gleixner, WANG Chao, Ingo Molnar,
	H. Peter Anvin, Vikas Shivappa, Kate Stewart, Len Brown,
	Greg Kroah-Hartman, Philippe Ombredanne, Mathias Krause,
	the arch/x86 maintainers, Linux PM, Rafael J. Wysocki

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration.  That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".

However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.

Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.

Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

Resent with a changelog & tags.

I'm going to route it via the linux-pm tree.

---
 arch/x86/kernel/cpu/aperfmperf.c |   74 +++++++++++++++++++++++++++------------
 arch/x86/kernel/cpu/cpu.h        |    3 +
 arch/x86/kernel/cpu/proc.c       |    6 ++-
 fs/proc/cpuinfo.c                |    6 +++
 include/linux/cpufreq.h          |    1 
 5 files changed, 67 insertions(+), 23 deletions(-)

Index: linux-pm/arch/x86/kernel/cpu/proc.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/proc.c
+++ linux-pm/arch/x86/kernel/cpu/proc.c
@@ -5,6 +5,8 @@
 #include <linux/seq_file.h>
 #include <linux/cpufreq.h>
 
+#include "cpu.h"
+
 /*
  *	Get CPU information for use by the procfs.
  */
@@ -78,9 +80,11 @@ static int show_cpuinfo(struct seq_file
 		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
 
 	if (cpu_has(c, X86_FEATURE_TSC)) {
-		unsigned int freq = cpufreq_quick_get(cpu);
+		unsigned int freq = aperfmperf_get_khz(cpu);
 
 		if (!freq)
+			freq = cpufreq_quick_get(cpu);
+		if (!freq)
 			freq = cpu_khz;
 		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
 			   freq / 1000, (freq % 1000));
Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c
+++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c
@@ -14,6 +14,8 @@
 #include <linux/percpu.h>
 #include <linux/smp.h>
 
+#include "cpu.h"
+
 struct aperfmperf_sample {
 	unsigned int	khz;
 	ktime_t	time;
@@ -24,7 +26,7 @@ struct aperfmperf_sample {
 static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
 
 #define APERFMPERF_CACHE_THRESHOLD_MS	10
-#define APERFMPERF_REFRESH_DELAY_MS	20
+#define APERFMPERF_REFRESH_DELAY_MS	10
 #define APERFMPERF_STALE_THRESHOLD_MS	1000
 
 /*
@@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
 	u64 aperf, aperf_delta;
 	u64 mperf, mperf_delta;
 	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
-	ktime_t now = ktime_get();
-	s64 time_delta = ktime_ms_delta(now, s->time);
 	unsigned long flags;
 
 	local_irq_save(flags);
@@ -57,38 +57,68 @@ static void aperfmperf_snapshot_khz(void
 	if (mperf_delta == 0)
 		return;
 
-	s->time = now;
+	s->time = ktime_get();
 	s->aperf = aperf;
 	s->mperf = mperf;
-
-	/* If the previous iteration was too long ago, discard it. */
-	if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
-		s->khz = 0;
-	else
-		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
+	s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
 }
 
-unsigned int arch_freq_get_on_cpu(int cpu)
+static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
 {
-	s64 time_delta;
-	unsigned int khz;
+	s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
+
+	/* Don't bother re-computing within the cache threshold time. */
+	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
+		return true;
+
+	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
+
+	/* Return false if the previous iteration was too long ago. */
+	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
+}
 
+unsigned int aperfmperf_get_khz(int cpu)
+{
 	if (!cpu_khz)
 		return 0;
 
 	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
 		return 0;
 
-	/* Don't bother re-computing within the cache threshold time. */
-	time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
-	khz = per_cpu(samples.khz, cpu);
-	if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
-		return khz;
+	aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
+	return per_cpu(samples.khz, cpu);
+}
 
-	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
-	khz = per_cpu(samples.khz, cpu);
-	if (khz)
-		return khz;
+void arch_freq_prepare_all(void)
+{
+	ktime_t now = ktime_get();
+	bool wait = false;
+	int cpu;
+
+	if (!cpu_khz)
+		return;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return;
+
+	for_each_online_cpu(cpu)
+		if (!aperfmperf_snapshot_cpu(cpu, now, false))
+			wait = true;
+
+	if (wait)
+		msleep(APERFMPERF_REFRESH_DELAY_MS);
+}
+
+unsigned int arch_freq_get_on_cpu(int cpu)
+{
+	if (!cpu_khz)
+		return 0;
+
+	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
+		return 0;
+
+	if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
+		return per_cpu(samples.khz, cpu);
 
 	msleep(APERFMPERF_REFRESH_DELAY_MS);
 	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
Index: linux-pm/arch/x86/kernel/cpu/cpu.h
===================================================================
--- linux-pm.orig/arch/x86/kernel/cpu/cpu.h
+++ linux-pm/arch/x86/kernel/cpu/cpu.h
@@ -47,4 +47,7 @@ extern const struct cpu_dev *const __x86
 
 extern void get_cpu_cap(struct cpuinfo_x86 *c);
 extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+
+unsigned int aperfmperf_get_khz(int cpu);
+
 #endif /* ARCH_X86_CPU_H */
Index: linux-pm/fs/proc/cpuinfo.c
===================================================================
--- linux-pm.orig/fs/proc/cpuinfo.c
+++ linux-pm/fs/proc/cpuinfo.c
@@ -1,12 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/cpufreq.h>
 #include <linux/fs.h>
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 
+__weak void arch_freq_prepare_all(void)
+{
+}
+
 extern const struct seq_operations cpuinfo_op;
 static int cpuinfo_open(struct inode *inode, struct file *file)
 {
+	arch_freq_prepare_all();
 	return seq_open(file, &cpuinfo_op);
 }
 
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -917,6 +917,7 @@ static inline bool policy_has_boost_freq
 }
 #endif
 
+extern void arch_freq_prepare_all(void);
 extern unsigned int arch_freq_get_on_cpu(int cpu);
 
 extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  0:30                   ` Rafael J. Wysocki
  2017-11-15  0:34                     ` Linus Torvalds
@ 2017-11-15  7:43                     ` Ingo Molnar
  2017-11-15  7:54                       ` Greg Kroah-Hartman
  2017-11-15 17:27                       ` Linus Torvalds
  1 sibling, 2 replies; 31+ messages in thread
From: Ingo Molnar @ 2017-11-15  7:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Thomas Gleixner, WANG Chao,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	Vikas Shivappa, Kate Stewart, Len Brown, Greg Kroah-Hartman,
	Philippe Ombredanne, Mathias Krause, the arch/x86 maintainers,
	Linux PM, Rafael J. Wysocki


* Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> On Wednesday, November 15, 2017 1:06:12 AM CET Linus Torvalds wrote:
> > On Tue, Nov 14, 2017 at 4:04 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > >> Current head + Raphaels patch:
> > >>
> > >> real    0m0.029s
> > >> user    0m0.000s
> > >> sys     0m0.010s
> > >>
> > >> So that patch is actually slower.
> > >
> > > Oh it definitely is expected to be slower, because it does the IPI to
> > > all the cores and actually gets their frequency right.
> > >
> > > It was the old one that we had to revert (because it did so
> > > sequentially) that was really bad, and took something like 2+ seconds
> > > on Ingo's 160-core thing, iirc.
> > 
> > Looked it up. Ingo's machine "only" had 120 cores, and he said
> > 
> >     fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
> >     real    0m2.689s
> > 
> > for the bad serial case, so yeah, it looks "a bit" better than it was ;)
> 
> OK, so may I queue it up?
> 
> I don't think I can get that to work substantially faster anyway ...

The new version is OK I suppose:

  Acked-by: Ingo Molnar <mingo@kernel.org>

I also think that /proc/cpuinfo is a pretty bad interface for many uses - I 
personally only very rarely need the cpuinfo of _all_ CPUs.

We we should eventually have /proc/cpu/N/info or so, so that 99% of the times 
cpuinfo is needed to report bugs we can do:

	cat /proc/cpu/0/info

With maybe also the following variants:

	/proc/cpu/first/
	/proc/cpu/last/
	/proc/cpu/current/

... to the first/last/current CPUs.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  7:43                     ` [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again Ingo Molnar
@ 2017-11-15  7:54                       ` Greg Kroah-Hartman
  2017-11-15 17:27                       ` Linus Torvalds
  1 sibling, 0 replies; 31+ messages in thread
From: Greg Kroah-Hartman @ 2017-11-15  7:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Linus Torvalds, Thomas Gleixner, WANG Chao,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	Vikas Shivappa, Kate Stewart, Len Brown, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Wed, Nov 15, 2017 at 08:43:58AM +0100, Ingo Molnar wrote:
> 
> * Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> 
> > On Wednesday, November 15, 2017 1:06:12 AM CET Linus Torvalds wrote:
> > > On Tue, Nov 14, 2017 at 4:04 PM, Linus Torvalds
> > > <torvalds@linux-foundation.org> wrote:
> > > > On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > > >> Current head + Raphaels patch:
> > > >>
> > > >> real    0m0.029s
> > > >> user    0m0.000s
> > > >> sys     0m0.010s
> > > >>
> > > >> So that patch is actually slower.
> > > >
> > > > Oh it definitely is expected to be slower, because it does the IPI to
> > > > all the cores and actually gets their frequency right.
> > > >
> > > > It was the old one that we had to revert (because it did so
> > > > sequentially) that was really bad, and took something like 2+ seconds
> > > > on Ingo's 160-core thing, iirc.
> > > 
> > > Looked it up. Ingo's machine "only" had 120 cores, and he said
> > > 
> > >     fomalhaut:~> time cat /proc/cpuinfo  >/dev/null
> > >     real    0m2.689s
> > > 
> > > for the bad serial case, so yeah, it looks "a bit" better than it was ;)
> > 
> > OK, so may I queue it up?
> > 
> > I don't think I can get that to work substantially faster anyway ...
> 
> The new version is OK I suppose:
> 
>   Acked-by: Ingo Molnar <mingo@kernel.org>
> 
> I also think that /proc/cpuinfo is a pretty bad interface for many uses - I 
> personally only very rarely need the cpuinfo of _all_ CPUs.
> 
> We we should eventually have /proc/cpu/N/info or so, so that 99% of the times 
> cpuinfo is needed to report bugs we can do:
> 
> 	cat /proc/cpu/0/info
> 
> With maybe also the following variants:
> 
> 	/proc/cpu/first/
> 	/proc/cpu/last/
> 	/proc/cpu/current/
> 
> ... to the first/last/current CPUs.

We started to move this info into /sys/devices/cpu/ in individual files,
but that got stalled due to a lack of review and general "freak out" by
the ARM maintainers :)

Hopefully that patch set will come back soon so people can review it
properly.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  0:04               ` Linus Torvalds
  2017-11-15  0:06                 ` Linus Torvalds
@ 2017-11-15  8:47                 ` Thomas Gleixner
  1 sibling, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2017-11-15  8:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, WANG Chao, Linux Kernel Mailing List,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Tue, 14 Nov 2017, Linus Torvalds wrote:
> On Tue, Nov 14, 2017 at 3:53 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> > Current head + Raphaels patch:
> >
> > real    0m0.029s
> > user    0m0.000s
> > sys     0m0.010s
> >
> > So that patch is actually slower.
> 
> Oh it definitely is expected to be slower, because it does the IPI to
> all the cores and actually gets their frequency right.
> 
> It was the old one that we had to revert (because it did so
> sequentially) that was really bad, and took something like 2+ seconds
> on Ingo's 160-core thing, iirc.
> 

Tired brain did not connect it to the revert.

On that machine with ea0ee3398877: Revert "x86: CPU: Fix up "cpu MHz" in
/proc/cpuinfo" reverted it takes:

real	0m4.497s
user	0m0.012s
sys	0m0.000s

> It sounds like the current patch is slower, but likely acceptable
> considering that you get the right results now ..

Correct and the factor 10, i.e. 30ms vs. 3ms is not horrible, while the 4.5
seconds are.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-15  1:13                       ` [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo Rafael J. Wysocki
@ 2017-11-15  8:47                         ` Thomas Gleixner
  2017-11-15  9:33                         ` WANG Chao
  1 sibling, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2017-11-15  8:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Linus Torvalds, WANG Chao,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Wed, 15 Nov 2017, Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> on x86 can be either the nominal CPU frequency (which is constant)
> or the frequency most recently requested by a scaling governor in
> cpufreq, depending on the cpufreq configuration.  That is somewhat
> inconsistent and is different from what it was before 4.13, so in
> order to restore the previous behavior, make it report the current
> CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> 
> To that end, modify the /proc/cpuinfo implementation on x86 to use
> aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> registers, if available, and use their values to compute the CPU
> frequency to be reported as "cpu MHz".
> 
> However, do that carefully enough to avoid accumulating delays that
> lead to unacceptable access times for /proc/cpuinfo on systems with
> many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> asynchronously at the /proc/cpuinfo open time, add a single delay
> upfront (if necessary) at that point and simply compute the current
> frequency while running show_cpuinfo() for each individual CPU.
> 
> Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
> the default delay between consecutive APERF and MPERF reads to 10 ms,
> which should be sufficient to get large enough numbers for the
> frequency computation in all cases.
> 
> Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-15  1:13                       ` [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo Rafael J. Wysocki
  2017-11-15  8:47                         ` Thomas Gleixner
@ 2017-11-15  9:33                         ` WANG Chao
  2017-11-16  0:24                           ` Rafael J. Wysocki
  1 sibling, 1 reply; 31+ messages in thread
From: WANG Chao @ 2017-11-15  9:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On 11/15/17 at 02:13P, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> on x86 can be either the nominal CPU frequency (which is constant)
> or the frequency most recently requested by a scaling governor in
> cpufreq, depending on the cpufreq configuration.  That is somewhat
> inconsistent and is different from what it was before 4.13, so in
> order to restore the previous behavior, make it report the current
> CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> 
> To that end, modify the /proc/cpuinfo implementation on x86 to use
> aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> registers, if available, and use their values to compute the CPU
> frequency to be reported as "cpu MHz".
> 
> However, do that carefully enough to avoid accumulating delays that
> lead to unacceptable access times for /proc/cpuinfo on systems with
> many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> asynchronously at the /proc/cpuinfo open time, add a single delay
> upfront (if necessary) at that point and simply compute the current
> frequency while running show_cpuinfo() for each individual CPU.

Hi, Rafael

I tested your patch. It's much faster.

But from what I got, calling aperfmperf_snapshot_khz() asynchronously
with 10ms sleep takes much longer than calling aperfmperf_snapshot_khz()
synchronously.

Here's my result on 64 CPUs:

 - async aperfmperf_snapshot_khz() w/ 10ms sleep:

# time cat /proc/cpuinfo > /dev/null
real    0m0.014s
user    0m0.000s
sys     0m0.002s

 - sync aperfmperf_snapshot_khz() w/o any sleep:

# time cat /proc/cpuinfo > /dev/null
real    0m0.002s
user    0m0.000s
sys     0m0.002s

Thanks,
WANG Chao

> 
> Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
> the default delay between consecutive APERF and MPERF reads to 10 ms,
> which should be sufficient to get large enough numbers for the
> frequency computation in all cases.
> 
> Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> Resent with a changelog & tags.
> 
> I'm going to route it via the linux-pm tree.
> 
> ---
>  arch/x86/kernel/cpu/aperfmperf.c |   74 +++++++++++++++++++++++++++------------
>  arch/x86/kernel/cpu/cpu.h        |    3 +
>  arch/x86/kernel/cpu/proc.c       |    6 ++-
>  fs/proc/cpuinfo.c                |    6 +++
>  include/linux/cpufreq.h          |    1 
>  5 files changed, 67 insertions(+), 23 deletions(-)
> 
> Index: linux-pm/arch/x86/kernel/cpu/proc.c
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/cpu/proc.c
> +++ linux-pm/arch/x86/kernel/cpu/proc.c
> @@ -5,6 +5,8 @@
>  #include <linux/seq_file.h>
>  #include <linux/cpufreq.h>
>  
> +#include "cpu.h"
> +
>  /*
>   *	Get CPU information for use by the procfs.
>   */
> @@ -78,9 +80,11 @@ static int show_cpuinfo(struct seq_file
>  		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
>  
>  	if (cpu_has(c, X86_FEATURE_TSC)) {
> -		unsigned int freq = cpufreq_quick_get(cpu);
> +		unsigned int freq = aperfmperf_get_khz(cpu);
>  
>  		if (!freq)
> +			freq = cpufreq_quick_get(cpu);
> +		if (!freq)
>  			freq = cpu_khz;
>  		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
>  			   freq / 1000, (freq % 1000));
> Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/cpu/aperfmperf.c
> +++ linux-pm/arch/x86/kernel/cpu/aperfmperf.c
> @@ -14,6 +14,8 @@
>  #include <linux/percpu.h>
>  #include <linux/smp.h>
>  
> +#include "cpu.h"
> +
>  struct aperfmperf_sample {
>  	unsigned int	khz;
>  	ktime_t	time;
> @@ -24,7 +26,7 @@ struct aperfmperf_sample {
>  static DEFINE_PER_CPU(struct aperfmperf_sample, samples);
>  
>  #define APERFMPERF_CACHE_THRESHOLD_MS	10
> -#define APERFMPERF_REFRESH_DELAY_MS	20
> +#define APERFMPERF_REFRESH_DELAY_MS	10
>  #define APERFMPERF_STALE_THRESHOLD_MS	1000
>  
>  /*
> @@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void
>  	u64 aperf, aperf_delta;
>  	u64 mperf, mperf_delta;
>  	struct aperfmperf_sample *s = this_cpu_ptr(&samples);
> -	ktime_t now = ktime_get();
> -	s64 time_delta = ktime_ms_delta(now, s->time);
>  	unsigned long flags;
>  
>  	local_irq_save(flags);
> @@ -57,38 +57,68 @@ static void aperfmperf_snapshot_khz(void
>  	if (mperf_delta == 0)
>  		return;
>  
> -	s->time = now;
> +	s->time = ktime_get();
>  	s->aperf = aperf;
>  	s->mperf = mperf;
> -
> -	/* If the previous iteration was too long ago, discard it. */
> -	if (time_delta > APERFMPERF_STALE_THRESHOLD_MS)
> -		s->khz = 0;
> -	else
> -		s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
> +	s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
>  }
>  
> -unsigned int arch_freq_get_on_cpu(int cpu)
> +static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait)
>  {
> -	s64 time_delta;
> -	unsigned int khz;
> +	s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu));
> +
> +	/* Don't bother re-computing within the cache threshold time. */
> +	if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
> +		return true;
> +
> +	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait);
> +
> +	/* Return false if the previous iteration was too long ago. */
> +	return time_delta <= APERFMPERF_STALE_THRESHOLD_MS;
> +}
>  
> +unsigned int aperfmperf_get_khz(int cpu)
> +{
>  	if (!cpu_khz)
>  		return 0;
>  
>  	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>  		return 0;
>  
> -	/* Don't bother re-computing within the cache threshold time. */
> -	time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu));
> -	khz = per_cpu(samples.khz, cpu);
> -	if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS)
> -		return khz;
> +	aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
> +	return per_cpu(samples.khz, cpu);
> +}
>  
> -	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
> -	khz = per_cpu(samples.khz, cpu);
> -	if (khz)
> -		return khz;
> +void arch_freq_prepare_all(void)
> +{
> +	ktime_t now = ktime_get();
> +	bool wait = false;
> +	int cpu;
> +
> +	if (!cpu_khz)
> +		return;
> +
> +	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
> +		return;
> +
> +	for_each_online_cpu(cpu)
> +		if (!aperfmperf_snapshot_cpu(cpu, now, false))
> +			wait = true;
> +
> +	if (wait)
> +		msleep(APERFMPERF_REFRESH_DELAY_MS);
> +}
> +
> +unsigned int arch_freq_get_on_cpu(int cpu)
> +{
> +	if (!cpu_khz)
> +		return 0;
> +
> +	if (!static_cpu_has(X86_FEATURE_APERFMPERF))
> +		return 0;
> +
> +	if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
> +		return per_cpu(samples.khz, cpu);
>  
>  	msleep(APERFMPERF_REFRESH_DELAY_MS);
>  	smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1);
> Index: linux-pm/arch/x86/kernel/cpu/cpu.h
> ===================================================================
> --- linux-pm.orig/arch/x86/kernel/cpu/cpu.h
> +++ linux-pm/arch/x86/kernel/cpu/cpu.h
> @@ -47,4 +47,7 @@ extern const struct cpu_dev *const __x86
>  
>  extern void get_cpu_cap(struct cpuinfo_x86 *c);
>  extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
> +
> +unsigned int aperfmperf_get_khz(int cpu);
> +
>  #endif /* ARCH_X86_CPU_H */
> Index: linux-pm/fs/proc/cpuinfo.c
> ===================================================================
> --- linux-pm.orig/fs/proc/cpuinfo.c
> +++ linux-pm/fs/proc/cpuinfo.c
> @@ -1,12 +1,18 @@
>  // SPDX-License-Identifier: GPL-2.0
> +#include <linux/cpufreq.h>
>  #include <linux/fs.h>
>  #include <linux/init.h>
>  #include <linux/proc_fs.h>
>  #include <linux/seq_file.h>
>  
> +__weak void arch_freq_prepare_all(void)
> +{
> +}
> +
>  extern const struct seq_operations cpuinfo_op;
>  static int cpuinfo_open(struct inode *inode, struct file *file)
>  {
> +	arch_freq_prepare_all();
>  	return seq_open(file, &cpuinfo_op);
>  }
>  
> Index: linux-pm/include/linux/cpufreq.h
> ===================================================================
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -917,6 +917,7 @@ static inline bool policy_has_boost_freq
>  }
>  #endif
>  
> +extern void arch_freq_prepare_all(void);
>  extern unsigned int arch_freq_get_on_cpu(int cpu);
>  
>  extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15  7:43                     ` [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again Ingo Molnar
  2017-11-15  7:54                       ` Greg Kroah-Hartman
@ 2017-11-15 17:27                       ` Linus Torvalds
  2017-11-15 18:05                         ` Thomas Gleixner
  1 sibling, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2017-11-15 17:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rafael J. Wysocki, Thomas Gleixner, WANG Chao,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	Vikas Shivappa, Kate Stewart, Len Brown, Greg Kroah-Hartman,
	Philippe Ombredanne, Mathias Krause, the arch/x86 maintainers,
	Linux PM, Rafael J. Wysocki

On Tue, Nov 14, 2017 at 11:43 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> I also think that /proc/cpuinfo is a pretty bad interface for many uses - I
> personally only very rarely need the cpuinfo of _all_ CPUs.
>
> We we should eventually have /proc/cpu/N/info or so, so that 99% of the times
> cpuinfo is needed to report bugs we can do:
>
>         cat /proc/cpu/0/info

I don't disagree in theory, and I used to enthusiastically support the
"let's make one value per file" model.

But in practice it's a pain, and I no longer really think it's viable
except for debugging.

As an example, I can do

  cat /proc/cpuinfo | grep MHz

and get a nice overview of what's going on. If I were a system stat
program, I'd do that too.

In contrast, tell me where in /sys something useful is?

Honestly, the /sys directory structure _looks_ much more organized,
and it's lovely for some things, but it's a confusing jungle, and
finding things there is a complete and utter pain.

Yes, you can get the same thing:

     cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq

but here's the deal: time that on your 120-core machine some day.

In other words, /proc/cpuinfo is not perfect. But the "let's split
things out" really doesn't work either. It's worse. It _sounds_ good
in theory, but it really really sucks.

I'm convinced that /sys is wonderful for management purposes. But no,
this whole "we should have individual files" is often a huge huge
mistake.

In fact, I think the current patch for CPU MHz shows exactly why
/proc/cpuinfo is actually hugely superior to the crazy "one file per
cpu" model: we could gather the statistics in parallel, all together,
and make it be reasonable.

Again, try that "cat" example again, and time it.

(And yes, I also know that "cpu*/cpufreq" is a symlink, but the direct
names are illogical, and no faster for me. Try it if you like:

  time cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq > /dev/null

and then look at /proc/cpuinfo again, and realize how lovely that
human-readable file actually is).

                      Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again
  2017-11-15 17:27                       ` Linus Torvalds
@ 2017-11-15 18:05                         ` Thomas Gleixner
  0 siblings, 0 replies; 31+ messages in thread
From: Thomas Gleixner @ 2017-11-15 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Rafael J. Wysocki, WANG Chao,
	Linux Kernel Mailing List, Ingo Molnar, H. Peter Anvin,
	Vikas Shivappa, Kate Stewart, Len Brown, Greg Kroah-Hartman,
	Philippe Ombredanne, Mathias Krause, the arch/x86 maintainers,
	Linux PM, Rafael J. Wysocki

On Wed, 15 Nov 2017, Linus Torvalds wrote:
> On Tue, Nov 14, 2017 at 11:43 PM, Ingo Molnar <mingo@kernel.org> wrote:
> Again, try that "cat" example again, and time it.
> 
> (And yes, I also know that "cpu*/cpufreq" is a symlink, but the direct
> names are illogical, and no faster for me. Try it if you like:
> 
>   time cat /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq > /dev/null

Just for the fun of it I tried. It's exactly the same time as /proc/cpuinfo
with the reverted patch which did the full IPI thing, i.e. >4 seconds on
that machine.

Not a surprise because those files do the full IPI thing as well.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-15  9:33                         ` WANG Chao
@ 2017-11-16  0:24                           ` Rafael J. Wysocki
  2017-11-16  9:50                             ` WANG Chao
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-16  0:24 UTC (permalink / raw)
  To: WANG Chao
  Cc: Linux Kernel Mailing List, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Wednesday, November 15, 2017 10:33:47 AM CET WANG Chao wrote:
> On 11/15/17 at 02:13P, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> > for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> > on x86 can be either the nominal CPU frequency (which is constant)
> > or the frequency most recently requested by a scaling governor in
> > cpufreq, depending on the cpufreq configuration.  That is somewhat
> > inconsistent and is different from what it was before 4.13, so in
> > order to restore the previous behavior, make it report the current
> > CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> > 
> > To that end, modify the /proc/cpuinfo implementation on x86 to use
> > aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> > registers, if available, and use their values to compute the CPU
> > frequency to be reported as "cpu MHz".
> > 
> > However, do that carefully enough to avoid accumulating delays that
> > lead to unacceptable access times for /proc/cpuinfo on systems with
> > many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> > asynchronously at the /proc/cpuinfo open time, add a single delay
> > upfront (if necessary) at that point and simply compute the current
> > frequency while running show_cpuinfo() for each individual CPU.
> 
> Hi, Rafael
> 
> I tested your patch. It's much faster.
> 
> But from what I got, calling aperfmperf_snapshot_khz() asynchronously
> with 10ms sleep takes much longer than calling aperfmperf_snapshot_khz()
> synchronously.
> 
> Here's my result on 64 CPUs:
> 
>  - async aperfmperf_snapshot_khz() w/ 10ms sleep:
> 
> # time cat /proc/cpuinfo > /dev/null
> real    0m0.014s
> user    0m0.000s
> sys     0m0.002s
> 
>  - sync aperfmperf_snapshot_khz() w/o any sleep:
> 
> # time cat /proc/cpuinfo > /dev/null
> real    0m0.002s
> user    0m0.000s
> sys     0m0.002s

Sure, but the delay is there, because without it the computed frequency
may be way off for at least one of the CPUs.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-16  0:24                           ` Rafael J. Wysocki
@ 2017-11-16  9:50                             ` WANG Chao
  2017-11-16 13:54                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: WANG Chao @ 2017-11-16  9:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On 11/16/17 at 01:24P, Rafael J. Wysocki wrote:
> On Wednesday, November 15, 2017 10:33:47 AM CET WANG Chao wrote:
> > On 11/15/17 at 02:13P, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> > > for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> > > on x86 can be either the nominal CPU frequency (which is constant)
> > > or the frequency most recently requested by a scaling governor in
> > > cpufreq, depending on the cpufreq configuration.  That is somewhat
> > > inconsistent and is different from what it was before 4.13, so in
> > > order to restore the previous behavior, make it report the current
> > > CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> > > 
> > > To that end, modify the /proc/cpuinfo implementation on x86 to use
> > > aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> > > registers, if available, and use their values to compute the CPU
> > > frequency to be reported as "cpu MHz".
> > > 
> > > However, do that carefully enough to avoid accumulating delays that
> > > lead to unacceptable access times for /proc/cpuinfo on systems with
> > > many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> > > asynchronously at the /proc/cpuinfo open time, add a single delay
> > > upfront (if necessary) at that point and simply compute the current
> > > frequency while running show_cpuinfo() for each individual CPU.
> > 
> > Hi, Rafael
> > 
> > I tested your patch. It's much faster.
> > 
> > But from what I got, calling aperfmperf_snapshot_khz() asynchronously
> > with 10ms sleep takes much longer than calling aperfmperf_snapshot_khz()
> > synchronously.
> > 
> > Here's my result on 64 CPUs:
> > 
> >  - async aperfmperf_snapshot_khz() w/ 10ms sleep:
> > 
> > # time cat /proc/cpuinfo > /dev/null
> > real    0m0.014s
> > user    0m0.000s
> > sys     0m0.002s
> > 
> >  - sync aperfmperf_snapshot_khz() w/o any sleep:
> > 
> > # time cat /proc/cpuinfo > /dev/null
> > real    0m0.002s
> > user    0m0.000s
> > sys     0m0.002s
> 
> Sure, but the delay is there, because without it the computed frequency
> may be way off for at least one of the CPUs.

Thanks, I understand now. In this case, The 10ms delay turns out to be
the interval of measuring aperf and mperf and computing their deltas.

Last question though, is 10ms best practice or can we make it shorter,
say 5ms?

Thanks,
WANG Chao

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-16  9:50                             ` WANG Chao
@ 2017-11-16 13:54                               ` Rafael J. Wysocki
  2017-11-17  4:27                                 ` WANG Chao
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-16 13:54 UTC (permalink / raw)
  To: WANG Chao
  Cc: Linux Kernel Mailing List, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Thursday, November 16, 2017 10:50:36 AM CET WANG Chao wrote:
> On 11/16/17 at 01:24P, Rafael J. Wysocki wrote:
> > On Wednesday, November 15, 2017 10:33:47 AM CET WANG Chao wrote:
> > > On 11/15/17 at 02:13P, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > 
> > > > After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> > > > for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> > > > on x86 can be either the nominal CPU frequency (which is constant)
> > > > or the frequency most recently requested by a scaling governor in
> > > > cpufreq, depending on the cpufreq configuration.  That is somewhat
> > > > inconsistent and is different from what it was before 4.13, so in
> > > > order to restore the previous behavior, make it report the current
> > > > CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> > > > 
> > > > To that end, modify the /proc/cpuinfo implementation on x86 to use
> > > > aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> > > > registers, if available, and use their values to compute the CPU
> > > > frequency to be reported as "cpu MHz".
> > > > 
> > > > However, do that carefully enough to avoid accumulating delays that
> > > > lead to unacceptable access times for /proc/cpuinfo on systems with
> > > > many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> > > > asynchronously at the /proc/cpuinfo open time, add a single delay
> > > > upfront (if necessary) at that point and simply compute the current
> > > > frequency while running show_cpuinfo() for each individual CPU.
> > > 
> > > Hi, Rafael
> > > 
> > > I tested your patch. It's much faster.
> > > 
> > > But from what I got, calling aperfmperf_snapshot_khz() asynchronously
> > > with 10ms sleep takes much longer than calling aperfmperf_snapshot_khz()
> > > synchronously.
> > > 
> > > Here's my result on 64 CPUs:
> > > 
> > >  - async aperfmperf_snapshot_khz() w/ 10ms sleep:
> > > 
> > > # time cat /proc/cpuinfo > /dev/null
> > > real    0m0.014s
> > > user    0m0.000s
> > > sys     0m0.002s
> > > 
> > >  - sync aperfmperf_snapshot_khz() w/o any sleep:
> > > 
> > > # time cat /proc/cpuinfo > /dev/null
> > > real    0m0.002s
> > > user    0m0.000s
> > > sys     0m0.002s
> > 
> > Sure, but the delay is there, because without it the computed frequency
> > may be way off for at least one of the CPUs.
> 
> Thanks, I understand now. In this case, The 10ms delay turns out to be
> the interval of measuring aperf and mperf and computing their deltas.
> 
> Last question though, is 10ms best practice or can we make it shorter,
> say 5ms?

Experimentally, I found 5 ms to be slightly too short.  It all depends on
how accurate the numbers are expected to be, however, so there is some room
for adjustments.

Regardless, I'd prefer to start with 10 ms as that is what has been used in
intel_pstate for quite a long time at least and adjust later if need be.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-16 13:54                               ` Rafael J. Wysocki
@ 2017-11-17  4:27                                 ` WANG Chao
  2017-11-17 13:33                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: WANG Chao @ 2017-11-17  4:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux Kernel Mailing List, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On 11/16/17 at 02:54P, Rafael J. Wysocki wrote:
> On Thursday, November 16, 2017 10:50:36 AM CET WANG Chao wrote:
> > On 11/16/17 at 01:24P, Rafael J. Wysocki wrote:
> > > On Wednesday, November 15, 2017 10:33:47 AM CET WANG Chao wrote:
> > > > On 11/15/17 at 02:13P, Rafael J. Wysocki wrote:
> > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > 
> > > > > After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> > > > > for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> > > > > on x86 can be either the nominal CPU frequency (which is constant)
> > > > > or the frequency most recently requested by a scaling governor in
> > > > > cpufreq, depending on the cpufreq configuration.  That is somewhat
> > > > > inconsistent and is different from what it was before 4.13, so in
> > > > > order to restore the previous behavior, make it report the current
> > > > > CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> > > > > 
> > > > > To that end, modify the /proc/cpuinfo implementation on x86 to use
> > > > > aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> > > > > registers, if available, and use their values to compute the CPU
> > > > > frequency to be reported as "cpu MHz".
> > > > > 
> > > > > However, do that carefully enough to avoid accumulating delays that
> > > > > lead to unacceptable access times for /proc/cpuinfo on systems with
> > > > > many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> > > > > asynchronously at the /proc/cpuinfo open time, add a single delay
> > > > > upfront (if necessary) at that point and simply compute the current
> > > > > frequency while running show_cpuinfo() for each individual CPU.
> > > > 
> > > > Hi, Rafael
> > > > 
> > > > I tested your patch. It's much faster.
> > > > 
> > > > But from what I got, calling aperfmperf_snapshot_khz() asynchronously
> > > > with 10ms sleep takes much longer than calling aperfmperf_snapshot_khz()
> > > > synchronously.
> > > > 
> > > > Here's my result on 64 CPUs:
> > > > 
> > > >  - async aperfmperf_snapshot_khz() w/ 10ms sleep:
> > > > 
> > > > # time cat /proc/cpuinfo > /dev/null
> > > > real    0m0.014s
> > > > user    0m0.000s
> > > > sys     0m0.002s
> > > > 
> > > >  - sync aperfmperf_snapshot_khz() w/o any sleep:
> > > > 
> > > > # time cat /proc/cpuinfo > /dev/null
> > > > real    0m0.002s
> > > > user    0m0.000s
> > > > sys     0m0.002s
> > > 
> > > Sure, but the delay is there, because without it the computed frequency
> > > may be way off for at least one of the CPUs.
> > 
> > Thanks, I understand now. In this case, The 10ms delay turns out to be
> > the interval of measuring aperf and mperf and computing their deltas.
> > 
> > Last question though, is 10ms best practice or can we make it shorter,
> > say 5ms?
> 
> Experimentally, I found 5 ms to be slightly too short.  It all depends on
> how accurate the numbers are expected to be, however, so there is some room
> for adjustments.
> 
> Regardless, I'd prefer to start with 10 ms as that is what has been used in
> intel_pstate for quite a long time at least and adjust later if need be.

Sure. Thanks for your explanation.

This patch looks good to me.

Reviewed-by: WANG Chao <chao.wang@ucloud.cn>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
  2017-11-17  4:27                                 ` WANG Chao
@ 2017-11-17 13:33                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 31+ messages in thread
From: Rafael J. Wysocki @ 2017-11-17 13:33 UTC (permalink / raw)
  To: WANG Chao
  Cc: Linux Kernel Mailing List, Linus Torvalds, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Vikas Shivappa, Kate Stewart,
	Len Brown, Greg Kroah-Hartman, Philippe Ombredanne,
	Mathias Krause, the arch/x86 maintainers, Linux PM,
	Rafael J. Wysocki

On Friday, November 17, 2017 5:27:07 AM CET WANG Chao wrote:
> On 11/16/17 at 02:54P, Rafael J. Wysocki wrote:
> > On Thursday, November 16, 2017 10:50:36 AM CET WANG Chao wrote:
> > > On 11/16/17 at 01:24P, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 15, 2017 10:33:47 AM CET WANG Chao wrote:
> > > > > On 11/15/17 at 02:13P, Rafael J. Wysocki wrote:
> > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > > > 
> > > > > > After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get()
> > > > > > for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
> > > > > > on x86 can be either the nominal CPU frequency (which is constant)
> > > > > > or the frequency most recently requested by a scaling governor in
> > > > > > cpufreq, depending on the cpufreq configuration.  That is somewhat
> > > > > > inconsistent and is different from what it was before 4.13, so in
> > > > > > order to restore the previous behavior, make it report the current
> > > > > > CPU frequency like the scaling_cur_freq sysfs file in cpufreq.
> > > > > > 
> > > > > > To that end, modify the /proc/cpuinfo implementation on x86 to use
> > > > > > aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
> > > > > > registers, if available, and use their values to compute the CPU
> > > > > > frequency to be reported as "cpu MHz".
> > > > > > 
> > > > > > However, do that carefully enough to avoid accumulating delays that
> > > > > > lead to unacceptable access times for /proc/cpuinfo on systems with
> > > > > > many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
> > > > > > asynchronously at the /proc/cpuinfo open time, add a single delay
> > > > > > upfront (if necessary) at that point and simply compute the current
> > > > > > frequency while running show_cpuinfo() for each individual CPU.
> > > > > 
> > > > > Hi, Rafael
> > > > > 
> > > > > I tested your patch. It's much faster.
> > > > > 
> > > > > But from what I got, calling aperfmperf_snapshot_khz() asynchronously
> > > > > with 10ms sleep takes much longer than calling aperfmperf_snapshot_khz()
> > > > > synchronously.
> > > > > 
> > > > > Here's my result on 64 CPUs:
> > > > > 
> > > > >  - async aperfmperf_snapshot_khz() w/ 10ms sleep:
> > > > > 
> > > > > # time cat /proc/cpuinfo > /dev/null
> > > > > real    0m0.014s
> > > > > user    0m0.000s
> > > > > sys     0m0.002s
> > > > > 
> > > > >  - sync aperfmperf_snapshot_khz() w/o any sleep:
> > > > > 
> > > > > # time cat /proc/cpuinfo > /dev/null
> > > > > real    0m0.002s
> > > > > user    0m0.000s
> > > > > sys     0m0.002s
> > > > 
> > > > Sure, but the delay is there, because without it the computed frequency
> > > > may be way off for at least one of the CPUs.
> > > 
> > > Thanks, I understand now. In this case, The 10ms delay turns out to be
> > > the interval of measuring aperf and mperf and computing their deltas.
> > > 
> > > Last question though, is 10ms best practice or can we make it shorter,
> > > say 5ms?
> > 
> > Experimentally, I found 5 ms to be slightly too short.  It all depends on
> > how accurate the numbers are expected to be, however, so there is some room
> > for adjustments.
> > 
> > Regardless, I'd prefer to start with 10 ms as that is what has been used in
> > intel_pstate for quite a long time at least and adjust later if need be.
> 
> Sure. Thanks for your explanation.
> 
> This patch looks good to me.
> 
> Reviewed-by: WANG Chao <chao.wang@ucloud.cn>

Thanks for the reviews, much appreciated!

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-11-17 13:34 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-09 10:38 [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again WANG Chao
2017-11-09 16:06 ` Rafael J. Wysocki
2017-11-09 22:30   ` Rafael J. Wysocki
2017-11-10  0:06     ` Rafael J. Wysocki
2017-11-10  4:04       ` WANG Chao
2017-11-10  4:11         ` WANG Chao
2017-11-10 19:11     ` Linus Torvalds
2017-11-10 23:09       ` Rafael J. Wysocki
2017-11-14 22:47         ` Rafael J. Wysocki
2017-11-14 23:02           ` Linus Torvalds
2017-11-14 23:53             ` Thomas Gleixner
2017-11-15  0:04               ` Linus Torvalds
2017-11-15  0:06                 ` Linus Torvalds
2017-11-15  0:30                   ` Rafael J. Wysocki
2017-11-15  0:34                     ` Linus Torvalds
2017-11-15  1:13                       ` [PATCH] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo Rafael J. Wysocki
2017-11-15  8:47                         ` Thomas Gleixner
2017-11-15  9:33                         ` WANG Chao
2017-11-16  0:24                           ` Rafael J. Wysocki
2017-11-16  9:50                             ` WANG Chao
2017-11-16 13:54                               ` Rafael J. Wysocki
2017-11-17  4:27                                 ` WANG Chao
2017-11-17 13:33                                   ` Rafael J. Wysocki
2017-11-15  7:43                     ` [PATCH] x86: use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz" again Ingo Molnar
2017-11-15  7:54                       ` Greg Kroah-Hartman
2017-11-15 17:27                       ` Linus Torvalds
2017-11-15 18:05                         ` Thomas Gleixner
2017-11-15  8:47                 ` Thomas Gleixner
2017-11-15  0:06               ` Rafael J. Wysocki
2017-11-10  7:25   ` Ingo Molnar
2017-11-10  9:21     ` WANG Chao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).