linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Frequency invariance fixes for x86
@ 2020-04-16  5:47 Giovanni Gherdovich
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-16  5:47 UTC (permalink / raw)
  To: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson, Giovanni Gherdovich

Patches 1-3 address bugs in the current frequency invariance support for x86,
including the incompatibility with cpu0 hotplug reported by Chris Wilson at
https://lore.kernel.org/lkml/158556634294.3228.4889951961483021094@build.alporthouse.com/
and the issue with CPUs that have less than 4 cores pointed out earlier
today by Like Xu at
https://lore.kernel.org/lkml/20200416021210.170736-1-like.xu@linux.intel.com/ 

Patch 4 is a minor code reorganization.

Giovanni Gherdovich (3):
  x86, sched: Bail out of frequency invariance if base frequency is
    unknown
  x86, sched: Account for CPUs with less than 4 cores in freq.
    invariance
  x86, sched: Move check for CPU type to caller function

Peter Zijlstra (Intel) (1):
  x86, sched: Don't enable static key when starting secondary CPUs

 arch/x86/kernel/smpboot.c | 47 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 14 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-16  5:47 [PATCH 0/4] Frequency invariance fixes for x86 Giovanni Gherdovich
@ 2020-04-16  5:47 ` Giovanni Gherdovich
  2020-04-16  6:41   ` Giovanni Gherdovich
                     ` (3 more replies)
  2020-04-16  5:47 ` [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance Giovanni Gherdovich
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-16  5:47 UTC (permalink / raw)
  To: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson, Giovanni Gherdovich

Some hypervisors such as VMWare ESXi 5.5 advertise support for
X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
clock frequency of the CPU (highest non-turbo frequency), producing a
division by zero when computing the ratio turbo_freq/base_freq necessary
for frequency invariant accounting.

It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
couldn't be done in principle (not that it would make a lot of sense in a
VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
that feature.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
---
 arch/x86/kernel/smpboot.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fe3ab9632f3b..3a318ec9bc17 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
 	return false;
 
 out:
+	/*
+	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
+	 * but then fill all MSR's with zeroes.
+	 */
+	if (!base_freq) {
+		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
+		return false;
+	}
+
 	arch_turbo_freq_ratio = div_u64(turbo_freq * SCHED_CAPACITY_SCALE,
 					base_freq);
 	arch_set_max_freq_ratio(turbo_disabled());
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance
  2020-04-16  5:47 [PATCH 0/4] Frequency invariance fixes for x86 Giovanni Gherdovich
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
@ 2020-04-16  5:47 ` Giovanni Gherdovich
  2020-04-16 15:26   ` Dave Kleikamp
  2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
  2020-04-16  5:47 ` [PATCH 3/4] x86, sched: Don't enable static key when starting secondary CPUs Giovanni Gherdovich
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-16  5:47 UTC (permalink / raw)
  To: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson, Giovanni Gherdovich

If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will
rightfully report that the 4C turbo ratio is zero. In such cases, use the
1C turbo ratio instead for frequency invariance calculations.

Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com>
Reported-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
---
 arch/x86/kernel/smpboot.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3a318ec9bc17..5d346b70844b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1945,18 +1945,23 @@ static bool skx_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq, int size)
 
 static bool core_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq)
 {
+	u64 msr;
 	int err;
 
 	err = rdmsrl_safe(MSR_PLATFORM_INFO, base_freq);
 	if (err)
 		return false;
 
-	err = rdmsrl_safe(MSR_TURBO_RATIO_LIMIT, turbo_freq);
+	err = rdmsrl_safe(MSR_TURBO_RATIO_LIMIT, &msr);
 	if (err)
 		return false;
 
-	*base_freq = (*base_freq >> 8) & 0xFF;      /* max P state */
-	*turbo_freq = (*turbo_freq >> 24) & 0xFF;   /* 4C turbo    */
+	*base_freq = (*base_freq >> 8) & 0xFF;    /* max P state */
+	*turbo_freq = (msr >> 24) & 0xFF;         /* 4C turbo    */
+
+	/* The CPU may have less than 4 cores */
+	if (!*turbo_freq)
+		*turbo_freq = msr & 0xFF;         /* 1C turbo    */
 
 	return true;
 }
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/4] x86, sched: Don't enable static key when starting secondary CPUs
  2020-04-16  5:47 [PATCH 0/4] Frequency invariance fixes for x86 Giovanni Gherdovich
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
  2020-04-16  5:47 ` [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance Giovanni Gherdovich
@ 2020-04-16  5:47 ` Giovanni Gherdovich
  2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra (Intel)
  2020-04-16  5:47 ` [PATCH 4/4] x86, sched: Move check for CPU type to caller function Giovanni Gherdovich
  2020-04-16  7:20 ` [PATCH 0/4] Frequency invariance fixes for x86 Peter Zijlstra
  4 siblings, 1 reply; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-16  5:47 UTC (permalink / raw)
  To: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson, Giovanni Gherdovich

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

The static key arch_scale_freq_key only needs to be enabled once (at
boot). This change fixes a bug by which the key was enabled every time cpu0
is started, even as a secondary CPU during cpu hotplug. Secondary CPUs are
started from the idle thread: setting a static key from there means
acquiring a lock and may result in sleeping in the idle task, causing CPU
lockup.

Another consequence of this change is that init_counter_refs() is now
called on each CPU correctly; previously the function on_each_cpu() was
used, but it was called at boot when the only online cpu is cpu0.

[ggherdovich@suse.cz: Tested and wrote changelog]
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
---
 arch/x86/kernel/smpboot.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5d346b70844b..dd8e15f648bc 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -147,7 +147,7 @@ static inline void smpboot_restore_warm_reset_vector(void)
 	*((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
 }
 
-static void init_freq_invariance(void);
+static void init_freq_invariance(bool secondary);
 
 /*
  * Report back to the Boot Processor during boot time or to the caller processor
@@ -185,7 +185,7 @@ static void smp_callin(void)
 	 */
 	set_cpu_sibling_map(raw_smp_processor_id());
 
-	init_freq_invariance();
+	init_freq_invariance(true);
 
 	/*
 	 * Get our bogomips.
@@ -1341,7 +1341,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	set_sched_topology(x86_topology);
 
 	set_cpu_sibling_map(0);
-	init_freq_invariance();
+	init_freq_invariance(false);
 	smp_sanity_check();
 
 	switch (apic_intr_mode) {
@@ -2005,7 +2005,7 @@ static bool intel_set_max_freq_ratio(void)
 	return true;
 }
 
-static void init_counter_refs(void *arg)
+static void init_counter_refs(void)
 {
 	u64 aperf, mperf;
 
@@ -2016,18 +2016,25 @@ static void init_counter_refs(void *arg)
 	this_cpu_write(arch_prev_mperf, mperf);
 }
 
-static void init_freq_invariance(void)
+static void init_freq_invariance(bool secondary)
 {
 	bool ret = false;
 
-	if (smp_processor_id() != 0 || !boot_cpu_has(X86_FEATURE_APERFMPERF))
+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
 		return;
 
+	if (secondary) {
+		if (static_branch_likely(&arch_scale_freq_key)) {
+			init_counter_refs();
+		}
+		return;
+	}
+
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
 		ret = intel_set_max_freq_ratio();
 
 	if (ret) {
-		on_each_cpu(init_counter_refs, NULL, 1);
+		init_counter_refs();
 		static_branch_enable(&arch_scale_freq_key);
 	} else {
 		pr_debug("Couldn't determine max cpu frequency, necessary for scale-invariant accounting.\n");
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/4] x86, sched: Move check for CPU type to caller function
  2020-04-16  5:47 [PATCH 0/4] Frequency invariance fixes for x86 Giovanni Gherdovich
                   ` (2 preceding siblings ...)
  2020-04-16  5:47 ` [PATCH 3/4] x86, sched: Don't enable static key when starting secondary CPUs Giovanni Gherdovich
@ 2020-04-16  5:47 ` Giovanni Gherdovich
  2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
  2020-04-16  7:20 ` [PATCH 0/4] Frequency invariance fixes for x86 Peter Zijlstra
  4 siblings, 1 reply; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-16  5:47 UTC (permalink / raw)
  To: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson, Giovanni Gherdovich

Improve readability of the function intel_set_max_freq_ratio() by moving
the check for KNL CPUs there, together with checks for GLM and SKX.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
---
 arch/x86/kernel/smpboot.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index dd8e15f648bc..8c89e4d9ad28 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1877,9 +1877,6 @@ static bool knl_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq,
 	int err, i;
 	u64 msr;
 
-	if (!x86_match_cpu(has_knl_turbo_ratio_limits))
-		return false;
-
 	err = rdmsrl_safe(MSR_PLATFORM_INFO, base_freq);
 	if (err)
 		return false;
@@ -1977,7 +1974,8 @@ static bool intel_set_max_freq_ratio(void)
 	    skx_set_max_freq_ratio(&base_freq, &turbo_freq, 1))
 		goto out;
 
-	if (knl_set_max_freq_ratio(&base_freq, &turbo_freq, 1))
+	if (x86_match_cpu(has_knl_turbo_ratio_limits) &&
+	    knl_set_max_freq_ratio(&base_freq, &turbo_freq, 1))
 		goto out;
 
 	if (x86_match_cpu(has_skx_turbo_ratio_limits) &&
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
@ 2020-04-16  6:41   ` Giovanni Gherdovich
  2020-04-16 15:41   ` Rafael J. Wysocki
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-16  6:41 UTC (permalink / raw)
  To: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson, Dario Faggioli

+Dario Faggioli

On Thu, 2020-04-16 at 07:47 +0200, Giovanni Gherdovich wrote:
> Some hypervisors such as VMWare ESXi 5.5 advertise support for
> X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
> MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
> clock frequency of the CPU (highest non-turbo frequency), producing a
> division by zero when computing the ratio turbo_freq/base_freq necessary
> for frequency invariant accounting.
> 
> It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
> data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
> couldn't be done in principle (not that it would make a lot of sense in a
> VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
> appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
> that feature.
> 
> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
> ---
>  arch/x86/kernel/smpboot.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index fe3ab9632f3b..3a318ec9bc17 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
>  	return false;
>  
>  out:
> +	/*
> +	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
> +	 * but then fill all MSR's with zeroes.
> +	 */
> +	if (!base_freq) {
> +		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
> +		return false;
> +	}
> +
>  	arch_turbo_freq_ratio = div_u64(turbo_freq * SCHED_CAPACITY_SCALE,
>  					base_freq);
>  	arch_set_max_freq_ratio(turbo_disabled());

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/4] Frequency invariance fixes for x86
  2020-04-16  5:47 [PATCH 0/4] Frequency invariance fixes for x86 Giovanni Gherdovich
                   ` (3 preceding siblings ...)
  2020-04-16  5:47 ` [PATCH 4/4] x86, sched: Move check for CPU type to caller function Giovanni Gherdovich
@ 2020-04-16  7:20 ` Peter Zijlstra
  4 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2020-04-16  7:20 UTC (permalink / raw)
  To: Giovanni Gherdovich
  Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Len Brown, Rafael J . Wysocki, x86, linux-pm,
	linux-kernel, Mel Gorman, Doug Smythies, Like Xu, Neil Rickert,
	Chris Wilson

On Thu, Apr 16, 2020 at 07:47:41AM +0200, Giovanni Gherdovich wrote:
> Patches 1-3 address bugs in the current frequency invariance support for x86,
> including the incompatibility with cpu0 hotplug reported by Chris Wilson at
> https://lore.kernel.org/lkml/158556634294.3228.4889951961483021094@build.alporthouse.com/
> and the issue with CPUs that have less than 4 cores pointed out earlier
> today by Like Xu at
> https://lore.kernel.org/lkml/20200416021210.170736-1-like.xu@linux.intel.com/ 
> 
> Patch 4 is a minor code reorganization.
> 
> Giovanni Gherdovich (3):
>   x86, sched: Bail out of frequency invariance if base frequency is
>     unknown
>   x86, sched: Account for CPUs with less than 4 cores in freq.
>     invariance
>   x86, sched: Move check for CPU type to caller function
> 
> Peter Zijlstra (Intel) (1):
>   x86, sched: Don't enable static key when starting secondary CPUs
> 
>  arch/x86/kernel/smpboot.c | 47 +++++++++++++++++++++++++++++++++--------------
>  1 file changed, 33 insertions(+), 14 deletions(-)

Thanks!

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance
  2020-04-16  5:47 ` [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance Giovanni Gherdovich
@ 2020-04-16 15:26   ` Dave Kleikamp
  2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
  1 sibling, 0 replies; 17+ messages in thread
From: Dave Kleikamp @ 2020-04-16 15:26 UTC (permalink / raw)
  To: Giovanni Gherdovich, Srinivas Pandruvada, Thomas Gleixner,
	Ingo Molnar, Peter Zijlstra, Borislav Petkov, Len Brown,
	Rafael J . Wysocki
  Cc: x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson

On 4/16/20 12:47 AM, Giovanni Gherdovich wrote:
> If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will
> rightfully report that the 4C turbo ratio is zero. In such cases, use the
> 1C turbo ratio instead for frequency invariance calculations.

Thank you! This makes my Lenovo T410 happy again. I had tracked down the
problem and come up with a similar patch. I was about to report it when
I came across this patch set.

> 
> Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com>
> Reported-by: Like Xu <like.xu@linux.intel.com>
> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")

Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>

> ---
>  arch/x86/kernel/smpboot.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 3a318ec9bc17..5d346b70844b 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1945,18 +1945,23 @@ static bool skx_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq, int size)
>  
>  static bool core_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq)
>  {
> +	u64 msr;
>  	int err;
>  
>  	err = rdmsrl_safe(MSR_PLATFORM_INFO, base_freq);
>  	if (err)
>  		return false;
>  
> -	err = rdmsrl_safe(MSR_TURBO_RATIO_LIMIT, turbo_freq);
> +	err = rdmsrl_safe(MSR_TURBO_RATIO_LIMIT, &msr);
>  	if (err)
>  		return false;
>  
> -	*base_freq = (*base_freq >> 8) & 0xFF;      /* max P state */
> -	*turbo_freq = (*turbo_freq >> 24) & 0xFF;   /* 4C turbo    */
> +	*base_freq = (*base_freq >> 8) & 0xFF;    /* max P state */
> +	*turbo_freq = (msr >> 24) & 0xFF;         /* 4C turbo    */
> +
> +	/* The CPU may have less than 4 cores */
> +	if (!*turbo_freq)
> +		*turbo_freq = msr & 0xFF;         /* 1C turbo    */
>  
>  	return true;
>  }
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
  2020-04-16  6:41   ` Giovanni Gherdovich
@ 2020-04-16 15:41   ` Rafael J. Wysocki
  2020-04-22 17:15   ` Ricardo Neri
  2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
  3 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2020-04-16 15:41 UTC (permalink / raw)
  To: Giovanni Gherdovich
  Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	Mel Gorman, Doug Smythies, Like Xu, Neil Rickert, Chris Wilson

On Thu, Apr 16, 2020 at 7:48 AM Giovanni Gherdovich <ggherdovich@suse.cz> wrote:
>
> Some hypervisors such as VMWare ESXi 5.5 advertise support for
> X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
> MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
> clock frequency of the CPU (highest non-turbo frequency), producing a
> division by zero when computing the ratio turbo_freq/base_freq necessary
> for frequency invariant accounting.
>
> It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
> data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
> couldn't be done in principle (not that it would make a lot of sense in a
> VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
> appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
> that feature.
>
> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")

Please feel free to add

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

to all patches in the series and I'm expecting them to be routed through tip.

Thanks!

> ---
>  arch/x86/kernel/smpboot.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index fe3ab9632f3b..3a318ec9bc17 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
>         return false;
>
>  out:
> +       /*
> +        * Some hypervisors advertise X86_FEATURE_APERFMPERF
> +        * but then fill all MSR's with zeroes.
> +        */
> +       if (!base_freq) {
> +               pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
> +               return false;
> +       }
> +
>         arch_turbo_freq_ratio = div_u64(turbo_freq * SCHED_CAPACITY_SCALE,
>                                         base_freq);
>         arch_set_max_freq_ratio(turbo_disabled());
> --
> 2.16.4
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
  2020-04-16  6:41   ` Giovanni Gherdovich
  2020-04-16 15:41   ` Rafael J. Wysocki
@ 2020-04-22 17:15   ` Ricardo Neri
  2020-04-23  8:06     ` Giovanni Gherdovich
  2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
  3 siblings, 1 reply; 17+ messages in thread
From: Ricardo Neri @ 2020-04-22 17:15 UTC (permalink / raw)
  To: Giovanni Gherdovich
  Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki,
	x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson

On Thu, Apr 16, 2020 at 07:47:42AM +0200, Giovanni Gherdovich wrote:
> Some hypervisors such as VMWare ESXi 5.5 advertise support for
> X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
> MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
> clock frequency of the CPU (highest non-turbo frequency), producing a
> division by zero when computing the ratio turbo_freq/base_freq necessary
> for frequency invariant accounting.
> 
> It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
> data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
> couldn't be done in principle (not that it would make a lot of sense in a
> VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
> appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
> that feature.
> 
> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
> ---
>  arch/x86/kernel/smpboot.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index fe3ab9632f3b..3a318ec9bc17 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
>  	return false;
>  
>  out:
> +	/*
> +	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
> +	 * but then fill all MSR's with zeroes.
> +	 */
> +	if (!base_freq) {
> +		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
> +		return false;
> +	}

It may be possible that MSR_TURBO_RATIO_LIMIT is also all-zeros. In
such case, turbo_freq will be also zero. If that is the case,
arch_max_freq_ratio will be zero and we will see a division by zero
exception in arch_scale_freq_tick() because mcnt is multiplied by
arch_max_freq_ratio().

Hence, you should also check for !turbo_freq.

Thanks and BR,
Ricardo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [tip: sched/urgent] x86, sched: Move check for CPU type to caller function
  2020-04-16  5:47 ` [PATCH 4/4] x86, sched: Move check for CPU type to caller function Giovanni Gherdovich
@ 2020-04-22 21:20   ` tip-bot2 for Giovanni Gherdovich
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Giovanni Gherdovich @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Giovanni Gherdovich, Peter Zijlstra (Intel),
	Rafael J. Wysocki, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     db441bd9f630329c402d5cdd319f11bfcf509fb6
Gitweb:        https://git.kernel.org/tip/db441bd9f630329c402d5cdd319f11bfcf509fb6
Author:        Giovanni Gherdovich <ggherdovich@suse.cz>
AuthorDate:    Thu, 16 Apr 2020 07:47:45 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 22 Apr 2020 23:10:13 +02:00

x86, sched: Move check for CPU type to caller function

Improve readability of the function intel_set_max_freq_ratio() by moving
the check for KNL CPUs there, together with checks for GLM and SKX.

Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-5-ggherdovich@suse.cz
---
 arch/x86/kernel/smpboot.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index dd8e15f..8c89e4d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1877,9 +1877,6 @@ static bool knl_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq,
 	int err, i;
 	u64 msr;
 
-	if (!x86_match_cpu(has_knl_turbo_ratio_limits))
-		return false;
-
 	err = rdmsrl_safe(MSR_PLATFORM_INFO, base_freq);
 	if (err)
 		return false;
@@ -1977,7 +1974,8 @@ static bool intel_set_max_freq_ratio(void)
 	    skx_set_max_freq_ratio(&base_freq, &turbo_freq, 1))
 		goto out;
 
-	if (knl_set_max_freq_ratio(&base_freq, &turbo_freq, 1))
+	if (x86_match_cpu(has_knl_turbo_ratio_limits) &&
+	    knl_set_max_freq_ratio(&base_freq, &turbo_freq, 1))
 		goto out;
 
 	if (x86_match_cpu(has_skx_turbo_ratio_limits) &&

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: sched/urgent] x86, sched: Account for CPUs with less than 4 cores in freq. invariance
  2020-04-16  5:47 ` [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance Giovanni Gherdovich
  2020-04-16 15:26   ` Dave Kleikamp
@ 2020-04-22 21:20   ` tip-bot2 for Giovanni Gherdovich
  1 sibling, 0 replies; 17+ messages in thread
From: tip-bot2 for Giovanni Gherdovich @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Like Xu, Neil Rickert, Giovanni Gherdovich,
	Peter Zijlstra (Intel),
	Rafael J. Wysocki, Dave Kleikamp, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     23ccee22e834eca236b9a20989caf6905bd6954a
Gitweb:        https://git.kernel.org/tip/23ccee22e834eca236b9a20989caf6905bd6954a
Author:        Giovanni Gherdovich <ggherdovich@suse.cz>
AuthorDate:    Thu, 16 Apr 2020 07:47:43 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 22 Apr 2020 23:10:13 +02:00

x86, sched: Account for CPUs with less than 4 cores in freq. invariance

If a CPU has less than 4 physical cores, MSR_TURBO_RATIO_LIMIT will
rightfully report that the 4C turbo ratio is zero. In such cases, use the
1C turbo ratio instead for frequency invariance calculations.

Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
Reported-by: Like Xu <like.xu@linux.intel.com>
Reported-by: Neil Rickert <nwr10cst-oslnx@yahoo.com>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Link: https://lkml.kernel.org/r/20200416054745.740-3-ggherdovich@suse.cz
---
 arch/x86/kernel/smpboot.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3a318ec..5d346b7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1945,18 +1945,23 @@ static bool skx_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq, int size)
 
 static bool core_set_max_freq_ratio(u64 *base_freq, u64 *turbo_freq)
 {
+	u64 msr;
 	int err;
 
 	err = rdmsrl_safe(MSR_PLATFORM_INFO, base_freq);
 	if (err)
 		return false;
 
-	err = rdmsrl_safe(MSR_TURBO_RATIO_LIMIT, turbo_freq);
+	err = rdmsrl_safe(MSR_TURBO_RATIO_LIMIT, &msr);
 	if (err)
 		return false;
 
-	*base_freq = (*base_freq >> 8) & 0xFF;      /* max P state */
-	*turbo_freq = (*turbo_freq >> 24) & 0xFF;   /* 4C turbo    */
+	*base_freq = (*base_freq >> 8) & 0xFF;    /* max P state */
+	*turbo_freq = (msr >> 24) & 0xFF;         /* 4C turbo    */
+
+	/* The CPU may have less than 4 cores */
+	if (!*turbo_freq)
+		*turbo_freq = msr & 0xFF;         /* 1C turbo    */
 
 	return true;
 }

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: sched/urgent] x86, sched: Don't enable static key when starting secondary CPUs
  2020-04-16  5:47 ` [PATCH 3/4] x86, sched: Don't enable static key when starting secondary CPUs Giovanni Gherdovich
@ 2020-04-22 21:20   ` tip-bot2 for Peter Zijlstra (Intel)
  0 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Peter Zijlstra (Intel) @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chris Wilson, Peter Zijlstra (Intel),
	Giovanni Gherdovich, Rafael J. Wysocki, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     b56e7d45e80796ca963ac10902245b244d823caf
Gitweb:        https://git.kernel.org/tip/b56e7d45e80796ca963ac10902245b244d823caf
Author:        Peter Zijlstra (Intel) <peterz@infradead.org>
AuthorDate:    Thu, 16 Apr 2020 07:47:44 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 22 Apr 2020 23:10:13 +02:00

x86, sched: Don't enable static key when starting secondary CPUs

The static key arch_scale_freq_key only needs to be enabled once (at
boot). This change fixes a bug by which the key was enabled every time cpu0
is started, even as a secondary CPU during cpu hotplug. Secondary CPUs are
started from the idle thread: setting a static key from there means
acquiring a lock and may result in sleeping in the idle task, causing CPU
lockup.

Another consequence of this change is that init_counter_refs() is now
called on each CPU correctly; previously the function on_each_cpu() was
used, but it was called at boot when the only online cpu is cpu0.

[ggherdovich@suse.cz: Tested and wrote changelog]
Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-4-ggherdovich@suse.cz
---
 arch/x86/kernel/smpboot.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5d346b7..dd8e15f 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -147,7 +147,7 @@ static inline void smpboot_restore_warm_reset_vector(void)
 	*((volatile u32 *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = 0;
 }
 
-static void init_freq_invariance(void);
+static void init_freq_invariance(bool secondary);
 
 /*
  * Report back to the Boot Processor during boot time or to the caller processor
@@ -185,7 +185,7 @@ static void smp_callin(void)
 	 */
 	set_cpu_sibling_map(raw_smp_processor_id());
 
-	init_freq_invariance();
+	init_freq_invariance(true);
 
 	/*
 	 * Get our bogomips.
@@ -1341,7 +1341,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	set_sched_topology(x86_topology);
 
 	set_cpu_sibling_map(0);
-	init_freq_invariance();
+	init_freq_invariance(false);
 	smp_sanity_check();
 
 	switch (apic_intr_mode) {
@@ -2005,7 +2005,7 @@ out:
 	return true;
 }
 
-static void init_counter_refs(void *arg)
+static void init_counter_refs(void)
 {
 	u64 aperf, mperf;
 
@@ -2016,18 +2016,25 @@ static void init_counter_refs(void *arg)
 	this_cpu_write(arch_prev_mperf, mperf);
 }
 
-static void init_freq_invariance(void)
+static void init_freq_invariance(bool secondary)
 {
 	bool ret = false;
 
-	if (smp_processor_id() != 0 || !boot_cpu_has(X86_FEATURE_APERFMPERF))
+	if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
 		return;
 
+	if (secondary) {
+		if (static_branch_likely(&arch_scale_freq_key)) {
+			init_counter_refs();
+		}
+		return;
+	}
+
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
 		ret = intel_set_max_freq_ratio();
 
 	if (ret) {
-		on_each_cpu(init_counter_refs, NULL, 1);
+		init_counter_refs();
 		static_branch_enable(&arch_scale_freq_key);
 	} else {
 		pr_debug("Couldn't determine max cpu frequency, necessary for scale-invariant accounting.\n");

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [tip: sched/urgent] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
                     ` (2 preceding siblings ...)
  2020-04-22 17:15   ` Ricardo Neri
@ 2020-04-22 21:20   ` tip-bot2 for Giovanni Gherdovich
  3 siblings, 0 replies; 17+ messages in thread
From: tip-bot2 for Giovanni Gherdovich @ 2020-04-22 21:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Giovanni Gherdovich, Peter Zijlstra (Intel),
	Rafael J. Wysocki, x86, LKML

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     9a6c2c3c7a73ce315c57c1b002caad6fcc858d0f
Gitweb:        https://git.kernel.org/tip/9a6c2c3c7a73ce315c57c1b002caad6fcc858d0f
Author:        Giovanni Gherdovich <ggherdovich@suse.cz>
AuthorDate:    Thu, 16 Apr 2020 07:47:42 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 22 Apr 2020 23:10:13 +02:00

x86, sched: Bail out of frequency invariance if base frequency is unknown

Some hypervisors such as VMWare ESXi 5.5 advertise support for
X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
clock frequency of the CPU (highest non-turbo frequency), producing a
division by zero when computing the ratio turbo_freq/base_freq necessary
for frequency invariant accounting.

It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
couldn't be done in principle (not that it would make a lot of sense in a
VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
that feature.

Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20200416054745.740-2-ggherdovich@suse.cz
---
 arch/x86/kernel/smpboot.c |  9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fe3ab96..3a318ec 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
 	return false;
 
 out:
+	/*
+	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
+	 * but then fill all MSR's with zeroes.
+	 */
+	if (!base_freq) {
+		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
+		return false;
+	}
+
 	arch_turbo_freq_ratio = div_u64(turbo_freq * SCHED_CAPACITY_SCALE,
 					base_freq);
 	arch_set_max_freq_ratio(turbo_disabled());

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-22 17:15   ` Ricardo Neri
@ 2020-04-23  8:06     ` Giovanni Gherdovich
  2020-04-24  1:32       ` Ricardo Neri
  0 siblings, 1 reply; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-23  8:06 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki,
	x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson

On Wed, 2020-04-22 at 10:15 -0700, Ricardo Neri wrote:
> On Thu, Apr 16, 2020 at 07:47:42AM +0200, Giovanni Gherdovich wrote:
> > Some hypervisors such as VMWare ESXi 5.5 advertise support for
> > X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
> > MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
> > clock frequency of the CPU (highest non-turbo frequency), producing a
> > division by zero when computing the ratio turbo_freq/base_freq necessary
> > for frequency invariant accounting.
> > 
> > It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
> > data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
> > couldn't be done in principle (not that it would make a lot of sense in a
> > VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
> > appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
> > that feature.
> > 
> > Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> > Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
> > ---
> >  arch/x86/kernel/smpboot.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index fe3ab9632f3b..3a318ec9bc17 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
> >  	return false;
> >  
> >  out:
> > +	/*
> > +	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
> > +	 * but then fill all MSR's with zeroes.
> > +	 */
> > +	if (!base_freq) {
> > +		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
> > +		return false;
> > +	}
> 
> It may be possible that MSR_TURBO_RATIO_LIMIT is also all-zeros. In
> such case, turbo_freq will be also zero. If that is the case,
> arch_max_freq_ratio will be zero and we will see a division by zero
> exception in arch_scale_freq_tick() because mcnt is multiplied by
> arch_max_freq_ratio().

Thanks Ricardo for clarifying this.

Follow-up question: when I see an all-zeros MSR_TURBO_RATIO_LIMIT, can I
assume the CPU doesn't support turbo boost? Or is it possible that such a CPU
has turbo boost, just the turbo ratios aren't declared in the MSR?

Some context: this feature (called "frequency invariance") wants to know
what's the max clock freq a CPU can have at any time (it needs it for some
scheduler calculations). This is hard to know precisely, because turbo can
kick in at any time and depends on many factors.  So it settles for an
"average maximum frequency", which I decided the 4 cores turbo is a good
estimate for. Now, if an all-zeros MSR_TURBO_RATIO_LIMIT means "turbo boost
unsupported", this is actually the easy case because then I know exactly what
the max freq is (base frequency). If, on the other hand, an all-zeros MSR
means "there may or may not be turbo, and you don't know how much" then I must
disable frequency invariance.


Thanks,
Giovanni

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-23  8:06     ` Giovanni Gherdovich
@ 2020-04-24  1:32       ` Ricardo Neri
  2020-04-24  5:53         ` Giovanni Gherdovich
  0 siblings, 1 reply; 17+ messages in thread
From: Ricardo Neri @ 2020-04-24  1:32 UTC (permalink / raw)
  To: Giovanni Gherdovich
  Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki,
	x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson

On Thu, Apr 23, 2020 at 10:06:04AM +0200, Giovanni Gherdovich wrote:
> On Wed, 2020-04-22 at 10:15 -0700, Ricardo Neri wrote:
> > On Thu, Apr 16, 2020 at 07:47:42AM +0200, Giovanni Gherdovich wrote:
> > > Some hypervisors such as VMWare ESXi 5.5 advertise support for
> > > X86_FEATURE_APERFMPERF but then fill all MSR's with zeroes. In particular,
> > > MSR_PLATFORM_INFO set to zero tricks the code that wants to know the base
> > > clock frequency of the CPU (highest non-turbo frequency), producing a
> > > division by zero when computing the ratio turbo_freq/base_freq necessary
> > > for frequency invariant accounting.
> > > 
> > > It is to be noted that even if MSR_PLATFORM_INFO contained the appropriate
> > > data, APERF and MPERF are constantly zero on ESXi 5.5, thus freq-invariance
> > > couldn't be done in principle (not that it would make a lot of sense in a
> > > VM anyway). The real problem is advertising X86_FEATURE_APERFMPERF. This
> > > appears to be fixed in more recent versions: ESXi 6.7 doesn't advertise
> > > that feature.
> > > 
> > > Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> > > Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance")
> > > ---
> > >  arch/x86/kernel/smpboot.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > index fe3ab9632f3b..3a318ec9bc17 100644
> > > --- a/arch/x86/kernel/smpboot.c
> > > +++ b/arch/x86/kernel/smpboot.c
> > > @@ -1985,6 +1985,15 @@ static bool intel_set_max_freq_ratio(void)
> > >  	return false;
> > >  
> > >  out:
> > > +	/*
> > > +	 * Some hypervisors advertise X86_FEATURE_APERFMPERF
> > > +	 * but then fill all MSR's with zeroes.
> > > +	 */
> > > +	if (!base_freq) {
> > > +		pr_debug("Couldn't determine cpu base frequency, necessary for scale-invariant accounting.\n");
> > > +		return false;
> > > +	}
> > 
> > It may be possible that MSR_TURBO_RATIO_LIMIT is also all-zeros. In
> > such case, turbo_freq will be also zero. If that is the case,
> > arch_max_freq_ratio will be zero and we will see a division by zero
> > exception in arch_scale_freq_tick() because mcnt is multiplied by
> > arch_max_freq_ratio().
> 
> Thanks Ricardo for clarifying this.
> 
> Follow-up question: when I see an all-zeros MSR_TURBO_RATIO_LIMIT, can I
> assume the CPU doesn't support turbo boost? Or is it possible that such a CPU
> has turbo boost, just the turbo ratios aren't declared in the MSR?
> 
> Some context: this feature (called "frequency invariance") wants to know
> what's the max clock freq a CPU can have at any time (it needs it for some
> scheduler calculations). This is hard to know precisely, because turbo can
> kick in at any time and depends on many factors.  So it settles for an
> "average maximum frequency", which I decided the 4 cores turbo is a good
> estimate for. Now, if an all-zeros MSR_TURBO_RATIO_LIMIT means "turbo boost
> unsupported", this is actually the easy case because then I know exactly what
> the max freq is (base frequency). If, on the other hand, an all-zeros MSR
> means "there may or may not be turbo, and you don't know how much" then I must
> disable frequency invariance.

I'd say that there can be cases in which the CPU has turbo boost and yet the
turbo ratios are not declared in MSR_TURBO_RATIO_LIMIT. Hence, frequency
invariance should be disabled.

Thanks and BR,
Ricardo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown
  2020-04-24  1:32       ` Ricardo Neri
@ 2020-04-24  5:53         ` Giovanni Gherdovich
  0 siblings, 0 replies; 17+ messages in thread
From: Giovanni Gherdovich @ 2020-04-24  5:53 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Srinivas Pandruvada, Thomas Gleixner, Ingo Molnar,
	Peter Zijlstra, Borislav Petkov, Len Brown, Rafael J . Wysocki,
	x86, linux-pm, linux-kernel, Mel Gorman, Doug Smythies, Like Xu,
	Neil Rickert, Chris Wilson

On Thu, 2020-04-23 at 18:32 -0700, Ricardo Neri wrote:
> On Thu, Apr 23, 2020 at 10:06:04AM +0200, Giovanni Gherdovich wrote:
> > > 
> > > It may be possible that MSR_TURBO_RATIO_LIMIT is also all-zeros. In
> > > such case, turbo_freq will be also zero. If that is the case,
> > > arch_max_freq_ratio will be zero and we will see a division by zero
> > > exception in arch_scale_freq_tick() because mcnt is multiplied by
> > > arch_max_freq_ratio().
> > 
> > Thanks Ricardo for clarifying this.
> > 
> > Follow-up question: when I see an all-zeros MSR_TURBO_RATIO_LIMIT, can I
> > assume the CPU doesn't support turbo boost? Or is it possible that such a CPU
> > has turbo boost, just the turbo ratios aren't declared in the MSR?
> > 
> > Some context: this feature (called "frequency invariance") wants to know
> > what's the max clock freq a CPU can have at any time (it needs it for some
> > scheduler calculations). This is hard to know precisely, because turbo can
> > kick in at any time and depends on many factors.  So it settles for an
> > "average maximum frequency", which I decided the 4 cores turbo is a good
> > estimate for. Now, if an all-zeros MSR_TURBO_RATIO_LIMIT means "turbo boost
> > unsupported", this is actually the easy case because then I know exactly what
> > the max freq is (base frequency). If, on the other hand, an all-zeros MSR
> > means "there may or may not be turbo, and you don't know how much" then I must
> > disable frequency invariance.
> 
> I'd say that there can be cases in which the CPU has turbo boost and yet the
> turbo ratios are not declared in MSR_TURBO_RATIO_LIMIT. Hence, frequency
> invariance should be disabled.

Great, thanks for the information Ricardo!

For the tip tree maintainers: Ricardo is identifying an additional corner case
I need to take care of, but this series stands on its own: the commits
correctly do what their changelog says, and fix existing bugs.

I'll send an additional patch that follows Ricardo's recommendations, and it
will apply on top of this series.


Giovanni

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-04-24  5:53 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-16  5:47 [PATCH 0/4] Frequency invariance fixes for x86 Giovanni Gherdovich
2020-04-16  5:47 ` [PATCH 1/4] x86, sched: Bail out of frequency invariance if base frequency is unknown Giovanni Gherdovich
2020-04-16  6:41   ` Giovanni Gherdovich
2020-04-16 15:41   ` Rafael J. Wysocki
2020-04-22 17:15   ` Ricardo Neri
2020-04-23  8:06     ` Giovanni Gherdovich
2020-04-24  1:32       ` Ricardo Neri
2020-04-24  5:53         ` Giovanni Gherdovich
2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
2020-04-16  5:47 ` [PATCH 2/4] x86, sched: Account for CPUs with less than 4 cores in freq. invariance Giovanni Gherdovich
2020-04-16 15:26   ` Dave Kleikamp
2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
2020-04-16  5:47 ` [PATCH 3/4] x86, sched: Don't enable static key when starting secondary CPUs Giovanni Gherdovich
2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra (Intel)
2020-04-16  5:47 ` [PATCH 4/4] x86, sched: Move check for CPU type to caller function Giovanni Gherdovich
2020-04-22 21:20   ` [tip: sched/urgent] " tip-bot2 for Giovanni Gherdovich
2020-04-16  7:20 ` [PATCH 0/4] Frequency invariance fixes for x86 Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).