linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Introduce x86_get_cpufreq_khz()
@ 2021-12-01  2:46 zhenwei pi
  2021-12-01  2:46 ` [PATCH v2 1/2] x86/cpu: " zhenwei pi
  2021-12-01  2:46 ` [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock zhenwei pi
  0 siblings, 2 replies; 12+ messages in thread
From: zhenwei pi @ 2021-12-01  2:46 UTC (permalink / raw)
  To: tglx, pbonzini; +Cc: kvm, linux-kernel, x86, zhenwei pi

v2:
 - Rename x86_get_freq() to x86_get_cpufreq_khz() to make function
   specific. (Suggested by Thomas Gleixner)
 - Use cpu_feature_enabled(X86_FEATURE_TSC) instead of
   cpu_has(&cpu_data(cpu), X86_FEATURE_TSC) to test feature.
   (Suggested by Thomas Gleixner)
 - Spell fixes. (Suggested by Thomas Gleixner)

v1:
 - Introduce x86_get_freq(), hide detailed implemention from proc
   routine.
 - KVM uses x86_get_freq() to get CPU frequecy.

zhenwei pi (2):
  x86/cpu: Introduce x86_get_cpufreq_khz()
  KVM: x86: use x86_get_freq to get freq for kvmclock

 arch/x86/include/asm/processor.h |  2 ++
 arch/x86/kernel/cpu/common.c     | 19 +++++++++++++++++++
 arch/x86/kernel/cpu/proc.c       | 13 +++----------
 arch/x86/kvm/x86.c               |  4 +---
 4 files changed, 25 insertions(+), 13 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] x86/cpu: Introduce x86_get_cpufreq_khz()
  2021-12-01  2:46 [PATCH v2 0/2] Introduce x86_get_cpufreq_khz() zhenwei pi
@ 2021-12-01  2:46 ` zhenwei pi
  2021-12-02 22:25   ` Peter Zijlstra
  2021-12-01  2:46 ` [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock zhenwei pi
  1 sibling, 1 reply; 12+ messages in thread
From: zhenwei pi @ 2021-12-01  2:46 UTC (permalink / raw)
  To: tglx, pbonzini; +Cc: kvm, linux-kernel, x86, zhenwei pi

Wrapper function x86_get_cpufreq_khz() to get frequency on a x86
platform, hide detailed implementation from proc routine.

Also export this function for the further use, a typical case is that
kvm module gets the frequency of the host and tell the guest side by
kvmclock.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 arch/x86/include/asm/processor.h |  2 ++
 arch/x86/kernel/cpu/common.c     | 19 +++++++++++++++++++
 arch/x86/kernel/cpu/proc.c       | 13 +++----------
 3 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 355d38c0cf60..22f183dee593 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -855,4 +855,6 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+unsigned int x86_get_cpufreq_khz(unsigned int cpu);
+
 #endif /* _ASM_X86_PROCESSOR_H */
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0083464de5e3..997026fedbb4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -22,6 +22,7 @@
 #include <linux/io.h>
 #include <linux/syscore_ops.h>
 #include <linux/pgtable.h>
+#include <linux/cpufreq.h>
 
 #include <asm/cmdline.h>
 #include <asm/stackprotector.h>
@@ -2104,3 +2105,21 @@ void arch_smt_update(void)
 	/* Check whether IPI broadcasting can be enabled */
 	apic_smt_update();
 }
+
+unsigned int x86_get_cpufreq_khz(unsigned int cpu)
+{
+	unsigned int freq = 0;
+
+	if (!cpu_feature_enabled(X86_FEATURE_TSC))
+		return 0;
+
+	freq = aperfmperf_get_khz(cpu);
+	if (!freq)
+		freq = cpufreq_quick_get(cpu);
+
+	if (!freq)
+		freq = cpu_khz;
+
+	return freq;
+}
+EXPORT_SYMBOL_GPL(x86_get_cpufreq_khz);
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 4eec8889b0ff..8ed17f969f72 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -3,7 +3,6 @@
 #include <linux/timex.h>
 #include <linux/string.h>
 #include <linux/seq_file.h>
-#include <linux/cpufreq.h>
 
 #include "cpu.h"
 
@@ -61,7 +60,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 static int show_cpuinfo(struct seq_file *m, void *v)
 {
 	struct cpuinfo_x86 *c = v;
-	unsigned int cpu;
+	unsigned int cpu, freq;
 	int i;
 
 	cpu = c->cpu_index;
@@ -83,16 +82,10 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 	if (c->microcode)
 		seq_printf(m, "microcode\t: 0x%x\n", c->microcode);
 
-	if (cpu_has(c, X86_FEATURE_TSC)) {
-		unsigned int freq = aperfmperf_get_khz(cpu);
-
-		if (!freq)
-			freq = cpufreq_quick_get(cpu);
-		if (!freq)
-			freq = cpu_khz;
+	freq = x86_get_cpufreq_khz(cpu);
+	if (freq)
 		seq_printf(m, "cpu MHz\t\t: %u.%03u\n",
 			   freq / 1000, (freq % 1000));
-	}
 
 	/* Cache size */
 	if (c->x86_cache_size)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-01  2:46 [PATCH v2 0/2] Introduce x86_get_cpufreq_khz() zhenwei pi
  2021-12-01  2:46 ` [PATCH v2 1/2] x86/cpu: " zhenwei pi
@ 2021-12-01  2:46 ` zhenwei pi
  2021-12-02  2:48   ` Thomas Gleixner
  1 sibling, 1 reply; 12+ messages in thread
From: zhenwei pi @ 2021-12-01  2:46 UTC (permalink / raw)
  To: tglx, pbonzini; +Cc: kvm, linux-kernel, x86, zhenwei pi

If the host side supports APERF&MPERF feature, the guest side may get
mismatched frequency.

KVM uses x86_get_cpufreq_khz() to get the same frequency for guest side.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 arch/x86/kvm/x86.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5a403d92833f..125ed3c8b21a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8305,10 +8305,8 @@ static void tsc_khz_changed(void *data)
 
 	if (data)
 		khz = freq->new;
-	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
-		khz = cpufreq_quick_get(raw_smp_processor_id());
 	if (!khz)
-		khz = tsc_khz;
+		khz = x86_get_cpufreq_khz(raw_smp_processor_id());
 	__this_cpu_write(cpu_tsc_khz, khz);
 }
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-01  2:46 ` [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock zhenwei pi
@ 2021-12-02  2:48   ` Thomas Gleixner
  2021-12-02  5:26     ` zhenwei pi
  0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2021-12-02  2:48 UTC (permalink / raw)
  To: zhenwei pi, pbonzini; +Cc: kvm, linux-kernel, x86, zhenwei pi

On Wed, Dec 01 2021 at 10:46, zhenwei pi wrote:
> If the host side supports APERF&MPERF feature, the guest side may get
> mismatched frequency.
>
> KVM uses x86_get_cpufreq_khz() to get the same frequency for guest side.
>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  arch/x86/kvm/x86.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5a403d92833f..125ed3c8b21a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8305,10 +8305,8 @@ static void tsc_khz_changed(void *data)
>  
>  	if (data)
>  		khz = freq->new;
> -	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> -		khz = cpufreq_quick_get(raw_smp_processor_id());
>  	if (!khz)
> -		khz = tsc_khz;
> +		khz = x86_get_cpufreq_khz(raw_smp_processor_id());

my brain compiler tells me that this is broken.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-02  2:48   ` Thomas Gleixner
@ 2021-12-02  5:26     ` zhenwei pi
  2021-12-02  7:19       ` Maxim Levitsky
  0 siblings, 1 reply; 12+ messages in thread
From: zhenwei pi @ 2021-12-02  5:26 UTC (permalink / raw)
  To: Thomas Gleixner, pbonzini; +Cc: kvm, linux-kernel, x86

On 12/2/21 10:48 AM, Thomas Gleixner wrote:
> On Wed, Dec 01 2021 at 10:46, zhenwei pi wrote:
>> If the host side supports APERF&MPERF feature, the guest side may get
>> mismatched frequency.
>>
>> KVM uses x86_get_cpufreq_khz() to get the same frequency for guest side.
>>
>> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
>> ---
>>   arch/x86/kvm/x86.c | 4 +---
>>   1 file changed, 1 insertion(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5a403d92833f..125ed3c8b21a 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -8305,10 +8305,8 @@ static void tsc_khz_changed(void *data)
>>   
>>   	if (data)
>>   		khz = freq->new;
>> -	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
>> -		khz = cpufreq_quick_get(raw_smp_processor_id());
>>   	if (!khz)
>> -		khz = tsc_khz;
>> +		khz = x86_get_cpufreq_khz(raw_smp_processor_id());
> 
> my brain compiler tells me that this is broken.
> Without this patch:
1, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is true:
no kvmclock_cpufreq_notifier, and khz = tsc_khz;

2, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is false:
during installing kmod, try cpufreq_quick_get(), or use tsc_khz;
and get changed by kvmclock_cpufreq_notifier.

With this patch:
1, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is true:
no kvmclock_cpufreq_notifier, try aperf/mperf, or try 
cpufreq_quick_get(), or use cpu_khz

2, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is false:
during installing kmod, try aperf/mperf, or try cpufreq_quick_get(), or 
use cpu_khz;
and get changed by kvmclock_cpufreq_notifier.

I tested on Skylake&Icelake CPU, and got different CPU frequency from 
host & guest, the main purpose of this patch is to get the same frequency.

-- 
zhenwei pi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-02  5:26     ` zhenwei pi
@ 2021-12-02  7:19       ` Maxim Levitsky
  2021-12-02 22:45         ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Maxim Levitsky @ 2021-12-02  7:19 UTC (permalink / raw)
  To: zhenwei pi, Thomas Gleixner, pbonzini; +Cc: kvm, linux-kernel, x86

On Thu, 2021-12-02 at 13:26 +0800, zhenwei pi wrote:
> On 12/2/21 10:48 AM, Thomas Gleixner wrote:
> > On Wed, Dec 01 2021 at 10:46, zhenwei pi wrote:
> > > If the host side supports APERF&MPERF feature, the guest side may get
> > > mismatched frequency.
> > > 
> > > KVM uses x86_get_cpufreq_khz() to get the same frequency for guest side.
> > > 
> > > Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> > > ---
> > >   arch/x86/kvm/x86.c | 4 +---
> > >   1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 5a403d92833f..125ed3c8b21a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -8305,10 +8305,8 @@ static void tsc_khz_changed(void *data)
> > >   
> > >   	if (data)
> > >   		khz = freq->new;
> > > -	else if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
> > > -		khz = cpufreq_quick_get(raw_smp_processor_id());
> > >   	if (!khz)
> > > -		khz = tsc_khz;
> > > +		khz = x86_get_cpufreq_khz(raw_smp_processor_id());
> > 
> > my brain compiler tells me that this is broken.
> > Without this patch:
> 1, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is true:
> no kvmclock_cpufreq_notifier, and khz = tsc_khz;
> 
> 2, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is false:
> during installing kmod, try cpufreq_quick_get(), or use tsc_khz;
> and get changed by kvmclock_cpufreq_notifier.
> 
> With this patch:
> 1, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is true:
> no kvmclock_cpufreq_notifier, try aperf/mperf, or try 
> cpufreq_quick_get(), or use cpu_khz
> 
> 2, boot_cpu_has(X86_FEATURE_CONSTANT_TSC) is false:
> during installing kmod, try aperf/mperf, or try cpufreq_quick_get(), or 
> use cpu_khz;
> and get changed by kvmclock_cpufreq_notifier.
> 
> I tested on Skylake&Icelake CPU, and got different CPU frequency from 
> host & guest, the main purpose of this patch is to get the same frequency.
> 

Note that on my Zen2 machine (3970X), aperf/mperf returns current cpu freqency,
as now see in /proc/cpuinfo, while TSC is always running with base CPU clock frequency (3.7 GHZ)
(that is max frequency that CPU is guranteed to run with, anything above is boost 'bonus')

[mlevitsk@starship ~/Kernel/br-vm-64/src]$cat /proc/cpuinfo | grep "cpu MHz"
cpu MHz		: 3685.333
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2761.946
cpu MHz		: 2200.000
cpu MHz		: 2200.000
cpu MHz		: 2200.000
...

[mlevitsk@starship ~/Kernel/master/src]$dmesg | grep tsc
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 3700.230 MHz processor
...


Before I forget about it I do want to point out few things
that are not 100% related to this thread but do related to TSC:

1. It sucks that on AMD, the TSC frequency is calibrated from other 
clocksources like PIT/HPET, since the result is not exact and varies
from boot to boot. I do wonder if they have something like that
APERF/MPERF thing which sadly is not what I was looking for.

2. In the guest on AMD, we mark the TSC as unsynchronized always due to the code
in unsynchronized_tsc, unless invariant tsc is used in guest cpuid,
which is IMHO not fair to AMD as we don't do this for  Intel cpus.
(look at unsynchronized_tsc function)

3. I wish the kernel would export the tsc frequency it found to userspace
somewhere in /sys or /proc, as this would be very useful for userspace applications.
Currently it can only be found in dmesg if I am not mistaken..
I don't mind if such frequency would only be exported if the TSC is stable,
always running, not affected by CPUfreq, etc.


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] x86/cpu: Introduce x86_get_cpufreq_khz()
  2021-12-01  2:46 ` [PATCH v2 1/2] x86/cpu: " zhenwei pi
@ 2021-12-02 22:25   ` Peter Zijlstra
  2021-12-03  7:34     ` zhenwei pi
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2021-12-02 22:25 UTC (permalink / raw)
  To: zhenwei pi; +Cc: tglx, pbonzini, kvm, linux-kernel, x86

On Wed, Dec 01, 2021 at 10:46:49AM +0800, zhenwei pi wrote:
> Wrapper function x86_get_cpufreq_khz() to get frequency on a x86
> platform, hide detailed implementation from proc routine.
> 
> Also export this function for the further use, a typical case is that
> kvm module gets the frequency of the host and tell the guest side by
> kvmclock.

This function was already complete crap, and now you want to export it,
*WHY* ?!

What possible use does KVM have for this random number generator?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-02  7:19       ` Maxim Levitsky
@ 2021-12-02 22:45         ` Peter Zijlstra
  2021-12-03  7:53           ` Maxim Levitsky
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2021-12-02 22:45 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: zhenwei pi, Thomas Gleixner, pbonzini, kvm, linux-kernel, x86

On Thu, Dec 02, 2021 at 09:19:25AM +0200, Maxim Levitsky wrote:
> On Thu, 2021-12-02 at 13:26 +0800, zhenwei pi wrote:

> Note that on my Zen2 machine (3970X), aperf/mperf returns current cpu freqency,

Correct, and it computes it over a random period of history. IOW, it's a
random number generator.

> 1. It sucks that on AMD, the TSC frequency is calibrated from other 
> clocksources like PIT/HPET, since the result is not exact and varies
> from boot to boot.

CPUID.15h is supposed to tell us the actual frequency; except even Intel
find it very hard to actually put the right (or any, really) number in
there :/ Bribe your friendly AMD engineer with beers or something.

> 2. In the guest on AMD, we mark the TSC as unsynchronized always due to the code
> in unsynchronized_tsc, unless invariant tsc is used in guest cpuid,
> which is IMHO not fair to AMD as we don't do this for  Intel cpus.
> (look at unsynchronized_tsc function)

Possibly we could treat >= Zen similar to Intel there. Also that comment
there is hillarious, it talks about multi-socket and then tests
num_possible_cpus(). Clearly that code hasn't been touched in like
forever.

> 3. I wish the kernel would export the tsc frequency it found to userspace
> somewhere in /sys or /proc, as this would be very useful for userspace applications.
> Currently it can only be found in dmesg if I am not mistaken..
> I don't mind if such frequency would only be exported if the TSC is stable,
> always running, not affected by CPUfreq, etc.

Perf exposes it, it's not really convenient if you're not using perf,
but it can be found there.


---
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2e076a459a0c..09da2935534a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -29,6 +29,7 @@
 #include <asm/intel-family.h>
 #include <asm/i8259.h>
 #include <asm/uv/uv.h>
+#include <asm/topology.h>
 
 unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
 EXPORT_SYMBOL(cpu_khz);
@@ -1221,9 +1222,20 @@ int unsynchronized_tsc(void)
 	 * Intel systems are normally all synchronized.
 	 * Exceptions must mark TSC as unstable:
 	 */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
+	switch (boot_cpu_data.x86_vendor) {
+	case X86_VENDOR_INTEL:
+		/* Really only Core and later */
+		break;
+
+	case X86_VENDOR_AMD:
+	case X86_VENDOR_HYGON:
+		if (boot_cpu_data.x86 >= 0x17) /* >= Zen */
+			break;
+		fallthrough;
+
+	default:
 		/* assume multi socket systems are not synchronized: */
-		if (num_possible_cpus() > 1)
+		if (topology_max_packages() > 1)
 			return 1;
 	}
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 1/2] x86/cpu: Introduce x86_get_cpufreq_khz()
  2021-12-02 22:25   ` Peter Zijlstra
@ 2021-12-03  7:34     ` zhenwei pi
  2021-12-04 10:46       ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: zhenwei pi @ 2021-12-03  7:34 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: tglx, pbonzini, kvm, linux-kernel, x86

On 12/3/21 6:25 AM, Peter Zijlstra wrote:
> On Wed, Dec 01, 2021 at 10:46:49AM +0800, zhenwei pi wrote:
>> Wrapper function x86_get_cpufreq_khz() to get frequency on a x86
>> platform, hide detailed implementation from proc routine.
>>
>> Also export this function for the further use, a typical case is that
>> kvm module gets the frequency of the host and tell the guest side by
>> kvmclock.
> 
> This function was already complete crap, and now you want to export it,
> *WHY* ?!
> 
> What possible use does KVM have for this random number generator?
> 
A KVM guest overwrites the '.calibrate_tsc' and '.calibrate_cpu' if 
kvmclock is supported:

in function kvmclock_init(void) (linux/arch/x86/kernel/kvmclock.c)
	...
         x86_platform.calibrate_tsc = kvm_get_tsc_khz;
         x86_platform.calibrate_cpu = kvm_get_tsc_khz;
	...

And kvm_get_tsc_khz reads PV data from host side. Before guest reads 
this, KVM should writes the frequency into the PV data structure.

And the problem is that KVM gets tsc_khz directly without aperf/mperf 
detection. So user may gets different frequency(cat /proc/cpuinfo) from 
guest & host.

Or is that possible to export function 'aperfmperf_get_khz'?

-- 
zhenwei pi

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-02 22:45         ` Peter Zijlstra
@ 2021-12-03  7:53           ` Maxim Levitsky
  2021-12-04 10:44             ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Maxim Levitsky @ 2021-12-03  7:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: zhenwei pi, Thomas Gleixner, pbonzini, kvm, linux-kernel, x86

On Thu, 2021-12-02 at 23:45 +0100, Peter Zijlstra wrote:
> On Thu, Dec 02, 2021 at 09:19:25AM +0200, Maxim Levitsky wrote:
> > On Thu, 2021-12-02 at 13:26 +0800, zhenwei pi wrote:
> > Note that on my Zen2 machine (3970X), aperf/mperf returns current cpu freqency,
> 
> Correct, and it computes it over a random period of history. IOW, it's a
> random number generator.
> 
> > 1. It sucks that on AMD, the TSC frequency is calibrated from other 
> > clocksources like PIT/HPET, since the result is not exact and varies
> > from boot to boot.
> 
> CPUID.15h is supposed to tell us the actual frequency; except even Intel
> find it very hard to actually put the right (or any, really) number in
> there :/ Bribe your friendly AMD engineer with beers or something.

That what I thought. I asked just in case maybe AMD does have some vendor specific msrs
you know about but we didn't bother to support it.
I didn't find any in their PRM.

> 
> > 2. In the guest on AMD, we mark the TSC as unsynchronized always due to the code
> > in unsynchronized_tsc, unless invariant tsc is used in guest cpuid,
> > which is IMHO not fair to AMD as we don't do this for  Intel cpus.
> > (look at unsynchronized_tsc function)
> 
> Possibly we could treat >= Zen similar to Intel there. Also that comment
> there is hillarious, it talks about multi-socket and then tests
> num_possible_cpus(). Clearly that code hasn't been touched in like
> forever.
Thank you!

> 
> > 3. I wish the kernel would export the tsc frequency it found to userspace
> > somewhere in /sys or /proc, as this would be very useful for userspace applications.
> > Currently it can only be found in dmesg if I am not mistaken..
> > I don't mind if such frequency would only be exported if the TSC is stable,
> > always running, not affected by CPUfreq, etc.
> 
> Perf exposes it, it's not really convenient if you're not using perf,
> but it can be found there.
That is good to know! I will check out the source but if you remember,
is there cli option in perf to show it, or it only uses it for internal
purposes?

> 
> 
> ---
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 2e076a459a0c..09da2935534a 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -29,6 +29,7 @@
>  #include <asm/intel-family.h>
>  #include <asm/i8259.h>
>  #include <asm/uv/uv.h>
> +#include <asm/topology.h>
>  
>  unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
>  EXPORT_SYMBOL(cpu_khz);
> @@ -1221,9 +1222,20 @@ int unsynchronized_tsc(void)
>  	 * Intel systems are normally all synchronized.
>  	 * Exceptions must mark TSC as unstable:
>  	 */
> -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
> +	switch (boot_cpu_data.x86_vendor) {
> +	case X86_VENDOR_INTEL:
> +		/* Really only Core and later */
> +		break;
> +
> +	case X86_VENDOR_AMD:
> +	case X86_VENDOR_HYGON:
> +		if (boot_cpu_data.x86 >= 0x17) /* >= Zen */
> +			break;
> +		fallthrough;
> +
> +	default:
>  		/* assume multi socket systems are not synchronized: */
> -		if (num_possible_cpus() > 1)
> +		if (topology_max_packages() > 1)
>  			return 1;
>  	}
>  

This makes sense!

> 


Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock
  2021-12-03  7:53           ` Maxim Levitsky
@ 2021-12-04 10:44             ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2021-12-04 10:44 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: zhenwei pi, Thomas Gleixner, pbonzini, kvm, linux-kernel, x86

On Fri, Dec 03, 2021 at 09:53:53AM +0200, Maxim Levitsky wrote:

> > Perf exposes it, it's not really convenient if you're not using perf,
> > but it can be found there.
> That is good to know! I will check out the source but if you remember,
> is there cli option in perf to show it, or it only uses it for internal
> purposes?

perf tool doesn't expose it I think, it's stuffed in
perf_event_mmap_page::time_* fields, see
include/uapi/linux/perf_event.h, it has comments on.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Re: [PATCH v2 1/2] x86/cpu: Introduce x86_get_cpufreq_khz()
  2021-12-03  7:34     ` zhenwei pi
@ 2021-12-04 10:46       ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2021-12-04 10:46 UTC (permalink / raw)
  To: zhenwei pi; +Cc: tglx, pbonzini, kvm, linux-kernel, x86

On Fri, Dec 03, 2021 at 03:34:04PM +0800, zhenwei pi wrote:
> A KVM guest overwrites the '.calibrate_tsc' and '.calibrate_cpu' if kvmclock
> is supported:
> 
> in function kvmclock_init(void) (linux/arch/x86/kernel/kvmclock.c)
> 	...
>         x86_platform.calibrate_tsc = kvm_get_tsc_khz;
>         x86_platform.calibrate_cpu = kvm_get_tsc_khz;
> 	...
> 
> And kvm_get_tsc_khz reads PV data from host side. Before guest reads this,
> KVM should writes the frequency into the PV data structure.
> 
> And the problem is that KVM gets tsc_khz directly without aperf/mperf
> detection. So user may gets different frequency(cat /proc/cpuinfo) from
> guest & host.
> 
> Or is that possible to export function 'aperfmperf_get_khz'?

TSC frequency and aperf/mperf are unrelated. You're trying to make apple
juice with carrots.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-12-04 10:46 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-01  2:46 [PATCH v2 0/2] Introduce x86_get_cpufreq_khz() zhenwei pi
2021-12-01  2:46 ` [PATCH v2 1/2] x86/cpu: " zhenwei pi
2021-12-02 22:25   ` Peter Zijlstra
2021-12-03  7:34     ` zhenwei pi
2021-12-04 10:46       ` Peter Zijlstra
2021-12-01  2:46 ` [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock zhenwei pi
2021-12-02  2:48   ` Thomas Gleixner
2021-12-02  5:26     ` zhenwei pi
2021-12-02  7:19       ` Maxim Levitsky
2021-12-02 22:45         ` Peter Zijlstra
2021-12-03  7:53           ` Maxim Levitsky
2021-12-04 10:44             ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).