* [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
@ 2022-10-21  6:21 Feng Tang
  2022-10-21  6:21 ` [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform Feng Tang
  2022-10-21 15:00 ` [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Zhang Rui
  0 siblings, 2 replies; 13+ messages in thread
From: Feng Tang @ 2022-10-21  6:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel
  Cc: rui.zhang, tim.c.chen, Xiongfeng Wang, liaoyu15, Feng Tang

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to solve the problem that the
TSC clocksource is sometimes wrongly judged as unstable by watchdogs
like 'jiffies', HPET, etc.

In it, the hardware socket number is a key factor in deciding
whether to disable the watchdog for TSC, and 'nr_online_nodes' was
chosen as an estimate because the check is needed in the early boot
phase, before the 'tsc-early' clocksource is registered and before
the non-boot CPUs are brought up.

In a recent patch review, Dave Hansen pointed out that there are many
cases where 'nr_online_nodes' can be inaccurate, such as:
* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes.
* SNC (sub-numa cluster) mode is enabled

Peter Zijlstra suggested using logical package IDs, but they are
only usable after smp_init(), once all CPUs are initialized.

One solution is to skip the watchdog check for the 'tsc-early'
clocksource and move it to after smp_init(), but before the 'tsc'
clocksource is registered, where 'logical_packages' can be used
as a much more accurate socket count.

Signed-off-by: Feng Tang <feng.tang@intel.com>
---
Hi reviewers,

I separated the code into 2 patches, as I think they cover 2
different problems and this keeps bisection easy. Feel free to
combine them into one, as 2/2 is a trivial change.

Thanks,
Feng

Changelog:
 
 Since RFC:
 * use 'logical_packages' instead of topology_max_packages(), whose
   implementation is not accurate, e.g. for heterogeneous systems
   which have a combination of Core/Atom CPUs, like Alder Lake (Dave Hansen)

 arch/x86/include/asm/topology.h |  3 +++
 arch/x86/kernel/smpboot.c       |  2 +-
 arch/x86/kernel/tsc.c           | 42 +++++++++++++--------------------
 3 files changed, 20 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..f9002549770c 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -122,8 +122,10 @@ extern unsigned int __max_die_per_package;
 #define topology_sibling_cpumask(cpu)		(per_cpu(cpu_sibling_map, cpu))
 
 extern unsigned int __max_logical_packages;
 #define topology_max_packages()			(__max_logical_packages)
 
+extern unsigned int logical_packages;
+
 static inline int topology_max_die_per_package(void)
 {
 	return __max_die_per_package;
@@ -144,6 +146,7 @@ bool topology_is_primary_thread(unsigned int cpu);
 bool topology_smt_supported(void);
 #else
 #define topology_max_packages()			(1)
+#define logical_packages			(1)
 static inline int
 topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
 static inline int
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3f3ea0287f69..d81156beb7e7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -102,7 +102,7 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 /* Logical package management. We might want to allocate that dynamically */
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
-static unsigned int logical_packages __read_mostly;
+unsigned int logical_packages __read_mostly;
 static unsigned int logical_die __read_mostly;
 
 /* Maximum number of SMT threads on any online core */
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index cafacb2e58cc..178448ef00c7 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1131,8 +1131,7 @@ static struct clocksource clocksource_tsc_early = {
 	.uncertainty_margin	= 32 * NSEC_PER_MSEC,
 	.read			= read_tsc,
 	.mask			= CLOCKSOURCE_MASK(64),
-	.flags			= CLOCK_SOURCE_IS_CONTINUOUS |
-				  CLOCK_SOURCE_MUST_VERIFY,
+	.flags			= CLOCK_SOURCE_IS_CONTINUOUS,
 	.vdso_clock_mode	= VDSO_CLOCKMODE_TSC,
 	.enable			= tsc_cs_enable,
 	.resume			= tsc_resume,
@@ -1180,12 +1179,6 @@ void mark_tsc_unstable(char *reason)
 
 EXPORT_SYMBOL_GPL(mark_tsc_unstable);
 
-static void __init tsc_disable_clocksource_watchdog(void)
-{
-	clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-	clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
-}
-
 static void __init check_system_tsc_reliable(void)
 {
 #if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
@@ -1202,23 +1195,6 @@ static void __init check_system_tsc_reliable(void)
 #endif
 	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
 		tsc_clocksource_reliable = 1;
-
-	/*
-	 * Disable the clocksource watchdog when the system has:
-	 *  - TSC running at constant frequency
-	 *  - TSC which does not stop in C-States
-	 *  - the TSC_ADJUST register which allows to detect even minimal
-	 *    modifications
-	 *  - not more than two sockets. As the number of sockets cannot be
-	 *    evaluated at the early boot stage where this has to be
-	 *    invoked, check the number of online memory nodes as a
-	 *    fallback solution which is an reasonable estimate.
-	 */
-	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
-	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
-	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    nr_online_nodes <= 2)
-		tsc_disable_clocksource_watchdog();
 }
 
 /*
@@ -1413,6 +1389,20 @@ static int __init init_tsc_clocksource(void)
 	if (boot_cpu_has(X86_FEATURE_NONSTOP_TSC_S3))
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 
+	/*
+	 * Disable the clocksource watchdog when the system has:
+	 *  - TSC running at constant frequency
+	 *  - TSC which does not stop in C-States
+	 *  - the TSC_ADJUST register which allows to detect even minimal
+	 *    modifications
+	 *  - not more than two sockets.
+	 */
+	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
+	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
+	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+	    logical_packages <= 2)
+		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
+
 	/*
 	 * When TSC frequency is known (retrieved via MSR or CPUID), we skip
 	 * the refined calibration and directly register it as a clocksource.
@@ -1547,7 +1537,7 @@ void __init tsc_init(void)
 	}
 
 	if (tsc_clocksource_reliable || no_tsc_watchdog)
-		tsc_disable_clocksource_watchdog();
+		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 
 	clocksource_register_khz(&clocksource_tsc_early, tsc_khz);
 	detect_art();
-- 
2.34.1



* [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform
  2022-10-21  6:21 [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Feng Tang
@ 2022-10-21  6:21 ` Feng Tang
  2023-06-02 18:00   ` Paul E. McKenney
  2022-10-21 15:00 ` [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Zhang Rui
  1 sibling, 1 reply; 13+ messages in thread
From: Feng Tang @ 2022-10-21  6:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel
  Cc: rui.zhang, tim.c.chen, Xiongfeng Wang, liaoyu15, Feng Tang

There is another report that the TSC clocksource on a 4-socket x86
Skylake server was wrongly judged as 'unstable' by the 'jiffies'
watchdog, and disabled [1].

Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
on qualified platorms") was introduced to deal with these false
alarms of TSC instability, covering qualified platforms with 2 or
fewer sockets.

Extend the exemption to 4 sockets to fix the issue.

We also got similar reports on an 8-socket platform from internal
testing, but as Peter pointed out, there are TSC sync issues on
8-socket platforms, and those are better handled architecture by
architecture, instead of directly raising the threshold to 8 here.

Rui also proposed another way: disabling 'jiffies' as a clocksource
watchdog [2]. That can also solve this specific problem in an
architecture-independent way, with one limitation: some TSC false
alarms are reported by other hardware watchdogs like HPET/PMTIMER,
while the 'jiffies' watchdog is mostly used during the kernel boot
phase.

[1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
[2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/

Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 arch/x86/kernel/tsc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 178448ef00c7..356f06287034 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
 	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
 	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
 	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
-	    logical_packages <= 2)
+	    logical_packages <= 4)
 		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
 
 	/*
-- 
2.34.1



* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-21  6:21 [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Feng Tang
  2022-10-21  6:21 ` [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform Feng Tang
@ 2022-10-21 15:00 ` Zhang Rui
  2022-10-21 16:21   ` Dave Hansen
  1 sibling, 1 reply; 13+ messages in thread
From: Zhang Rui @ 2022-10-21 15:00 UTC (permalink / raw)
  To: Feng Tang, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, H . Peter Anvin, Peter Zijlstra, x86, linux-kernel
  Cc: tim.c.chen, Xiongfeng Wang, liaoyu15

On Fri, 2022-10-21 at 14:21 +0800, Feng Tang wrote:
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduced to solve problem that
> sometimes TSC clocksource is wrongly judged as unstable by watchdog
> like 'jiffies', HPET, etc.
> 
> In it, the hardware socket number is a key factor for judging
> whether to disable the watchdog for TSC, and 'nr_online_nodes' was
> chosen as an estimation due to it is needed in early boot phase
> before registering 'tsc-early' clocksource, where all none-boot
> CPUs are not brought up yet.
> 
> In recent patch review, Dave Hansen pointed out there are many
> cases that 'nr_online_nodes' could have issue, like:
> * numa emulation (numa=fake=4 etc.)
> * numa=off
> * platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
>   persistent memory nodes.
> * SNC (sub-numa cluster) mode is enabled
> 
> Peter Zijlstra suggested to use logical package ids, but it is
> only usable after smp_init() and all CPUs are initialized.
> 
> One solution is to skip the watchdog for 'tsc-early' clocksource,
> and move the check after smp_init(), while before 'tsc'
> clocksoure is registered, where 'logical_packages' could be used
> as a much more accurate socket number.
> 
> Signed-off-by: Feng Tang <feng.tang@intel.com>
> ---
> Hi reviewers,
> 
> I separate the code to 2 patches, as I think they are covering 2
> problems and easy for bisect. Feel free to combine them into one,
> as the 2/2 are a trivial change.
> 
> Thanks,
> Feng
> 
> Changelog:
>  
>  Since RFC:
>  * use 'logical_packages' instead of topology_max_packages(), whose
>    implementaion is not accurate, like for heterogeneous systems
>    which have combination of Core/Atom CPUs like Alderlake (Dave
> Hansen)

I checked the history of '__max_logical_packages', and realized that

1. for topology_max_packages()/'__max_logical_packages', the divisor
   'ncpus' uses cpu_data(0).booted_cores, which is based on the
   *online* CPUs. So when using kernel cmdlines like maxcpus=/nr_cpus=,
   '__max_logical_packages' can get over-estimated.

2. for 'logical_packages', it equals the number of different physical
   Package IDs for all *online* CPUs. So with kernel cmdlines like
   nr_cpus=/maxcpus=, it can get under-estimated.
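To make the two failure modes above concrete, here is a stand-alone user-space sketch (not kernel code) of the two estimators. The topology is an assumption for illustration: 2 sockets, 4 CPUs per package, one thread per core, CPU ids enumerated package by package; all names here are made up, not actual kernel symbols.

```c
#include <stdbool.h>

#define TOTAL_CPUS 8
#define MAX_PKGS   8

/* Assumed topology: CPUs 0-3 in package 0, CPUs 4-7 in package 1,
 * one thread per core (no SMT). Illustrative only. */
const unsigned int pkg_of_cpu[TOTAL_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* 'logical_packages' style: count distinct package ids among the CPUs
 * actually brought online. Under-estimates when maxcpus= leaves a
 * whole package offline. */
unsigned int count_logical_packages(unsigned int online_cpus)
{
	bool seen[MAX_PKGS] = { false };
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < online_cpus && cpu < TOTAL_CPUS; cpu++) {
		if (!seen[pkg_of_cpu[cpu]]) {
			seen[pkg_of_cpu[cpu]] = true;
			count++;
		}
	}
	return count;
}

/* '__max_logical_packages' style: total CPU count divided by the
 * cores booted in CPU0's package. Over-estimates when maxcpus=
 * shrinks the divisor. */
unsigned int estimate_max_logical_packages(unsigned int online_cpus)
{
	unsigned int cpu, booted_cores_pkg0 = 0;

	for (cpu = 0; cpu < online_cpus && cpu < TOTAL_CPUS; cpu++)
		if (pkg_of_cpu[cpu] == pkg_of_cpu[0])
			booted_cores_pkg0++;
	return TOTAL_CPUS / booted_cores_pkg0;
}
```

With all 8 CPUs online both estimators return 2. With 'maxcpus=4' the first returns 1 (under-estimate, point 2 above); with 'maxcpus=2' the second returns 8/2 = 4 (over-estimate, point 1 above).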

BTW, I also checked CPUID.B/1F, which can tell us a fixed number of
CPUs within a package. But we don't have a fixed number of total CPUs
from the hardware.
On my Dell laptop, the BIOS allows me to disable/enable one or several
cores. When this happens, 'total_cpus' changes, but CPUID.B/1F does
not change. So I don't think CPUID.B/1F can be used to optimize the
'__max_logical_packages' calculation.

I'm not sure if we have a perfect solution here.

thanks,
rui


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-21 15:00 ` [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Zhang Rui
@ 2022-10-21 16:21   ` Dave Hansen
  2022-10-22 16:12     ` Zhang Rui
  2022-10-24  7:37     ` Feng Tang
  0 siblings, 2 replies; 13+ messages in thread
From: Dave Hansen @ 2022-10-21 16:21 UTC (permalink / raw)
  To: Zhang Rui, Feng Tang, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86,
	linux-kernel
  Cc: tim.c.chen, Xiongfeng Wang, liaoyu15

On 10/21/22 08:00, Zhang Rui wrote:
> I checked the history of '__max_logical_packages', and realized that
> 
> 1. for topology_max_packages()/'__max_logical_packages', the divisor
>    'ncpus' uses cpu_data(0).booted_cores, which is based on the
>    *online* CPUs. So when using kernel cmdlines like maxcpus=/nr_cpus=,
>    '__max_logical_packages' can get over-estimated.
> 
> 2. for 'logical_packages', it equals the number of different physical
>    Package IDs for all *online* CPUs. So with kernel cmdlines like
>    nr_cpus=/maxcpus=, it can gets under-estimated.
> 
> BTW, I also checked CPUID.B/1F, which can tell a fixed number of CPUs
> within a package. But we don't have a fixed number of total CPUs from
> hardware.
> On my Dell laptop, BIOS allows me to disable/enable one or several
> cores. When this happens, the 'total_cpus' changes, but CPUID.B/1F does
> not change. So I don't think CPUID.B/1F can be used to optimize the '__
> max_logical_packages' calculation.
> 
> I'm not sure if we have a perfect solution here.

Are the implementations fixable?  Or, at least tolerable?

For instance, I can live with the implementation being a bit goofy when
kernel commandlines are in play.  We can pr_info() about those cases.


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-21 16:21   ` Dave Hansen
@ 2022-10-22 16:12     ` Zhang Rui
  2022-10-24 15:42       ` Dave Hansen
  2022-10-24  7:37     ` Feng Tang
  1 sibling, 1 reply; 13+ messages in thread
From: Zhang Rui @ 2022-10-22 16:12 UTC (permalink / raw)
  To: Dave Hansen, Feng Tang, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86,
	linux-kernel
  Cc: tim.c.chen, Xiongfeng Wang, liaoyu15

On Fri, 2022-10-21 at 09:21 -0700, Dave Hansen wrote:
> On 10/21/22 08:00, Zhang Rui wrote:
> > I checked the history of '__max_logical_packages', and realized
> > that
> > 
> > 1. for topology_max_packages()/'__max_logical_packages', the
> > divisor
> >    'ncpus' uses cpu_data(0).booted_cores, which is based on the
> >    *online* CPUs. So when using kernel cmdlines like
> > maxcpus=/nr_cpus=,
> >    '__max_logical_packages' can get over-estimated.
> > 
> > 
> > 2. for 'logical_packages', it equals the number of different
> > physical
> >    Package IDs for all *online* CPUs. So with kernel cmdlines like
> >    nr_cpus=/maxcpus=, it can gets under-estimated.
> > 
> > BTW, I also checked CPUID.B/1F, which can tell a fixed number of
> > CPUs
> > within a package. But we don't have a fixed number of total CPUs
> > from
> > hardware.
> > On my Dell laptop, BIOS allows me to disable/enable one or several
> > cores. When this happens, the 'total_cpus' changes, but CPUID.B/1F
> > does
> > not change. So I don't think CPUID.B/1F can be used to optimize the
> > '__
> > max_logical_packages' calculation.
> > 
> > I'm not sure if we have a perfect solution here.
> 
> Are the implementations fixable?

currently, I don't have any idea.

>   Or, at least tolerable?
> 
> For instance, I can live with the implementation being a bit goofy
> when
> kernel commandlines are in play.  We can pr_info() about those cases.

My understanding is that the CPUs in the last package may still have
small CPU id values. This means that 'logical_packages' is hard to
break unless we boot with a very small CPU count and happen to disable
all the CPUs in one or more packages. Feng is experimenting with this
and may have some updates later.

If this is the case, is this a valid case that we need to take care of?

thanks,
rui



* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-21 16:21   ` Dave Hansen
  2022-10-22 16:12     ` Zhang Rui
@ 2022-10-24  7:37     ` Feng Tang
  2022-10-24 15:43       ` Dave Hansen
  1 sibling, 1 reply; 13+ messages in thread
From: Feng Tang @ 2022-10-24  7:37 UTC (permalink / raw)
  To: Dave Hansen, Zhang Rui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86
  Cc: linux-kernel, tim.c.chen, Xiongfeng Wang, liaoyu15

On Fri, Oct 21, 2022 at 09:21:02AM -0700, Dave Hansen wrote:
> On 10/21/22 08:00, Zhang Rui wrote:
> > I checked the history of '__max_logical_packages', and realized that
> > 
> > 1. for topology_max_packages()/'__max_logical_packages', the divisor
> >    'ncpus' uses cpu_data(0).booted_cores, which is based on the
> >    *online* CPUs. So when using kernel cmdlines like maxcpus=/nr_cpus=,
> >    '__max_logical_packages' can get over-estimated.
> > 
> > 2. for 'logical_packages', it equals the number of different physical
> >    Package IDs for all *online* CPUs. So with kernel cmdlines like
> >    nr_cpus=/maxcpus=, it can gets under-estimated.

Thanks for raising this new case! For the 'maxcpus=' cmdline parameter,
I ran this on 4-socket and 2-socket platforms, and found that
'logical_packages' is still correct if 'maxcpus >= (total_cpus/2 + 1)',
but goes wrong for smaller numbers.

The SRAT parsing solution [1] can solve the 'maxcpus=' case, but it
will fail for other use cases like sub-NUMA cluster mode or 'numa=off'.

IIUC, 'maxcpus=' is mostly used for debugging, so I think
'logical_packages' is the best option for socket number estimation
among the several existing solutions.

> For instance, I can live with the implementation being a bit goofy when
> kernel commandlines are in play.  We can pr_info() about those cases.

Something like adding

	pr_info("Watchdog for TSC is disabled on this platform; the estimated socket number is %d. If the real socket number is bigger than 4 (maybe due to tricks like the 'maxcpus=' cmdline parameter), please add 'tsc=watchdog' to the cmdline as well\n",
		logical_packages);

and adding a new 'tsc=watchdog' option to force the watchdog on (might
be over-complex?)

[1]. https://lore.kernel.org/lkml/Y0UgeUIJSFNR4mQB@feng-clx/
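If the 'tsc=watchdog' idea were ever pursued, the parsing side could follow the kernel's usual __setup() handler convention. The sketch below is a hypothetical, user-space-testable version of such a handler; the flag and function names are assumptions for illustration, not an actual kernel API:

```c
#include <string.h>
#include <stdbool.h>

/* Hypothetical flag that init_tsc_clocksource() could consult to keep
 * CLOCK_SOURCE_MUST_VERIFY set regardless of the socket estimate. */
bool force_tsc_watchdog = false;

/* Modeled on the kernel's __setup() convention: return 1 when the
 * option string is recognized and consumed, 0 otherwise. */
int tsc_watchdog_setup(const char *str)
{
	if (str && strcmp(str, "watchdog") == 0) {
		force_tsc_watchdog = true;
		return 1;
	}
	return 0;
}
```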

Thanks,
Feng


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-22 16:12     ` Zhang Rui
@ 2022-10-24 15:42       ` Dave Hansen
  2022-10-25  7:35         ` Feng Tang
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2022-10-24 15:42 UTC (permalink / raw)
  To: Zhang Rui, Feng Tang, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86,
	linux-kernel
  Cc: tim.c.chen, Xiongfeng Wang, liaoyu15

On 10/22/22 09:12, Zhang Rui wrote:
>>> I'm not sure if we have a perfect solution here.
>> Are the implementations fixable?
> currently, I don't have any idea.
> 
>>   Or, at least tolerable?

That would be great to figure out before we start throwing more patches
around.

>> For instance, I can live with the implementation being a bit goofy
>> when
>> kernel commandlines are in play.  We can pr_info() about those cases.
> My understanding is that the cpus in the last package may still have
> small cpu id value. This means that the 'logical_packages' is hard to
> break unless we boot with very small CPU count and happened to disable
> all cpus in one/more packages. Feng is experiencing with this and may
> have some update later.
> 
> If this is the case, is this a valid case that we need to take care of?

Well, let's talk through it a bit.

What is the triggering event and what's the fallout?

Is the user on a truly TSC stable system or not?

What kind of maxcpus= argument do they need to specify?  Is it something
that's likely to get used in production or is it most likely just for
debugging?

What is the maxcpus= fallout?  Does it over estimate or under estimate
the number of logical packages?

How many cases outside of maxcpus= do we know of that lead to an
imprecise "logical packages" calculation?

Does this lead to the TSC being mistakenly marked stable when it is not,
or *not* being marked stable when it is?

Let's get all of that info in one place and make sure we are all agreed
on the *problem* before we got to the solution space.


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-24  7:37     ` Feng Tang
@ 2022-10-24 15:43       ` Dave Hansen
  2022-10-25  7:57         ` Feng Tang
  0 siblings, 1 reply; 13+ messages in thread
From: Dave Hansen @ 2022-10-24 15:43 UTC (permalink / raw)
  To: Feng Tang, Zhang Rui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86
  Cc: linux-kernel, tim.c.chen, Xiongfeng Wang, liaoyu15

On 10/24/22 00:37, Feng Tang wrote:
>> For instance, I can live with the implementation being a bit goofy when
>> kernel commandlines are in play.  We can pr_info() about those cases.
> Something like adding
> 
> pr_info("Watchdog for TSC is disabled for this platform while estimating
> 	the socket number is %d, if the real socket number is bigger than
> 	4 (may due to some tricks like 'maxcpus=' cmdline parameter, please
> 	add 'tsc=watchdog' to cmdline as well\n", logical_packages);

That's too wishy-washy.  Also, I *KNOW* Intel has built systems with
wonky, opaque numbers of "sockets".  Cascade Lake was a single physical
"socket", but in all other respects (including enumeration to software)
it acted like two logical sockets.

So, what was the "real" socket number for Cascade Lake?  If you looked
in a chassis, you'd see one socket.  But, there were two dies in that
socket talking to each other over UPI, so it had a system topology which
was indistinguishable from a 2-socket system.

Let's just state the facts:

	pr_info("Disabling TSC watchdog on %d-package system.", ...)

Then, we can have a flag elsewhere to say how reliable that number is.
A taint flag or CPU bug is probably going too far, but something like this:

bool logical_package_count_unreliable = false;

void mark_bad_package_count(char *reason)
{
	if (logical_package_count_unreliable)
		return;

	logical_package_count_unreliable = true;
	pr_warn("processor package count is unreliable: %s\n", reason);
}

Might be OK.  Then you can call mark_bad_package_count() from multiple
sites, like the maxcpus= code.

But, like I said in the other thread, let's make sure we're agreed on
the precise problem that we're solving before we go down this road.

> and adding a new 'tsc=watchdog' option to force watchdog on (might be
> over-complexed?)

Agreed, I don't think that's quite warranted yet.


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-24 15:42       ` Dave Hansen
@ 2022-10-25  7:35         ` Feng Tang
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Tang @ 2022-10-25  7:35 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Zhang Rui, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, tim.c.chen,
	Xiongfeng Wang, liaoyu15

On Mon, Oct 24, 2022 at 08:42:30AM -0700, Dave Hansen wrote:
> On 10/22/22 09:12, Zhang Rui wrote:
> >>> I'm not sure if we have a perfect solution here.
> >> Are the implementations fixable?
> > currently, I don't have any idea.
> > 
> >>   Or, at least tolerable?
> 
> That would be great to figure out before we start throwing more patches
> around.

Yes, agreed!

> >> For instance, I can live with the implementation being a bit goofy
> >> when
> >> kernel commandlines are in play.  We can pr_info() about those cases.
> > My understanding is that the cpus in the last package may still have
> > small cpu id value. This means that the 'logical_packages' is hard to
> > break unless we boot with very small CPU count and happened to disable
> > all cpus in one/more packages. Feng is experiencing with this and may
> > have some update later.
> > 
> > If this is the case, is this a valid case that we need to take care of?
> 
> Well, let's talk through it a bit.
> 
> What is the triggering event and what's the fallout?

In the worst case (2 sockets), if 'maxcpus=' falls to '<= total_cpus/2',
'logical_packages' will be less than the real number.

> Is the user on a truly TSC stable system or not?
> 
> What kind of maxcpus= argument do they need to specify?  Is it something
> that's likely to get used in production or is it most likely just for
> debugging?

IIUC, for the server side, it's most likely for debug use. And for
clients, socket number is not an issue.

> What is the maxcpus= fallout?  Does it over estimate or under estimate
> the number of logical packages?
 
It can only under-estimate.

> How many cases outside of maxcpus= do we know of that lead to an
> imprecise "logical packages" calculation?
 
Thanks to your, Peter's and Rui's info, we have listed a bunch of
use cases other than 'maxcpus=', and they won't lead to an imprecise
'logical_packages'. I'm not sure if there are other cases which
haven't popped up yet.

> Does this lead to the TSC being mistakenly marked stable when it is not,
> or *not* being marked stable when it is?

Only the former case, 'mistakenly marked stable', is possible, say
when 'maxcpus=8' is used on a 192-core, 8-socket machine.

> Let's get all of that info in one place and make sure we are all agreed
> on the *problem* before we got to the solution space.

OK.

Thanks,
Feng


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-24 15:43       ` Dave Hansen
@ 2022-10-25  7:57         ` Feng Tang
  2022-11-04  7:21           ` Feng Tang
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Tang @ 2022-10-25  7:57 UTC (permalink / raw)
  To: Dave Hansen, Zhang Rui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86
  Cc: linux-kernel, tim.c.chen, Xiongfeng Wang, liaoyu15, oliver.sang,
	ying.huang

On Mon, Oct 24, 2022 at 08:43:33AM -0700, Dave Hansen wrote:
> On 10/24/22 00:37, Feng Tang wrote:
> >> For instance, I can live with the implementation being a bit goofy when
> >> kernel commandlines are in play.  We can pr_info() about those cases.
> > Something like adding
> > 
> > pr_info("Watchdog for TSC is disabled for this platform while estimating
> > 	the socket number is %d, if the real socket number is bigger than
> > 	4 (may due to some tricks like 'maxcpus=' cmdline parameter, please
> > 	add 'tsc=watchdog' to cmdline as well\n", logical_packages);
> 
> That's too wishy-washy.  Also, I *KNOW* Intel has built systems with
> wonky, opaque numbers of "sockets".  Cascade Lake was a single physical
> "socket", but in all other respects (including enumeration to software)
> it acted like two logical sockets.
> 
> So, what was the "real" socket number for Cascade Lake?  If you looked
> in a chassis, you'd see one socket.  But, there were two dies in that
> socket talking to each other over UPI, so it had a system topology which
> was indistinguishable from a 2-socket system.
 
Good to know and thanks for the info.

I have to admit I hadn't checked a server's internals before, and
thanks to Oliver for helping check some Cascade Lake boxes in 0Day.

One box where 'lscpu' shows 4 sockets (96 cores in total) only has
2 physical processors in the chassis; just as you mentioned, each
processor has 2 dies. Another box, where 'lscpu' shows 2 sockets
(44 cores in total), also has 2 physical processors, but much
smaller ones.

And fortunately, 'logical_packages' for both of these boxes is the
correct value: 2.

> Let's just state the facts:
> 
> 	pr_info("Disabling TSC watchdog on %d-package system.", ...)
> 
> Then, we can have a flag elsewhere to say how reliable that number is.
> A taint flag or CPU bug is probably going to far, but something like this:
> 
> bool logical_package_count_unreliable = false;
> 
> void mark_bad_package_count(char *reason)
> {
> 	if (logical_package_count_unreliable)
> 		return;
> 
> 	logical_package_count_unreliable = true;
> 	pr_warn("processor package count is unreliable: %s\n", reason);
> }
> 
> Might be OK.  Then you can call mark_bad_package_count() from multiple
> sites, like the maxcpus= code.

This should work! We can just add one more check:

	boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
	!logical_package_count_unreliable &&
	logical_packages <= 2

And when a new case leading to an imprecise 'logical_packages' is
found in the future, we can just apply this to it.
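As a sketch, the combined condition could be factored into a small predicate. This only illustrates the logic being discussed; the function name and free-standing form are hypothetical, not the actual kernel change:

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the condition discussed for
 * init_tsc_clocksource(): the watchdog may be skipped only when the
 * TSC is constant and nonstop, TSC_ADJUST is available, the package
 * count is trusted, and the system has at most two packages. */
bool tsc_watchdog_exempt(bool constant_tsc, bool nonstop_tsc,
			 bool tsc_adjust, bool pkg_count_unreliable,
			 unsigned int logical_packages)
{
	return constant_tsc && nonstop_tsc && tsc_adjust &&
	       !pkg_count_unreliable && logical_packages <= 2;
}
```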

> But, like I said in the other thread, let's make sure we're agreed on
> the precise problem that we're solving before we go down this road.

Sure.

Thanks,
Feng


> > and adding a new 'tsc=watchdog' option to force watchdog on (might be
> > over-complexed?)
> 
> Agreed, I don't think that's quite warranted yet.


* Re: [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers
  2022-10-25  7:57         ` Feng Tang
@ 2022-11-04  7:21           ` Feng Tang
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Tang @ 2022-11-04  7:21 UTC (permalink / raw)
  To: Dave Hansen, Zhang Rui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Peter Zijlstra, x86
  Cc: linux-kernel, tim.c.chen, Xiongfeng Wang, liaoyu15, oliver.sang,
	ying.huang

On Tue, Oct 25, 2022 at 03:57:12PM +0800, Feng Tang wrote:
> On Mon, Oct 24, 2022 at 08:43:33AM -0700, Dave Hansen wrote:
> > That's too wishy-washy.  Also, I *KNOW* Intel has built systems with
> > wonky, opaque numbers of "sockets".  Cascade Lake was a single physical
> > "socket", but in all other respects (including enumeration to software)
> > it acted like two logical sockets.
> > 
> > So, what was the "real" socket number for Cascade Lake?  If you looked
> > in a chassis, you'd see one socket.  But, there were two dies in that
> > socket talking to each other over UPI, so it had a system topology which
> > was indistinguishable from a 2-socket system.
>  
> Good to know and thanks for the info.
> 
> I have to admit I haven't checked a server's internals before, and
> thanks to Oliver for helping check some Cascade Lake boxes in 0Day.
> 
> In one box where 'lscpu' shows 4 sockets (96 cores in total), there
> are only 2 physical processors in the chassis; just like you
> mentioned, each processor has 2 dies. And in another box, where
> 'lscpu' shows 2 sockets (44 cores in total), there are also 2
> physical processors, but of a much smaller size.
> 
> And fortunately the 'logical_packages' for these 2 boxes are both
> the correct value: 2.
> 
> > Let's just state the facts:
> > 
> > 	pr_info("Disabling TSC watchdog on %d-package system.", ...)
> > 
> > Then, we can have a flag elsewhere to say how reliable that number is.
> > A taint flag or CPU bug is probably going to far, but something like this:
> > 
> > bool logical_package_count_unreliable = false;
> > 
> > void mark_bad_package_count(char *reason)
> > {
> > 	if (logical_package_count_unreliable)
> > 		return;
> > 
> > 	logical_package_count_unreliable = true;
> > 	pr_warn("processor package count is unreliable: %s\n", reason);
> > }
> > 
> > Might be OK.  Then you can call mark_bad_package_count() from multiple
> > sites, like the maxcpus= code.
> 
> This should work! We can just add one more check:
> 
> 	boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> 	!logical_package_count_unreliable &&
> 	logical_packages <= 2 
> 
> And when some new case leading to an imprecise 'logical_packages' is
> found in the future, we can just extend this check to cover it.

Hi Thomas, Peter and reviewers,

For estimating socket numbers, there are quite some BIOS/kernel
settings that can affect its accuracy, like:

* numa emulation (numa=fake=4 etc.)
* numa=off
* platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
  persistent memory nodes.
* SNC (sub-numa cluster) mode is enabled
* 'maxcpus=' cmdline limiting onlined CPU numbers

Ideally, BIOS could provide that info in some MSR or through
CPUID, as it knows most of the information before transferring
control to the OS, but that's just my wishful thinking for now.

And we checked several ways for estimating socket number:
* nr_online_nodes
* SRAT table init (parsing nodes with CPUs)
* 'logical_packages'

And 'logical_packages' is the better option after comparison,
as it works for most of the cases above except the 'maxcpus='
one, where 'logical_packages' could be smaller than the
real number. Dave suggested explicitly skipping the check
and warning in that case.

We plan to use 'logical_packages' to replace the current
'nr_online_nodes' estimation of the socket number. Any thoughts
on this, or suggestions? Thanks!

- Feng





* Re: [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform
  2022-10-21  6:21 ` [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform Feng Tang
@ 2023-06-02 18:00   ` Paul E. McKenney
  2023-06-05  6:28     ` Feng Tang
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2023-06-02 18:00 UTC (permalink / raw)
  To: Feng Tang
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	tim.c.chen, Xiongfeng Wang, liaoyu15

On Fri, Oct 21, 2022 at 02:21:31PM +0800, Feng Tang wrote:
> There are reports again that the TSC clocksource on a 4-socket x86
> Skylake server was wrongly judged as 'unstable' by the 'jiffies'
> watchdog, and disabled [1].
> 
> Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> on qualified platorms") was introduced to deal with these false
> alarms of TSC instability, covering qualified platforms with 2
> sockets or fewer.
> 
> Extend the exemption to 4 sockets to fix the issue.
> 
> We also got similar reports on 8-socket platforms from internal tests,
> but as Peter pointed out, there were TSC sync issues on 8-socket
> platforms, and they had better be handled architecture by architecture,
> instead of directly changing the threshold to 8 here.
> 
> Rui also proposed another way: disabling 'jiffies' as a clocksource
> watchdog [2], which can also solve this specific problem in an
> architecture-independent way, with one limitation: some TSC false
> alarms are reported by other hardware watchdogs like HPET/PMTIMER,
> while the 'jiffies' watchdog is mostly used in the kernel boot
> phase.
> 
> [1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
> [2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
> 
> Reported-by: Yu Liao <liaoyu15@huawei.com>
> Signed-off-by: Feng Tang <feng.tang@intel.com>

We have a number of four-socket systems whose TSCs seem to be reliable.
We do see issues where high memory load forces the TSC to be marked
unstable, but that is because those systems are using an older kernel.

If the TSCs do start to misbehave, I will of course let you all know.
But in the meantime:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

The previous patch that changes the definition of "socket" I have no
opinion on.  I must let you guys work that out.  However, I do note that
this patch can be rebased so as to no longer depend on that patch.

						Thanx, Paul

> ---
>  arch/x86/kernel/tsc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 178448ef00c7..356f06287034 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
>  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
>  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> -	    logical_packages <= 2)
> +	    logical_packages <= 4)
>  		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
>  
>  	/*
> -- 
> 2.34.1
> 


* Re: [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform
  2023-06-02 18:00   ` Paul E. McKenney
@ 2023-06-05  6:28     ` Feng Tang
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Tang @ 2023-06-05  6:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H . Peter Anvin, Peter Zijlstra, x86, linux-kernel, rui.zhang,
	tim.c.chen, Xiongfeng Wang, liaoyu15

On Fri, Jun 02, 2023 at 11:00:55AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 21, 2022 at 02:21:31PM +0800, Feng Tang wrote:
> > There are reports again that the TSC clocksource on a 4-socket x86
> > Skylake server was wrongly judged as 'unstable' by the 'jiffies'
> > watchdog, and disabled [1].
> > 
> > Commit b50db7095fe0 ("x86/tsc: Disable clocksource watchdog for TSC
> > on qualified platorms") was introduced to deal with these false
> > alarms of TSC instability, covering qualified platforms with 2
> > sockets or fewer.
> > 
> > Extend the exemption to 4 sockets to fix the issue.
> > 
> > We also got similar reports on 8-socket platforms from internal tests,
> > but as Peter pointed out, there were TSC sync issues on 8-socket
> > platforms, and they had better be handled architecture by architecture,
> > instead of directly changing the threshold to 8 here.
> > 
> > Rui also proposed another way: disabling 'jiffies' as a clocksource
> > watchdog [2], which can also solve this specific problem in an
> > architecture-independent way, with one limitation: some TSC false
> > alarms are reported by other hardware watchdogs like HPET/PMTIMER,
> > while the 'jiffies' watchdog is mostly used in the kernel boot
> > phase.
> > 
> > [1]. https://lore.kernel.org/all/9d3bf570-3108-0336-9c52-9bee15767d29@huawei.com/
> > [2]. https://lore.kernel.org/all/bd5b97f89ab2887543fc262348d1c7cafcaae536.camel@intel.com/
> > 
> > Reported-by: Yu Liao <liaoyu15@huawei.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> 
> We have a number of four-socket systems whose TSCs seem to be reliable.
> We do see issues where high memory load forces the TSC to be marked
> unstable, but that is because those systems are using an older kernel.

Thanks for sharing the info.

> 
> If the TSCs do start to misbehave, I will of course let you all know.

That will be very helpful! I don't have much access to 4-socket machines.

> But in the meantime:
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> 
> The previous patch that changes the definition of "socket" I have no
> opinion on.  I must let you guys work that out.  However, I do note that
> this patch can be rebased so as to no longer depend on that patch.

During previous discussions, Thomas and Peter mentioned they only
saw real TSC synchronization issues on some old-generation 8
socket/package machines.

I can separate this patch out and send it for review. Meanwhile I'll
rework the 1/2 patch and test it more.

Thanks,
Feng

> 						Thanx, Paul
> 
> > ---
> >  arch/x86/kernel/tsc.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> > index 178448ef00c7..356f06287034 100644
> > --- a/arch/x86/kernel/tsc.c
> > +++ b/arch/x86/kernel/tsc.c
> > @@ -1400,7 +1400,7 @@ static int __init init_tsc_clocksource(void)
> >  	if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> >  	    boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> >  	    boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
> > -	    logical_packages <= 2)
> > +	    logical_packages <= 4)
> >  		clocksource_tsc.flags &= ~CLOCK_SOURCE_MUST_VERIFY;
> >  
> >  	/*
> > -- 
> > 2.34.1
> > 


end of thread, other threads:[~2023-06-05  6:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-21  6:21 [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Feng Tang
2022-10-21  6:21 ` [PATCH v1 2/2] x86/tsc: Extend watchdog check exemption to 4-Sockets platform Feng Tang
2023-06-02 18:00   ` Paul E. McKenney
2023-06-05  6:28     ` Feng Tang
2022-10-21 15:00 ` [PATCH v1 1/2] x86/tsc: use logical_package as a better estimation of socket numbers Zhang Rui
2022-10-21 16:21   ` Dave Hansen
2022-10-22 16:12     ` Zhang Rui
2022-10-24 15:42       ` Dave Hansen
2022-10-25  7:35         ` Feng Tang
2022-10-24  7:37     ` Feng Tang
2022-10-24 15:43       ` Dave Hansen
2022-10-25  7:57         ` Feng Tang
2022-11-04  7:21           ` Feng Tang
