* [PATCH 0/2] x86/tsc: Avoid TSC sync failure
From: Waiman Long @ 2022-03-14 19:46 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray,
	Jirka Hladky, Waiman Long

When booting a kernel on some Intel CooperLake systems, it was found
that the initial TSC sync at boot time may sometimes fail, leading to a
fallback to HPET as the system clock with some performance degradation.
For example:

[    0.034090] TSC deadline timer available
[    0.008807] TSC ADJUST compensate: CPU36 observed 95626 warp. Adjust: 95626
[    0.008807] TSC ADJUST compensate: CPU36 observed 74 warp. Adjust: 95700
[    0.974281] TSC synchronization [CPU#0 -> CPU#36]:
[    0.974281] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
[    0.974281] tsc: Marking TSC unstable due to check_tsc_sync_source failed

This patch set tries to minimize this type of failure by
 1) Putting all the TSC synchronization variables in their own cacheline
    to minimize external interference.
 2) Estimating the synchronization overhead and adding it to the
    adjustment value.

With these changes in place, only one TSC adjustment was observed for
each of the affected CPUs, with no sync failure.

Waiman Long (2):
  x86/tsc: Reduce external interference on max_warp detection
  x86/tsc_sync: Add synchronization overhead to tsc adjustment

 arch/x86/kernel/tsc_sync.c | 84 ++++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 36 deletions(-)

-- 
2.27.0



* [PATCH 1/2] x86/tsc: Reduce external interference on max_warp detection
From: Waiman Long @ 2022-03-14 19:46 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray,
	Jirka Hladky, Waiman Long

The TSC max_warp detection code in check_tsc_warp() is very timing
sensitive. Due to the possibility of false cacheline sharing, activity
on other CPUs may have an impact on the max_warp detection process.
Put the max_warp detection data variables on their own cacheline to
reduce that kind of external interference.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/x86/kernel/tsc_sync.c | 57 ++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 9452dc9664b5..70aeb254b62b 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -253,12 +253,15 @@ static atomic_t test_runs;
  * we want to have the fastest, inlined, non-debug version
  * of a critical section, to be able to prove TSC time-warps:
  */
-static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-
-static cycles_t last_tsc;
-static cycles_t max_warp;
-static int nr_warps;
-static int random_warps;
+static struct {
+	arch_spinlock_t lock;
+	int nr_warps;
+	int random_warps;
+	cycles_t last_tsc;
+	cycles_t max_warp;
+} sync ____cacheline_aligned_in_smp = {
+	.lock = __ARCH_SPIN_LOCK_UNLOCKED,
+};
 
 /*
  * TSC-warp measurement loop running on both CPUs.  This is not called
@@ -281,11 +284,11 @@ static cycles_t check_tsc_warp(unsigned int timeout)
 		 * previous TSC that was measured (possibly on
 		 * another CPU) and update the previous TSC timestamp.
 		 */
-		arch_spin_lock(&sync_lock);
-		prev = last_tsc;
+		arch_spin_lock(&sync.lock);
+		prev = sync.last_tsc;
 		now = rdtsc_ordered();
-		last_tsc = now;
-		arch_spin_unlock(&sync_lock);
+		sync.last_tsc = now;
+		arch_spin_unlock(&sync.lock);
 
 		/*
 		 * Be nice every now and then (and also check whether
@@ -304,18 +307,18 @@ static cycles_t check_tsc_warp(unsigned int timeout)
 		 * we saw a time-warp of the TSC going backwards:
 		 */
 		if (unlikely(prev > now)) {
-			arch_spin_lock(&sync_lock);
-			max_warp = max(max_warp, prev - now);
-			cur_max_warp = max_warp;
+			arch_spin_lock(&sync.lock);
+			sync.max_warp = max(sync.max_warp, prev - now);
+			cur_max_warp = sync.max_warp;
 			/*
 			 * Check whether this bounces back and forth. Only
 			 * one CPU should observe time going backwards.
 			 */
-			if (cur_warps != nr_warps)
-				random_warps++;
-			nr_warps++;
-			cur_warps = nr_warps;
-			arch_spin_unlock(&sync_lock);
+			if (cur_warps != sync.nr_warps)
+				sync.random_warps++;
+			sync.nr_warps++;
+			cur_warps = sync.nr_warps;
+			arch_spin_unlock(&sync.lock);
 		}
 	}
 	WARN(!(now-start),
@@ -394,21 +397,21 @@ void check_tsc_sync_source(int cpu)
 	 * stop. If not, decrement the number of runs an check if we can
 	 * retry. In case of random warps no retry is attempted.
 	 */
-	if (!nr_warps) {
+	if (!sync.nr_warps) {
 		atomic_set(&test_runs, 0);
 
 		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
 			smp_processor_id(), cpu);
 
-	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
+	} else if (atomic_dec_and_test(&test_runs) || sync.random_warps) {
 		/* Force it to 0 if random warps brought us here */
 		atomic_set(&test_runs, 0);
 
 		pr_warn("TSC synchronization [CPU#%d -> CPU#%d]:\n",
 			smp_processor_id(), cpu);
 		pr_warn("Measured %Ld cycles TSC warp between CPUs, "
-			"turning off TSC clock.\n", max_warp);
-		if (random_warps)
+			"turning off TSC clock.\n", sync.max_warp);
+		if (sync.random_warps)
 			pr_warn("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
 	}
@@ -417,10 +420,10 @@ void check_tsc_sync_source(int cpu)
 	 * Reset it - just in case we boot another CPU later:
 	 */
 	atomic_set(&start_count, 0);
-	random_warps = 0;
-	nr_warps = 0;
-	max_warp = 0;
-	last_tsc = 0;
+	sync.random_warps = 0;
+	sync.nr_warps = 0;
+	sync.max_warp = 0;
+	sync.last_tsc = 0;
 
 	/*
 	 * Let the target continue with the bootup:
@@ -476,7 +479,7 @@ void check_tsc_sync_target(void)
 	/*
 	 * Store the maximum observed warp value for a potential retry:
 	 */
-	gbl_max_warp = max_warp;
+	gbl_max_warp = sync.max_warp;
 
 	/*
 	 * Ok, we are done:
-- 
2.27.0



* [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Waiman Long @ 2022-03-14 19:46 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray,
	Jirka Hladky, Waiman Long

As stated in the comment of check_tsc_sync_target():

  The adjustment value is slightly off by the overhead of the
  sync mechanism (observed values are ~200 TSC cycles), but this
  really depends on CPU, node distance and frequency. So
  compensating for this is hard to get right.

That overhead, however, can cause the TSC adjustment to fail after 3
test runs, as can be seen when booting up a hot 4-socket Intel CooperLake
system:

[    0.034090] TSC deadline timer available
[    0.008807] TSC ADJUST compensate: CPU36 observed 95626 warp. Adjust: 95626
[    0.008807] TSC ADJUST compensate: CPU36 observed 74 warp. Adjust: 95700
[    0.974281] TSC synchronization [CPU#0 -> CPU#36]:
[    0.974281] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
[    0.974281] tsc: Marking TSC unstable due to check_tsc_sync_source failed

To prevent this TSC adjustment failure, we need to estimate the sync
overhead, which will be at least an unlock operation on one CPU followed
by a lock operation on another CPU.

The measurement is done in check_tsc_sync_target() after stop_count
reaches 2, which is set by the source CPU after it re-initializes the
sync variables, causing the lock cacheline to be remote from the target
CPU. The subsequent time measurement will then be similar to the latency
between successive iterations of the 2-CPU sync loop in check_tsc_warp().

Interrupts should not yet be enabled when check_tsc_sync_target() is
called. However, some interference may cause the overhead estimate to
vary a bit. With this patch applied, the measured overhead on the same
CooperLake system varies from 104 to 326 TSC cycles across different
reboot runs.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 arch/x86/kernel/tsc_sync.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 70aeb254b62b..e2c43ba4e7b8 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -445,6 +445,7 @@ void check_tsc_sync_target(void)
 	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
 	unsigned int cpu = smp_processor_id();
 	cycles_t cur_max_warp, gbl_max_warp;
+	cycles_t start, sync_overhead;
 	int cpus = 2;
 
 	/* Also aborts if there is no TSC. */
@@ -505,29 +506,37 @@ void check_tsc_sync_target(void)
 	if (!atomic_read(&test_runs))
 		return;
 
+	/*
+	 * Estimate the synchronization overhead by measuring the time for
+	 * a lock/unlock operation.
+	 */
+	start = rdtsc_ordered();
+	arch_spin_lock(&sync.lock);
+	arch_spin_unlock(&sync.lock);
+	sync_overhead = rdtsc_ordered() - start;
+
 	/*
 	 * If the warp value of this CPU is 0, then the other CPU
 	 * observed time going backwards so this TSC was ahead and
 	 * needs to move backwards.
 	 */
-	if (!cur_max_warp)
+	if (!cur_max_warp) {
 		cur_max_warp = -gbl_max_warp;
+		sync_overhead = -sync_overhead;
+	}
 
 	/*
 	 * Add the result to the previous adjustment value.
 	 *
 	 * The adjustment value is slightly off by the overhead of the
 	 * sync mechanism (observed values are ~200 TSC cycles), but this
-	 * really depends on CPU, node distance and frequency. So
-	 * compensating for this is hard to get right. Experiments show
-	 * that the warp is not longer detectable when the observed warp
-	 * value is used. In the worst case the adjustment needs to go
-	 * through a 3rd run for fine tuning.
+	 * really depends on CPU, node distance and frequency. Add the
+	 * estimated sync overhead to the adjustment value.
 	 */
-	cur->adjusted += cur_max_warp;
+	cur->adjusted += cur_max_warp + sync_overhead;
 
-	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
-		cpu, cur_max_warp, cur->adjusted);
+	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp (overhead %lld). Adjust: %lld\n",
+		cpu, cur_max_warp, sync_overhead, cur->adjusted);
 
 	wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
 	goto retry;
-- 
2.27.0



* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Thomas Gleixner @ 2022-04-03 10:03 UTC
  To: Waiman Long, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray,
	Jirka Hladky, Waiman Long

On Mon, Mar 14 2022 at 15:46, Waiman Long wrote:
> As stated in the comment of check_tsc_sync_target():
>
>   The adjustment value is slightly off by the overhead of the
>   sync mechanism (observed values are ~200 TSC cycles), but this
>   really depends on CPU, node distance and frequency. So
>   compensating for this is hard to get right.
>
> That overhead, however, can cause the TSC adjustment to fail after 3
> test runs, as can be seen when booting up a hot 4-socket Intel CooperLake
> system:
>
> [    0.034090] TSC deadline timer available
> [    0.008807] TSC ADJUST compensate: CPU36 observed 95626 warp. Adjust: 95626
> [    0.008807] TSC ADJUST compensate: CPU36 observed 74 warp. Adjust: 95700
> [    0.974281] TSC synchronization [CPU#0 -> CPU#36]:
> [    0.974281] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
> [    0.974281] tsc: Marking TSC unstable due to check_tsc_sync_source failed
>
> To prevent this TSC adjustment failure, we need to estimate the sync
> overhead, which will be at least an unlock operation on one CPU followed
> by a lock operation on another CPU.
>
> The measurement is done in check_tsc_sync_target() after stop_count
> reaches 2, which is set by the source CPU after it re-initializes the
> sync variables, causing the lock cacheline to be remote from the target
> CPU. The subsequent time measurement will then be similar to the latency
> between successive iterations of the 2-CPU sync loop in check_tsc_warp().
>
> Interrupts should not yet be enabled when check_tsc_sync_target() is

Interrupts _ARE_ not enabled.

> called. However, some interference may cause the overhead estimate to
> vary a bit. With this patch applied, the measured overhead on the same
> CooperLake system varies from 104 to 326 TSC cycles across different
> reboot runs.

Hmm.

> [    0.008807] TSC ADJUST compensate: CPU36 observed 95626 warp. Adjust: 95626
> [    0.008807] TSC ADJUST compensate: CPU36 observed 74 warp. Adjust: 95700
> [    0.974281] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
> [    0.974281] tsc: Marking TSC unstable due to check_tsc_sync_source failed

Can you please provide the output after your changes? It's hard to tell
what effect adding the lock compensation has. Ideally from several runs
for both patched and unpatched.

Also if the compensation value is at the upper end and the real overhead
is way lower then the validation run might end up with the opposite
result. I'm a bit worried about this variation.

Thanks,

        tglx




* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Thomas Gleixner @ 2022-04-22 10:41 UTC
  To: Waiman Long, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On Mon, Apr 18 2022 at 11:41, Waiman Long wrote:
> On 4/3/22 06:03, Thomas Gleixner wrote:
> [    0.008815] TSC ADJUST compensate: CPU36 observed 86056 warp (overhead 150). Adjust: 86206
> [    0.008815] TSC ADJUST compensate: CPU54 observed 86148 warp (overhead 158). Adjust: 86306
>
>> Also if the compensation value is at the upper end and the real overhead
>> is way lower then the validation run might end up with the opposite
>> result. I'm a bit worried about this variation.
>
> I also have a little concern about that. That is why I added patch 1 to
> minimize as much external interference as possible. For the TSC
> adjustment samples that I have gotten so far, I have never seen one that
> needed a 2nd adjustment to go backward.

I did some experiments and noticed that the boot time overhead is
different from the overhead when doing the sync check after boot
(offline a socket and on/offline the first CPU of it several times).

During boot the overhead is lower on this machine (SKL-X), during
runtime it's way higher and more noisy.

The noise can be pretty much eliminated by running the sync_overhead
measurement multiple times and building the average.

The reason why it is higher is that after offlining the socket the CPU
comes back up with a frequency of 700MHz, while during boot it runs at
2100MHz.

Sync overhead: 118
Sync overhead:  51 A: 22466 M: 22448 F: 2101683

Sync overhead: 178
Sync overhead: 152 A: 22477 M: 67380 F:  700529

Sync overhead: 212
Sync overhead: 152 A: 22475 M: 67380 F:  700467

Sync overhead: 153
Sync overhead: 152 A: 22497 M: 67452 F:  700404

Can you try the patch below and check whether the overhead stabilizes
across several attempts on that CooperLake machine and whether the
frequency is always the same or varies?

Independent of the outcome on that, I think we have to take the actual
CPU frequency into account for calculating the overhead.

Thanks,

        tglx
---
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -446,7 +446,8 @@ void check_tsc_sync_target(void)
 	unsigned int cpu = smp_processor_id();
 	cycles_t cur_max_warp, gbl_max_warp;
 	cycles_t start, sync_overhead;
-	int cpus = 2;
+	u64 ap1, ap2, mp1, mp2;
+	int i, cpus = 2;
 
 	/* Also aborts if there is no TSC. */
 	if (unsynchronized_tsc())
@@ -514,6 +515,23 @@ void check_tsc_sync_target(void)
 	arch_spin_lock(&sync.lock);
 	arch_spin_unlock(&sync.lock);
 	sync_overhead = rdtsc_ordered() - start;
+	pr_info("Sync overhead: %lld\n", sync_overhead);
+
+	sync_overhead = 0;
+	rdmsrl(MSR_IA32_APERF, ap1);
+	rdmsrl(MSR_IA32_MPERF, mp1);
+	for (i = 0; i < 256; i++) {
+		start = rdtsc_ordered();
+		arch_spin_lock(&sync.lock);
+		arch_spin_unlock(&sync.lock);
+		sync_overhead += rdtsc_ordered() - start;
+	};
+	rdmsrl(MSR_IA32_APERF, ap2);
+	rdmsrl(MSR_IA32_MPERF, mp2);
+	ap2 -= ap1;
+	mp2 -= mp1;
+	pr_info("Sync overhead: %lld A: %llu M: %llu F: %llu\n", sync_overhead >> 8,
+		ap2, mp2, mp2 ? div64_u64((cpu_khz * ap2), mp2) : 0);
 
 	/*
 	 * If the warp value of this CPU is 0, then the other CPU


* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Waiman Long @ 2022-04-25 13:20 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On 4/22/22 06:41, Thomas Gleixner wrote:
> On Mon, Apr 18 2022 at 11:41, Waiman Long wrote:
>> On 4/3/22 06:03, Thomas Gleixner wrote:
>> [    0.008815] TSC ADJUST compensate: CPU36 observed 86056 warp (overhead 150). Adjust: 86206
>> [    0.008815] TSC ADJUST compensate: CPU54 observed 86148 warp (overhead 158). Adjust: 86306
>>
>>> Also if the compensation value is at the upper end and the real overhead
>>> is way lower then the validation run might end up with the opposite
>>> result. I'm a bit worried about this variation.
>> I also have a little concern about that. That is why I added patch 1 to
>> minimize as much external interference as possible. For the TSC
>> adjustment samples that I have gotten so far, I have never seen one that
>> needed a 2nd adjustment to go backward.
> I did some experiments and noticed that the boot time overhead is
> different from the overhead when doing the sync check after boot
> (offline a socket and on/offline the first CPU of it several times).
>
> During boot the overhead is lower on this machine (SKL-X), during
> runtime it's way higher and more noisy.
>
> The noise can be pretty much eliminated by running the sync_overhead
> measurement multiple times and building the average.
>
> The reason why it is higher is that after offlining the socket the CPU
> comes back up with a frequency of 700MHz, while during boot it runs at
> 2100MHz.
>
> Sync overhead: 118
> Sync overhead:  51 A: 22466 M: 22448 F: 2101683
One explanation of the sync overhead difference (118 vs 51) here is
whether the lock cacheline is local or remote. My analysis of the
interaction between check_tsc_sync_source() and check_tsc_sync_target()
is that the real overhead comes from locking with a remote cacheline
(local to the source, remote to the target). When you do a 256-iteration
loop of locking, it is all on a local cacheline. That is why the overhead
is lower. It also depends on whether the remote cacheline is in the same
socket or a different socket.
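
To put that distinction into code, here is a minimal sketch (reusing
sync.lock from patch 1 and the timing pattern from patch 2; this is
only an illustration, not part of any posted patch):

	cycles_t t0, cold, warm = 0;
	int i;

	t0 = rdtsc_ordered();
	arch_spin_lock(&sync.lock);	/* cacheline may still be remote */
	arch_spin_unlock(&sync.lock);
	cold = rdtsc_ordered() - t0;	/* one-shot, remote-case cost */

	for (i = 0; i < 256; i++) {	/* cacheline is now local */
		t0 = rdtsc_ordered();
		arch_spin_lock(&sync.lock);
		arch_spin_unlock(&sync.lock);
		warm += rdtsc_ordered() - t0;
	}
	warm >>= 8;			/* average local-case cost */

The one-shot 'cold' value corresponds to the 118 above, while the looped
'warm' average corresponds to the 51.
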
>
> Sync overhead: 178
> Sync overhead: 152 A: 22477 M: 67380 F:  700529
>
> Sync overhead: 212
> Sync overhead: 152 A: 22475 M: 67380 F:  700467
>
> Sync overhead: 153
> Sync overhead: 152 A: 22497 M: 67452 F:  700404
>
> Can you try the patch below and check whether the overhead stabilizes
> across several attempts on that CooperLake machine and whether the
> frequency is always the same or varies?
Yes, I will try that experiment and report back the results.
>
> Independent of the outcome on that, I think we have to take the actual
> CPU frequency into account for calculating the overhead.

Assuming that the clock frequency remains the same during the 
check_tsc_warp() loop and the sync overhead computation time, I don't 
think the actual clock frequency matters much. However, it will be a 
different matter if the frequency does change. In this case, it is more 
likely the frequency will go up than down. Right? IOW, we may 
underestimate the sync overhead in this case. I think it is better than 
overestimating it.

Cheers,
Longman

>
> Thanks,
>
>          tglx
> ---
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -446,7 +446,8 @@ void check_tsc_sync_target(void)
>   	unsigned int cpu = smp_processor_id();
>   	cycles_t cur_max_warp, gbl_max_warp;
>   	cycles_t start, sync_overhead;
> -	int cpus = 2;
> +	u64 ap1, ap2, mp1, mp2;
> +	int i, cpus = 2;
>   
>   	/* Also aborts if there is no TSC. */
>   	if (unsynchronized_tsc())
> @@ -514,6 +515,23 @@ void check_tsc_sync_target(void)
>   	arch_spin_lock(&sync.lock);
>   	arch_spin_unlock(&sync.lock);
>   	sync_overhead = rdtsc_ordered() - start;
> +	pr_info("Sync overhead: %lld\n", sync_overhead);
> +
> +	sync_overhead = 0;
> +	rdmsrl(MSR_IA32_APERF, ap1);
> +	rdmsrl(MSR_IA32_MPERF, mp1);
> +	for (i = 0; i < 256; i++) {
> +		start = rdtsc_ordered();
> +		arch_spin_lock(&sync.lock);
> +		arch_spin_unlock(&sync.lock);
> +		sync_overhead += rdtsc_ordered() - start;
> +	};
> +	rdmsrl(MSR_IA32_APERF, ap2);
> +	rdmsrl(MSR_IA32_MPERF, mp2);
> +	ap2 -= ap1;
> +	mp2 -= mp1;
> +	pr_info("Sync overhead: %lld A: %llu M: %llu F: %llu\n", sync_overhead >> 8,
> +		ap2, mp2, mp2 ? div64_u64((cpu_khz * ap2), mp2) : 0);
>   
>   	/*
>   	 * If the warp value of this CPU is 0, then the other CPU
>



* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Thomas Gleixner @ 2022-04-25 19:24 UTC
  To: Waiman Long, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On Mon, Apr 25 2022 at 09:20, Waiman Long wrote:
> On 4/22/22 06:41, Thomas Gleixner wrote:
>> I did some experiments and noticed that the boot time overhead is
>> different from the overhead when doing the sync check after boot
>> (offline a socket and on/offline the first CPU of it several times).
>>
>> During boot the overhead is lower on this machine (SKL-X), during
>> runtime it's way higher and more noisy.
>>
>> The noise can be pretty much eliminated by running the sync_overhead
>> measurement multiple times and building the average.
>>
>> The reason why it is higher is that after offlining the socket the CPU
>> comes back up with a frequency of 700MHz, while during boot it runs at
>> 2100MHz.
>>
>> Sync overhead: 118
>> Sync overhead:  51 A: 22466 M: 22448 F: 2101683
> One explanation of the sync overhead difference (118 vs 51) here is
> whether the lock cacheline is local or remote. My analysis of the
> interaction between check_tsc_sync_source() and check_tsc_sync_target()
> is that the real overhead comes from locking with a remote cacheline
> (local to the source, remote to the target). When you do a 256-iteration
> loop of locking, it is all on a local cacheline. That is why the overhead
> is lower. It also depends on whether the remote cacheline is in the same
> socket or a different socket.

Yes. It's clear that the initial sync overhead is due to the cache line
being remote, but I'd rather underestimate the compensation. Aside from
that, it's not guaranteed that the cache line is actually remote on the
first access. It's by chance, but not by design.

>> Sync overhead: 178
>> Sync overhead: 152 A: 22477 M: 67380 F:  700529
>>
>> Sync overhead: 212
>> Sync overhead: 152 A: 22475 M: 67380 F:  700467
>>
>> Sync overhead: 153
>> Sync overhead: 152 A: 22497 M: 67452 F:  700404
>>
>> Can you try the patch below and check whether the overhead stabilizes
>> across several attempts on that CooperLake machine and whether the
>> frequency is always the same or varies?
> Yes, I will try that experiment and report back the results.
>>
>> Independent of the outcome on that, I think we have to take the actual
>> CPU frequency into account for calculating the overhead.
>
> Assuming that the clock frequency remains the same during the 
> check_tsc_warp() loop and the sync overhead computation time, I don't 
> think the actual clock frequency matters much. However, it will be a 
> different matter if the frequency does change. In this case, it is more 
> likely the frequency will go up than down. Right? IOW, we may 
> underestimate the sync overhead in this case. I think it is better than 
> overestimating it.

The question is not whether the clock frequency changes during the loop.
The point is:

    start = rdtsc();
    do_stuff();
    end = rdtsc();
    compensation = end - start;
    
do_stuff() executes a constant number of instructions which are executed
in a constant number of CPU clock cycles, let's say 100 for simplicity.
TSC runs with 2000MHz.

With a CPU frequency of 1000 MHz the real computation time is:

   100/1000MHz = 100 nsec = 200 TSC cycles

while with a CPU frequency of 2000MHz it is obviously:

   100/2000MHz =  50 nsec = 100 TSC cycles

IOW, TSC runs with a constant frequency independent of the actual CPU
frequency, ergo the CPU frequency dependent execution time has an
influence on the resulting compensation value, no?

On the machine I tested on, it's a factor of 3 between the minimal and
the maximal CPU frequency, which makes quite a difference, right?
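
As a rough sketch of what taking the frequency into account could look
like (hypothetical, not part of the debug patch above): the APERF/MPERF
delta sampled around the measurement loop is approximately
cur_freq/base_freq, and the TSC ticks at the base frequency, so the
measured TSC-cycle cost could be scaled to full speed like this:

	/* Names made up; deltas from MSR_IA32_APERF/MSR_IA32_MPERF */
	static u64 scale_overhead_to_full_speed(u64 overhead, u64 aperf_delta,
						u64 mperf_delta)
	{
		if (!mperf_delta)
			return overhead;
		/* aperf/mperf ~= cur_freq / base_freq */
		return div64_u64(overhead * aperf_delta, mperf_delta);
	}

With the factor of 3 between minimal and maximal frequency on this
machine, that would shrink a low-speed measurement by up to a factor of 3.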

Thanks,

        tglx




* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Waiman Long @ 2022-04-26 15:36 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On 4/25/22 15:24, Thomas Gleixner wrote:
> On Mon, Apr 25 2022 at 09:20, Waiman Long wrote:
>> On 4/22/22 06:41, Thomas Gleixner wrote:
>>> I did some experiments and noticed that the boot time overhead is
>>> different from the overhead when doing the sync check after boot
>>> (offline a socket and on/offline the first CPU of it several times).
>>>
>>> During boot the overhead is lower on this machine (SKL-X), during
>>> runtime it's way higher and more noisy.
>>>
>>> The noise can be pretty much eliminated by running the sync_overhead
>>> measurement multiple times and building the average.
>>>
>>> The reason why it is higher is that after offlining the socket the CPU
>>> comes back up with a frequency of 700MHz, while during boot it runs at
>>> 2100MHz.
>>>
>>> Sync overhead: 118
>>> Sync overhead:  51 A: 22466 M: 22448 F: 2101683
>> One explanation of the sync overhead difference (118 vs 51) here is
>> whether the lock cacheline is local or remote. My analysis of the
>> interaction between check_tsc_sync_source() and check_tsc_sync_target()
>> is that the real overhead comes from locking with a remote cacheline
>> (local to the source, remote to the target). When you do a 256-iteration
>> loop of locking, it is all on a local cacheline. That is why the overhead
>> is lower. It also depends on whether the remote cacheline is in the same
>> socket or a different socket.
> Yes. It's clear that the initial sync overhead is due to the cache line
> being remote, but I'd rather underestimate the compensation. Aside from
> that, it's not guaranteed that the cache line is actually remote on the
> first access. It's by chance, but not by design.

In check_tsc_warp(), the unlikely(prev > now) check may only be
triggered to record a possible warp if last_tsc was previously written
to by another CPU. That requires the transfer of the lock cacheline from
the remote CPU to the local CPU as well. So the sync overhead with a
remote cacheline is what really matters here. I had actually thought
about just measuring the local cacheline sync overhead so as to
underestimate it, and I am fine with doing that.

>>> Sync overhead: 178
>>> Sync overhead: 152 A: 22477 M: 67380 F:  700529
>>>
>>> Sync overhead: 212
>>> Sync overhead: 152 A: 22475 M: 67380 F:  700467
>>>
>>> Sync overhead: 153
>>> Sync overhead: 152 A: 22497 M: 67452 F:  700404
>>>
>>> Can you try the patch below and check whether the overhead stabilizes
>>> across several attempts on that CooperLake machine and whether the
>>> frequency is always the same or varies?
>> Yes, I will try that experiment and report back the results.
>>> Independent of the outcome on that, I think we have to take the actual
>>> CPU frequency into account for calculating the overhead.
>> Assuming that the clock frequency remains the same during the
>> check_tsc_warp() loop and the sync overhead computation time, I don't
>> think the actual clock frequency matters much. However, it will be a
>> different matter if the frequency does change. In this case, it is more
>> likely the frequency will go up than down. Right? IOW, we may
>> underestimate the sync overhead in this case. I think it is better than
>> overestimating it.
> The question is not whether the clock frequency changes during the loop.
> The point is:
>
>      start = rdtsc();
>      do_stuff();
>      end = rdtsc();
>      compensation = end - start;
>      
> do_stuff() executes a constant number of instructions which are executed
> in a constant number of CPU clock cycles, let's say 100 for simplicity.
> TSC runs with 2000MHz.
>
> With a CPU frequency of 1000 MHz the real computation time is:
>
>     100/1000MHz = 100 nsec = 200 TSC cycles
>
> while with a CPU frequency of 2000MHz it is obviously:
>
>     100/2000MHz =  50 nsec = 100 TSC cycles
>
> IOW, TSC runs with a constant frequency independent of the actual CPU
> frequency, ergo the CPU frequency dependent execution time has an
> influence on the resulting compensation value, no?
>
> On the machine I tested on, it's a factor of 3 between the minimal and
> the maximal CPU frequency, which makes quite a difference, right?

Yes, I understand that. The measurement of sync_overhead is for
estimating the delay (in TSC cycles) that the locking overhead
introduces. With a 1000MHz frequency, the delay in TSC cycles will be
double that of a CPU running at 2000MHz. So you need more compensation
in this case. That is why I said that as long as the clock frequency
doesn't change during the check_tsc_warp() loop and the sync_overhead
measurement part of the code, the actual CPU frequency does not matter
here.

How about we halve the measured sync_overhead as compensation to avoid
over-estimation, though that probably increases the chance that we need
a second adjustment of the TSC warp.

Cheers,
Longman



* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Thomas Gleixner @ 2022-04-27 22:38 UTC
  To: Waiman Long, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On Tue, Apr 26 2022 at 11:36, Waiman Long wrote:
> On 4/25/22 15:24, Thomas Gleixner wrote:
>> Yes. It's clear that the initial sync overhead is due to the cache line
>> being remote, but I rather underestimate the compensation. Aside of that
>> it's not guaranteed that the cache line is actually remote on the first
>> access. It's by chance, but not by design.
>
> In check_tsc_warp(), the unlikely(prev > now) check may only be
> triggered to record a possible warp if last_tsc was previously written
> to by another CPU. That requires the transfer of the lock cacheline from
> the remote CPU to the local CPU as well. So the sync overhead with a
> remote cacheline is what really matters here. I had actually thought
> about just measuring the local cacheline sync overhead so as to
> underestimate it, and I am fine with doing that.

Fair enough, but what I meant is that when estimating the actual sync
overhead, there is no guarantee that the cache line is remote.

The CPU which does that estimation might have been the last to lock;
there is no guarantee that the reference CPU locked last or wrote to the
cache line last.

>> IOW, TSC runs with a constant frequency independent of the actual CPU
>> frequency, ergo the CPU frequency dependent execution time has an
>> influence on the resulting compensation value, no?
>>
>> On the machine I tested on, it's a factor of 3 between the minimal and
>> the maximal CPU frequency, which makes quite a difference, right?
>
> Yes, I understand that. The measurement of sync_overhead is for 
> estimating the delay (in TSC cycles) that the locking overhead 
> introduces. With 1000MHz frequency, the delay in TSC cycle will be 
> double that of a cpu running at 2000MHz. So you need more compensation 
> in this case. That is why I said that as long as clock frequency doesn't 
> change in the check_tsc_wrap() loop and the sync_overhead measurement 
> part of the code, the actual cpu frequency does not matter here.

I grant you that it does not matter for the loop under the assumption
that the loop runs at constant frequency, but is that a guarantee that
it does not matter later on?

If you overcompensate by a factor of 3 because the upcoming CPU ran at
the lowest frequency, then it might become visible later when everything
runs at full speed.

> How about we halve the measured sync_overhead as compensation to avoid
> over-estimation, though that probably increases the chance that we need
> a second adjustment of the TSC warp.

Half of what?

> With this patch applied, the measured overhead on the same CooperLake
> system varies from 104 to 326 TSC cycles across different reboot runs.

Half of something which jumps around? Not convinced. :)

Btw:
>> Yes, I will try that experiment and report back the results.

Could you please do that? I really like to see the data points.

It's so sad that we have still to deal with this nonsense just because
HW people don't give a damn.

Thanks,

        tglx


* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Waiman Long @ 2022-04-29 17:41 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On 4/27/22 18:38, Thomas Gleixner wrote:
> On Tue, Apr 26 2022 at 11:36, Waiman Long wrote:
>> On 4/25/22 15:24, Thomas Gleixner wrote:
>>> Yes. It's clear that the initial sync overhead is due to the cache line
>>> being remote, but I'd rather underestimate the compensation. Aside from
>>> that, it's not guaranteed that the cache line is actually remote on the
>>> first access. It's by chance, but not by design.
>> In check_tsc_warp(), the unlikely(prev > now) check may only be
>> triggered to record a possible warp if last_tsc was previously written
>> to by another CPU. That requires the transfer of the lock cacheline from
>> the remote CPU to the local CPU as well. So the sync overhead with a
>> remote cacheline is what really matters here. I had actually thought
>> about just measuring the local cacheline sync overhead so as to
>> underestimate it, and I am fine with doing that.
> Fair enough, but what I meant is that when estimating the actual sync
> overhead, there is no guarantee that the cache line is remote.
>
> The CPU which does that estimation might have been the last to lock;
> there is no guarantee that the reference CPU locked last or wrote to the
> cache line last.
>
>>> IOW, TSC runs with a constant frequency independent of the actual CPU
>>> frequency, ergo the CPU frequency dependent execution time has an
>>> influence on the resulting compensation value, no?
>>>
>>> On the machine I tested on, it's a factor of 3 between the minimal and
>>> the maximal CPU frequency, which makes quite a difference, right?
>> Yes, I understand that. The measurement of sync_overhead is for
>> estimating the delay (in TSC cycles) that the locking overhead
>> introduces. With a 1000MHz frequency, the delay in TSC cycles will be
>> double that of a CPU running at 2000MHz. So you need more compensation
>> in this case. That is why I said that as long as the clock frequency
>> doesn't change during the check_tsc_warp() loop and the sync_overhead
>> measurement part of the code, the actual CPU frequency does not matter
>> here.
> I grant you that it does not matter for the loop under the assumption
> that the loop runs at constant frequency, but is that a guarantee that
> it does not matter later on?
Yes, that is my point: frequency doesn't matter if the frequency remains
the same. Of course, all bets are off if the frequency really changes.
>
> If you overcompensate by a factor of 3 because the upcoming CPU ran at
> the lowest frequency, then it might become visible later when everything
> runs at full speed.
I don't think the overhead will be directly proportional to the CPU
frequency. A 3X increase in frequency will certainly cause the overhead
to be lowered, but it won't be 1/3. Maybe 1/2 at most.
>
>> How about we halve the measured sync_overhead as compensation to avoid
>> over-estimation, though that probably increases the chance that we need
>> a second adjustment of the TSC warp.
> Half of what?

What I mean is

@@ -533,7 +551,7 @@ void check_tsc_sync_target(void)
          * really depends on CPU, node distance and frequency. Add the
          * estimated sync overhead to the adjustment value.
          */
-       cur->adjusted += cur_max_warp + sync_overhead;
+       cur->adjusted += cur_max_warp + sync_overhead/2;

         pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp 
(overhead %lld>
                 cpu, cur_max_warp, sync_overhead, cur->adjusted);

>> With this patch applied, the measured overhead on the same CooperLake
>> system varies from 104 to 326 TSC cycles across different reboot runs.
> Half of something which jumps around? Not convinced. :)
>
> Btw:
>>> Yes, I will try that experiment and report back the results.
> Could you please do that? I really like to see the data points.

I have applied your patch with some modifications, and below is the
relevant part of the boot-up log on a CooperLake system with this TSC
sync problem.

[    0.008858] smpboot: CPU 36 Converting physical 0 to logical die 2
[    0.008858] Sync overhead: 230
[    0.008858] Sync overhead: 547 A: 149597 M: 149596 F: 2500016
[    0.008858] TSC ADJUST compensate: CPU36 observed 76342 warp (overhead 230). Adjust: 76457
[    0.008858] smpboot: CPU 54 Converting physical 0 to logical die 3
[    0.008858] Sync overhead: 178
[    0.008858] Sync overhead: 658 A: 177970 M: 177968 F: 2500028
[    0.008858] TSC ADJUST compensate: CPU54 observed 76568 warp (overhead 178). Adjust: 76657

BTW, CPUs 36 and 54 are the first CPUs of sockets 2 and 3 respectively.
It is always these CPUs that need TSC adjustment.

I have no idea why the sync overhead actually increases in the subsequent
measurement, as the frequency was set at 2.5GHz, I think.

I tried to offline all the CPUs in a socket and then online the first 
CPU as suggested. However, I was not able to cause the tsc_sync loop to run.

BTW, without my patch the same system will boot up with the following log:

[    0.008823] TSC ADJUST compensate: CPU36 observed 86036 warp. Adjust: 86036
[    0.008823] TSC ADJUST compensate: CPU36 observed 122 warp. Adjust: 86158
[    0.923620] Measured 2 cycles TSC warp between CPUs, turning off TSC clock.
[    0.923620] tsc: Marking TSC unstable due to check_tsc_sync_source failed

How about we add the full sync_overhead at the bootup stage, but then
halve the sync_overhead after boot in case the tsc_sync loop is run again
for newly onlined CPUs? Or any other scheme that you think is appropriate.
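
Something like the following sketch against the patch 2 hunk in
check_tsc_sync_target() (whether system_state is the right gate is an
assumption; the halving has to happen before the negation for the
!cur_max_warp case, since sync_overhead is an unsigned cycles_t):

	/* Be more conservative with the compensation after boot */
	if (system_state != SYSTEM_BOOTING)
		sync_overhead /= 2;

	if (!cur_max_warp) {
		cur_max_warp = -gbl_max_warp;
		sync_overhead = -sync_overhead;
	}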

Cheers,
Longman



* Re: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
From: Thomas Gleixner @ 2022-05-02  7:56 UTC
  To: Waiman Long, Ingo Molnar, Borislav Petkov, Dave Hansen
  Cc: x86, linux-kernel, H. Peter Anvin, Feng Tang, Bill Gray, Jirka Hladky

On Fri, Apr 29 2022 at 13:41, Waiman Long wrote:
> On 4/27/22 18:38, Thomas Gleixner wrote:
>> I grant you that it does not matter for the loop under the assumption
>> that the loop runs at constant frequency, but is that a guarantee that
>> it does not matter later on?
> Yes, that is my point: frequency doesn't matter if the frequency remains
> the same. Of course, all bets are off if the frequency really changes.

Frequency changes after boot.

>> If you overcompensate by a factor of 3 because the upcoming CPU ran at
>> the lowest frequency, then it might become visible later when everything
>> runs at full speed.
> I don't think the overhead will be directly proportional to the CPU
> frequency. A 3X increase in frequency will certainly cause the overhead
> to be lowered, but it won't be 1/3. Maybe 1/2 at most.

It's very close to proportional according to my experiments.

>>> How about we halve the measured sync_overhead as compensation to avoid
>>> over-estimation, though that probably increases the chance that we need
>>> a second adjustment of the TSC warp.
>> Half of what?
>
> What I mean is
>
> @@ -533,7 +551,7 @@ void check_tsc_sync_target(void)
>           * really depends on CPU, node distance and frequency. Add the
>           * estimated sync overhead to the adjustment value.
>           */
> -       cur->adjusted += cur_max_warp + sync_overhead;
> +       cur->adjusted += cur_max_warp + sync_overhead/2;

The point is? Make it magically half while it still jumps around:

>>> With this patch applied, the measured overhead on the same CooperLake
>>> system varies from 104 to 326 TSC cycles across different reboot runs.
>> Half of something which jumps around? Not convinced. :)

...

> I have applied your patch with some modifications, and below is the
> relevant part of the boot-up log on a CooperLake system with this TSC
> sync problem.
>
> [    0.008858] smpboot: CPU 36 Converting physical 0 to logical die 2
> [    0.008858] Sync overhead: 230
> [    0.008858] Sync overhead: 547 A: 149597 M: 149596 F: 2500016
> [    0.008858] TSC ADJUST compensate: CPU36 observed 76342 warp (overhead 230). Adjust: 76457
> [    0.008858] smpboot: CPU 54 Converting physical 0 to logical die 3
> [    0.008858] Sync overhead: 178
> [    0.008858] Sync overhead: 658 A: 177970 M: 177968 F: 2500028
> [    0.008858] TSC ADJUST compensate: CPU54 observed 76568 warp (overhead 178). Adjust: 76657
>
> BTW, CPUs 36 and 54 are the first CPUs of sockets 2 and 3 respectively.
> It is always these CPUs that need TSC adjustment.
>
> I have no idea why the sync overhead actually increases in the subsequent
> measurement, as the frequency was set at 2.5GHz, I think.

See, that's the thing I'm worried about. An allegedly remote lock takes
178 cycles, but a loop of cache-local lock operations takes 658 cycles
on average.

That does not make sense, and I haven't seen that behaviour on any of
the machines I ran this on. The average was always less than the initial
measurement.

> I tried to offline all the CPUs in a socket and then online the first 
> CPU as suggested. However, I was not able to cause the tsc_sync loop
> to run.

It runs for sure when the first CPU on a socket goes online except when
TSC is already disabled. It does not matter whether it was online before
or not.

> BTW, without my patch the same system will boot up with the following
> log:

We know that by now.

> How about we add the full sync_overhead at the bootup stage, but then
> halve the sync_overhead after boot in case the tsc_sync loop is run
> again for newly onlined CPUs? Or any other scheme that you think is
> appropriate.

Can you please stop grasping for straws?

I'm as much interested as you to get this solved, but I'm not interested
in 'works for me' and 'we don't know why it works' solutions simply
because they end up on my desk sooner than later again.

Thanks

        tglx

