* IRQ_PIPELINE: TSC marked as unstable
@ 2021-09-03  8:06 Bezdeka, Florian
  2021-09-03  8:33 ` Philippe Gerum
  0 siblings, 1 reply; 5+ messages in thread
From: Bezdeka, Florian @ 2021-09-03  8:06 UTC (permalink / raw)
  To: xenomai

Hi all,

I'm able to reproduce the following on two different platforms now, so
I assume it's a generic IRQ_PIPELINE issue:

Platform A):
Intel(R) Xeon(R) CPU D-1518 @ 2.20GHz
1 Socket, 4 Cores, 1 thread per core

Platform B):
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
2 Sockets, 6 cores per socket, 2 threads per core
(2 NUMA nodes)


Platform A) reports the TSC being unstable during the boot phase,
platform B) reports the TSC as unstable when running stress tests:

Taken from a B) based system:

[57615.671114] clocksource: timekeeping watchdog on CPU17: Marking clocksource 'tsc' as unstable because the skew is too large:
[57615.738269] clocksource:                       'hpet' wd_now: 12f85ed0 wd_last: 2c5eab7b mask: ffffffff
[57615.794489] clocksource:                       'tsc' cs_now: 68e299c3708c cs_last: 6864c6ea3970 mask: ffffffffffffffff
[57615.858552] tsc: Marking TSC unstable due to clocksource watchdog
[57615.858582] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[57615.910138] sched_clock: Marking unstable (57615104375773, 749891156)<-(57616072553488, -213973554)
[57615.905983] clocksource: Checking clocksource tsc synchronization from CPU 15.
[57615.949626] clocksource: Override clocksource tsc is unstable and not HRT compatible - cannot switch while in HRT/NOHZ mode
[57616.016343] clocksource: Switched to clocksource hpet

The clocksource watchdog is migrated between CPUs to make sure the TSC
is synchronized between cores. To me this looks like late delivery of
the watchdog timer.
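
As a rough cross-check of that suspicion, the counter deltas printed in the
log above can be converted into elapsed time. The sketch below is only an
illustration: the raw values and masks are copied from the log, but both
frequencies are assumptions (14.31818 MHz is a common HPET rate, and 2.0 GHz
is simply the nominal clock of the E5-2620), and the helper name
masked_delta is made up for this example.

```python
# Counter values and masks copied from the watchdog log above.
WD_NOW, WD_LAST, WD_MASK = 0x12f85ed0, 0x2c5eab7b, 0xffffffff
CS_NOW, CS_LAST, CS_MASK = 0x68e299c3708c, 0x6864c6ea3970, 0xffffffffffffffff

# ASSUMPTIONS (not taken from the log): typical HPET rate and the
# nominal CPU clock used as a stand-in for the real TSC frequency.
HPET_HZ = 14_318_180
TSC_HZ = 2_000_000_000


def masked_delta(now, last, mask):
    """Cycles elapsed between two reads of a counter that wraps at mask + 1."""
    return (now - last) & mask


wd_cycles = masked_delta(WD_NOW, WD_LAST, WD_MASK)
cs_cycles = masked_delta(CS_NOW, CS_LAST, CS_MASK)

# Convert cycle counts to seconds under the assumed frequencies.
wd_seconds = wd_cycles / HPET_HZ
cs_seconds = cs_cycles / TSC_HZ

print(f"hpet: {wd_cycles} cycles, about {wd_seconds:.3f} s")
print(f"tsc:  {cs_cycles} cycles, about {cs_seconds:.3f} s")
print(f"difference: {abs(cs_seconds - wd_seconds) * 1e3:.3f} ms")
```

Under these assumed frequencies both deltas come out at roughly 270 s, far
more than the sub-second period the watchdog normally runs at, and close to
the point where a 32-bit HPET counter at that rate wraps (about 300 s). That
would fit a delayed watchdog run better than a drifting TSC, but it is an
estimate, not a diagnosis.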

Available workaround(s):
- Add "tsc=reliable" to the kernel command line
- At least for A) based systems, applying the following diff to the kernel
  configuration helped. I do not consider that a "solution" for now.

-CONFIG_HZ_100=y
+CONFIG_HZ_1000=y
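
To confirm which clocksource a running system actually ended up with, and
whether the command-line workaround is active, the standard sysfs and procfs
files can be read. This is a minimal sketch: the paths are the usual Linux
locations, while the function names read_clocksources and cmdline_has are
hypothetical helpers introduced here for illustration.

```python
from pathlib import Path

# Standard Linux sysfs directory for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(sysfs_dir=SYSFS_DIR):
    """Return (current clocksource, list of available clocksources)."""
    sysfs_dir = Path(sysfs_dir)
    current = (sysfs_dir / "current_clocksource").read_text().strip()
    available = (sysfs_dir / "available_clocksource").read_text().split()
    return current, available


def cmdline_has(option, cmdline_path="/proc/cmdline"):
    """True if the exact option (e.g. 'tsc=reliable') is on the kernel cmdline."""
    return option in Path(cmdline_path).read_text().split()


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
        print(f"current clocksource: {current}")
        print(f"available:           {', '.join(available)}")
        print(f"tsc=reliable set:    {cmdline_has('tsc=reliable')}")
    except OSError as exc:
        # Not on Linux, or sysfs/procfs not mounted.
        print(f"cannot read clocksource information: {exc}")
```

After the watchdog has fired, current_clocksource would be expected to show
hpet on a B) based system, matching the last line of the log.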


As soon as I disable CONFIG_IRQ_PIPELINE the problem is gone.

I already tried testing with CONFIG_DEBUG_IRQ_PIPELINE enabled, but
that didn't help so far.

Any advice on how to debug that?

Best regards,
Florian

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


* Re: IRQ_PIPELINE: TSC marked as unstable
  2021-09-03  8:06 IRQ_PIPELINE: TSC marked as unstable Bezdeka, Florian
@ 2021-09-03  8:33 ` Philippe Gerum
  2021-09-03  8:41   ` Philippe Gerum
  2021-09-03  8:44   ` Bezdeka, Florian
  0 siblings, 2 replies; 5+ messages in thread
From: Philippe Gerum @ 2021-09-03  8:33 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: xenomai, jan.kiszka


Could this be related to [1] (HPET stanza)?

[1] https://evlproject.org/core/caveat/#x86-caveat

-- 
Philippe.



* Re: IRQ_PIPELINE: TSC marked as unstable
  2021-09-03  8:33 ` Philippe Gerum
@ 2021-09-03  8:41   ` Philippe Gerum
  2021-09-06 12:30     ` Bezdeka, Florian
  2021-09-03  8:44   ` Bezdeka, Florian
  1 sibling, 1 reply; 5+ messages in thread
From: Philippe Gerum @ 2021-09-03  8:41 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: xenomai, jan.kiszka


Philippe Gerum <rpm@xenomai.org> writes:

> Could this be related [1] (HPET stanza)?
>
> [1] https://evlproject.org/core/caveat/#x86-caveat


Not directly: you do have HPET enabled, and the refined source is not
involved. Did you try enabling the Dovetail torture tests, particularly
on the machine that has the issue at boot time?

-- 
Philippe.



* Re: IRQ_PIPELINE: TSC marked as unstable
  2021-09-03  8:33 ` Philippe Gerum
  2021-09-03  8:41   ` Philippe Gerum
@ 2021-09-03  8:44   ` Bezdeka, Florian
  1 sibling, 0 replies; 5+ messages in thread
From: Bezdeka, Florian @ 2021-09-03  8:44 UTC (permalink / raw)
  To: rpm; +Cc: xenomai, jan.kiszka

On Fri, 2021-09-03 at 10:33 +0200, Philippe Gerum wrote:
> Could this be related [1] (HPET stanza)?
> 
> [1] https://evlproject.org/core/caveat/#x86-caveat
> 

In the A) scenario it could be related. HPET is disabled / not
available there. Thanks for the hint!

With B) we have HPET enabled, and it never happened when IRQ_PIPELINE was
not compiled in.



* Re: IRQ_PIPELINE: TSC marked as unstable
  2021-09-03  8:41   ` Philippe Gerum
@ 2021-09-06 12:30     ` Bezdeka, Florian
  0 siblings, 0 replies; 5+ messages in thread
From: Bezdeka, Florian @ 2021-09-06 12:30 UTC (permalink / raw)
  To: rpm; +Cc: xenomai, jan.kiszka

On Fri, 2021-09-03 at 10:41 +0200, Philippe Gerum wrote:
> Not directly, you do have HPET enabled and the refined source is not
> involved. Did you try enabling the Dovetail torture tests, particularly
> on the machine that has the issue at boot time?
> 

I enabled the torture tests today; they don't report any errors.

I guess I can stop searching for the root cause on A) based systems.
I'm quite sure it's HPET not being available. BIOS guys have been
informed. Waiting for response.

On B) based systems I just learned that we might have a hardware issue.
The HW guys are already on it. So for now: All problems gone.

