linux-rt-users.vger.kernel.org archive mirror
* Debugging unexpected latency on a new Xeon W C422 system
@ 2020-02-20 16:56 Kansky, Jan E.
  2020-02-21 17:54 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 2+ messages in thread
From: Kansky, Jan E. @ 2020-02-20 16:56 UTC
  To: linux-rt-users

Here is my new system:
Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz
C422 chipset
CentOS 8

I've built two versions of the realtime kernel to replace the stock
CentOS kernel:
4.19.103-rt42
and
5.4.17-rt9

For both kernels, I run cyclictest:
./cyclictest -p 98 --smp -b 600  -f -m

The worst-case latencies approach 2 milliseconds at times.

I have disabled SpeedStep in the BIOS.

I have also experimented with the following additional kernel boot
parameters for both kernels:
noibrs noibpb nospectre_v2 nospectre_v1 l1tf=off
nospec_store_bypass_disable no_stf_barrier mds=off mitigations=off
nosoftlockup intel_idle.max_cstate=0 mce=ignore_ce
processor.max_cstate=0

These parameters didn't solve the problem.

The trace results show that the unexpected large latency seems to
occur in the following ways:

llvmpipe-9436    3d...... 4901974us!: switch_fpu_return
<-prepare_exit_to_usermode
llvmpipe-9436    3d...... 4902845us : smp_apic_timer_interrupt
<-apic_timer_interrupt

or

    Xorg-8825    3....... 4905576us!: kfree <-__audit_syscall_exit
  <idle>-0       2d...1.. 4905876us : smp_apic_timer_interrupt
<-apic_timer_interrupt

or
  <idle>-0       2d...1.. 4905910us!: mwait_idle <-default_idle_call
  <idle>-0       0d...1.. 4906049us : smp_apic_timer_interrupt
<-apic_timer_interrupt

or

llvmpipe-9435    1d...... 4917323us!: rcu_irq_exit <-irq_exit
llvmpipe-9438    3d...... 4917845us : smp_apic_timer_interrupt
<-apic_timer_interrupt

I do see NMIs occurring on the system, although not all latency events
seem to correlate with an increment in the NMI counter in
/proc/interrupts.

I would greatly appreciate any advice on what I should do to trace the
problem with this new system.  I can send my .config files if needed.
CONFIG_PREEMPT_RT_FULL=y is set.

Thanks!
Jan


* Re: Debugging unexpected latency on a new Xeon W C422 system
  2020-02-20 16:56 Debugging unexpected latency on a new Xeon W C422 system Kansky, Jan E.
@ 2020-02-21 17:54 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 2+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-02-21 17:54 UTC
  To: Kansky, Jan E.; +Cc: linux-rt-users

On 2020-02-20 11:56:23 [-0500], Kansky, Jan E. wrote:
> For both kernels, I run cyclictest:
> ./cyclictest -p 98 --smp -b 600  -f -m

You could use the -b option to stop the trace once you hit the 2ms
latency (the value is in microseconds, so -b 600 above already stops
at 600us).

> The trace results show that the unexpected large latency seems to
> occur in the following ways:
> 
> llvmpipe-9436    3d...... 4901974us!: switch_fpu_return
> <-prepare_exit_to_usermode
> llvmpipe-9436    3d...... 4902845us : smp_apic_timer_interrupt
> <-apic_timer_interrupt

It is sometimes hard to read with the additional line feed.
However, this is probably okay because you return to user space and come
back later after a timer interrupt.

> or
> 
>     Xorg-8825    3....... 4905576us!: kfree <-__audit_syscall_exit
>   <idle>-0       2d...1.. 4905876us : smp_apic_timer_interrupt
> <-apic_timer_interrupt
> 
> or
>   <idle>-0       2d...1.. 4905910us!: mwait_idle <-default_idle_call
>   <idle>-0       0d...1.. 4906049us : smp_apic_timer_interrupt
> <-apic_timer_interrupt
> 
> or
> 
> llvmpipe-9435    1d...... 4917323us!: rcu_irq_exit <-irq_exit
> llvmpipe-9438    3d...... 4917845us : smp_apic_timer_interrupt
> <-apic_timer_interrupt

The number before the d is the CPU number. So if you see kfree() on
CPU3 followed by smp_apic_timer_interrupt() on CPU2, there is no need
to worry. The same goes for the other two examples. You need to see
what happens after that gap.
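
A minimal sketch of that per-CPU reading, in Python. It assumes the
latency-format trace lines look like the excerpts quoted above
(comm-pid, CPU number fused with the flags, timestamp in us). The
names parse_events, adjacent_gaps and per_cpu_gaps are hypothetical
helpers, and SAMPLE is stitched together from two of the excerpts in
this thread, so it is not a complete trace.

```python
import re
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    comm: str
    pid: int
    cpu: int
    ts_us: int
    func: str


# comm-pid, CPU number immediately followed by the flag characters,
# timestamp in microseconds, optional delay marker (! + #), function name.
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+(?P<cpu>\d+)(?P<flags>\S+)\s+"
    r"(?P<ts>\d+)us\s*[!+#]?\s*:\s*(?P<func>\S+)"
)

# Excerpts from the thread (not contiguous in the real trace).
SAMPLE = """\
    Xorg-8825    3....... 4905576us!: kfree <-__audit_syscall_exit
  <idle>-0       2d...1.. 4905876us : smp_apic_timer_interrupt
<-apic_timer_interrupt
  <idle>-0       2d...1.. 4905910us!: mwait_idle <-default_idle_call
  <idle>-0       0d...1.. 4906049us : smp_apic_timer_interrupt
<-apic_timer_interrupt
"""


def parse_events(lines: List[str]) -> List[Event]:
    """Parse trace lines; wrapped '<-caller' continuation lines are skipped."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            events.append(Event(m["comm"], int(m["pid"]), int(m["cpu"]),
                                int(m["ts"]), m["func"]))
    return events


def adjacent_gaps(events: List[Event], threshold_us: int) -> List[Tuple[Event, Event]]:
    """Naive reading: gaps between consecutive lines, ignoring the CPU."""
    return [(a, b) for a, b in zip(events, events[1:])
            if b.ts_us - a.ts_us >= threshold_us]


def per_cpu_gaps(events: List[Event], threshold_us: int) -> List[Tuple[int, int, Event, Event]]:
    """Gaps between consecutive events on the same CPU only."""
    last = {}
    gaps = []
    for ev in events:
        prev = last.get(ev.cpu)
        if prev is not None and ev.ts_us - prev.ts_us >= threshold_us:
            gaps.append((ev.cpu, ev.ts_us - prev.ts_us, prev, ev))
        last[ev.cpu] = ev
    return gaps
```

On these excerpts, adjacent_gaps() flags the jumps that cross CPUs,
while per_cpu_gaps() with the same threshold reports nothing. A real
analysis would run over the full trace rather than a few lines.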

> I do see NMIs occurring on the system, although not all latency events
> seem to correlate with an increment in the NMI counter in
> /proc/interrupts.

perf may be responsible for some of them, or the "hardware watchdog".
If you suspect that the BIOS is doing something, there is
CONFIG_HWLAT_TRACER to prove it.
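
To check whether a given latency event lines up with an NMI, one
option is to snapshot /proc/interrupts before and after a test window
and compare the per-CPU NMI counts. A rough Python sketch follows. The
helper names nmi_counts and nmi_delta are hypothetical, and the
BEFORE/AFTER text is made-up sample data in the usual /proc/interrupts
layout, not output from the system discussed here.

```python
from typing import List

# Made-up sample snapshots for illustration only.
BEFORE = """\
           CPU0       CPU1       CPU2       CPU3
  0:          8          0          0          0  IR-IO-APIC    2-edge      timer
NMI:         10         12          9         11   Non-maskable interrupts
LOC:     123456     123400     123999     123111   Local timer interrupts
"""

AFTER = """\
           CPU0       CPU1       CPU2       CPU3
  0:          8          0          0          0  IR-IO-APIC    2-edge      timer
NMI:         10         13          9         11   Non-maskable interrupts
LOC:     124456     124400     124999     124111   Local timer interrupts
"""


def nmi_counts(interrupts_text: str) -> List[int]:
    """Return the per-CPU NMI counters from /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the trailing description
                counts.append(int(field))
            return counts
    raise ValueError("no NMI line found")


def nmi_delta(before: str, after: str) -> List[int]:
    """Per-CPU increase in the NMI counter between two snapshots."""
    return [b - a for a, b in zip(nmi_counts(before), nmi_counts(after))]
```

On a live system the two snapshots would come from reading
/proc/interrupts (for example with open("/proc/interrupts").read())
right before and right after the cyclictest run that hits the latency.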

> I would greatly appreciate any advice on what I should do to trace the
> problem with this new system.  I can send my .config files if needed.
> CONFIG_PREEMPT_RT_FULL=y is set.
> 
> Thanks!
> Jan

Sebastian
