Need help in tracking down OS jitters using Intel PT

* Need help in tracking down OS jitters using Intel PT
@ 2017-03-23  9:28 Yilong Li
  2017-03-23 15:30 ` David Ahern
       [not found] ` <CAPAnC4MSH7s+Xi_WUHxYsJBEgPt1i8gcFP4yH2bTG=0vCVHsJA@mail.gmail.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Yilong Li @ 2017-03-23  9:28 UTC (permalink / raw)
  To: linux-perf-users

Hello,

I have been trying to track down some OS jitters that take around
15~16us on my machine. I created a simple test program "user_loop"
which simply calls rdtscp in a tight loop, collects the first 100 gaps
that take more than 30000 cycles, and sends SIGUSR2 to the `perf
record -e intel_pt// --snapshot` process that is tracing the test
program. The output of the test program looks like this:
[snip]
jitter 88: 31656 cycles, timestamp 7135093048653524
jitter 89: 35714 cycles, timestamp 7135093052200982
jitter 90: 31508 cycles, timestamp 7135093052820884
jitter 91: 31183 cycles, timestamp 7135093068653931
jitter 92: 32813 cycles, timestamp 7135093088516264
[snip]

Then I used `perf script` to display the trace output hoping to see
what exactly get executed during the gap. I used Andi Kleen's
perf/xed-6 branch[1] to get the assembly. However, when I looked
around the last rdtscp instruction in the trace output, I saw nothing
from the kernel. The trace output looks like this:
[snip]
561650.137702631:                          40061b main  rdtscp
561650.137702631:                          40061e main  shl $0x20, %rdx
561650.137702631:                          400622 main  movl  %ecx, -0x1c(%rbp)
561650.137702631:                          400625 main  or %rdx, %rax
561650.137702631:                          400628 main  mov %rax, %rdx
561650.137702631:                          40062b main  sub %rsi, %rdx
561650.137702631:                          40062e main  cmp $0x7530, %rdx
561650.137702631:   jcc                    400635 main  jbe 0x400618 <main+0x88>
561650.137702631:                          400618 main  mov %rax, %rsi
561650.137702631:                          40061b main  rdtscp
561650.137702631:                          40061e main  shl $0x20, %rdx
561650.137702631:                          400622 main  movl  %ecx, -0x1c(%rbp)
561650.137702631:                          400625 main  or %rdx, %rax
561650.137702631:                          400628 main  mov %rax, %rdx
561650.137702631:                          40062b main  sub %rsi, %rdx
561650.137702631:                          40062e main  cmp $0x7530, %rdx
561650.137702631:   jcc                    400635 main  jbe 0x400618 <main+0x88>
561650.137702631:                          400618 main  mov %rax, %rsi
561650.137702631:                          40061b main  rdtscp
<===========LAST ITERATION
561650.137702631:                          40061e main  shl $0x20, %rdx
561650.137702631:                          400622 main  movl  %ecx, -0x1c(%rbp)
561650.137702631:                          400625 main  or %rdx, %rax
561650.137702631:                          400628 main  mov %rax, %rdx
561650.137702631:                          40062b main  sub %rsi, %rdx
561650.137702631:                          40062e main  cmp $0x7530, %rdx
561650.137702631:   jcc                    400635 main  jbe 0x400618 <main+0x88>
561650.137702631:                          400637 main  mov %r8d, %ecx
561650.137702631:                          40063a main  add $0x1, %r8d
561650.137702631:                          40063e main  cmp $0x63, %r8d
561650.137702631:                          400642 main  movq  %rdx,
0x6013a0(,%rcx,8)
561650.137702631:                          40064a main  movq  %rax,
0x601080(,%rcx,8)
561650.137702631:   jcc                    400652 main  jbe 0x400618 <main+0x88>
561650.137702631:                          400654 main  test %edi, %edi
561650.137702631:   jcc                    400656 main  jle 0x400662 <main+0xd2>
561650.137702631:                          400658 main  mov $0xc, %esi
561650.137702631:   call                   40065d main  callq
0x400550 <kill@plt>
561650.137702631:   jmp                    400550 kill@plt      jmpq
0x200ada(%rip)
561650.137702631:                          400556 kill@plt      pushq  $0x3
561650.137702631:   jmp                    40055b kill@plt      jmp
0x400510 <_init+0x20>
561650.137702631:                          400510 _init pushq  0x200af2(%rip)
561650.137702631:   jmp                    400516 _init jmpq  0x200af4(%rip)
561650.137702631:                    7f1b8882c6a0
_dl_runtime_resolve_avx       pushq  %rbx
561650.137702631:                    7f1b8882c6a1
_dl_runtime_resolve_avx       mov %rsp, %rbx
561650.137702631:                    7f1b8882c6a4
_dl_runtime_resolve_avx       and $0xffffffffffffffe0, %rsp
[snip]

Note: the cpu uarch on this machine is Broadwell, so the timestamps
are very crude.

What am I missing? Where does the 15us go? The program and the scripts
I used in the experiment can be found here:
https://github.com/yilongli/OSJitterTest.

Thanks,
Yilong

[1] https://lkml.org/lkml/2017/1/18/1125

^ permalink raw reply	[flat|nested] 5+ messages in thread