* Need help in tracking down OS jitters using Intel PT
@ 2017-03-23 9:28 Yilong Li
2017-03-23 15:30 ` David Ahern
[not found] ` <CAPAnC4MSH7s+Xi_WUHxYsJBEgPt1i8gcFP4yH2bTG=0vCVHsJA@mail.gmail.com>
0 siblings, 2 replies; 5+ messages in thread
From: Yilong Li @ 2017-03-23 9:28 UTC (permalink / raw)
To: linux-perf-users
Hello,
I have been trying to track down some OS jitters that take around
15~16us on my machine. I created a simple test program "user_loop"
which simply calls rdtscp in a tight loop, collects the first 100 gaps
that take more than 30000 cycles, and sends SIGUSR2 to the `perf
record -e intel_pt// --snapshot` process that is tracing the test
program. The output of the test program looks like this:
[snip]
jitter 88: 31656 cycles, timestamp 7135093048653524
jitter 89: 35714 cycles, timestamp 7135093052200982
jitter 90: 31508 cycles, timestamp 7135093052820884
jitter 91: 31183 cycles, timestamp 7135093068653931
jitter 92: 32813 cycles, timestamp 7135093088516264
[snip]
Then I used `perf script` to display the trace output hoping to see
what exactly get executed during the gap. I used Andi Kleen's
perf/xed-6 branch[1] to get the assembly. However, when I looked
around the last rdtscp instruction in the trace output, I saw nothing
from the kernel. The trace output looks like this:
[snip]
561650.137702631: 40061b main rdtscp
561650.137702631: 40061e main shl $0x20, %rdx
561650.137702631: 400622 main movl %ecx, -0x1c(%rbp)
561650.137702631: 400625 main or %rdx, %rax
561650.137702631: 400628 main mov %rax, %rdx
561650.137702631: 40062b main sub %rsi, %rdx
561650.137702631: 40062e main cmp $0x7530, %rdx
561650.137702631: jcc 400635 main jbe 0x400618 <main+0x88>
561650.137702631: 400618 main mov %rax, %rsi
561650.137702631: 40061b main rdtscp
561650.137702631: 40061e main shl $0x20, %rdx
561650.137702631: 400622 main movl %ecx, -0x1c(%rbp)
561650.137702631: 400625 main or %rdx, %rax
561650.137702631: 400628 main mov %rax, %rdx
561650.137702631: 40062b main sub %rsi, %rdx
561650.137702631: 40062e main cmp $0x7530, %rdx
561650.137702631: jcc 400635 main jbe 0x400618 <main+0x88>
561650.137702631: 400618 main mov %rax, %rsi
561650.137702631: 40061b main rdtscp
<===========LAST ITERATION
561650.137702631: 40061e main shl $0x20, %rdx
561650.137702631: 400622 main movl %ecx, -0x1c(%rbp)
561650.137702631: 400625 main or %rdx, %rax
561650.137702631: 400628 main mov %rax, %rdx
561650.137702631: 40062b main sub %rsi, %rdx
561650.137702631: 40062e main cmp $0x7530, %rdx
561650.137702631: jcc 400635 main jbe 0x400618 <main+0x88>
561650.137702631: 400637 main mov %r8d, %ecx
561650.137702631: 40063a main add $0x1, %r8d
561650.137702631: 40063e main cmp $0x63, %r8d
561650.137702631: 400642 main movq %rdx,
0x6013a0(,%rcx,8)
561650.137702631: 40064a main movq %rax,
0x601080(,%rcx,8)
561650.137702631: jcc 400652 main jbe 0x400618 <main+0x88>
561650.137702631: 400654 main test %edi, %edi
561650.137702631: jcc 400656 main jle 0x400662 <main+0xd2>
561650.137702631: 400658 main mov $0xc, %esi
561650.137702631: call 40065d main callq
0x400550 <kill@plt>
561650.137702631: jmp 400550 kill@plt jmpq
0x200ada(%rip)
561650.137702631: 400556 kill@plt pushq $0x3
561650.137702631: jmp 40055b kill@plt jmp
0x400510 <_init+0x20>
561650.137702631: 400510 _init pushq 0x200af2(%rip)
561650.137702631: jmp 400516 _init jmpq 0x200af4(%rip)
561650.137702631: 7f1b8882c6a0
_dl_runtime_resolve_avx pushq %rbx
561650.137702631: 7f1b8882c6a1
_dl_runtime_resolve_avx mov %rsp, %rbx
561650.137702631: 7f1b8882c6a4
_dl_runtime_resolve_avx and $0xffffffffffffffe0, %rsp
[snip]
Note: the cpu uarch on this machine is Broadwell, so the timestamps
are very crude.
What am I missing? Where does the 15us go? The program and the scripts
I used in the experiment can be found here:
https://github.com/yilongli/OSJitterTest.
Thanks,
Yilong
[1] https://lkml.org/lkml/2017/1/18/1125
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Need help in tracking down OS jitters using Intel PT
2017-03-23 9:28 Need help in tracking down OS jitters using Intel PT Yilong Li
@ 2017-03-23 15:30 ` David Ahern
2017-03-23 19:56 ` Yilong Li
2017-03-24 6:16 ` Yilong Li
[not found] ` <CAPAnC4MSH7s+Xi_WUHxYsJBEgPt1i8gcFP4yH2bTG=0vCVHsJA@mail.gmail.com>
1 sibling, 2 replies; 5+ messages in thread
From: David Ahern @ 2017-03-23 15:30 UTC (permalink / raw)
To: Yilong Li, linux-perf-users
On 3/23/17 3:28 AM, Yilong Li wrote:
> What am I missing? Where does the 15us go? The program and the scripts
> I used in the experiment can be found here:
> https://github.com/yilongli/OSJitterTest.
have you tried the hw latency detector? kernel/trace/trace_hwlat.c and
config HWLAT_TRACER
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Need help in tracking down OS jitters using Intel PT
[not found] ` <CAPAnC4MSH7s+Xi_WUHxYsJBEgPt1i8gcFP4yH2bTG=0vCVHsJA@mail.gmail.com>
@ 2017-03-23 19:55 ` Yilong Li
0 siblings, 0 replies; 5+ messages in thread
From: Yilong Li @ 2017-03-23 19:55 UTC (permalink / raw)
To: jim mauro; +Cc: linux-perf-use.
Yes, I have used cpuset to create a simple shield of 1 cpu and
migrated as many kthreads out of the shield as possible. I also tried
the full dynticks mode and ftrace before. I am confused because I know
if the jitter is due to kthreads, timer interrupts, or context
switches, I should be able to observe them in the output of ftrace or
intel pt. However, these 15us jitters are somehow invisible to these
tools.
Yilong
On Thu, Mar 23, 2017 at 8:37 AM, jim mauro <jim.mauro@gmail.com> wrote:
> If you bind the user-loop code to a core (taskset(1)) do you still observe
> the jitter?
>
> On Thu, Mar 23, 2017 at 2:28 AM, Yilong Li <yilongl@cs.stanford.edu> wrote:
>>
>> Hello,
>>
>> I have been trying to track down some OS jitters that take around
>> 15~16us on my machine. I created a simple test program "user_loop"
>> which simply calls rdtscp in a tight loop, collects the first 100 gaps
>> that take more than 30000 cycles, and sends SIGUSR2 to the `perf
>> record -e intel_pt// --snapshot` process that is tracing the test
>> program. The output of the test program looks like this:
>> [snip]
>> jitter 88: 31656 cycles, timestamp 7135093048653524
>> jitter 89: 35714 cycles, timestamp 7135093052200982
>> jitter 90: 31508 cycles, timestamp 7135093052820884
>> jitter 91: 31183 cycles, timestamp 7135093068653931
>> jitter 92: 32813 cycles, timestamp 7135093088516264
>> [snip]
>>
>> Then I used `perf script` to display the trace output hoping to see
>> what exactly get executed during the gap. I used Andi Kleen's
>> perf/xed-6 branch[1] to get the assembly. However, when I looked
>> around the last rdtscp instruction in the trace output, I saw nothing
>> from the kernel. The trace output looks like this:
>> [snip]
>> 561650.137702631: 40061b main rdtscp
>> 561650.137702631: 40061e main shl $0x20, %rdx
>> 561650.137702631: 400622 main movl %ecx,
>> -0x1c(%rbp)
>> 561650.137702631: 400625 main or %rdx, %rax
>> 561650.137702631: 400628 main mov %rax, %rdx
>> 561650.137702631: 40062b main sub %rsi, %rdx
>> 561650.137702631: 40062e main cmp $0x7530, %rdx
>> 561650.137702631: jcc 400635 main jbe 0x400618
>> <main+0x88>
>> 561650.137702631: 400618 main mov %rax, %rsi
>> 561650.137702631: 40061b main rdtscp
>> 561650.137702631: 40061e main shl $0x20, %rdx
>> 561650.137702631: 400622 main movl %ecx,
>> -0x1c(%rbp)
>> 561650.137702631: 400625 main or %rdx, %rax
>> 561650.137702631: 400628 main mov %rax, %rdx
>> 561650.137702631: 40062b main sub %rsi, %rdx
>> 561650.137702631: 40062e main cmp $0x7530, %rdx
>> 561650.137702631: jcc 400635 main jbe 0x400618
>> <main+0x88>
>> 561650.137702631: 400618 main mov %rax, %rsi
>> 561650.137702631: 40061b main rdtscp
>> <===========LAST ITERATION
>> 561650.137702631: 40061e main shl $0x20, %rdx
>> 561650.137702631: 400622 main movl %ecx,
>> -0x1c(%rbp)
>> 561650.137702631: 400625 main or %rdx, %rax
>> 561650.137702631: 400628 main mov %rax, %rdx
>> 561650.137702631: 40062b main sub %rsi, %rdx
>> 561650.137702631: 40062e main cmp $0x7530, %rdx
>> 561650.137702631: jcc 400635 main jbe 0x400618
>> <main+0x88>
>> 561650.137702631: 400637 main mov %r8d, %ecx
>> 561650.137702631: 40063a main add $0x1, %r8d
>> 561650.137702631: 40063e main cmp $0x63, %r8d
>> 561650.137702631: 400642 main movq %rdx,
>> 0x6013a0(,%rcx,8)
>> 561650.137702631: 40064a main movq %rax,
>> 0x601080(,%rcx,8)
>> 561650.137702631: jcc 400652 main jbe 0x400618
>> <main+0x88>
>> 561650.137702631: 400654 main test %edi, %edi
>> 561650.137702631: jcc 400656 main jle 0x400662
>> <main+0xd2>
>> 561650.137702631: 400658 main mov $0xc, %esi
>> 561650.137702631: call 40065d main callq
>> 0x400550 <kill@plt>
>> 561650.137702631: jmp 400550 kill@plt jmpq
>> 0x200ada(%rip)
>> 561650.137702631: 400556 kill@plt pushq
>> $0x3
>> 561650.137702631: jmp 40055b kill@plt jmp
>> 0x400510 <_init+0x20>
>> 561650.137702631: 400510 _init pushq
>> 0x200af2(%rip)
>> 561650.137702631: jmp 400516 _init jmpq
>> 0x200af4(%rip)
>> 561650.137702631: 7f1b8882c6a0
>> _dl_runtime_resolve_avx pushq %rbx
>> 561650.137702631: 7f1b8882c6a1
>> _dl_runtime_resolve_avx mov %rsp, %rbx
>> 561650.137702631: 7f1b8882c6a4
>> _dl_runtime_resolve_avx and $0xffffffffffffffe0, %rsp
>> [snip]
>>
>> Note: the cpu uarch on this machine is Broadwell, so the timestamps
>> are very crude.
>>
>> What am I missing? Where does the 15us go? The program and the scripts
>> I used in the experiment can be found here:
>> https://github.com/yilongli/OSJitterTest.
>>
>> Thanks,
>> Yilong
>>
>> [1] https://lkml.org/lkml/2017/1/18/1125
>> --
>> To unsubscribe from this list: send the line "unsubscribe
>> linux-perf-users" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Need help in tracking down OS jitters using Intel PT
2017-03-23 15:30 ` David Ahern
@ 2017-03-23 19:56 ` Yilong Li
2017-03-24 6:16 ` Yilong Li
1 sibling, 0 replies; 5+ messages in thread
From: Yilong Li @ 2017-03-23 19:56 UTC (permalink / raw)
To: David Ahern; +Cc: linux-perf-use.
Thanks for the information! I didn't know that. I will give it a try.
Yilong
On Thu, Mar 23, 2017 at 8:30 AM, David Ahern <dsahern@gmail.com> wrote:
> On 3/23/17 3:28 AM, Yilong Li wrote:
>> What am I missing? Where does the 15us go? The program and the scripts
>> I used in the experiment can be found here:
>> https://github.com/yilongli/OSJitterTest.
>
> have you tried the hw latency detector? kernel/trace/trace_hwlat.c and
> config HWLAT_TRACER
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Need help in tracking down OS jitters using Intel PT
2017-03-23 15:30 ` David Ahern
2017-03-23 19:56 ` Yilong Li
@ 2017-03-24 6:16 ` Yilong Li
1 sibling, 0 replies; 5+ messages in thread
From: Yilong Li @ 2017-03-24 6:16 UTC (permalink / raw)
To: David Ahern; +Cc: linux-perf-use.
I have tried the hw latency detector but it's not enough to pin point
the problem for me because the current implementation only outputs the
maximum jitter observed. Here is the example output:
# tracer: hwlat
#
# entries-in-buffer/entries-written: 64/64 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-2512 [000] d... 2584.575347: #1
inner/outer(us): 165/223 ts:1490333353.666940246 nmi-total:9
nmi-count:1
<...>-2512 [001] dn.. 2585.586994: #2
inner/outer(us): 185/200 ts:1490333354.678950385
<...>-2512 [002] dn.. 2586.610779: #3
inner/outer(us): 173/209 ts:1490333355.702960161
<...>-2512 [003] d... 2587.634763: #4
inner/outer(us): 229/210 ts:1490333356.718969641
<...>-2512 [004] d... 2588.658749: #5
inner/outer(us): 181/234 ts:1490333357.750979263
<...>-2512 [005] d... 2589.682887: #6
inner/outer(us): 17/236 ts:1490333358.758988675
<...>-2512 [006] dn.. 2590.706737: #7
inner/outer(us): 211/172 ts:1490333359.798998395
<...>-2512 [007] dn.. 2591.730706: #8
inner/outer(us): 20/238 ts:1490333360.823007948
<...>-2512 [000] dn.. 2592.754690: #9
inner/outer(us): 153/231 ts:1490333361.847017517
[snip]
In addition, I also tried reading msr 0x34 before and after my test
program. I found that the delta of the readings, which I believe
should be the number of SMI fired during my run, matched the number of
jitters that take > 300000 cycles (~150us). So I am ruling out SMI for
my 15us jitters.
Is it safe to say that only hardware and firmware induced jitters are
invisible to Intel PT? If so, why isn't Intel PT aware of them? Also,
does anyone have any suggestion on how to identify the sources of
hardware/firmware jitters? Thanks.
Best,
Yilong
On Thu, Mar 23, 2017 at 8:30 AM, David Ahern <dsahern@gmail.com> wrote:
> On 3/23/17 3:28 AM, Yilong Li wrote:
>> What am I missing? Where does the 15us go? The program and the scripts
>> I used in the experiment can be found here:
>> https://github.com/yilongli/OSJitterTest.
>
> have you tried the hw latency detector? kernel/trace/trace_hwlat.c and
> config HWLAT_TRACER
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-03-24 6:16 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-23 9:28 Need help in tracking down OS jitters using Intel PT Yilong Li
2017-03-23 15:30 ` David Ahern
2017-03-23 19:56 ` Yilong Li
2017-03-24 6:16 ` Yilong Li
[not found] ` <CAPAnC4MSH7s+Xi_WUHxYsJBEgPt1i8gcFP4yH2bTG=0vCVHsJA@mail.gmail.com>
2017-03-23 19:55 ` Yilong Li
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.