* [Xenomai] Reading /proc/xenomai/stat causes high latencies
@ 2014-04-22 16:02 Jeroen Van den Keybus
2014-04-23 9:14 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-22 16:02 UTC (permalink / raw)
To: xenomai
Using a 3.10.18 kernel with Xenomai 2.6.3, reading the stat entry of
/proc/xenomai causes high latencies in RT tasks. I've found a report
on a similar issue in
https://sites.google.com/site/manisbutareed/linuxcnc-2-5/xenomai-user-threads.
We also had this occurring on a 3.8.13 kernel.
A typical latency test run looks like:
RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| -2.382| -2.357| -1.945| 0| 0| -2.382| -1.945
RTD| -2.577| -2.360| -1.749| 0| 0| -2.577| -1.749
RTD| -2.380| -2.360| -1.865| 0| 0| -2.577| -1.749
RTD| -2.568| -2.361| -1.530| 0| 0| -2.577| -1.530
RTD| -2.379| -2.359| -1.732| 0| 0| -2.577| -1.530
RTD| -2.381| -2.361| -2.008| 0| 0| -2.577| -1.530
RTD| -2.381| -2.360| -2.085| 0| 0| -2.577| -1.530
RTD| -2.699| -2.359| 2.566| 0| 0| -2.699| 2.566
RTD| -2.380| -2.320| -1.876| 0| 0| -2.699| 2.566
RTD| -2.381| -2.359| 2.528| 0| 0| -2.699| 2.566
RTD| -2.380| -2.360| -1.805| 0| 0| -2.699| 2.566
RTD| -2.579| -2.311| -0.045| 0| 0| -2.699| 2.566
RTD| -2.380| -2.359| -2.072| 0| 0| -2.699| 2.566
RTD| -2.575| -2.360| 2.065| 0| 0| -2.699| 2.566
RTD| -2.381| 19.028| 3043.067| 31| 0| -2.699| 3043.067
RTD| -2.566| 26.488| 105.823| 32| 0| -2.699| 3043.067
RTD| -2.443| -2.276| 0.597| 32| 0| -2.699| 3043.067
RTD| -2.584| -2.306| 2.032| 32| 0| -2.699| 3043.067
RTD| -2.377| -2.242| 4.106| 32| 0| -2.699| 3043.067
RTD| -2.537| -2.291| 4.394| 32| 0| -2.699| 3043.067
It is obvious where I issued the 'cat /proc/xenomai/stat' command.
I will try to create an I-trace now. (config is same as under my
previous post 'Slow execution of RT task' - I'm still looking into
that issue as well).
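For reference, a minimal reproduction sketch of the above (the /proc/stat
fallback is my own addition so the loop can also be exercised on a host
without Xenomai; on a real target you would run the Xenomai 'latency' test
in another terminal and watch its RTD lines while this loop runs):

```shell
#!/bin/sh
# While the Xenomai 'latency' test runs in another terminal, repeatedly
# read /proc/xenomai/stat and watch the RTD max/overrun columns for spikes.
# Fallback (assumption, for non-Xenomai hosts): read /proc/stat instead,
# just so the loop itself runs everywhere.
STAT=/proc/xenomai/stat
[ -r "$STAT" ] || STAT=/proc/stat
i=0
while [ "$i" -lt 5 ]; do
    cat "$STAT" > /dev/null
    i=$((i + 1))
done
echo "read $STAT $i times"
```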
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-22 16:02 [Xenomai] Reading /proc/xenomai/stat causes high latencies Jeroen Van den Keybus
@ 2014-04-23 9:14 ` Jeroen Van den Keybus
2014-04-23 13:45 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-23 9:14 UTC (permalink / raw)
To: xenomai
Curious. If I enable the I-pipe tracer, the problem goes away. If I
disable it, it reliably returns. In the latter case, however, loading
xeno_native is either slow (several seconds) or does not complete
until I hit a key (I used <SHIFT L>) on an attached keyboard.
(dmesg log contains the same rcutree warnings in both cases as
mentioned under 'Slow ... RT task' post)
Jeroen.
2014-04-22 18:02 GMT+02:00 Jeroen Van den Keybus
<jeroen.vandenkeybus@gmail.com>:
> Using a 3.10.18 kernel with Xenomai 2.6.3, reading the stat entry of
> /proc/xenomai causes high latencies in RT tasks. I've found a report
> on a similar issue in
> https://sites.google.com/site/manisbutareed/linuxcnc-2-5/xenomai-user-threads.
> We also had this occurring on a 3.8.13 kernel.
>
> A typical latency test run looks like:
>
> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| -2.382| -2.357| -1.945| 0| 0| -2.382| -1.945
> RTD| -2.577| -2.360| -1.749| 0| 0| -2.577| -1.749
> RTD| -2.380| -2.360| -1.865| 0| 0| -2.577| -1.749
> RTD| -2.568| -2.361| -1.530| 0| 0| -2.577| -1.530
> RTD| -2.379| -2.359| -1.732| 0| 0| -2.577| -1.530
> RTD| -2.381| -2.361| -2.008| 0| 0| -2.577| -1.530
> RTD| -2.381| -2.360| -2.085| 0| 0| -2.577| -1.530
> RTD| -2.699| -2.359| 2.566| 0| 0| -2.699| 2.566
> RTD| -2.380| -2.320| -1.876| 0| 0| -2.699| 2.566
> RTD| -2.381| -2.359| 2.528| 0| 0| -2.699| 2.566
> RTD| -2.380| -2.360| -1.805| 0| 0| -2.699| 2.566
> RTD| -2.579| -2.311| -0.045| 0| 0| -2.699| 2.566
> RTD| -2.380| -2.359| -2.072| 0| 0| -2.699| 2.566
> RTD| -2.575| -2.360| 2.065| 0| 0| -2.699| 2.566
> RTD| -2.381| 19.028| 3043.067| 31| 0| -2.699| 3043.067
> RTD| -2.566| 26.488| 105.823| 32| 0| -2.699| 3043.067
> RTD| -2.443| -2.276| 0.597| 32| 0| -2.699| 3043.067
> RTD| -2.584| -2.306| 2.032| 32| 0| -2.699| 3043.067
> RTD| -2.377| -2.242| 4.106| 32| 0| -2.699| 3043.067
> RTD| -2.537| -2.291| 4.394| 32| 0| -2.699| 3043.067
>
> It is obvious where I issued the 'cat /proc/xenomai/stat' command.
>
> I will try to create an I-trace now. (config is same as under my
> previous post 'Slow execution of RT task' - I'm still looking into
> that issue as well).
>
>
> Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 9:14 ` Jeroen Van den Keybus
@ 2014-04-23 13:45 ` Jeroen Van den Keybus
2014-04-23 14:07 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-23 13:45 UTC (permalink / raw)
To: xenomai
I've attached an I-trace of what happens when 'modprobe xeno_native'
stalls. I could use some hints as to where to start looking into this
issue. Right now, my plan is to compare which code paths are
traversed with CONFIG_IPIPE_TRACE unset versus set.
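As a first pass over a trace dump, the dominant code paths can be tallied
with a small awk pipeline (the file name ipipe-trace.txt is hypothetical;
the heredoc below is a three-line stand-in for the real dump so the
pipeline can be demonstrated anywhere). On the attached trace it would
surface the __alloc_pages_nodemask/get_page_from_freelist loop immediately:

```shell
#!/bin/sh
# Stand-in trace sample (real input would be the attached I-pipe dump).
cat > ipipe-trace.txt <<'EOF'
 +func  -10026  0.091  __alloc_pages_nodemask+0x0  (__vmalloc_node_range+0xf3)
 +func  -10026  0.090  ipipe_root_only+0x0  (__alloc_pages_nodemask+0x1e5)
 +func  -10025  0.121  __alloc_pages_nodemask+0x0  (__vmalloc_node_range+0xf3)
EOF
# Count each callee token of the form "func+0xOFF"; caller tokens are
# parenthesized, so they do not match the leading-identifier pattern.
awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[A-Za-z_][A-Za-z0-9_]*\+0x/) {
            f = $i
            sub(/\+0x.*/, "", f)   # strip the +0x offset
            count[f]++
        }
} END { for (f in count) print count[f], f }' ipipe-trace.txt | sort -rn
```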
Jeroen.
2014-04-23 11:14 GMT+02:00 Jeroen Van den Keybus
<jeroen.vandenkeybus@gmail.com>:
> Curious. If I enable the I-pipe tracer, the problem goes away. If I
> disable it, it reliably returns. In the latter case, however, loading
> xeno_native is either slow (several seconds) or does not complete
> until I hit a key (I used <SHIFT L>) on an attached keyboard.
>
> (dmesg log contains the same rcutree warnings in both cases as
> mentioned under 'Slow ... RT task' post)
>
> Jeroen.
>
>
> 2014-04-22 18:02 GMT+02:00 Jeroen Van den Keybus
> <jeroen.vandenkeybus@gmail.com>:
>> Using a 3.10.18 kernel with Xenomai 2.6.3, reading the stat entry of
>> /proc/xenomai causes high latencies in RT tasks. I've found a report
>> on a similar issue in
>> https://sites.google.com/site/manisbutareed/linuxcnc-2-5/xenomai-user-threads.
>> We also had this occurring on a 3.8.13 kernel.
>>
>> A typical latency test run looks like:
>>
>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>> RTD| -2.382| -2.357| -1.945| 0| 0| -2.382| -1.945
>> RTD| -2.577| -2.360| -1.749| 0| 0| -2.577| -1.749
>> RTD| -2.380| -2.360| -1.865| 0| 0| -2.577| -1.749
>> RTD| -2.568| -2.361| -1.530| 0| 0| -2.577| -1.530
>> RTD| -2.379| -2.359| -1.732| 0| 0| -2.577| -1.530
>> RTD| -2.381| -2.361| -2.008| 0| 0| -2.577| -1.530
>> RTD| -2.381| -2.360| -2.085| 0| 0| -2.577| -1.530
>> RTD| -2.699| -2.359| 2.566| 0| 0| -2.699| 2.566
>> RTD| -2.380| -2.320| -1.876| 0| 0| -2.699| 2.566
>> RTD| -2.381| -2.359| 2.528| 0| 0| -2.699| 2.566
>> RTD| -2.380| -2.360| -1.805| 0| 0| -2.699| 2.566
>> RTD| -2.579| -2.311| -0.045| 0| 0| -2.699| 2.566
>> RTD| -2.380| -2.359| -2.072| 0| 0| -2.699| 2.566
>> RTD| -2.575| -2.360| 2.065| 0| 0| -2.699| 2.566
>> RTD| -2.381| 19.028| 3043.067| 31| 0| -2.699| 3043.067
>> RTD| -2.566| 26.488| 105.823| 32| 0| -2.699| 3043.067
>> RTD| -2.443| -2.276| 0.597| 32| 0| -2.699| 3043.067
>> RTD| -2.584| -2.306| 2.032| 32| 0| -2.699| 3043.067
>> RTD| -2.377| -2.242| 4.106| 32| 0| -2.699| 3043.067
>> RTD| -2.537| -2.291| 4.394| 32| 0| -2.699| 3043.067
>>
>> It is obvious where I issued the 'cat /proc/xenomai/stat' command.
>>
>> I will try to create an I-trace now. (config is same as under my
>> previous post 'Slow execution of RT task' - I'm still looking into
>> that issue as well).
>>
>>
>> Jeroen.
-------------- next part --------------
I-pipe worst-case tracing service on 3.10.18-ipipe/ipipe release #1
-------------------------------------------------------------
CPU: 0, Begin: 382257313528 cycles, Trace Points: 7 (-2048/+1), Length: 19678412 us
Calibrated minimum trace-point overhead: 0.060 us
+----- Hard IRQs ('|': locked)
|+-- Xenomai
||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
||| +---------- Delay flag ('+': > 1 us, '!': > 10 us)
||| | +- NMI noise ('N')
||| | |
Type User Val. Time Delay Function (Parent)
| +end 0x80000000 -10026 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10026 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10026 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10026 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10026 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10026 0.095 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10026 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10026 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10026 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10025 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10025 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10025 0.091 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10025 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10025 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10025 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10025 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10025 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10025 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10024 0.092 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10024 0.089 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10024 0.123 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10024 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10024 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10024 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10024 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10024 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10024 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10024 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10024 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10023 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10023 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10023 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10023 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10023 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10023 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10023 0.090 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10023 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10023 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10022 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10022 0.095 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10022 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10022 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10022 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10022 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10022 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10022 0.091 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10022 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10022 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10021 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10021 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10021 0.142 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10021 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10021 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10021 0.089 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10021 0.123 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10021 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10021 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10021 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10020 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10020 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10020 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10020 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10020 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10020 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10020 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10020 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10020 0.090 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10020 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10019 0.094 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10019 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10019 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10019 0.122 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10019 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10019 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10019 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10019 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10019 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10019 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10018 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10018 0.090 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10018 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10018 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10018 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10018 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10018 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10018 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10018 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10018 0.092 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10017 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10017 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10017 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10017 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10017 0.092 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10017 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10017 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10017 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10017 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10017 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10016 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10016 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10016 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10016 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10016 0.095 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10016 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10016 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10016 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10016 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10016 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10015 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10015 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10015 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10015 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10015 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10015 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10015 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10015 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10015 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10015 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10015 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10014 0.095 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10014 0.092 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10014 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10014 0.122 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10014 0.097 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10014 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10014 0.089 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10014 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10014 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10014 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10013 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10013 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10013 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10013 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10013 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10013 0.090 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10013 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10013 0.094 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10013 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10012 0.091 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10012 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10012 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10012 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10012 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10012 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10012 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10012 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10012 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10012 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10012 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10011 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10011 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10011 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10011 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10011 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10011 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10011 0.092 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10011 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10011 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10010 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10010 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10010 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10010 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10010 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10010 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10010 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10010 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10010 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10010 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10010 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10009 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10009 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10009 0.090 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10009 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10009 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10009 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10009 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10009 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10009 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10009 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10008 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10008 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10008 0.091 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10008 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10008 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10008 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10008 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10008 0.142 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10008 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10007 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10007 0.092 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10007 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10007 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10007 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10007 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10007 0.092 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10007 0.097 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10007 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10007 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10006 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10006 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10006 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10006 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10006 0.094 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10006 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10006 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10006 0.090 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10006 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10006 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10005 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10005 0.095 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10005 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10005 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10005 0.111 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10005 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10005 0.088 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10005 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10005 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10005 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10004 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10004 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10004 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10004 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10004 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10004 0.089 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10004 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10004 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10004 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10004 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10003 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10003 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10003 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10003 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10003 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10003 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10003 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10003 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10003 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10003 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10002 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10002 0.092 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10002 0.091 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10002 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10002 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10002 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10002 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10002 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10002 0.097 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10002 0.118 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10001 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10001 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10001 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10001 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -10001 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -10001 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -10001 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10001 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10001 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -10001 0.089 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -10000 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -10000 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -10000 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -10000 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -10000 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -10000 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -10000 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -10000 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -10000 0.090 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10000 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9999 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9999 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9999 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9999 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9999 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9999 0.092 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9999 0.091 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9999 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9999 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9999 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9998 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9998 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9998 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9998 0.118 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9998 0.088 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9998 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9998 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9998 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9998 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9998 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9997 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9997 0.096 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9997 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9997 0.089 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9997 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9997 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9997 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9997 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9997 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9997 0.097 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9996 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9996 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9996 0.090 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9996 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9996 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9996 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9996 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9996 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9996 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9996 0.092 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9995 0.091 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9995 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9995 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9995 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9995 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9995 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9995 0.097 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9995 0.118 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9995 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9995 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9994 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9994 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9994 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9994 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9994 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9994 0.095 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9994 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9994 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9994 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9994 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9993 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9993 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9993 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9993 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9993 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9993 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9993 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9993 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9993 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9993 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9992 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9992 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9992 0.095 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
[... 25 identical iterations of the same cycle snipped for readability:
 __alloc_pages_nodemask -> get_page_from_freelist -> ipipe_restore_root /
 ipipe_unstall_root, one iteration per page allocated by
 __vmalloc_node_range, covering -9992 us through -9950 us,
 i.e. ~1.7 us per page ...]
+func -9950 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9949 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9949 0.122 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9949 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9949 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9949 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9949 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9949 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9949 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9949 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9949 0.090 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9949 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9948 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9948 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9948 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9948 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9948 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9948 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9948 0.092 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9948 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9948 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9947 0.095 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9947 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9947 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9947 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9947 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9947 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9947 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9947 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9947 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9947 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9947 0.090 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9946 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9946 0.094 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9946 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9946 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9946 0.122 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9946 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9946 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9946 0.092 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9946 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9946 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9945 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9945 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9945 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9945 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9945 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9945 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9945 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9945 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9945 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9944 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9944 0.092 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9944 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9944 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9944 0.092 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9944 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9944 0.092 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9944 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9944 0.120 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9944 0.087 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9943 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9943 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9943 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9943 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9943 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9943 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9943 0.095 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9943 0.091 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9943 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9943 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9942 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9942 0.094 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9942 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9942 0.090 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9942 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9942 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9942 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9942 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9942 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9942 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9941 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9941 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9941 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9941 0.095 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9941 0.092 __alloc_pages_nodemask+0x0 (__vmalloc_node_range+0xf3)
+func -9941 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9941 0.121 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9941 0.112 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9941 0.096 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9941 0.091 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9940 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9940 0.095 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9940 0.121 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
#func -9940 0.085 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9940 0.091 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9940 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9940 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9940 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9940 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9940 0.143 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9939 0.122 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9939 0.107 map_vm_area+0x0 (__vmalloc_node_range+0x11e)
+func -9939 0.594 vmap_page_range_noflush+0x0 (map_vm_area+0x2e)
+func -9939 0.096 __ipipe_pin_range_globally+0x0 (vmap_page_range_noflush+0x22a)
+func -9939 5.627 _raw_spin_lock+0x0 (__ipipe_pin_range_globally+0x8e)
+func -9933 0.377 xnheap_init+0x0 [xeno_nucleus] (xnpod_init+0x2ad [xeno_nucleus])
+func -9933 6.681 init_extent+0x0 [xeno_nucleus] (xnheap_init+0x1f1 [xeno_nucleus])
| +begin 0x80000000 -9926 0.370 xnheap_init+0x239 [xeno_nucleus] (xnpod_init+0x2ad [xeno_nucleus])
| *+func -9925 0.122 __ipipe_restore_head+0x0 (xnheap_init+0x360 [xeno_nucleus])
| +end 0x80000000 -9925 0.191 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9925 0.096 xnheap_set_label+0x0 [xeno_nucleus] (xnpod_init+0x2c8 [xeno_nucleus])
| +begin 0x80000000 -9925 0.270 xnheap_set_label+0x5d [xeno_nucleus] (xnpod_init+0x2c8 [xeno_nucleus])
| *+func -9925 0.112 __ipipe_restore_head+0x0 (xnheap_set_label+0x168 [xeno_nucleus])
| +end 0x80000000 -9925 0.089 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9925 0.109 kmalloc_order_trace+0x0 (xnpod_init+0x2dc [xeno_nucleus])
+func -9924 0.090 __get_free_pages+0x0 (kmalloc_order_trace+0x2e)
+func -9924 0.094 __alloc_pages_nodemask+0x0 (__get_free_pages+0x17)
+func -9924 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9924 0.123 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9924 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9924 0.102 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9924 0.090 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9924 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9924 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9924 0.151 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
+func -9923 0.111 _raw_spin_lock_irqsave+0x0 (get_page_from_freelist+0x5de)
#func -9923 0.131 __rmqueue+0x0 (get_page_from_freelist+0x5ef)
#func -9923 0.156 get_pageblock_flags_group+0x0 (get_page_from_freelist+0x60b)
#func -9923 0.103 __mod_zone_page_state+0x0 (get_page_from_freelist+0x622)
#func -9923 0.088 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9923 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9923 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9923 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9923 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9922 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9922 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9922 0.261 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9922 0.192 xnheap_init+0x0 [xeno_nucleus] (xnpod_init+0x2fe [xeno_nucleus])
+func -9922 3.767 init_extent+0x0 [xeno_nucleus] (xnheap_init+0x1f1 [xeno_nucleus])
| +begin 0x80000000 -9918 0.161 xnheap_init+0x239 [xeno_nucleus] (xnpod_init+0x2fe [xeno_nucleus])
| *+func -9918 0.102 __ipipe_restore_head+0x0 (xnheap_init+0x360 [xeno_nucleus])
| +end 0x80000000 -9918 0.103 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9918 0.103 xnheap_set_label+0x0 [xeno_nucleus] (xnpod_init+0x31b [xeno_nucleus])
| +begin 0x80000000 -9918 0.198 xnheap_set_label+0x5d [xeno_nucleus] (xnpod_init+0x31b [xeno_nucleus])
| *+func -9917 0.111 __ipipe_restore_head+0x0 (xnheap_set_label+0x168 [xeno_nucleus])
| +end 0x80000000 -9917 0.274 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9917 0.271 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9917 0.425 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9916 0.243 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9916 0.200 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9916 0.110 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9916 0.229 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9915 0.105 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9915 0.231 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9915 0.110 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9915 0.175 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9915 0.302 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9915 0.165 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9914 0.161 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9914 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9914 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9914 0.165 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9914 0.165 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9914 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9914 0.258 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9913 0.308 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9913 0.176 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9913 0.161 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9913 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9913 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9912 0.098 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9912 0.163 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9912 0.112 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9912 0.163 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9912 0.088 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9912 0.230 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9912 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9911 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9911 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9911 0.155 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9911 0.094 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9911 0.171 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9911 0.112 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9911 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9911 0.174 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9910 0.169 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9910 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9910 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9910 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9910 0.156 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9910 0.163 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9909 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9909 0.185 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9909 0.132 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9909 0.165 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9909 0.165 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9909 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9909 0.091 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9908 0.090 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9908 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9908 0.114 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9908 0.095 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9908 0.090 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9908 0.197 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9908 0.160 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9908 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9907 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9907 0.154 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9907 0.090 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9907 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9907 0.104 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9907 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9907 0.249 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9906 0.162 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9906 0.162 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9906 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9906 0.154 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9906 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9906 0.163 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9906 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9905 0.192 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9905 0.161 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9905 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9905 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9905 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9905 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9905 0.091 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9904 0.163 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9904 0.102 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9904 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9904 0.089 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9904 0.201 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9904 0.158 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9904 0.165 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9904 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9903 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9903 0.090 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9903 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9903 0.103 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9903 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9903 0.265 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9903 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9902 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9902 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9902 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9902 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9902 0.163 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9902 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9902 0.185 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9901 0.185 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9901 0.162 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9901 0.165 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9901 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9901 0.091 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9901 0.089 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9901 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9900 0.104 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9900 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9900 0.091 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9900 0.200 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9900 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9900 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9900 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9899 0.154 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9899 0.090 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9899 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9899 0.104 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9899 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9899 0.272 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9899 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9898 0.163 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9898 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9898 0.154 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9898 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9898 0.163 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9898 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9898 0.185 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9897 0.110 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9897 0.162 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9897 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9897 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9897 0.091 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9897 0.091 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9897 0.183 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9896 0.103 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9896 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9896 0.091 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9896 0.194 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9896 0.160 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9896 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9896 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9896 0.154 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9895 0.090 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9895 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9895 0.105 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9895 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9895 0.290 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9895 0.160 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9895 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9894 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9894 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9894 0.158 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9894 0.161 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9894 0.104 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9894 0.207 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9893 0.188 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9893 0.162 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9893 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9893 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9893 0.091 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9893 0.091 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9893 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9892 0.103 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9892 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9892 0.089 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9892 0.194 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9892 0.160 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9892 0.165 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9892 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9892 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9891 0.090 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9891 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9891 0.103 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9891 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9891 0.174 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9891 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9891 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9890 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9890 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9890 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9890 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9890 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9890 0.176 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9890 0.111 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9890 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9889 0.164 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9889 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9889 0.091 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9889 0.089 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9889 0.161 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9889 0.102 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9889 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9889 0.091 xnsched_init+0x0 [xeno_nucleus] (xnpod_init+0x33a [xeno_nucleus])
+func -9888 0.196 xnsched_rt_init+0x0 [xeno_nucleus] (xnsched_init+0x4e [xeno_nucleus])
+func -9888 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| +begin 0x80000000 -9888 0.162 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0xbe [xeno_nucleus])
| *+func -9888 0.102 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9888 0.154 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9888 0.089 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| +begin 0x80000000 -9888 0.162 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0xf3 [xeno_nucleus])
| *+func -9887 0.101 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9887 0.094 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9887 0.254 xnthread_init+0x0 [xeno_nucleus] (xnsched_init+0x174 [xeno_nucleus])
+func -9887 0.158 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| +begin 0x80000000 -9887 0.162 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x15f [xeno_nucleus])
| *+func -9887 0.103 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9887 0.152 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9886 0.157 __xntimer_init+0x0 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| +begin 0x80000000 -9886 0.161 __xntimer_init+0xe9 [xeno_nucleus] (xnthread_init+0x19f [xeno_nucleus])
| *+func -9886 0.121 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9886 0.200 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9886 0.161 xnsched_set_policy+0x0 [xeno_nucleus] (xnthread_init+0x3ac [xeno_nucleus])
+func -9886 0.161 __xntimer_init+0x0 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| +begin 0x80000000 -9885 0.165 __xntimer_init+0xe9 [xeno_nucleus] (xnsched_init+0x26a [xeno_nucleus])
| *+func -9885 0.101 __ipipe_restore_head+0x0 (__xntimer_init+0x21b [xeno_nucleus])
| +end 0x80000000 -9885 0.092 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9885 0.091 xntimer_migrate+0x0 [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| +begin 0x80000000 -9885 0.162 xntimer_migrate+0x2f [xeno_nucleus] (xnsched_init+0x2c4 [xeno_nucleus])
| *+func -9885 0.100 __ipipe_restore_head+0x0 (xntimer_migrate+0x140 [xeno_nucleus])
| +end 0x80000000 -9885 0.103 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9885 0.217 ipipe_virtualize_irq+0x0 (xnpod_init+0x3b1 [xeno_nucleus])
+func -9884 0.112 ipipe_request_irq+0x0 (ipipe_virtualize_irq+0x13)
+func -9884 0.098 __ipipe_spin_lock_irqsave+0x0 (ipipe_request_irq+0x48)
| +begin 0x80000001 -9884 0.281 __ipipe_spin_lock_irqsave+0x93 (ipipe_request_irq+0x48)
| #func -9884 0.170 __ipipe_spin_unlock_irqrestore+0x0 (ipipe_request_irq+0x6e)
| +end 0x80000001 -9884 0.183 ipipe_trace_end+0x19 (__ipipe_spin_unlock_irqrestore+0x39)
+func -9884 0.091 xnregistry_init+0x0 [xeno_nucleus] (xnpod_init+0x3b6 [xeno_nucleus])
+func -9883 0.091 kmalloc_order_trace+0x0 (xnregistry_init+0x1e [xeno_nucleus])
+func -9883 0.089 __get_free_pages+0x0 (kmalloc_order_trace+0x2e)
+func -9883 0.092 __alloc_pages_nodemask+0x0 (__get_free_pages+0x17)
+func -9883 0.090 ipipe_root_only+0x0 (__alloc_pages_nodemask+0x1e5)
| +begin 0x80000001 -9883 0.123 ipipe_root_only+0xa3 (__alloc_pages_nodemask+0x1e5)
| +end 0x80000001 -9883 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9883 0.101 _cond_resched+0x0 (__alloc_pages_nodemask+0x1ea)
+func -9883 0.089 next_zones_zonelist+0x0 (__alloc_pages_nodemask+0x115)
+func -9883 0.091 get_page_from_freelist+0x0 (__alloc_pages_nodemask+0x148)
+func -9883 0.096 next_zones_zonelist+0x0 (get_page_from_freelist+0x68)
+func -9883 0.114 __zone_watermark_ok+0x0 (get_page_from_freelist+0x12c)
+func -9882 0.121 _raw_spin_lock_irqsave+0x0 (get_page_from_freelist+0x5de)
#func -9882 0.142 __rmqueue+0x0 (get_page_from_freelist+0x5ef)
#func -9882 0.112 get_pageblock_flags_group+0x0 (get_page_from_freelist+0x60b)
#func -9882 0.092 __mod_zone_page_state+0x0 (get_page_from_freelist+0x622)
#func -9882 0.088 ipipe_restore_root+0x0 (get_page_from_freelist+0x42f)
#func -9882 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9882 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9882 0.094 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9882 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9881 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9881 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9881 0.437 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9881 0.162 xnvfile_init_dir+0x0 [xeno_nucleus] (xnregistry_init+0x48 [xeno_nucleus])
+func -9881 0.154 proc_mkdir+0x0 (xnvfile_init_dir+0x23 [xeno_nucleus])
+func -9880 0.156 proc_mkdir_data+0x0 (proc_mkdir+0x15)
+func -9880 0.162 __proc_create+0x0 (proc_mkdir_data+0x3a)
+func -9880 0.190 _raw_spin_lock+0x0 (__proc_create+0x3a)
+func -9880 0.190 __xlate_proc_name+0x0 (__proc_create+0x49)
+func -9880 0.097 __kmalloc+0x0 (__proc_create+0x9b)
+func -9880 0.107 kmalloc_slab+0x0 (__kmalloc+0x2e)
+func -9880 0.091 ipipe_root_only+0x0 (__kmalloc+0x55)
| +begin 0x80000001 -9879 0.120 ipipe_root_only+0xa3 (__kmalloc+0x55)
| +end 0x80000001 -9879 0.107 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9879 0.145 _cond_resched+0x0 (__kmalloc+0x5a)
+func -9879 0.337 __slab_alloc.constprop.69+0x0 (__kmalloc+0x133)
#func -9879 0.098 ipipe_restore_root+0x0 (__slab_alloc.constprop.69+0x4ef)
#func -9879 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9879 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9878 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9878 0.089 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9878 0.090 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9878 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9878 0.130 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9878 0.091 proc_register+0x0 (proc_mkdir_data+0x52)
+func -9878 0.302 proc_alloc_inum+0x0 (proc_register+0x20)
+func -9878 0.109 kmem_cache_alloc+0x0 (__idr_pre_get+0x74)
+func -9877 0.094 ipipe_root_only+0x0 (kmem_cache_alloc+0x31)
| +begin 0x80000001 -9877 0.121 ipipe_root_only+0xa3 (kmem_cache_alloc+0x31)
| +end 0x80000001 -9877 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9877 0.383 _cond_resched+0x0 (kmem_cache_alloc+0x36)
+func -9877 0.123 _raw_spin_lock_irqsave+0x0 (__idr_pre_get+0x38)
#func -9877 0.096 __ipipe_spin_unlock_debug+0x0 (__idr_pre_get+0x54)
#func -9876 0.089 _raw_spin_unlock_irqrestore+0x0 (__idr_pre_get+0x5f)
#func -9876 0.087 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x1c)
#func -9876 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9876 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9876 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9876 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9876 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9876 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9876 0.114 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9876 0.430 _raw_spin_lock_irq+0x0 (proc_alloc_inum+0x24)
#func -9875 0.109 _raw_spin_lock_irqsave+0x0 (get_from_free_list+0x1a)
#func -9875 0.109 __ipipe_spin_unlock_debug+0x0 (get_from_free_list+0x44)
#func -9875 0.100 _raw_spin_unlock_irqrestore+0x0 (get_from_free_list+0x4f)
#func -9875 0.088 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -9875 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9875 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9875 0.131 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9874 0.202 kmem_cache_free+0x0 (ida_get_new_above+0x228)
#func -9874 0.097 ipipe_unstall_root+0x0 (proc_alloc_inum+0x45)
| #begin 0x80000000 -9874 0.091 ipipe_unstall_root+0x1c (proc_alloc_inum+0x45)
| #func -9874 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9874 0.177 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9874 1.247 _raw_spin_lock+0x0 (proc_register+0x6e)
+func -9872 0.171 xnvfile_init_regular+0x0 [xeno_nucleus] (xnregistry_init+0x6a [xeno_nucleus])
+func -9872 0.090 proc_create_data+0x0 (xnvfile_init_regular+0x41 [xeno_nucleus])
+func -9872 0.100 __proc_create+0x0 (proc_create_data+0x4d)
+func -9872 0.104 _raw_spin_lock+0x0 (__proc_create+0x3a)
+func -9872 0.138 __xlate_proc_name+0x0 (__proc_create+0x49)
+func -9872 0.101 __kmalloc+0x0 (__proc_create+0x9b)
+func -9872 0.090 kmalloc_slab+0x0 (__kmalloc+0x2e)
+func -9872 0.089 ipipe_root_only+0x0 (__kmalloc+0x55)
| +begin 0x80000001 -9872 0.122 ipipe_root_only+0xa3 (__kmalloc+0x55)
| +end 0x80000001 -9871 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9871 0.144 _cond_resched+0x0 (__kmalloc+0x5a)
+func -9871 0.101 proc_register+0x0 (proc_create_data+0x69)
+func -9871 0.091 proc_alloc_inum+0x0 (proc_register+0x20)
+func -9871 0.090 kmem_cache_alloc+0x0 (__idr_pre_get+0x74)
+func -9871 0.092 ipipe_root_only+0x0 (kmem_cache_alloc+0x31)
| +begin 0x80000001 -9871 0.121 ipipe_root_only+0xa3 (kmem_cache_alloc+0x31)
| +end 0x80000001 -9871 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9871 0.123 _cond_resched+0x0 (kmem_cache_alloc+0x36)
+func -9870 0.111 _raw_spin_lock_irqsave+0x0 (__idr_pre_get+0x38)
#func -9870 0.101 __ipipe_spin_unlock_debug+0x0 (__idr_pre_get+0x54)
#func -9870 0.096 _raw_spin_unlock_irqrestore+0x0 (__idr_pre_get+0x5f)
#func -9870 0.089 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x1c)
#func -9870 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9870 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9870 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9870 0.090 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9870 0.092 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9870 0.142 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9869 0.090 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9869 0.141 _raw_spin_lock_irq+0x0 (proc_alloc_inum+0x24)
#func -9869 0.111 _raw_spin_lock_irqsave+0x0 (get_from_free_list+0x1a)
#func -9869 0.096 __ipipe_spin_unlock_debug+0x0 (get_from_free_list+0x44)
#func -9869 0.097 _raw_spin_unlock_irqrestore+0x0 (get_from_free_list+0x4f)
#func -9869 0.087 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -9869 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9869 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9869 0.112 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9868 0.094 kmem_cache_free+0x0 (ida_get_new_above+0x228)
#func -9868 0.087 ipipe_unstall_root+0x0 (proc_alloc_inum+0x45)
| #begin 0x80000000 -9868 0.091 ipipe_unstall_root+0x1c (proc_alloc_inum+0x45)
| #func -9868 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9868 0.109 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9868 0.191 _raw_spin_lock+0x0 (proc_register+0x6e)
+func -9868 0.095 rthal_apc_alloc+0x0 (xnregistry_init+0x89 [xeno_nucleus])
| +begin 0x80000000 -9868 0.357 rthal_apc_alloc+0x3b (xnregistry_init+0x89 [xeno_nucleus])
| *+func -9867 0.115 __ipipe_restore_head+0x0 (rthal_apc_alloc+0x123)
| +end 0x80000000 -9867 5.592 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -9862 0.185 kmem_cache_alloc_trace+0x0 (xnregistry_init+0x1c5 [xeno_nucleus])
+func -9861 0.090 ipipe_root_only+0x0 (kmem_cache_alloc_trace+0x38)
| +begin 0x80000001 -9861 0.130 ipipe_root_only+0xa3 (kmem_cache_alloc_trace+0x38)
| +end 0x80000001 -9861 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
+func -9861 0.211 _cond_resched+0x0 (kmem_cache_alloc_trace+0x3d)
+func -9861 0.322 __slab_alloc.constprop.69+0x0 (kmem_cache_alloc_trace+0xf3)
#func -9861 0.096 ipipe_restore_root+0x0 (__slab_alloc.constprop.69+0x4ef)
#func -9860 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9860 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9860 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9860 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9860 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9860 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9860 0.604 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9859 0.191 xnsynch_init+0x0 [xeno_nucleus] (xnregistry_init+0x200 [xeno_nucleus])
+func -9859 0.277 rthal_smi_init+0x0 (xnpod_init+0x3c6 [xeno_nucleus])
+func -9859 0.170 pci_get_class+0x0 (rthal_smi_init+0x24)
+func -9859 0.176 pci_get_dev_by_id+0x0 (pci_get_class+0x44)
+func -9858 0.560 bus_find_device+0x0 (pci_get_dev_by_id+0x4e)
+func -9858 0.249 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9858 0.169 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9857 0.201 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9857 0.165 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9857 0.261 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9857 0.167 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9857 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9857 0.168 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9856 0.108 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9856 0.164 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9856 0.104 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9856 0.157 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9856 0.107 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9856 0.165 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9856 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9855 0.157 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9855 0.107 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9855 0.169 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9855 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9855 0.268 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9855 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9855 0.165 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9854 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9854 0.156 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9854 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9854 0.168 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9854 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9854 0.165 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9854 0.107 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9853 0.156 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9853 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9853 0.167 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9853 0.105 _raw_spin_lock+0x0 (klist_next+0x20)
+func -9853 0.255 match_pci_dev_by_id+0x0 (bus_find_device+0x62)
+func -9853 0.196 get_device+0x0 (bus_find_device+0x98)
+func -9852 0.269 _raw_spin_lock+0x0 (klist_put+0x25)
+func -9852 0.115 printk+0x0 (rthal_smi_init+0xe6)
| +begin 0x80000001 -9852 0.158 printk+0x66 (rthal_smi_init+0xe6)
| +end 0x80000001 -9852 0.181 ipipe_trace_end+0x19 (printk+0x15c)
+func -9852 0.298 vprintk_emit+0x0 (printk+0x178)
#func -9851 0.378 _raw_spin_lock+0x0 (vprintk_emit+0x10f)
#func -9851 0.422 log_store+0x0 (vprintk_emit+0x1cb)
#func -9851 0.172 console_trylock+0x0 (vprintk_emit+0x1d6)
#func -9850 0.088 down_trylock+0x0 (console_trylock+0x19)
#func -9850 0.118 _raw_spin_lock_irqsave+0x0 (down_trylock+0x16)
#func -9850 0.101 __ipipe_spin_unlock_debug+0x0 (down_trylock+0x2f)
#func -9850 0.100 _raw_spin_unlock_irqrestore+0x0 (down_trylock+0x3a)
#func -9850 0.085 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -9850 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9850 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9850 0.310 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9849 0.087 console_unlock+0x0 (vprintk_emit+0x285)
#func -9849 0.127 _raw_spin_lock_irqsave+0x0 (console_unlock+0x34)
#func -9849 0.089 __ipipe_spin_unlock_debug+0x0 (console_unlock+0x6f)
#func -9849 0.098 _raw_spin_unlock_irqrestore+0x0 (console_unlock+0x7e)
#func -9849 0.085 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -9849 0.091 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9849 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9849 0.111 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9849 0.318 _raw_spin_lock_irqsave+0x0 (console_unlock+0x94)
#func -9848 0.327 msg_print_text+0x0 (console_unlock+0x38b)
#func -9848 0.109 print_prefix+0x0 (msg_print_text+0xc0)
#func -9848 0.584 print_time.part.4+0x0 (print_prefix+0x6f)
#func -9847 0.108 print_prefix+0x0 (msg_print_text+0x160)
#func -9847 0.600 print_time.part.4+0x0 (print_prefix+0x6f)
#func -9847 0.097 print_prefix+0x0 (msg_print_text+0xc0)
#func -9847 0.177 print_time.part.4+0x0 (print_prefix+0x6f)
#func -9846 0.097 print_prefix+0x0 (msg_print_text+0x160)
#func -9846 0.292 print_time.part.4+0x0 (print_prefix+0x6f)
#func -9846 0.109 print_prefix+0x0 (msg_print_text+0xc0)
#func -9846 0.155 print_time.part.4+0x0 (print_prefix+0x6f)
#func -9846 0.089 print_prefix+0x0 (msg_print_text+0x160)
#func -9846 0.245 print_time.part.4+0x0 (print_prefix+0x6f)
#func -9845 0.089 call_console_drivers.constprop.15+0x0 (console_unlock+0x3d9)
#func -9845 0.100 ipipe_restore_root+0x0 (console_unlock+0x3e8)
#func -9845 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9845 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9845 0.111 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9845 0.141 _raw_spin_lock_irqsave+0x0 (console_unlock+0x94)
#func -9845 0.098 up+0x0 (console_unlock+0x1cf)
#func -9845 0.121 _raw_spin_lock_irqsave+0x0 (up+0x14)
#func -9844 0.090 __ipipe_spin_unlock_debug+0x0 (up+0x30)
#func -9844 0.088 _raw_spin_unlock_irqrestore+0x0 (up+0x3b)
#func -9844 0.085 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -9844 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9844 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9844 0.114 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9844 0.107 _raw_spin_lock+0x0 (console_unlock+0x1db)
#func -9844 0.098 __ipipe_spin_unlock_debug+0x0 (console_unlock+0x1f1)
#func -9844 0.087 _raw_spin_unlock_irqrestore+0x0 (console_unlock+0x200)
#func -9844 0.085 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -9844 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9843 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9843 0.177 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9843 0.225 wake_up_klogd+0x0 (console_unlock+0x21c)
#func -9843 0.088 ipipe_restore_root+0x0 (vprintk_emit+0x216)
#func -9843 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9843 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9843 0.102 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9843 0.088 ipipe_unstall_root+0x0 (ipipe_restore_root+0x3d)
| #begin 0x80000000 -9842 0.091 ipipe_unstall_root+0x1c (ipipe_restore_root+0x3d)
| #func -9842 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -9842 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -9842 0.115 pci_dev_put+0x0 (rthal_smi_init+0xee)
+func -9842 0.115 put_device+0x0 (pci_dev_put+0x1a)
+func -9842 0.264 rthal_smi_disable+0x0 (xnpod_init+0x3cb [xeno_nucleus])
+func -9842 0.100 xnshadow_grab_events+0x0 [xeno_nucleus] (xnpod_init+0x3d0 [xeno_nucleus])
+func -9842 0.290 ipipe_catch_event+0x0 (xnshadow_grab_events+0x21 [xeno_nucleus])
+func -9841 0.164 ipipe_set_hooks+0x0 (ipipe_catch_event+0xbb)
+func -9841 0.144 ipipe_critical_enter+0x0 (ipipe_set_hooks+0x45)
| +begin 0x80000001 -9841 0.311 ipipe_critical_enter+0x229 (ipipe_set_hooks+0x45)
| +func -9841 0.170 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -9840 3.507 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -9837 0.927 ipipe_critical_exit+0x0 (ipipe_set_hooks+0x14d)
| +end 0x80000001 -9836 0.258 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
+func -9836 0.171 ipipe_catch_event+0x0 (xnshadow_grab_events+0x39 [xeno_nucleus])
+func -9836 0.117 ipipe_set_hooks+0x0 (ipipe_catch_event+0xbb)
+func -9835 0.110 ipipe_critical_enter+0x0 (ipipe_set_hooks+0x45)
| +begin 0x80000001 -9835 0.162 ipipe_critical_enter+0x229 (ipipe_set_hooks+0x45)
| +func -9835 0.120 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -9835 3.310 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -9832 0.885 ipipe_critical_exit+0x0 (ipipe_set_hooks+0x14d)
| +end 0x80000001 -9831 0.207 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
+func -9831 0.178 ipipe_catch_event+0x0 (xnshadow_grab_events+0x51 [xeno_nucleus])
+func -9830 0.108 ipipe_set_hooks+0x0 (ipipe_catch_event+0xbb)
+func -9830 0.107 ipipe_critical_enter+0x0 (ipipe_set_hooks+0x45)
| +begin 0x80000001 -9830 0.167 ipipe_critical_enter+0x229 (ipipe_set_hooks+0x45)
| +func -9830 0.123 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -9830 1782.548 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -8047 1.057 ipipe_critical_exit+0x0 (ipipe_set_hooks+0x14d)
| +end 0x80000001 -8046 27.121 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
| +func -8019 0.201 __ipipe_handle_irq+0x0 (apic_timer_interrupt+0x7c)
| +func -8019 0.137 __ipipe_dispatch_irq+0x0 (__ipipe_handle_irq+0x8d)
| +func -8019 0.155 __ipipe_ack_apic+0x0 (__ipipe_dispatch_irq+0x357)
| +func -8019 0.152 __ipipe_set_irq_pending+0x0 (__ipipe_dispatch_irq+0x216)
| +func -8019 0.151 __ipipe_do_sync_pipeline+0x0 (__ipipe_dispatch_irq+0x2a1)
| +func -8018 0.127 __ipipe_do_sync_stage+0x0 (__ipipe_do_sync_pipeline+0x115)
| #end 0x80000000 -8018 0.097 ipipe_trace_end+0x19 (__ipipe_do_sync_stage+0xe8)
#func -8018 0.096 __ipipe_do_IRQ+0x0 (__ipipe_do_sync_stage+0x1c8)
#func -8018 0.120 __ipipe_get_ioapic_irq_vector+0x0 (__ipipe_do_IRQ+0x24)
#func -8018 0.090 smp_apic_timer_interrupt+0x0 (__ipipe_do_IRQ+0x79)
#func -8018 0.088 irq_enter+0x0 (smp_apic_timer_interrupt+0x2a)
#func -8018 0.135 rcu_irq_enter+0x0 (irq_enter+0x17)
#func -8018 0.089 ipipe_restore_root+0x0 (rcu_irq_enter+0x92)
#func -8018 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -8018 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -8017 0.147 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8017 0.100 exit_idle+0x0 (smp_apic_timer_interrupt+0x2f)
#func -8017 0.104 hrtimer_interrupt+0x0 (smp_apic_timer_interrupt+0x55)
#func -8017 0.092 _raw_spin_lock+0x0 (hrtimer_interrupt+0x4f)
#func -8017 0.123 ktime_get_update_offsets+0x0 (hrtimer_interrupt+0x81)
#func -8017 0.110 __run_hrtimer+0x0 (hrtimer_interrupt+0xf7)
#func -8017 0.150 __remove_hrtimer+0x0 (__run_hrtimer+0x67)
#func -8017 0.089 tick_sched_timer+0x0 (__run_hrtimer+0x91)
#func -8016 0.121 ktime_get+0x0 (tick_sched_timer+0x1f)
#func -8016 0.128 _raw_spin_lock+0x0 (tick_sched_timer+0x8f)
#func -8016 0.097 do_timer+0x0 (tick_sched_timer+0xcb)
#func -8016 0.150 _raw_spin_lock_irqsave+0x0 (do_timer+0x2c)
#func -8016 0.110 ntp_tick_length+0x0 (do_timer+0x83)
#func -8016 0.148 ntp_tick_length+0x0 (do_timer+0x2ec)
#func -8016 0.098 timekeeping_update.constprop.8+0x0 (do_timer+0x1e1)
#func -8016 0.111 update_vsyscall+0x0 (timekeeping_update.constprop.8+0x1d)
#func -8015 0.102 set_normalized_timespec+0x0 (update_vsyscall+0xf7)
#func -8015 0.100 ipipe_update_hostrt+0x0 (update_vsyscall+0x12c)
#func -8015 0.088 ipipe_root_only+0x0 (ipipe_update_hostrt+0x28)
| #begin 0x80000001 -8015 0.117 ipipe_root_only+0xa3 (ipipe_update_hostrt+0x28)
| #end 0x80000001 -8015 0.103 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8015 0.095 __ipipe_notify_kevent+0x0 (ipipe_update_hostrt+0x7d)
#func -8015 0.088 ipipe_root_only+0x0 (__ipipe_notify_kevent+0x21)
| #begin 0x80000001 -8015 0.117 ipipe_root_only+0xa3 (__ipipe_notify_kevent+0x21)
| #end 0x80000001 -8015 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| #begin 0x80000001 -8015 0.116 __ipipe_notify_kevent+0xcb (ipipe_update_hostrt+0x7d)
| #end 0x80000001 -8014 0.097 ipipe_trace_end+0x19 (__ipipe_notify_kevent+0xf2)
#func -8014 0.096 ipipe_kevent_hook+0x0 (__ipipe_notify_kevent+0x81)
#func -8014 0.091 hostrt_event+0x0 [xeno_nucleus] (ipipe_kevent_hook+0x1f)
#func -8014 0.089 __ipipe_spin_lock_irqsave+0x0 (hostrt_event+0x19 [xeno_nucleus])
| #begin 0x80000001 -8014 0.167 __ipipe_spin_lock_irqsave+0x93 (hostrt_event+0x19 [xeno_nucleus])
| #func -8014 0.098 __ipipe_spin_unlock_irqrestore+0x0 (hostrt_event+0x87 [xeno_nucleus])
| #end 0x80000001 -8014 0.095 ipipe_trace_end+0x19 (__ipipe_spin_unlock_irqrestore+0x39)
| #begin 0x80000001 -8014 0.103 __ipipe_notify_kevent+0xde (ipipe_update_hostrt+0x7d)
| #end 0x80000001 -8014 0.095 ipipe_trace_end+0x19 (__ipipe_notify_kevent+0xaa)
#func -8014 0.087 raw_notifier_call_chain+0x0 (timekeeping_update.constprop.8+0x32)
#func -8013 0.088 notifier_call_chain+0x0 (raw_notifier_call_chain+0x16)
#func -8013 0.107 __ipipe_spin_unlock_debug+0x0 (do_timer+0x1f5)
#func -8013 0.088 _raw_spin_unlock_irqrestore+0x0 (do_timer+0x204)
#func -8013 0.085 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -8013 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -8013 0.116 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -8013 0.108 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8013 0.109 calc_global_load+0x0 (do_timer+0x20c)
#func -8013 0.100 update_root_process_times+0x0 (tick_sched_timer+0x3f)
#func -8013 0.089 update_process_times+0x0 (update_root_process_times+0x57)
#func -8012 0.108 account_process_tick+0x0 (update_process_times+0x2d)
#func -8012 0.191 account_system_time+0x0 (account_process_tick+0x3d)
#func -8012 0.185 cpuacct_account_field+0x0 (account_system_time+0xc6)
#func -8012 0.091 acct_account_cputime+0x0 (account_system_time+0xce)
#func -8012 0.128 __acct_update_integrals+0x0 (acct_account_cputime+0x1c)
#func -8012 0.135 jiffies_to_timeval+0x0 (__acct_update_integrals+0x73)
#func -8012 0.084 ipipe_restore_root+0x0 (__acct_update_integrals+0x93)
#func -8012 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -8011 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -8011 0.107 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8011 0.092 hrtimer_run_queues+0x0 (update_process_times+0x32)
#func -8011 0.109 raise_softirq+0x0 (update_process_times+0x3c)
#func -8011 0.088 ipipe_restore_root+0x0 (raise_softirq+0xda)
#func -8011 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -8011 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -8011 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8011 0.148 rcu_check_callbacks+0x0 (update_process_times+0x47)
#func -8010 0.152 cpu_needs_another_gp+0x0 (rcu_check_callbacks+0x21b)
#func -8010 0.214 cpu_needs_another_gp+0x0 (rcu_check_callbacks+0x21b)
#func -8010 0.120 wake_up_klogd_work_func+0x0 (__irq_work_run+0x8f)
#func -8010 0.089 __wake_up+0x0 (wake_up_klogd_work_func+0x42)
#func -8010 0.121 _raw_spin_lock_irqsave+0x0 (__wake_up+0x23)
#func -8010 0.092 __wake_up_common+0x0 (__wake_up+0x39)
#func -8010 0.087 ipipe_root_only+0x0 (__wake_up_common+0x2f)
| #begin 0x80000001 -8010 0.118 ipipe_root_only+0xa3 (__wake_up_common+0x2f)
| #end 0x80000001 -8009 0.415 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8009 0.101 autoremove_wake_function+0x0 (__wake_up_common+0x60)
#func -8009 0.096 default_wake_function+0x0 (autoremove_wake_function+0x12)
#func -8009 0.101 try_to_wake_up+0x0 (default_wake_function+0x12)
#func -8009 0.355 _raw_spin_lock_irqsave+0x0 (try_to_wake_up+0x31)
#func -8008 0.241 task_waking_fair+0x0 (try_to_wake_up+0xb4)
#func -8008 0.172 select_task_rq_fair+0x0 (try_to_wake_up+0xc6)
#func -8008 0.120 source_load+0x0 (select_task_rq_fair+0x3ba)
#func -8008 0.274 target_load+0x0 (select_task_rq_fair+0x3c9)
#func -8008 0.335 effective_load.isra.35+0x0 (select_task_rq_fair+0x442)
#func -8007 0.164 effective_load.isra.35+0x0 (select_task_rq_fair+0x49a)
#func -8007 0.148 idle_cpu+0x0 (select_task_rq_fair+0x50f)
#func -8007 0.114 _raw_spin_lock+0x0 (try_to_wake_up+0x1c0)
#func -8007 0.090 ttwu_do_activate.constprop.82+0x0 (try_to_wake_up+0x1cb)
#func -8007 0.095 activate_task+0x0 (ttwu_do_activate.constprop.82+0x33)
#func -8007 0.096 enqueue_task+0x0 (activate_task+0x23)
#func -8007 0.180 update_rq_clock.part.71+0x0 (enqueue_task+0x6c)
#func -8006 0.121 enqueue_task_fair+0x0 (enqueue_task+0x51)
#func -8006 0.229 update_curr+0x0 (enqueue_task_fair+0x44c)
#func -8006 0.121 __compute_runnable_contrib.part.48+0x0 (enqueue_task_fair+0xf61)
#func -8006 0.140 update_cfs_rq_blocked_load+0x0 (enqueue_task_fair+0x28e)
#func -8006 0.115 account_entity_enqueue+0x0 (enqueue_task_fair+0x299)
#func -8006 0.124 update_cfs_shares+0x0 (enqueue_task_fair+0x2a1)
#func -8006 0.222 place_entity+0x0 (enqueue_task_fair+0x2ae)
#func -8005 0.134 __enqueue_entity+0x0 (enqueue_task_fair+0x3ec)
#func -8005 0.208 update_curr+0x0 (enqueue_task_fair+0x44c)
#func -8005 0.129 update_cfs_rq_blocked_load+0x0 (enqueue_task_fair+0x28e)
#func -8005 0.124 account_entity_enqueue+0x0 (enqueue_task_fair+0x299)
#func -8005 0.120 update_cfs_shares+0x0 (enqueue_task_fair+0x2a1)
#func -8005 0.191 place_entity+0x0 (enqueue_task_fair+0x2ae)
#func -8004 0.165 __enqueue_entity+0x0 (enqueue_task_fair+0x3ec)
#func -8004 0.110 hrtick_update+0x0 (enqueue_task_fair+0xb1e)
#func -8004 0.101 ttwu_do_wakeup+0x0 (ttwu_do_activate.constprop.82+0x5d)
#func -8004 0.160 check_preempt_curr+0x0 (ttwu_do_wakeup+0x19)
#func -8004 0.211 resched_task+0x0 (check_preempt_curr+0x75)
#func -8004 0.122 native_smp_send_reschedule+0x0 (resched_task+0x64)
#func -8004 0.098 flat_send_IPI_mask+0x0 (native_smp_send_reschedule+0x47)
| #begin 0x80000001 -8003 0.142 flat_send_IPI_mask+0xef (native_smp_send_reschedule+0x47)
| #end 0x80000001 -8003 0.112 ipipe_trace_end+0x19 (flat_send_IPI_mask+0xaf)
#func -8003 0.102 ttwu_stat+0x0 (try_to_wake_up+0x1e8)
#func -8003 0.087 __ipipe_spin_unlock_debug+0x0 (try_to_wake_up+0x1f0)
#func -8003 0.088 _raw_spin_unlock_irqrestore+0x0 (try_to_wake_up+0x1fb)
#func -8003 0.087 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -8003 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -8003 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -8003 0.128 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8002 0.088 __ipipe_spin_unlock_debug+0x0 (__wake_up+0x41)
#func -8002 0.088 _raw_spin_unlock_irqrestore+0x0 (__wake_up+0x4c)
#func -8002 0.087 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -8002 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -8002 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -8002 0.125 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -8002 0.125 scheduler_tick+0x0 (update_process_times+0x66)
#func -8002 0.115 _raw_spin_lock+0x0 (scheduler_tick+0x44)
#func -8002 0.164 update_rq_clock.part.71+0x0 (scheduler_tick+0x1b8)
#func -8001 0.098 task_tick_fair+0x0 (scheduler_tick+0x151)
#func -8001 0.183 update_curr+0x0 (task_tick_fair+0x2ab)
#func -8001 0.117 update_min_vruntime+0x0 (update_curr+0x79)
#func -8001 0.120 cpuacct_charge+0x0 (update_curr+0x9f)
#func -8001 0.148 update_cfs_rq_blocked_load+0x0 (task_tick_fair+0x212)
#func -8001 0.148 update_cfs_shares+0x0 (task_tick_fair+0x21a)
#func -8001 0.101 update_curr+0x0 (update_cfs_shares+0xd0)
#func -8001 0.102 update_min_vruntime+0x0 (update_curr+0x79)
#func -8000 0.122 account_entity_dequeue+0x0 (update_cfs_shares+0x87)
#func -8000 0.104 account_entity_enqueue+0x0 (update_cfs_shares+0xa4)
#func -8000 0.121 update_curr+0x0 (task_tick_fair+0x2ab)
#func -8000 0.105 update_cfs_rq_blocked_load+0x0 (task_tick_fair+0x212)
#func -8000 0.210 update_cfs_shares+0x0 (task_tick_fair+0x21a)
#func -8000 0.115 trigger_load_balance+0x0 (scheduler_tick+0x185)
#func -8000 0.132 run_posix_cpu_timers+0x0 (update_process_times+0x6e)
#func -8000 0.092 profile_tick+0x0 (tick_sched_timer+0x49)
#func -7999 0.110 hrtimer_forward+0x0 (tick_sched_timer+0x5b)
#func -7999 0.094 _raw_spin_lock+0x0 (__run_hrtimer+0xa1)
#func -7999 0.164 enqueue_hrtimer+0x0 (__run_hrtimer+0xbc)
#func -7999 0.089 tick_program_event+0x0 (hrtimer_interrupt+0x136)
#func -7999 0.089 clockevents_program_event+0x0 (tick_program_event+0x24)
#func -7999 0.107 ktime_get+0x0 (clockevents_program_event+0x39)
#func -7999 0.176 lapic_next_deadline+0x0 (clockevents_program_event+0x6b)
#func -7999 0.098 irq_exit+0x0 (smp_apic_timer_interrupt+0x5a)
#func -7999 0.121 do_softirq+0x0 (irq_exit+0x7d)
#func -7998 0.088 __do_softirq+0x0 (call_softirq+0x1e)
#func -7998 0.090 msecs_to_jiffies+0x0 (__do_softirq+0x20)
#func -7998 0.088 ipipe_unstall_root+0x0 (__do_softirq+0xa8)
| #begin 0x80000000 -7998 0.091 ipipe_unstall_root+0x1c (__do_softirq+0xa8)
| #func -7998 0.141 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -7998 0.120 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -7998 0.098 run_timer_softirq+0x0 (__do_softirq+0xf7)
+func -7998 0.110 hrtimer_run_pending+0x0 (run_timer_softirq+0x24)
+func -7998 0.151 _raw_spin_lock_irq+0x0 (run_timer_softirq+0x3d)
#func -7997 0.090 ipipe_unstall_root+0x0 (run_timer_softirq+0x18f)
| #begin 0x80000000 -7997 0.091 ipipe_unstall_root+0x1c (run_timer_softirq+0x18f)
| #func -7997 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -7997 0.089 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -7997 0.105 rcu_bh_qs+0x0 (__do_softirq+0x11b)
#func -7997 0.110 __local_bh_enable+0x0 (__do_softirq+0x17a)
#func -7997 0.088 ipipe_restore_root+0x0 (do_softirq+0x8a)
#func -7997 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -7997 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -7996 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -7996 0.118 rcu_irq_exit+0x0 (irq_exit+0x50)
#func -7996 0.088 ipipe_restore_root+0x0 (rcu_irq_exit+0x8a)
#func -7996 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -7996 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -7996 0.355 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| +end 0x000000ef -7996 0.381 ipipe_trace_end+0x19 (apic_timer_interrupt+0x8f)
+func -7995 0.137 ipipe_catch_event+0x0 (xnshadow_grab_events+0x69 [xeno_nucleus])
+func -7995 0.092 ipipe_set_hooks+0x0 (ipipe_catch_event+0xbb)
+func -7995 0.088 ipipe_critical_enter+0x0 (ipipe_set_hooks+0x45)
| +begin 0x80000001 -7995 0.160 ipipe_critical_enter+0x229 (ipipe_set_hooks+0x45)
| +func -7995 0.100 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -7995 3.098 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -7992 1.004 ipipe_critical_exit+0x0 (ipipe_set_hooks+0x14d)
| +end 0x80000001 -7991 0.203 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
+func -7990 0.172 ipipe_catch_event+0x0 (xnshadow_grab_events+0x81 [xeno_nucleus])
+func -7990 0.111 ipipe_set_hooks+0x0 (ipipe_catch_event+0xbb)
+func -7990 0.110 ipipe_critical_enter+0x0 (ipipe_set_hooks+0x45)
| +begin 0x80000001 -7990 0.164 ipipe_critical_enter+0x229 (ipipe_set_hooks+0x45)
| +func -7990 0.122 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -7990 3.148 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -7987 1.029 ipipe_critical_exit+0x0 (ipipe_set_hooks+0x14d)
| +end 0x80000001 -7985 0.189 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
+func -7985 0.243 ipipe_catch_event+0x0 (xnshadow_grab_events+0x99 [xeno_nucleus])
+func -7985 0.131 ipipe_set_hooks+0x0 (ipipe_catch_event+0xbb)
+func -7985 0.110 ipipe_critical_enter+0x0 (ipipe_set_hooks+0x45)
| +begin 0x80000001 -7985 0.172 ipipe_critical_enter+0x229 (ipipe_set_hooks+0x45)
| +func -7985 0.120 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -7985 3934.961 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -4050 1.023 ipipe_critical_exit+0x0 (ipipe_set_hooks+0x14d)
| +end 0x80000001 -4049 26.631 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
| +func -4022 0.205 __ipipe_handle_irq+0x0 (apic_timer_interrupt+0x7c)
| +func -4022 0.115 __ipipe_dispatch_irq+0x0 (__ipipe_handle_irq+0x8d)
| +func -4022 0.128 __ipipe_ack_apic+0x0 (__ipipe_dispatch_irq+0x357)
| +func -4021 0.143 __ipipe_set_irq_pending+0x0 (__ipipe_dispatch_irq+0x216)
| +func -4021 0.152 __ipipe_do_sync_pipeline+0x0 (__ipipe_dispatch_irq+0x2a1)
| +func -4021 0.129 __ipipe_do_sync_stage+0x0 (__ipipe_do_sync_pipeline+0x115)
| #end 0x80000000 -4021 0.107 ipipe_trace_end+0x19 (__ipipe_do_sync_stage+0xe8)
#func -4021 0.089 __ipipe_do_IRQ+0x0 (__ipipe_do_sync_stage+0x1c8)
#func -4021 0.102 __ipipe_get_ioapic_irq_vector+0x0 (__ipipe_do_IRQ+0x24)
#func -4021 0.097 smp_apic_timer_interrupt+0x0 (__ipipe_do_IRQ+0x79)
#func -4021 0.097 irq_enter+0x0 (smp_apic_timer_interrupt+0x2a)
#func -4021 0.127 rcu_irq_enter+0x0 (irq_enter+0x17)
#func -4020 0.089 ipipe_restore_root+0x0 (rcu_irq_enter+0x92)
#func -4020 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -4020 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -4020 0.123 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -4020 0.103 exit_idle+0x0 (smp_apic_timer_interrupt+0x2f)
#func -4020 0.105 hrtimer_interrupt+0x0 (smp_apic_timer_interrupt+0x55)
#func -4020 0.092 _raw_spin_lock+0x0 (hrtimer_interrupt+0x4f)
#func -4020 0.117 ktime_get_update_offsets+0x0 (hrtimer_interrupt+0x81)
#func -4020 0.100 __run_hrtimer+0x0 (hrtimer_interrupt+0xf7)
#func -4019 0.107 __remove_hrtimer+0x0 (__run_hrtimer+0x67)
#func -4019 0.088 tick_sched_timer+0x0 (__run_hrtimer+0x91)
#func -4019 0.111 ktime_get+0x0 (tick_sched_timer+0x1f)
#func -4019 0.092 _raw_spin_lock+0x0 (tick_sched_timer+0x8f)
#func -4019 0.094 do_timer+0x0 (tick_sched_timer+0xcb)
#func -4019 0.148 _raw_spin_lock_irqsave+0x0 (do_timer+0x2c)
#func -4019 0.096 ntp_tick_length+0x0 (do_timer+0x83)
#func -4019 0.141 ntp_tick_length+0x0 (do_timer+0x2ec)
#func -4019 0.101 timekeeping_update.constprop.8+0x0 (do_timer+0x1e1)
#func -4018 0.094 update_vsyscall+0x0 (timekeeping_update.constprop.8+0x1d)
#func -4018 0.102 set_normalized_timespec+0x0 (update_vsyscall+0xf7)
#func -4018 0.089 ipipe_update_hostrt+0x0 (update_vsyscall+0x12c)
#func -4018 0.089 ipipe_root_only+0x0 (ipipe_update_hostrt+0x28)
| #begin 0x80000001 -4018 0.120 ipipe_root_only+0xa3 (ipipe_update_hostrt+0x28)
| #end 0x80000001 -4018 0.097 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -4018 0.095 __ipipe_notify_kevent+0x0 (ipipe_update_hostrt+0x7d)
#func -4018 0.088 ipipe_root_only+0x0 (__ipipe_notify_kevent+0x21)
| #begin 0x80000001 -4018 0.121 ipipe_root_only+0xa3 (__ipipe_notify_kevent+0x21)
| #end 0x80000001 -4018 0.095 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| #begin 0x80000001 -4017 0.103 __ipipe_notify_kevent+0xcb (ipipe_update_hostrt+0x7d)
| #end 0x80000001 -4017 0.092 ipipe_trace_end+0x19 (__ipipe_notify_kevent+0xf2)
#func -4017 0.089 ipipe_kevent_hook+0x0 (__ipipe_notify_kevent+0x81)
#func -4017 0.089 hostrt_event+0x0 [xeno_nucleus] (ipipe_kevent_hook+0x1f)
#func -4017 0.089 __ipipe_spin_lock_irqsave+0x0 (hostrt_event+0x19 [xeno_nucleus])
| #begin 0x80000001 -4017 0.137 __ipipe_spin_lock_irqsave+0x93 (hostrt_event+0x19 [xeno_nucleus])
| #func -4017 0.092 __ipipe_spin_unlock_irqrestore+0x0 (hostrt_event+0x87 [xeno_nucleus])
| #end 0x80000001 -4017 0.097 ipipe_trace_end+0x19 (__ipipe_spin_unlock_irqrestore+0x39)
| #begin 0x80000001 -4017 0.092 __ipipe_notify_kevent+0xde (ipipe_update_hostrt+0x7d)
| #end 0x80000001 -4017 0.097 ipipe_trace_end+0x19 (__ipipe_notify_kevent+0xaa)
#func -4017 0.089 raw_notifier_call_chain+0x0 (timekeeping_update.constprop.8+0x32)
#func -4016 0.090 notifier_call_chain+0x0 (raw_notifier_call_chain+0x16)
#func -4016 0.087 __ipipe_spin_unlock_debug+0x0 (do_timer+0x1f5)
#func -4016 0.088 _raw_spin_unlock_irqrestore+0x0 (do_timer+0x204)
#func -4016 0.087 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -4016 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -4016 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -4016 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -4016 0.091 calc_global_load+0x0 (do_timer+0x20c)
#func -4016 0.090 update_root_process_times+0x0 (tick_sched_timer+0x3f)
#func -4016 0.087 update_process_times+0x0 (update_root_process_times+0x57)
#func -4015 0.110 account_process_tick+0x0 (update_process_times+0x2d)
#func -4015 0.109 account_system_time+0x0 (account_process_tick+0x3d)
#func -4015 0.098 cpuacct_account_field+0x0 (account_system_time+0xc6)
#func -4015 0.087 acct_account_cputime+0x0 (account_system_time+0xce)
#func -4015 0.109 __acct_update_integrals+0x0 (acct_account_cputime+0x1c)
#func -4015 0.097 jiffies_to_timeval+0x0 (__acct_update_integrals+0x73)
#func -4015 0.087 ipipe_restore_root+0x0 (__acct_update_integrals+0x93)
#func -4015 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -4015 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -4015 0.107 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -4014 0.089 hrtimer_run_queues+0x0 (update_process_times+0x32)
#func -4014 0.109 raise_softirq+0x0 (update_process_times+0x3c)
#func -4014 0.088 ipipe_restore_root+0x0 (raise_softirq+0xda)
#func -4014 0.090 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -4014 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -4014 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -4014 0.130 rcu_check_callbacks+0x0 (update_process_times+0x47)
#func -4014 0.149 cpu_needs_another_gp+0x0 (rcu_check_callbacks+0x21b)
#func -4014 0.117 cpu_needs_another_gp+0x0 (rcu_check_callbacks+0x21b)
#func -4013 0.101 scheduler_tick+0x0 (update_process_times+0x66)
#func -4013 0.116 _raw_spin_lock+0x0 (scheduler_tick+0x44)
#func -4013 0.149 update_rq_clock.part.71+0x0 (scheduler_tick+0x1b8)
#func -4013 0.101 task_tick_fair+0x0 (scheduler_tick+0x151)
#func -4013 0.101 update_curr+0x0 (task_tick_fair+0x2ab)
#func -4013 0.088 update_min_vruntime+0x0 (update_curr+0x79)
#func -4013 0.123 cpuacct_charge+0x0 (update_curr+0x9f)
#func -4013 0.137 update_cfs_rq_blocked_load+0x0 (task_tick_fair+0x212)
#func -4013 0.127 update_cfs_shares+0x0 (task_tick_fair+0x21a)
#func -4012 0.098 update_curr+0x0 (update_cfs_shares+0xd0)
#func -4012 0.091 update_min_vruntime+0x0 (update_curr+0x79)
#func -4012 0.095 account_entity_dequeue+0x0 (update_cfs_shares+0x87)
#func -4012 0.094 account_entity_enqueue+0x0 (update_cfs_shares+0xa4)
#func -4012 0.122 update_curr+0x0 (task_tick_fair+0x2ab)
#func -4012 0.118 update_cfs_rq_blocked_load+0x0 (task_tick_fair+0x212)
#func -4012 0.176 update_cfs_shares+0x0 (task_tick_fair+0x21a)
#func -4012 0.087 trigger_load_balance+0x0 (scheduler_tick+0x185)
#func -4012 0.104 run_posix_cpu_timers+0x0 (update_process_times+0x6e)
#func -4011 0.091 profile_tick+0x0 (tick_sched_timer+0x49)
#func -4011 0.090 hrtimer_forward+0x0 (tick_sched_timer+0x5b)
#func -4011 0.092 _raw_spin_lock+0x0 (__run_hrtimer+0xa1)
#func -4011 0.120 enqueue_hrtimer+0x0 (__run_hrtimer+0xbc)
#func -4011 0.090 tick_program_event+0x0 (hrtimer_interrupt+0x136)
#func -4011 0.090 clockevents_program_event+0x0 (tick_program_event+0x24)
#func -4011 0.096 ktime_get+0x0 (clockevents_program_event+0x39)
#func -4011 0.161 lapic_next_deadline+0x0 (clockevents_program_event+0x6b)
#func -4011 0.102 irq_exit+0x0 (smp_apic_timer_interrupt+0x5a)
#func -4010 0.112 do_softirq+0x0 (irq_exit+0x7d)
#func -4010 0.089 __do_softirq+0x0 (call_softirq+0x1e)
#func -4010 0.092 msecs_to_jiffies+0x0 (__do_softirq+0x20)
#func -4010 0.088 ipipe_unstall_root+0x0 (__do_softirq+0xa8)
| #begin 0x80000000 -4010 0.092 ipipe_unstall_root+0x1c (__do_softirq+0xa8)
| #func -4010 0.142 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -4010 0.090 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -4010 0.096 run_timer_softirq+0x0 (__do_softirq+0xf7)
+func -4010 0.096 hrtimer_run_pending+0x0 (run_timer_softirq+0x24)
+func -4010 0.128 _raw_spin_lock_irq+0x0 (run_timer_softirq+0x3d)
#func -4009 0.110 ipipe_unstall_root+0x0 (run_timer_softirq+0x18f)
| #begin 0x80000000 -4009 0.091 ipipe_unstall_root+0x1c (run_timer_softirq+0x18f)
| #func -4009 0.142 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -4009 0.090 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -4009 0.102 rcu_bh_qs+0x0 (__do_softirq+0x11b)
#func -4009 0.107 __local_bh_enable+0x0 (__do_softirq+0x17a)
#func -4009 0.090 ipipe_restore_root+0x0 (do_softirq+0x8a)
#func -4009 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -4009 0.121 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -4009 0.104 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -4008 0.121 rcu_irq_exit+0x0 (irq_exit+0x50)
#func -4008 0.090 ipipe_restore_root+0x0 (rcu_irq_exit+0x8a)
#func -4008 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -4008 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -4008 0.300 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| +end 0x000000ef -4008 0.482 ipipe_trace_end+0x19 (apic_timer_interrupt+0x8f)
+func -4007 0.107 xnpod_enable_timesource+0x0 [xeno_nucleus] (xnpod_init+0x3d5 [xeno_nucleus])
| +begin 0x80000000 -4007 0.249 xnpod_enable_timesource+0x28 [xeno_nucleus] (xnpod_init+0x3d5 [xeno_nucleus])
| *+func -4007 0.715 xnintr_init+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x11b [xeno_nucleus])
| *+func -4006 0.112 __ipipe_restore_head+0x0 (xnpod_enable_timesource+0x16c [xeno_nucleus])
| +end 0x80000000 -4006 0.088 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -4006 0.102 do_gettimeofday+0x0 (xnpod_enable_timesource+0x175 [xeno_nucleus])
+func -4006 0.087 getnstimeofday+0x0 (do_gettimeofday+0x1a)
+func -4006 0.232 __getnstimeofday+0x0 (getnstimeofday+0xe)
+func -4006 0.197 xnarch_tsc_to_ns+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x1a1 [xeno_nucleus])
+func -4005 0.177 rthal_timer_request+0x0 (xnpod_enable_timesource+0x227 [xeno_nucleus])
+func -4005 0.194 ipipe_timer_start+0x0 (rthal_timer_request+0x19)
+func -4005 0.092 ipipe_critical_enter+0x0 (ipipe_timer_start+0x44)
| +begin 0x80000001 -4005 0.142 ipipe_critical_enter+0x229 (ipipe_timer_start+0x44)
| +func -4005 0.101 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -4005 3.056 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -4002 0.120 ipipe_request_irq+0x0 (ipipe_timer_start+0x79)
| +func -4001 0.195 __ipipe_spin_lock_irqsave+0x0 (ipipe_request_irq+0x48)
| #func -4001 0.192 __ipipe_spin_unlock_irqrestore+0x0 (ipipe_request_irq+0x6e)
| +func -4001 1.108 ipipe_critical_exit+0x0 (ipipe_timer_start+0x9a)
| +end 0x80000001 -4000 0.192 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
+func -4000 0.418 irq_to_desc+0x0 (ipipe_timer_start+0xa1)
+func -3999 0.130 rthal_irq_request+0x0 (rthal_timer_request+0x97)
+func -3999 0.123 ipipe_virtualize_irq+0x0 (rthal_irq_request+0x30)
+func -3999 0.107 ipipe_request_irq+0x0 (ipipe_virtualize_irq+0x13)
+func -3999 0.128 __ipipe_spin_lock_irqsave+0x0 (ipipe_request_irq+0x48)
| +begin 0x80000001 -3999 0.224 __ipipe_spin_lock_irqsave+0x93 (ipipe_request_irq+0x48)
| #func -3999 0.131 __ipipe_spin_unlock_irqrestore+0x0 (ipipe_request_irq+0x6e)
| +end 0x80000001 -3998 0.137 ipipe_trace_end+0x19 (__ipipe_spin_unlock_irqrestore+0x39)
| +begin 0x80000000 -3998 0.263 xnpod_enable_timesource+0x244 [xeno_nucleus] (xnpod_init+0x3d5 [xeno_nucleus])
| *+func -3998 0.209 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x395 [xeno_nucleus])
| *+func -3998 0.240 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *+func -3998 0.201 xntimer_next_local_shot+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x225 [xeno_nucleus])
| *+event tick@-3997 -3997 0.194 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_start_aperiodic+0x225 [xeno_nucleus])
| *+func -3997 0.162 ipipe_timer_set+0x0 (xntimer_next_local_shot+0x6b [xeno_nucleus])
| *+func -3997 0.124 ipipe_raise_irq+0x0 (ipipe_timer_set+0x7f)
| *+func -3997 0.164 __ipipe_handle_irq+0x0 (ipipe_raise_irq+0x3b)
| *+func -3997 0.176 __ipipe_dispatch_irq+0x0 (__ipipe_handle_irq+0x8d)
| *+func -3997 0.262 __ipipe_set_irq_pending+0x0 (__ipipe_dispatch_irq+0x564)
| *+func -3996 0.150 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x329 [xeno_nucleus])
| *+func -3996 0.130 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *+func -3996 0.234 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x190 [xeno_nucleus])
| *+func -3996 0.177 __ipipe_restore_head+0x0 (xnpod_enable_timesource+0x378 [xeno_nucleus])
| +func -3996 0.200 __ipipe_do_sync_pipeline+0x0 (__ipipe_sync_pipeline+0x38)
| + func -3995 0.298 __ipipe_do_sync_stage+0x0 (__ipipe_do_sync_pipeline+0x97)
| # func -3995 0.344 xnintr_clock_handler+0x0 [xeno_nucleus] (__ipipe_do_sync_stage+0x103)
| # func -3995 0.227 xntimer_tick_aperiodic+0x0 [xeno_nucleus] (xnintr_clock_handler+0x142 [xeno_nucleus])
| # func -3995 0.131 xntimer_next_local_shot+0x0 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # event   tick@996052 -3994	0.108  xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # func -3994 0.116 ipipe_timer_set+0x0 (xntimer_next_local_shot+0x6b [xeno_nucleus])
| # func -3994 0.524 lapic_next_deadline+0x0 (ipipe_timer_set+0x6a)
| # func -3994 0.208 __xnpod_schedule+0x0 [xeno_nucleus] (xnintr_clock_handler+0x305 [xeno_nucleus])
| # [ 2027] -<?>- -1 -3993 0.181 __xnpod_schedule+0x168 [xeno_nucleus] (xnintr_clock_handler+0x305 [xeno_nucleus])
| # func -3993 0.098 ipipe_send_ipi+0x0 (__xnpod_schedule+0xab5 [xeno_nucleus])
| # func -3993 0.204 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| # func -3993 0.316 xnsched_pick_next+0x0 [xeno_nucleus] (__xnpod_schedule+0x2e5 [xeno_nucleus])
| # func -3993 0.137 xnintr_host_tick+0x0 [xeno_nucleus] (__xnpod_schedule+0x937 [xeno_nucleus])
| # func -3993 0.307 __ipipe_set_irq_pending+0x0 (xnintr_host_tick+0x3a [xeno_nucleus])
| +func -3992 0.178 __ipipe_do_sync_stage+0x0 (__ipipe_do_sync_pipeline+0x115)
| #end 0x80000000 -3992 0.121 ipipe_trace_end+0x19 (__ipipe_do_sync_stage+0xe8)
#func -3992 0.091 __ipipe_do_IRQ+0x0 (__ipipe_do_sync_stage+0x1c8)
#func -3992 0.098 __ipipe_get_ioapic_irq_vector+0x0 (__ipipe_do_IRQ+0x24)
#func -3992 0.108 smp_apic_timer_interrupt+0x0 (__ipipe_do_IRQ+0x79)
#func -3992 0.114 irq_enter+0x0 (smp_apic_timer_interrupt+0x2a)
#func -3992 0.157 rcu_irq_enter+0x0 (irq_enter+0x17)
#func -3991 0.097 ipipe_restore_root+0x0 (rcu_irq_enter+0x92)
#func -3991 0.109 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -3991 0.152 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -3991 0.136 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -3991 0.098 exit_idle+0x0 (smp_apic_timer_interrupt+0x2f)
#func -3991 0.107 hrtimer_interrupt+0x0 (smp_apic_timer_interrupt+0x55)
#func -3991 0.122 _raw_spin_lock+0x0 (hrtimer_interrupt+0x4f)
#func -3991 0.169 ktime_get_update_offsets+0x0 (hrtimer_interrupt+0x81)
#func -3990 0.107 tick_program_event+0x0 (hrtimer_interrupt+0x136)
#func -3990 0.096 clockevents_program_event+0x0 (tick_program_event+0x24)
#func -3990 0.216 ktime_get+0x0 (clockevents_program_event+0x39)
#func -3990 0.123 xnarch_next_htick_shot+0x0 [xeno_nucleus] (clockevents_program_event+0x6b)
| #begin 0x80000000 -3990 0.218 xnarch_next_htick_shot+0x2b [xeno_nucleus] (clockevents_program_event+0x6b)
| *#func -3990 1.936 __xnlock_spin+0x0 [xeno_nucleus] (xnarch_next_htick_shot+0x28d [xeno_nucleus])
| *#func -3988 0.120 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnarch_next_htick_shot+0xff [xeno_nucleus])
| *#func -3988 0.160 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *#func -3987 0.144 xntimer_next_local_shot+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x225 [xeno_nucleus])
| *#event tick@-56 -3987 0.103 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_start_aperiodic+0x225 [xeno_nucleus])
| *#func -3987 0.127 ipipe_timer_set+0x0 (xntimer_next_local_shot+0x6b [xeno_nucleus])
| *#func -3987 0.315 lapic_next_deadline+0x0 (ipipe_timer_set+0x6a)
| *#func -3987 0.131 __ipipe_restore_head+0x0 (xnarch_next_htick_shot+0x142 [xeno_nucleus])
| #end 0x80000000 -3987 0.158 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
#func -3986 0.162 irq_exit+0x0 (smp_apic_timer_interrupt+0x5a)
#func -3986 0.149 rcu_irq_exit+0x0 (irq_exit+0x50)
#func -3986 0.116 ipipe_restore_root+0x0 (rcu_irq_exit+0x8a)
#func -3986 0.109 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -3986 0.143 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -3986 0.258 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| +end 0x80000000 -3985 0.188 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -3985 0.118 rthal_timer_request+0x0 (xnpod_enable_timesource+0x227 [xeno_nucleus])
+func -3985 0.242 ipipe_timer_start+0x0 (rthal_timer_request+0x19)
+func -3985 0.109 ipipe_critical_enter+0x0 (ipipe_timer_start+0x44)
| +begin 0x80000001 -3985 0.164 ipipe_critical_enter+0x229 (ipipe_timer_start+0x44)
| +func -3985 0.121 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -3985 3935.843 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -49 0.982 ipipe_critical_exit+0x0 (ipipe_timer_start+0x9a)
| +end 0x80000001 -48 22.585 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
| +func -25 0.211 __ipipe_handle_irq+0x0 (apic_timer_interrupt+0x7c)
| +func -25 0.128 __ipipe_dispatch_irq+0x0 (__ipipe_handle_irq+0x8d)
| +func -25 0.136 __ipipe_ack_hrtimer_irq+0x0 (__ipipe_dispatch_irq+0x357)
| +func -25 0.209 lapic_itimer_ack+0x0 (__ipipe_ack_hrtimer_irq+0x4b)
| # func -24 0.220 xnintr_clock_handler+0x0 [xeno_nucleus] (__ipipe_dispatch_irq+0x14c)
| # func -24 0.120 xntimer_tick_aperiodic+0x0 [xeno_nucleus] (xnintr_clock_handler+0x142 [xeno_nucleus])
| # func -24 0.104 xntimer_next_local_shot+0x0 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # event tick@996052 -24 0.088 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # func -24 0.107 ipipe_timer_set+0x0 (xntimer_next_local_shot+0x6b [xeno_nucleus])
| # func -24 0.282 lapic_next_deadline+0x0 (ipipe_timer_set+0x6a)
| # func -24 0.183 __ipipe_set_irq_pending+0x0 (xnintr_clock_handler+0x274 [xeno_nucleus])
| +func -23 0.154 __ipipe_do_sync_pipeline+0x0 (__ipipe_dispatch_irq+0x1f2)
| +func -23 0.129 __ipipe_do_sync_stage+0x0 (__ipipe_do_sync_pipeline+0x115)
| #end 0x80000000 -23 0.104 ipipe_trace_end+0x19 (__ipipe_do_sync_stage+0xe8)
#func -23 0.088 __ipipe_do_IRQ+0x0 (__ipipe_do_sync_stage+0x1c8)
#func -23 0.091 __ipipe_get_ioapic_irq_vector+0x0 (__ipipe_do_IRQ+0x24)
#func -23 0.085 smp_apic_timer_interrupt+0x0 (__ipipe_do_IRQ+0x79)
#func -23 0.087 irq_enter+0x0 (smp_apic_timer_interrupt+0x2a)
#func -23 0.109 rcu_irq_enter+0x0 (irq_enter+0x17)
#func -22 0.087 ipipe_restore_root+0x0 (rcu_irq_enter+0x92)
#func -22 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -22 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -22 0.129 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -22 0.096 exit_idle+0x0 (smp_apic_timer_interrupt+0x2f)
#func -22 0.089 hrtimer_interrupt+0x0 (smp_apic_timer_interrupt+0x55)
#func -22 0.092 _raw_spin_lock+0x0 (hrtimer_interrupt+0x4f)
#func -22 0.144 ktime_get_update_offsets+0x0 (hrtimer_interrupt+0x81)
#func -22 0.101 __run_hrtimer+0x0 (hrtimer_interrupt+0xf7)
#func -22 0.117 __remove_hrtimer+0x0 (__run_hrtimer+0x67)
#func -21 0.091 tick_sched_timer+0x0 (__run_hrtimer+0x91)
#func -21 0.117 ktime_get+0x0 (tick_sched_timer+0x1f)
#func -21 0.094 _raw_spin_lock+0x0 (tick_sched_timer+0x8f)
#func -21 0.101 do_timer+0x0 (tick_sched_timer+0xcb)
#func -21 0.144 _raw_spin_lock_irqsave+0x0 (do_timer+0x2c)
#func -21 0.095 ntp_tick_length+0x0 (do_timer+0x83)
#func -21 0.122 ntp_tick_length+0x0 (do_timer+0x2ec)
#func -21 0.090 timekeeping_update.constprop.8+0x0 (do_timer+0x1e1)
#func -21 0.092 update_vsyscall+0x0 (timekeeping_update.constprop.8+0x1d)
#func -20 0.103 set_normalized_timespec+0x0 (update_vsyscall+0xf7)
#func -20 0.089 ipipe_update_hostrt+0x0 (update_vsyscall+0x12c)
#func -20 0.087 ipipe_root_only+0x0 (ipipe_update_hostrt+0x28)
| #begin 0x80000001 -20 0.120 ipipe_root_only+0xa3 (ipipe_update_hostrt+0x28)
| #end 0x80000001 -20 0.096 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -20 0.094 __ipipe_notify_kevent+0x0 (ipipe_update_hostrt+0x7d)
#func -20 0.087 ipipe_root_only+0x0 (__ipipe_notify_kevent+0x21)
| #begin 0x80000001 -20 0.118 ipipe_root_only+0xa3 (__ipipe_notify_kevent+0x21)
| #end 0x80000001 -20 0.097 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| #begin 0x80000001 -20 0.103 __ipipe_notify_kevent+0xcb (ipipe_update_hostrt+0x7d)
| #end 0x80000001 -19 0.092 ipipe_trace_end+0x19 (__ipipe_notify_kevent+0xf2)
#func -19 0.107 ipipe_kevent_hook+0x0 (__ipipe_notify_kevent+0x81)
#func -19 0.084 hostrt_event+0x0 [xeno_nucleus] (ipipe_kevent_hook+0x1f)
#func -19 0.088 __ipipe_spin_lock_irqsave+0x0 (hostrt_event+0x19 [xeno_nucleus])
| #begin 0x80000001 -19 0.147 __ipipe_spin_lock_irqsave+0x93 (hostrt_event+0x19 [xeno_nucleus])
| #func -19 0.094 __ipipe_spin_unlock_irqrestore+0x0 (hostrt_event+0x87 [xeno_nucleus])
| #end 0x80000001 -19 0.095 ipipe_trace_end+0x19 (__ipipe_spin_unlock_irqrestore+0x39)
| #begin 0x80000001 -19 0.090 __ipipe_notify_kevent+0xde (ipipe_update_hostrt+0x7d)
| #end 0x80000001 -19 0.095 ipipe_trace_end+0x19 (__ipipe_notify_kevent+0xaa)
#func -19 0.088 raw_notifier_call_chain+0x0 (timekeeping_update.constprop.8+0x32)
#func -18 0.089 notifier_call_chain+0x0 (raw_notifier_call_chain+0x16)
#func -18 0.085 __ipipe_spin_unlock_debug+0x0 (do_timer+0x1f5)
#func -18 0.087 _raw_spin_unlock_irqrestore+0x0 (do_timer+0x204)
#func -18 0.085 ipipe_restore_root+0x0 (_raw_spin_unlock_irqrestore+0x2a)
#func -18 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -18 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -18 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -18 0.092 calc_global_load+0x0 (do_timer+0x20c)
#func -18 0.087 update_root_process_times+0x0 (tick_sched_timer+0x3f)
#func -18 0.087 update_process_times+0x0 (update_root_process_times+0x57)
#func -18 0.101 account_process_tick+0x0 (update_process_times+0x2d)
#func -17 0.102 account_system_time+0x0 (account_process_tick+0x3d)
#func -17 0.100 cpuacct_account_field+0x0 (account_system_time+0xc6)
#func -17 0.084 acct_account_cputime+0x0 (account_system_time+0xce)
#func -17 0.109 __acct_update_integrals+0x0 (acct_account_cputime+0x1c)
#func -17 0.100 jiffies_to_timeval+0x0 (__acct_update_integrals+0x73)
#func -17 0.085 ipipe_restore_root+0x0 (__acct_update_integrals+0x93)
#func -17 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -17 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -17 0.107 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -17 0.094 hrtimer_run_queues+0x0 (update_process_times+0x32)
#func -16 0.109 raise_softirq+0x0 (update_process_times+0x3c)
#func -16 0.087 ipipe_restore_root+0x0 (raise_softirq+0xda)
#func -16 0.087 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -16 0.120 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -16 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -16 0.111 rcu_check_callbacks+0x0 (update_process_times+0x47)
#func -16 0.125 cpu_needs_another_gp+0x0 (rcu_check_callbacks+0x21b)
#func -16 0.100 cpu_needs_another_gp+0x0 (rcu_check_callbacks+0x21b)
#func -16 0.088 scheduler_tick+0x0 (update_process_times+0x66)
#func -16 0.117 _raw_spin_lock+0x0 (scheduler_tick+0x44)
#func -15 0.161 update_rq_clock.part.71+0x0 (scheduler_tick+0x1b8)
#func -15 0.090 task_tick_fair+0x0 (scheduler_tick+0x151)
#func -15 0.136 update_curr+0x0 (task_tick_fair+0x2ab)
#func -15 0.089 update_min_vruntime+0x0 (update_curr+0x79)
#func -15 0.121 cpuacct_charge+0x0 (update_curr+0x9f)
#func -15 0.127 update_cfs_rq_blocked_load+0x0 (task_tick_fair+0x212)
#func -15 0.111 update_cfs_shares+0x0 (task_tick_fair+0x21a)
#func -15 0.098 update_curr+0x0 (update_cfs_shares+0xd0)
#func -14 0.088 update_min_vruntime+0x0 (update_curr+0x79)
#func -14 0.094 account_entity_dequeue+0x0 (update_cfs_shares+0x87)
#func -14 0.097 account_entity_enqueue+0x0 (update_cfs_shares+0xa4)
#func -14 0.120 update_curr+0x0 (task_tick_fair+0x2ab)
#func -14 0.125 update_cfs_rq_blocked_load+0x0 (task_tick_fair+0x212)
#func -14 0.187 update_cfs_shares+0x0 (task_tick_fair+0x21a)
#func -14 0.087 trigger_load_balance+0x0 (scheduler_tick+0x185)
#func -14 0.110 run_posix_cpu_timers+0x0 (update_process_times+0x6e)
#func -14 0.097 profile_tick+0x0 (tick_sched_timer+0x49)
#func -13 0.089 hrtimer_forward+0x0 (tick_sched_timer+0x5b)
#func -13 0.092 _raw_spin_lock+0x0 (__run_hrtimer+0xa1)
#func -13 0.104 enqueue_hrtimer+0x0 (__run_hrtimer+0xbc)
#func -13 0.087 tick_program_event+0x0 (hrtimer_interrupt+0x136)
#func -13 0.088 clockevents_program_event+0x0 (tick_program_event+0x24)
#func -13 0.108 ktime_get+0x0 (clockevents_program_event+0x39)
#func -13 0.095 xnarch_next_htick_shot+0x0 [xeno_nucleus] (clockevents_program_event+0x6b)
| #begin 0x80000000 -13 0.131 xnarch_next_htick_shot+0x2b [xeno_nucleus] (clockevents_program_event+0x6b)
| *#func -13 0.100 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnarch_next_htick_shot+0xff [xeno_nucleus])
| *#func -13 0.140 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *#func -12 0.111 xntimer_next_local_shot+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x225 [xeno_nucleus])
| *#event tick@3939 -12 0.109 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_start_aperiodic+0x225 [xeno_nucleus])
| *#func -12 0.098 ipipe_timer_set+0x0 (xntimer_next_local_shot+0x6b [xeno_nucleus])
| *#func -12 0.198 lapic_next_deadline+0x0 (ipipe_timer_set+0x6a)
| *#func -12 0.101 __ipipe_restore_head+0x0 (xnarch_next_htick_shot+0x142 [xeno_nucleus])
| #end 0x80000000 -12 0.102 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
#func -12 0.118 irq_exit+0x0 (smp_apic_timer_interrupt+0x5a)
#func -12 0.134 do_softirq+0x0 (irq_exit+0x7d)
#func -11 0.097 __do_softirq+0x0 (call_softirq+0x1e)
#func -11 0.100 msecs_to_jiffies+0x0 (__do_softirq+0x20)
#func -11 0.095 ipipe_unstall_root+0x0 (__do_softirq+0xa8)
| #begin 0x80000000 -11 0.091 ipipe_unstall_root+0x1c (__do_softirq+0xa8)
| #func -11 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -11 0.097 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -11 0.089 run_timer_softirq+0x0 (__do_softirq+0xf7)
+func -11 0.096 hrtimer_run_pending+0x0 (run_timer_softirq+0x24)
+func -11 0.136 _raw_spin_lock_irq+0x0 (run_timer_softirq+0x3d)
#func -11 0.089 ipipe_unstall_root+0x0 (run_timer_softirq+0x18f)
| #begin 0x80000000 -10 0.091 ipipe_unstall_root+0x1c (run_timer_softirq+0x18f)
| #func -10 0.140 ipipe_root_only+0x0 (ipipe_unstall_root+0x21)
| +end 0x80000000 -10 0.090 ipipe_trace_end+0x19 (ipipe_unstall_root+0x59)
+func -10 0.101 rcu_bh_qs+0x0 (__do_softirq+0x11b)
#func -10 0.105 __local_bh_enable+0x0 (__do_softirq+0x17a)
#func -10 0.090 ipipe_restore_root+0x0 (do_softirq+0x8a)
#func -10 0.089 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -10 0.117 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -10 0.105 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
#func -9 0.107 rcu_irq_exit+0x0 (irq_exit+0x50)
#func -9 0.087 ipipe_restore_root+0x0 (rcu_irq_exit+0x8a)
#func -9 0.088 ipipe_root_only+0x0 (ipipe_restore_root+0x12)
| #begin 0x80000001 -9 0.118 ipipe_root_only+0xa3 (ipipe_restore_root+0x12)
| #end 0x80000001 -9 0.291 ipipe_trace_end+0x19 (ipipe_root_only+0x86)
| +end 0x000000ef -9 0.210 ipipe_trace_end+0x19 (apic_timer_interrupt+0x8f)
+func -9 0.151 irq_to_desc+0x0 (ipipe_timer_start+0xa1)
| +begin 0x80000000 -8 0.142 xnpod_enable_timesource+0x244 [xeno_nucleus] (xnpod_init+0x3d5 [xeno_nucleus])
| *+func -8 0.108 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x395 [xeno_nucleus])
| *+func -8 0.177 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *+func -8 0.107 ipipe_send_ipi+0x0 (xntimer_start_aperiodic+0x211 [xeno_nucleus])
| *+func -8 0.155 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| *+func -8 0.107 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x329 [xeno_nucleus])
| *+func -8 0.105 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *+func -8 0.181 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x190 [xeno_nucleus])
| *+func -7 0.104 __ipipe_restore_head+0x0 (xnpod_enable_timesource+0x378 [xeno_nucleus])
| +end 0x80000000 -7 0.105 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func -7 0.098 rthal_timer_request+0x0 (xnpod_enable_timesource+0x227 [xeno_nucleus])
+func -7 0.196 ipipe_timer_start+0x0 (rthal_timer_request+0x19)
+func -7 0.091 ipipe_critical_enter+0x0 (ipipe_timer_start+0x44)
| +begin 0x80000001 -7 0.143 ipipe_critical_enter+0x229 (ipipe_timer_start+0x44)
| +func -7 0.097 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
| +func -7 3.857 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| +func -3 0.998 ipipe_critical_exit+0x0 (ipipe_timer_start+0x9a)
| +end 0x80000001 -2 0.116 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
+func -2 0.122 irq_to_desc+0x0 (ipipe_timer_start+0xa1)
| +begin 0x80000000 -1 0.208 xnpod_enable_timesource+0x244 [xeno_nucleus] (xnpod_init+0x3d5 [xeno_nucleus])
| *+func -1 0.122 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x395 [xeno_nucleus])
| *+func -1 0.171 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *+func -1 0.123 ipipe_send_ipi+0x0 (xntimer_start_aperiodic+0x211 [xeno_nucleus])
| *+func -1 0.168 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
| *+func -1 0.116 xntimer_start_aperiodic+0x0 [xeno_nucleus] (xnpod_enable_timesource+0x329 [xeno_nucleus])
| *+func 0 0.118 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x179 [xeno_nucleus])
| *+func 0 0.201 xnarch_ns_to_tsc+0x0 [xeno_nucleus] (xntimer_start_aperiodic+0x190 [xeno_nucleus])
| *+func 0 0.130 __ipipe_restore_head+0x0 (xnpod_enable_timesource+0x378 [xeno_nucleus])
| +end 0x80000000 0 0.130 ipipe_trace_end+0x19 (__ipipe_restore_head+0x6c)
+func 0 0.109 rthal_timer_request+0x0 (xnpod_enable_timesource+0x227 [xeno_nucleus])
+func 0 0.191 ipipe_timer_start+0x0 (rthal_timer_request+0x19)
+func 0 0.115 ipipe_critical_enter+0x0 (ipipe_timer_start+0x44)
>| +begin 0x80000001 0 0.160 ipipe_critical_enter+0x229 (ipipe_timer_start+0x44)
:| +func 0 0.127 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
:| +func 0! 19678406.428 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
:| +func 19678406 0.824 ipipe_send_ipi+0x0 (ipipe_critical_enter+0x1da)
:| +func 19678407+ 3.476 flat_send_IPI_mask+0x0 (ipipe_send_ipi+0x56)
:| +func 19678411+ 1.008 ipipe_critical_exit+0x0 (ipipe_timer_start+0x9a)
<| +end 0x80000001 19678412 0.490 ipipe_trace_end+0x19 (ipipe_critical_exit+0x8a)
| +begin 0x000000ef 19678412 0.000 apic_timer_interrupt+0x6d (ipipe_critical_exit+0x76)
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 13:45 ` Jeroen Van den Keybus
@ 2014-04-23 14:07 ` Gilles Chanteperdrix
2014-04-23 20:54 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-23 14:07 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai
On 04/23/2014 03:45 PM, Jeroen Van den Keybus wrote:
> I've attached an I-trace from what happens when 'modprobe xeno_native'
> stalls. I could use some hints as where to start looking into this
> issue. Right now, I would say I have a look at which code paths are
> traversed and not when CONFIG_IPIPE_TRACE is unset and set
> respectively.
Do you get the same behaviour if you only enable the tracer after
loading the xeno_native module?
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 14:07 ` Gilles Chanteperdrix
@ 2014-04-23 20:54 ` Jeroen Van den Keybus
2014-04-23 20:56 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-23 20:54 UTC (permalink / raw)
To: Gilles Chanteperdrix, xenomai
> Do you get the same behaviour if you only enable the tracer after loading
> the xeno_native module?
Yes, but I just found out that the intel_idle driver is causing
the stalls. Limiting it to the C1 state (using intel_idle.max_cstate=1)
apparently isn't enough; other nasty things must be happening
in this driver. After turning it off entirely, the random stalling stopped.
Good.
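For reference, taking intel_idle out of the picture is normally done from the kernel command line; this is a sketch of the usual options (spellings for kernels of this vintage, worth double-checking against Documentation/kernel-parameters.txt):

```
# intel_idle.max_cstate=0 disables the intel_idle driver outright,
# falling back to the ACPI idle driver; processor.max_cstate=1 then
# caps that fallback driver at the C1 state.
intel_idle.max_cstate=0 processor.max_cstate=1
```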
Apart from that, I somehow managed to stop the high latencies from
occurring. I saved the .config and double-checked (by turning the
option off again) that turning _on_ 'Run-time PM core functionality'
was what had solved the problem.
Surprisingly, the latencies are now high again, and even with the stored
.config I have not been able to recreate the situation where the tracer
is off and latencies stay normal when reading xenomai/stat.
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 20:54 ` Jeroen Van den Keybus
@ 2014-04-23 20:56 ` Gilles Chanteperdrix
2014-04-23 21:39 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-23 20:56 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai
On 04/23/2014 10:54 PM, Jeroen Van den Keybus wrote:
> Surprisingly, the latencies are high again. But even with the stored
> .config I have not been able to recreate the situation with tracer off
> and normal latencies when reading xenomai/stat.
You mean with tracer on? Then you have a trace for the high latency?
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 20:56 ` Gilles Chanteperdrix
@ 2014-04-23 21:39 ` Jeroen Van den Keybus
2014-04-23 22:25 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-23 21:39 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
> You mean with tracer on? Then you have a trace for the high latency?
No, unfortunately. Every time I enable the tracer, all is well: no latencies.
Since this is so elusive, I have started installing every kernel I build
(and keeping the configs alongside). I just found that one of them does
not show the high latencies and does not have I-pipe debugging enabled
either. Hooray.
However, when I power down the machine and reboot this very kernel, the
problem reappears.
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 21:39 ` Jeroen Van den Keybus
@ 2014-04-23 22:25 ` Gilles Chanteperdrix
2014-04-24 8:57 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-23 22:25 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai
On 04/23/2014 11:39 PM, Jeroen Van den Keybus wrote:
>> You mean with tracer on? Then you have a trace for the high latency?
>
> No, unfortunately. Every time I enable the tracer, all is well, no latencies.
>
> Since this is so elusive, I started to install all kernels I built
> (and keep the configs along). I just found that one of them does not
> have the high latencies and does not have the I-pipe debugging on
> either. Hooray.
>
> However, when I powerdown the machine and reboot this very kernel, the
> problem reappears.
Could you put a printk in the function vfile_stat_rewind to see if it
gets called (more than once) when the problem happens?
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-23 22:25 ` Gilles Chanteperdrix
@ 2014-04-24 8:57 ` Jeroen Van den Keybus
2014-04-24 14:46 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-24 8:57 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
> Could you put a printk in the function vfile_stat_rewind to see if it
> gets called (more than once) when the problem happens?
I patched as follows:
static int vfile_stat_rewind(struct xnvfile_snapshot_iterator *it)
{
	struct vfile_stat_priv *priv = xnvfile_iterator_priv(it);
	int irqnr;
	int ret, irq;

	/*
	 * The activity numbers on each valid interrupt descriptor are
	 * grouped under a pseudo-thread.
	 */
	priv->curr = getheadq(&nkpod->threadq);
	irq = priv->irq;
	priv->irq = 0;
	irqnr = xnintr_query_init(&priv->intr_it) * XNARCH_NR_CPUS;

	ret = irqnr + countq(&nkpod->threadq);

	printk(KERN_DEBUG "%s: priv=%p, ->curr=%p, ->irq=(%d), irqnr=%d, ret=%d\n",
	       __FUNCTION__, priv, priv->curr, irq, irqnr, ret);

	return ret;
}
The result (the first 3 accesses are without the latency test running,
the last 2 with it running; each access causes a 120 µs delay):
[  173.098667] vfile_stat_rewind: priv=ffff880211863228, ->curr=ffffffffa06b1110, ->irq=(0), irqnr=256, ret=264
[  181.547424] vfile_stat_rewind: priv=ffff880211863b28, ->curr=ffffffffa06b1110, ->irq=(0), irqnr=256, ret=264
[  183.002400] vfile_stat_rewind: priv=ffff880211863228, ->curr=ffffffffa06b1110, ->irq=(0), irqnr=256, ret=264
[  201.475071] vfile_stat_rewind: priv=ffff880211863228, ->curr=ffffffffa06b1110, ->irq=(0), irqnr=256, ret=266
[  209.432070] vfile_stat_rewind: priv=ffff8802118631a8, ->curr=ffffffffa06b1110, ->irq=(0), irqnr=256, ret=266
So vfile_stat_rewind is called exactly once per 'cat /proc/xenomai/stat'.
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-24 8:57 ` Jeroen Van den Keybus
@ 2014-04-24 14:46 ` Jeroen Van den Keybus
2014-04-25 8:15 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-24 14:46 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
I've been hammering on this stat entry with various tracing options to
try and catch a trace.
I did eventually catch one, but the function tracing was off at the time:
:| # [ 2738] samplin 99 -140 0.050 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
:| # event tick@-40 -140 0.181 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
:| # [ 0] -<?>- -1 -140 0.275 __xnpod_schedule+0x11d [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
:| # [ 2738] samplin 99 -140 0.456 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
:| # [ 2738] samplin 99 -139 0.307 __xnpod_schedule+0x11d [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
:| # [ 0] -<?>- -1 -139! 105.640 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
:| # [ 2738] samplin 99 -33 0.081 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
:| # event tick@59 -33! 32.250 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
:| # [ 0] -<?>- -1 -1 0.349 __xnpod_schedule+0x11d [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
:| # [ 2738] samplin 99 0 0.974 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
< + freeze 0x00009088 0 45.490 xnshadow_sys_trace+0xed [xeno_nucleus] (hisyscall_event+0x1a8 [xeno_nucleus])
| # [ 2738] samplin 99 45 0.350 __xnpod_schedule+0x11d [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
| # [ 0] -<?>- -1 45 52.196 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 98 0.078 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
| # event tick@159 98 1.790 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # [ 0] -<?>- -1 99 0.283 __xnpod_schedule+0x11d [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 100 0.676 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
+ freeze 0x00009515 100 0.321 xnshadow_sys_trace+0xed [xeno_nucleus] (hisyscall_event+0x1a8 [xeno_nucleus])
| # [ 2738] samplin 99 101 0.323 __xnpod_schedule+0x11d [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
| # [ 0] -<?>- -1 101 166.050 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 267 0.137 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
| # event tick@359 267 84.425 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # [ 0] -<?>- -1 352 0.335 __xnpod_schedule+0x11d [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 352 0.672 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
+ freeze 0x0002e7eb 353 130.030 xnshadow_sys_trace+0xed [xeno_nucleus] (hisyscall_event+0x1a8 [xeno_nucleus])
| # event tick@559 483 0.715 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # [ 2738] samplin 99 483 0.315 __xnpod_schedule+0x11d [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
| # [ 0] -<?>- -1 484 87.028 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 571 0.050 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
| # event tick@659 571 1.335 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
| # [ 0] -<?>- -1 572 0.283 __xnpod_schedule+0x11d [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 572 3.602 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
| # [ 2738] samplin 99 576 0.307 __xnpod_schedule+0x11d [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
| # [ 0] -<?>- -1 576 556.825 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
| # [ 2738] samplin 99 1133 0.196 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
| # event tick@1159 1133 4.925 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
When I configured with 'Instrument function entries', the bug can be
triggered while the tracer is disabled. Once I enable it, the bug can
no longer be triggered, but I also noticed that function tracing, which
worked correctly before, no longer works properly and a warning is
issued:
[ 497.121124] ------------[ cut here ]------------
[  497.121131] WARNING: at kernel/trace/ftrace.c:386 register_ftrace_function+0x1e4/0x230()
[  497.121132] Modules linked in: xeno_native xeno_nucleus i915 drm_kms_helper drm coretemp ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ehci_pci ehci_hcd video lpc_ich rtc_cmos backlight mfd_core e1000e xhci_hcd igb usbcore firewire_ohci firewire_core i2c_algo_bit hwmon ptp crc_itu_t pps_core usb_common
[  497.121152] CPU: 0 PID: 2259 Comm: bash Tainted: G W 3.10.18-ipipe-test-14 #82
[  497.121153] Hardware name: Supermicro X10SAE/X10SAE, BIOS 1.1a 01/03/2014
[  497.121154] 0000000000000009 ffff88020e9afe28 ffffffff8148ed1e ffff88020e9afe60
[  497.121156] ffffffff8103c071 00000000fffffff0 0000000000000002 0000000001eb0008
[  497.121158] 0000000000000002 0000000000000000 ffff88020e9afe70 ffffffff8103c14a
[ 497.121160] Call Trace:
[ 497.121164] [<ffffffff8148ed1e>] dump_stack+0x19/0x1b
[ 497.121167] [<ffffffff8103c071>] warn_slowpath_common+0x61/0x80
[ 497.121169] [<ffffffff8103c14a>] warn_slowpath_null+0x1a/0x20
[ 497.121171] [<ffffffff810c85b4>] register_ftrace_function+0x1e4/0x230
[ 497.121173] [<ffffffff810c2afc>] __ipipe_wr_enable+0x10c/0x120
[ 497.121177] [<ffffffff81231911>] ? security_file_permission+0x21/0xa0
[ 497.121180] [<ffffffff811ae56d>] proc_reg_write+0x3d/0x80
[ 497.121182] [<ffffffff8114d17d>] vfs_write+0xbd/0x1e0
[ 497.121184] [<ffffffff8114db49>] SyS_write+0x49/0xa0
[ 497.121187] [<ffffffff8149c866>] system_call_fastpath+0x16/0x1b
[ 497.121188] ---[ end trace 4c5426aeef218be7 ]---
When function tracing still works, the traces read like this instead:
: + func -130 0.058 __rt_task_wait_period+0x0 [xeno_native] (hisyscall_event+0x1a8 [xeno_nucleus])
: + func -130 0.070 rt_task_wait_period+0x0 [xeno_native] (__rt_task_wait_period+0x1a [xeno_native])
: + func -130 0.109 xnpod_wait_thread_period+0x0 [xeno_nucleus] (rt_task_wait_period+0x4f [xeno_native])
:| # func -130 0.115 xnpod_suspend_thread+0x0 [xeno_nucleus] (xnpod_wait_thread_period+0x115 [xeno_nucleus])
:| # func -130 0.068 __xnpod_schedule+0x0 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
:| # [ 2271] samplin 99 -130 0.056 __xnpod_schedule+0x11d [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
:| # func -130 0.470 xnsched_pick_next+0x0 [xeno_nucleus] (__xnpod_schedule+0x272 [xeno_nucleus])
:| # [30287] -<?>- -1 -129 0.245 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
:| +func -129+ 1.082 __ipipe_do_sync_pipeline+0x0 (__ipipe_dispatch_irq+0x1c0)
:| +func -128 0.097 __ipipe_handle_exception+0x0 (device_not_available+0x1f)
:| #func -128 0.105 __ipipe_notify_trap+0x0 (__ipipe_handle_exception+0x75)
:| #func -128 0.062 do_device_not_available+0x0 (__ipipe_handle_exception+0xf7)
:| #func -128 0.087 user_exit+0x0 (do_device_not_available+0x17)
:| #func -128 0.056 ipipe_restore_root+0x0 (user_exit+0xb2)
:| #func -128 0.161 math_state_restore+0x0 (do_device_not_available+0x25)
:| #func -128! 93.612 __ipipe_restore_root_nosync+0x0 (__ipipe_handle_exception+0x106)
:| +func -34 0.142 __ipipe_handle_irq+0x0 (apic_timer_interrupt+0x60)
:| +func -34 0.081 __ipipe_dispatch_irq+0x0 (__ipipe_handle_irq+0x65)
:| +func -34 0.090 __ipipe_ack_hrtimer_irq+0x0 (__ipipe_dispatch_irq+0x70)
:| +func -34 0.131 lapic_itimer_ack+0x0 (__ipipe_ack_hrtimer_irq+0x3b)
:| # func -33 0.164 xnintr_clock_handler+0x0 [xeno_nucleus] (__ipipe_dispatch_irq+0x171)
:| # func -33 0.088 xntimer_tick_aperiodic+0x0 [xeno_nucleus] (xnintr_clock_handler+0x142 [xeno_nucleus])
:| # func -33 0.077 xnthread_periodic_handler+0x0 [xeno_nucleus] (xntimer_tick_aperiodic+0xd5 [xeno_nucleus])
:| # func -33 0.076 xnpod_resume_thread+0x0 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
:| # [ 2271] samplin 99 -33 0.175 xnpod_resume_thread+0xe8 [xeno_nucleus] (xnthread_periodic_handler+0x35 [xeno_nucleus])
:| # func -33 0.063 xntimer_next_local_shot+0x0 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
:| # event tick@65 -33 0.041 xntimer_next_local_shot+0x63 [xeno_nucleus] (xntimer_tick_aperiodic+0x1b0 [xeno_nucleus])
:| # func -33 0.057 ipipe_timer_set+0x0 (xntimer_next_local_shot+0x6b [xeno_nucleus])
:| # func -33 0.231 lapic_next_deadline+0x0 (ipipe_timer_set+0x5c)
:| # func -32 0.103 __xnpod_schedule+0x0 [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
:| # [30287] -<?>- -1 -32 0.078 __xnpod_schedule+0x11d [xeno_nucleus] (xnintr_clock_handler+0x315 [xeno_nucleus])
:| # func -32! 27.879 xnsched_pick_next+0x0 [xeno_nucleus] (__xnpod_schedule+0x272 [xeno_nucleus])
:| # func -4 0.580 __ipipe_notify_vm_preemption+0x0 (__xnpod_schedule+0x71d [xeno_nucleus])
:| # [ 2271] samplin 99 -4 0.301 __xnpod_schedule+0x4e1 [xeno_nucleus] (xnpod_suspend_thread+0x44a [xeno_nucleus])
:| # func -4 0.114 xntimer_get_overruns+0x0 [xeno_nucleus] (xnpod_wait_thread_period+0x13c [xeno_nucleus])
:| # func -3+ 1.210 __ipipe_restore_head+0x0 (xnpod_wait_thread_period+0x182 [xeno_nucleus])
: + func -2 0.084 __ipipe_syscall_root+0x0 (__ipipe_syscall_root_thunk+0x35)
: + func -2 0.068 __ipipe_notify_syscall+0x0 (__ipipe_syscall_root+0x35)
: + func -2 0.058 ipipe_syscall_hook+0x0 (__ipipe_notify_syscall+0xbf)
Jeroen.
2014-04-24 10:57 GMT+02:00 Jeroen Van den Keybus
<jeroen.vandenkeybus@gmail.com>:
>> Could you put a printk in the function vfile_stat_rewind to see if it
>> gets called (more than once) when the problem happens?
> [...]
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-24 14:46 ` Jeroen Van den Keybus
@ 2014-04-25 8:15 ` Jeroen Van den Keybus
2014-04-25 10:44 ` Jeroen Van den Keybus
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-25 8:15 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
I'm currently looking into vfile.c. I noticed that the bug is triggered
mainly in stat because my NR_IRQS is high (2^14), so vfile_stat_next is
simply called much more often than in the other snapshot entries.
However, even after removing the call to xnintr_query_next from
vfile_stat_next (replacing ret = xnintr_query_next(...) with ret =
-ENODEV), the problem persists.
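To put a number on that: the stat rewind handler sizes the snapshot as one pseudo-thread row per (IRQ, CPU) pair plus one row per real thread, so the per-read work grows directly with NR_IRQS. A back-of-the-envelope helper (the function name is mine):

```c
/* Rows a /proc/xenomai/stat snapshot must collect while the nklock is
 * held: one pseudo-thread per (IRQ, CPU) pair plus one per thread,
 * mirroring vfile_stat_rewind()'s irqnr + countq() computation. */
static long stat_snapshot_rows(long nr_irqs, long nr_cpus, long nr_threads)
{
	return nr_irqs * nr_cpus + nr_threads;
}
```

With the printk numbers from earlier in this thread (irqnr=256, ret=264) this gives 264 rows; with NR_IRQS = 2^14 on a 16-CPU box it would be over a quarter of a million.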
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-25 8:15 ` Jeroen Van den Keybus
@ 2014-04-25 10:44 ` Jeroen Van den Keybus
2014-09-09 21:03 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-04-25 10:44 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
For testing, I've removed the locks from the vfile system. Then the
high latencies reliably disappear.
To test, I made two xeno_nucleus modules: one with the xnlock_get/put_
in place and one with dummies. Subsequently, I used a program that
simply opens and reads the stat file 1,000 times.
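The reader loop is trivial; a minimal sketch of such a stress program (function name and buffer size are mine) looks like:

```c
#include <fcntl.h>
#include <unistd.h>

/* Open, fully read, and close the given proc entry 'iterations' times.
 * Returns the number of passes that opened the file successfully. */
static int hammer_proc_entry(const char *path, int iterations)
{
	char buf[4096];
	int i, ok = 0;

	for (i = 0; i < iterations; i++) {
		int fd = open(path, O_RDONLY);
		if (fd < 0)
			continue;
		while (read(fd, buf, sizeof(buf)) > 0)
			; /* drain the whole snapshot */
		close(fd);
		ok++;
	}
	return ok;
}
```

In the test above this would be pointed at /proc/xenomai/stat with 1,000 iterations, while ./latency runs concurrently.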
With locks:
RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
Without locks:
RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
I'll now take a closer look at the vfile system, but if the locks
themselves are malfunctioning, I'm clueless.
BTW, I found that unloading and reloading xeno_nucleus didn't work, due
to a missing rthal_free_ptdkey call in xnshadow_cleanup. I used the
following patch to fix that. (The ability to swap out Xenomai modules
is a real lifesaver when debugging. Thanks!)
--- /home/vdkeybus/work/xenomai/ksrc/nucleus/shadow.c	2014-04-16 22:46:19.018851844 +0200
+++ shadow.c	2014-04-25 09:43:49.838735832 +0200
@@ -3139,6 +3139,8 @@ void xnshadow_cleanup(void)
 	}

 	rthal_apc_free(lostage_apc);
+
+	rthal_free_ptdkey(nkmmptd);
 	rthal_free_ptdkey(nkerrptd);
 	rthal_free_ptdkey(nkthrptd);
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-04-25 10:44 ` Jeroen Van den Keybus
@ 2014-09-09 21:03 ` Gilles Chanteperdrix
2014-09-10 13:50 ` Jeroen Van den Keybus
2014-09-11 5:11 ` Jan Kiszka
0 siblings, 2 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-09 21:03 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai
On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
> For testing, I've removed the locks from the vfile system. Then the
> high latencies reliably disappear.
>
> To test, I made two xeno_nucleus modules: one with the xnlock_get/put_
> in place and one with dummies. Subsequently, I use a program that
> simply opens and reads the stat file 1,000 times.
>
> With locks:
>
> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>
> Without locks:
>
> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>
> I'll now have a closer look into the vfile system but if the locks are
> malfunctioning, I'm clueless.
Answering with a "little" delay, could you try the following patch?
diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
index a6be0dc..cfb0c71 100644
--- a/include/asm-generic/bits/pod.h
+++ b/include/asm-generic/bits/pod.h
@@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
 		cpu_relax();
 		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
 				    XNLOCK_DBG_PASS_CONTEXT);
+		xnarch_memory_barrier();
 	} while(atomic_read(&lock->owner) != ~0);
 }
EXPORT_SYMBOL_GPL(__xnlock_spin);
diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
index 25bd83f..7a8c4d0 100644
--- a/include/asm-generic/system.h
+++ b/include/asm-generic/system.h
@@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
 	xnarch_memory_barrier();
 	atomic_set(&lock->owner, ~0);
+
+	xnarch_memory_barrier();
 }
static inline spl_t
diff --git a/ksrc/nucleus/vfile.c b/ksrc/nucleus/vfile.c
index c8e0363..066c12f 100644
--- a/ksrc/nucleus/vfile.c
+++ b/ksrc/nucleus/vfile.c
@@ -279,6 +279,15 @@ redo:
 			data += vfile->datasz;
 			it->nrdata++;
 		}
+#ifdef CONFIG_SMP
+		{
+			/* Leave some time for other cpus to get the lock */
+			xnticks_t wakeup = xnarch_get_cpu_tsc();
+			wakeup += xnarch_ns_to_tsc(1000);
+			while ((xnsticks_t)(xnarch_get_cpu_tsc() - wakeup) < 0)
+				cpu_relax();
+		}
+#endif
 	}

 	if (ret < 0) {
>
>
> BTW I found that unloading and loading xeno_nucleus didn't work due to
> a missing rthal_free_ptdkey call in xnshadow_cleanup. I used the
> following patch to fix that. (The ability to swap out xenomai modules
> is a real lifesaver when debugging. Thanks!)
>
> --- /home/vdkeybus/work/xenomai/ksrc/nucleus/shadow.c 2014-04-16
> 22:46:19.018851844 +0200
> +++ shadow.c 2014-04-25 09:43:49.838735832 +0200
> @@ -3139,6 +3139,8 @@ void xnshadow_cleanup(void)
> }
>
> rthal_apc_free(lostage_apc);
> +
> + rthal_free_ptdkey(nkmmptd);
> rthal_free_ptdkey(nkerrptd);
> rthal_free_ptdkey(nkthrptd);
Merged, thanks.
--
Gilles.
^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-09 21:03 ` Gilles Chanteperdrix
@ 2014-09-10 13:50 ` Jeroen Van den Keybus
2014-09-10 19:47 ` Gilles Chanteperdrix
2014-09-11 5:11 ` Jan Kiszka
1 sibling, 1 reply; 40+ messages in thread
From: Jeroen Van den Keybus @ 2014-09-10 13:50 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: xenomai
Hi Gilles,
> Answering with a "little" delay, could you try the following patch?
>
No problem. And we understand you are busy.
The tests below consist of running ./latency at 10 kHz while continuously
opening, reading and closing /proc/xenomai/stat. We can read it at about
200 Hz. We let it cook for 10 minutes.
To verify, we first tested again without the patch (problems after one
second already, so we stopped this run):
== Sampling period: 100 us
== Test mode: periodic user-mode task
== All results in microseconds
warming up...
RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| 0.725| 0.956| 1.524| 0| 0| 0.725| 1.524
RTD| 0.782| 0.936| 1.482| 0| 0| 0.725| 1.524
RTD| 0.886| 0.936| 1.750| 0| 0| 0.725| 1.750
RTD| 0.886| 2.355| 546.854| 5| 0| 0.725| 546.854
RTD| 1.253| 4.380| 629.025| 15| 0| 0.725| 629.025
RTD| 1.292| 4.348| 578.529| 19| 0| 0.725| 629.025
RTD| 1.287| 4.375| 662.344| 27| 0| 0.725| 662.344
RTD| 1.265| 4.372| 369.331| 35| 0| 0.725| 662.344
And with the patch (same conditions for 10 minutes):
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| 1.152| 1.230| 1.889| 0| 0| 0.547| 2.591
RTD| 1.054| 1.231| 1.824| 0| 0| 0.547| 2.591
RTD| 1.148| 1.229| 1.775| 0| 0| 0.547| 2.591
RTD| 1.057| 1.231| 1.782| 0| 0| 0.547| 2.591
RTD| 1.150| 1.231| 1.786| 0| 0| 0.547| 2.591
RTD| 1.045| 1.230| 1.888| 0| 0| 0.547| 2.591
RTD| 1.151| 1.231| 1.761| 0| 0| 0.547| 2.591
RTD| 0.999| 1.230| 2.049| 0| 0| 0.547| 2.591
RTD| 1.148| 1.231| 1.818| 0| 0| 0.547| 2.591
RTD| 1.031| 1.231| 1.784| 0| 0| 0.547| 2.591
RTD| 1.149| 1.231| 1.818| 0| 0| 0.547| 2.591
RTD| 0.832| 1.228| 1.976| 0| 0| 0.547| 2.591
RTD| 1.149| 1.226| 1.805| 0| 0| 0.547| 2.591
RTD| 1.025| 1.225| 1.842| 0| 0| 0.547| 2.591
RTD| 1.150| 1.225| 1.795| 0| 0| 0.547| 2.591
RTD| 1.053| 1.225| 1.774| 0| 0| 0.547| 2.591
RTD| 1.150| 1.226| 1.876| 0| 0| 0.547| 2.591
RTD| 0.910| 1.225| 2.205| 0| 0| 0.547| 2.591
RTD| 1.149| 1.225| 1.819| 0| 0| 0.547| 2.591
RTD| 0.716| 1.225| 1.774| 0| 0| 0.547| 2.591
RTD| 0.873| 1.225| 1.925| 0| 0| 0.547| 2.591
RTT| 00:09:49 (periodic user-mode task, 100 us period, priority 99)
We also checked the kernel performance without reading stat:
RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
RTD| 0.906| 0.917| 1.377| 0| 0| 0.494| 2.600
RTD| 0.905| 0.916| 1.501| 0| 0| 0.494| 2.600
RTD| 0.905| 0.916| 1.357| 0| 0| 0.494| 2.600
RTD| 0.905| 0.960| 1.395| 0| 0| 0.494| 2.600
RTD| 0.594| 0.916| 1.349| 0| 0| 0.494| 2.600
RTD| 0.906| 0.917| 1.364| 0| 0| 0.494| 2.600
RTD| 0.905| 0.916| 1.331| 0| 0| 0.494| 2.600
RTD| 0.905| 0.917| 1.333| 0| 0| 0.494| 2.600
RTD| 0.846| 0.954| 1.363| 0| 0| 0.494| 2.600
RTD| 0.906| 0.917| 1.369| 0| 0| 0.494| 2.600
RTD| 0.906| 0.917| 1.365| 0| 0| 0.494| 2.600
RTD| 0.905| 0.917| 1.341| 0| 0| 0.494| 2.600
RTD| 0.906| 0.917| 1.354| 0| 0| 0.494| 2.600
RTD| 0.906| 0.957| 1.340| 0| 0| 0.494| 2.600
RTD| 0.905| 0.917| 1.380| 0| 0| 0.494| 2.600
RTD| 0.905| 0.918| 1.356| 0| 0| 0.494| 2.600
RTD| 0.905| 0.917| 1.339| 0| 0| 0.494| 2.600
RTD| 0.905| 0.917| 1.331| 0| 0| 0.494| 2.600
RTD| 0.906| 0.955| 1.376| 0| 0| 0.494| 2.600
RTD| 0.906| 0.917| 1.353| 0| 0| 0.494| 2.600
RTD| 0.604| 0.916| 1.379| 0| 0| 0.494| 2.600
RTT| 00:10:10 (periodic user-mode task, 100 us period, priority 99)
You can clearly observe an increase in the average latency (about 300 ns),
but the worst-case latency isn't necessarily worse.
Obviously the patch works. Thanks!
Do I understand correctly that, currently, a flag (the lock owner) set on
one CPU isn't promptly observable by another CPU (due to e.g. caching)?
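Roughly, yes: without barriers the compiler and CPU may reorder the store that releases the lock relative to the protected writes, and the spinning CPU can keep re-reading a stale owner value. A stripped-down illustration of the xnlock pattern using C11 atomics (this is my sketch with invented names, not the Xenomai code):

```c
#include <stdatomic.h>

/* Minimal xnlock-style spin lock: owner == ~0 means "unowned". The
 * fences mirror the patch: one before the release so the protected
 * writes become visible first, one after it so the release itself is
 * published promptly, and one in the spin loop so each retry re-reads
 * a fresh owner value instead of spinning on stale data. */
typedef struct { atomic_int owner; } toy_xnlock_t;

static void toy_lock(toy_xnlock_t *l, int cpu)
{
	int unowned;
	for (;;) {
		unowned = ~0;
		if (atomic_compare_exchange_weak(&l->owner, &unowned, cpu))
			return;	/* acquired: owner is now our cpu id */
		atomic_thread_fence(memory_order_seq_cst); /* fresh re-read */
	}
}

static void toy_unlock(toy_xnlock_t *l)
{
	atomic_thread_fence(memory_order_seq_cst); /* drain protected writes */
	atomic_store(&l->owner, ~0);
	atomic_thread_fence(memory_order_seq_cst); /* publish the release */
}
```

In the original xnlock code only the barrier before the atomic_set was present; the patch adds the other two, which matches the symptom that the reader CPU could hold the lock "invisibly" for long stretches.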
Jeroen.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-10 13:50 ` Jeroen Van den Keybus
@ 2014-09-10 19:47 ` Gilles Chanteperdrix
0 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-10 19:47 UTC (permalink / raw)
To: Jeroen Van den Keybus; +Cc: xenomai
On 09/10/2014 03:50 PM, Jeroen Van den Keybus wrote:
> Hi Gilles,
>
>
>
>> Answering with a "little" delay, could you try the following patch?
>>
>
> No problem. And we understand you are busy.
>
> The tests below consist of running ./latency at 10 kHz and continuously
> open, read and close /proc/xenomai/stat. We can read at about 200 Hz. We
> let it cook for 10 minutes.
>
> To verify, we tested again without the patch (problems after one sec
> already, so we stopped this):
>
> == Sampling period: 100 us
>
> == Test mode: periodic user-mode task
>
> == All results in microseconds
>
> warming up...
>
> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat
> worst
>
> RTD| 0.725| 0.956| 1.524| 0| 0| 0.725|
> 1.524
>
> RTD| 0.782| 0.936| 1.482| 0| 0| 0.725|
> 1.524
>
> RTD| 0.886| 0.936| 1.750| 0| 0| 0.725|
> 1.750
>
> RTD| 0.886| 2.355| 546.854| 5| 0| 0.725|
> 546.854
>
> RTD| 1.253| 4.380| 629.025| 15| 0| 0.725|
> 629.025
>
> RTD| 1.292| 4.348| 578.529| 19| 0| 0.725|
> 629.025
>
> RTD| 1.287| 4.375| 662.344| 27| 0| 0.725|
> 662.344
>
> RTD| 1.265| 4.372| 369.331| 35| 0| 0.725|
> 662.344
>
>
> And with the patch (same conditions for 10 minutes):
>
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| 1.152| 1.230| 1.889| 0| 0| 0.547| 2.591
> RTD| 1.054| 1.231| 1.824| 0| 0| 0.547| 2.591
> RTD| 1.148| 1.229| 1.775| 0| 0| 0.547| 2.591
> RTD| 1.057| 1.231| 1.782| 0| 0| 0.547| 2.591
> RTD| 1.150| 1.231| 1.786| 0| 0| 0.547| 2.591
> RTD| 1.045| 1.230| 1.888| 0| 0| 0.547| 2.591
> RTD| 1.151| 1.231| 1.761| 0| 0| 0.547| 2.591
> RTD| 0.999| 1.230| 2.049| 0| 0| 0.547| 2.591
> RTD| 1.148| 1.231| 1.818| 0| 0| 0.547| 2.591
> RTD| 1.031| 1.231| 1.784| 0| 0| 0.547| 2.591
> RTD| 1.149| 1.231| 1.818| 0| 0| 0.547| 2.591
> RTD| 0.832| 1.228| 1.976| 0| 0| 0.547| 2.591
> RTD| 1.149| 1.226| 1.805| 0| 0| 0.547| 2.591
> RTD| 1.025| 1.225| 1.842| 0| 0| 0.547| 2.591
> RTD| 1.150| 1.225| 1.795| 0| 0| 0.547| 2.591
> RTD| 1.053| 1.225| 1.774| 0| 0| 0.547| 2.591
> RTD| 1.150| 1.226| 1.876| 0| 0| 0.547| 2.591
> RTD| 0.910| 1.225| 2.205| 0| 0| 0.547| 2.591
> RTD| 1.149| 1.225| 1.819| 0| 0| 0.547| 2.591
> RTD| 0.716| 1.225| 1.774| 0| 0| 0.547| 2.591
> RTD| 0.873| 1.225| 1.925| 0| 0| 0.547| 2.591
>
> RTT| 00:09:49 (periodic user-mode task, 100 us period, priority 99)
>
>
>
> We also checked the kernel performance without reading stat:
>
> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
> RTD| 0.906| 0.917| 1.377| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.916| 1.501| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.916| 1.357| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.960| 1.395| 0| 0| 0.494| 2.600
> RTD| 0.594| 0.916| 1.349| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.917| 1.364| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.916| 1.331| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.917| 1.333| 0| 0| 0.494| 2.600
> RTD| 0.846| 0.954| 1.363| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.917| 1.369| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.917| 1.365| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.917| 1.341| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.917| 1.354| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.957| 1.340| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.917| 1.380| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.918| 1.356| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.917| 1.339| 0| 0| 0.494| 2.600
> RTD| 0.905| 0.917| 1.331| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.955| 1.376| 0| 0| 0.494| 2.600
> RTD| 0.906| 0.917| 1.353| 0| 0| 0.494| 2.600
> RTD| 0.604| 0.916| 1.379| 0| 0| 0.494| 2.600
>
> RTT| 00:10:10 (periodic user-mode task, 100 us period, priority 99)
>
>
>
> You can clearly observe an increase in the average latency (about 300 ns),
> but the worst-case latency isn't necessarily worse.
>
> Obviously, the patch works. Thanks!
>
> Do I understand correctly that currently a flag (lock) set on one CPU
> isn't immediately observable by another CPU (due to e.g. caching)?
Currently, freeing the lock is not perceived immediately by the CPUs
waiting for it, because the lock is freed with atomic_set, which does
not contain a barrier. The lock, however, is taken with atomic_cmpxchg,
which is a full barrier, so there is no way a locked lock can be
perceived as free.
That accounts for the barriers in xnlock_put and __xnlock_spin. However,
these two barriers do not seem to be sufficient to avoid the issue
completely; either busy-sleeping in the snapshot code, or putting a
barrier in front of the cmpxchg in the xnlock_get code, seems to be
necessary. I have chosen the latter, because it has a smaller impact
than the former, but I am not entirely satisfied (busy-sleeping is a
bit ugly).
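The lock/unlock protocol described above can be sketched with C11 atomics. This is a hypothetical model for illustration only, not the actual Xenomai code; all names are invented, and `~0` marks the lock free as in the nucleus:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_UNLOCKED (~0)

/* Hypothetical model of the xnlock scheme discussed above (invented
 * names): the owner field holds a CPU id, or ~0 when the lock is free. */
typedef struct { atomic_int owner; } sketch_lock_t;

static bool sketch_trylock(sketch_lock_t *l, int cpu)
{
	int expected = SKETCH_UNLOCKED;
	/* cmpxchg is a full barrier: a locked lock can never be seen free */
	return atomic_compare_exchange_strong(&l->owner, &expected, cpu);
}

static void sketch_lock(sketch_lock_t *l, int cpu)
{
	while (!sketch_trylock(l, cpu))
		/* spin; the patch adds a barrier in this loop so that the
		   freeing store propagates to the spinner promptly */
		while (atomic_load_explicit(&l->owner,
					    memory_order_relaxed) != SKETCH_UNLOCKED)
			;
}

static void sketch_unlock(sketch_lock_t *l)
{
	/* barrier *before* the store (already present): writes done under
	   the lock must be visible before the lock appears free */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->owner, SKETCH_UNLOCKED, memory_order_relaxed);
	/* barrier *after* the store (the patch under discussion): push the
	   freeing out so waiters observe it without unbounded delay */
	atomic_thread_fence(memory_order_seq_cst);
}
```

The placement mirrors the discussion: the release-side barrier before the store already existed, while the trailing fence and the spin-loop barrier are what the patch adds.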
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-09 21:03 ` Gilles Chanteperdrix
2014-09-10 13:50 ` Jeroen Van den Keybus
@ 2014-09-11 5:11 ` Jan Kiszka
2014-09-11 5:19 ` Jan Kiszka
2014-09-16 11:09 ` Gilles Chanteperdrix
1 sibling, 2 replies; 40+ messages in thread
From: Jan Kiszka @ 2014-09-11 5:11 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>> For testing, I've removed the locks from the vfile system. Then the
>> high latencies reliably disappear.
>>
>> To test, I made two xeno_nucleus modules: one with the xnlock_get/put_
>> in place and one with dummies. Subsequently, I use a program that
>> simply opens and reads the stat file 1,000 times.
>>
>> With locks:
>>
>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>
>> Without locks:
>>
>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>
>> I'll now have a closer look into the vfile system but if the locks are
>> malfunctioning, I'm clueless.
>
> Answering with a "little" delay, could you try the following patch?
>
> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
> index a6be0dc..cfb0c71 100644
> --- a/include/asm-generic/bits/pod.h
> +++ b/include/asm-generic/bits/pod.h
> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
> cpu_relax();
> xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
> XNLOCK_DBG_PASS_CONTEXT);
> + xnarch_memory_barrier();
> } while(atomic_read(&lock->owner) != ~0);
> }
> EXPORT_SYMBOL_GPL(__xnlock_spin);
> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
> index 25bd83f..7a8c4d0 100644
> --- a/include/asm-generic/system.h
> +++ b/include/asm-generic/system.h
> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
> xnarch_memory_barrier();
>
> atomic_set(&lock->owner, ~0);
> +
> + xnarch_memory_barrier();
That's pretty heavyweight now (it already was, due to the first memory
barrier). Maybe it's better to look at a ticket-lock mechanism like the
one Linux uses, for fairness. At least on x86 (and other strictly
ordered archs), those require no memory barriers on release.
Jan
> }
>
> static inline spl_t
> diff --git a/ksrc/nucleus/vfile.c b/ksrc/nucleus/vfile.c
> index c8e0363..066c12f 100644
> --- a/ksrc/nucleus/vfile.c
> +++ b/ksrc/nucleus/vfile.c
> @@ -279,6 +279,15 @@ redo:
> data += vfile->datasz;
> it->nrdata++;
> }
> +#ifdef CONFIG_SMP
> + {
> + /* Leave some time for other cpus to get the lock */
> + xnticks_t wakeup = xnarch_get_cpu_tsc();
> + wakeup += xnarch_ns_to_tsc(1000);
> + while ((xnsticks_t)(xnarch_get_cpu_tsc() - wakeup) < 0)
> + cpu_relax();
> + }
> +#endif
> }
>
> if (ret < 0) {
>
>
>>
>>
>> BTW I found that unloading and loading xeno_nucleus didn't work due to
>> a missing rthal_free_ptdkey call in xnshadow_cleanup. I used the
>> following patch to fix that. (The ability to swap out xenomai modules
>> is a real lifesaver when debugging. Thanks!)
>>
>> --- /home/vdkeybus/work/xenomai/ksrc/nucleus/shadow.c 2014-04-16
>> 22:46:19.018851844 +0200
>> +++ shadow.c 2014-04-25 09:43:49.838735832 +0200
>> @@ -3139,6 +3139,8 @@ void xnshadow_cleanup(void)
>> }
>>
>> rthal_apc_free(lostage_apc);
>> +
>> + rthal_free_ptdkey(nkmmptd);
>> rthal_free_ptdkey(nkerrptd);
>> rthal_free_ptdkey(nkthrptd);
>
> Merged, thanks.
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: OpenPGP digital signature
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140911/90346dde/attachment.sig>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-11 5:11 ` Jan Kiszka
@ 2014-09-11 5:19 ` Jan Kiszka
2014-09-18 11:46 ` Gilles Chanteperdrix
2014-09-16 11:09 ` Gilles Chanteperdrix
1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-11 5:19 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-11 07:11, Jan Kiszka wrote:
> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>> For testing, I've removed the locks from the vfile system. Then the
>>> high latencies reliably disappear.
>>>
>>> To test, I made two xeno_nucleus modules: one with the xnlock_get/put_
>>> in place and one with dummies. Subsequently, I use a program that
>>> simply opens and reads the stat file 1,000 times.
>>>
>>> With locks:
>>>
>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>
>>> Without locks:
>>>
>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>
>>> I'll now have a closer look into the vfile system but if the locks are
>>> malfunctioning, I'm clueless.
>>
>> Answering with a "little" delay, could you try the following patch?
>>
>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>> index a6be0dc..cfb0c71 100644
>> --- a/include/asm-generic/bits/pod.h
>> +++ b/include/asm-generic/bits/pod.h
>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>> cpu_relax();
>> xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>> XNLOCK_DBG_PASS_CONTEXT);
>> + xnarch_memory_barrier();
>> } while(atomic_read(&lock->owner) != ~0);
>> }
>> EXPORT_SYMBOL_GPL(__xnlock_spin);
>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>> index 25bd83f..7a8c4d0 100644
>> --- a/include/asm-generic/system.h
>> +++ b/include/asm-generic/system.h
>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>> xnarch_memory_barrier();
>>
>> atomic_set(&lock->owner, ~0);
>> +
>> + xnarch_memory_barrier();
>
> That's pretty heavy-weighted now (it was already due to the first memory
> barrier). Maybe it's better to look at some ticket lock mechanism like
> Linux uses for fairness. At least on x86 (and other strictly ordered
> archs), those require no memory barriers on release.
In fact, memory barriers aren't needed on strictly ordered archs already
today, independent of the spinlock granting algorithm. So there are two
optimization possibilities:
- ticket-based granting
- arch-specific (thus optimized) core
Jan
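The ticket-based granting Jan mentions can be sketched in C11 atomics (illustration only, with invented names; not the Linux or any proposed Xenomai implementation). Each locker atomically takes a ticket and spins until it is served, which gives FIFO fairness; release is a plain ordered store:

```c
#include <stdatomic.h>

/* Hypothetical ticket lock (invented names): 'next' is the next ticket
 * to hand out, 'serving' the ticket currently allowed to hold the lock. */
typedef struct {
	atomic_uint next;
	atomic_uint serving;
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l)
{
	/* take a ticket; this RMW is the only heavyweight operation */
	unsigned int me = atomic_fetch_add(&l->next, 1);
	while (atomic_load_explicit(&l->serving, memory_order_acquire) != me)
		; /* spin until our ticket comes up */
}

static void ticket_unlock(ticket_lock_t *l)
{
	/* hand the lock to the next ticket; a release store suffices.
	   On x86 this compiles to a plain store plus a compiler barrier,
	   i.e. "no memory barriers on release" on strongly ordered archs. */
	unsigned int now = atomic_load_explicit(&l->serving,
						memory_order_relaxed);
	atomic_store_explicit(&l->serving, now + 1, memory_order_release);
}
```

Only the current holder writes `serving`, so the relaxed read in the unlock path is race-free.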
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: OpenPGP digital signature
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140911/b4129df0/attachment.sig>
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-11 5:11 ` Jan Kiszka
2014-09-11 5:19 ` Jan Kiszka
@ 2014-09-16 11:09 ` Gilles Chanteperdrix
1 sibling, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-16 11:09 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/11/2014 07:11 AM, Jan Kiszka wrote:
> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>> For testing, I've removed the locks from the vfile system. Then the
>>> high latencies reliably disappear.
>>>
>>> To test, I made two xeno_nucleus modules: one with the xnlock_get/put_
>>> in place and one with dummies. Subsequently, I use a program that
>>> simply opens and reads the stat file 1,000 times.
>>>
>>> With locks:
>>>
>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>
>>> Without locks:
>>>
>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>
>>> I'll now have a closer look into the vfile system but if the locks are
>>> malfunctioning, I'm clueless.
>>
>> Answering with a "little" delay, could you try the following patch?
>>
>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>> index a6be0dc..cfb0c71 100644
>> --- a/include/asm-generic/bits/pod.h
>> +++ b/include/asm-generic/bits/pod.h
>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>> cpu_relax();
>> xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>> XNLOCK_DBG_PASS_CONTEXT);
>> + xnarch_memory_barrier();
>> } while(atomic_read(&lock->owner) != ~0);
>> }
>> EXPORT_SYMBOL_GPL(__xnlock_spin);
>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>> index 25bd83f..7a8c4d0 100644
>> --- a/include/asm-generic/system.h
>> +++ b/include/asm-generic/system.h
>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>> xnarch_memory_barrier();
>>
>> atomic_set(&lock->owner, ~0);
>> +
>> + xnarch_memory_barrier();
>
> That's pretty heavy-weighted now (it was already due to the first memory
> barrier). Maybe it's better to look at some ticket lock mechanism like
> Linux uses for fairness. At least on x86 (and other strictly ordered
> archs), those require no memory barriers on release.
Maybe I can use atomic_cmpxchg(cpu, ~0); at least it will be only one
big barrier instead of two. I believe this is what the original xnlock
code did; I do not remember why it got changed to the current bogus
implementation, since cmpxchg even allows a cheap check for invalid
unlocks. I am not too fond of an invasive change such as completely
replacing the lock implementation in 2.6, especially since the issue
we have is a corner case (the problem is caused by /proc/xenomai/stat,
which quickly locks and unlocks the nklock).
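The cmpxchg-based release Gilles sketches could look like this (hypothetical, invented names; assumes `~0` means unlocked as in the nucleus). The single cmpxchg is one full barrier, and its compare doubles as the cheap invalid-unlock check:

```c
#include <stdatomic.h>

#define SK_UNLOCKED (~0)

/* Hypothetical release path: one atomic cmpxchg instead of
 * barrier + atomic_set + barrier. */
static int checked_unlock(atomic_int *owner, int cpu)
{
	int expected = cpu;
	/* succeeds only if 'cpu' really owns the lock; the full barrier
	   both publishes prior writes and propagates the freeing */
	if (!atomic_compare_exchange_strong(owner, &expected, SK_UNLOCKED))
		return -1; /* invalid unlock detected */
	return 0;
}
```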
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-11 5:19 ` Jan Kiszka
@ 2014-09-18 11:46 ` Gilles Chanteperdrix
2014-09-18 11:59 ` Jan Kiszka
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 11:46 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/11/2014 07:19 AM, Jan Kiszka wrote:
> On 2014-09-11 07:11, Jan Kiszka wrote:
>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>> For testing, I've removed the locks from the vfile system.
>>>> Then the high latencies reliably disappear.
>>>>
>>>> To test, I made two xeno_nucleus modules: one with the
>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>> I use a program that simply opens and reads the stat file
>>>> 1,000 times.
>>>>
>>>> With locks:
>>>>
>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>
>>>> Without locks:
>>>>
>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>
>>>> I'll now have a closer look into the vfile system but if the
>>>> locks are malfunctioning, I'm clueless.
>>>
>>> Answering with a "little" delay, could you try the following
>>> patch?
>>>
>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>> index a6be0dc..cfb0c71 100644
>>> --- a/include/asm-generic/bits/pod.h
>>> +++ b/include/asm-generic/bits/pod.h
>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>> cpu_relax();
>>> xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>>> XNLOCK_DBG_PASS_CONTEXT);
>>> + xnarch_memory_barrier();
>>> } while(atomic_read(&lock->owner) != ~0);
>>> }
>>> EXPORT_SYMBOL_GPL(__xnlock_spin);
>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>> index 25bd83f..7a8c4d0 100644
>>> --- a/include/asm-generic/system.h
>>> +++ b/include/asm-generic/system.h
>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>> xnarch_memory_barrier();
>>>
>>> atomic_set(&lock->owner, ~0);
>>> +
>>> + xnarch_memory_barrier();
>>
>> That's pretty heavy-weighted now (it was already due to the first
>> memory barrier). Maybe it's better to look at some ticket lock
>> mechanism like Linux uses for fairness. At least on x86 (and
>> other strictly ordered archs), those require no memory barriers
>> on release.
>
> In fact, memory barriers aren't needed on strictly ordered archs
> already today, independent of the spinlock granting algorithm. So
> there are two optimization possibilities:
>
> - ticket-based granting
> - arch-specific (thus optimized) core
OK, no answer, so I will try to be clearer.
I do not pretend to understand how memory barriers work at a low
level, this is a shame, I know, and am sorry for that. My "high level"
view, is that memory barriers on SMP systems act as synchronization
points, meaning that when a CPU issues a barrier, it will "see" the
state of the other CPUs at the time of their last barrier. This means
that for a CPU to see a store that occurred on another CPU, there must
have been two barriers: a barrier after the store on one CPU, and a
barrier after that, before the read, on the other CPU. This view of
things seems to be corroborated by the fact that the patch works, and
by the following sentence in Documentation/memory-barriers.txt:
(*) There is no guarantee that a CPU will see the correct order of
effects from a second CPU's accesses, even _if_ the second CPU uses a
memory barrier, unless the first CPU _also_ uses a matching memory
barrier (see the subsection on "SMP Barrier Pairing").
So, the lack of a memory barrier after atomic_set in xnlock_put looks
like a bug to me, and your assertion that ticket-based algorithms do
not require memory barriers looks dubious.
Now, I do not really know what "strictly ordered architecture" means,
(a shame, again, sorry) but I suspect it implies strict ordering on
one core, but not amongst cores, so that the two barriers thing
remains mandatory. So, in short, on a fully ordered system, the
barrier before atomic_set can be removed, but the one after atomic_set
is still necessary. If this is the case, then we would simply need to
define an xnarch_local_memory_barrier() which implies ordering on the
current cpu, and that would simply be a compiler barrier on x86, and
we do not need a complete reimplementation of the spinlocks just for
one barrier.
For the same reason, I find that the memory barrier before atomic_read
in __xnlock_spin is necessary. In fact, it is necessary only on x86,
which is the only architecture where cpu_relax() is not defined to be
a barrier; but anyway, I do not believe this barrier is a problem,
since it happens on a slow path.
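The "SMP Barrier Pairing" rule quoted from memory-barriers.txt is the classic publish/consume pattern. A minimal C11 sketch (illustration only, invented names): the writer's barrier after the data store must be matched by a reader-side barrier before the data load.

```c
#include <stdatomic.h>

static _Atomic int payload;
static _Atomic int ready;

/* writer side (one CPU): data store, barrier, flag store */
static void publish(int value)
{
	atomic_store_explicit(&payload, value, memory_order_relaxed);
	atomic_thread_fence(memory_order_release); /* barrier after the store */
	atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* reader side (another CPU): flag load, matching barrier, data load */
static int consume(void)
{
	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		; /* wait for the flag */
	atomic_thread_fence(memory_order_acquire); /* matching barrier before the read */
	return atomic_load_explicit(&payload, memory_order_relaxed);
}
```

Dropping either fence allows the reader to observe the flag but stale data, which is the pairing requirement the quote describes.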
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 11:46 ` Gilles Chanteperdrix
@ 2014-09-18 11:59 ` Jan Kiszka
2014-09-18 12:11 ` Gilles Chanteperdrix
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 11:59 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>> For testing, I've removed the locks from the vfile system.
>>>>> Then the high latencies reliably disappear.
>>>>>
>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>> I use a program that simply opens and reads the stat file
>>>>> 1,000 times.
>>>>>
>>>>> With locks:
>>>>>
>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>
>>>>> Without locks:
>>>>>
>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>
>>>>> I'll now have a closer look into the vfile system but if the
>>>>> locks are malfunctioning, I'm clueless.
>>>>
>>>> Answering with a "little" delay, could you try the following
>>>> patch?
>>>>
>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>> index a6be0dc..cfb0c71 100644
>>>> --- a/include/asm-generic/bits/pod.h
>>>> +++ b/include/asm-generic/bits/pod.h
>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>> cpu_relax();
>>>> xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>>>> XNLOCK_DBG_PASS_CONTEXT);
>>>> + xnarch_memory_barrier();
>>>> } while(atomic_read(&lock->owner) != ~0);
>>>> }
>>>> EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>> index 25bd83f..7a8c4d0 100644
>>>> --- a/include/asm-generic/system.h
>>>> +++ b/include/asm-generic/system.h
>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>> xnarch_memory_barrier();
>>>>
>>>> atomic_set(&lock->owner, ~0);
>>>> +
>>>> + xnarch_memory_barrier();
>>>
>>> That's pretty heavy-weighted now (it was already due to the first
>>> memory barrier). Maybe it's better to look at some ticket lock
>>> mechanism like Linux uses for fairness. At least on x86 (and
>>> other strictly ordered archs), those require no memory barriers
>>> on release.
>
>> In fact, memory barriers aren't needed on strictly ordered archs
>> already today, independent of the spinlock granting algorithm. So
>> there are two optimization possibilities:
>
>> - ticket-based granting
>> - arch-specific (thus optimized) core
>
> Ok, no answer, so I will try to be more clear.
>
> I do not pretend to understand how memory barriers work at a low
> level, this is a shame, I know, and am sorry for that. My "high level"
> view, is that memory barriers on SMP systems act as synchronization
> points, meaning that when a CPU issues a barrier, it will "see" the
> state of the other CPUs at the time of their last barrier. This means
> that for a CPU to see a store that occured on another CPU, there must
> have been two barriers: a barrier after the store on one cpu, and a
> barrier after that before the read on the other cpu. This view of
> things seems to be corroborated by the fact that the patch works, and
> by the following sentence in Documentation/memory-barriers.txt:
>
> (*) There is no guarantee that a CPU will see the correct order of
> effects from a second CPU's accesses, even _if_ the second CPU uses a
> memory barrier, unless the first CPU _also_ uses a matching memory
> barrier (see the subsection on "SMP Barrier Pairing").
[quick answer]
...or the architecture refrains from reordering write requests, like x86
does. What may happen, though, is that the compiler reorders the writes.
Therefore you need at least a (much cheaper) compiler barrier on those
archs. See also linux/Documentation/memory-barriers.txt on this and more.
Jan
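The distinction Jan draws, sketched for GCC/Clang (illustrative only, invented names): a compiler barrier costs nothing at run time, while a full barrier emits a serializing instruction (e.g. mfence on x86).

```c
/* forbids only compile-time reordering of memory accesses across it;
 * emits no machine instruction */
static inline void compiler_barrier(void)
{
	__asm__ __volatile__("" ::: "memory");
}

/* also orders the CPU's own accesses; the GCC builtin
 * __sync_synchronize() emits a real fence (mfence on x86) and acts as
 * a compiler barrier as well */
static inline void full_barrier(void)
{
	__sync_synchronize();
}
```

On a strongly ordered architecture like x86, a spinlock release only needs the first flavor; the second is what the patch adds unconditionally.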
>
> So, the lack of memory barrier after atomic_set in xnlock_put looks
> like a bug to me, and your assertion that ticket based algorithm do
> not require memory barriers looks dubious.
>
> Now, I do not really know what "strictly ordered architecture" means,
> (a shame, again, sorry) but I suspect it implies strict ordering on
> one core, but not amongst cores, so that the two barriers thing
> remains mandatory. So, in short, on a fully ordered system, the
> barrier before atomic_set can be removed, but the one after atomic_set
> is still necessary. If this is the case, then we would simply need to
> define an xnarch_local_memory_barrier() which implies ordering on the
> current cpu, and that would simply be a compiler barrier on x86, and
> we do not need a complete reimplementation of the spinlocks just for
> one barrier.
>
> For the same reason, I find that the memory barrier before atomic_read
> in __xnlock_spin is necessary. In fact it is necessary only on x86
> which is the only architecture where cpu_relax() is not defined to be
> a barrier, but anyway, I do not believe this barrier is a problem
> since it happens on a slow path.
>
>
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 11:59 ` Jan Kiszka
@ 2014-09-18 12:11 ` Gilles Chanteperdrix
2014-09-18 12:17 ` Gilles Chanteperdrix
2014-09-18 20:21 ` Gilles Chanteperdrix
2 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 12:11 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 01:59 PM, Jan Kiszka wrote:
> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>> Then the high latencies reliably disappear.
>>>>>>
>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>> I use a program that simply opens and reads the stat file
>>>>>> 1,000 times.
>>>>>>
>>>>>> With locks:
>>>>>>
>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>
>>>>>> Without locks:
>>>>>>
>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>
>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>> locks are malfunctioning, I'm clueless.
>>>>>
>>>>> Answering with a "little" delay, could you try the following
>>>>> patch?
>>>>>
>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>> index a6be0dc..cfb0c71 100644
>>>>> --- a/include/asm-generic/bits/pod.h
>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>> cpu_relax();
>>>>> xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>>>>> XNLOCK_DBG_PASS_CONTEXT);
>>>>> + xnarch_memory_barrier();
>>>>> } while(atomic_read(&lock->owner) != ~0);
>>>>> }
>>>>> EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>> index 25bd83f..7a8c4d0 100644
>>>>> --- a/include/asm-generic/system.h
>>>>> +++ b/include/asm-generic/system.h
>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>> xnarch_memory_barrier();
>>>>>
>>>>> atomic_set(&lock->owner, ~0);
>>>>> +
>>>>> + xnarch_memory_barrier();
>>>>
>>>> That's pretty heavy-weighted now (it was already due to the first
>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>> other strictly ordered archs), those require no memory barriers
>>>> on release.
>>
>>> In fact, memory barriers aren't needed on strictly ordered archs
>>> already today, independent of the spinlock granting algorithm. So
>>> there are two optimization possibilities:
>>
>>> - ticket-based granting - arch-specific (thus optimized) core
>>
>> Ok, no answer, so I will try to be more clear.
>>
>> I do not pretend to understand how memory barriers work at a low
>> level, this is a shame, I know, and am sorry for that. My "high level"
>> view, is that memory barriers on SMP systems act as synchronization
>> points, meaning that when a CPU issues a barrier, it will "see" the
>> state of the other CPUs at the time of their last barrier. This means
>> that for a CPU to see a store that occured on another CPU, there must
>> have been two barriers: a barrier after the store on one cpu, and a
>> barrier after that before the read on the other cpu. This view of
>> things seems to be corroborated by the fact that the patch works, and
>> by the following sentence in Documentation/memory-barriers.txt:
>>
>> (*) There is no guarantee that a CPU will see the correct order of
>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>> memory barrier, unless the first CPU _also_ uses a matching memory
>> barrier (see the subsection on "SMP Barrier Pairing").
>
> [quick answer]
>
> ...or the architecture refrains from reordering write requests, like x86
> does. What may happen, though, is that the compiler reorders the writes.
> Therefore you need at least a (much cheaper) compiler barrier on those
> archs. See also linux/Documentation/memory-barriers.txt on this and more.
I have answered that; please read the mail completely.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 11:59 ` Jan Kiszka
2014-09-18 12:11 ` Gilles Chanteperdrix
@ 2014-09-18 12:17 ` Gilles Chanteperdrix
2014-09-18 12:20 ` Jan Kiszka
2014-09-18 20:21 ` Gilles Chanteperdrix
2 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 12:17 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 01:59 PM, Jan Kiszka wrote:
> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>> Then the high latencies reliably disappear.
>>>>>>
>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>> I use a program that simply opens and reads the stat file
>>>>>> 1,000 times.
>>>>>>
>>>>>> With locks:
>>>>>>
>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>
>>>>>> Without locks:
>>>>>>
>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>
>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>> locks are malfunctioning, I'm clueless.
>>>>>
>>>>> Answering with a "little" delay, could you try the following
>>>>> patch?
>>>>>
>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>> index a6be0dc..cfb0c71 100644
>>>>> --- a/include/asm-generic/bits/pod.h
>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>  		cpu_relax();
>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>> +		xnarch_memory_barrier();
>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>> index 25bd83f..7a8c4d0 100644
>>>>> --- a/include/asm-generic/system.h
>>>>> +++ b/include/asm-generic/system.h
>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>  	xnarch_memory_barrier();
>>>>> 
>>>>>  	atomic_set(&lock->owner, ~0);
>>>>> +
>>>>> +	xnarch_memory_barrier();
>>>>
>>>> That's pretty heavy-weighted now (it was already due to the first
>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>> other strictly ordered archs), those require no memory barriers
>>>> on release.
>>
>>> In fact, memory barriers aren't needed on strictly ordered archs
>>> already today, independent of the spinlock granting algorithm. So
>>> there are two optimization possibilities:
>>
>>> - ticket-based granting
>>> - arch-specific (thus optimized) core
>>
>> Ok, no answer, so I will try to be more clear.
>>
>> I do not pretend to understand how memory barriers work at a low
>> level, this is a shame, I know, and am sorry for that. My "high level"
>> view, is that memory barriers on SMP systems act as synchronization
>> points, meaning that when a CPU issues a barrier, it will "see" the
>> state of the other CPUs at the time of their last barrier. This means
>> that for a CPU to see a store that occurred on another CPU, there must
>> have been two barriers: a barrier after the store on one cpu, and a
>> barrier after that before the read on the other cpu. This view of
>> things seems to be corroborated by the fact that the patch works, and
>> by the following sentence in Documentation/memory-barriers.txt:
>>
>> (*) There is no guarantee that a CPU will see the correct order of
>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>> memory barrier, unless the first CPU _also_ uses a matching memory
>> barrier (see the subsection on "SMP Barrier Pairing").
>
> [quick answer]
>
> ...or the architecture refrains from reordering write requests, like x86
> does. What may happen, though, is that the compiler reorders the writes.
> Therefore you need at least a (much cheaper) compiler barrier on those
> archs. See also linux/Documentation/memory-barriers.txt on this and more.
quick answer: I do not believe an SMP architecture can enforce store
ordering across multiple CPUs, given per-CPU local caches and the like. And
the fact that the patch I sent fixed the issue on x86 tends to prove me right.
The only reason not to put the barrier after the atomic_set would be
some kind of "optimistic unlocking" optimization, where we leave the store
pending until the next barrier, trusting that such a barrier will
happen inevitably. This reduces the spinlock overhead on one CPU, at
the expense of the spinning time of the other CPUs in the contended case.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 12:17 ` Gilles Chanteperdrix
@ 2014-09-18 12:20 ` Jan Kiszka
2014-09-18 13:05 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 12:20 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>> Then the high latencies reliably disappear.
>>>>>>>
>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>> 1,000 times.
>>>>>>>
>>>>>>> With locks:
>>>>>>>
>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>
>>>>>>> Without locks:
>>>>>>>
>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>
>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>
>>>>>> Answering with a "little" delay, could you try the following
>>>>>> patch?
>>>>>>
>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>> index a6be0dc..cfb0c71 100644
>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>  		cpu_relax();
>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>> +		xnarch_memory_barrier();
>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>  }
>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>> --- a/include/asm-generic/system.h
>>>>>> +++ b/include/asm-generic/system.h
>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>  	xnarch_memory_barrier();
>>>>>> 
>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>> +
>>>>>> +	xnarch_memory_barrier();
>>>>>
>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>> other strictly ordered archs), those require no memory barriers
>>>>> on release.
>>>
>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>> already today, independent of the spinlock granting algorithm. So
>>>> there are two optimization possibilities:
>>>
>>>> - ticket-based granting
>>>> - arch-specific (thus optimized) core
>>>
>>> Ok, no answer, so I will try to be more clear.
>>>
>>> I do not pretend to understand how memory barriers work at a low
>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>> view, is that memory barriers on SMP systems act as synchronization
>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>> state of the other CPUs at the time of their last barrier. This means
>>> that for a CPU to see a store that occurred on another CPU, there must
>>> have been two barriers: a barrier after the store on one cpu, and a
>>> barrier after that before the read on the other cpu. This view of
>>> things seems to be corroborated by the fact that the patch works, and
>>> by the following sentence in Documentation/memory-barriers.txt:
>>>
>>> (*) There is no guarantee that a CPU will see the correct order of
>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>> barrier (see the subsection on "SMP Barrier Pairing").
>>
>> [quick answer]
>>
>> ...or the architecture refrains from reordering write requests, like x86
>> does. What may happen, though, is that the compiler reorders the writes.
>> Therefore you need at least a (much cheaper) compiler barrier on those
>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>
> quick answer: I do not believe an SMP architecture can enforce store
> ordering across multiple CPUs, given per-CPU local caches and the like. And
> the fact that the patch I sent fixed the issue on x86 tends to prove me right.
It's not wrong, it's just (costly, on larger machines) overkill as the
other cores either see the lock release and all prior changes committed
or the lock taken (and the prior changes do not matter then). They will
never see later changes committed before the lock being visible as free.
That's architecturally guaranteed, and that's why you have no memory
barriers in x86 spinlock release operations.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 12:20 ` Jan Kiszka
@ 2014-09-18 13:05 ` Gilles Chanteperdrix
2014-09-18 13:26 ` Jan Kiszka
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 13:05 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 02:20 PM, Jan Kiszka wrote:
> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>
>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>> 1,000 times.
>>>>>>>>
>>>>>>>> With locks:
>>>>>>>>
>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>
>>>>>>>> Without locks:
>>>>>>>>
>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>
>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>
>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>> patch?
>>>>>>>
>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>  		cpu_relax();
>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>> +		xnarch_memory_barrier();
>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>> --- a/include/asm-generic/system.h
>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>  	xnarch_memory_barrier();
>>>>>>> 
>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>> +
>>>>>>> +	xnarch_memory_barrier();
>>>>>>
>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>> on release.
>>>>
>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>> already today, independent of the spinlock granting algorithm. So
>>>>> there are two optimization possibilities:
>>>>
>>>>> - ticket-based granting
>>>>> - arch-specific (thus optimized) core
>>>>
>>>> Ok, no answer, so I will try to be more clear.
>>>>
>>>> I do not pretend to understand how memory barriers work at a low
>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>> view, is that memory barriers on SMP systems act as synchronization
>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>> state of the other CPUs at the time of their last barrier. This means
>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>> barrier after that before the read on the other cpu. This view of
>>>> things seems to be corroborated by the fact that the patch works, and
>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>
>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>
>>> [quick answer]
>>>
>>> ...or the architecture refrains from reordering write requests, like x86
>>> does. What may happen, though, is that the compiler reorders the writes.
>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>
>> quick answer: I do not believe an SMP architecture can enforce store
>> ordering across multiple CPUs, given per-CPU local caches and the like. And
>> the fact that the patch I sent fixed the issue on x86 tends to prove me right.
>
> It's not wrong, it's just (costly, on larger machines) overkill as the
> other cores either see the lock release and all prior changes committed
> or the lock taken (and the prior changes do not matter then). They will
> never see later changes committed before the lock being visible as free.
I agree. But this is true on all architectures, not just on strictly
ordered ones, this is just due to how barriers work on SMP systems, as I
explained.
> That's architecturally guaranteed, and that's why you have no memory
> barriers in x86 spinlock release operations.
I disagree, as explained in the paragraph just below the one you quote,
I believe this is an optimization, which is almost valid on any
architecture. Almost valid, because if the cpu which has done the unlock
does another lock without any time for a barrier in between to
synchronize cpus, we have a problem, because the other cpus will never
see the spinlock as free. With ticket spinlocks, you just add a store on
the cpu which spins, and you have to add a barrier after that, if you
want the barrier before the read on the cpu which will acquire the lock
to see that the spinlock is contended. So I do not see how this requires
fewer barriers.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 13:05 ` Gilles Chanteperdrix
@ 2014-09-18 13:26 ` Jan Kiszka
2014-09-18 13:44 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 13:26 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>
>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>> 1,000 times.
>>>>>>>>>
>>>>>>>>> With locks:
>>>>>>>>>
>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>
>>>>>>>>> Without locks:
>>>>>>>>>
>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>
>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>
>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>> patch?
>>>>>>>>
>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>  		cpu_relax();
>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>  }
>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>> 
>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>> +
>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>
>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>> on release.
>>>>>
>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>> there are two optimization possibilities:
>>>>>
>>>>>> - ticket-based granting
>>>>>> - arch-specific (thus optimized) core
>>>>>
>>>>> Ok, no answer, so I will try to be more clear.
>>>>>
>>>>> I do not pretend to understand how memory barriers work at a low
>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>> barrier after that before the read on the other cpu. This view of
>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>
>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>
>>>> [quick answer]
>>>>
>>>> ...or the architecture refrains from reordering write requests, like x86
>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>
>>> quick answer: I do not believe an SMP architecture can enforce store
>>> ordering across multiple CPUs, given per-CPU local caches and the like. And
>>> the fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>
>> It's not wrong, it's just (costly, on larger machines) overkill as the
>> other cores either see the lock release and all prior changes committed
>> or the lock taken (and the prior changes do not matter then). They will
>> never see later changes committed before the lock being visible as free.
>
> I agree. But this is true on all architectures, not just on strictly
> ordered ones, this is just due to how barriers work on SMP systems, as I
> explained.
>
>> That's architecturally guaranteed, and that's why you have no memory
>> barriers in x86 spinlock release operations.
>
> I disagree, as explained in the paragraph just below the one you quote,
> I believe this is an optimization, which is almost valid on any
> architecture. Almost valid, because if the cpu which has done the unlock
> does another lock without any time for a barrier in between to
> synchronize cpus, we have a problem, because the other cpus will never
> see the spinlock as free. With ticket spinlocks, you just add a store on
> the cpu which spins, and you have to add a barrier after that, if you
> want the barrier before the read on the cpu which will acquire the lock
> to see that the spinlock is contended. So I do not see how this requires
> less barriers.
Ticket locks prevent unfairness and starvation without the closing
barrier because they grant the next ticket to the next waiter, not to
the current holder. See the Linux implementation.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 13:26 ` Jan Kiszka
@ 2014-09-18 13:44 ` Gilles Chanteperdrix
2014-09-18 16:14 ` Jan Kiszka
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 13:44 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 03:26 PM, Jan Kiszka wrote:
> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>
>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>> 1,000 times.
>>>>>>>>>>
>>>>>>>>>> With locks:
>>>>>>>>>>
>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>
>>>>>>>>>> Without locks:
>>>>>>>>>>
>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>
>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>
>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>> patch?
>>>>>>>>>
>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>  		cpu_relax();
>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>  }
>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>> 
>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>> +
>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>
>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>> on release.
>>>>>>
>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>> there are two optimization possibilities:
>>>>>>
>>>>>>> - ticket-based granting
>>>>>>> - arch-specific (thus optimized) core
>>>>>>
>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>
>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>
>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>
>>>>> [quick answer]
>>>>>
>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>
>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>> ordering across multiple cpus, with CPU-local caches and such. And the
>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>
>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>> other cores either see the lock release and all prior changes committed
>>> or the lock taken (and the prior changes do not matter then). They will
>>> never see later changes committed before the lock being visible as free.
>>
>> I agree. But this is true on all architectures, not just on strictly
>> ordered ones, this is just due to how barriers work on SMP systems, as I
>> explained.
>>
>>> That's architecturally guaranteed, and that's why you have no memory
>>> barriers in x86 spinlock release operations.
>>
>> I disagree, as explained in the paragraph just below the one you quote,
>> I believe this is an optimization, which is almost valid on any
>> architecture. Almost valid, because if the cpu which has done the unlock
>> does another lock without any time for a barrier in between to
>> synchronize cpus, we have a problem, because the other cpus will never
>> see the spinlock as free. With ticket spinlocks, you just add a store on
>> the cpu which spins, and you have to add a barrier after that, if you
>> want the barrier before the read on the cpu which will acquire the lock
>> to see that the spinlock is contended. So I do not see how this requires
>> less barriers.
>
> Ticket locks prevent unfair starvation without the closing barrier as
> they grant the next ticket to the next waiter, not the current holder.
> See the Linux implementation.
Whether to put the closing barrier after the last store is orthogonal
to whether ticket locks are implemented or not. This is all a question
of tradeoffs.
Without the barrier after the last store, you increase the spinning time,
because of the time it takes for the store to become visible on the other
cpus, but you reduce the overhead of unlocking.
With ticket spinlocks you avoid the starvation situation, at the expense
of increasing the overhead of the spinlock operations.
I do not know which is worse. I suspect all this does not make much of a
difference, and what dominates is the duration of spinlock sections anyway.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 13:44 ` Gilles Chanteperdrix
@ 2014-09-18 16:14 ` Jan Kiszka
2014-09-18 16:28 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 16:14 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>
>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>
>>>>>>>>>>> With locks:
>>>>>>>>>>>
>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period,
>>>>>>>>>>> priority 99) RTH|----lat min|----lat avg|----lat
>>>>>>>>>>> max|-overrun|---msw|---lat best|--lat worst RTD| -2.575|
>>>>>>>>>>> -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0|
>>>>>>>>>>> -2.575| 9.286 RTD| -2.482| -2.274| 2.165|
>>>>>>>>>>> 0| 0| -2.575| 9.286 RTD| -2.368| 135.261|
>>>>>>>>>>> 1478.154| 13008| 0| -2.575| 1478.154 RTD|
>>>>>>>>>>> -2.368| -2.272| 2.602| 13008| 0| -2.575|
>>>>>>>>>>> 1478.154 RTD| -2.499| -2.272| 6.933| 13008|
>>>>>>>>>>> 0| -2.575| 1478.154
>>>>>>>>>>>
>>>>>>>>>>> Without locks:
>>>>>>>>>>>
>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period,
>>>>>>>>>>> priority 99) RTH|----lat min|----lat avg|----lat
>>>>>>>>>>> max|-overrun|---msw|---lat best|--lat worst RTD| -2.503|
>>>>>>>>>>> -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0|
>>>>>>>>>>> -2.503| 3.310 RTD| -2.496| -2.275| 4.630|
>>>>>>>>>>> 0| 0| -2.503| 4.630 RTD| -2.374| -2.285|
>>>>>>>>>>> -1.458| 0| 0| -2.503| 4.630 RTD|
>>>>>>>>>>> -2.452| -2.273| 3.559| 0| 0| -2.503|
>>>>>>>>>>> 4.630 RTD| -2.370| -2.285| -1.518| 0|
>>>>>>>>>>> 0| -2.503| 4.630 RTD| -2.458| -2.274|
>>>>>>>>>>> 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>
>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>
>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>> patch?
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h
>>>>>>>>>> b/include/asm-generic/bits/pod.h index a6be0dc..cfb0c71 100644
>>>>>>>>>> --- a/include/asm-generic/bits/pod.h +++
>>>>>>>>>> b/include/asm-generic/bits/pod.h @@ -248,6 +248,7 @@ void
>>>>>>>>>> __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>> cpu_relax(); xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>>>>>>>>>> XNLOCK_DBG_PASS_CONTEXT); + xnarch_memory_barrier(); }
>>>>>>>>>> while(atomic_read(&lock->owner) != ~0); }
>>>>>>>>>> EXPORT_SYMBOL_GPL(__xnlock_spin); diff --git
>>>>>>>>>> a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>> index 25bd83f..7a8c4d0 100644 ---
>>>>>>>>>> a/include/asm-generic/system.h +++
>>>>>>>>>> b/include/asm-generic/system.h @@ -378,6 +378,8 @@ static
>>>>>>>>>> inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>> xnarch_memory_barrier();
>>>>>>>>>>
>>>>>>>>>> atomic_set(&lock->owner, ~0); + + xnarch_memory_barrier();
>>>>>>>>>
>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>> on release.
>>>>>>>
>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>> there are two optimization possibilities:
>>>>>>>
>>>>>>>> - ticket-based granting - arch-specific (thus optimized) core
>>>>>>>
>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>
>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>
>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>
>>>>>> [quick answer]
>>>>>>
>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>
>>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>>> ordering across multiple cpus, with CPU-local caches and such. And the
>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>
>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>> other cores either see the lock release and all prior changes committed
>>>> or the lock taken (and the prior changes do not matter then). They will
>>>> never see later changes committed before the lock being visible as free.
>>>
>>> I agree. But this is true on all architectures, not just on strictly
>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>> explained.
>>>
>>>> That's architecturally guaranteed, and that's why you have no memory
>>>> barriers in x86 spinlock release operations.
>>>
>>> I disagree, as explained in the paragraph just below the one you quote,
>>> I believe this is an optimization, which is almost valid on any
>>> architecture. Almost valid, because if the cpu which has done the unlock
>>> does another lock without any time for a barrier in between to
>>> synchronize cpus, we have a problem, because the other cpus will never
>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>> the cpu which spins, and you have to add a barrier after that, if you
>>> want the barrier before the read on the cpu which will acquire the lock
>>> to see that the spinlock is contended. So I do not see how this requires
>>> less barriers.
>>
>> Ticket locks prevent unfair starvation without the closing barrier as
>> they grant the next ticket to the next waiter, not the current holder.
>> See the Linux implementation.
>
> Whether to put the closing barrier after the last store is orthogonal
> to whether ticket locks are implemented or not. This is all a question of
> tradeoffs.
>
> Without the barrier after the last store, you increase the spinning time
> due to time taken for the store to be visible on other cpus, but you
> optimize the overhead of unlocking.
>
> With ticket spinlocks you avoid the starvation situation, at the expense
> of increasing the overhead of spinlock operations.
>
> I do not know which is worse. I suspect all this does not make much of a
> difference, and what dominates is the duration of spinlock sections anyway.
I think the way the classic Linux spinlocks did this on x86 provides the answer.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 16:14 ` Jan Kiszka
@ 2014-09-18 16:28 ` Gilles Chanteperdrix
2014-09-18 18:39 ` Gilles Chanteperdrix
2014-09-18 19:09 ` Jan Kiszka
0 siblings, 2 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 16:28 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 06:14 PM, Jan Kiszka wrote:
> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>
>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>
>>>>>>>>>>>> With locks:
>>>>>>>>>>>>
>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period,
>>>>>>>>>>>> priority 99) RTH|----lat min|----lat avg|----lat
>>>>>>>>>>>> max|-overrun|---msw|---lat best|--lat worst RTD| -2.575|
>>>>>>>>>>>> -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0|
>>>>>>>>>>>> -2.575| 9.286 RTD| -2.482| -2.274| 2.165|
>>>>>>>>>>>> 0| 0| -2.575| 9.286 RTD| -2.368| 135.261|
>>>>>>>>>>>> 1478.154| 13008| 0| -2.575| 1478.154 RTD|
>>>>>>>>>>>> -2.368| -2.272| 2.602| 13008| 0| -2.575|
>>>>>>>>>>>> 1478.154 RTD| -2.499| -2.272| 6.933| 13008|
>>>>>>>>>>>> 0| -2.575| 1478.154
>>>>>>>>>>>>
>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>
>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period,
>>>>>>>>>>>> priority 99) RTH|----lat min|----lat avg|----lat
>>>>>>>>>>>> max|-overrun|---msw|---lat best|--lat worst RTD| -2.503|
>>>>>>>>>>>> -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0|
>>>>>>>>>>>> -2.503| 3.310 RTD| -2.496| -2.275| 4.630|
>>>>>>>>>>>> 0| 0| -2.503| 4.630 RTD| -2.374| -2.285|
>>>>>>>>>>>> -1.458| 0| 0| -2.503| 4.630 RTD|
>>>>>>>>>>>> -2.452| -2.273| 3.559| 0| 0| -2.503|
>>>>>>>>>>>> 4.630 RTD| -2.370| -2.285| -1.518| 0|
>>>>>>>>>>>> 0| -2.503| 4.630 RTD| -2.458| -2.274|
>>>>>>>>>>>> 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>
>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>
>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>> patch?
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h
>>>>>>>>>>> b/include/asm-generic/bits/pod.h index a6be0dc..cfb0c71 100644
>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h +++
>>>>>>>>>>> b/include/asm-generic/bits/pod.h @@ -248,6 +248,7 @@ void
>>>>>>>>>>> __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>> cpu_relax(); xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>>>>>>>>>>> XNLOCK_DBG_PASS_CONTEXT); + xnarch_memory_barrier(); }
>>>>>>>>>>> while(atomic_read(&lock->owner) != ~0); }
>>>>>>>>>>> EXPORT_SYMBOL_GPL(__xnlock_spin); diff --git
>>>>>>>>>>> a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>> index 25bd83f..7a8c4d0 100644 ---
>>>>>>>>>>> a/include/asm-generic/system.h +++
>>>>>>>>>>> b/include/asm-generic/system.h @@ -378,6 +378,8 @@ static
>>>>>>>>>>> inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>> xnarch_memory_barrier();
>>>>>>>>>>>
>>>>>>>>>>> atomic_set(&lock->owner, ~0); + + xnarch_memory_barrier();
>>>>>>>>>>
>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>> on release.
>>>>>>>>
>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>> there are two optimization possibilities:
>>>>>>>>
>>>>>>>>> - ticket-based granting - arch-specific (thus optimized) core
>>>>>>>>
>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>
>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>
>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>
>>>>>>> [quick answer]
>>>>>>>
>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>
>>>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>>>> ordering across multiple cpus, with CPU-local caches and such. And the
>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>
>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>> other cores either see the lock release and all prior changes committed
>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>> never see later changes committed before the lock being visible as free.
>>>>
>>>> I agree. But this is true on all architectures, not just on strictly
>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>> explained.
>>>>
>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>> barriers in x86 spinlock release operations.
>>>>
>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>> I believe this is an optimization, which is almost valid on any
>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>> does another lock without any time for a barrier in between to
>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>> want the barrier before the read on the cpu which will acquire the lock
>>>> to see that the spinlock is contended. So I do not see how this requires
>>>> less barriers.
>>>
>>> Ticket locks prevent unfair starvation without the closing barrier as
>>> they grant the next ticket to the next waiter, not the current holder.
>>> See the Linux implementation.
>>
>> Whether to put the closing barrier after the last store is orthogonal
>> to whether ticket locks are implemented or not. This is all a question of
>> tradeoffs.
>>
>> Without the barrier after the last store, you increase the spinning time
>> due to time taken for the store to be visible on other cpus, but you
>> optimize the overhead of unlocking.
>>
>> With ticket spinlocks you avoid the starvation situation, at the expense
>> of increasing the overhead of spinlock operations.
>>
>> I do not know which is worse. I suspect all this does not make much of a
>> difference, and what dominates is the duration of spinlock sections anyway.
>
> I think the way classic Linux spinlock did this on x86 provide the answer.
The situation is completely different: Linux spinlocks are finely split,
whereas Xenomai basically has only one spinlock, so chances are that it
will be more contended, and the heavy unlock path (the one which
implements the ticket machinery) will be triggered more often. Also, the
Xenomai spinlock (we can lose the 's' anyway) being more contended, the
"pending store barrier" optimization in fact has a good chance of being
detrimental. And finally, due to the way its spinlocks are split, Linux
has scalability issues that Xenomai cannot even begin to imagine tackling.
Anyway, the discussion is kind of moot because, as I said, we are not
going to change the spinlock implementation in 2.6. What we are
discussing here is whether to put the barrier after the atomic_set, or
whether to put that barrier where it is really needed: in the snapshot
code, and what to do for forge. I also agree that the barrier before the
atomic_set in xnlock_put is not needed on x86, and I proposed an
architecture macro to replace it with a compiler barrier in that case.
I also proposed to replace the atomic_set with a cmpxchg; cmpxchg implies
two barriers on ARM, but I guess on x86 it is only one, which would solve
the architecture dependency nicely.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 16:28 ` Gilles Chanteperdrix
@ 2014-09-18 18:39 ` Gilles Chanteperdrix
2014-09-18 19:23 ` Jan Kiszka
2014-09-18 19:09 ` Jan Kiszka
1 sibling, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 18:39 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 06:28 PM, Gilles Chanteperdrix wrote:
> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period,
>>>>>>>>>>>>> priority 99) RTH|----lat min|----lat avg|----lat
>>>>>>>>>>>>> max|-overrun|---msw|---lat best|--lat worst RTD| -2.575|
>>>>>>>>>>>>> -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0|
>>>>>>>>>>>>> -2.575| 9.286 RTD| -2.482| -2.274| 2.165|
>>>>>>>>>>>>> 0| 0| -2.575| 9.286 RTD| -2.368| 135.261|
>>>>>>>>>>>>> 1478.154| 13008| 0| -2.575| 1478.154 RTD|
>>>>>>>>>>>>> -2.368| -2.272| 2.602| 13008| 0| -2.575|
>>>>>>>>>>>>> 1478.154 RTD| -2.499| -2.272| 6.933| 13008|
>>>>>>>>>>>>> 0| -2.575| 1478.154
>>>>>>>>>>>>>
>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period,
>>>>>>>>>>>>> priority 99) RTH|----lat min|----lat avg|----lat
>>>>>>>>>>>>> max|-overrun|---msw|---lat best|--lat worst RTD| -2.503|
>>>>>>>>>>>>> -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0|
>>>>>>>>>>>>> -2.503| 3.310 RTD| -2.496| -2.275| 4.630|
>>>>>>>>>>>>> 0| 0| -2.503| 4.630 RTD| -2.374| -2.285|
>>>>>>>>>>>>> -1.458| 0| 0| -2.503| 4.630 RTD|
>>>>>>>>>>>>> -2.452| -2.273| 3.559| 0| 0| -2.503|
>>>>>>>>>>>>> 4.630 RTD| -2.370| -2.285| -1.518| 0|
>>>>>>>>>>>>> 0| -2.503| 4.630 RTD| -2.458| -2.274|
>>>>>>>>>>>>> 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>
>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>> patch?
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h
>>>>>>>>>>>> b/include/asm-generic/bits/pod.h index a6be0dc..cfb0c71 100644
>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h +++
>>>>>>>>>>>> b/include/asm-generic/bits/pod.h @@ -248,6 +248,7 @@ void
>>>>>>>>>>>> __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>> cpu_relax(); xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */
>>>>>>>>>>>> XNLOCK_DBG_PASS_CONTEXT); + xnarch_memory_barrier(); }
>>>>>>>>>>>> while(atomic_read(&lock->owner) != ~0); }
>>>>>>>>>>>> EXPORT_SYMBOL_GPL(__xnlock_spin); diff --git
>>>>>>>>>>>> a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644 ---
>>>>>>>>>>>> a/include/asm-generic/system.h +++
>>>>>>>>>>>> b/include/asm-generic/system.h @@ -378,6 +378,8 @@ static
>>>>>>>>>>>> inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>> xnarch_memory_barrier();
>>>>>>>>>>>>
>>>>>>>>>>>> atomic_set(&lock->owner, ~0); + + xnarch_memory_barrier();
>>>>>>>>>>>
>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>> on release.
>>>>>>>>>
>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>
>>>>>>>>>> - ticket-based granting - arch-specific (thus optimized) core
>>>>>>>>>
>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>
>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>
>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>
>>>>>>>> [quick answer]
>>>>>>>>
>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>
>>>>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>>>>> ordering across multiple cpus, with CPU-local caches and such. And the
>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>
>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>> other cores either see the lock release and all prior changes committed
>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>> never see later changes committed before the lock being visible as free.
>>>>>
>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>> explained.
>>>>>
>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>> barriers in x86 spinlock release operations.
>>>>>
>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>> I believe this is an optimization, which is almost valid on any
>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>> does another lock without any time for a barrier in between to
>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>> less barriers.
>>>>
>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>> they grant the next ticket to the next waiter, not the current holder.
>>>> See the Linux implementation.
>>>
>>> Whether to put the closing barrier after the last store is orthogonal
>>> to whether ticket locks are implemented or not. This is all a question of
>>> tradeoffs.
>>>
>>> Without the barrier after the last store, you increase the spinning time
>>> due to time taken for the store to be visible on other cpus, but you
>>> optimize the overhead of unlocking.
>>>
>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>> of increasing the overhead of spinlock operations.
>>>
>>> I do not know which is worse. I suspect all this does not make much of a
>>> difference, and what dominates is the duration of spinlock sections anyway.
>>
>> I think the way classic Linux spinlock did this on x86 provide the answer.
>
> The situation is completely different: linux spinlocks are well split,
> xenomai basically has one only spinlock, so chances are that it will be
> more contended, so the heavy unlock path (the one which implements the
> ticket stuff) will be triggered more often. Also, xenomai spinlock (we
> can loose the s anyway) being more contended, the "pending store
> barrier" optimization has in fact chances of being detrimental. And
> finally, due to the way spinlocks are split, Linux has scalability
> issues that Xenomai can not even begin to imagine tackling.
Finally, in the eternal worst-case vs. average-case fight, the worst
case worth optimizing is, in our case, the contended one, and I believe
adding the barrier after the atomic_set in xnlock_put is what optimizes
this worst case best, because, again, it reduces the time between the
unlock and its visibility on the spinning cpu. That is at least something
Linux does not have to care about, because the worst case is not what it
is optimized for.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 16:28 ` Gilles Chanteperdrix
2014-09-18 18:39 ` Gilles Chanteperdrix
@ 2014-09-18 19:09 ` Jan Kiszka
2014-09-18 19:32 ` Gilles Chanteperdrix
1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 19:09 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 18:28, Gilles Chanteperdrix wrote:
> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>
>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>
>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>
>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>
>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>> patch?
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>>  		cpu_relax();
>>>>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>>>>  }
>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>>>>>
>>>>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>>>>> +
>>>>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>>>>
>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>> on release.
>>>>>>>>>
>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>
>>>>>>>>>> - ticket-based granting
>>>>>>>>>> - arch-specific (thus optimized) core
>>>>>>>>>
>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>
>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>> view is that memory barriers on SMP systems act as synchronization
>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>
>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>
>>>>>>>> [quick answer]
>>>>>>>>
>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>
>>>>>>> quick answer: I do not believe an SMP architecture can enforce store
>>>>>>> ordering across multiple CPUs, with CPU-local caches and such. And the
>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>
>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>> other cores either see the lock release and all prior changes committed
>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>> never see later changes committed before the lock being visible as free.
>>>>>
>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>> explained.
>>>>>
>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>> barriers in x86 spinlock release operations.
>>>>>
>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>> I believe this is an optimization, which is almost valid on any
>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>> does another lock without any time for a barrier in between to
>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>> less barriers.
>>>>
>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>> they grant the next ticket to the next waiter, not the current holder.
>>>> See the Linux implementation.
>>>
>>> Whether to put the closing barrier after the last store is orthogonal
>>> to whether ticket locks are implemented or not. This is all a question of
>>> tradeoffs.
>>>
>>> Without the barrier after the last store, you increase the spinning time
>>> due to time taken for the store to be visible on other cpus, but you
>>> optimize the overhead of unlocking.
>>>
>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>> of increasing the overhead of spinlock operations.
>>>
>>> I do not know which is worse. I suspect all this does not make much of a
>>> difference, and what dominates is the duration of spinlock sections anyway.
>>
>> I think the way the classic Linux spinlock did this on x86 provides the answer.
>
> The situation is completely different: Linux spinlocks are well split,
> while Xenomai basically has only one spinlock, so chances are that it
> will be more contended, and the heavy unlock path (the one which
> implements the ticket stuff) will be triggered more often. Also, the
> Xenomai spinlock (we can lose the s anyway) being more contended, the
> "pending store barrier" optimization may in fact be detrimental. And
> finally, due to the way spinlocks are split, Linux has scalability
> issues that Xenomai can not even begin to imagine tackling.
>
> Anyway, the discussion is kind of moot, because as I said, we are not
> going to change the spinlock implementation in 2.6. What we are
> discussing here is whether to put the barrier after the atomic_set, or
> whether to put that barrier where it is really needed: in the snapshot
> code, and what to do for forge. I also agree that the barrier before the
> atomic_set in xnlock_put is not needed on x86 and proposed an
> architecture macro to replace it with a compiler barrier in that case.
Yes, seems reasonable.
>
> I also proposed to replace the atomic_set with a cmpxchg; cmpxchg has
> two barriers on ARM, but I guess on x86 it is only one barrier. This
> would solve the architecture dependency nicely.
That saves an abstraction, but I have no clue whether "mfence" is as
expensive as "lock cmpxchg". If it is, that's fine, but I suspect it's
not (due to the cacheline "lock").
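A sketch of the cmpxchg variant being discussed, in C11 atomics (hypothetical names, illustrative only): the atomic read-modify-write carries full-barrier semantics on x86 (it is lock-prefixed) and both needed barriers on ARM, so no explicit fence remains in the unlock path.

```c
#include <stdatomic.h>
#include <assert.h>

#define XNLOCK_UNOWNED (~0)

/* Hypothetical cmpxchg-based unlock: release ownership only if we
 * still hold it. The compare-and-swap is itself a full barrier on
 * x86, replacing both explicit xnarch_memory_barrier() calls. */
static _Bool model_put_cmpxchg(atomic_int *owner, int cpu)
{
	int expected = cpu;

	return atomic_compare_exchange_strong(owner, &expected,
					      XNLOCK_UNOWNED);
}
```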
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 18:39 ` Gilles Chanteperdrix
@ 2014-09-18 19:23 ` Jan Kiszka
2014-09-18 19:31 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 19:23 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 20:39, Gilles Chanteperdrix wrote:
> On 09/18/2014 06:28 PM, Gilles Chanteperdrix wrote:
>> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>>>  		cpu_relax();
>>>>>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>>>>>>
>>>>>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>>>>>
>>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>>> on release.
>>>>>>>>>>
>>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>>
>>>>>>>>>>> - ticket-based granting
>>>>>>>>>>> - arch-specific (thus optimized) core
>>>>>>>>>>
>>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>>
>>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>>> view is that memory barriers on SMP systems act as synchronization
>>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>>
>>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>>
>>>>>>>>> [quick answer]
>>>>>>>>>
>>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>>
>>>>>>>> quick answer: I do not believe an SMP architecture can enforce store
>>>>>>>> ordering across multiple CPUs, with CPU-local caches and such. And the
>>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>>
>>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>>> other cores either see the lock release and all prior changes committed
>>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>>> never see later changes committed before the lock being visible as free.
>>>>>>
>>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>>> explained.
>>>>>>
>>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>>> barriers in x86 spinlock release operations.
>>>>>>
>>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>>> I believe this is an optimization, which is almost valid on any
>>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>>> does another lock without any time for a barrier in between to
>>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>>> less barriers.
>>>>>
>>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>>> they grant the next ticket to the next waiter, not the current holder.
>>>>> See the Linux implementation.
>>>>
>>>> Whether to put the closing barrier after the last store is orthogonal
>>>> to whether ticket locks are implemented or not. This is all a question of
>>>> tradeoffs.
>>>>
>>>> Without the barrier after the last store, you increase the spinning time
>>>> due to time taken for the store to be visible on other cpus, but you
>>>> optimize the overhead of unlocking.
>>>>
>>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>>> of increasing the overhead of spinlock operations.
>>>>
>>>> I do not know which is worse. I suspect all this does not make much of a
>>>> difference, and what dominates is the duration of spinlock sections anyway.
>>>
>>> I think the way the classic Linux spinlock did this on x86 provides the answer.
>>
>> The situation is completely different: Linux spinlocks are well split,
>> while Xenomai basically has only one spinlock, so chances are that it
>> will be more contended, and the heavy unlock path (the one which
>> implements the ticket stuff) will be triggered more often. Also, the
>> Xenomai spinlock (we can lose the s anyway) being more contended, the
>> "pending store barrier" optimization may in fact be detrimental. And
>> finally, due to the way spinlocks are split, Linux has scalability
>> issues that Xenomai can not even begin to imagine tackling.
>
> Finally, in the eternal worst case vs average case fight, the worst case
> worth optimizing is the contended case in our case, and I believe adding
> the barrier after atomic_set in xnlock_put is what optimizes this worst
> case best, because, again, it reduces the time between the unlock and
> its visibility on the spinning cpu. That is at least something Linux
> does not have to care about, because the worst case is not what it is
> optimized for.
Maybe. I'm unsure right now whether we see prolonged spinning time due
to this on x86. I suspect not, as spinning is not only increasing
latencies but also burning CPU power uselessly, and that would be
noticed and disliked under Linux.
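As a side note on the cost of spinning: a busy-wait of the kind __xnlock_spin implements is typically written with a pause hint, roughly like this illustrative C11 sketch (the GCC x86 pause builtin stands in for cpu_relax() here):

```c
#include <stdatomic.h>
#include <assert.h>

/* Illustrative busy-wait in the style of __xnlock_spin: spin until
 * the owner word reads as free (~0). The pause hint lowers power
 * draw and SMT contention while spinning, which is why prolonged
 * spinning would be noticed under Linux. */
static void spin_until_free(atomic_int *owner)
{
	while (atomic_load_explicit(owner, memory_order_acquire) != ~0) {
#if defined(__x86_64__) || defined(__i386__)
		__builtin_ia32_pause();	/* x86 "pause", i.e. cpu_relax() */
#endif
	}
}
```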
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 19:23 ` Jan Kiszka
@ 2014-09-18 19:31 ` Gilles Chanteperdrix
0 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 19:31 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 09:23 PM, Jan Kiszka wrote:
> On 2014-09-18 20:39, Gilles Chanteperdrix wrote:
>> On 09/18/2014 06:28 PM, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>>>>  		cpu_relax();
>>>>>>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>>>> on release.
>>>>>>>>>>>
>>>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>>>
>>>>>>>>>>>> - ticket-based granting
>>>>>>>>>>>> - arch-specific (thus optimized) core
>>>>>>>>>>>
>>>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>>>
>>>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>>>> view is that memory barriers on SMP systems act as synchronization
>>>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>>>
>>>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>>>
>>>>>>>>>> [quick answer]
>>>>>>>>>>
>>>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>>>
>>>>>>>>> quick answer: I do not believe an SMP architecture can enforce store
>>>>>>>>> ordering across multiple CPUs, with CPU-local caches and such. And the
>>>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>>>
>>>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>>>> other cores either see the lock release and all prior changes committed
>>>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>>>> never see later changes committed before the lock being visible as free.
>>>>>>>
>>>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>>>> explained.
>>>>>>>
>>>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>>>> barriers in x86 spinlock release operations.
>>>>>>>
>>>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>>>> I believe this is an optimization, which is almost valid on any
>>>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>>>> does another lock without any time for a barrier in between to
>>>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>>>> less barriers.
>>>>>>
>>>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>>>> they grant the next ticket to the next waiter, not the current holder.
>>>>>> See the Linux implementation.
>>>>>
>>>>> Whether to put the closing barrier after the last store is orthogonal
>>>>> to whether ticket locks are implemented or not. This is all a question of
>>>>> tradeoffs.
>>>>>
>>>>> Without the barrier after the last store, you increase the spinning time
>>>>> due to time taken for the store to be visible on other cpus, but you
>>>>> optimize the overhead of unlocking.
>>>>>
>>>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>>>> of increasing the overhead of spinlock operations.
>>>>>
>>>>> I do not know which is worse. I suspect all this does not make much of a
>>>>> difference, and what dominates is the duration of spinlock sections anyway.
>>>>
>>>> I think the way the classic Linux spinlock did this on x86 provides the answer.
>>>
>>> The situation is completely different: Linux spinlocks are well split,
>>> while Xenomai basically has only one spinlock, so chances are that it
>>> will be more contended, and the heavy unlock path (the one which
>>> implements the ticket stuff) will be triggered more often. Also, the
>>> Xenomai spinlock (we can lose the s anyway) being more contended, the
>>> "pending store barrier" optimization may in fact be detrimental. And
>>> finally, due to the way spinlocks are split, Linux has scalability
>>> issues that Xenomai can not even begin to imagine tackling.
>>
>> Finally, in the eternal worst case vs average case fight, the worst case
>> worth optimizing is the contended case in our case, and I believe adding
>> the barrier after atomic_set in xnlock_put is what optimizes this worst
>> case best, because, again, it reduces the time between the unlock and
>> its visibility on the spinning cpu. That is at least something Linux
>> does not have to care about, because the worst case is not what it is
>> optimized for.
>
> Maybe. I'm unsure right now whether we see prolonged spinning time due
> to this on x86. I suspect not, as spinning is not only increasing
> latencies but also burning CPU power uselessly, and that would be
> noticed and disliked under Linux.
Probably it does not have this issue, because as far as I can tell it
uses an atomic add or an atomic cmpxchg to unlock the spinlock. But
those instructions look as heavy as a barrier to me.
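For comparison, a minimal ticket-lock sketch in the spirit of the classic Linux scheme, written in C11 atomics (illustrative only, not the kernel code): lock takes a ticket with an atomic add, and unlock is a plain increment of the owner field done as a release store.

```c
#include <stdatomic.h>
#include <assert.h>

/* Minimal ticket-lock sketch (illustrative, not the Linux kernel
 * code): FIFO fairness comes from handing out ticket numbers. */
struct ticket_lock {
	atomic_uint next;	/* next ticket to hand out */
	atomic_uint owner;	/* ticket currently being served */
};

static void ticket_lock(struct ticket_lock *l)
{
	/* take a ticket; the atomic RMW queues contenders fairly */
	unsigned int ticket = atomic_fetch_add(&l->next, 1);

	/* spin until our number is served */
	while (atomic_load_explicit(&l->owner, memory_order_acquire) != ticket)
		;
}

static void ticket_unlock(struct ticket_lock *l)
{
	/* serve the next waiter: a release store of owner + 1; done
	 * instead as an atomic RMW, it would indeed be about as heavy
	 * as a barrier */
	unsigned int cur = atomic_load_explicit(&l->owner,
						memory_order_relaxed);
	atomic_store_explicit(&l->owner, cur + 1, memory_order_release);
}
```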
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 19:09 ` Jan Kiszka
@ 2014-09-18 19:32 ` Gilles Chanteperdrix
2014-09-18 19:56 ` Jan Kiszka
0 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 19:32 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 09:09 PM, Jan Kiszka wrote:
> On 2014-09-18 18:28, Gilles Chanteperdrix wrote:
>> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>>>  		cpu_relax();
>>>>>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>>>>>  }
>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>>>>>> 
>>>>>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>>>>>
>>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>>> on release.
>>>>>>>>>>
>>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>>
>>>>>>>>>>> - ticket-based granting - arch-specific (thus optimized) core
>>>>>>>>>>
>>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>>
>>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>>
>>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>>
>>>>>>>>> [quick answer]
>>>>>>>>>
>>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>>
>>>>>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>>>>>> ordering across multiple cpus, with cpus local caches and such. And the
>>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>>
>>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>>> other cores either see the lock release and all prior changes committed
>>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>>> never see later changes committed before the lock being visible as free.
>>>>>>
>>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>>> explained.
>>>>>>
>>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>>> barriers in x86 spinlock release operations.
>>>>>>
>>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>>> I believe this is an optimization, which is almost valid on any
>>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>>> does another lock without any time for a barrier in between to
>>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>>> less barriers.
>>>>>
>>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>>> they grant the next ticket to the next waiter, not the current holder.
>>>>> See the Linux implementation.
>>>>
>>>> Whether to put the closing barrier after the last store is orthogonal
>>>> to whether to implement ticket locks or not. This is all a question of
>>>> tradeoffs.
>>>>
>>>> Without the barrier after the last store, you increase the spinning time
>>>> due to time taken for the store to be visible on other cpus, but you
>>>> optimize the overhead of unlocking.
>>>>
>>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>>> of increasing the overhead of spinlock operations.
>>>>
>>>> I do not know which is worse. I suspect all this does not make much of a
>>>> difference, and what dominates is the duration of spinlock sections anyway.
>>>
>>> I think the way the classic Linux spinlock did this on x86 provides the answer.
>>
>> The situation is completely different: linux spinlocks are well split,
>> xenomai basically has only one spinlock, so chances are that it will be
>> more contended, so the heavy unlock path (the one which implements the
>> ticket stuff) will be triggered more often. Also, the xenomai spinlock (we
>> can lose the s anyway) being more contended, the "pending store
>> barrier" optimization has in fact chances of being detrimental. And
>> finally, due to the way spinlocks are split, Linux has scalability
>> issues that Xenomai can not even begin to imagine tackling.
>>
>> Anyway, the discussion is kind of moot, because as I said, we are not
>> going to change the spinlock implementation in 2.6. What we are
>> discussing here is whether to put the barrier after the atomic_set, or
>> whether to put that barrier where it is really needed: in the snapshot
>> code, and what to do for forge. I also agree that the barrier before the
>> atomic_set in xnlock_put is not needed on x86 and proposed an
>> architecture macro to replace it with a compiler barrier in that case.
>
> Yes, seems reasonable.
>
>>
>> I also proposed to replace the atomic_set with a cmpxchg, cmpxchg has
>> two barriers on ARM, but I guess on x86 it is only one barrier, this
>> would solve the architecture dependency nicely.
>
> That saves an abstraction but I have no clue if "mfence" is equally
> expensive as "lock cmpxchg". If it is, that's fine, but I suspect it's
> not (due to the cacheline "lock").
Unless I misunderstand something in Linux code, it also uses the "lock"
prefix for unlocking ticket spinlocks, either with a lock; add or with a
lock; cmpxchg.
--
Gilles.
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 19:32 ` Gilles Chanteperdrix
@ 2014-09-18 19:56 ` Jan Kiszka
2014-09-18 20:13 ` Gilles Chanteperdrix
0 siblings, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-18 19:56 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-18 21:32, Gilles Chanteperdrix wrote:
> On 09/18/2014 09:09 PM, Jan Kiszka wrote:
>> On 2014-09-18 18:28, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>>>>  		cpu_relax();
>>>>>>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>>>> on release.
>>>>>>>>>>>
>>>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>>>
>>>>>>>>>>>> - ticket-based granting - arch-specific (thus optimized) core
>>>>>>>>>>>
>>>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>>>
>>>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>>>
>>>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>>>
>>>>>>>>>> [quick answer]
>>>>>>>>>>
>>>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>>>
>>>>>>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>>>>>>> ordering across multiple cpus, with cpus local caches and such. And the
>>>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>>>
>>>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>>>> other cores either see the lock release and all prior changes committed
>>>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>>>> never see later changes committed before the lock being visible as free.
>>>>>>>
>>>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>>>> explained.
>>>>>>>
>>>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>>>> barriers in x86 spinlock release operations.
>>>>>>>
>>>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>>>> I believe this is an optimization, which is almost valid on any
>>>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>>>> does another lock without any time for a barrier in between to
>>>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>>>> less barriers.
>>>>>>
>>>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>>>> they grant the next ticket to the next waiter, not the current holder.
>>>>>> See the Linux implementation.
>>>>>
>>>>> Whether to put the closing barrier after the last store is orthogonal
>>>>> to whether to implement ticket locks or not. This is all a question of
>>>>> tradeoffs.
>>>>>
>>>>> Without the barrier after the last store, you increase the spinning time
>>>>> due to time taken for the store to be visible on other cpus, but you
>>>>> optimize the overhead of unlocking.
>>>>>
>>>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>>>> of increasing the overhead of spinlock operations.
>>>>>
>>>>> I do not know which is worse. I suspect all this does not make much of a
>>>>> difference, and what dominates is the duration of spinlock sections anyway.
>>>>
>>>> I think the way the classic Linux spinlock did this on x86 provides the answer.
>>>
>>> The situation is completely different: linux spinlocks are well split,
>>> xenomai basically has only one spinlock, so chances are that it will be
>>> more contended, so the heavy unlock path (the one which implements the
>>> ticket stuff) will be triggered more often. Also, the xenomai spinlock (we
>>> can lose the s anyway) being more contended, the "pending store
>>> barrier" optimization has in fact chances of being detrimental. And
>>> finally, due to the way spinlocks are split, Linux has scalability
>>> issues that Xenomai can not even begin to imagine tackling.
>>>
>>> Anyway, the discussion is kind of moot, because as I said, we are not
>>> going to change the spinlock implementation in 2.6. What we are
>>> discussing here is whether to put the barrier after the atomic_set, or
>>> whether to put that barrier where it is really needed: in the snapshot
>>> code, and what to do for forge. I also agree that the barrier before the
>>> atomic_set in xnlock_put is not needed on x86 and proposed an
>>> architecture macro to replace it with a compiler barrier in that case.
>>
>> Yes, seems reasonable.
>>
>>>
>>> I also proposed to replace the atomic_set with a cmpxchg, cmpxchg has
>>> two barriers on ARM, but I guess on x86 it is only one barrier, this
>>> would solve the architecture dependency nicely.
>>
>> That saves an abstraction but I have no clue if "mfence" is equally
>> expensive as "lock cmpxchg". If it is, that's fine, but I suspect it's
>> not (due to the cacheline "lock").
>
> Unless I misunderstand something in Linux code, it also uses the "lock"
> prefix for unlocking ticket spinlocks, either with a lock; add or with a
> lock; cmpxchg.
I'm looking at arch/x86/include/asm/spinlock.h, and there is only a non-atomic
__add. Same in the disassembly.
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 19:56 ` Jan Kiszka
@ 2014-09-18 20:13 ` Gilles Chanteperdrix
0 siblings, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 20:13 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 09:56 PM, Jan Kiszka wrote:
> On 2014-09-18 21:32, Gilles Chanteperdrix wrote:
>> On 09/18/2014 09:09 PM, Jan Kiszka wrote:
>>> On 2014-09-18 18:28, Gilles Chanteperdrix wrote:
>>>> On 09/18/2014 06:14 PM, Jan Kiszka wrote:
>>>>> On 2014-09-18 15:44, Gilles Chanteperdrix wrote:
>>>>>> On 09/18/2014 03:26 PM, Jan Kiszka wrote:
>>>>>>> On 2014-09-18 15:05, Gilles Chanteperdrix wrote:
>>>>>>>> On 09/18/2014 02:20 PM, Jan Kiszka wrote:
>>>>>>>>> On 2014-09-18 14:17, Gilles Chanteperdrix wrote:
>>>>>>>>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>>>>>>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>>>>>>>>> On 09/11/2014 07:19 AM, Jan Kiszka wrote:
>>>>>>>>>>>>> On 2014-09-11 07:11, Jan Kiszka wrote:
>>>>>>>>>>>>>> On 2014-09-09 23:03, Gilles Chanteperdrix wrote:
>>>>>>>>>>>>>>> On 04/25/2014 12:44 PM, Jeroen Van den Keybus wrote:
>>>>>>>>>>>>>>>> For testing, I've removed the locks from the vfile system.
>>>>>>>>>>>>>>>> Then the high latencies reliably disappear.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> To test, I made two xeno_nucleus modules: one with the
>>>>>>>>>>>>>>>> xnlock_get/put_ in place and one with dummies. Subsequently,
>>>>>>>>>>>>>>>> I use a program that simply opens and reads the stat file
>>>>>>>>>>>>>>>> 1,000 times.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> With locks:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>>>> RTD| -2.575| -2.309| 9.286| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>>> RTD| -2.364| -2.276| 1.600| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>>> RTD| -2.482| -2.274| 2.165| 0| 0| -2.575| 9.286
>>>>>>>>>>>>>>>> RTD| -2.368| 135.261| 1478.154| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>>> RTD| -2.368| -2.272| 2.602| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>>> RTD| -2.499| -2.272| 6.933| 13008| 0| -2.575| 1478.154
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Without locks:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> RTT| 00:00:01 (periodic user-mode task, 100 us period, priority 99)
>>>>>>>>>>>>>>>> RTH|----lat min|----lat avg|----lat max|-overrun|---msw|---lat best|--lat worst
>>>>>>>>>>>>>>>> RTD| -2.503| -2.270| 3.310| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>>>> RTD| -2.418| -2.284| -1.646| 0| 0| -2.503| 3.310
>>>>>>>>>>>>>>>> RTD| -2.496| -2.275| 4.630| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>> RTD| -2.374| -2.285| -1.458| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>> RTD| -2.452| -2.273| 3.559| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>> RTD| -2.370| -2.285| -1.518| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>> RTD| -2.458| -2.274| 4.203| 0| 0| -2.503| 4.630
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'll now have a closer look into the vfile system but if the
>>>>>>>>>>>>>>>> locks are malfunctioning, I'm clueless.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Answering with a "little" delay, could you try the following
>>>>>>>>>>>>>>> patch?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/include/asm-generic/bits/pod.h b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>>> index a6be0dc..cfb0c71 100644
>>>>>>>>>>>>>>> --- a/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>>> +++ b/include/asm-generic/bits/pod.h
>>>>>>>>>>>>>>> @@ -248,6 +248,7 @@ void __xnlock_spin(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>>>>>>>>>>>>>>>  		cpu_relax();
>>>>>>>>>>>>>>>  		xnlock_dbg_spinning(lock, cpu, &spin_limit /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>>>>>>>>>>>>>> +		xnarch_memory_barrier();
>>>>>>>>>>>>>>>  	} while(atomic_read(&lock->owner) != ~0);
>>>>>>>>>>>>>>>  }
>>>>>>>>>>>>>>>  EXPORT_SYMBOL_GPL(__xnlock_spin);
>>>>>>>>>>>>>>> diff --git a/include/asm-generic/system.h b/include/asm-generic/system.h
>>>>>>>>>>>>>>> index 25bd83f..7a8c4d0 100644
>>>>>>>>>>>>>>> --- a/include/asm-generic/system.h
>>>>>>>>>>>>>>> +++ b/include/asm-generic/system.h
>>>>>>>>>>>>>>> @@ -378,6 +378,8 @@ static inline void xnlock_put(xnlock_t *lock)
>>>>>>>>>>>>>>>  	xnarch_memory_barrier();
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>  	atomic_set(&lock->owner, ~0);
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +	xnarch_memory_barrier();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That's pretty heavy-weighted now (it was already due to the first
>>>>>>>>>>>>>> memory barrier). Maybe it's better to look at some ticket lock
>>>>>>>>>>>>>> mechanism like Linux uses for fairness. At least on x86 (and
>>>>>>>>>>>>>> other strictly ordered archs), those require no memory barriers
>>>>>>>>>>>>>> on release.
>>>>>>>>>>>>
>>>>>>>>>>>>> In fact, memory barriers aren't needed on strictly ordered archs
>>>>>>>>>>>>> already today, independent of the spinlock granting algorithm. So
>>>>>>>>>>>>> there are two optimization possibilities:
>>>>>>>>>>>>
>>>>>>>>>>>>> - ticket-based granting - arch-specific (thus optimized) core
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, no answer, so I will try to be more clear.
>>>>>>>>>>>>
>>>>>>>>>>>> I do not pretend to understand how memory barriers work at a low
>>>>>>>>>>>> level, this is a shame, I know, and am sorry for that. My "high level"
>>>>>>>>>>>> view, is that memory barriers on SMP systems act as synchronization
>>>>>>>>>>>> points, meaning that when a CPU issues a barrier, it will "see" the
>>>>>>>>>>>> state of the other CPUs at the time of their last barrier. This means
>>>>>>>>>>>> that for a CPU to see a store that occurred on another CPU, there must
>>>>>>>>>>>> have been two barriers: a barrier after the store on one cpu, and a
>>>>>>>>>>>> barrier after that before the read on the other cpu. This view of
>>>>>>>>>>>> things seems to be corroborated by the fact that the patch works, and
>>>>>>>>>>>> by the following sentence in Documentation/memory-barriers.txt:
>>>>>>>>>>>>
>>>>>>>>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>>>>>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>>>>>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>>>>>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>>>>>>>>
>>>>>>>>>>> [quick answer]
>>>>>>>>>>>
>>>>>>>>>>> ...or the architecture refrains from reordering write requests, like x86
>>>>>>>>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>>>>>>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>>>>>>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>>>>>>>>
>>>>>>>>>> quick answer: I do not believe an SMP architecture can enforce stores
>>>>>>>>>> ordering across multiple cpus, with cpus local caches and such. And the
>>>>>>>>>> fact that the patch I sent fixed the issue on x86 tends to prove me right.
>>>>>>>>>
>>>>>>>>> It's not wrong, it's just (costly, on larger machines) overkill as the
>>>>>>>>> other cores either see the lock release and all prior changes committed
>>>>>>>>> or the lock taken (and the prior changes do not matter then). They will
>>>>>>>>> never see later changes committed before the lock being visible as free.
>>>>>>>>
>>>>>>>> I agree. But this is true on all architectures, not just on strictly
>>>>>>>> ordered ones, this is just due to how barriers work on SMP systems, as I
>>>>>>>> explained.
>>>>>>>>
>>>>>>>>> That's architecturally guaranteed, and that's why you have no memory
>>>>>>>>> barriers in x86 spinlock release operations.
>>>>>>>>
>>>>>>>> I disagree, as explained in the paragraph just below the one you quote,
>>>>>>>> I believe this is an optimization, which is almost valid on any
>>>>>>>> architecture. Almost valid, because if the cpu which has done the unlock
>>>>>>>> does another lock without any time for a barrier in between to
>>>>>>>> synchronize cpus, we have a problem, because the other cpus will never
>>>>>>>> see the spinlock as free. With ticket spinlocks, you just add a store on
>>>>>>>> the cpu which spins, and you have to add a barrier after that, if you
>>>>>>>> want the barrier before the read on the cpu which will acquire the lock
>>>>>>>> to see that the spinlock is contended. So I do not see how this requires
>>>>>>>> less barriers.
>>>>>>>
>>>>>>> Ticket locks prevent unfair starvation without the closing barrier as
>>>>>>> they grant the next ticket to the next waiter, not the current holder.
>>>>>>> See the Linux implementation.
>>>>>>
>>>>>> Whether to put the closing barrier after the last store is orthogonal
>>>>>> to whether to implement ticket locks or not. This is all a question of
>>>>>> tradeoffs.
>>>>>>
>>>>>> Without the barrier after the last store, you increase the spinning time
>>>>>> due to time taken for the store to be visible on other cpus, but you
>>>>>> optimize the overhead of unlocking.
>>>>>>
>>>>>> With ticket spinlocks you avoid the starvation situation, at the expense
>>>>>> of increasing the overhead of spinlock operations.
>>>>>>
>>>>>> I do not know which is worse. I suspect all this does not make much of a
>>>>>> difference, and what dominates is the duration of spinlock sections anyway.
>>>>>
>>>>> I think the way the classic Linux spinlock did this on x86 provides the answer.
>>>>
>>>> The situation is completely different: linux spinlocks are well split,
>>>> xenomai basically has only one spinlock, so chances are that it will be
>>>> more contended, so the heavy unlock path (the one which implements the
>>>> ticket stuff) will be triggered more often. Also, the xenomai spinlock (we
>>>> can lose the s anyway) being more contended, the "pending store
>>>> barrier" optimization has in fact chances of being detrimental. And
>>>> finally, due to the way spinlocks are split, Linux has scalability
>>>> issues that Xenomai can not even begin to imagine tackling.
>>>>
>>>> Anyway, the discussion is kind of moot, because as I said, we are not
>>>> going to change the spinlock implementation in 2.6. What we are
>>>> discussing here is whether to put the barrier after the atomic_set, or
>>>> whether to put that barrier where it is really needed: in the snapshot
>>>> code, and what to do for forge. I also agree that the barrier before the
>>>> atomic_set in xnlock_put is not needed on x86 and proposed an
>>>> architecture macro to replace it with a compiler barrier in that case.
>>>
>>> Yes, seems reasonable.
>>>
>>>>
>>>> I also proposed to replace the atomic_set with a cmpxchg, cmpxchg has
>>>> two barriers on ARM, but I guess on x86 it is only one barrier, this
>>>> would solve the architecture dependency nicely.
>>>
>>> That saves an abstraction but I have no clue if "mfence" is equally
>>> expensive as "lock cmpxchg". If it is, that's fine, but I suspect it's
>>> not (due to the cacheline "lock").
>>
>> Unless I misunderstand something in Linux code, it also uses the "lock"
>> prefix for unlocking ticket spinlocks, either with a lock; add or with a
>> lock; cmpxchg.
>
> I'm looking at arch/x86/include/asm/spinlock.h, and there is only a non-atomic
> __add. Same in the disassembly.
Indeed, the lock prefix is only used for some erratum on x86_32.
--
Gilles.
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 11:59 ` Jan Kiszka
2014-09-18 12:11 ` Gilles Chanteperdrix
2014-09-18 12:17 ` Gilles Chanteperdrix
@ 2014-09-18 20:21 ` Gilles Chanteperdrix
2014-09-19 2:06 ` Gilles Chanteperdrix
2 siblings, 1 reply; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-18 20:21 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 01:59 PM, Jan Kiszka wrote:
> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>> (*) There is no guarantee that a CPU will see the correct order of
>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>> memory barrier, unless the first CPU _also_ uses a matching memory
>> barrier (see the subsection on "SMP Barrier Pairing").
>
> [quick answer]
>
> ...or the architecture refrains from reordering write requests, like x86
> does. What may happen, though, is that the compiler reorders the writes.
> Therefore you need at least a (much cheaper) compiler barrier on those
> archs. See also linux/Documentation/memory-barriers.txt on this and more.
The passage you quote comes from memory-barriers.txt, and I find it
makes it pretty clear that the two barriers are needed for cache
synchronization in the general case. Now, I have read more of
memory-barriers.txt, and I cannot easily find details about what the
fact that x86 is "strictly ordered" means, and which constraints it
relaxes. Maybe you would care to give us the exact passage where
this is mentioned? Also, I would welcome any detail about how SMP cache
synchronization actually works on x86.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-18 20:21 ` Gilles Chanteperdrix
@ 2014-09-19 2:06 ` Gilles Chanteperdrix
2014-09-19 5:41 ` Jan Kiszka
2014-09-19 10:51 ` Gilles Chanteperdrix
0 siblings, 2 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-19 2:06 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/18/2014 10:21 PM, Gilles Chanteperdrix wrote:
> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>> (*) There is no guarantee that a CPU will see the correct order of
>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>> barrier (see the subsection on "SMP Barrier Pairing").
>>
>> [quick answer]
>>
>> ...or the architecture refrains from reordering write requests, like x86
>> does. What may happen, though, is that the compiler reorders the writes.
>> Therefore you need at least a (much cheaper) compiler barrier on those
>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>
> The passage you quote comes from memory-barriers.txt, and I find it
> makes it pretty clear that the two barriers are needed for cache
> synchronization in the general case. Now, I have read more of
> memory-barriers.txt, and I cannot easily find details about what the
> fact that x86 is "strictly ordered" means, and which constraints it
> relaxes. Maybe you would care to give us the exact passage where
> this is mentioned? Also, I would welcome any detail about how SMP cache
> synchronization actually works on x86.
Ok, I have read a few things; it would seem recent x86 architectures
(Nehalem, Sandy Bridge and probably Haswell) use the MESIF cache
coherence protocol, with a twist for Haswell since it introduced
transactional memory. A cache coherence protocol in theory transparently
ensures the same view of the cache on all cpus. MESIF itself is
derived from the MESI cache coherence protocol, which is said (by the
Wikipedia article) to have some performance issues that are generally
compensated by adding a store buffer, which in turn requires memory
barriers for a store on one cpu to become visible in the cache (and so
on other cpus). I did not find any indication that memory barriers are
still needed for this case (which is exactly the case we are interested
in) with MESIF, but no indication that they are not needed either.

Then, I had a look at the ticket spinlock implementations. The
operations they do are roughly the same as in the xnlock implementation,
except that they are optimized for each architecture, and so remove the
useless barriers. The ARM implementation has the barrier after unlock,
and uses, in addition, the special "sev" instruction, allowing the
spinning cpu to wait for this signal with the "wfe" (wait for event)
instruction and to not burn cpu power while waiting. In fact it does
not spin at all.

Of course, the problem is that they are not recursive, so implementing
recursive ticket spinlocks without adding overhead seems tricky. Just
to test whether ticket spinlocks solve the issue which started this
thread, I made the following implementation:
typedef struct {
	unsigned owner;
	arch_spinlock_t alock;
} xnlock_t;

static inline int __xnlock_get(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
{
	unsigned long long start;
	int cpu = xnarch_current_cpu();

	if (lock->owner == cpu)
		return 1;

	xnlock_dbg_prepare_acquire(&start);

	arch_spin_lock(&lock->alock);
	lock->owner = cpu;

	xnlock_dbg_acquired(lock, cpu, &start /*, */ XNLOCK_DBG_PASS_CONTEXT);

	return 0;
}

static inline void xnlock_put(xnlock_t *lock)
{
	if (xnlock_dbg_release(lock))
		return;

	lock->owner = ~0U;
	arch_spin_unlock(&lock->alock);
}
And the good news is: yes, this avoids the issue with /proc/xenomai/stat.
The bad news is that it does not answer the question about the
visibility on one cpu of stores from another cpu without a barrier,
because the ticket spinlocks work either way on x86: the atomic add at
the beginning of arch_spin_lock ensures both that the cpu attempting to
relock sees that there is a waiter, and that the waiting cpu sees that
the spinlock has been unlocked. So, in the particular case of the
concurrent cat /proc/xenomai/stat, the "two barriers needed for
visibility" rule is respected.

I have also measured latencies with a cat /proc/xenomai/stat loop
running, with and without a memory barrier after arch_spin_unlock, and
could not find any difference: minimum, average and maximum latencies
after a few minutes of runtime are the same, or at least within 100 ns
of each other.

I am also wondering whether this xnlock implementation could be used on
forge. It has the advantage of benefiting from the architecture
optimizations, without the need to maintain architecture-dependent code.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-19 2:06 ` Gilles Chanteperdrix
@ 2014-09-19 5:41 ` Jan Kiszka
2014-09-19 7:04 ` Philippe Gerum
2014-09-19 10:51 ` Gilles Chanteperdrix
1 sibling, 1 reply; 40+ messages in thread
From: Jan Kiszka @ 2014-09-19 5:41 UTC (permalink / raw)
To: Gilles Chanteperdrix, Jeroen Van den Keybus; +Cc: xenomai
On 2014-09-19 04:06, Gilles Chanteperdrix wrote:
> On 09/18/2014 10:21 PM, Gilles Chanteperdrix wrote:
>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>
>>> [quick answer]
>>>
>>> ...or the architecture refrains from reordering write requests, like x86
>>> does. What may happen, though, is that the compiler reorders the writes.
>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>
>> The passage you quote comes from memory-barriers.txt, and I find it
>> makes it pretty clear that the two barriers are needed for cache
>> synchronization in the general case. Now, I have read more of
>> memory-barriers.txt, and I cannot easily find details about what the
>> fact that x86 is "strictly ordered" means, and which constraints it
>> relaxes. Maybe you would care to give us the exact passage where
>> this is mentioned? Also, I would welcome any detail about how SMP cache
>> synchronization actually works on x86.
>
> Ok, I have read a few things; it would seem recent x86 architectures
> (Nehalem, Sandy Bridge and probably Haswell) use the MESIF cache
> coherence protocol, with a twist for Haswell since it introduced
> transactional memory. A cache coherence protocol in theory transparently
> ensures the same view of the cache on all cpus. MESIF itself is
> derived from the MESI cache coherence protocol, which is said (by the
> Wikipedia article) to have some performance issues that are generally
> compensated by adding a store buffer, which in turn requires memory
> barriers for a store on one cpu to become visible in the cache (and so
> on other cpus). I did not find any indication that memory barriers are
> still needed for this case (which is exactly the case we are interested
> in) with MESIF, but no indication that they are not needed either.
>
> Then, I had a look at the ticket spinlock implementations. The
> operations they do are roughly the same as in the xnlock implementation,
> except that they are optimized for each architecture, and so remove the
> useless barriers. The ARM implementation has the barrier after unlock,
> and uses, in addition, the special "sev" instruction, allowing the
> spinning cpu to wait for this signal with the "wfe" (wait for event)
> instruction and to not burn cpu power while waiting. In fact it does
> not spin at all.
>
> Of course, the problem is that they are not recursive, so implementing
> recursive ticket spinlocks without adding overhead seems tricky. Just
> to test whether ticket spinlocks solve the issue which started this
> thread, I made the following implementation:
>
> typedef struct {
> 	unsigned owner;
> 	arch_spinlock_t alock;
> } xnlock_t;
>
> static inline int __xnlock_get(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
> {
> 	unsigned long long start;
> 	int cpu = xnarch_current_cpu();
>
> 	if (lock->owner == cpu)
> 		return 1;
>
> 	xnlock_dbg_prepare_acquire(&start);
>
> 	arch_spin_lock(&lock->alock);
> 	lock->owner = cpu;
>
> 	xnlock_dbg_acquired(lock, cpu, &start /*, */ XNLOCK_DBG_PASS_CONTEXT);
>
> 	return 0;
> }
>
> static inline void xnlock_put(xnlock_t *lock)
> {
> 	if (xnlock_dbg_release(lock))
> 		return;
>
> 	lock->owner = ~0U;
> 	arch_spin_unlock(&lock->alock);
> }
>
> And the good news is: yes, this avoids the issue with /proc/xenomai/stat.
> The bad news is that it does not answer the question about the
> visibility on one cpu of stores from another cpu without a barrier,
> because the ticket spinlocks work either way on x86: the atomic add at
> the beginning of arch_spin_lock ensures both that the cpu attempting to
> relock sees that there is a waiter, and that the waiting cpu sees that
> the spinlock has been unlocked. So, in the particular case of the
> concurrent cat /proc/xenomai/stat, the "two barriers needed for
> visibility" rule is respected.
>
> I have also measured latencies with a cat /proc/xenomai/stat loop
> running, with and without a memory barrier after arch_spin_unlock, and
> could not find any difference: minimum, average and maximum latencies
> after a few minutes of runtime are the same, or at least within 100 ns
> of each other.
>
> I am also wondering whether this xnlock implementation could be used on
> forge. It has the advantage of benefiting from the architecture
> optimizations, without the need to maintain architecture-dependent code.
>
Indeed, that would be very elegant!
Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-19 5:41 ` Jan Kiszka
@ 2014-09-19 7:04 ` Philippe Gerum
0 siblings, 0 replies; 40+ messages in thread
From: Philippe Gerum @ 2014-09-19 7:04 UTC (permalink / raw)
To: Gilles Chanteperdrix; +Cc: Jan Kiszka, xenomai
On 09/19/2014 07:41 AM, Jan Kiszka wrote:
> On 2014-09-19 04:06, Gilles Chanteperdrix wrote:
>> On 09/18/2014 10:21 PM, Gilles Chanteperdrix wrote:
>>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>>
>>>> [quick answer]
>>>>
>>>> ...or the architecture refrains from reordering write requests, like x86
>>>> does. What may happen, though, is that the compiler reorders the writes.
>>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>>
>>> The passage you quote comes from memory-barriers.txt, and I find it
>>> makes it pretty clear that the two barriers are needed for cache
>>> synchronization in the general case. Now, I have read more of
>>> memory-barriers.txt, and I cannot easily find details about what the
>>> fact that x86 is "strictly ordered" means, and which constraints it
>>> relaxes. Maybe you would care to give us the exact passage where
>>> this is mentioned? Also, I would welcome any detail about how SMP cache
>>> synchronization actually works on x86.
>>
>> Ok, I have read a few things; it would seem recent x86 architectures
>> (Nehalem, Sandy Bridge and probably Haswell) use the MESIF cache
>> coherence protocol, with a twist for Haswell since it introduced
>> transactional memory. A cache coherence protocol in theory transparently
>> ensures the same view of the cache on all cpus. MESIF itself is
>> derived from the MESI cache coherence protocol, which is said (by the
>> Wikipedia article) to have some performance issues that are generally
>> compensated by adding a store buffer, which in turn requires memory
>> barriers for a store on one cpu to become visible in the cache (and so
>> on other cpus). I did not find any indication that memory barriers are
>> still needed for this case (which is exactly the case we are interested
>> in) with MESIF, but no indication that they are not needed either.
>>
>> Then, I had a look at the ticket spinlock implementations. The
>> operations they do are roughly the same as in the xnlock implementation,
>> except that they are optimized for each architecture, and so remove the
>> useless barriers. The ARM implementation has the barrier after unlock,
>> and uses, in addition, the special "sev" instruction, allowing the
>> spinning cpu to wait for this signal with the "wfe" (wait for event)
>> instruction and to not burn cpu power while waiting. In fact it does
>> not spin at all.
>>
>> Of course, the problem is that they are not recursive, so implementing
>> recursive ticket spinlocks without adding overhead seems tricky. Just
>> to test whether ticket spinlocks solve the issue which started this
>> thread, I made the following implementation:
>>
>> typedef struct {
>> 	unsigned owner;
>> 	arch_spinlock_t alock;
>> } xnlock_t;
>>
>> static inline int __xnlock_get(xnlock_t *lock /*, */ XNLOCK_DBG_CONTEXT_ARGS)
>> {
>> 	unsigned long long start;
>> 	int cpu = xnarch_current_cpu();
>>
>> 	if (lock->owner == cpu)
>> 		return 1;
>>
>> 	xnlock_dbg_prepare_acquire(&start);
>>
>> 	arch_spin_lock(&lock->alock);
>> 	lock->owner = cpu;
>>
>> 	xnlock_dbg_acquired(lock, cpu, &start /*, */ XNLOCK_DBG_PASS_CONTEXT);
>>
>> 	return 0;
>> }
>>
>> static inline void xnlock_put(xnlock_t *lock)
>> {
>> 	if (xnlock_dbg_release(lock))
>> 		return;
>>
>> 	lock->owner = ~0U;
>> 	arch_spin_unlock(&lock->alock);
>> }
>>
>> And the good news is: yes, this avoids the issue with /proc/xenomai/stat.
>> The bad news is that it does not answer the question about the
>> visibility on one cpu of stores from another cpu without a barrier,
>> because the ticket spinlocks work either way on x86: the atomic add at
>> the beginning of arch_spin_lock ensures both that the cpu attempting to
>> relock sees that there is a waiter, and that the waiting cpu sees that
>> the spinlock has been unlocked. So, in the particular case of the
>> concurrent cat /proc/xenomai/stat, the "two barriers needed for
>> visibility" rule is respected.
>>
>> I have also measured latencies with a cat /proc/xenomai/stat loop
>> running, with and without a memory barrier after arch_spin_unlock, and
>> could not find any difference: minimum, average and maximum latencies
>> after a few minutes of runtime are the same, or at least within 100 ns
>> of each other.
>>
>> I am also wondering whether this xnlock implementation could be used on
>> forge. It has the advantage of benefiting from the architecture
>> optimizations, without the need to maintain architecture-dependent code.
>>
>
> Indeed, that would be very elegant!
>
Ack. While you are at it, could you please think of debug instrumentation
for tracking lock nesting? As we discussed earlier, at some point after
3.0 is out, we may want to get rid of the recursion support in xnlocks
and eventually map 1:1 over native spinlocks. That would help with this
process.
--
Philippe.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [Xenomai] Reading /proc/xenomai/stat causes high latencies
2014-09-19 2:06 ` Gilles Chanteperdrix
2014-09-19 5:41 ` Jan Kiszka
@ 2014-09-19 10:51 ` Gilles Chanteperdrix
1 sibling, 0 replies; 40+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-19 10:51 UTC (permalink / raw)
To: Jan Kiszka, Jeroen Van den Keybus; +Cc: xenomai
On 09/19/2014 04:06 AM, Gilles Chanteperdrix wrote:
> On 09/18/2014 10:21 PM, Gilles Chanteperdrix wrote:
>> On 09/18/2014 01:59 PM, Jan Kiszka wrote:
>>> On 2014-09-18 13:46, Gilles Chanteperdrix wrote:
>>>> (*) There is no guarantee that a CPU will see the correct order of
>>>> effects from a second CPU's accesses, even _if_ the second CPU uses a
>>>> memory barrier, unless the first CPU _also_ uses a matching memory
>>>> barrier (see the subsection on "SMP Barrier Pairing").
>>>
>>> [quick answer]
>>>
>>> ...or the architecture refrains from reordering write requests, like x86
>>> does. What may happen, though, is that the compiler reorders the writes.
>>> Therefore you need at least a (much cheaper) compiler barrier on those
>>> archs. See also linux/Documentation/memory-barriers.txt on this and more.
>>
>> The passage you quote comes from memory-barriers.txt, and I find it
>> makes it pretty clear that the two barriers are needed for cache
>> synchronization in the general case. Now, I have read more of
>> memory-barriers.txt, and I cannot easily find details about what the
>> fact that x86 is "strictly ordered" means, and which constraints it
>> relaxes. Maybe you would care to give us the exact passage where
>> this is mentioned? Also, I would welcome any detail about how SMP cache
>> synchronization actually works on x86.
>
> Ok, I have read a few things; it would seem recent x86 architectures
> (Nehalem, Sandy Bridge and probably Haswell) use the MESIF cache
> coherence protocol, with a twist for Haswell since it introduced
> transactional memory. A cache coherence protocol in theory transparently
> ensures the same view of the cache on all cpus. MESIF itself is
> derived from the MESI cache coherence protocol, which is said (by the
> Wikipedia article) to have some performance issues that are generally
> compensated by adding a store buffer, which in turn requires memory
> barriers for a store on one cpu to become visible in the cache (and so
> on other cpus). I did not find any indication that memory barriers are
> still needed for this case (which is exactly the case we are interested
> in) with MESIF, but no indication that they are not needed either.
Thinking more about this: the store buffer is there for timing reasons
(because getting the cache line from another cpu takes time), so I
suspect the barrier does not in fact flush the buffer, but waits for it
to drain. This means issuing the barrier will not change the timing of
the visibility of the last store on a distant cpu; it will simply stall
the current cpu.
--
Gilles.
^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2014-09-19 10:51 UTC | newest]
Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-22 16:02 [Xenomai] Reading /proc/xenomai/stat causes high latencies Jeroen Van den Keybus
2014-04-23 9:14 ` Jeroen Van den Keybus
2014-04-23 13:45 ` Jeroen Van den Keybus
2014-04-23 14:07 ` Gilles Chanteperdrix
2014-04-23 20:54 ` Jeroen Van den Keybus
2014-04-23 20:56 ` Gilles Chanteperdrix
2014-04-23 21:39 ` Jeroen Van den Keybus
2014-04-23 22:25 ` Gilles Chanteperdrix
2014-04-24 8:57 ` Jeroen Van den Keybus
2014-04-24 14:46 ` Jeroen Van den Keybus
2014-04-25 8:15 ` Jeroen Van den Keybus
2014-04-25 10:44 ` Jeroen Van den Keybus
2014-09-09 21:03 ` Gilles Chanteperdrix
2014-09-10 13:50 ` Jeroen Van den Keybus
2014-09-10 19:47 ` Gilles Chanteperdrix
2014-09-11 5:11 ` Jan Kiszka
2014-09-11 5:19 ` Jan Kiszka
2014-09-18 11:46 ` Gilles Chanteperdrix
2014-09-18 11:59 ` Jan Kiszka
2014-09-18 12:11 ` Gilles Chanteperdrix
2014-09-18 12:17 ` Gilles Chanteperdrix
2014-09-18 12:20 ` Jan Kiszka
2014-09-18 13:05 ` Gilles Chanteperdrix
2014-09-18 13:26 ` Jan Kiszka
2014-09-18 13:44 ` Gilles Chanteperdrix
2014-09-18 16:14 ` Jan Kiszka
2014-09-18 16:28 ` Gilles Chanteperdrix
2014-09-18 18:39 ` Gilles Chanteperdrix
2014-09-18 19:23 ` Jan Kiszka
2014-09-18 19:31 ` Gilles Chanteperdrix
2014-09-18 19:09 ` Jan Kiszka
2014-09-18 19:32 ` Gilles Chanteperdrix
2014-09-18 19:56 ` Jan Kiszka
2014-09-18 20:13 ` Gilles Chanteperdrix
2014-09-18 20:21 ` Gilles Chanteperdrix
2014-09-19 2:06 ` Gilles Chanteperdrix
2014-09-19 5:41 ` Jan Kiszka
2014-09-19 7:04 ` Philippe Gerum
2014-09-19 10:51 ` Gilles Chanteperdrix
2014-09-16 11:09 ` Gilles Chanteperdrix