* Re: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
[not found] <VI1PR06MB44627C92EA94534421191FEBCEA90@VI1PR06MB4462.eurprd06.prod.outlook.com>
@ 2018-12-07 11:44 ` Sebastian Andrzej Siewior
2018-12-07 14:46 ` Andreas Hoefler
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-12-07 11:44 UTC (permalink / raw)
To: Andreas Hoefler; +Cc: linux-rt-users
On 2018-12-06 12:08:00 [+0000], Andreas Hoefler wrote:
> Hi everyone
Hi,
> I have a test application which measures the latency of sending and receiving messages over boost message queues under a certain cpu load.
> There are two background load tasks with rt priority -20, each pinned to one of the two cores of an ARMv7, in order to consume a certain amount of cpu load, e.g. 40% each.
> My sending and receiving threads both have rt prio -49 FIFO, using a semaphore mechanism with a mutex with attributes PTHREAD_MUTEX_ADAPTIVE and PTHREAD_PRIO_INHERIT for signaling when the message was received and it is ready to get the next one. Nothing else is running besides that.
> Apparently I see (pretty much periodically) outliers of up to 30ms (compared to around 10us per send and receive).
I'm not sure if PTHREAD_MUTEX_ADAPTIVE makes any difference. I don't
know what the semaphore does. If you trace the sys_futex syscall you
should only see FUTEX_LOCK_PI and FUTEX_UNLOCK_PI and no FUTEX_WAIT or
FUTEX_WAKE for the two threads.
> I traced it with LTTng with all kernel events and some userspace events enabled and this is what I observe:
> - The background load gets interrupted, as it should, by the higher priority sending and receiving threads
> - Periodically I see a sudden stop in either the receiving or the sending thread, in no particular section of the code
> - During this period one of the threads is in "wait for CPU" state
> - In all cases it seems to be IRQ19 (arch_timer) that triggers either of the threads to continue (though it appeared several times in between)
>
> I am writing because this used to work fine on an older version (not sure but I think it was 4.4.32) where max latencies of about 100us occurred.
> Currently I am testing with Linux am57xx-evm 4.14.79-rt47-gda0d0b490c #8 SMP PREEMPT RT Fri Nov 23 14:32:07 CET 2018 armv7l GNU/Linux. (TI Sitara)
So with HZ=1000 you should have one timer tick each ms. That means for
your 30ms outlier you should see ~29-30 of those. Unless you have NO_HZ
and the CPU went idle for a longer period of time. Is there no timer for
30ms, and then suddenly one timer wakes the application up? "wait for
CPU" sounds like something got interrupted and something else is busy.
A sched_switch tracer + sched_wakeup events might reveal what happens in
between.
> Thanx a lot
> Andy
Sebastian
^ permalink raw reply [flat|nested] 6+ messages in thread
* outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
2018-12-07 11:44 ` outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt Sebastian Andrzej Siewior
@ 2018-12-07 14:46 ` Andreas Hoefler
2018-12-10 14:08 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Hoefler @ 2018-12-07 14:46 UTC (permalink / raw)
To: linux-rt-users; +Cc: Sebastian Andrzej Siewior
Hi
First of all thank you very much for your fast reply, highly appreciated!
>I'm not sure if PTHREAD_MUTEX_ADAPTIVE makes any difference. I don't know what the semaphore does. If you trace the sys_futex syscall you should only see FUTEX_LOCK_PI and FUTEX_UNLOCK_PI and no FUTEX_WAIT or FUTEX_WAKE for the two threads.
I am tracing futexes; what I see is:
Channel CPU Event type Contents TID Prio
k_0 0 syscall_entry_futex uaddr=2079364, op=139, val=19377, utime=0, uaddr2=2079336, val3=2079360 867 -49
Not sure where to find FUTEX_UNLOCK_PI etc. (sorry, quite new to this stuff)
>So with HZ=1000 you should have one timer tick each ms. That means for your 30ms outlier you should see ~29-30 of those. Unless you have NO_HZ and the CPU went idle for a longer period of time. Is there no timer for 30ms, and then suddenly one timer wakes the application up? "wait for CPU" sounds like something got interrupted and something else is busy.
>A sched_switch tracer + sched_wakeup events might reveal what happens in between.
My config looks like this:
...
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
...
and
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT_COUNT=y
CONFIG_HZ_FIXED=0
CONFIG_HZ_100=y
# CONFIG_HZ_200 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_500 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
Therefore I think it is correct that I am seeing a tick every 10ms.
This is a snippet from the tracing just before it hangs:
In this case a message has been successfully sent and the sending thread is again at the beginning of its while loop and waiting on the semaphore.
Furthermore the receiving thread has successfully received the message (msqueue->timed_receive()). Right after that it should do a clock_gettime(), put the result in a vector, post the semaphore, and return to the start of its while loop to wait for the next message. But it takes 40ms to do this.
Sorry for the lengthy trace here:
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 968 471 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceSend, event_id=9847, message=send
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 968 634 k_1 1 syscall_exit_futex ret=0, uaddr=3069280280, uaddr2=128 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 971 399 k_0 0 syscall_entry_getcpu tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 971 562 k_1 1 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 972 375 k_0 0 syscall_exit_getcpu ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 972 538 k_1 1 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 973 189 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceSend, event_id=9848, message=wait for semaphore
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 974 002 u_1 1 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=9847, message=leaving timed_receive()
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 975 954 k_0 0 syscall_entry_futex uaddr=2079364, op=139, val=19377, utime=0, uaddr2=2079336, val3=2079360 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 976 442 k_1 1 syscall_entry_clock_gettime which_clock=4 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 978 069 k_1 1 syscall_exit_clock_gettime ret=0, tp=1733956 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 980 346 k_0 0 rcu_utilization s=Start context switch 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 981 485 k_0 0 rcu_utilization s=End context switch 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 981 973 k_1 1 irq_handler_entry irq=19, name=arch_timer 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 983 925 k_1 1 timer_hrtimer_cancel hrtimer=0xeed45c90 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 986 039 k_1 1 timer_hrtimer_expire_entry hrtimer=0xeed45c90, now=9665050003223, function=0xc02b3c28 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 987 503 k_0 0 sched_switch prev_comm=MSQueueThreadSe, prev_tid=867, prev_prio=-49, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 991 245 k_0 0 irq_handler_entry irq=19, name=arch_timer 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 993 522 k_0 0 timer_hrtimer_cancel hrtimer=0xeed35c90 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 994 661 k_1 1 irq_softirq_raise vec=7 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 995 149 k_0 0 timer_hrtimer_expire_entry hrtimer=0xeed35c90, now=9665050012658, function=0xc02b3c28 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 996 450 k_1 1 rcu_utilization s=Start scheduler-tick 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 998 239 k_1 1 rcu_utilization s=End scheduler-tick 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 000 842 k_1 1 timer_hrtimer_expire_exit hrtimer=0xeed45c90 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 001 981 k_1 1 timer_hrtimer_start hrtimer=0xeed45c90, function=0xc02b3c28, expires=9665060000000, softexpires=9665060000000, mode=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 004 095 k_0 0 irq_softirq_raise vec=7 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 004 909 k_1 1 irq_handler_exit irq=19, ret=1 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 005 722 k_0 0 irq_softirq_raise vec=1 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 006 698 k_0 0 rcu_utilization s=Start scheduler-tick 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 008 813 k_0 0 rcu_utilization s=End scheduler-tick 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 010 114 k_0 0 timer_hrtimer_expire_exit hrtimer=0xeed35c90 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 010 765 k_1 1 rcu_utilization s=Start context switch 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 011 090 k_0 0 timer_hrtimer_start hrtimer=0xeed35c90, function=0xc02b3c28, expires=9665060000000, softexpires=9665060000000, mode=0 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 011 903 k_1 1 rcu_utilization s=End context switch 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 015 970 k_0 0 irq_handler_exit irq=19, ret=1 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 018 247 k_0 0 sched_waking comm=ktimersoftd/0, tid=8, prio=-64, target_cpu=0 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 019 386 k_1 1 sched_switch prev_comm=MSQueueThreadRc, prev_tid=868, prev_prio=-49, prev_state=0, next_comm=ksoftirqd/1, next_tid=22, next_prio=20 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 022 314 k_0 0 sched_wakeup comm=ktimersoftd/0, tid=8, prio=-64, target_cpu=0 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 024 754 k_0 0 rcu_utilization s=Start context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 025 730 k_0 0 rcu_utilization s=End context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 027 845 k_0 0 sched_switch prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=ktimersoftd/0, next_tid=8, next_prio=-64 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 028 333 k_1 1 irq_softirq_entry vec=7 22 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 035 002 k_0 0 irq_softirq_entry vec=1 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 038 093 k_1 1 irq_softirq_exit vec=7 22 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 038 906 k_0 0 irq_softirq_exit vec=1 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 041 021 k_1 1 rcu_utilization s=Start context switch 22 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 041 834 k_0 0 rcu_utilization s=Start context switch 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 041 834 k_1 1 rcu_utilization s=End context switch 22 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 042 810 k_0 0 rcu_utilization s=End context switch 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 045 087 k_1 1 sched_stat_runtime comm=ksoftirqd/1, tid=22, runtime=29768, vruntime=32336127794 22 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 048 991 k_0 0 sched_switch prev_comm=ktimersoftd/0, prev_tid=8, prev_prio=-64, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 052 407 k_1 1 sched_switch prev_comm=ksoftirqd/1, prev_tid=22, prev_prio=20, prev_state=1, next_comm=kworker/1:0, next_tid=798, next_prio=20 22 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 060 541 k_1 1 rcu_utilization s=Start context switch 798 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 061 517 k_1 1 rcu_utilization s=End context switch 798 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 062 818 k_1 1 sched_stat_runtime comm=kworker/1:0, tid=798, runtime=19683, vruntime=32342094773 798 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 072 253 k_1 1 workqueue_execute_start work=0xedc26e60, function=0xc07c2c00 798 20
..
.. doing basically nothing here for 40ms ..
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 982 356 k_0 0 irq_handler_entry irq=19, name=arch_timer 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 983 007 k_1 1 irq_handler_entry irq=19, name=arch_timer 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 985 447 k_0 0 timer_hrtimer_cancel hrtimer=0xeed35c90 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 985 772 k_1 1 timer_hrtimer_cancel hrtimer=0xc10a37f8 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 987 887 k_0 0 timer_hrtimer_expire_entry hrtimer=0xeed35c90, now=9665090004258, function=0xc02b3c28 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 987 887 k_1 1 timer_hrtimer_expire_entry hrtimer=0xc10a37f8, now=9665090005071, function=0xc026bb58 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 002 039 k_0 0 sched_stat_runtime comm=lttng-consumerd, tid=777, runtime=10003512, vruntime=9743221624 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 003 178 k_1 1 timer_hrtimer_expire_exit hrtimer=0xc10a37f8 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 005 130 k_1 1 timer_hrtimer_start hrtimer=0xc10a37f8, function=0xc026bb58, expires=9666090000000, softexpires=9666090000000, mode=0 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 006 756 k_1 1 timer_hrtimer_cancel hrtimer=0xeed45c90 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 008 708 k_1 1 timer_hrtimer_expire_entry hrtimer=0xeed45c90, now=9665090005071, function=0xc02b3c28 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 010 010 k_0 0 irq_softirq_raise vec=1 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 011 799 k_0 0 rcu_utilization s=Start scheduler-tick 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 012 450 k_1 1 irq_softirq_raise vec=7 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 014 239 k_1 1 irq_softirq_raise vec=1 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 014 890 k_0 0 rcu_utilization s=End scheduler-tick 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 015 378 k_1 1 rcu_utilization s=Start scheduler-tick 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 016 516 k_0 0 timer_hrtimer_expire_exit hrtimer=0xeed35c90 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 017 492 k_1 1 rcu_utilization s=End scheduler-tick 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 017 655 k_0 0 timer_hrtimer_start hrtimer=0xeed35c90, function=0xc02b3c28, expires=9665100000000, softexpires=9665100000000, mode=0 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 018 794 k_1 1 timer_hrtimer_expire_exit hrtimer=0xeed45c90 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 019 770 k_1 1 timer_hrtimer_start hrtimer=0xeed45c90, function=0xc02b3c28, expires=9665100000000, softexpires=9665100000000, mode=0 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 020 746 k_0 0 irq_handler_exit irq=19, ret=1 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 022 372 k_1 1 irq_handler_exit irq=19, ret=1 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 023 511 k_0 0 rcu_utilization s=Start context switch 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 024 812 k_1 1 sched_waking comm=ksoftirqd/1, tid=22, prio=20, target_cpu=1 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 024 812 k_0 0 rcu_utilization s=End context switch 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 028 228 k_0 0 sched_stat_runtime comm=lttng-consumerd, tid=777, runtime=25864, vruntime=9743247488 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 030 994 k_1 1 sched_wakeup comm=ksoftirqd/1, tid=22, prio=20, target_cpu=1 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 031 807 k_0 0 sched_switch prev_comm=lttng-consumerd, prev_tid=777, prev_prio=20, prev_state=4096, next_comm=ktimersoftd/0, next_tid=8, next_prio=-64 777 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 036 199 k_1 1 rcu_utilization s=Start context switch 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 037 500 k_1 1 rcu_utilization s=End context switch 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 039 940 k_1 1 sched_switch prev_comm=swapper/1, prev_tid=0, prev_prio=20, prev_state=0, next_comm=ktimersoftd/1, next_tid=21, next_prio=-64 0 20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 041 404 k_0 0 irq_softirq_entry vec=1 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 045 796 k_0 0 timer_cancel timer=0xff7e3508 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 047 911 k_0 0 timer_expire_entry timer=0xff7e3508, now=936510, function=0xbf339aa4 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 053 279 k_0 0 timer_start timer=0xff7e3508, function=0xbf339aa4, expires=936530, now=936510, flags=256901120 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 054 418 k_1 1 irq_softirq_entry vec=1 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 055 556 k_0 0 timer_expire_exit timer=0xff7e3508 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 057 346 k_1 1 timer_cancel timer=0xff7f3508 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 058 810 k_1 1 timer_expire_entry timer=0xff7f3508, now=936510, function=0xbf339aa4 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 060 274 k_0 0 irq_softirq_exit vec=1 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 061 738 k_1 1 timer_start timer=0xff7f3508, function=0xbf339aa4, expires=936530, now=936510, flags=244318209 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 063 527 k_1 1 timer_expire_exit timer=0xff7f3508 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 063 690 k_0 0 rcu_utilization s=Start context switch 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 064 503 k_0 0 rcu_utilization s=End context switch 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 069 383 k_1 1 irq_softirq_exit vec=1 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 073 612 k_1 1 rcu_utilization s=Start context switch 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 074 751 k_0 0 sched_switch prev_comm=ktimersoftd/0, prev_tid=8, prev_prio=-64, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20 8 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 074 751 k_1 1 rcu_utilization s=End context switch 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 080 770 k_1 1 sched_switch prev_comm=ktimersoftd/1, prev_tid=21, prev_prio=-64, prev_state=1, next_comm=irq/85-48484000, next_tid=105, next_prio=-51 21 -64
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 091 668 k_1 1 sched_migrate_task comm=irq/72-mmc0, tid=85, prio=-51, orig_cpu=1, dest_cpu=0 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 097 850 k_0 0 rcu_utilization s=Start context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 098 988 k_0 0 rcu_utilization s=End context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 102 079 k_0 0 sched_switch prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=irq/72-mmc0, next_tid=85, next_prio=-51 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 118 508 k_1 1 irq_softirq_raise vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 123 876 k_0 0 timer_start timer=0xedd2eddc, function=0xc07e7420, expires=974910, now=936510, flags=587202560 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 125 015 k_1 1 irq_softirq_entry vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 131 684 k_0 0 irq_handler_entry irq=85, name=48484000.ethernet 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 133 311 k_0 0 irq_handler_exit irq=85, ret=2 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 139 167 k_1 1 kmem_cache_alloc call_site=0xc086aac8, ptr=0xed183c00, bytes_req=192, bytes_alloc=192, gfp_flags=17301536 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 145 836 k_0 0 irq_handler_entry irq=72, name=mmc0 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 145 999 k_1 1 net_if_receive_skb skbaddr=0xed242300, len=102, name=eth0, network_header_type=_unknown, network_header=unknown=[] 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 146 650 k_0 0 irq_handler_exit irq=72, ret=2 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 158 524 k_0 0 random_mix_pool_bytes_nolock pool_name=input, bytes=16, IP=3227816468 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 159 012 k_1 1 kmem_cache_alloc call_site=0xc088bc84, ptr=0xd2672400, bytes_req=128, bytes_alloc=128, gfp_flags=17301536 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 164 218 k_0 0 random_credit_entropy_bits pool_name=input, bits=1, entropy_count=47, entropy_total=8765, IP=3227816496 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 169 098 k_1 1 skb_kfree skbaddr=0xed242300, location=0xc08f899c, protocol=2048 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 174 466 k_1 1 kmem_cache_free call_site=0xc0869bc0, ptr=0xed242300 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 183 087 k_0 0 rcu_utilization s=Start context switch 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 184 226 k_0 0 rcu_utilization s=End context switch 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 185 202 k_1 1 napi_poll napi=0xeda39038, dev_name=eth0, work=1, budget=64 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 187 642 k_1 1 irq_softirq_exit vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 191 220 k_0 0 sched_switch prev_comm=irq/72-mmc0, prev_tid=85, prev_prio=-51, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20 85 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 196 914 k_1 1 sched_migrate_task comm=MSQueueThreadRc, tid=868, prio=-49, orig_cpu=1, dest_cpu=0 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 202 119 k_0 0 rcu_utilization s=Start context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 203 095 k_0 0 rcu_utilization s=End context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 205 210 k_0 0 sched_switch prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadRc, next_tid=868, next_prio=-49 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 209 764 k_1 1 irq_softirq_raise vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 218 060 k_1 1 irq_softirq_entry vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 218 386 k_0 0 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 220 663 k_1 1 napi_poll napi=0xeda39038, dev_name=eth0, work=0, budget=64 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 221 151 k_0 0 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 222 615 k_1 1 irq_softirq_exit vec=3 105 -51
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.317 224 730 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=0, message=received
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 236 279 k_1 1 rcu_utilization s=Start context switch 105 -51
..
..
Resumes again as normal...
BR
Andy
* Re: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
2018-12-07 14:46 ` Andreas Hoefler
@ 2018-12-10 14:08 ` Sebastian Andrzej Siewior
2018-12-10 15:54 ` Andreas Hoefler
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-12-10 14:08 UTC (permalink / raw)
To: Andreas Hoefler; +Cc: linux-rt-users
On 2018-12-07 14:46:26 [+0000], Andreas Hoefler wrote:
> Hi
Hi,
> I am tracing futexes, what I see is :
> Channel CPU Event type Contents TID Prio
> k_0 0 syscall_entry_futex uaddr=2079364, op=139, val=19377, utime=0, uaddr2=2079336, val3=2079360 867 -49
>
> not sure where to find FUTEX_UNLOCK_PI etc..(sorry quite new to this stuff)
In include/uapi/linux/futex.h you have
| #define FUTEX_LOCK_PI 6
| #define FUTEX_WAIT_BITSET 9
among other defines. So op=139 is FUTEX_PRIVATE_FLAG +
FUTEX_WAIT_REQUEUE_PI. This looks good.
> Therefore I think it is correct that I am seeing a tick each 10ms, I guess.
For HZ=100, yes. No idea why I assumed HZ=1000.
> This is a snippet from the tracing just before it hangs:
> In this case a message has been successfully sent and the sending thread is again at the beginning of its while loop and waiting on the semaphore.
> Furthermore the receiving thread has successfully received the message (msqueue->timed_receive()). Right after that it should do a clock_gettime(), put the result in a vector, post the semaphore, and return to the start of its while loop to wait for the next message. But it takes 40ms to do this.
> Sorry for the lengthy trace here:
…
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 048 991 k_0 0 sched_switch prev_comm=ktimersoftd/0, prev_tid=8, prev_prio=-64, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20 8 -64
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 052 407 k_1 1 sched_switch prev_comm=ksoftirqd/1, prev_tid=22, prev_prio=20, prev_state=1, next_comm=kworker/1:0, next_tid=798, next_prio=20 22 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 060 541 k_1 1 rcu_utilization s=Start context switch 798 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 061 517 k_1 1 rcu_utilization s=End context switch 798 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 062 818 k_1 1 sched_stat_runtime comm=kworker/1:0, tid=798, runtime=19683, vruntime=32342094773 798 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.277 072 253 k_1 1 workqueue_execute_start work=0xedc26e60, function=0xc07c2c00 798 20
> ..
> .. doing basically nothing here for 40ms ..
So CPU0 switched ktimersoftd/0 -> CPU-BG-Thread0 and
CPU1 switched ksoftirqd/1 -> kworker/1:0.
CPU0 is in userland so we don't see much. CPU1 runs the work function
and we don't see much there either. The latter is a little odd.
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 982 356 k_0 0 irq_handler_entry irq=19, name=arch_timer 777 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.316 983 007 k_1 1 irq_handler_entry irq=19, name=arch_timer 0 20
…
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 031 807 k_0 0 sched_switch prev_comm=lttng-consumerd, prev_tid=777, prev_prio=20, prev_state=4096, next_comm=ktimersoftd/0, next_tid=8, next_prio=-64 777 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 036 199 k_1 1 rcu_utilization s=Start context switch 0 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 037 500 k_1 1 rcu_utilization s=End context switch 0 20
> 10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 039 940 k_1 1 sched_switch prev_comm=swapper/1, prev_tid=0, prev_prio=20, prev_state=0, next_comm=ktimersoftd/1, next_tid=21, next_prio=-64 0 20
and here we switch on CPU0 lttng-consumerd -> ktimersoftd/0 and
on CPU1 swapper/1 -> ktimersoftd/1.
But based on the last output, CPU0/1 switched to CPU-BG-Thread0 and
kworker/1:0. So I would expect CPU0 to switch *from* CPU-BG-Thread0
instead of lttng-consumerd, unless some output is missing.
> ..
> ..
> Resumes again as normal...
>
> BR
> Andy
Sebastian
* RE: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
2018-12-10 14:08 ` Sebastian Andrzej Siewior
@ 2018-12-10 15:54 ` Andreas Hoefler
2018-12-11 8:28 ` Andreas Hoefler
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Hoefler @ 2018-12-10 15:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users
Hi,
Thanks again for digging into my problem!
> In include/uapi/linux/futex.h you have
> | #define FUTEX_LOCK_PI 6
> | #define FUTEX_WAIT_BITSET 9
> among other defines. So op=139 is FUTEX_PRIVATE_FLAG +
> FUTEX_WAIT_REQUEUE_PI. This looks good.
Thanks, I figured this out in the meantime ;-). But taking a closer look at the syscall_entry_futex events I also see a lot of op=0 and op=1, meaning FUTEX_WAIT and FUTEX_WAKE. Is this problematic here, since you mentioned I should only see FUTEX_LOCK_PI and FUTEX_UNLOCK_PI?
Here is my sending loop in MSQueueThreadSend:
while (msg_counter_send < cnt) {
    vosMessageTracepoint("RTTestTraceSend", msg_counter_send, "before wait");
    retval = msg_queue_sem->Wait(-1); // wait until the receiving thread posts the semaphore
    vosMessageTracepoint("RTTestTraceSend", retval, "after wait");
    getPrecTime(&msq_t0); // clock_gettime()
    msq_send->Send(&msg_counter_send, kmsg_size); // boost::interprocess::message_queue::send()
    vosMessageTracepoint("RTTestTraceSend", msg_counter_send, "send");
    msg_counter_send++;
}
and this is the receiving thread part:
while (msg_counter_rcv < cnt) {
    vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv, "start while");
    if (msq_rcv->TimedReceive(reinterpret_cast<void*>(&msg), kmsg_size, &recvd_size, kmsq_tmo)) { // boost::interprocess::message_queue::timed_receive()
        vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv, "out of timed_receive");
        getPrecTime(&msq_t1);
        err = (msg != (int)msg_counter_rcv);
        fillMsqUsageTime(cnt, &msq_t0, &msq_t1, err);
        vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv - msg, "received");
        msg_counter_rcv++;
        retval = msg_queue_sem->Post(); // signal to send the next message
        vosMessageTracepoint("RTTestTraceRec", retval, "post");
    }
}
On a healthy run the traces look basically like this (sorry again for the lengthy stuff):
Trace Timestamp Channel CPU Event type Contents TID Prio
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 666 237 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceSend, event_id=9844, message=send
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 666 399 k_1 1 syscall_exit_futex ret=0, uaddr=3069280280, uaddr2=128 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 668 677 k_0 0 syscall_entry_getcpu tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 669 327 k_1 1 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 669 490 k_0 0 syscall_exit_getcpu ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 670 466 k_1 1 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 670 466 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceSend, event_id=9845, message=before wait
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 671 767 u_1 1 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=9844, message= out of timed_receive
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 672 906 k_0 0 syscall_entry_futex uaddr=2079364, op=139, val=19371, utime=0, uaddr2=2079336, val3=2079360 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 673 882 k_1 1 syscall_entry_clock_gettime which_clock=4 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 675 509 k_1 1 syscall_exit_clock_gettime ret=0, tp=1733956 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 677 461 k_0 0 rcu_utilization s=Start context switch 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 678 111 k_1 1 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 678 274 k_0 0 rcu_utilization s=End context switch 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 678 925 k_1 1 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 679 901 u_1 1 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=0, message=received
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 682 503 k_1 1 syscall_entry_futex uaddr=2079364, op=140, val=1, utime=0, uaddr2=2079336, val3=19372 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 683 642 k_0 0 sched_switch prev_comm=MSQueueThreadSe, prev_tid=867, prev_prio=-49, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 688 522 k_1 1 kmem_kmalloc call_site=0xc02b5540, ptr=0xd26876c0, bytes_req=48, bytes_alloc=64, gfp_flags=21004480 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 697 143 k_1 1 syscall_exit_futex ret=1, uaddr=2079364, uaddr2=2079336 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 699 421 k_1 1 syscall_entry_futex uaddr=2079336, op=135, val=35, utime=2147484516, uaddr2=3044516608, val3=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 704 789 k_1 1 sched_waking comm=MSQueueThreadSe, tid=867, prio=-49, target_cpu=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 709 669 k_1 1 sched_wakeup comm=MSQueueThreadSe, tid=867, prio=-49, target_cpu=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 710 319 k_0 0 rcu_utilization s=Start context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 711 458 k_0 0 rcu_utilization s=End context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 713 085 k_1 1 syscall_exit_futex ret=0, uaddr=2079336, uaddr2=3044516608 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 713 410 k_0 0 sched_switch prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadSe, next_tid=867, next_prio=-49 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 715 525 k_1 1 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 716 663 k_1 1 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 717 802 u_1 1 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=0, message=post
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 718 290 k_0 0 kmem_kfree call_site=0xc02b5948, ptr=0xd26876c0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 720 242 k_1 1 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 721 055 k_0 0 syscall_exit_futex ret=0, uaddr=2079364, uaddr2=2079336 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 721 055 k_1 1 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 722 031 u_1 1 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=9845, message=start while
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 723 333 k_0 0 syscall_entry_futex uaddr=2079336, op=135, val=35, utime=2147484515, uaddr2=3044950784, val3=0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 726 911 k_0 0 syscall_exit_futex ret=0, uaddr=2079336, uaddr2=3044950784 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 729 677 k_0 0 syscall_entry_getcpu tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 730 653 k_0 0 syscall_exit_getcpu ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 731 954 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceSend, event_id=0, message=after wait
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 734 069 k_0 0 syscall_entry_clock_gettime which_clock=4 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 735 533 k_0 0 syscall_exit_clock_gettime ret=0, tp=1733948 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 741 389 k_0 0 syscall_entry_futex uaddr=3069284376, op=0, val=2, utime=0, uaddr2=3069284376, val3=0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 742 365 k_1 1 syscall_entry_futex uaddr=3069280280, op=1, val=1, utime=0, uaddr2=128, val3=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 747 407 k_0 0 syscall_exit_futex ret=-11, uaddr=3069284376, uaddr2=3069284376 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 749 034 k_1 1 syscall_exit_futex ret=0, uaddr=3069280280, uaddr2=128 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 750 335 k_0 0 syscall_entry_futex uaddr=3069284400, op=0, val=2, utime=0, uaddr2=3069284400, val3=0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 750 498 k_1 1 syscall_entry_futex uaddr=3069280304, op=1, val=1, utime=0, uaddr2=0, val3=19097 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 755 215 k_0 0 syscall_exit_futex ret=-11, uaddr=3069284400, uaddr2=3069284400 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 755 866 k_1 1 syscall_exit_futex ret=0, uaddr=3069280304, uaddr2=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 756 679 k_0 0 syscall_entry_futex uaddr=3069284404, op=5, val=1, utime=1, uaddr2=3069284400, val3=67108865 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 757 330 k_1 1 syscall_entry_futex uaddr=3069280308, op=265, val=19097, utime=3044514512, uaddr2=0, val3=4294967295 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 758 957 k_1 1 timer_hrtimer_init hrtimer=0xd26e3e00, clockid=0, mode=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 763 349 k_0 0 syscall_exit_futex ret=0, uaddr=3069284404, uaddr2=3069284400 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 764 487 k_1 1 syscall_exit_futex ret=-11, uaddr=3069280308, uaddr2=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 765 138 k_0 0 syscall_entry_futex uaddr=3069284376, op=1, val=1, utime=0, uaddr2=128, val3=0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 767 578 k_1 1 syscall_entry_futex uaddr=3069280280, op=1, val=1, utime=0, uaddr2=128, val3=0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 768 229 k_0 0 syscall_exit_futex ret=0, uaddr=3069284376, uaddr2=128 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 770 669 k_1 1 syscall_exit_futex ret=0, uaddr=3069280280, uaddr2=128 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 770 994 k_0 0 syscall_entry_getcpu tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.276 771 970 k_0 0 syscall_exit_getcpu ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0 867 -49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.276 773 271 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceSend, event_id=9845, message=send
On an unhealthy run I would, e.g., see the "send" and the "out of timed_receive" traces (and, as the last thing, the exit_clock_gettime), but the "received" trace would only arrive 40 ms later.
I noticed a sched_migrate_task for my receiving thread just before getting back to normal:
Trace Timestamp Channel CPU Event type Contents TID Prio
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 196 914 k_1 1 sched_migrate_task comm=MSQueueThreadRc, tid=868, prio=-49, orig_cpu=1, dest_cpu=0 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 202 119 k_0 0 rcu_utilization s=Start context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 203 095 k_0 0 rcu_utilization s=End context switch 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 205 210 k_0 0 sched_switch prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadRc, next_tid=868, next_prio=-49 836 -20
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 209 764 k_1 1 irq_softirq_raise vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 218 060 k_1 1 irq_softirq_entry vec=3 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 218 386 k_0 0 syscall_entry_getcpu tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 220 663 k_1 1 napi_poll napi=0xeda39038, dev_name=eth0, work=0, budget=64 105 -51
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 221 151 k_0 0 syscall_exit_getcpu ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0 868 -49
10.41.140.114/vos70-20161103-195610/kernel 20:57:42.317 222 615 k_1 1 irq_softirq_exit vec=3 105 -51
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit 20:57:42.317 224 730 u_0 0 abb_lttng_ngsa_tp:event_message_tp service_name=RTTestTraceRec, event_id=0, message=received
Sorry for this lengthy post, but I am kind of clueless about where to look next.
Cheers
Andy
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
2018-12-10 15:54 ` Andreas Hoefler
@ 2018-12-11 8:28 ` Andreas Hoefler
2018-12-13 16:18 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 6+ messages in thread
From: Andreas Hoefler @ 2018-12-11 8:28 UTC (permalink / raw)
To: Sebastian Andrzej Siewior; +Cc: linux-rt-users
Hi all
I finally found the root cause of my problem! What made me very suspicious was the fact that the spikes occur pretty much exactly every 950 ms.
It turns out that RT throttling was active, with sched_rt_runtime_us set to 950000. After switching it to -1, everything runs just fine!
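For reference, the knobs involved are standard Linux sysctls; a minimal sketch of inspecting and changing them (writing them requires root):

```shell
# Inspect the RT throttling settings (kernel defaults: 950000 / 1000000,
# i.e. RT tasks may consume at most 95% of each 1 s period).
cat /proc/sys/kernel/sched_rt_runtime_us
cat /proc/sys/kernel/sched_rt_period_us

# Disable RT throttling entirely, as done above (use with care: a
# runaway RT task can then starve the whole system).
sysctl -w kernel.sched_rt_runtime_us=-1
# or equivalently:
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
```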
Just out of curiosity: when the throttling kicks in, are all the RT tasks just in a suspended state?
Again, thanks to Sebastian for helping me out here!
BR
Andy
* Re: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
2018-12-11 8:28 ` Andreas Hoefler
@ 2018-12-13 16:18 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-12-13 16:18 UTC (permalink / raw)
To: Andreas Hoefler; +Cc: linux-rt-users
On 2018-12-11 08:28:12 [+0000], Andreas Hoefler wrote:
> Hi all
Hi,
> I finally found the root cause of my problem! What made me very suspicious was the fact that the spikes occur pretty much exactly every 950 ms.
> It turns out that RT throttling was active, with sched_rt_runtime_us set to 950000. After switching it to -1, everything runs just fine!
Be aware that once an RT task forgets to go to sleep, it may lock up /
starve the whole system.
> Just out of curiosity: when the throttling kicks in, are all the RT tasks just in a suspended state?
All RT tasks. This includes threaded interrupts, which usually run at
SCHED_FIFO priority 50.
> Again thanx to Sebastian for helping me out here!
>
> BR
> Andy
Sebastian
end of thread, other threads:[~2018-12-13 16:18 UTC | newest]
Thread overview: 6+ messages
-- links below jump to the message on this page --
[not found] <VI1PR06MB44627C92EA94534421191FEBCEA90@VI1PR06MB4462.eurprd06.prod.outlook.com>
2018-12-07 11:44 ` outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt Sebastian Andrzej Siewior
2018-12-07 14:46 ` Andreas Hoefler
2018-12-10 14:08 ` Sebastian Andrzej Siewior
2018-12-10 15:54 ` Andreas Hoefler
2018-12-11 8:28 ` Andreas Hoefler
2018-12-13 16:18 ` Sebastian Andrzej Siewior