All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
       [not found] <VI1PR06MB44627C92EA94534421191FEBCEA90@VI1PR06MB4462.eurprd06.prod.outlook.com>
@ 2018-12-07 11:44 ` Sebastian Andrzej Siewior
  2018-12-07 14:46   ` Andreas Hoefler
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-12-07 11:44 UTC (permalink / raw)
  To: Andreas Hoefler; +Cc: linux-rt-users

On 2018-12-06 12:08:00 [+0000], Andreas Hoefler wrote:
> Hi everyone
Hi,

> I have a test application which measures the latency for sending and receiving boost message queues under a certain cpu load.
> There are two background load tasks with rt priority -20 pinned to either of the two cores of an ARMv71 in order to consume a certain amount of cpu load e.g. 40% each.
> My sending and receiving threads have both rt prio -49 FIFO, using a semaphore mechanism with a mutex with attribute PTHREAD_MUTEX_ADAPTIVE and PTHREAD_PRIO_INHERIT for signaling when the message was received and is ready to get the next one. Nothing else is running besides that.
> Apparently  I see (pretty much periodically) outliers of up  to 30ms (compared to around 10us per send and receive).

I'm not sure of PTHREAD_MUTEX_ADAPTIVE makes any difference. I don't
know what the semaphore does. If you trace the sys_futex syscall you
should only see FUTEX_LOCK_PI and FUTEX_UNLOCK_PI and no FUTEX_WAIT or
FUTEX_WAKE for the two threads.

> I traced it with LTTng with all kernel events and some userspace events enabled and this is what I observe: 
> -	The Background load gets interrupted as it should by the higher priority sending and receiving threads
> -	Periodically I see a sudden stop in either the receiving or the sending thread, in no particular section of the code.
> -	During this period one of the threads is in "wait for CPU" state
> -	In all cases there seems to be the IRQ19 (arch_timer) which triggers either of the threads to continue (but appeared several times in between)
> 
> I am writing because this used to work fine on an older version (not sure but I think it was 4.4.32) where max latencies of about 100us occurred.
> Currently I am testing with Linux am57xx-evm 4.14.79-rt47-gda0d0b490c #8 SMP PREEMPT RT Fri Nov 23 14:32:07 CET 2018 armv7l GNU/Linux. (TI Sitara)

So with HZ=1000 you should have one timer tick each ms. That means for
your 30ms outliner you should see ~29-30 of those. Unless you have NO_HZ
and the CPU wend idle for a longer period of time. Is there no timer for
30ms and the suddenly the one timer wakes the application up? "wait for
CPU" sounds like something got interrupted and something else is busy.
A sched_switch tracer + sched_wakeup events might reveal what happens in
between.

> Thanx a lot
> Andy

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
  2018-12-07 11:44 ` outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt Sebastian Andrzej Siewior
@ 2018-12-07 14:46   ` Andreas Hoefler
  2018-12-10 14:08     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Hoefler @ 2018-12-07 14:46 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Sebastian Andrzej Siewior

Hi
First of all thank you very much for your fast reply, highly appreciated!

>I'm not sure of PTHREAD_MUTEX_ADAPTIVE makes any difference. I don't know what the semaphore does. If you trace the sys_futex syscall you should only see FUTEX_LOCK_PI and >FUTEX_UNLOCK_PI and no FUTEX_WAIT or FUTEX_WAKE for the two threads.

I am tracing futexes, what I see is :
Channel CPU	Event type		Contents									TID	 Prio
k_0	0	syscall_entry_futex	uaddr=2079364, op=139, val=19377, utime=0, uaddr2=2079336, val3=2079360	867	-49				

not sure where to find FUTEX_UNLOCK_PI etc..(sorry quite new to this stuff)

>So with HZ=1000 you should have one timer tick each ms. That means for your 30ms outliner you should see ~29-30 of those. Unless you have NO_HZ and the CPU wend idle for a longer period >of time. Is there no timer for 30ms and the suddenly the one timer wakes the application up? "wait for CPU" sounds like something got interrupted and something else is busy.
>A sched_switch tracer + sched_wakeup events might reveal what happens in between.

My config looks like this: 
...
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
...
and
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PREEMPT_COUNT=y
CONFIG_HZ_FIXED=0
CONFIG_HZ_100=y
# CONFIG_HZ_200 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
# CONFIG_HZ_500 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=100
CONFIG_SCHED_HRTICK=y
Therefore I think it is correct that I am seeing a tick each 10ms, I guess.

This is a snippet from the tracing just before it hangs:
In this case a message has been successfully sent and the sending thread is again at the beginning of its while loop and waiting on the semaphore.
Furthermore the receiving thread has successfully received the message (msqueue->timed_receive()). Right after that it should do a clock_gettime() put the result in a vector,post the semaphore and return to the start of its while loop and wait again for the next message. But it takes 40ms to do this.
Sorry for the lengthy trace here:

10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 968 471	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9847, message=send						
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 968 634	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 971 399	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 971 562	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 972 375	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 972 538	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49				
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 973 189	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9848, message=wait for semaphore						
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 974 002	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=9847, message=leaving timed_receive()						
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 975 954	k_0	0	syscall_entry_futex	uaddr=2079364, op=139, val=19377, utime=0, uaddr2=2079336, val3=2079360	867	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 976 442	k_1	1	syscall_entry_clock_gettime	which_clock=4	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 978 069	k_1	1	syscall_exit_clock_gettime	ret=0, tp=1733956	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 980 346	k_0	0	rcu_utilization	s=Start context switch	867	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 981 485	k_0	0	rcu_utilization	s=End context switch	867	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 981 973	k_1	1	irq_handler_entry	irq=19, name=arch_timer	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 983 925	k_1	1	timer_hrtimer_cancel	hrtimer=0xeed45c90	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 986 039	k_1	1	timer_hrtimer_expire_entry	hrtimer=0xeed45c90, now=9665050003223, function=0xc02b3c28	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 987 503	k_0	0	sched_switch	prev_comm=MSQueueThreadSe, prev_tid=867, prev_prio=-49, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	867	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 991 245	k_0	0	irq_handler_entry	irq=19, name=arch_timer	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 993 522	k_0	0	timer_hrtimer_cancel	hrtimer=0xeed35c90	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 994 661	k_1	1	irq_softirq_raise	vec=7	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 995 149	k_0	0	timer_hrtimer_expire_entry	hrtimer=0xeed35c90, now=9665050012658, function=0xc02b3c28	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 996 450	k_1	1	rcu_utilization	s=Start scheduler-tick	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 998 239	k_1	1	rcu_utilization	s=End scheduler-tick	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 000 842	k_1	1	timer_hrtimer_expire_exit	hrtimer=0xeed45c90	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 001 981	k_1	1	timer_hrtimer_start	hrtimer=0xeed45c90, function=0xc02b3c28, expires=9665060000000, softexpires=9665060000000, mode=0	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 004 095	k_0	0	irq_softirq_raise	vec=7	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 004 909	k_1	1	irq_handler_exit	irq=19, ret=1	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 005 722	k_0	0	irq_softirq_raise	vec=1	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 006 698	k_0	0	rcu_utilization	s=Start scheduler-tick	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 008 813	k_0	0	rcu_utilization	s=End scheduler-tick	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 010 114	k_0	0	timer_hrtimer_expire_exit	hrtimer=0xeed35c90	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 010 765	k_1	1	rcu_utilization	s=Start context switch	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 011 090	k_0	0	timer_hrtimer_start	hrtimer=0xeed35c90, function=0xc02b3c28, expires=9665060000000, softexpires=9665060000000, mode=0	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 011 903	k_1	1	rcu_utilization	s=End context switch	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 015 970	k_0	0	irq_handler_exit	irq=19, ret=1	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 018 247	k_0	0	sched_waking	comm=ktimersoftd/0, tid=8, prio=-64, target_cpu=0	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 019 386	k_1	1	sched_switch	prev_comm=MSQueueThreadRc, prev_tid=868, prev_prio=-49, prev_state=0, next_comm=ksoftirqd/1, next_tid=22, next_prio=20	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 022 314	k_0	0	sched_wakeup	comm=ktimersoftd/0, tid=8, prio=-64, target_cpu=0	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 024 754	k_0	0	rcu_utilization	s=Start context switch	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 025 730	k_0	0	rcu_utilization	s=End context switch	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 027 845	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=ktimersoftd/0, next_tid=8, next_prio=-64	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 028 333	k_1	1	irq_softirq_entry	vec=7	22	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 035 002	k_0	0	irq_softirq_entry	vec=1	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 038 093	k_1	1	irq_softirq_exit	vec=7	22	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 038 906	k_0	0	irq_softirq_exit	vec=1	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 041 021	k_1	1	rcu_utilization	s=Start context switch	22	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 041 834	k_0	0	rcu_utilization	s=Start context switch	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 041 834	k_1	1	rcu_utilization	s=End context switch	22	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 042 810	k_0	0	rcu_utilization	s=End context switch	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 045 087	k_1	1	sched_stat_runtime	comm=ksoftirqd/1, tid=22, runtime=29768, vruntime=32336127794	22	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 048 991	k_0	0	sched_switch	prev_comm=ktimersoftd/0, prev_tid=8, prev_prio=-64, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 052 407	k_1	1	sched_switch	prev_comm=ksoftirqd/1, prev_tid=22, prev_prio=20, prev_state=1, next_comm=kworker/1:0, next_tid=798, next_prio=20	22	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 060 541	k_1	1	rcu_utilization	s=Start context switch	798	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 061 517	k_1	1	rcu_utilization	s=End context switch	798	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 062 818	k_1	1	sched_stat_runtime	comm=kworker/1:0, tid=798, runtime=19683, vruntime=32342094773	798	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 072 253	k_1	1	workqueue_execute_start	work=0xedc26e60, function=0xc07c2c00	798	20	
..
.. doing basically nothing here for 40ms ..
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 982 356	k_0	0	irq_handler_entry	irq=19, name=arch_timer	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 983 007	k_1	1	irq_handler_entry	irq=19, name=arch_timer	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 985 447	k_0	0	timer_hrtimer_cancel	hrtimer=0xeed35c90	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 985 772	k_1	1	timer_hrtimer_cancel	hrtimer=0xc10a37f8	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 987 887	k_0	0	timer_hrtimer_expire_entry	hrtimer=0xeed35c90, now=9665090004258, function=0xc02b3c28	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 987 887	k_1	1	timer_hrtimer_expire_entry	hrtimer=0xc10a37f8, now=9665090005071, function=0xc026bb58	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 002 039	k_0	0	sched_stat_runtime	comm=lttng-consumerd, tid=777, runtime=10003512, vruntime=9743221624	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 003 178	k_1	1	timer_hrtimer_expire_exit	hrtimer=0xc10a37f8	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 005 130	k_1	1	timer_hrtimer_start	hrtimer=0xc10a37f8, function=0xc026bb58, expires=9666090000000, softexpires=9666090000000, mode=0	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 006 756	k_1	1	timer_hrtimer_cancel	hrtimer=0xeed45c90	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 008 708	k_1	1	timer_hrtimer_expire_entry	hrtimer=0xeed45c90, now=9665090005071, function=0xc02b3c28	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 010 010	k_0	0	irq_softirq_raise	vec=1	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 011 799	k_0	0	rcu_utilization	s=Start scheduler-tick	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 012 450	k_1	1	irq_softirq_raise	vec=7	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 014 239	k_1	1	irq_softirq_raise	vec=1	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 014 890	k_0	0	rcu_utilization	s=End scheduler-tick	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 015 378	k_1	1	rcu_utilization	s=Start scheduler-tick	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 016 516	k_0	0	timer_hrtimer_expire_exit	hrtimer=0xeed35c90	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 017 492	k_1	1	rcu_utilization	s=End scheduler-tick	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 017 655	k_0	0	timer_hrtimer_start	hrtimer=0xeed35c90, function=0xc02b3c28, expires=9665100000000, softexpires=9665100000000, mode=0	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 018 794	k_1	1	timer_hrtimer_expire_exit	hrtimer=0xeed45c90	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 019 770	k_1	1	timer_hrtimer_start	hrtimer=0xeed45c90, function=0xc02b3c28, expires=9665100000000, softexpires=9665100000000, mode=0	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 020 746	k_0	0	irq_handler_exit	irq=19, ret=1	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 022 372	k_1	1	irq_handler_exit	irq=19, ret=1	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 023 511	k_0	0	rcu_utilization	s=Start context switch	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 024 812	k_1	1	sched_waking	comm=ksoftirqd/1, tid=22, prio=20, target_cpu=1	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 024 812	k_0	0	rcu_utilization	s=End context switch	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 028 228	k_0	0	sched_stat_runtime	comm=lttng-consumerd, tid=777, runtime=25864, vruntime=9743247488	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 030 994	k_1	1	sched_wakeup	comm=ksoftirqd/1, tid=22, prio=20, target_cpu=1	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 031 807	k_0	0	sched_switch	prev_comm=lttng-consumerd, prev_tid=777, prev_prio=20, prev_state=4096, next_comm=ktimersoftd/0, next_tid=8, next_prio=-64	777	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 036 199	k_1	1	rcu_utilization	s=Start context switch	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 037 500	k_1	1	rcu_utilization	s=End context switch	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 039 940	k_1	1	sched_switch	prev_comm=swapper/1, prev_tid=0, prev_prio=20, prev_state=0, next_comm=ktimersoftd/1, next_tid=21, next_prio=-64	0	20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 041 404	k_0	0	irq_softirq_entry	vec=1	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 045 796	k_0	0	timer_cancel	timer=0xff7e3508	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 047 911	k_0	0	timer_expire_entry	timer=0xff7e3508, now=936510, function=0xbf339aa4	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 053 279	k_0	0	timer_start	timer=0xff7e3508, function=0xbf339aa4, expires=936530, now=936510, flags=256901120	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 054 418	k_1	1	irq_softirq_entry	vec=1	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 055 556	k_0	0	timer_expire_exit	timer=0xff7e3508	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 057 346	k_1	1	timer_cancel	timer=0xff7f3508	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 058 810	k_1	1	timer_expire_entry	timer=0xff7f3508, now=936510, function=0xbf339aa4	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 060 274	k_0	0	irq_softirq_exit	vec=1	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 061 738	k_1	1	timer_start	timer=0xff7f3508, function=0xbf339aa4, expires=936530, now=936510, flags=244318209	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 063 527	k_1	1	timer_expire_exit	timer=0xff7f3508	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 063 690	k_0	0	rcu_utilization	s=Start context switch	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 064 503	k_0	0	rcu_utilization	s=End context switch	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 069 383	k_1	1	irq_softirq_exit	vec=1	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 073 612	k_1	1	rcu_utilization	s=Start context switch	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 074 751	k_0	0	sched_switch	prev_comm=ktimersoftd/0, prev_tid=8, prev_prio=-64, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	8	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 074 751	k_1	1	rcu_utilization	s=End context switch	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 080 770	k_1	1	sched_switch	prev_comm=ktimersoftd/1, prev_tid=21, prev_prio=-64, prev_state=1, next_comm=irq/85-48484000, next_tid=105, next_prio=-51	21	-64				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 091 668	k_1	1	sched_migrate_task	comm=irq/72-mmc0, tid=85, prio=-51, orig_cpu=1, dest_cpu=0	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 097 850	k_0	0	rcu_utilization	s=Start context switch	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 098 988	k_0	0	rcu_utilization	s=End context switch	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 102 079	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=irq/72-mmc0, next_tid=85, next_prio=-51	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 118 508	k_1	1	irq_softirq_raise	vec=3	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 123 876	k_0	0	timer_start	timer=0xedd2eddc, function=0xc07e7420, expires=974910, now=936510, flags=587202560	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 125 015	k_1	1	irq_softirq_entry	vec=3	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 131 684	k_0	0	irq_handler_entry	irq=85, name=48484000.ethernet	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 133 311	k_0	0	irq_handler_exit	irq=85, ret=2	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 139 167	k_1	1	kmem_cache_alloc	call_site=0xc086aac8, ptr=0xed183c00, bytes_req=192, bytes_alloc=192, gfp_flags=17301536	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 145 836	k_0	0	irq_handler_entry	irq=72, name=mmc0	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 145 999	k_1	1	net_if_receive_skb	skbaddr=0xed242300, len=102, name=eth0, network_header_type=_unknown, network_header=unknown=[]	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 146 650	k_0	0	irq_handler_exit	irq=72, ret=2	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 158 524	k_0	0	random_mix_pool_bytes_nolock	pool_name=input, bytes=16, IP=3227816468	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 159 012	k_1	1	kmem_cache_alloc	call_site=0xc088bc84, ptr=0xd2672400, bytes_req=128, bytes_alloc=128, gfp_flags=17301536	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 164 218	k_0	0	random_credit_entropy_bits	pool_name=input, bits=1, entropy_count=47, entropy_total=8765, IP=3227816496	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 169 098	k_1	1	skb_kfree	skbaddr=0xed242300, location=0xc08f899c, protocol=2048	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 174 466	k_1	1	kmem_cache_free	call_site=0xc0869bc0, ptr=0xed242300	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 183 087	k_0	0	rcu_utilization	s=Start context switch	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 184 226	k_0	0	rcu_utilization	s=End context switch	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 185 202	k_1	1	napi_poll	napi=0xeda39038, dev_name=eth0, work=1, budget=64	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 187 642	k_1	1	irq_softirq_exit	vec=3	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 191 220	k_0	0	sched_switch	prev_comm=irq/72-mmc0, prev_tid=85, prev_prio=-51, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	85	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 196 914	k_1	1	sched_migrate_task	comm=MSQueueThreadRc, tid=868, prio=-49, orig_cpu=1, dest_cpu=0	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 202 119	k_0	0	rcu_utilization	s=Start context switch	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 203 095	k_0	0	rcu_utilization	s=End context switch	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 205 210	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadRc, next_tid=868, next_prio=-49	836	-20				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 209 764	k_1	1	irq_softirq_raise	vec=3	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 218 060	k_1	1	irq_softirq_entry	vec=3	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 218 386	k_0	0	syscall_entry_getcpu	tcache=0x0	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 220 663	k_1	1	napi_poll	napi=0xeda39038, dev_name=eth0, work=0, budget=64	105	-51				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 221 151	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49				
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 222 615	k_1	1	irq_softirq_exit	vec=3	105	-51				
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.317 224 730	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=received						
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 236 279	k_1	1	rcu_utilization	s=Start context switch	105	-51				
..
..
Resumes again as normal...

BR
Andy


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
  2018-12-07 14:46   ` Andreas Hoefler
@ 2018-12-10 14:08     ` Sebastian Andrzej Siewior
  2018-12-10 15:54       ` Andreas Hoefler
  0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-12-10 14:08 UTC (permalink / raw)
  To: Andreas Hoefler; +Cc: linux-rt-users

On 2018-12-07 14:46:26 [+0000], Andreas Hoefler wrote:
> Hi
Hi,

> I am tracing futexes, what I see is :
> Channel CPU	Event type		Contents									TID	 Prio
> k_0	0	syscall_entry_futex	uaddr=2079364, op=139, val=19377, utime=0, uaddr2=2079336, val3=2079360	867	-49				
> 
> not sure where to find FUTEX_UNLOCK_PI etc..(sorry quite new to this stuff)

In include/uapi/linux/futex.h you have 
| #define FUTEX_LOCK_PI           6
| #define FUTEX_WAIT_BITSET       9

among other defines. So op=139 is FUTEX_PRIVATE_FLAG +
FUTEX_WAIT_REQUEUE_PI. This looks good.

> Therefore I think it is correct that I am seeing a tick each 10ms, I guess.

For HZ=100, yes. No idea why I assumed HZ=1000.
 
> This is a snippet from the tracing just before it hangs:
> In this case a message has been successfully sent and the sending thread is again at the beginning of its while loop and waiting on the semaphore.
> Furthermore the receiving thread has successfully received the message (msqueue->timed_receive()). Right after that it should do a clock_gettime() put the result in a vector,post the semaphore and return to the start of its while loop and wait again for the next message. But it takes 40ms to do this.
> Sorry for the lengthy trace here:
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 048 991	k_0	0	sched_switch	prev_comm=ktimersoftd/0, prev_tid=8, prev_prio=-64, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	8	-64				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 052 407	k_1	1	sched_switch	prev_comm=ksoftirqd/1, prev_tid=22, prev_prio=20, prev_state=1, next_comm=kworker/1:0, next_tid=798, next_prio=20	22	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 060 541	k_1	1	rcu_utilization	s=Start context switch	798	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 061 517	k_1	1	rcu_utilization	s=End context switch	798	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 062 818	k_1	1	sched_stat_runtime	comm=kworker/1:0, tid=798, runtime=19683, vruntime=32342094773	798	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.277 072 253	k_1	1	workqueue_execute_start	work=0xedc26e60, function=0xc07c2c00	798	20	
> ..
> .. doing basically nothing here for 40ms ..

So CPU0 switched ktimersoftd/0 -> CPU-BG-Thread0 and
CPU siwtched ksoftirqd/1 -> kworker/1:0.
CPU0 is in userland so we don't see much. CPU1 does the work-func and
don't see much either. The latter is little odd.

> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 982 356	k_0	0	irq_handler_entry	irq=19, name=arch_timer	777	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.316 983 007	k_1	1	irq_handler_entry	irq=19, name=arch_timer	0	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 031 807	k_0	0	sched_switch	prev_comm=lttng-consumerd, prev_tid=777, prev_prio=20, prev_state=4096, next_comm=ktimersoftd/0, next_tid=8, next_prio=-64	777	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 036 199	k_1	1	rcu_utilization	s=Start context switch	0	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 037 500	k_1	1	rcu_utilization	s=End context switch	0	20				
> 10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 039 940	k_1	1	sched_switch	prev_comm=swapper/1, prev_tid=0, prev_prio=20, prev_state=0, next_comm=ktimersoftd/1, next_tid=21, next_prio=-64	0	20				

and here we switch on CPU0 lttng-consumerd -> ktimersoftd/0 and 
on CPU1 swapper/1 -> timersoftd/1.
But based on last output CPU0/1 switched to CPU-BG-Thread0 and
kworker/1:0. So I would expect for CPU0 to switch *from* CPU-BG-Thread0
instead lttng-consumerd unless some output is missing.

> ..
> ..
> Resumes again as normal...
> 
> BR
> Andy

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
  2018-12-10 14:08     ` Sebastian Andrzej Siewior
@ 2018-12-10 15:54       ` Andreas Hoefler
  2018-12-11  8:28         ` Andreas Hoefler
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Hoefler @ 2018-12-10 15:54 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi,
Thx again for digging into my problem!

> In include/uapi/linux/futex.h you have
> | #define FUTEX_LOCK_PI           6
> | #define FUTEX_WAIT_BITSET       9

> among other defines. So op=139 is FUTEX_PRIVATE_FLAG +
> FUTEX_WAIT_REQUEUE_PI. This looks good.

Thx, I figured this out in the meantime;-). But taking a closer look at the syscall_futex_entries I see also a lot of op=0 and op=1, meaning FUTEX_WAIT and FUTEX_WAKE. Is this problematic here since you mentioned I should only see FUTEX_LOCK_PI and FUTEX_UNLOCK_PI?
Here is my sending loop in MSQueueThreadSend:

while (msg_counter_send < cnt) {
	vosMessageTracepoint("RTTestTraceSend", msg_counter_send, "before wait");
	retval = msg_queue_sem->Wait(-1); // wait receiving thread posts the semaphore
	vosMessageTracepoint("RTTestTraceSend", retval, "after wait");
   	getPrecTime(&msq_t0); //clock_gettime()
   	msq_send->Send(&msg_counter_send, kmsg_size); //boost::interprocess::message_queue::send()
	vosMessageTracepoint("RTTestTraceSend", msg_counter_send, "send");
    	msg_counter_send++;
  }

and this is the receiving thread part:

while (msg_counter_rcv < cnt) {
	  vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv, "start while");
	  if (msq_rcv->TimedReceive(reinterpret_cast<void*>(&msg), kmsg_size, &recvd_size, kmsq_tmo)) { //boost::interprocess::message_queue::timed_receive()
		  vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv, "out of timed_receive");
		  getPrecTime(&msq_t1);
		  err = (msg != (int)msg_counter_rcv);
		  fillMsqUsageTime(cnt, &msq_t0, &msq_t1, err);
		  vosMessageTracepoint("RTTestTraceRec", msg_counter_rcv-msg, "received");
		  msg_counter_rcv++;
		  retval = msg_queue_sem->Post(); //signal to send the next message
		 vosMessageTracepoint("RTTestTraceRec", retval, "post");
	  }
  }

On a healthy run the traces look basically like this (sorry again for the lengthy stuff):

Trace	Timestamp	Channel	CPU	Event type	Contents	TID	Prio
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 666 237	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9844, message=send		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 666 399	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 668 677	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 669 327	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 669 490	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 670 466	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 670 466	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9845, message=before wait		
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 671 767	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=9844, message= out of timed_receive		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 672 906	k_0	0	syscall_entry_futex	uaddr=2079364, op=139, val=19371, utime=0, uaddr2=2079336, val3=2079360	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 673 882	k_1	1	syscall_entry_clock_gettime	which_clock=4	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 675 509	k_1	1	syscall_exit_clock_gettime	ret=0, tp=1733956	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 677 461	k_0	0	rcu_utilization	s=Start context switch	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 678 111	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 678 274	k_0	0	rcu_utilization	s=End context switch	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 678 925	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 679 901	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=received		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 682 503	k_1	1	syscall_entry_futex	uaddr=2079364, op=140, val=1, utime=0, uaddr2=2079336, val3=19372	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 683 642	k_0	0	sched_switch	prev_comm=MSQueueThreadSe, prev_tid=867, prev_prio=-49, prev_state=1, next_comm=CPU-BG-Thread0, next_tid=836, next_prio=-20	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 688 522	k_1	1	kmem_kmalloc	call_site=0xc02b5540, ptr=0xd26876c0, bytes_req=48, bytes_alloc=64, gfp_flags=21004480	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 697 143	k_1	1	syscall_exit_futex	ret=1, uaddr=2079364, uaddr2=2079336	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 699 421	k_1	1	syscall_entry_futex	uaddr=2079336, op=135, val=35, utime=2147484516, uaddr2=3044516608, val3=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 704 789	k_1	1	sched_waking	comm=MSQueueThreadSe, tid=867, prio=-49, target_cpu=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 709 669	k_1	1	sched_wakeup	comm=MSQueueThreadSe, tid=867, prio=-49, target_cpu=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 710 319	k_0	0	rcu_utilization	s=Start context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 711 458	k_0	0	rcu_utilization	s=End context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 713 085	k_1	1	syscall_exit_futex	ret=0, uaddr=2079336, uaddr2=3044516608	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 713 410	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadSe, next_tid=867, next_prio=-49	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 715 525	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 716 663	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 717 802	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=post		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 718 290	k_0	0	kmem_kfree	call_site=0xc02b5948, ptr=0xd26876c0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 720 242	k_1	1	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 721 055	k_0	0	syscall_exit_futex	ret=0, uaddr=2079364, uaddr2=2079336	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 721 055	k_1	1	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 722 031	u_1	1	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=9845, message=start while		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 723 333	k_0	0	syscall_entry_futex	uaddr=2079336, op=135, val=35, utime=2147484515, uaddr2=3044950784, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 726 911	k_0	0	syscall_exit_futex	ret=0, uaddr=2079336, uaddr2=3044950784	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 729 677	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 730 653	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 731 954	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=0, message=after wait		
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 734 069	k_0	0	syscall_entry_clock_gettime	which_clock=4	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 735 533	k_0	0	syscall_exit_clock_gettime	ret=0, tp=1733948	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 741 389	k_0	0	syscall_entry_futex	uaddr=3069284376, op=0, val=2, utime=0, uaddr2=3069284376, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 742 365	k_1	1	syscall_entry_futex	uaddr=3069280280, op=1, val=1, utime=0, uaddr2=128, val3=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 747 407	k_0	0	syscall_exit_futex	ret=-11, uaddr=3069284376, uaddr2=3069284376	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 749 034	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 750 335	k_0	0	syscall_entry_futex	uaddr=3069284400, op=0, val=2, utime=0, uaddr2=3069284400, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 750 498	k_1	1	syscall_entry_futex	uaddr=3069280304, op=1, val=1, utime=0, uaddr2=0, val3=19097	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 755 215	k_0	0	syscall_exit_futex	ret=-11, uaddr=3069284400, uaddr2=3069284400	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 755 866	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280304, uaddr2=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 756 679	k_0	0	syscall_entry_futex	uaddr=3069284404, op=5, val=1, utime=1, uaddr2=3069284400, val3=67108865	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 757 330	k_1	1	syscall_entry_futex	uaddr=3069280308, op=265, val=19097, utime=3044514512, uaddr2=0, val3=4294967295	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 758 957	k_1	1	timer_hrtimer_init	hrtimer=0xd26e3e00, clockid=0, mode=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 763 349	k_0	0	syscall_exit_futex	ret=0, uaddr=3069284404, uaddr2=3069284400	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 764 487	k_1	1	syscall_exit_futex	ret=-11, uaddr=3069280308, uaddr2=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 765 138	k_0	0	syscall_entry_futex	uaddr=3069284376, op=1, val=1, utime=0, uaddr2=128, val3=0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 767 578	k_1	1	syscall_entry_futex	uaddr=3069280280, op=1, val=1, utime=0, uaddr2=128, val3=0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 768 229	k_0	0	syscall_exit_futex	ret=0, uaddr=3069284376, uaddr2=128	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 770 669	k_1	1	syscall_exit_futex	ret=0, uaddr=3069280280, uaddr2=128	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 770 994	k_0	0	syscall_entry_getcpu	tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.276 771 970	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57e38c4, nodep=0x0, tcache=0x0	867	-49
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.276 773 271	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceSend, event_id=9845, message=send

On an unhealthy run I would e.g see the "send" and also the "out of timed_receive" (and also as a last thing the exit_clock_gettime), but I would get the "receive" trace 40ms later.
I noticed a sched_migrate_task for my receiving thread just before getting back to normal:

Trace	Timestamp	Channel	CPU	Event type	Contents	TID	Prio
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 196 914	k_1	1	sched_migrate_task	comm=MSQueueThreadRc, tid=868, prio=-49, orig_cpu=1, dest_cpu=0	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 202 119	k_0	0	rcu_utilization	s=Start context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 203 095	k_0	0	rcu_utilization	s=End context switch	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 205 210	k_0	0	sched_switch	prev_comm=CPU-BG-Thread0, prev_tid=836, prev_prio=-20, prev_state=0, next_comm=MSQueueThreadRc, next_tid=868, next_prio=-49	836	-20
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 209 764	k_1	1	irq_softirq_raise	vec=3	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 218 060	k_1	1	irq_softirq_entry	vec=3	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 218 386	k_0	0	syscall_entry_getcpu	tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 220 663	k_1	1	napi_poll	napi=0xeda39038, dev_name=eth0, work=0, budget=64	105	-51
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 221 151	k_0	0	syscall_exit_getcpu	ret=0, cpup=0xb57798ac, nodep=0x0, tcache=0x0	868	-49
10.41.140.114/vos70-20161103-195610/kernel	20:57:42.317 222 615	k_1	1	irq_softirq_exit	vec=3	105	-51
10.41.140.114/vos70-20161103-195610/ust/uid/0/32-bit	20:57:42.317 224 730	u_0	0	abb_lttng_ngsa_tp:event_message_tp	service_name=RTTestTraceRec, event_id=0, message=received		

Sorry for this lengthy post but I am kind of clueless where to look any further.
Cheers
Andy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
  2018-12-10 15:54       ` Andreas Hoefler
@ 2018-12-11  8:28         ` Andreas Hoefler
  2018-12-13 16:18           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 6+ messages in thread
From: Andreas Hoefler @ 2018-12-11  8:28 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi all

I finally found the root cause of my problem! What made me very suspicious was the fact, that the spikes occur pretty much exactly after 950ms each time. 
It turns out that throttling was activated having sched_rt_runtime_us set to 950000. Switching it to -1 and everything runs just fine!

Just out of curiosity: in case the throttling kicks in, are all the rt task just in a suspended state?

Again thanx to Sebastian for helping me out here!

BR
Andy

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt
  2018-12-11  8:28         ` Andreas Hoefler
@ 2018-12-13 16:18           ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 6+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-12-13 16:18 UTC (permalink / raw)
  To: Andreas Hoefler; +Cc: linux-rt-users

On 2018-12-11 08:28:12 [+0000], Andreas Hoefler wrote:
> Hi all
Him

> I finally found the root cause of my problem! What made me very suspicious was the fact, that the spikes occur pretty much exactly after 950ms each time. 
> It turns out that throttling was activated having sched_rt_runtime_us set to 950000. Switching it to -1 and everything runs just fine!

Be aware that once a RT task forgets to go to sleep, it may lockup /
starve the whole system.

> Just out of curiosity: in case the throttling kicks in, are all the rt task just in a suspended state?

All RT tasks. This includes threaded interrupts which usually run at
FIFO 50.

> Again thanx to Sebastian for helping me out here!
> 
> BR
> Andy

Sebastian

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-12-13 16:18 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <VI1PR06MB44627C92EA94534421191FEBCEA90@VI1PR06MB4462.eurprd06.prod.outlook.com>
2018-12-07 11:44 ` outliers when measuring boost msqueues latency v4.14.79-rt47 RT-Preempt Sebastian Andrzej Siewior
2018-12-07 14:46   ` Andreas Hoefler
2018-12-10 14:08     ` Sebastian Andrzej Siewior
2018-12-10 15:54       ` Andreas Hoefler
2018-12-11  8:28         ` Andreas Hoefler
2018-12-13 16:18           ` Sebastian Andrzej Siewior

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.