From: Philippe Gerum
Subject: Re: Large gpio interrupt latency
In-reply-to: <10a0-60d0d500-8b-29fb8780@204427917>
References: <87o8bzyxne.fsf@xenomai.org> <10a0-60d0d500-8b-29fb8780@204427917>
Date: Tue, 22 Jun 2021 09:38:46 +0200
Message-ID: <87lf72z6ux.fsf@xenomai.org>
List-Id: Discussions about the Xenomai project
To: François Legal
Cc: xenomai@xenomai.org

François Legal writes:

> On Monday, June 21, 2021 18:45 CEST, Philippe Gerum wrote:
>
>>
>> Philippe Gerum via Xenomai writes:
>>
>> > François Legal writes:
>> >
>> >> On Monday, June 21, 2021 16:57 CEST, Philippe Gerum wrote:
>> >>
>> >>>
>> >>> Jan Kiszka writes:
>> >>>
>> >>> > On 21.06.21 16:28, Philippe Gerum wrote:
>> >>> >>
>> >>> >> Jan Kiszka writes:
>> >>> >>
>> >>> >>> On 21.06.21 15:54, Philippe Gerum wrote:
>> >>> >>>>
>> >>> >>>> Jan Kiszka writes:
>> >>> >>>>
>> >>> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>> >>> >>>>>>
>> >>> >>>>>> Jan Kiszka writes:
>> >>> >>>>>>
>> >>> >>>>>>> On 18.06.21 20:41, François Legal wrote:
>> >>> >>>>>>>> On Wednesday, June 16, 2021 17:10 CEST, Jan Kiszka wrote:
>> >>> >>>>>>>>
>> >>> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>> >>> >>>>>>>>>> On Wednesday, June 16, 2021 15:38 CEST, Jan Kiszka wrote:
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>> >>> >>>>>>>>>>>> On Wednesday, June 16, 2021 11:40 CEST, Jan Kiszka wrote:
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>> >>> >>>>>>>>>>>>>> On Wednesday, June 16, 2021 11:05 CEST, "Chen, Hongzhan" wrote:
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> -----Original Message-----
>> >>> >>>>>>>>>>>>>>>> From: François Legal
>> >>> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>> >>> >>>>>>>>>>>>>>>> To: Chen, Hongzhan
>> >>> >>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>> >>> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> On Wednesday, June 16, 2021 10:10 CEST, "Chen, Hongzhan" wrote:
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> -----Original Message-----
>> >>> >>>>>>>>>>>>>>>>>> From: Xenomai On Behalf Of François Legal via Xenomai
>> >>> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>> >>> >>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>> >>> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> Hello,
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I
>> >>> >>>>>>>>>>>>>>>>>> submitted a patch to add timestamp retrieval for net
>> >>> >>>>>>>>>>>>>>>>>> packets), I experience a strange latency problem with
>> >>> >>>>>>>>>>>>>>>>>> taking GPIO interrupts.
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which,
>> >>> >>>>>>>>>>>>>>>>>> when received by the DUT, trigger toggling of a GPIO
>> >>> >>>>>>>>>>>>>>>>>> on the DUT) and one GPIO.
>> >>> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that
>> >>> >>>>>>>>>>>>>>>>>> the DUT would toggle the GPIO 1-2ms after receiving a
>> >>> >>>>>>>>>>>>>>>>>> specific network packet.
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events
>> >>> >>>>>>>>>>>>>>>>>> done in interrupt service routines for GPIOs and the
>> >>> >>>>>>>>>>>>>>>>>> network interface. By checking the timestamps returned
>> >>> >>>>>>>>>>>>>>>>>> by the application, I get an about 50ms delay between
>> >>> >>>>>>>>>>>>>>>>>> the network packet and the GPIO (so something between
>> >>> >>>>>>>>>>>>>>>>>> 51 and 52ms delay).
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> Are you toggling the gpio and accessing the gpio device
>> >>> >>>>>>>>>>>>>>>>> through your rtdm device, i.e. on a path like
>> >>> >>>>>>>>>>>>>>>>> /dev/rtdm/your_gpiocontroller/gpio*?
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>> >>> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles
>> >>> >>>>>>>>>>>>>>>> one of its GPIOs depending on the network data.
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> I've got another device running my realtime recording
>> >>> >>>>>>>>>>>>>>>> app, that receives the same network data as the DUT, and
>> >>> >>>>>>>>>>>>>>>> that has one of its GPIOs connected to the DUT GPIO.
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open,
>> >>> >>>>>>>>>>>>>>>> ioctl (enable interrupt + timestamp), then select & read
>> >>> >>>>>>>>>>>>>>>> the GPIO value.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> When the issue happens, has the recording app side seen
>> >>> >>>>>>>>>>>>>>> the same number of network data packets and gpio
>> >>> >>>>>>>>>>>>>>> interrupts as the DUT has been toggling?
>> >>> >>>>>>>>>>>>>>> I am asking because I have met gpio hardware issues that
>> >>> >>>>>>>>>>>>>>> caused missing or invalid gpio interrupts when a gpio
>> >>> >>>>>>>>>>>>>>> connects two different boards.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app.
>> >>> >>>>>>>>>>>>>> Regarding the network, the DUT and the recording device
>> >>> >>>>>>>>>>>>>> are connected to the same switch with the same port config.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the
>> >>> >>>>>>>>>>>>>> recording device (switched from rising edge to falling
>> >>> >>>>>>>>>>>>>> edge), and the result is 1-2ms + ~30ms (the width of the
>> >>> >>>>>>>>>>>>>> pulse of the DUT) + 50ms latency.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on
>> >>> >>>>>>>>>>>>> the DUT via tracing? Is it really the GPIO output path?
>> >>> >>>>>>>>>>>>> What is happening in it, starting with RT/Linux task
>> >>> >>>>>>>>>>>>> switches, mode transitions etc.?
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>> I just did it.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> I put gpio_pin_interrupt as the trigger, then ran my
>> >>> >>>>>>>>>>>> application and triggered the GPIO.
>> >>> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to
>> >>> >>>>>>>>>>>> read it and what conclusions I could derive:
>> >>> >>>>>>>>>>>> cat frozen
>> >>> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>> >>> >>>>>>>>>>>> ------------------------------------------------------------
>> >>> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>> >>> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> +----- Hard IRQs ('|': locked)
>> >>> >>>>>>>>>>>> |+-- Xenomai
>> >>> >>>>>>>>>>>> ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>> >>> >>>>>>>>>>>> |||                  +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>> >>> >>>>>>>>>>>> |||                  |        +- NMI noise ('N')
>> >>> >>>>>>>>>>>> |||                  |        |
>> >>> >>>>>>>>>>>> Type    User Val.    Time     Delay  Function (Parent)
>> >>> >>>>>>>>>>>> : +func -6087+ 1.302 load_balance+0x14 (run_rebalance_domains+0x7e8)
>> >>> >>>>>>>>>>>> : +func -6085 0.826 idle_cpu+0x10 (load_balance+0x180)
>> >>> >>>>>>>>>>>> : +func -6084 0.892 find_busiest_group+0x14 (load_balance+0x1a4)
>> >>> >>>>>>>>>>>> : +func -6084 0.757 update_group_capacity+0x14 (find_busiest_group+0x128)
>> >>> >>>>>>>>>>>> : +func -6083+ 1.452 __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>> >>> >>>>>>>>>>>> : +func -6081+ 1.535 idle_cpu+0x10 (find_busiest_group+0x1e4)
>> >>> >>>>>>>>>>>> : +func -6080+ 1.410 idle_cpu+0x10 (find_busiest_group+0x1e4)
>> >>> >>>>>>>>>>>> : +func -6078 0.967 __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>> >>> >>>>>>>>>>>> : +func -6077 0.886 __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>> >>> >>>>>>>>>>>> : +func -6077 0.820 rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>> >>> >>>>>>>>>>>> : +func -6076 0.742 ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>> >>> >>>>>>>>>>>> : +func -6075 0.766 ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6074 0.934 ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| +end 0x80000001 -6073 0.811 ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6072 0.895 ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6072 0.847 ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>> >>> >>>>>>>>>>>> : #func -6071 0.814 __local_bh_enable+0x10 (__do_softirq+0x214)
>> >>> >>>>>>>>>>>> : #func -6070 0.760 ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>> >>> >>>>>>>>>>>> :| #begin 0x80000001 -6069 0.907 ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6068 0.898 ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>> >>> >>>>>>>>>>>> : #func -6067 0.811 rcu_irq_exit+0x10 (irq_exit+0x84)
>> >>> >>>>>>>>>>>> : #func -6067 0.781 ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>> >>> >>>>>>>>>>>> : #func -6066 0.799 ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| #begin 0x80000001 -6065+ 1.041 ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6064 0.769 ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| #begin 0x80000001 -6063 0.895 ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6062 0.841 ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>> >>>>>>>>>>>> : #func -6061+ 1.197 rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>> >>> >>>>>>>>>>>> :| #begin 0x80000000 -6060+ 1.413 ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>> >>> >>>>>>>>>>>> :| +end 0x00000012 -6059+ 1.044 ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>> >>> >>>>>>>>>>>> :| +func -6058 0.988 __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>> >>> >>>>>>>>>>>> :| +func -6057 0.976 ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>> >>> >>>>>>>>>>>> :| +func -6056 0.829 __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>> >>> >>>>>>>>>>>> :| +end 0x90000000 -6055 0.913 __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>> >>> >>>>>>>>>>>> : +func -6054 0.781 ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6053 0.868 ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>> >>> >>>>>>>>>>>> :| +end 0x80000001 -6052 0.781 ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>> >>> >>>>>>>>>>>> : +func -6052 0.748 rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>> >>> >>>>>>>>>>>> : +func -6051 0.739 ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>> >>> >>>>>>>>>>>> : +func -6050 0.775 ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6049+ 1.011 ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| +end 0x80000001 -6048 0.742 ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6048 0.940 ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6047 0.790 ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>> >>>>>>>>>>>> : #func -6046 0.859 rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>> >>> >>>>>>>>>>>> : #func -6045 0.772 ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>> >>> >>>>>>>>>>>> :| #begin 0x80000000 -6044 0.814 ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>> >>> >>>>>>>>>>>> :| #func -6043+ 1.077 ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>> >>> >>>>>>>>>>>> :| +end 0x80000000 -6042 0.835 ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>> >>> >>>>>>>>>>>> : +func -6042 0.922 arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>> >>> >>>>>>>>>>>> : +func -6041 0.793 ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>> >>> >>>>>>>>>>>> : +func -6040 0.724 ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6039+ 1.098 ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| +end 0x80000001 -6038 0.772 ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| +begin 0x80000001 -6037 0.841 ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6036 0.763 ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>> >>> >>>>>>>>>>>> : #func -6036 0.838 arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>> >>> >>>>>>>>>>>> : #func -6035 0.745 arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>> >>> >>>>>>>>>>>> : #func -6034 0.916 l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>> >>> >>>>>>>>>>>> : #func -6033+ 1.062 tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>> >>> >>>>>>>>>>>> : #func -6032 0.787 rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>> >>> >>>>>>>>>>>> : #func -6031 0.745 ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>> >>> >>>>>>>>>>>> : #func -6031 0.751 ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| #begin 0x80000001 -6030 0.991 ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6029 0.772 ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| #begin 0x80000001 -6028 0.892 ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>> >>>>>>>>>>>> :| #end 0x80000001 -6027 0.847 ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>> >>>>>>>>>>>> : #func -6026 0.922 rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>> >>> >>>>>>>>>>>> : #func -6025 0.862 default_idle_call+0x10 (cpu_startup_entry+0x128)
>> >>> >>>>>>>>>>>> : #func -6024 0.877 arch_cpu_idle+0x10 (default_idle_call+0x38)
>> >>> >>>>>>>>>>>> :| #begin 0x80000000 -6024! 5992.167 ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the
>> >>> >>>>>>>>>>> next event.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> :| +func -31 0.760 ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Coming back from idle.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> :| +func -31+ 1.116 ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>> >>> >>>>>>>>>>>> :| +end 0x80000000 -30 0.931 ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>> >>> >>>>>>>>>>>> :| +begin 0x90000000 -29 0.844 __irq_svc+0x58 (ipipe_unstall_root+0x88)
>> >>> >>>>>>>>>>>> :| +func -28 0.925 gic_handle_irq+0x10 (__irq_svc+0x6c)
>> >>> >>>>>>>>>>>> :| +func -27 0.904 irq_find_mapping+0x10 (gic_handle_irq+0x50)
>> >>> >>>>>>>>>>>> :| +func -26 0.940 __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>> >>> >>>>>>>>>>>> :| +begin 0x000000c9 -25 0.826 ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>> >>> >>>>>>>>>>>> :| +func -24 0.814 __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>> >>> >>>>>>>>>>>> :| +func -23+ 1.275 irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>> >>> >>>>>>>>>>>> :| +func -22+ 1.679 irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>> >>> >>>>>>>>>>>> :| +func -20+ 2.092 ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>> >>> >>>>>>>>>>>> :| +func -18+ 1.413 irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>> >>> >>>>>>>>>>>> :| +begin 0x000000e3 -17 0.757 ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>> >>> >>>>>>>>>>>> :| +func -16 0.778 __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>> >>> >>>>>>>>>>>> :| +func -15+ 1.023 irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>> >>> >>>>>>>>>>>> :| +func -14+ 1.494 irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>> >>> >>>>>>>>>>>> :| +func -13+ 1.014 __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>> >>> >>>>>>>>>>>> :| +func -12 0.763 ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>> >>> >>>>>>>>>>>> :| +func -11+ 1.248 __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>> >>> >>>>>>>>>>>> :| #func -10+ 1.619 __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>> >>> >>>>>>>>>>>> :| +func -8+ 1.239 __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>> >>> >>>>>>>>>>>> :| +end 0x000000e3 -7 0.994 ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>> >>> >>>>>>>>>>>> :| +func -6+ 1.029 gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>> >>> >>>>>>>>>>>> :| +func -5+ 1.353 __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>> >>> >>>>>>>>>>>> :|+ func -4+ 1.449 __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>> >>> >>>>>>>>>>>> :|# func -2+ 1.191 xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>> >>> >>>>>>>>>>>> :|# func -1+ 1.455 ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>> >>> >>>>>>>>>>>> <|# func 0 1.107 gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM
>> >>> >>>>>>>>>>> handler, roughly after 30 µs (which are also due to tracing
>> >>> >>>>>>>>>>> overhead). So far nothing suspicious.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> |# func 1 1.236 xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>> >>> >>>>>>>>>>>> |# func 2 1.224 rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one
>> >>> >>>>>>>>>>> notice a too high latency?
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm
>> >>> >>>>>>>>>> trying to fix is understanding why my reference instrument
>> >>> >>>>>>>>>> indicates a delay from network packet to GPIO on the DUT of
>> >>> >>>>>>>>>> about 1 to 2ms, whereas my Xenomai-powered realtime recorder
>> >>> >>>>>>>>>> application, given the same network stream and gpio access,
>> >>> >>>>>>>>>> gives me ~50 to 52ms.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO
>> >>> >>>>>>>>>> interrupt side (the network packet cannot have been received
>> >>> >>>>>>>>>> before it is sent, right?).
>> >>> >>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency (a
>> >>> >>>>>>>>>> number I expected for interrupt latency on that system) for
>> >>> >>>>>>>>>> the GPIO. So something is wrong on my system, but I don't
>> >>> >>>>>>>>>> know what!
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Try to trace events, not functions, using regular ftrace
>> >>> >>>>>>>>> ("trace-cmd record -e cobalt* -e sched* -e irq* -e signal*"
>> >>> >>>>>>>>> e.g.). Check when the NIC gets the interrupt and compare that
>> >>> >>>>>>>>> to when the GPIO event is triggered (or whatever is trigger
>> >>> >>>>>>>>> and reaction). Function tracing is for zooming in when you
>> >>> >>>>>>>>> know where to zoom.
>> >>> >>>>>>>>>
>> >>> >>>>>>>> So I think I found the answer to my problem.
>> >>> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are
>> >>> >>>>>>>> dated with rtdm_clock_read, whereas GPIO events are dated with
>> >>> >>>>>>>> rtdm_clock_read_monotonic?
>> >>> >>>>>>>>
>> >>> >>>>>>> Inconsistency of the GPIO drivers; UART drivers use
>> >>> >>>>>>> rtdm_clock_read for user-exposed timestamping as well. Maybe
>> >>> >>>>>>> Philippe can comment on the thoughts behind this deviation.
>> >>> >>>>>>>
>> >>> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on
>> >>> >>>>>> Xenomai's idea of real time, which is the Cobalt monotonic clock
>> >>> >>>>>> plus an arbitrary offset. For this reason, rtdm_clock_read() is
>> >>> >>>>>> not SMP-consistent (different CPUs might read different
>> >>> >>>>>> timestamps at the exact same time), and is not in sync with
>> >>> >>>>>> Linux's wall clock either. For these reasons, I don't see any
>> >>> >>>>>> practical way to synchronize multiple systems on the clock
>> >>> >>>>>> underlying rtdm_clock_read().
>> >>> >>>>>>
>> >>> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>> >>> >>>>>> timestamping in this context, only adding the potential for even
>> >>> >>>>>> more surprising results due to the mono->real-time offset
>> >>> >>>>>> changing under our feet, since the epoch of the Xenomai
>> >>> >>>>>> real-time clock can be (re)set during runtime.
>> >>> >>>>>>
>> >>> >>>>>> I believe the UART driver is wrong here; it should return
>> >>> >>>>>> timestamps based on the monotonic source, which best fits the
>> >>> >>>>>> common need: getting timestamps from the local CPU for measuring
>> >>> >>>>>> delays between events received by drivers and the actions taken
>> >>> >>>>>> by the applications which consume them, immune from updates to
>> >>> >>>>>> the underlying clock epoch. Granted, there might be a catch when
>> >>> >>>>>> a timestamp is taken from IRQ context which is then consumed
>> >>> >>>>>> from a thread living on a different CPU, if per-CPU clocks are
>> >>> >>>>>> not/badly synchronized. But that would happen the exact same way
>> >>> >>>>>> with rtdm_clock_read() anyway.
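For concreteness, the driver-side pattern at stake looks like this; a
minimal sketch modeled on the handler names visible in the trace above
(gpio_pin_interrupt, rtdm_event_signal), with the pin structure and its
fields assumed for illustration only:

#include <rtdm/driver.h>

struct gpio_pin_sketch {
	rtdm_event_t event;		/* wakes up the waiting RT task */
	nanosecs_abs_t timestamp;	/* consumed later by the reader */
};

static int gpio_pin_interrupt(rtdm_irq_t *irqh)
{
	struct gpio_pin_sketch *pin;

	pin = rtdm_irq_get_arg(irqh, struct gpio_pin_sketch);

	/* Stamp from the monotonic source: immune to epoch changes,
	 * which is the property argued for above. */
	pin->timestamp = rtdm_clock_read_monotonic();
	rtdm_event_signal(&pin->event);

	return RTDM_IRQ_HANDLED;
}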
>> >>> >>>>>>
>> >>> >>>>>> The situation improves when running on top of Dovetail, since
>> >>> >>>>>> Xenomai now refers to the common linux clocks (mono / real)
>> >>> >>>>>> instead of providing its own idea of time, but the UART driver
>> >>> >>>>>> code predates the Dovetail port.
>> >>> >>>>>>
>> >>> >>>>>
>> >>> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO
>> >>> >>>>> - use rtdm_clock_read. That was no problem in practice for their
>> >>> >>>>> use cases so far. One problem is that GPIO timestamps are now not
>> >>> >>>>> comparable to others.
>> >>> >>>>>
>> >>> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based
>> >>> >>>>> on Xenomai's real-time clock. Therefore, providing timestamps for
>> >>> >>>>> that
>> >>> >>>>
>> >>> >>>> /*
>> >>> >>>>  * The Copperplate clock shall be monotonic unless the threading
>> >>> >>>>  * library has restrictions to support this over Mercury.
>> >>> >>>>  *
>> >>> >>>>  * In the normal case, this means that ongoing delays and timeouts
>> >>> >>>>  * won't be affected when the host system date is changed. In the
>> >>> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
>> >>> >>>>  * impacted by changes to the host system date.
>> >>> >>>>  *
>> >>> >>>>  * The implementation maintains a per-clock epoch value, so that
>> >>> >>>>  * different emulators can have different (virtual) system dates.
>> >>> >>>>  */
>> >>> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>> >>> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>> >>> >>>> #else
>> >>> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>> >>> >>>> #endif
>> >>> >>>>
>> >>> >>>> So no, only the POSIX API is using what the standard mandates,
>> >>> >>>> which is CLOCK_REALTIME. All other APIs are based on copperplate,
>> >>> >>>> and they are using a monotonic source as documented above.
>> >>> >>>
>> >>> >>> OK, but that changed in 3.x. At the time the RTDM API was
>> >>> >>> originally added and then promoted, it was the other way around. We
>> >>> >>> became inconsistent then.
>> >>> >>>
>> >>> >>>>
>> >>> >>>>> particular clock was the original idea of rtdm_clock_read (which
>> >>> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should
>> >>> >>>>> be fixed - unless I'm wrong with that assumption.
>> >>> >>>>>
>> >>> >>>> We cannot assume the epoch is going to remain stable with
>> >>> >>>> rtdm_clock_read() the way it is implemented, which is quite a
>> >>> >>>> problem wrt the common use case. For timestamping, a majority of
>> >>> >>>> mainline drivers use ktime_get() or a variant thereof, which is
>> >>> >>>> based on the monotonic clock source, not the _real form. Why would
>> >>> >>>> the real-time I/O drivers be different?
>> >>> >>>
>> >>> >>> We have two cases here:
>> >>> >>>
>> >>> >>> - I-pipe-based version where the realtime clock is under full
>> >>> >>>   application control -> no problem to use rtdm_clock_read
>> >>> >>
>> >>> >> Well, there is still the issue that rtdm_clock_read() is not immune
>> >>> >> to some part of userland changing the CLOCK_REALTIME epoch
>> >>> >> Xenomai-wise, e.g. via a call to clock_settime(), which is the same
>> >>> >> problem as Linux changing the epoch of CLOCK_REALTIME over Dovetail.
>> >>> >> This would break the application.
>> >>> >
>> >>> > Yes, but that's about the application(s) breaking themselves. Nothing
>> >>> > new, not going to change when we only avoid clock_realtime for stamps
>> >>> > but still use POSIX services basing timers on that clock. The key
>> >>> > point is that the I-pipe gave that into RT application hands (with
>> >>> > all related downsides); with Dovetail it's in system hands.
>> >>> >
>> >>> >>
>> >>> >>> - Dovetail where we share the realtime clock - with all its tuning -
>> >>> >>>   with Linux -> here we have a problem with rtdm_clock_read and
>> >>> >>>   should reconsider its usage (and promotion!)
>> >>> >>>
>> >>> >>> For stable 3.1, the proper fix is with GPIO going to
>> >>> >>> rtdm_clock_read.
>> >>> >>
>> >>> >> Wait, you have downstream users already depending on GPIO returning
>> >>> >> monotonic timestamps, and this is a _stable_ release. So why not fix
>> >>> >> the other drivers instead, based on the fact that timestamping with
>> >>> >> rtdm_clock_read() is wrong? Same issue, right?
>> >>> >
>> >>> > GPIO was broken, but you are right that we may have users relying on
>> >>> > that breakage now. Obviously, we can change the other drivers for the
>> >>> > very same reasons: they have been working the way they work for more
>> >>> > than 10 years now.
>> >>> >
>> >>> Sorry, but in my view, rtdm_clock_read() was broken since day one. GPIO
>> >>> had to work around the breakage... :)
>> >>>
>> >>> >>
>> >>> >> So the best course of action to sort this out for 3.1.x may be to
>> >>> >> extend GPIO_RTIOC_TS with, say, GPIO_RTIOC_TS_REAL, which would log
>> >>> >> and return timestamps based on the Xenomai wallclock. Applications
>> >>> >> which do want to align on that clock would simply have to issue
>> >>> >> GPIO_RTIOC_TS_REAL instead of GPIO_RTIOC_TS. This would break
>> >>> >> backward ABI compat only for users of GPIO_RTIOC_TS_REAL, but that
>> >>> >> would be much better than introducing a sneaky change in behavior
>> >>> >> for the GPIO driver.
>> >>> >
>> >>> > Yeah, likely the way to go.
>> >>> >
>> >>> >>
>> >>> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>> >>> >>>
>> >>> >>
>> >>> >> The decision looks pretty simple for the common use case: when
>> >>> >> timestamps are needed for performance/delay measurements, we want to
>> >>> >> get them from a clock source which won't play funny games, warping
>> >>> >> back to the future.
>> >>> >
>> >>> > You only think of measurements. The other drivers used the stamping
>> >>> > also for real applications, that means to calculate absolute
>> >>> > clock-realtime timeouts and wait for them to arrive. We will likely
>> >>> > need to enhance the other driver APIs as well to select the desired
>> >>> > clocksource, just like for GPIO.
>> >>>
>> >>> I'm referring to what is in the GPIO code, which is the problem at
>> >>> hand: that timestamping was designed since day #1 to provide a way to
>> >>> measure the scheduling latency.
>> >>>
>> >>> I agree that the only way out is to enable all timestamp consumers to
>> >>> pick their base clock of choice (mono or wallclock).
>> >>>
>> >>> --
>> >>> Philippe.
>> >>
>> >> Hi there,
>> >>
>> >> as we speak of breaking the ABI, why not introduce some specific IOCTL
>> >> to select which timesource to use for each driver using it, then use a
>> >> function pointer to call the right rtdm_time_get_whatever function?
>> >>
>> >
>> > That would mean adding a generic ioctl, and many changes all over the
>> > place (we would need no function pointer with distinct ioctl codes).
>> > Doable for sure, but likely too much for 3.1.x.
>>
>> wrt the function pointer: I mean that a plain simple branch on some
>> selector is likely to perform better than a function pointer, with the
>> spectre mitigations around.
>>
>
> Maybe I did not make myself clear enough. I was thinking of an ioctl
> that would allow users to select between
> rtdm_clock_read/rtdm_clock_read_whatever/... and not to provide a user
> function pointer.
>

I got that. I was referring to the fact that using a function pointer
(in kernel space) to redirect the call to the proper clock reading
routine may not be as efficient as using a plain simple branch when only
a few basic conditions need to be tested, due to the counter-measures
the kernel may have to implement to sanitize indirect calls in order to
circumvent attacks (e.g. retpolines on x86, branch target identification
on arm64).

With this in mind, assuming that we have previously sanitized the clock
identifier, doing this:

#define get_timestamp(__clock) \
	({ (__clock) == CLOCK_MONOTONIC ? \
	   rtdm_clock_read_monotonic() : rtdm_clock_read(); })

may end up being faster than:

xnticks_t (*__get_timestamp)(clockid_t clock);
#define get_timestamp(__clock) __get_timestamp(__clock)

--
Philippe.
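
P.S.: spelled out as a self-contained sketch, for the archives. Only the
rtdm_clock_read*() routines below are the existing RTDM API; the
clock_id selector and its validation stand in for whatever the selection
ioctl discussed above would store per device:

#include <rtdm/driver.h>

/* Branch-based dispatch: clock_id was validated once, when the
 * (hypothetical) selection ioctl stored it, so a predictor-friendly
 * conditional is all we need on the hot path. */
static inline nanosecs_abs_t get_timestamp(clockid_t clock_id)
{
	return clock_id == CLOCK_MONOTONIC ?
		rtdm_clock_read_monotonic() : rtdm_clock_read();
}

/* Indirect-call dispatch: every timestamp may pay for the indirect
 * call sanitization (retpoline, BTI) on top of the call itself. */
static nanosecs_abs_t (*read_clock)(void) = rtdm_clock_read_monotonic;

static inline nanosecs_abs_t get_timestamp_indirect(void)
{
	return read_clock();
}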