All of lore.kernel.org
 help / color / mirror / Atom feed
* Large gpio interrupt latency
@ 2021-06-16  7:15 François Legal
  2021-06-16  8:10 ` Chen, Hongzhan
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-16  7:15 UTC (permalink / raw)
  To: xenomai

Hello,

While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.

So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.

My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).

I wonder where that might be coming from.

Any idea ?

thanks in advance

François



^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Large gpio interrupt latency
  2021-06-16  7:15 Large gpio interrupt latency François Legal
@ 2021-06-16  8:10 ` Chen, Hongzhan
  2021-06-16  8:18   ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Chen, Hongzhan @ 2021-06-16  8:10 UTC (permalink / raw)
  To: François Legal, xenomai


>
>-----Original Message-----
>From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>Sent: Wednesday, June 16, 2021 3:16 PM
>To: xenomai@xenomai.org
>Subject: Large gpio interrupt latency
>
>Hello,
>
>While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
>
>So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
>We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
>
>My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
>

Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?

Hongzhan Chen

>I wonder where that might be coming from.
>
>Any idea ?
>
>thanks in advance
>
>François

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Large gpio interrupt latency
  2021-06-16  8:10 ` Chen, Hongzhan
@ 2021-06-16  8:18   ` François Legal
  2021-06-16  9:05     ` Chen, Hongzhan
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-16  8:18 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: xenomai

Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:

>
> >
> >-----Original Message-----
> >From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
> >Sent: Wednesday, June 16, 2021 3:16 PM
> >To: xenomai@xenomai.org
> >Subject: Large gpio interrupt latency
> >
> >Hello,
> >
> >While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
> >
> >So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
> >We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
> >
> >My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
> >
>
> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
>

Maybe my setup was not very clear.
I've got a DUT that receives network data and toggles one of its GPIOs depending on the network data.

I've got another device running my realtime recording app, which receives the same network data as the DUT and has one of its GPIOs connected to the DUT GPIO.

On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
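
For concreteness, the open / ioctl / select / read sequence described above looks roughly like this (a minimal sketch, not the actual recorder code; the device path and pin number are placeholders, and the GPIO_RTIOC_* names follow Xenomai 3's rtdm/uapi/gpio.h, so check them against your version):

```c
/*
 * Sketch of the recording-side sequence: open the RTDM GPIO device,
 * enable the interrupt with timestamping, then select() and read()
 * the timestamped value.  Placeholder device path and pin number;
 * build against the Xenomai/Cobalt userspace wrappers.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/select.h>
#include <unistd.h>
#include <rtdm/gpio.h>

int main(void)
{
    /* Placeholder path: use your controller's node under /dev/rtdm. */
    int fd = open("/dev/rtdm/gpiochip0/gpio42", O_RDWR);
    if (fd < 0)
        return 1;

    int trigger = GPIO_TRIGGER_EDGE_RISING;
    if (ioctl(fd, GPIO_RTIOC_IRQEN, &trigger))   /* enable edge IRQ */
        return 1;

    int on = 1;
    if (ioctl(fd, GPIO_RTIOC_TS, &on))           /* enable timestamping */
        return 1;

    for (;;) {
        fd_set set;
        FD_ZERO(&set);
        FD_SET(fd, &set);
        if (select(fd + 1, &set, NULL, NULL, NULL) < 1)
            break;

        struct rtdm_gpio_readout rdo;            /* { timestamp; value; } */
        if (read(fd, &rdo, sizeof(rdo)) != sizeof(rdo))
            break;
        printf("edge at %lld ns, value %d\n",
               (long long)rdo.timestamp, (int)rdo.value);
    }
    return close(fd);
}
```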

François


> Hongzhan Chen
>
> >I wonder where that might be coming from.
> >
> >Any idea ?
> >
> >thanks in advance
> >
> >François



^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Large gpio interrupt latency
  2021-06-16  8:18   ` François Legal
@ 2021-06-16  9:05     ` Chen, Hongzhan
  2021-06-16  9:12       ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Chen, Hongzhan @ 2021-06-16  9:05 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai



>-----Original Message-----
>From: François Legal <devel@thom.fr.eu.org> 
>Sent: Wednesday, June 16, 2021 4:19 PM
>To: Chen, Hongzhan <hongzhan.chen@intel.com>
>Cc: xenomai@xenomai.org
>Subject: RE: Large gpio interrupt latency
>
>Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
> 
>> 
>> >
>> >-----Original Message-----
>> >From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>> >Sent: Wednesday, June 16, 2021 3:16 PM
>> >To: xenomai@xenomai.org
>> >Subject: Large gpio interrupt latency
>> >
>> >Hello,
>> >
>> >While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
>> >
>> >So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
>> >We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
>> >
>> >My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
>> >
>> 
>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
>> 
>
>Maybe my setup was not very clear.
>I've got a DUT that receives network data and toggles one of its GPIOs depending on the network data.
>
>I've got another device running my realtime recording app, which receives the same network data as the DUT and has one of its GPIOs connected to the DUT GPIO.
>
>On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.

When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO line connected two different boards.

>
>François
>
>
>> Hongzhan Chen
>> 
>> >I wonder where that might be coming from.
>> >
>> >Any idea ?
>> >
>> >thanks in advance
>> >
>> >François
>
>
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: Large gpio interrupt latency
  2021-06-16  9:05     ` Chen, Hongzhan
@ 2021-06-16  9:12       ` François Legal
  2021-06-16  9:40         ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-16  9:12 UTC (permalink / raw)
  To: Chen, Hongzhan; +Cc: xenomai

Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:

>
>
> >-----Original Message-----
> >From: François Legal <devel@thom.fr.eu.org>
> >Sent: Wednesday, June 16, 2021 4:19 PM
> >To: Chen, Hongzhan <hongzhan.chen@intel.com>
> >Cc: xenomai@xenomai.org
> >Subject: RE: Large gpio interrupt latency
> >
> >Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >
> >>
> >> >
> >> >-----Original Message-----
> >> >From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
> >> >Sent: Wednesday, June 16, 2021 3:16 PM
> >> >To: xenomai@xenomai.org
> >> >Subject: Large gpio interrupt latency
> >> >
> >> >Hello,
> >> >
> >> >While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
> >> >
> >> >So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
> >> >We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
> >> >
> >> >My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
> >> >
> >>
> >> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
> >>
> >
> >Maybe my setup was not very clear.
> >I've got a DUT that receives network data and toggles one of its GPIOs depending on the network data.
> >
> >I've got another device running my realtime recording app, which receives the same network data as the DUT and has one of its GPIOs connected to the DUT GPIO.
> >
> >On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>
> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
> I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO line connected two different boards.
>

AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding the network, the DUT and the recording device are connected to the same switch with the same port config.

Moreover, I tried changing the interrupt polarity on the recording device (switching from rising edge to falling edge), and the resulting latency is 1-2ms + ~30ms (the width of the DUT's pulse) + 50ms.

> >
> >François
> >
> >
> >> Hongzhan Chen
> >>
> >> >I wonder where that might be coming from.
> >> >
> >> >Any idea ?
> >> >
> >> >thanks in advance
> >> >
> >> >François
> >
> >
> >



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-16  9:12       ` François Legal
@ 2021-06-16  9:40         ` Jan Kiszka
  2021-06-16 13:29           ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-16  9:40 UTC (permalink / raw)
  To: François Legal, Chen, Hongzhan; +Cc: xenomai

On 16.06.21 11:12, François Legal via Xenomai wrote:
> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>  
>>
>>
>>> -----Original Message-----
>>> From: François Legal <devel@thom.fr.eu.org> 
>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>> Cc: xenomai@xenomai.org
>>> Subject: RE: Large gpio interrupt latency
>>>
>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>> To: xenomai@xenomai.org
>>>>> Subject: Large gpio interrupt latency
>>>>>
>>>>> Hello,
>>>>>
>>>>> While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
>>>>>
>>>>> So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
>>>>> We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
>>>>>
>>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
>>>>>
>>>>
>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>>>
>>>
>>> Maybe my setup was not very clear.
>>> I've got a DUT that receives network data and toggles one of its GPIOs depending on the network data.
>>>
>>> I've got another device running my realtime recording app, which receives the same network data as the DUT and has one of its GPIOs connected to the DUT GPIO.
>>>
>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>
>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
>> I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO line connected two different boards.
>>
> 
> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding the network, the DUT and the recording device are connected to the same switch with the same port config.
> 
> Moreover, I tried changing the interrupt polarity on the recording device (switching from rising edge to falling edge), and the resulting latency is 1-2ms + ~30ms (the width of the DUT's pulse) + 50ms.
> 

Can you (or did you already) break down the latencies on the DUT via
tracing? Is it really the GPIO output path? What is happening in it,
starting with RT/Linux task switches, mode transitions etc.?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-16  9:40         ` Jan Kiszka
@ 2021-06-16 13:29           ` François Legal
  2021-06-16 13:38             ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-16 13:29 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chen, Hongzhan, xenomai

Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 16.06.21 11:12, François Legal via Xenomai wrote:
> > Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >
> >>
> >>
> >>> -----Original Message-----
> >>> From: François Legal <devel@thom.fr.eu.org>
> >>> Sent: Wednesday, June 16, 2021 4:19 PM
> >>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
> >>> Cc: xenomai@xenomai.org
> >>> Subject: RE: Large gpio interrupt latency
> >>>
> >>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>>
> >>>>
> >>>>>
> >>>>> -----Original Message-----
> >>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
> >>>>> Sent: Wednesday, June 16, 2021 3:16 PM
> >>>>> To: xenomai@xenomai.org
> >>>>> Subject: Large gpio interrupt latency
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
> >>>>>
> >>>>> So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
> >>>>> We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
> >>>>>
> >>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
> >>>>>
> >>>>
> >>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
> >>>>
> >>>
> >>> Maybe my setup was not very clear.
> >>> I've got a DUT that receives network data and toggles one of its GPIOs depending on the network data.
> >>>
> >>> I've got another device running my realtime recording app, which receives the same network data as the DUT and has one of its GPIOs connected to the DUT GPIO.
> >>>
> >>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
> >>
> >> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
> >> I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO line connected two different boards.
> >>
> >
> > AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding the network, the DUT and the recording device are connected to the same switch with the same port config.
> >
> > Moreover, I tried changing the interrupt polarity on the recording device (switching from rising edge to falling edge), and the resulting latency is 1-2ms + ~30ms (the width of the DUT's pulse) + 50ms.
> >
>
> Can you (or did you already) break down the latencies on the DUT via
> tracing? Is it really the GPIO output path? What is happening in it,
> starting with RT/Linux task switches, mode transitions etc.?
>

I just did it.

I set gpio_pin_interrupt as the trigger, then ran my application and triggered the GPIO.
This is what I get in frozen, but I'm not quite sure how to read it or what conclusions to draw:
 cat frozen
I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
------------------------------------------------------------
CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
Calibrated minimum trace-point overhead: 0.461 us

 +----- Hard IRQs ('|': locked)
 |+-- Xenomai
 ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
 |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
 |||                      |        +- NMI noise ('N')
 |||                      |        |
          Type    User Val.   Time    Delay  Function (Parent)
:  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
:  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
:  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
:  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
:  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
:  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
:  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
:  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
:  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
:  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
:  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
:  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
:| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
:| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
:| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
:| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
:  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
:  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
:| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
:| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
:  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
:  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
:  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
:| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
:| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
:| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
:| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
:  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
:| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
:| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
:| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
:| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
:| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
:| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
:  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
:| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
:| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
:  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
:  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
:  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
:| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
:| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
:| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
:| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
:  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
:  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
:| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
:| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
:| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
:  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
:  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
:  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
:| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
:| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
:| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
:| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
:  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
:  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
:  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
:  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
:  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
:  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
:  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
:| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
:| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
:| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
:| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
:  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
:  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
:  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
:| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
:| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
:| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
:| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
:| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
:| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
:| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
:| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
:| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
:| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
:| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
:| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
:| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
:| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
:| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
:| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
:| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
:| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
:| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
:| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
:| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
:| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
:| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
:| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
:| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
:| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
:|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
:|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
:|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
<|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
 |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
 |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
 |# func                   3      1.032  ___xnlock_get+0x10 (rtdm_event_signal+0x80)
 |# func                   4      0.997  xnsynch_flush+0x14 (rtdm_event_signal+0xa0)
 |# func                   5      1.835  ___xnlock_get+0x10 (xnsynch_flush+0x74)
 |# func                   7      1.083  __xnselect_signal+0x10 (rtdm_event_signal+0x1e4)
 |# func                   8      0.856  xnsynch_flush+0x14 (__xnselect_signal+0x90)
 |# func                   9      0.958  ___xnlock_get+0x10 (xnsynch_flush+0x74)
 |# func                  10      0.886  xnthread_resume+0x14 (xnsynch_flush+0xe8)
 |# func                  11      0.000  ___xnlock_get+0x10 (xnthread_resume+0x7c)



> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-16 13:29           ` François Legal
@ 2021-06-16 13:38             ` Jan Kiszka
  2021-06-16 13:51               ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-16 13:38 UTC (permalink / raw)
  To: François Legal; +Cc: Chen, Hongzhan, xenomai

On 16.06.21 15:29, François Legal wrote:
> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>  
>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>  
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>> Cc: xenomai@xenomai.org
>>>>> Subject: RE: Large gpio interrupt latency
>>>>>
>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>
>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>> To: xenomai@xenomai.org
>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> While working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I am experiencing a strange latency problem when taking GPIO interrupts.
>>>>>>>
>>>>>>> So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
>>>>>>> We know (from another reference instrument) that the DUT toggles the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>
>>>>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. Checking the timestamps returned by the application, I see roughly 50ms more than expected between the network packet and the GPIO (so a total delay of somewhere between 51 and 52ms).
>>>>>>>
>>>>>>
>>>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>
>>>>>
>>>>> Maybe my setup was not very clear.
>>>>> I've got a DUT that receives network data and toggles one of its GPIOs depending on the network data.
>>>>>
>>>>> I've got another device running my realtime recording app, which receives the same network data as the DUT and has one of its GPIOs connected to the DUT GPIO.
>>>>>
>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>
>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
>>>> I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO line connected two different boards.
>>>>
>>>
>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding the network, the DUT and the recording device are connected to the same switch with the same port config.
>>>
>>> Moreover, I tried changing the interrupt polarity on the recording device (switching from rising edge to falling edge), and the resulting latency is 1-2ms + ~30ms (the width of the DUT's pulse) + 50ms.
>>>
>>
>> Can you (or did you already) break down the latencies on the DUT via
>> tracing? Is it really the GPIO output path? What is happening in it,
>> starting with RT/Linux task switches, mode transitions etc.?
>>
> 
> I just did it.
> 
> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>  cat frozen
> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
> ------------------------------------------------------------
> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
> Calibrated minimum trace-point overhead: 0.461 us
> 
>  +----- Hard IRQs ('|': locked)
>  |+-- Xenomai
>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>  |||                      |        +- NMI noise ('N')
>  |||                      |        |
>           Type    User Val.   Time    Delay  Function (Parent)
> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)

Here your system (or this core) went idle, waiting for the next event.

> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)

Coming back from idle.

> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)

And here we start to process that GPIO interrupt in the RTDM handler,
roughly after 30 µs (part of which is tracing overhead). So far
nothing suspicious.

>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)

Here a likely waiting RT task is signalled. Does that one notice a
too-high latency?

Jan

>  |# func                   3      1.032  ___xnlock_get+0x10 (rtdm_event_signal+0x80)
>  |# func                   4      0.997  xnsynch_flush+0x14 (rtdm_event_signal+0xa0)
>  |# func                   5      1.835  ___xnlock_get+0x10 (xnsynch_flush+0x74)
>  |# func                   7      1.083  __xnselect_signal+0x10 (rtdm_event_signal+0x1e4)
>  |# func                   8      0.856  xnsynch_flush+0x14 (__xnselect_signal+0x90)
>  |# func                   9      0.958  ___xnlock_get+0x10 (xnsynch_flush+0x74)
>  |# func                  10      0.886  xnthread_resume+0x14 (xnsynch_flush+0xe8)
>  |# func                  11      0.000  ___xnlock_get+0x10 (xnthread_resume+0x7c)
> 
> 
> 
>> Jan
>>
>> -- 
>> Siemens AG, T RDA IOT
>> Corporate Competence Center Embedded Linux
> 

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Large gpio interrupt latency
  2021-06-16 13:38             ` Jan Kiszka
@ 2021-06-16 13:51               ` François Legal
  2021-06-16 15:10                 ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-16 13:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chen, Hongzhan, xenomai

On Wednesday, June 16, 2021, 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 16.06.21 15:29, François Legal wrote:
> > On Wednesday, June 16, 2021, 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> >
> >> On 16.06.21 11:12, François Legal via Xenomai wrote:
> >>> On Wednesday, June 16, 2021, 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> wrote:
> >>>
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: François Legal <devel@thom.fr.eu.org>
> >>>>> Sent: Wednesday, June 16, 2021 4:19 PM
> >>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
> >>>>> Cc: xenomai@xenomai.org
> >>>>> Subject: RE: Large gpio interrupt latency
> >>>>>
> >>>>> On Wednesday, June 16, 2021, 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> wrote:
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
> >>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
> >>>>>>> To: xenomai@xenomai.org
> >>>>>>> Subject: Large gpio interrupt latency
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
> >>>>>>>
> >>>>>>> So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
> >>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
> >>>>>>>
> >>>>>>> My recorder app relies on the timestamping of events done in interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get about a 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
> >>>>>>>
> >>>>>>
> >>>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. on a path like /dev/rtdm/your_gpiocontroller/gpio*?
> >>>>>>
> >>>>>
> >>>>> Maybe my setup was not very clear.
> >>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
> >>>>>
> >>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
> >>>>>
> >>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
> >>>>
> >>>> When the issue happens, has the recording app side got the same number of network data packets and GPIO interrupts as the DUT has been toggling?
> >>>> I am asking this because I have met a GPIO hardware issue that would cause missing or invalid GPIO interrupts when a GPIO connects two different boards.
> >>>>
> >>>
> >>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
> >>>
> >>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
> >>>
> >>
> >> Can you (or did you already) break down the latencies on the DUT via
> >> tracing? Is it really the GPIO output path? What is happening in it,
> >> starting with RT/Linux task switches, mode transitions etc.?
> >>
> >
> > I just did it.
> >
> > I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
> > This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
> >  cat frozen
> > I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
> > ------------------------------------------------------------
> > CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
> > Calibrated minimum trace-point overhead: 0.461 us
> >
> >  +----- Hard IRQs ('|': locked)
> >  |+-- Xenomai
> >  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
> >  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
> >  |||                      |        +- NMI noise ('N')
> >  |||                      |        |
> >           Type    User Val.   Time    Delay  Function (Parent)
> > :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
> > :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
> > :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
> > :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
> > :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
> > :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
> > :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
> > :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
> > :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
> > :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
> > :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
> > :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> > :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> > :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> > :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> > :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> > :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
> > :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
> > :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> > :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> > :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
> > :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
> > :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> > :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> > :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> > :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> > :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> > :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
> > :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
> > :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
> > :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
> > :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
> > :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
> > :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
> > :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
> > :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> > :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> > :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
> > :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
> > :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> > :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> > :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> > :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> > :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> > :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
> > :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
> > :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
> > :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> > :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> > :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
> > :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
> > :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> > :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> > :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> > :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> > :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> > :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
> > :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
> > :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
> > :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
> > :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
> > :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
> > :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> > :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> > :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> > :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> > :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> > :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
> > :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
> > :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
> > :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>
> Here your system (or this core) went idle, waiting for the next event.
>
> > :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>
> Coming back from idle.
>
> > :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> > :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> > :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
> > :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
> > :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
> > :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
> > :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
> > :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
> > :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> > :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> > :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
> > :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
> > :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
> > :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
> > :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> > :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> > :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
> > :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
> > :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
> > :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
> > :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
> > :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
> > :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
> > :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
> > :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
> > :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
> > :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
> > <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>
> And here we start to process that GPIO interrupt in the RTDM handler,
> roughly after 30 µs (part of which is tracing overhead). So far
> nothing suspicious.
>
> >  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
> >  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>
> Here a likely waiting RT task is signalled. Does that one notice a
> too-high latency?

I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52ms.

I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
The trace seems to demonstrate ~30µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!

>
> Jan
>
> >  |# func                   3      1.032  ___xnlock_get+0x10 (rtdm_event_signal+0x80)
> >  |# func                   4      0.997  xnsynch_flush+0x14 (rtdm_event_signal+0xa0)
> >  |# func                   5      1.835  ___xnlock_get+0x10 (xnsynch_flush+0x74)
> >  |# func                   7      1.083  __xnselect_signal+0x10 (rtdm_event_signal+0x1e4)
> >  |# func                   8      0.856  xnsynch_flush+0x14 (__xnselect_signal+0x90)
> >  |# func                   9      0.958  ___xnlock_get+0x10 (xnsynch_flush+0x74)
> >  |# func                  10      0.886  xnthread_resume+0x14 (xnsynch_flush+0xe8)
> >  |# func                  11      0.000  ___xnlock_get+0x10 (xnthread_resume+0x7c)
> >
> >
> >
> >> Jan
> >>
> >> --
> >> Siemens AG, T RDA IOT
> >> Corporate Competence Center Embedded Linux
> >
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux




* Re: Large gpio interrupt latency
  2021-06-16 13:51               ` François Legal
@ 2021-06-16 15:10                 ` Jan Kiszka
  2021-06-17  7:15                   ` François Legal
  2021-06-18 18:41                   ` François Legal
  0 siblings, 2 replies; 36+ messages in thread
From: Jan Kiszka @ 2021-06-16 15:10 UTC (permalink / raw)
  To: François Legal; +Cc: Chen, Hongzhan, xenomai

On 16.06.21 15:51, François Legal wrote:
> On Wednesday, June 16, 2021, 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>  
>> On 16.06.21 15:29, François Legal wrote:
>>> On Wednesday, June 16, 2021, 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>  
>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>> On Wednesday, June 16, 2021, 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> wrote:
>>>>>  
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>> Cc: xenomai@xenomai.org
>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>
>>>>>>> On Wednesday, June 16, 2021, 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>
>>>>>>>>> So basically my app monitors network packets (which, when received by the DUT, trigger the toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>
>>>>>>>>> My recorder app relies on the timestamping of events done in interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get about a 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. on a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>
>>>>>>>
>>>>>>> Maybe my setup was not very clear.
>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>
>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>
>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>
>>>>>> When the issue happens, has the recording app side got the same number of network data packets and GPIO interrupts as the DUT has been toggling?
>>>>>> I am asking this because I have met a GPIO hardware issue that would cause missing or invalid GPIO interrupts when a GPIO connects two different boards.
>>>>>>
>>>>>
>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>
>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>>>>
>>>>
>>>> Can you (or did you already) break down the latencies on the DUT via
>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>
>>>
>>> I just did it.
>>>
>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>  cat frozen
>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>> ------------------------------------------------------------
>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>> Calibrated minimum trace-point overhead: 0.461 us
>>>
>>>  +----- Hard IRQs ('|': locked)
>>>  |+-- Xenomai
>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>  |||                      |        +- NMI noise ('N')
>>>  |||                      |        |
>>>           Type    User Val.   Time    Delay  Function (Parent)
>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>
>> Here your system (or this core) went idle, waiting for the next event.
>>
>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>
>> Coming back from idle.
>>
>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>
>> And here we start to process that GPIO interrupt in the RTDM handler,
>> roughly 30 µs later (partly tracing overhead). So far
>> nothing suspicious.
>>
>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>
>> Here some likely waiting RT task is signalled. Does that one notice a
>> too high latency?
> 
> I may have been wrong to speak of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
> 
> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
> The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
> 
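For anyone else reading such a frozen dump: the Delay column is per-entry, '+' flags delays above 1 µs and '!' above 10 µs, so the ~5992 µs entry above is just the idle wait, not interrupt latency. A throwaway parser can pull out the entries worth looking at; the line format below is inferred from the dump in this thread, not from the tracer's source, so treat it as an assumption:

```python
import re

# One data line of an I-pipe "frozen" dump: domain markers, an optional
# user value, a relative time, an optional delay flag ('+' or '!'), the
# per-entry delay in µs, and the traced function.
ENTRY = re.compile(r'(-?\d+)([+!]?)\s+(\d+\.\d+)\s+(\S+)')

def large_delays(trace_lines, threshold_us=10.0):
    """Return (delay_us, function) for entries exceeding threshold_us."""
    hits = []
    for line in trace_lines:
        m = ENTRY.search(line)
        if m and float(m.group(3)) > threshold_us:
            hits.append((float(m.group(3)), m.group(4)))
    return hits
```

On the dump above this flags only the ipipe_trace_begin entry at arch_cpu_idle, i.e. the idle period before the GPIO interrupt.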

Try to trace events, not functions, using regular ftrace ("trace-cmd
record -e cobalt* -e sched* -e irq* -e signal*", e.g.). Check when the
NIC gets the interrupt and compare that to when the GPIO event is
triggered (or whatever the trigger and reaction are). Function tracing is for
zooming in once you know where to zoom.
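The comparison described above can be done mechanically on trace-cmd report output. The line shape and event names used here (irq_handler_entry for the NIC, cobalt_event_signal for the GPIO-side wakeup) are illustrative assumptions; substitute whatever events actually show up in the report:

```python
import re

# Assumed trace-cmd report line shape (illustrative only):
#   task-pid  [cpu]  seconds.micros: event_name: details
REPORT = re.compile(r'\[\d+\]\s+(\d+\.\d+):\s+(\w+):')

def event_delta_us(lines, first_event, second_event):
    """Microseconds from the first occurrence of first_event to the next
    occurrence of second_event, or None if either is missing."""
    t0 = None
    for line in lines:
        m = REPORT.search(line)
        if not m:
            continue
        ts, name = float(m.group(1)), m.group(2)
        if t0 is None and name == first_event:
            t0 = ts
        elif t0 is not None and name == second_event:
            return (ts - t0) * 1e6
    return None
```

A ~50 ms gap between those two events would confirm the delay sits between the NIC IRQ and the GPIO signalling, rather than inside the GPIO ISR itself.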

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-16 15:10                 ` Jan Kiszka
@ 2021-06-17  7:15                   ` François Legal
  2021-06-18 18:41                   ` François Legal
  1 sibling, 0 replies; 36+ messages in thread
From: François Legal @ 2021-06-17  7:15 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chen, Hongzhan, xenomai

Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 16.06.21 15:51, François Legal wrote:
> > Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >
> >> On 16.06.21 15:29, François Legal wrote:
> >>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>
> >>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
> >>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>> [... earlier quotes snipped; unchanged from the first copy above ...]
> >>>>>>
> >>>>>
> >>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
> >>>>>
> >>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
> >>>>>
> >>>>
> >>>> Can you (or did you already) break down the latencies on the DUT via
> >>>> tracing? Is it really the GPIO output path? What is happening in it,
> >>>> starting with RT/Linux task switches, mode transitions etc.?
> >>>>
> >>>
> >>> I just did it.
> >>>
> >>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
> >>> This is what I get in frozen, but I'm not quite sure how to read it or what conclusions to draw:
> >>>  cat frozen
> >>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
> >>> ------------------------------------------------------------
> >>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
> >>> Calibrated minimum trace-point overhead: 0.461 us
> >>>
> >>>  +----- Hard IRQs ('|': locked)
> >>>  |+-- Xenomai
> >>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
> >>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
> >>>  |||                      |        +- NMI noise ('N')
> >>>  |||                      |        |
> >>>           Type    User Val.   Time    Delay  Function (Parent)
> >>> [... idle-path trace snipped; identical to the first copy above ...]
> >>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
> >>
> >> Here your system (or this core) went idle, waiting for the next event.
> >>
> >>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
> >>
> >> Coming back from idle.
> >>
> >>> [... GPIO dispatch trace snipped; identical to the first copy above ...]
> >>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
> >>
> >> And here we start to process that GPIO interrupt in the RTDM handler,
> >> roughly 30 µs later (partly tracing overhead). So far
> >> nothing suspicious.
> >>
> >>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
> >>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
> >>
> >> Here some likely waiting RT task is signalled. Does that one notice a
> >> too high latency?
> >
> > I may have been wrong to speak of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
> >
> > I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
> > The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
> >
>
> Try to trace events, not functions, using regular ftrace ("trace-cmd
> record -e cobalt* -e sched* -e irq* -e signal*", e.g.). Check when the
> NIC gets the interrupt and compare that to when the GPIO event is
> triggered (or whatever the trigger and reaction are). Function tracing is for
> zooming in once you know where to zoom.
>

I just tried what you suggested (trace-cmd...), but I can't see any of my RT interrupts, only eth0 (which serves my NFS rootfs).
I'll try again with adjusted -e flags and report back.

François

> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-16 15:10                 ` Jan Kiszka
  2021-06-17  7:15                   ` François Legal
@ 2021-06-18 18:41                   ` François Legal
  2021-06-21  6:56                     ` Jan Kiszka
  1 sibling, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-18 18:41 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chen, Hongzhan, xenomai

Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:

> On 16.06.21 15:51, François Legal wrote:
> > Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >
> >> On 16.06.21 15:29, François Legal wrote:
> >>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>
> >>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
> >>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>> [... earlier quotes snipped; unchanged from the first copy above ...]
> >>>>>
> >>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
> >>>>>
> >>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
> >>>>>
> >>>>
> >>>> Can you (or did you already) break down the latencies on the DUT via
> >>>> tracing? Is it really the GPIO output path? What is happening in it,
> >>>> starting with RT/Linux task switches, mode transitions etc.?
> >>>>
> >>>
> >>> I just did it.
> >>>
> >>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
> >>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
> >>>  cat frozen
> >>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
> >>> ------------------------------------------------------------
> >>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
> >>> Calibrated minimum trace-point overhead: 0.461 us
> >>>
> >>>  +----- Hard IRQs ('|': locked)
> >>>  |+-- Xenomai
> >>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
> >>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
> >>>  |||                      |        +- NMI noise ('N')
> >>>  |||                      |        |
> >>>           Type    User Val.   Time    Delay  Function (Parent)
> >>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
> >>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
> >>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
> >>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
> >>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
> >>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
> >>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
> >>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
> >>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
> >>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
> >>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
> >>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> >>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> >>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> >>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
> >>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
> >>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> >>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> >>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
> >>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
> >>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
> >>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
> >>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
> >>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
> >>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
> >>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
> >>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
> >>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
> >>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> >>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> >>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
> >>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
> >>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
> >>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
> >>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
> >>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> >>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> >>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
> >>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
> >>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> >>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> >>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> >>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
> >>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
> >>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
> >>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
> >>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
> >>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
> >>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
> >>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
> >>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
> >>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
> >>
> >> Here your system (or this core) went idle, waiting for the next event.
> >>
> >>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
> >>
> >> Coming back from idle.
> >>
> >>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> >>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> >>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
> >>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
> >>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
> >>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
> >>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
> >>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
> >>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> >>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> >>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
> >>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
> >>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
> >>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
> >>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> >>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> >>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
> >>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
> >>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
> >>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
> >>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
> >>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
> >>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
> >>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
> >>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
> >>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
> >>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
> >>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
> >>
> >> And here we start to process that GPIO interrupt in the RTDM handler,
> >> roughly after 30 µs (which are also due to tracing overhead). So far
> >> nothing suspicious.
> >>
> >>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
> >>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
> >>
> >> Here some likely waiting RT task is signalled. Does that one notice a
> >> too high latency?
> >
> > I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and gpio access, gives me ~50 to 52 ms.
> >
> > I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
> > The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected on that system). So something is wrong on my system, but I don't know what!
> >
>
> Try to trace events, not functions, using regular ftrace ("trace-cmd
> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
> NIC gets the interrupt and compare that to when the GPIO event is
> triggered (or whatever is trigger and reaction). Function tracing is for
> zooming in when you know where to zoom.
>

So I think I found the answer to my problem.
Is there any reason why NET events (in the NIC irq handlers) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?

Thanks in advance

François
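[Editor's note: the recorder-side sequence described earlier in the thread (open, enable-interrupt + timestamp ioctl, then select & read) looks roughly like the sketch below. This is a non-buildable illustration: the device path, pin name, trigger flag and readout structure are assumptions based on the Xenomai 3 GPIO core, not taken from the reporter's code.]

```c
/* Sketch only -- device path, ioctl and trigger names are assumptions
 * based on the Xenomai 3 GPIO core, not a verified build. */
int on = 1;
int trigger = GPIO_TRIGGER_EDGE_RISING;
int fd = open("/dev/rtdm/pinctrl-gpio0/gpio12", O_RDWR); /* hypothetical pin */

ioctl(fd, GPIO_RTIOC_IRQEN, &trigger);   /* enable the pin interrupt */
ioctl(fd, GPIO_RTIOC_TS, &on);           /* request timestamped reads */

fd_set set;
FD_ZERO(&set);
FD_SET(fd, &set);
select(fd + 1, &set, NULL, NULL, NULL);  /* block until an edge fires */

struct rtdm_gpio_readout rdo;            /* { timestamp; value; } */
read(fd, &rdo, sizeof(rdo));
/* rdo.timestamp is monotonic-based here, unlike the
 * rtdm_clock_read()-based NIC timestamps in question. */
```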

> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-18 18:41                   ` François Legal
@ 2021-06-21  6:56                     ` Jan Kiszka
  2021-06-21  9:39                       ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-21  6:56 UTC (permalink / raw)
  To: François Legal, Philippe Gerum; +Cc: Chen, Hongzhan, xenomai

On 18.06.21 20:41, François Legal wrote:
> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>  
>> On 16.06.21 15:51, François Legal wrote:
>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>  
>>>> On 16.06.21 15:29, François Legal wrote:
>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>  
>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>  
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>
>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>
>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2 ms after receiving a specific network packet.
>>>>>>>>>>>
>>>>>>>>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50 ms between the network packet and the GPIO (so something between 51 and 52 ms of delay overall).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Are you toggling the gpio and accessing the gpio device through your rtdm device, like on path /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>
>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>>>
>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>
>>>>>>>> When the issue happens, has the recording app side got the same number of network data packets and gpio interrupts as the DUT has been toggling?
>>>>>>>> I am asking this because I once met a gpio hardware issue that caused missing or invalid gpio interrupts when the gpio was connected between two different boards.
>>>>>>>>
>>>>>>>
>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>
>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2 ms + ~30 ms (the width of the DUT's pulse) + 50 ms of latency.
>>>>>>>
>>>>>>
>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>
>>>>>
>>>>> I just did it.
>>>>>
>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>>>  cat frozen
>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>> ------------------------------------------------------------
>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>
>>>>>  +----- Hard IRQs ('|': locked)
>>>>>  |+-- Xenomai
>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>  |||                      |        +- NMI noise ('N')
>>>>>  |||                      |        |
>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>
>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>
>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>
>>>> Coming back from idle.
>>>>
>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>
>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>>>> nothing suspicious.
>>>>
>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>
>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>> too high latency?
>>>
>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and gpio access, gives me ~50 to 52 ms.
>>>
>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>> The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected on that system). So something is wrong on my system, but I don't know what!
>>>
>>
>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>> NIC gets the interrupt and compare that to when the GPIO event is
>> triggered (or whatever is trigger and reaction). Function tracing is for
>> zooming in when you know where to zoom.
>>
> 
> So I think I found the answer to my problem.
> Is there any reason why NET events (in the NIC irq handlers) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
> 

An inconsistency of the GPIO drivers; the UART drivers use
rtdm_clock_read for user-exposed timestamping as well. Maybe Philippe
can comment on the thoughts behind this deviation.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21  6:56                     ` Jan Kiszka
@ 2021-06-21  9:39                       ` Philippe Gerum
  2021-06-21 13:38                         ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-21  9:39 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: François Legal, Chen, Hongzhan, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 18.06.21 20:41, François Legal wrote:
>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>  
>>> On 16.06.21 15:51, François Legal wrote:
>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>  
>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>  
>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>  
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>
>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>
>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2 ms after receiving a specific network packet.
>>>>>>>>>>>>
>>>>>>>>>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for the GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50 ms between the network packet and the GPIO (so something between 51 and 52 ms of delay overall).
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Are you toggling gpio and access gpio device  through your rtdm device like on path /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>>
>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>>>>
>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>
>>>>>>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
>>>>>>>>> I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO was connected between two different boards.
>>>>>>>>>
>>>>>>>>
>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>
>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>>>>>>>
>>>>>>>
>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>
>>>>>>
>>>>>> I just did it.
>>>>>>
>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>>>>  cat frozen
>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>> ------------------------------------------------------------
>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>
>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>  |+-- Xenomai
>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>  |||                      |        |
>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>
>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>
>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>
>>>>> Coming back from idle.
>>>>>
>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>
>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>>>>> nothing suspicious.
>>>>>
>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>
>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>>> too high latency?
>>>>
>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix, is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
>>>>
>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>> The trace seems to demonstrate ~30µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>>>>
>>>
>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>> NIC gets the interrupt and compare that to when the GPIO event is
>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>> zooming in when you know where to zoom.
>>>
>> 
>> So I think I found the answer to my problem.
>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>> 
>
> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
> user-exposed timestamping as well. Maybe Philippe can comment on
> thoughts behind this deviation.
>

When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
idea of real time, which is the Cobalt monotonic clock plus an arbitrary
offset. For this reason, rtdm_clock_read() is not SMP-consistent
(different CPUs might read different timestamps at the exact same time),
and is not in sync with Linux's wall clock either. For these reasons, I
don't see any practical way to synchronize multiple systems on the clock
underlying rtdm_clock_read().

Therefore, there is no upside in using rtdm_clock_read() for
timestamping in this context, only adding the potential for even more
surprising results due to the mono->real-time offset changing under our
feet, since the epoch of the Xenomai real-time clock can be (re)set
during runtime.

I believe the UART driver is wrong here; it should return timestamps
based on the monotonic source, which best fits the common need: getting
timestamps from the local CPU for measuring delays between events
received by drivers and the actions taken by the applications which
consume them, immune from updates to the underlying clock
epoch. Granted, there might be a catch when a timestamp is taken from
IRQ context, which is then consumed from a thread living on a different
CPU, if per-CPU clocks are not/badly synchronized. But that would happen
the exact same way with rtdm_clock_read() anyway.

The situation improves when running on top of Dovetail, since Xenomai
now refers to the common linux clocks (mono / real) instead of providing
its own idea of time, but the UART driver code predates the Dovetail
port.
 
-- 
Philippe.



* Re: Large gpio interrupt latency
  2021-06-21  9:39                       ` Philippe Gerum
@ 2021-06-21 13:38                         ` Jan Kiszka
  2021-06-21 13:54                           ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-21 13:38 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: François Legal, Chen, Hongzhan, xenomai

On 21.06.21 11:39, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 18.06.21 20:41, François Legal wrote:
>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>  
>>>> On 16.06.21 15:51, François Legal wrote:
>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>  
>>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>  
>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>  
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>>
>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupts service routines for GPIOs and network interface. By checking the timestamps returned by the application, I get an about 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Are you toggling gpio and access gpio device  through your rtdm device like on path /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>>>
>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>>>>>
>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>>
>>>>>>>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
>>>>>>>>>> I am asking because I once met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO was connected between two different boards.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>>
>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>>>>>>>>
>>>>>>>>
>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>>
>>>>>>>
>>>>>>> I just did it.
>>>>>>>
>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>>>>>  cat frozen
>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>>> ------------------------------------------------------------
>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>>
>>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>>  |+-- Xenomai
>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>>  |||                      |        |
>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>>
>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>>
>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>>
>>>>>> Coming back from idle.
>>>>>>
>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>>
>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>>>>>> nothing suspicious.
>>>>>>
>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>>
>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>>>> too high latency?
>>>>>
>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix, is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
>>>>>
>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>>> The trace seems to demonstrate ~30µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>>>>>
>>>>
>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>>> zooming in when you know where to zoom.
>>>>
>>>
>>> So I think I found the answer to my problem.
>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>>>
>>
>> An inconsistency of the GPIO drivers; the UART drivers use rtdm_clock_read for
>> user-exposed timestamping as well. Maybe Philippe can comment on the
>> thinking behind this deviation.
>>
> 
> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
> offset. For this reason, rtdm_clock_read() is not SMP-consistent
> (different CPUs might read different timestamps at the exact same time),
> and it is not in sync with Linux's wall clock either. For these reasons, I
> don't see any practical way to synchronize multiple systems on the clock
> underlying rtdm_clock_read().
> 
> Therefore, there is no upside in using rtdm_clock_read() for
> timestamping in this context, only adding the potential for even more
> surprising results due to the mono->real-time offset changing under our
> feet, since the epoch of the Xenomai real-time clock can be (re)set
> during runtime.
> 
> I believe the UART driver is wrong here, it should return timestamps
> based on the monotonic source, which best fits the common need: getting
> timestamps from the local CPU for measuring delays between events
> received by drivers and the actions taken by the applications which
> consume them, immune from updates to the underlying clock
> epoch. Granted, there might be a catch when a timestamp is taken from
> IRQ context, which is then consumed from a thread living on a different
> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
> the exact same way with rtdm_clock_read() anyway.
> 
> The situation improves when running on top of Dovetail, since Xenomai
> now refers to the common linux clocks (mono / real) instead of providing
> its own idea of time, but the UART driver code predates the Dovetail
> port.
>  

Well, it's not just UART. All hardware drivers - except for GPIO - use
rtdm_clock_read. That was no problem in practice for their use cases so
far. One problem is that GPIO timestamps are now not comparable to others.

But IIRC, most Xenomai APIs using absolute timestamps are based on
Xenomai's real-time clock. Therefore, providing timestamps for that
particular clock was the original idea of rtdm_clock_read (which
predates rtdm_clock_read_monotonic). GPIO breaks that and should be
fixed - unless I'm wrong in that assumption.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 13:38                         ` Jan Kiszka
@ 2021-06-21 13:54                           ` Philippe Gerum
  2021-06-21 14:02                             ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-21 13:54 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: François Legal, Chen, Hongzhan, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 21.06.21 11:39, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 18.06.21 20:41, François Legal wrote:
>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>  
>>>>> On 16.06.21 15:51, François Legal wrote:
>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>  
>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>  
>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>  
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>>>
>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50 ms between the network packet and the GPIO (so something between 51 and 52 ms in total).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you toggling the gpio and accessing the gpio device through your rtdm device, i.e. on a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>>>>
>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIOs connected to the DUT GPIO.
>>>>>>>>>>>>
>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>>>
>>>>>>>>>>> When the issue happens, has the recording app side got the same number of network data packets and gpio interrupts as the DUT has been toggling?
>>>>>>>>>>> I am asking because I have met a gpio hardware issue that caused missing or invalid gpio interrupts when a gpio connects two different boards.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding the network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>>>
>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2 ms + ~30 ms (the width of the DUT's pulse) + 50 ms of latency
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I just did it.
>>>>>>>>
>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive:
>>>>>>>>  cat frozen
>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>>>> ------------------------------------------------------------
>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>>>
>>>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>>>  |+-- Xenomai
>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>>>  |||                      |        |
>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>>>
>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>>>
>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>>>
>>>>>>> Comming back from idle.
>>>>>>>
>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>>>
>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>>>> roughly 30 µs later (part of which is tracing overhead). So far
>>>>>>> nothing suspicious.
>>>>>>>
>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>>>
>>>>>>> Here a likely waiting RT task is signalled. Does that one observe
>>>>>>> too high a latency?
>>>>>>
>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
>>>>>>
>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>>>> The trace seems to show ~30 µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>>>>>>
>>>>>
>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>>>> zooming in when you know where to zoom.
>>>>>
>>>>
>>>> So I think I found the answer to my problem.
>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>>>>
>>>
>>> An inconsistency of the GPIO drivers; the UART drivers use rtdm_clock_read for
>>> user-exposed timestamping as well. Maybe Philippe can comment on the
>>> thinking behind this deviation.
>>>
>> 
>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>> (different CPUs might read different timestamps at the exact same time),
>> and it is not in sync with Linux's wall clock either. For these reasons, I
>> don't see any practical way to synchronize multiple systems on the clock
>> underlying rtdm_clock_read().
>> 
>> Therefore, there is no upside in using rtdm_clock_read() for
>> timestamping in this context, only adding the potential for even more
>> surprising results due to the mono->real-time offset changing under our
>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>> during runtime.
>> 
>> I believe the UART driver is wrong here, it should return timestamps
>> based on the monotonic source, which best fits the common need: getting
>> timestamps from the local CPU for measuring delays between events
>> received by drivers and the actions taken by the applications which
>> consume them, immune from updates to the underlying clock
>> epoch. Granted, there might be a catch when a timestamp is taken from
>> IRQ context, which is then consumed from a thread living on a different
>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>> the exact same way with rtdm_clock_read() anyway.
>> 
>> The situation improves when running on top of Dovetail, since Xenomai
>> now refers to the common linux clocks (mono / real) instead of providing
>> its own idea of time, but the UART driver code predates the Dovetail
>> port.
>>  
>
> Well, it's not just UART. All hardware drivers - except for GPIO - use
> rtdm_clock_read. That was no problem in practice for their use cases so
> far. One problem is that GPIO timestamps are now not comparable to others.
>
> But IIRC, most Xenomai APIs using absolute timestamps are based on
> Xenomai's real-time clock. Therefore, providing timestamps for that

/*
 * The Copperplate clock shall be monotonic unless the threading
 * library has restrictions to support this over Mercury.
 *
 * In the normal case, this means that ongoing delays and timeouts
 * won't be affected when the host system date is changed. In the
 * restricted case by contrast, ongoing delays and timeouts may be
 * impacted by changes to the host system date.
 *
 * The implementation maintains a per-clock epoch value, so that
 * different emulators can have different (virtual) system dates.
 */
#ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
#define CLOCK_COPPERPLATE  CLOCK_REALTIME
#else
#define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
#endif

So no, only the POSIX API is using what the standard mandates, which is
CLOCK_REALTIME. All other APIs are based on copperplate, and they are
using a monotonic source as documented above.

> particular clock was the original idea of rtdm_clock_read (which
> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
> fixed - unless I'm wrong in that assumption.
>

We cannot assume the epoch is going to remain stable with
rtdm_clock_read() the way it is implemented, which is quite a
problem with respect to the common use case. For timestamping, a majority
of mainline drivers use ktime_get() or a variant thereof, which is
based on the monotonic clock source, not the _real form. Why would the
real-time I/O drivers be different?

-- 
Philippe.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 13:54                           ` Philippe Gerum
@ 2021-06-21 14:02                             ` Jan Kiszka
  2021-06-21 14:28                               ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-21 14:02 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: François Legal, Chen, Hongzhan, xenomai

On 21.06.21 15:54, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 21.06.21 11:39, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 18.06.21 20:41, François Legal wrote:
>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>  
>>>>>> On 16.06.21 15:51, François Legal wrote:
>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>  
>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>>  
>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>  
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in the interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50 ms between the network packet and the GPIO (so something between 51 and 52 ms in total).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you toggling the gpio and accessing the gpio device through your rtdm device, i.e. on a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIOs connected to the DUT GPIO.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>>>>
>>>>>>>>>>>> When the issue happens, has the recording app side got the same number of network data packets and gpio interrupts as the DUT has been toggling?
>>>>>>>>>>>> I am asking because I have met a gpio hardware issue that caused missing or invalid gpio interrupts when a gpio connects two different boards.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding the network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>>>>
>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2 ms + ~30 ms (the width of the DUT's pulse) + 50 ms of latency
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I just did it.
>>>>>>>>>
>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive:
>>>>>>>>>  cat frozen
>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>>>>> ------------------------------------------------------------
>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>>>>
>>>>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>>>>  |+-- Xenomai
>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>>>>  |||                      |        |
>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>>>>
>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>>>>
>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>>>>
>>>>>>>> Coming back from idle.
>>>>>>>>
>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>>>>
>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>>>>> roughly after 30 µs (part of which is tracing overhead). So far
>>>>>>>> nothing suspicious.
>>>>>>>>
>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>>>>
>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>>>>>> too high latency?
>>>>>>>
>>>>>>> I may have been wrong to speak of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
>>>>>>>
>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>>>>> The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>>>>>>>
>>>>>>
>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>>>>> zooming in when you know where to zoom.
>>>>>>
>>>>>
>>>>> So I think I found the answer to my problem.
>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>>>>>
>>>>
>>>> An inconsistency of the GPIO drivers: the UART drivers use rtdm_clock_read for
>>>> user-exposed timestamping as well. Maybe Philippe can comment on the
>>>> thoughts behind this deviation.
>>>>
>>>
>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>>> (different CPUs might read different timestamps at the exact same time),
>>> and is not in sync with Linux's wall clock either. For these reasons, I
>>> don't see any practical way to synchronize multiple systems on the clock
>>> underlying rtdm_clock_read().
>>>
>>> Therefore, there is no upside in using rtdm_clock_read() for
>>> timestamping in this context, only adding the potential for even more
>>> surprising results due to the mono->real-time offset changing under our
>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>>> during runtime.
>>>
>>> I believe the UART driver is wrong here, it should return timestamps
>>> based on the monotonic source, which best fits the common need: getting
>>> timestamps from the local CPU for measuring delays between events
>>> received by drivers and the actions taken by the applications which
>>> consume them, immune from updates to the underlying clock
>>> epoch. Granted, there might be a catch when a timestamp is taken from
>>> IRQ context, which is then consumed from a thread living on a different
>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>>> the exact same way with rtdm_clock_read() anyway.
>>>
>>> The situation improves when running on top of Dovetail, since Xenomai
>>> now refers to the common linux clocks (mono / real) instead of providing
>>> its own idea of time, but the UART driver code predates the Dovetail
>>> port.
>>>  
>>
>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>> rtdm_clock_read. That was no problem in practice for their use cases so
>> far. One problem is that GPIO timestamps are now not comparable to others.
>>
>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>> Xenomai's real-time clock. Therefore, providing timestamps for that
> 
> /*
>  * The Copperplate clock shall be monotonic unless the threading
>  * library has restrictions to support this over Mercury.
>  *
>  * In the normal case, this means that ongoing delays and timeouts
>  * won't be affected when the host system date is changed. In the
>  * restricted case by contrast, ongoing delays and timeouts may be
>  * impacted by changes to the host system date.
>  *
>  * The implementation maintains a per-clock epoch value, so that
>  * different emulators can have different (virtual) system dates.
>  */
> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
> #else
> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
> #endif
> 
> So no, only the POSIX API is using what the standard mandates, which is
> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
> using a monotonic source as documented above.

OK, but that changed in 3.x. At the time that RTDM API was originally
added and then promoted, it was the other way around. We became
inconsistent then.

> 
>> particular clock was the original idea of rtdm_clock_read (which
>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>> fixed - unless I'm wrong with that assumption.
>>
> 
> We cannot assume the epoch is going to remain stable with
> rtdm_clock_read() the way it is implemented, which is quite a
> problem wrt the common use case. For timestamping, the majority of
> mainline drivers are using ktime_get() or a variant thereof, which is
> based on the monotonic clock source, not the _real form. Why would the
> real-time I/O drivers be different?

We have two cases here:

 - I-pipe-based version where the realtime clock is under full
   application control -> no problem to use rtdm_clock_read
 - Dovetail where we share the realtime clock - with all its tuning -
   with Linux -> here we have a problem with rtdm_clock_read and should
   reconsider its usage (and promotion!)

For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
For 3.2, I'm not sure yet what to do with rtdm_clock_read.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 14:02                             ` Jan Kiszka
@ 2021-06-21 14:28                               ` Philippe Gerum
  2021-06-21 14:46                                 ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-21 14:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: François Legal, Chen, Hongzhan, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 21.06.21 15:54, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 21.06.21 11:39, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 18.06.21 20:41, François Legal wrote:
>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>  
>>>>>>> On 16.06.21 15:51, François Legal wrote:
>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>  
>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>>>  
>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>  
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupts service routines for GPIOs and network interface. By checking the timestamps returned by the application, I get an about 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, e.g. via the path /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending one the network data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has been toggling?
>>>>>>>>>>>>> I am asking this because I have met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO was connected between two different boards.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>>>>>
>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I just did it.
>>>>>>>>>>
>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>>>>>>>>  cat frozen
>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>>>>>
>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>>>>>  |+-- Xenomai
>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>>>>>  |||                      |        |
>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>>>>>
>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>>>>>
>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>>>>>
>>>>>>>>> Coming back from idle.
>>>>>>>>>
>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>>>>>
>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>>>>>> roughly after 30 µs (part of which is tracing overhead). So far
>>>>>>>>> nothing suspicious.
>>>>>>>>>
>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>>>>>
>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>>>>>>> too high latency?
>>>>>>>>
>>>>>>>> I may have been wrong to speak of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
>>>>>>>>
>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>>>>>> The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>>>>>>>>
>>>>>>>
>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>>>>>> zooming in when you know where to zoom.
>>>>>>>
>>>>>>
>>>>>> So I think I found the answer to my problem.
>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>>>>>>
>>>>>
>>>>> An inconsistency of the GPIO drivers: the UART drivers use rtdm_clock_read for
>>>>> user-exposed timestamping as well. Maybe Philippe can comment on the
>>>>> thoughts behind this deviation.
>>>>>
>>>>
>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>>>> (different CPUs might read different timestamps at the exact same time),
>>>> is not in sync with linux's wall clock either. For these reasons, I
>>>> don't see any practical way to synchronize multiple systems on the clock
>>>> underlying rtdm_clock_read().
>>>>
>>>> Therefore, there is no upside in using rtdm_clock_read() for
>>>> timestamping in this context, only adding the potential for even more
>>>> surprising results due to the mono->real-time offset changing under our
>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>>>> during runtime.
>>>>
>>>> I believe the UART driver is wrong here, it should return timestamps
>>>> based on the monotonic source, which best fits the common need: getting
>>>> timestamps from the local CPU for measuring delays between events
>>>> received by drivers and the actions taken by the applications which
>>>> consume them, immune from updates to the underlying clock
>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>>>> IRQ context, which is then consumed from a thread living on a different
>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>>>> the exact same way with rtdm_clock_read() anyway.
>>>>
>>>> The situation improves when running on top of Dovetail, since Xenomai
>>>> now refers to the common linux clocks (mono / real) instead of providing
>>>> its own idea of time, but the UART driver code predates the Dovetail
>>>> port.
>>>>  
>>>
>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>>> rtdm_clock_read. That was no problem in practice for their use cases so
>>> far. One problem is that GPIO timestamps are now not comparable to others.
>>>
>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>> 
>> /*
>>  * The Copperplate clock shall be monotonic unless the threading
>>  * library has restrictions to support this over Mercury.
>>  *
>>  * In the normal case, this means that ongoing delays and timeouts
>>  * won't be affected when the host system date is changed. In the
>>  * restricted case by contrast, ongoing delays and timeouts may be
>>  * impacted by changes to the host system date.
>>  *
>>  * The implementation maintains a per-clock epoch value, so that
>>  * different emulators can have different (virtual) system dates.
>>  */
>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>> #else
>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>> #endif
>> 
>> So no, only the POSIX API is using what the standard mandates, which is
>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>> using a monotonic source as documented above.
>
> OK, but that changed in 3.x. At the time the RTDM API was originally
> added and then promoted, it was the other way around. We became
> inconsistent then.
>
>> 
>>> particular clock was the original idea of rtdm_clock_read (which
>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>>> fixed - unless I'm wrong with that assumption.
>>>
>> 
>> We cannot assume the epoch is going to remain stable with
>> rtdm_clock_read() the way it is implemented, which is quite a
>> problem wrt the common use case. For timestamping, the majority of
>> mainline drivers use ktime_get() or a variant thereof, which is
>> based on the monotonic clock source, not the _real form. Why would the
>> real-time I/O drivers be different?
>
> We have two cases here:
>
>  - I-pipe-based version where the realtime clock is under full
>    application control -> no problem to use rtdm_clock_read

Well, there is still the issue that rtdm_clock_read() is not immune to
some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise,
e.g. via a call to clock_settime(), which is the same problem as Linux
changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
application.

>  - Dovetail where we share the realtime clock - with all its tuning -
>    with Linux -> here we have a problem with rtdm_clock_read and should
>    reconsider its usage (and promotion!)
>
> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.

Wait, you have downstream users already depending on GPIO returning
monotonic timestamps, and this is a _stable_ release. So why not fix the
other drivers instead, based on the fact that timestamping with
rtdm_clock_read() is wrong? Same issue, right?

So the best course of action to sort this out for 3.1.x may be to extend
GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
timestamps based on the Xenomai wallclock. Applications which do want to
align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
users of GPIO_RTIOC_TS_REAL, but that would be much better than
introducing a sneaky change in behavior for the GPIO driver.

> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>

The decision looks pretty simple for the common use case: when
timestamps are needed for performance/delay measurements, we want to get
them from a clock source which won't play funny games, warping back to
the future.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 14:28                               ` Philippe Gerum
@ 2021-06-21 14:46                                 ` Jan Kiszka
  2021-06-21 14:57                                   ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-21 14:46 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: François Legal, Chen, Hongzhan, xenomai

On 21.06.21 16:28, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 21.06.21 15:54, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 18.06.21 20:41, François Legal wrote:
>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>  
>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>>  
>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>>>>  
>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>>  
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50ms between the network packet and the GPIO (so something between 51 and 52ms).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Are you toggling the gpio and accessing the gpio device through your rtdm device, i.e. on a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When the issue happens, does the recording app side get the same number of network data packets and gpio interrupts as the DUT has been toggling?
>>>>>>>>>>>>>> I am asking this because I once met a gpio hardware issue that would cause missing or invalid gpio interrupts when the gpio is connected between two different boards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I just did it.
>>>>>>>>>>>
>>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>>>>>>>>>  cat frozen
>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>>>>>>
>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>>>>>>  |+-- Xenomai
>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>>>>>>  |||                      |        |
>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>>>>>>
>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>>>>>>
>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>>>>>>
>>>>>>>>>> Coming back from idle.
>>>>>>>>>>
>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>>>>>>
>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>>>>>>>>>> nothing suspicious.
>>>>>>>>>>
>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>>>>>>
>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>>>>>>>> too high latency?
>>>>>>>>>
>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix, is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my xenomai powered realtime recorder application, given the same network stream and gpio access gives me ~50 to 52 ms.
>>>>>>>>>
>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency (a number I expected for interrupt latency on that system) for the GPIO. So something is wrong on my system, but I don't know what!
>>>>>>>>>
>>>>>>>>
>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>>>>>>> zooming in when you know where to zoom.
>>>>>>>>
>>>>>>>
>>>>>>> So I think I found the answer to my problem.
>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>>>>>>>
>>>>>>
>>>>>> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on
>>>>>> thoughts behind this deviation.
>>>>>>
>>>>>
>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>>>>> (different CPUs might read different timestamps at the exact same time),
>>>>> is not in sync with linux's wall clock either. For these reasons, I
>>>>> don't see any practical way to synchronize multiple systems on the clock
>>>>> underlying rtdm_clock_read().
>>>>>
>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>>>>> timestamping in this context, only adding the potential for even more
>>>>> surprising results due to the mono->real-time offset changing under our
>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>>>>> during runtime.
>>>>>
>>>>> I believe the UART driver is wrong here, it should return timestamps
>>>>> based on the monotonic source, which best fits the common need: getting
>>>>> timestamps from the local CPU for measuring delays between events
>>>>> received by drivers and the actions taken by the applications which
>>>>> consume them, immune from updates to the underlying clock
>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>>>>> IRQ context, which is then consumed from a thread living on a different
>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>>>>> the exact same way with rtdm_clock_read() anyway.
>>>>>
>>>>> The situation improves when running on top of Dovetail, since Xenomai
>>>>> now refers to the common linux clocks (mono / real) instead of providing
>>>>> its own idea of time, but the UART driver code predates the Dovetail
>>>>> port.
>>>>>  
>>>>
>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>>>> rtdm_clock_read. That was no problem in practice for their use cases so
>>>> far. One problem is that GPIO timestamps are now not comparable to others.
>>>>
>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>>>
>>> /*
>>>  * The Copperplate clock shall be monotonic unless the threading
>>>  * library has restrictions to support this over Mercury.
>>>  *
>>>  * In the normal case, this means that ongoing delays and timeouts
>>>  * won't be affected when the host system date is changed. In the
>>>  * restricted case by contrast, ongoing delays and timeouts may be
>>>  * impacted by changes to the host system date.
>>>  *
>>>  * The implementation maintains a per-clock epoch value, so that
>>>  * different emulators can have different (virtual) system dates.
>>>  */
>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>>> #else
>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>>> #endif
>>>
>>> So no, only the POSIX API is using what the standard mandates, which is
>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>>> using a monotonic source as documented above.
>>
>> OK, but that changed in 3.x. At the time the RTDM API was originally
>> added and then promoted, it was the other way around. We became
>> inconsistent then.
>>
>>>
>>>> particular clock was the original idea of rtdm_clock_read (which
>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>>>> fixed - unless I'm wrong with that assumption.
>>>>
>>>
>>> We cannot assume the epoch is going to remain stable with
>>> rtdm_clock_read() the way it is implemented, which is quite a
>>> problem wrt the common use case. For timestamping, the majority of
>>> mainline drivers use ktime_get() or a variant thereof, which is
>>> based on the monotonic clock source, not the _real form. Why would the
>>> real-time I/O drivers be different?
>>
>> We have two cases here:
>>
>>  - I-pipe-based version where the realtime clock is under full
>>    application control -> no problem to use rtdm_clock_read
> 
> Well, there is still the issue that rtdm_clock_read() is not immune to
> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise,
> e.g. via a call to clock_settime(), which is the same problem as Linux
> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
> application.

Yes, but that's about the application(s) breaking themselves. Nothing
new, and it won't change as long as we only avoid CLOCK_REALTIME for
timestamps but still use POSIX services basing timers on that clock. The
key point is that the I-pipe put that into RT application hands (with
all related downsides); with Dovetail it's in system hands.

> 
>>  - Dovetail where we share the realtime clock - with all its tuning -
>>    with Linux -> here we have a problem with rtdm_clock_read and should
>>    reconsider its usage (and promotion!)
>>
>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
> 
> Wait, you have downstream users already depending on GPIO returning
> monotonic timestamps, and this is a _stable_ release. So why not fix the
> other drivers instead, based on the fact that timestamping with
> rtdm_clock_read() is wrong? Same issue, right?

GPIO was broken, but you are right that we may have users relying on
that breakage now. Obviously, we can change the other drivers for the
very same reasons: they have been working the way they do for more than
10 years now.

> 
> So the best course of action to sort this out for 3.1.x may be to extend
> GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
> timestamps based on the Xenomai wallclock. Applications which do want to
> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
> users of GPIO_RTIOC_TS_REAL, but that would be much better than
> introducing a sneaky change in behavior for the GPIO driver.

Yeah, likely the way to go.

> 
>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>>
> 
> The decision looks pretty simple for the common use case: when
> timestamps are needed for performance/delay measurements, we want to get
> them from a clock source which won't play funny games, warping back to
> the future.

You are only thinking of measurements. The other drivers used the
timestamping also for real applications, i.e. to calculate absolute
clock-realtime timeouts and wait for them to arrive. We will likely need
to enhance the other driver APIs as well to select the desired
clocksource, just like for GPIO.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 14:46                                 ` Jan Kiszka
@ 2021-06-21 14:57                                   ` Philippe Gerum
  2021-06-21 15:35                                     ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-21 14:57 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: François Legal, Chen, Hongzhan, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 21.06.21 16:28, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 21.06.21 15:54, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>>>>>>
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 18.06.21 20:41, François Legal wrote:
>>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>  
>>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>>>  
>>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>>>>>>>>>>>  
>>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>>>  
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupts service routines for GPIOs and network interface. By checking the timestamps returned by the application, I get an about 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are you toggling gpio and access gpio device  through your rtdm device like on path /dev/rtdm/your_gpiocontroller/gpio*?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
>>>>>>>>>>>>>>> I am asking because I have met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO was connected between two different boards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I just did it.
>>>>>>>>>>>>
>>>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
>>>>>>>>>>>>  cat frozen
>>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>>>>>>>>>>> ------------------------------------------------------------
>>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>>>>>>>>>>>
>>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>>>>>>>>>>>>  |+-- Xenomai
>>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>>>>>>>>>>>>  |||                      |        |
>>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>>>>>>>>>>
>>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>>>>>>>>>>
>>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>>>>>>>>>>
>>>>>>>>>>> Coming back from idle.
>>>>>>>>>>>
>>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>>>>>>>>>>
>>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>>>>>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>>>>>>>>>>> nothing suspicious.
>>>>>>>>>>>
>>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>>>>>>>>>>
>>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>>>>>>>>>> too high latency?
>>>>>>>>>>
>>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix, is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my xenomai powered realtime recorder application, given the same network stream and gpio access gives me ~50 to 52 ms.
>>>>>>>>>>
>>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>>>>>>>> zooming in when you know where to zoom.
>>>>>>>>>
>>>>>>>>
>>>>>>>> So I think I found the answer to my problem.
>>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>>>>>>>>
>>>>>>>
>>>>>>> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
>>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on
>>>>>>> thoughts behind this deviation.
>>>>>>>
>>>>>>
>>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>>>>>> (different CPUs might read different timestamps at the exact same time),
>>>>>> is not in sync with linux's wall clock either. For these reasons, I
>>>>>> don't see any practical way to synchronize multiple systems on the clock
>>>>>> underlying rtdm_clock_read().
>>>>>>
>>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>>>>>> timestamping in this context, only adding the potential for even more
>>>>>> surprising results due to the mono->real-time offset changing under our
>>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>>>>>> during runtime.
>>>>>>
>>>>>> I believe the UART driver is wrong here, it should return timestamps
>>>>>> based on the monotonic source, which best fits the common need: getting
>>>>>> timestamps from the local CPU for measuring delays between events
>>>>>> received by drivers and the actions taken by the applications which
>>>>>> consume them, immune from updates to the underlying clock
>>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>>>>>> IRQ context, which is then consumed from a thread living on a different
>>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>>>>>> the exact same way with rtdm_clock_read() anyway.
>>>>>>
>>>>>> The situation improves when running on top of Dovetail, since Xenomai
>>>>>> now refers to the common linux clocks (mono / real) instead of providing
>>>>>> its own idea of time, but the UART driver code predates the Dovetail
>>>>>> port.
>>>>>>  
>>>>>
>>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>>>>> rtdm_clock_read. That was no problem in practice for their use cases so
>>>>> far. One problem is that GPIO timestamps are now not comparable to others.
>>>>>
>>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>>>>
>>>> /*
>>>>  * The Copperplate clock shall be monotonic unless the threading
>>>>  * library has restrictions to support this over Mercury.
>>>>  *
>>>>  * In the normal case, this means that ongoing delays and timeouts
>>>>  * won't be affected when the host system date is changed. In the
>>>>  * restricted case by contrast, ongoing delays and timeouts may be
>>>>  * impacted by changes to the host system date.
>>>>  *
>>>>  * The implementation maintains a per-clock epoch value, so that
>>>>  * different emulators can have different (virtual) system dates.
>>>>  */
>>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>>>> #else
>>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>>>> #endif
>>>>
>>>> So no, only the POSIX API is using what the standard mandates, which is
>>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>>>> using a monotonic source as documented above.
>>>
>>> OK, but that changed in 3.x. At the time that RTDM API was originally
>>> added and then promoted, it was the other way around. We became
>>> inconsistent then.
>>>
>>>>
>>>>> particular clock was the original idea of rtdm_clock_read (which
>>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>>>>> fixed - unless I'm wrong with that assumption.
>>>>>
>>>>
>>>> We cannot assume the epoch is going to remain stable with
>>>> rtdm_clock_read() the way it is implemented, which is quite of a
>>>> problem wrt the common use case. For timestamping, a majority of
>>>> mainline drivers is using ktime_get() or a variant thereof which is
>>>> based on the monotonic clock source, not the _real form. Why would the
>>>> real-time I/O drivers be different?
>>>
>>> We have two cases here:
>>>
>>>  - I-pipe-based version where the realtime clock is under full
>>>    application control -> no problem to use rtdm_clock_read
>> 
>> Well, there is still the issue that rtdm_clock_read() is not immune to
>> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
>> e.g. via a call to clock_settime(), which is the same problem than Linux
>> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
>> application.
>
> Yes, but that's about the application(s) breaking themselves. Nothing
> new, not going to change when we only avoid clock_realtime for stamps
> but still use POSIX services basing timers on that clock. The key point
> is that I-pipe gave that into RT application hands (with all related
> downsides), with Dovetail it's in system hands.
>
>> 
>>>  - Dovetail where we share the realtime clock - with all its tuning -
>>>    with Linux -> here we have a problem with rtdm_clock_read and should
>>>    reconsider its usage (and promotion!)
>>>
>>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
>> 
>> Wait, you have downstream users already depending on GPIO returning
>> monotonic timestamps, and this is a _stable_ release. So why not fix the
>> other drivers instead, on the grounds that timestamping with
>> rtdm_clock_read() is wrong? Same issue, right?
>
> GPIO was broken, but you are right that we may have users relying on
> that breakage now. Obviously, we can change the other drivers for the
> very same reasons: they have been working the way they work for more
> than 10 years now.
>

Sorry, but in my view, rtdm_clock_read() was broken since day one. GPIO
had to work around the breakage... :)

>> 
>> So the best course of action to sort this out for 3.1.x may be to extend
>> GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
>> timestamps based on the Xenomai wallclock. Applications which do want to
>> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
>> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
>> users of GPIO_RTIOC_TS_REAL, but that would be much better than
>> introducing a sneaky change in behavior for the GPIO driver.
>
> Yeah, likely the way to go.
>
>> 
>>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>>>
>> 
>> The decision looks pretty simple for the common use case: when
>> timestamps are needed for performance/delay measurements, we want to get
>> them from a clock source which won't play funny games, warping back to
>> the future.
>
> You are only thinking of measurements. The other drivers used the stamping
> also for real applications, i.e. to calculate absolute clock-realtime
> timeouts and wait for them to arrive. We will likely need to enhance
> the other driver APIs as well to select the desired clocksource, just like
> for GPIO.

I'm referring to what is in the GPIO code, which is the problem at hand:
that timestamping was designed since day #1 to provide a way to measure
the scheduling latency.

I agree that the only way out is to enable all timestamp consumers to
pick their base clock of choice (mono or wallclock).

-- 
Philippe.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 14:57                                   ` Philippe Gerum
@ 2021-06-21 15:35                                     ` François Legal
  2021-06-21 16:38                                       ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-21 15:35 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Chen, Hongzhan, xenomai

Le Lundi, Juin 21, 2021 16:57 CEST, Philippe Gerum <rpm@xenomai.org> a écrit:

>
> Jan Kiszka <jan.kiszka@siemens.com> writes:
>
> > On 21.06.21 16:28, Philippe Gerum wrote:
> >>
> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>
> >>> On 21.06.21 15:54, Philippe Gerum wrote:
> >>>>
> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>>
> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
> >>>>>>
> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>>>>
> >>>>>>> On 18.06.21 20:41, François Legal wrote:
> >>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>>>>>
> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
> >>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>>>>>>>
> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
> >>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
> >>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org>
> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
> >>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
> >>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
> >>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupts service routines for GPIOs and network interface. By checking the timestamps returned by the application, I get an about 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Are you toggling gpio and access gpio device  through your rtdm device like on path /dev/rtdm/your_gpiocontroller/gpio*?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has toggled?
> >>>>>>>>>>>>>>> I am asking because I have met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO was connected between two different boards.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
> >>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
> >>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> I just did it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive :
> >>>>>>>>>>>>  cat frozen
> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
> >>>>>>>>>>>> ------------------------------------------------------------
> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
> >>>>>>>>>>>>
> >>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
> >>>>>>>>>>>>  |+-- Xenomai
> >>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
> >>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
> >>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
> >>>>>>>>>>>>  |||                      |        |
> >>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
> >>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
> >>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
> >>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
> >>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
> >>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
> >>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
> >>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
> >>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
> >>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
> >>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
> >>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
> >>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> >>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> >>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> >>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
> >>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
> >>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> >>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> >>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
> >>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
> >>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
> >>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
> >>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
> >>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
> >>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
> >>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
> >>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
> >>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
> >>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> >>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> >>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
> >>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
> >>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
> >>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
> >>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
> >>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> >>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> >>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
> >>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
> >>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> >>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> >>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> >>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
> >>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
> >>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
> >>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
> >>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
> >>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
> >>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
> >>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
> >>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
> >>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
> >>>>>>>>>>>
> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
> >>>>>>>>>>>
> >>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
> >>>>>>>>>>>
> >>>>>>>>>>> Coming back from idle.
> >>>>>>>>>>>
> >>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> >>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> >>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
> >>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
> >>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
> >>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
> >>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
> >>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
> >>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> >>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> >>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
> >>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
> >>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
> >>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
> >>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> >>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> >>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
> >>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
> >>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
> >>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
> >>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
> >>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
> >>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
> >>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
> >>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
> >>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
> >>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
> >>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
> >>>>>>>>>>>
> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
> >>>>>>>>>>> roughly after 30 µs (part of which is tracing overhead). So far
> >>>>>>>>>>> nothing suspicious.
> >>>>>>>>>>>
> >>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
> >>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
> >>>>>>>>>>>
> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
> >>>>>>>>>>> too high latency?
> >>>>>>>>>>
> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2 ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52 ms.
> >>>>>>>>>>
> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it is sent, right?).
> >>>>>>>>>> The trace seems to demonstrate ~30 µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
> >>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
> >>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
> >>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
> >>>>>>>>> zooming in when you know where to zoom.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> So I think I found the answer to my problem.
> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
> >>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on
> >>>>>>> thoughts behind this deviation.
> >>>>>>>
> >>>>>>
> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
> >>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
> >>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
> >>>>>> (different CPUs might read different timestamps at the exact same time),
> >>>>>> is not in sync with linux's wall clock either. For these reasons, I
> >>>>>> don't see any practical way to synchronize multiple systems on the clock
> >>>>>> underlying rtdm_clock_read().
> >>>>>>
> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
> >>>>>> timestamping in this context, only adding the potential for even more
> >>>>>> surprising results due to the mono->real-time offset changing under our
> >>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
> >>>>>> during runtime.
> >>>>>>
> >>>>>> I believe the UART driver is wrong here, it should return timestamps
> >>>>>> based on the monotonic source, which best fits the common need: getting
> >>>>>> timestamps from the local CPU for measuring delays between events
> >>>>>> received by drivers and the actions taken by the applications which
> >>>>>> consume them, immune from updates to the underlying clock
> >>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
> >>>>>> IRQ context, which is then consumed from a thread living on a different
> >>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
> >>>>>> the exact same way with rtdm_clock_read() anyway.
> >>>>>>
> >>>>>> The situation improves when running on top of Dovetail, since Xenomai
> >>>>>> now refers to the common linux clocks (mono / real) instead of providing
> >>>>>> its own idea of time, but the UART driver code predates the Dovetail
> >>>>>> port.
> >>>>>>
> >>>>>
> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
> >>>>> rtdm_clock_read. That was no problem in practice for their use cases so
> >>>>> far. One problem is that GPIO timestamps are now not comparable to others.
> >>>>>
> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
> >>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
> >>>>
> >>>> /*
> >>>>  * The Copperplate clock shall be monotonic unless the threading
> >>>>  * library has restrictions to support this over Mercury.
> >>>>  *
> >>>>  * In the normal case, this means that ongoing delays and timeouts
> >>>>  * won't be affected when the host system date is changed. In the
> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
> >>>>  * impacted by changes to the host system date.
> >>>>  *
> >>>>  * The implementation maintains a per-clock epoch value, so that
> >>>>  * different emulators can have different (virtual) system dates.
> >>>>  */
> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
> >>>> #else
> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
> >>>> #endif
> >>>>
> >>>> So no, only the POSIX API is using what the standard mandates, which is
> >>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
> >>>> using a monotonic source as documented above.
> >>>
> >>> OK, but that changed in 3.x. At the time that RTDM API was originally
> >>> added and then promoted, it was the other way around. We became
> >>> inconsistent then.
> >>>
> >>>>
> >>>>> particular clock was the original idea of rtdm_clock_read (which
> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
> >>>>> fixed - unless I'm wrong with that assumption.
> >>>>>
> >>>>
> >>>> We cannot assume the epoch is going to remain stable with
> >>>> rtdm_clock_read() the way it is implemented, which is quite a
> >>>> problem wrt the common use case. For timestamping, a majority of
> >>>> mainline drivers is using ktime_get() or a variant thereof which is
> >>>> based on the monotonic clock source, not the _real form. Why would the
> >>>> real-time I/O drivers be different?
> >>>
> >>> We have two cases here:
> >>>
> >>>  - I-pipe-based version where the realtime clock is under full
> >>>    application control -> no problem to use rtdm_clock_read
> >>
> >> Well, there is still the issue that rtdm_clock_read() is not immune to
> >> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
> >> e.g. via a call to clock_settime(), which is the same problem as Linux
> >> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
> >> application.
> >
> > Yes, but that's about the application(s) breaking themselves. Nothing
> > new, not going to change when we only avoid clock_realtime for stamps
> > but still use POSIX services basing timers on that clock. The key point
> > is that I-pipe gave that into RT application hands (with all related
> > downsides), with Dovetail it's in system hands.
> >
> >>
> >>>  - Dovetail where we share the realtime clock - with all its tuning -
> >>>    with Linux -> here we have a problem with rtdm_clock_read and should
> >>>    reconsider its usage (and promotion!)
> >>>
> >>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
> >>
> >> Wait, you have downstream users already depending on GPIO returning
> >> monotonic timestamps, and this is a _stable_ release. So why not fix
> >> other drivers based on the fact that timestamping with rtdm_clock_read()
> >> is wrong instead? Same issue, right?
> >
> > GPIO was broken, but you are right that we may have users relying on
> > that breakage now. Obviously, we can change the other drivers for the
> > very same reasons: They have been working the way they do for more
> > than 10 years now.
> >
>
> Sorry, but in my view, rtdm_clock_read() was broken since day one. GPIO
> had to work around the breakage... :)
>
> >>
> >> So the best course of action to sort this out for 3.1.x may be to extend
> >> GPIO_RTIOC_TS with, say, GPIO_RTIOC_TS_REAL, which would log and return
> >> timestamps based on the Xenomai wallclock. Applications which do want to
> >> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
> >> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
> >> users of GPIO_RTIOC_TS_REAL, but that would be much better than
> >> introducing a sneaky change in behavior for the GPIO driver.
> >
> > Yeah, likely the way to go.
> >
> >>
> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
> >>>
> >>
> >> The decision looks pretty simple for the common use case: when
> >> timestamps are needed for performance/delay measurements, we want to get
> >> them from a clock source which won't play funny games, warping back to
> >> the future.
> >
> > You only think of measurements. The other drivers used the stamping also
> > for real applications, as a means to calculate absolute clock-realtime
> > timeouts and wait for them to arrive. We will likely need to enhance
> > also the other driver APIs to select the desired clocksource, just like
> > for GPIO.
>
> I'm referring to what is in the GPIO code, which is the problem at hand:
> that timestamping was designed since day #1 to provide a way to measure
> the scheduling latency.
>
> I agree that the only way out is to enable all timestamp consumers to
> pick their base clock of choice (mono or wallclock).
>
> --
> Philippe.

 Hi there,

as we speak of breaking the ABI, why not introduce a specific IOCTL to select which time source to use for each driver concerned, then use a function pointer to call the right rtdm_time_get_whatever function?

François



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 15:35                                     ` François Legal
@ 2021-06-21 16:38                                       ` Philippe Gerum
  2021-06-21 16:45                                         ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-21 16:38 UTC (permalink / raw)
  To: François Legal; +Cc: Jan Kiszka, Chen, Hongzhan, xenomai


François Legal <francois.legal@thom.fr.eu.org> writes:

> Le Lundi, Juin 21, 2021 16:57 CEST, Philippe Gerum <rpm@xenomai.org> a écrit: 
>  
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>> > On 21.06.21 16:28, Philippe Gerum wrote:
>> >> 
>> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >> 
>> >>> On 21.06.21 15:54, Philippe Gerum wrote:
>> >>>>
>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>>>
>> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>> >>>>>>
>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>>>>>
>> >>>>>>> On 18.06.21 20:41, François Legal wrote:
>> >>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>> >>>>>>>>  
>> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>> >>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>> >>>>>>>>>>  
>> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>> >>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>> >>>>>>>>>>>>  
>> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>> >>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>> >>>>>>>>>>>>>>  
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> -----Original Message-----
>> >>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>> >>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> >>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> -----Original Message-----
>> >>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>> >>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Hello,
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which as received by the DUT would trigger toggling of a GPIO on the DUT) and one GPIO.
>> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupts service routines for GPIOs and network interface. By checking the timestamps returned by the application, I get an about 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Are you toggling gpio and access gpio device  through your rtdm device like on path /dev/rtdm/your_gpiocontroller/gpio*?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> When the issue happens, has the recording app side received the same number of network data packets and GPIO interrupts as the DUT has been toggling?
>> >>>>>>>>>>>>>>> I am asking this because I have met a GPIO hardware issue that caused missing or invalid GPIO interrupts when a GPIO connects two different boards.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recoding device are connected to the same switch with the same port config.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>> >>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>> >>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I just did it.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive:
>> >>>>>>>>>>>>  cat frozen
>> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>> >>>>>>>>>>>> ------------------------------------------------------------
>> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>> >>>>>>>>>>>>  |+-- Xenomai
>> >>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>> >>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>> >>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>> >>>>>>>>>>>>  |||                      |        |
>> >>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>> >>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>> >>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>> >>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>> >>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>> >>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>> >>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>> >>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>> >>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>> >>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>> >>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>> >>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>> >>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>> >>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>> >>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>> >>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>> >>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>> >>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>> >>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>> >>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>> >>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>> >>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>> >>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>> >>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>> >>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>> >>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>> >>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>> >>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>> >>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>> >>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>> >>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>> >>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>> >>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>> >>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>> >>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>> >>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>> >>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>> >>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>> >>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>> >>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>> >>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>> >>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>> >>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>> >>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>> >>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>> >>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>> >>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>> >>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>> >>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>> >>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>> >>>>>>>>>>>
>> >>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Coming back from idle.
>> >>>>>>>>>>>
>> >>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>> >>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>> >>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>> >>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>> >>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>> >>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>> >>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>> >>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>> >>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>> >>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>> >>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>> >>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>> >>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>> >>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>> >>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>> >>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>> >>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>> >>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>> >>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>> >>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>> >>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>> >>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>> >>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>> >>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>> >>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>> >>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>> >>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>> >>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>> >>>>>>>>>>>
>> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>> >>>>>>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>> >>>>>>>>>>> nothing suspicious.
>> >>>>>>>>>>>
>> >>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>> >>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>> >>>>>>>>>>> too high latency?
>> >>>>>>>>>>
>> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52ms.
>> >>>>>>>>>>
>> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>> >>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency for the GPIO (a number I expected for interrupt latency on that system). So something is wrong on my system, but I don't know what!
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>> >>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>> >>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>> >>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>> >>>>>>>>> zooming in when you know where to zoom.
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> So I think I found the answer to my problem.
>> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>> Inconsistency of the GPIO drivers; UART drivers use rtdm_clock_read for
>> >>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on the
>> >>>>>>> thoughts behind this deviation.
>> >>>>>>>
>> >>>>>>
>> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>> >>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>> >>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>> >>>>>> (different CPUs might read different timestamps at the exact same time),
>> >>>>>> nor is it in sync with Linux's wall clock. For these reasons, I
>> >>>>>> don't see any practical way to synchronize multiple systems on the clock
>> >>>>>> underlying rtdm_clock_read().
>> >>>>>>
>> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>> >>>>>> timestamping in this context, only adding the potential for even more
>> >>>>>> surprising results due to the mono->real-time offset changing under our
>> >>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>> >>>>>> during runtime.
>> >>>>>>
>> >>>>>> I believe the UART driver is wrong here, it should return timestamps
>> >>>>>> based on the monotonic source, which best fits the common need: getting
>> >>>>>> timestamps from the local CPU for measuring delays between events
>> >>>>>> received by drivers and the actions taken by the applications which
>> >>>>>> consume them, immune from updates to the underlying clock
>> >>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>> >>>>>> IRQ context, which is then consumed from a thread living on a different
>> >>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>> >>>>>> the exact same way with rtdm_clock_read() anyway.
>> >>>>>>
>> >>>>>> The situation improves when running on top of Dovetail, since Xenomai
>> >>>>>> now refers to the common linux clocks (mono / real) instead of providing
>> >>>>>> its own idea of time, but the UART driver code predates the Dovetail
>> >>>>>> port.
>> >>>>>>  
>> >>>>>
>> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>> >>>>> rtdm_clock_read. That was no problem in practice for their use cases so
>> >>>>> far. One problem is that GPIO timestamps are now not comparable to others.
>> >>>>>
>> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>> >>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>> >>>>
>> >>>> /*
>> >>>>  * The Copperplate clock shall be monotonic unless the threading
>> >>>>  * library has restrictions to support this over Mercury.
>> >>>>  *
>> >>>>  * In the normal case, this means that ongoing delays and timeouts
>> >>>>  * won't be affected when the host system date is changed. In the
>> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
>> >>>>  * impacted by changes to the host system date.
>> >>>>  *
>> >>>>  * The implementation maintains a per-clock epoch value, so that
>> >>>>  * different emulators can have different (virtual) system dates.
>> >>>>  */
>> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>> >>>> #else
>> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>> >>>> #endif
>> >>>>
>> >>>> So no, only the POSIX API is using what the standard mandates, which is
>> >>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>> >>>> using a monotonic source as documented above.
>> >>>
>> >>> OK, but that changed in 3.x. At the time that RTDM API was originally
>> >>> added and then promoted, it was the other way around. We became
>> >>> inconsistent then.
>> >>>
>> >>>>
>> >>>>> particular clock was the original idea of rtdm_clock_read (which
>> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>> >>>>> fixed - unless I'm wrong with that assumption.
>> >>>>>
>> >>>>
>> >>>> We cannot assume the epoch is going to remain stable with
>> >>>> rtdm_clock_read() the way it is implemented, which is quite of a
>> >>>> problem wrt the common use case. For timestamping, a majority of
>> >>>> mainline drivers is using ktime_get() or a variant thereof which is
>> >>>> based on the monotonic clock source, not the _real form. Why would the
>> >>>> real-time I/O drivers be different?
>> >>>
>> >>> We have two cases here:
>> >>>
>> >>>  - I-pipe-based version where the realtime clock is under full
>> >>>    application control -> no problem to use rtdm_clock_read
>> >> 
>> >> Well, there is still the issue that rtdm_clock_read() is not immune to
>> >> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
>> >> e.g. via a call to clock_settime(), which is the same problem as Linux
>> >> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
>> >> application.
>> >
>> > Yes, but that's about the application(s) breaking themselves. Nothing
>> > new, and not going to change if we only avoid clock_realtime for stamps
>> > but still use POSIX services basing timers on that clock. The key point
>> > is that the I-pipe put that into RT application hands (with all related
>> > downsides); with Dovetail it's in system hands.
>> >
>> >> 
>> >>>  - Dovetail where we share the realtime clock - with all its tuning -
>> >>>    with Linux -> here we have a problem with rtdm_clock_read and should
>> >>>    reconsider its usage (and promotion!)
>> >>>
>> >>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
>> >> 
>> >> Wait, you have downstream users already depending on GPIO returning
>> >> monotonic timestamps, and this is a _stable_ release. So why not fix the
>> >> other drivers instead, based on the fact that timestamping with
>> >> rtdm_clock_read() is wrong? Same issue, right?
>> >
>> > GPIO was broken, but you are right that we may have users relying on
>> > that breakage now. Obviously, we can change the other drivers for the
>> > very same reasons: they have been working the way they do for more than
>> > 10 years now.
>> >
>> 
>> Sorry, but in my view, rtdm_clock_read() has been broken since day one. GPIO
>> had to work around the breakage... :)
>> 
>> >> 
>> >> So the best course of action to sort this out for 3.1.x may be to extend
>> >> GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
>> >> timestamps based on the Xenomai wallclock. Applications which do want to
>> >> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
>> >> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
>> >> users of GPIO_RTIOC_TS_REAL, but that would be much better than
>> >> introducing a sneaky change in behavior for the GPIO driver.
>> >
>> > Yeah, likely the way to go.
>> >
>> >> 
>> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>> >>>
>> >> 
>> >> The decision looks pretty simple for the common use case: when
>> >> timestamps are needed for performance/delay measurements, we want to get
>> >> them from a clock source which won't play funny games, warping back to
>> >> the future.
>> >
>> > You only think of measurements. The other drivers used the stamping also
>> > for real applications, meaning to calculate absolute clock-realtime
>> > timeouts and wait for them to arrive. We will likely need to enhance
>> > also the other driver APIs to select the desired clocksource, just like
>> > for GPIO.
>> 
>> I'm referring to what is in the GPIO code, which is the problem at hand:
>> that timestamping was designed since day #1 to provide a way to measure
>> the scheduling latency.
>> 
>> I agree that the only way out is to enable all timestamp consumers to
>> pick their base clock of choice (mono or wallclock).
>> 
>> -- 
>> Philippe.
>  
>  Hi there,
>
> as we speak of breaking ABI, why not introduce a specific IOCTL to select which time source to use for each driver using it, then use a function pointer to call the right rtdm_time_get_whatever function?
>

That would mean adding a generic ioctl, and many changes all over the
place (we would need no function pointer with distinct ioctl
codes). Doable for sure, but likely too much for 3.1.x.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 16:38                                       ` Philippe Gerum
@ 2021-06-21 16:45                                         ` Philippe Gerum
  2021-06-21 18:06                                           ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-21 16:45 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: François Legal, xenomai


Philippe Gerum via Xenomai <xenomai@xenomai.org> writes:

> François Legal <francois.legal@thom.fr.eu.org> writes:
>
>> Le Lundi, Juin 21, 2021 16:57 CEST, Philippe Gerum <rpm@xenomai.org> a écrit: 
>>  
>>> 
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>> 
>>> > On 21.06.21 16:28, Philippe Gerum wrote:
>>> >> 
>>> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>> >> 
>>> >>> On 21.06.21 15:54, Philippe Gerum wrote:
>>> >>>>
>>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>> >>>>
>>> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>>> >>>>>>
>>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>> >>>>>>
>>> >>>>>>> On 18.06.21 20:41, François Legal wrote:
>>> >>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>> >>>>>>>>  
>>> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>>> >>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>> >>>>>>>>>>  
>>> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>>> >>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>>> >>>>>>>>>>>>  
>>> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>>> >>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>> >>>>>>>>>>>>>>  
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> -----Original Message-----
>>> >>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>>> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>>> >>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>>> >>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>>> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> -----Original Message-----
>>> >>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>>> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>>> >>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>>> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> Hello,
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which, as received by the DUT, would trigger toggling of a GPIO on the DUT) and one GPIO.
>>> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50ms between the network packet and the GPIO (so something between 51 and 52ms).
>>> >>>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, i.e. via a path like /dev/rtdm/your_gpiocontroller/gpio*?
>>> >>>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>>> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>>> >>>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>> When the issue happens, does the recording app side see the same number of network data packets and GPIO interrupts as the DUT has been toggling?
>>> >>>>>>>>>>>>>>> I am asking this because I have met GPIO hardware issues that caused missing or invalid GPIO interrupts when a GPIO was connected between two different boards.
>>> >>>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>>> >>>>>>>>>>>>>>
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>>> >>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>>> >>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>>> >>>>>>>>>>>>>
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I just did it.
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>> I set gpio_pin_interrupt as the trigger, then ran my application and triggered the GPIO.
>>> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it or what conclusion to derive:
>>> >>>>>>>>>>>>  cat frozen
>>> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>>> >>>>>>>>>>>> ------------------------------------------------------------
>>> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>>> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>>> >>>>>>>>>>>>
>>> >>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>>> >>>>>>>>>>>>  |+-- Xenomai
>>> >>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>>> >>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>>> >>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>>> >>>>>>>>>>>>  |||                      |        |
>>> >>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>>> >>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>>> >>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>>> >>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>>> >>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>>> >>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>>> >>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>> >>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>>> >>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>>> >>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>>> >>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>>> >>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>>> >>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>> >>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>>> >>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>> >>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>>> >>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>>> >>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>> >>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>>> >>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>>> >>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>>> >>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>>> >>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>>> >>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>>> >>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>>> >>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>>> >>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>>> >>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>>> >>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>> >>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>>> >>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>>> >>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>>> >>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>> >>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>> >>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>>> >>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>>> >>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>>> >>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>>> >>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>>> >>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>>> >>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>>> >>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>>> >>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>>> >>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>>> >>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>>> >>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>>> >>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>>> >>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>>> >>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>>> >>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Coming back from idle.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>>> >>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>>> >>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>>> >>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>>> >>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>>> >>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>>> >>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>>> >>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>>> >>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>> >>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>> >>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>>> >>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>>> >>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>>> >>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>>> >>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>>> >>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>>> >>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>>> >>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>>> >>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>>> >>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>>> >>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>>> >>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>>> >>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>>> >>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>>> >>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>>> >>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>>> >>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>>> >>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>>> >>>>>>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>>> >>>>>>>>>>> nothing suspicious.
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>>> >>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>>> >>>>>>>>>>> too high latency?
>>> >>>>>>>>>>
>>> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52ms.
>>> >>>>>>>>>>
>>> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it is sent, right?).
>>> >>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency (a number I expected for interrupt latency on that system) for the GPIO. So something is wrong on my system, but I don't know what!
>>> >>>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>>> >>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>>> >>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>>> >>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>>> >>>>>>>>> zooming in when you know where to zoom.
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> So I think I found the answer to my problem.
>>> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO event are dated with rtdm_clock_read_monotonic ?
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
>>> >>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on
>>> >>>>>>> thoughts behind this deviation.
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>>> >>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>>> >>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>>> >>>>>> (different CPUs might read different timestamps at the exact same time),
>>> >>>>>> and is not in sync with Linux's wall clock either. For these reasons, I
>>> >>>>>> don't see any practical way to synchronize multiple systems on the clock
>>> >>>>>> underlying rtdm_clock_read().
>>> >>>>>>
>>> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>>> >>>>>> timestamping in this context, only adding the potential for even more
>>> >>>>>> surprising results due to the mono->real-time offset changing under our
>>> >>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>>> >>>>>> during runtime.
>>> >>>>>>
>>> >>>>>> I believe the UART driver is wrong here, it should return timestamps
>>> >>>>>> based on the monotonic source, which best fits the common need: getting
>>> >>>>>> timestamps from the local CPU for measuring delays between events
>>> >>>>>> received by drivers and the actions taken by the applications which
>>> >>>>>> consume them, immune from updates to the underlying clock
>>> >>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>>> >>>>>> IRQ context, which is then consumed from a thread living on a different
>>> >>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>>> >>>>>> the exact same way with rtdm_clock_read() anyway.
>>> >>>>>>
>>> >>>>>> The situation improves when running on top of Dovetail, since Xenomai
>>> >>>>>> now refers to the common linux clocks (mono / real) instead of providing
>>> >>>>>> its own idea of time, but the UART driver code predates the Dovetail
>>> >>>>>> port.
>>> >>>>>>  
>>> >>>>>
>>> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>>> >>>>> rtdm_clock_read. That was no problem in practice for their use cases so
>>> >>>>> far. One problem is that GPIO timestamps are now not comparable to others.
>>> >>>>>
>>> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>>> >>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>>> >>>>
>>> >>>> /*
>>> >>>>  * The Copperplate clock shall be monotonic unless the threading
>>> >>>>  * library has restrictions to support this over Mercury.
>>> >>>>  *
>>> >>>>  * In the normal case, this means that ongoing delays and timeouts
>>> >>>>  * won't be affected when the host system date is changed. In the
>>> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
>>> >>>>  * impacted by changes to the host system date.
>>> >>>>  *
>>> >>>>  * The implementation maintains a per-clock epoch value, so that
>>> >>>>  * different emulators can have different (virtual) system dates.
>>> >>>>  */
>>> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>>> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>>> >>>> #else
>>> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>>> >>>> #endif
>>> >>>>
>>> >>>> So no, only the POSIX API is using what the standard mandates, which is
>>> >>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>>> >>>> using a monotonic source as documented above.
>>> >>>
>>> >>> OK, but that changed in 3.x. At the time that RTDM API was originally
>>> >>> added and then promoted, it was the other way around. We became
>>> >>> inconsistent then.
>>> >>>
>>> >>>>
>>> >>>>> particular clock was the original idea of rtdm_clock_read (which
>>> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>>> >>>>> fixed - unless I'm wrong with that assumption.
>>> >>>>>
>>> >>>>
>>> >>>> We cannot assume the epoch is going to remain stable with
>>> >>>> rtdm_clock_read() the way it is implemented, which is quite a
>>> >>>> problem wrt the common use case. For timestamping, a majority of
>>> >>>> mainline drivers is using ktime_get() or a variant thereof which is
>>> >>>> based on the monotonic clock source, not the _real form. Why would the
>>> >>>> real-time I/O drivers be different?
>>> >>>
>>> >>> We have two cases here:
>>> >>>
>>> >>>  - I-pipe-based version where the realtime clock is under full
>>> >>>    application control -> no problem to use rtdm_clock_read
>>> >> 
>>> >> Well, there is still the issue that rtdm_clock_read() is not immune to
>>> >> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
>>> >> e.g. via a call to clock_settime(), which is the same problem as Linux
>>> >> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
>>> >> application.
>>> >
>>> > Yes, but that's about the application(s) breaking themselves. Nothing
>>> > new, not going to change when we only avoid clock_realtime for stamps
>>> > but still use POSIX services basing timers on that clock. The key point
>>> > is that I-pipe gave that into RT application hands (with all related
>>> > downsides), with Dovetail it's in system hands.
>>> >
>>> >> 
>>> >>>  - Dovetail where we share the realtime clock - with all its tuning -
>>> >>>    with Linux -> here we have a problem with rtdm_clock_read and should
>>> >>>    reconsider its usage (and promotion!)
>>> >>>
>>> >>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
>>> >> 
>>> >> Wait, you have downstream users already depending on GPIO returning
>>> >> monotonic timestamps, and this is a _stable_ release. So why not fix
>>> >> other drivers based on the fact that timestamping with rtdm_clock_read()
>>> >> is wrong instead? Same issue, right?
>>> >
>>> > GPIO was broken, but you are right that we may have users relying on
>>> > that breakage now. Obviously, we can change the other drivers for the
>>> > very same reasons: They are working like they work for more than 10
>>> > years now.
>>> >
>>> 
>>> Sorry, but in my view, rtdm_clock_read() was broken since day one. GPIO
>>> had to work around the breakage... :)
>>> 
>>> >> 
>>> >> So the best course of action to sort this out for 3.1.x may be to extend
>>> >> GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
>>> >> timestamps based on the Xenomai wallclock. Applications which do want to
>>> >> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
>>> >> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
>>> >> users of GPIO_RTIOC_TS_REAL, but that would be much better than
>>> >> introducing a sneaky change in behavior for the GPIO driver.
>>> >
>>> > Yeah, likely the way to go.
>>> >
>>> >> 
>>> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>>> >>>
>>> >> 
>>> >> The decision looks pretty simple for the common use case: when
>>> >> timestamps are needed for performance/delay measurements, we want to get
>>> >> them from a clock source which won't play funny games, warping back to
>>> >> the future.
>>> >
>>> > You only think of measurements. The other drivers used the stamping also
>>> > for real applications, as a means to calculate absolute clock-realtime
>>> > timeouts and wait for them to arrive. We will likely need to enhance
>>> > also the other driver APIs to select the desired clocksource, just like
>>> > for GPIO.
>>> 
>>> I'm referring to what is in the GPIO code, which is the problem at hand:
>>> that timestamping was designed since day #1 to provide a way to measure
>>> the scheduling latency.
>>> 
>>> I agree that the only way out is to enable all timestamp consumers to
>>> pick their base clock of choice (mono or wallclock).
>>> 
>>> -- 
>>> Philippe.
>>  
>>  Hi there,
>>
>> as we speak of breaking ABI, why not introduce some specific IOCTL to select which time source to use for each driver using it, then use a function pointer to call the right rtdm_time_get_whatever function?
>>
>
> That would mean adding a generic ioctl, and many changes all over the
> place (we would need no function pointer with distinct ioctl
> codes). Doable for sure, but likely too much for 3.1.x.

wrt the function pointer: I mean that a plain branch on some selector
is likely to perform better than a function pointer, given the spectre
mitigations around.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 16:45                                         ` Philippe Gerum
@ 2021-06-21 18:06                                           ` François Legal
  2021-06-22  7:38                                             ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-06-21 18:06 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

Le Lundi, Juin 21, 2021 18:45 CEST, Philippe Gerum <rpm@xenomai.org> a écrit:

>
> Philippe Gerum via Xenomai <xenomai@xenomai.org> writes:
>
> > François Legal <francois.legal@thom.fr.eu.org> writes:
> >
> >> Le Lundi, Juin 21, 2021 16:57 CEST, Philippe Gerum <rpm@xenomai.org> a écrit:
> >>
> >>>
> >>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>
> >>> > On 21.06.21 16:28, Philippe Gerum wrote:
> >>> >>
> >>> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>> >>
> >>> >>> On 21.06.21 15:54, Philippe Gerum wrote:
> >>> >>>>
> >>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>> >>>>
> >>> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
> >>> >>>>>>
> >>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>> >>>>>>
> >>> >>>>>>> On 18.06.21 20:41, François Legal wrote:
> >>> >>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>> >>>>>>>>
> >>> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
> >>> >>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>> >>>>>>>>>>
> >>> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
> >>> >>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit:
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
> >>> >>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> -----Original Message-----
> >>> >>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org>
> >>> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
> >>> >>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
> >>> >>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
> >>> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit:
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> -----Original Message-----
> >>> >>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
> >>> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
> >>> >>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
> >>> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> Hello,
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which, when received by the DUT, would trigger toggling of a GPIO on the DUT) and one GPIO.
> >>> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get about a 50ms delay between the network packet and the GPIO (so something between 51 and 52ms delay).
> >>> >>>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>> Are you toggling the GPIO and accessing the GPIO device through your RTDM device, e.g. on path /dev/rtdm/your_gpiocontroller/gpio*?
> >>> >>>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
> >>> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
> >>> >>>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>> When the issue happens, has the recording app side got the same number of network data packets and GPIO interrupts as the DUT has been toggling?
> >>> >>>>>>>>>>>>>>> I am asking this because I have met a GPIO hardware issue that would cause missing or invalid GPIO interrupts when the GPIO is connected between two different boards.
> >>> >>>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
> >>> >>>>>>>>>>>>>>
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
> >>> >>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
> >>> >>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
> >>> >>>>>>>>>>>>>
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> I just did it.
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>> I set gpio_pin_interrupt as the trigger, then ran my application and triggered the GPIO.
> >>> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive:
> >>> >>>>>>>>>>>>  cat frozen
> >>> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
> >>> >>>>>>>>>>>> ------------------------------------------------------------
> >>> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
> >>> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
> >>> >>>>>>>>>>>>
> >>> >>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
> >>> >>>>>>>>>>>>  |+-- Xenomai
> >>> >>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
> >>> >>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
> >>> >>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
> >>> >>>>>>>>>>>>  |||                      |        |
> >>> >>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
> >>> >>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
> >>> >>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
> >>> >>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
> >>> >>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
> >>> >>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
> >>> >>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
> >>> >>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
> >>> >>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
> >>> >>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
> >>> >>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
> >>> >>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
> >>> >>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> >>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> >>> >>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
> >>> >>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> >>> >>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
> >>> >>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
> >>> >>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>> >>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
> >>> >>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
> >>> >>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
> >>> >>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
> >>> >>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
> >>> >>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
> >>> >>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
> >>> >>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
> >>> >>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
> >>> >>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
> >>> >>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
> >>> >>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> >>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>> >>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
> >>> >>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
> >>> >>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
> >>> >>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> >>> >>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> >>> >>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
> >>> >>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
> >>> >>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> >>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
> >>> >>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
> >>> >>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
> >>> >>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
> >>> >>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
> >>> >>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
> >>> >>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
> >>> >>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
> >>> >>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
> >>> >>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
> >>> >>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
> >>> >>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
> >>> >>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> Comming back from idle.
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
> >>> >>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
> >>> >>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
> >>> >>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
> >>> >>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
> >>> >>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
> >>> >>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
> >>> >>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
> >>> >>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> >>> >>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> >>> >>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
> >>> >>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
> >>> >>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
> >>> >>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
> >>> >>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
> >>> >>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
> >>> >>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
> >>> >>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
> >>> >>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
> >>> >>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
> >>> >>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
> >>> >>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
> >>> >>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
> >>> >>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
> >>> >>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
> >>> >>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
> >>> >>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
> >>> >>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
> >>> >>>>>>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
> >>> >>>>>>>>>>> nothing suspicious.
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
> >>> >>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
> >>> >>>>>>>>>>>
> >>> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
> >>> >>>>>>>>>>> too high latency?
> >>> >>>>>>>>>>
> >>> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my Xenomai-powered realtime recorder application, given the same network stream and GPIO access, gives me ~50 to 52ms.
> >>> >>>>>>>>>>
> >>> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it is sent, right?).
> >>> >>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency (a number I expected for interrupt latency on that system) for the GPIO. So something is wrong on my system, but I don't know what!
> >>> >>>>>>>>>>
> >>> >>>>>>>>>
> >>> >>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
> >>> >>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
> >>> >>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
> >>> >>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
> >>> >>>>>>>>> zooming in when you know where to zoom.
> >>> >>>>>>>>>
> >>> >>>>>>>>
> >>> >>>>>>>> So I think I found the answer to my problem.
> >>> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO event are dated with rtdm_clock_read_monotonic ?
> >>> >>>>>>>>
> >>> >>>>>>>
> >>> >>>>>>> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
> >>> >>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on
> >>> >>>>>>> thoughts behind this deviation.
> >>> >>>>>>>
> >>> >>>>>>
> >>> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
> >>> >>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
> >>> >>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
> >>> >>>>>> (different CPUs might read different timestamps at the exact same time),
> >>> >>>>>> and is not in sync with Linux's wall clock either. For these reasons, I
> >>> >>>>>> don't see any practical way to synchronize multiple systems on the clock
> >>> >>>>>> underlying rtdm_clock_read().
> >>> >>>>>>
> >>> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
> >>> >>>>>> timestamping in this context, only adding the potential for even more
> >>> >>>>>> surprising results due to the mono->real-time offset changing under our
> >>> >>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
> >>> >>>>>> during runtime.
> >>> >>>>>>
> >>> >>>>>> I believe the UART driver is wrong here, it should return timestamps
> >>> >>>>>> based on the monotonic source, which best fits the common need: getting
> >>> >>>>>> timestamps from the local CPU for measuring delays between events
> >>> >>>>>> received by drivers and the actions taken by the applications which
> >>> >>>>>> consume them, immune from updates to the underlying clock
> >>> >>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
> >>> >>>>>> IRQ context, which is then consumed from a thread living on a different
> >>> >>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
> >>> >>>>>> the exact same way with rtdm_clock_read() anyway.
> >>> >>>>>>
> >>> >>>>>> The situation improves when running on top of Dovetail, since Xenomai
> >>> >>>>>> now refers to the common linux clocks (mono / real) instead of providing
> >>> >>>>>> its own idea of time, but the UART driver code predates the Dovetail
> >>> >>>>>> port.
> >>> >>>>>>
> >>> >>>>>
> >>> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
> >>> >>>>> rtdm_clock_read. That was no problem in practice for their use cases so
> >>> >>>>> far. One problem is that GPIO timestamps are now not comparable to others.
> >>> >>>>>
> >>> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
> >>> >>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
> >>> >>>>
> >>> >>>> /*
> >>> >>>>  * The Copperplate clock shall be monotonic unless the threading
> >>> >>>>  * library has restrictions to support this over Mercury.
> >>> >>>>  *
> >>> >>>>  * In the normal case, this means that ongoing delays and timeouts
> >>> >>>>  * won't be affected when the host system date is changed. In the
> >>> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
> >>> >>>>  * impacted by changes to the host system date.
> >>> >>>>  *
> >>> >>>>  * The implementation maintains a per-clock epoch value, so that
> >>> >>>>  * different emulators can have different (virtual) system dates.
> >>> >>>>  */
> >>> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
> >>> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
> >>> >>>> #else
> >>> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
> >>> >>>> #endif
> >>> >>>>
> >>> >>>> So no, only the POSIX API is using what the standard mandates, which is
> >>> >>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
> >>> >>>> using a monotonic source as documented above.
> >>> >>>
> >>> >>> OK, but that changed in 3.x. At the time that RTDM API was originally
> >>> >>> added and then promoted, it was the other way around. We became
> >>> >>> inconsistent then.
> >>> >>>
> >>> >>>>
> >>> >>>>> particular clock was the original idea of rtdm_clock_read (which
> >>> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
> >>> >>>>> fixed - unless I'm wrong with that assumption.
> >>> >>>>>
> >>> >>>>
> >>> >>>> We cannot assume the epoch is going to remain stable with
> >>> >>>> rtdm_clock_read() the way it is implemented, which is quite a
> >>> >>>> problem wrt the common use case. For timestamping, a majority of
> >>> >>>> mainline drivers is using ktime_get() or a variant thereof which is
> >>> >>>> based on the monotonic clock source, not the _real form. Why would the
> >>> >>>> real-time I/O drivers be different?
> >>> >>>
> >>> >>> We have two cases here:
> >>> >>>
> >>> >>>  - I-pipe-based version where the realtime clock is under full
> >>> >>>    application control -> no problem to use rtdm_clock_read
> >>> >>
> >>> >> Well, there is still the issue that rtdm_clock_read() is not immune to
> >>> >> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
> >>> >> e.g. via a call to clock_settime(), which is the same problem as Linux
> >>> >> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
> >>> >> application.
> >>> >
> >>> > Yes, but that's about the application(s) breaking themselves. Nothing
> >>> > new, not going to change when we only avoid clock_realtime for stamps
> >>> > but still use POSIX services basing timers on that clock. The key point
> >>> > is that I-pipe gave that into RT application hands (with all related
> >>> > downsides), with Dovetail it's in system hands.
> >>> >
> >>> >>
> >>> >>>  - Dovetail where we share the realtime clock - with all its tuning -
> >>> >>>    with Linux -> here we have a problem with rtdm_clock_read and should
> >>> >>>    reconsider its usage (and promotion!)
> >>> >>>
> >>> >>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
> >>> >>
> >>> >> Wait, you have downstream users already depending on GPIO returning
> >>> >> monotonic timestamps, and this is a _stable_ release. So why not fix
> >>> >> other drivers based on the fact that timestamping with rtdm_clock_read()
> >>> >> is wrong instead? Same issue, right?
> >>> >
> >>> > GPIO was broken, but you are right that we may have users relying on
> >>> > that breakage now. Obviously, we can change the other drivers for the
> >>> > very same reasons: they have been working the way they do for more than 10
> >>> > years now.
> >>> >
> >>>
> >>> Sorry, but in my view, rtdm_clock_read() was broken since day one. GPIO
> >>> had to work around the breakage... :)
> >>>
> >>> >>
> >>> >> So the best course of action to sort this out for 3.1.x may be to extend
> >>> >> GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
> >>> >> timestamps based on the Xenomai wallclock. Applications which do want to
> >>> >> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
> >>> >> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
> >>> >> users of GPIO_RTIOC_TS_REAL, but that would be much better than
> >>> >> introducing a sneaky change in behavior for the GPIO driver.
> >>> >
> >>> > Yeah, likely the way to go.
> >>> >
> >>> >>
> >>> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
> >>> >>>
> >>> >>
> >>> >> The decision looks pretty simple for the common use case: when
> >>> >> timestamps are needed for performance/delay measurements, we want to get
> >>> >> them from a clock source which won't play funny games, warping back to
> >>> >> the future.
> >>> >
> >>> > You only think of measurements. The other drivers used the timestamping also
> >>> > in real applications, i.e. to calculate absolute clock-realtime
> >>> > timeouts and wait for them to arrive. We will likely need to enhance
> >>> > also the other driver APIs to select the desired clocksource, just like
> >>> > for GPIO.
> >>>
> >>> I'm referring to what is in the GPIO code, which is the problem at hand:
> >>> that timestamping was designed since day #1 to provide a way to measure
> >>> the scheduling latency.
> >>>
> >>> I agree that the only way out is to enable all timestamp consumers to
> >>> pick their base clock of choice (mono or wallclock).
> >>>
> >>> --
> >>> Philippe.
> >>
> >>  Hi there,
> >>
> >> as we speak of breaking ABI, why not introduce some specific IOCTL to select which time source to use for each driver using it, then use a function pointer to call the right rtdm_time_get_whatever function?
> >>
> >
> > That would mean adding a generic ioctl, and many changes all over the
> > place (we would need no function pointer with distinct ioctl
> > codes). Doable for sure, but likely too much for 3.1.x.
>
> wrt function pointer: I mean that a plain simple branch on some selector
> is likely to perform better than a function pointer, with the spectre
> mitigations around.
>

Maybe I did not make myself clear enough. I was thinking of an ioctl that would allow users to select between rtdm_clock_read/rtdm_clock_read_whatever/... and not to provide a user function pointer.

François

> --
> Philippe.



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Large gpio interrupt latency
  2021-06-21 18:06                                           ` François Legal
@ 2021-06-22  7:38                                             ` Philippe Gerum
  2021-06-22  7:49                                               ` Julien Blanc
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-22  7:38 UTC (permalink / raw)
  To: François Legal; +Cc: xenomai


François Legal <francois.legal@thom.fr.eu.org> writes:

> Le Lundi, Juin 21, 2021 18:45 CEST, Philippe Gerum <rpm@xenomai.org> a écrit: 
>  
>> 
>> Philippe Gerum via Xenomai <xenomai@xenomai.org> writes:
>> 
>> > François Legal <francois.legal@thom.fr.eu.org> writes:
>> >
>> >> Le Lundi, Juin 21, 2021 16:57 CEST, Philippe Gerum <rpm@xenomai.org> a écrit: 
>> >>  
>> >>> 
>> >>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>> 
>> >>> > On 21.06.21 16:28, Philippe Gerum wrote:
>> >>> >> 
>> >>> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>> >> 
>> >>> >>> On 21.06.21 15:54, Philippe Gerum wrote:
>> >>> >>>>
>> >>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>> >>>>
>> >>> >>>>> On 21.06.21 11:39, Philippe Gerum wrote:
>> >>> >>>>>>
>> >>> >>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>> >>>>>>
>> >>> >>>>>>> On 18.06.21 20:41, François Legal wrote:
>> >>> >>>>>>>> Le Mercredi, Juin 16, 2021 17:10 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>> >>> >>>>>>>>  
>> >>> >>>>>>>>> On 16.06.21 15:51, François Legal wrote:
>> >>> >>>>>>>>>> Le Mercredi, Juin 16, 2021 15:38 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>> >>> >>>>>>>>>>  
>> >>> >>>>>>>>>>> On 16.06.21 15:29, François Legal wrote:
>> >>> >>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:40 CEST, Jan Kiszka <jan.kiszka@siemens.com> a écrit: 
>> >>> >>>>>>>>>>>>  
>> >>> >>>>>>>>>>>>> On 16.06.21 11:12, François Legal via Xenomai wrote:
>> >>> >>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 11:05 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>> >>> >>>>>>>>>>>>>>  
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> -----Original Message-----
>> >>> >>>>>>>>>>>>>>>> From: François Legal <devel@thom.fr.eu.org> 
>> >>> >>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 4:19 PM
>> >>> >>>>>>>>>>>>>>>> To: Chen, Hongzhan <hongzhan.chen@intel.com>
>> >>> >>>>>>>>>>>>>>>> Cc: xenomai@xenomai.org
>> >>> >>>>>>>>>>>>>>>> Subject: RE: Large gpio interrupt latency
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> Le Mercredi, Juin 16, 2021 10:10 CEST, "Chen, Hongzhan" <hongzhan.chen@intel.com> a écrit: 
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> -----Original Message-----
>> >>> >>>>>>>>>>>>>>>>>> From: Xenomai <xenomai-bounces@xenomai.org> On Behalf Of François Legal via Xenomai
>> >>> >>>>>>>>>>>>>>>>>> Sent: Wednesday, June 16, 2021 3:16 PM
>> >>> >>>>>>>>>>>>>>>>>> To: xenomai@xenomai.org
>> >>> >>>>>>>>>>>>>>>>>> Subject: Large gpio interrupt latency
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> Hello,
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> working on a realtime data recorder (for which I submitted a patch to add timestamp retrieval for net packets), I experience a strange latency problem with taking GPIO interrupts.
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> So basically my app monitors network packets (which, when received by the DUT, would trigger toggling of a GPIO on the DUT) and one GPIO.
>> >>> >>>>>>>>>>>>>>>>>> We know (from some other reference instrument) that the DUT would toggle the GPIO 1-2ms after receiving a specific network packet.
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>> My recorder app relies on the timestamping of events done in interrupt service routines for GPIOs and the network interface. By checking the timestamps returned by the application, I get a delay of about 50ms between the network packet and the GPIO (so something between 51 and 52ms).
>> >>> >>>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>> Are you toggling gpio and access gpio device  through your rtdm device like on path /dev/rtdm/your_gpiocontroller/gpio*?
>> >>> >>>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> Maybe my setup was not very clear.
>> >>> >>>>>>>>>>>>>>>> I've got a DUT that receives network data, and toggles one of its GPIOs depending on the network data.
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> I've got another device running my realtime recording app, that receives the same network data as the DUT, and that has one of its GPIO connected to the DUT GPIO.
>> >>> >>>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>>> On the recording app, I use the RTDM device to open, ioctl (enable interrupt + timestamp), then select & read the GPIO value.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>> When the issue happens, does the recording app side get the same number of network data packets and GPIO interrupts as the DUT has been toggling?
>> >>> >>>>>>>>>>>>>>> I am asking because I have met a GPIO hardware issue that would cause missing or invalid GPIO interrupts when a GPIO is connected between two different boards.
>> >>> >>>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> AFAICT, I'm not missing GPIO edges/interrupts in the app. Regarding network, the DUT and the recording device are connected to the same switch with the same port config.
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>> Moreover, I tried changing the interrupt polarity of the recording device (switched from rising edge to falling edge), and the result is 1-2ms + ~30ms (the width of the pulse of the DUT) + 50ms latency
>> >>> >>>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>> Can you (or did you already) break down the latencies on the DUT via
>> >>> >>>>>>>>>>>>> tracing? Is it really the GPIO output path? What is happening in it,
>> >>> >>>>>>>>>>>>> starting with RT/Linux task switches, mode transitions etc.?
>> >>> >>>>>>>>>>>>>
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> I just did it.
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>> I did put gpio_pin_interrupt as trigger, then ran my application, triggered the GPIO.
>> >>> >>>>>>>>>>>> This is what I get in frozen, but I'm not quite sure how to read it and what conclusion I could derive:
>> >>> >>>>>>>>>>>>  cat frozen
>> >>> >>>>>>>>>>>> I-pipe frozen back-tracing service on 4.4.227+/ipipe release #10
>> >>> >>>>>>>>>>>> ------------------------------------------------------------
>> >>> >>>>>>>>>>>> CPU: 0, Freeze: 218343820846 cycles, Trace Points: 100 (+10)
>> >>> >>>>>>>>>>>> Calibrated minimum trace-point overhead: 0.461 us
>> >>> >>>>>>>>>>>>
>> >>> >>>>>>>>>>>>  +----- Hard IRQs ('|': locked)
>> >>> >>>>>>>>>>>>  |+-- Xenomai
>> >>> >>>>>>>>>>>>  ||+- Linux ('*': domain stalled, '+': current, '#': current+stalled)
>> >>> >>>>>>>>>>>>  |||                      +---------- Delay flag ('+': > 1 us, '!': > 10 us)
>> >>> >>>>>>>>>>>>  |||                      |        +- NMI noise ('N')
>> >>> >>>>>>>>>>>>  |||                      |        |
>> >>> >>>>>>>>>>>>           Type    User Val.   Time    Delay  Function (Parent)
>> >>> >>>>>>>>>>>> :  +func               -6087+   1.302  load_balance+0x14 (run_rebalance_domains+0x7e8)
>> >>> >>>>>>>>>>>> :  +func               -6085      0.826  idle_cpu+0x10 (load_balance+0x180)
>> >>> >>>>>>>>>>>> :  +func               -6084      0.892  find_busiest_group+0x14 (load_balance+0x1a4)
>> >>> >>>>>>>>>>>> :  +func               -6084      0.757  update_group_capacity+0x14 (find_busiest_group+0x128)
>> >>> >>>>>>>>>>>> :  +func               -6083+   1.452  __msecs_to_jiffies+0x10 (update_group_capacity+0x30)
>> >>> >>>>>>>>>>>> :  +func               -6081+   1.535  idle_cpu+0x10 (find_busiest_group+0x1e4)
>> >>> >>>>>>>>>>>> :  +func               -6080+   1.410  idle_cpu+0x10 (find_busiest_group+0x1e4)
>> >>> >>>>>>>>>>>> :  +func               -6078      0.967  __msecs_to_jiffies+0x10 (run_rebalance_domains+0x810)
>> >>> >>>>>>>>>>>> :  +func               -6077      0.886  __rcu_read_unlock+0x10 (run_rebalance_domains+0x648)
>> >>> >>>>>>>>>>>> :  +func               -6077      0.820  rcu_bh_qs+0x10 (__do_softirq+0x1b0)
>> >>> >>>>>>>>>>>> :  +func               -6076      0.742  ipipe_stall_root+0x10 (__do_softirq+0x1b4)
>> >>> >>>>>>>>>>>> :  +func               -6075      0.766  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6074      0.934  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| +end     0x80000001 -6073      0.811  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6072      0.895  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6072      0.847  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>> >>> >>>>>>>>>>>> :  #func               -6071      0.814  __local_bh_enable+0x10 (__do_softirq+0x214)
>> >>> >>>>>>>>>>>> :  #func               -6070      0.760  ipipe_test_root+0x10 (__local_bh_enable+0x1c)
>> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6069      0.907  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6068      0.898  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>> >>> >>>>>>>>>>>> :  #func               -6067      0.811  rcu_irq_exit+0x10 (irq_exit+0x84)
>> >>> >>>>>>>>>>>> :  #func               -6067      0.781  ipipe_test_and_stall_root+0x10 (rcu_irq_exit+0x18)
>> >>> >>>>>>>>>>>> :  #func               -6066      0.799  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6065+   1.041  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6064      0.769  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6063      0.895  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6062      0.841  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>> >>>>>>>>>>>> :  #func               -6061+   1.197  rcu_eqs_enter_common.constprop.21+0x10 (rcu_irq_exit+0x80)
>> >>> >>>>>>>>>>>> :| #begin   0x80000000 -6060+   1.413  ipipe_trace_begin+0x24 (__ipipe_do_sync_stage+0x2b8)
>> >>> >>>>>>>>>>>> :| +end     0x00000012 -6059+   1.044  ipipe_trace_end+0x24 (__ipipe_grab_irq+0x84)
>> >>> >>>>>>>>>>>> :| +func               -6058      0.988  __ipipe_check_root_interruptible+0x10 (__irq_svc+0x70)
>> >>> >>>>>>>>>>>> :| +func               -6057      0.976  ipipe_test_root+0x10 (__ipipe_check_root_interruptible+0x68)
>> >>> >>>>>>>>>>>> :| +func               -6056      0.829  __ipipe_bugon_irqs_enabled+0x10 (__ipipe_fast_svc_irq_exit+0x4)
>> >>> >>>>>>>>>>>> :| +end     0x90000000 -6055      0.913  __ipipe_fast_svc_irq_exit+0x20 (ipipe_unstall_root+0x88)
>> >>> >>>>>>>>>>>> :  +func               -6054      0.781  ipipe_test_root+0x10 (cpu_startup_entry+0x12c)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6053      0.868  ipipe_trace_begin+0x24 (ipipe_test_root+0x74)
>> >>> >>>>>>>>>>>> :| +end     0x80000001 -6052      0.781  ipipe_trace_end+0x24 (ipipe_test_root+0xb8)
>> >>> >>>>>>>>>>>> :  +func               -6052      0.748  rcu_idle_exit+0x10 (cpu_startup_entry+0x138)
>> >>> >>>>>>>>>>>> :  +func               -6051      0.739  ipipe_test_and_stall_root+0x10 (rcu_idle_exit+0x18)
>> >>> >>>>>>>>>>>> :  +func               -6050      0.775  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6049+   1.011  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| +end     0x80000001 -6048      0.742  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6048      0.940  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6047      0.790  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>> >>>>>>>>>>>> :  #func               -6046      0.859  rcu_eqs_exit_common.constprop.19+0x10 (rcu_idle_exit+0x8c)
>> >>> >>>>>>>>>>>> :  #func               -6045      0.772  ipipe_unstall_root+0x10 (rcu_idle_exit+0x78)
>> >>> >>>>>>>>>>>> :| #begin   0x80000000 -6044      0.814  ipipe_trace_begin+0x24 (ipipe_unstall_root+0x98)
>> >>> >>>>>>>>>>>> :| #func               -6043+   1.077  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>> >>> >>>>>>>>>>>> :| +end     0x80000000 -6042      0.835  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>> >>> >>>>>>>>>>>> :  +func               -6042      0.922  arch_cpu_idle_exit+0x10 (cpu_startup_entry+0xfc)
>> >>> >>>>>>>>>>>> :  +func               -6041      0.793  ipipe_stall_root+0x10 (cpu_startup_entry+0xc4)
>> >>> >>>>>>>>>>>> :  +func               -6040      0.724  ipipe_root_only+0x10 (ipipe_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6039+   1.098  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| +end     0x80000001 -6038      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| +begin   0x80000001 -6037      0.841  ipipe_trace_begin+0x24 (ipipe_stall_root+0x78)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6036      0.763  ipipe_trace_end+0x24 (ipipe_stall_root+0xb8)
>> >>> >>>>>>>>>>>> :  #func               -6036      0.838  arch_cpu_idle_enter+0x10 (cpu_startup_entry+0xc8)
>> >>> >>>>>>>>>>>> :  #func               -6035      0.745  arm_heavy_mb+0x10 (arch_cpu_idle_enter+0x1c)
>> >>> >>>>>>>>>>>> :  #func               -6034      0.916  l2c210_sync+0x10 (arm_heavy_mb+0x2c)
>> >>> >>>>>>>>>>>> :  #func               -6033+   1.062  tick_check_broadcast_expired+0x10 (cpu_startup_entry+0xd8)
>> >>> >>>>>>>>>>>> :  #func               -6032      0.787  rcu_idle_enter+0x10 (cpu_startup_entry+0x124)
>> >>> >>>>>>>>>>>> :  #func               -6031      0.745  ipipe_test_and_stall_root+0x10 (rcu_idle_enter+0x18)
>> >>> >>>>>>>>>>>> :  #func               -6031      0.751  ipipe_root_only+0x10 (ipipe_test_and_stall_root+0x18)
>> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6030      0.991  ipipe_trace_begin+0x24 (ipipe_root_only+0xb8)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6029      0.772  ipipe_trace_end+0x24 (ipipe_root_only+0xfc)
>> >>> >>>>>>>>>>>> :| #begin   0x80000001 -6028      0.892  ipipe_trace_begin+0x24 (ipipe_test_and_stall_root+0x80)
>> >>> >>>>>>>>>>>> :| #end     0x80000001 -6027      0.847  ipipe_trace_end+0x24 (ipipe_test_and_stall_root+0xc4)
>> >>> >>>>>>>>>>>> :  #func               -6026      0.922  rcu_eqs_enter_common.constprop.21+0x10 (rcu_idle_enter+0x90)
>> >>> >>>>>>>>>>>> :  #func               -6025      0.862  default_idle_call+0x10 (cpu_startup_entry+0x128)
>> >>> >>>>>>>>>>>> :  #func               -6024      0.877  arch_cpu_idle+0x10 (default_idle_call+0x38)
>> >>> >>>>>>>>>>>> :| #begin   0x80000000 -6024! 5992.167  ipipe_trace_begin+0x24 (arch_cpu_idle+0xb8)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Here your system (or this core) went idle, waiting for the next event.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> :| +func                 -31      0.760  ipipe_unstall_root+0x10 (arch_cpu_idle+0x30)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Coming back from idle.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>> :| +func                 -31+   1.116  ipipe_root_only+0x10 (ipipe_unstall_root+0x24)
>> >>> >>>>>>>>>>>> :| +end     0x80000000   -30      0.931  ipipe_trace_end+0x24 (ipipe_unstall_root+0x84)
>> >>> >>>>>>>>>>>> :| +begin   0x90000000   -29      0.844  __irq_svc+0x58 (ipipe_unstall_root+0x88)
>> >>> >>>>>>>>>>>> :| +func                 -28      0.925  gic_handle_irq+0x10 (__irq_svc+0x6c)
>> >>> >>>>>>>>>>>> :| +func                 -27      0.904  irq_find_mapping+0x10 (gic_handle_irq+0x50)
>> >>> >>>>>>>>>>>> :| +func                 -26      0.940  __ipipe_grab_irq+0x10 (gic_handle_irq+0x58)
>> >>> >>>>>>>>>>>> :| +begin   0x000000c9   -25      0.826  ipipe_trace_begin+0x24 (__ipipe_grab_irq+0x58)
>> >>> >>>>>>>>>>>> :| +func                 -24      0.814  __ipipe_dispatch_irq+0x10 (__ipipe_grab_irq+0x7c)
>> >>> >>>>>>>>>>>> :| +func                 -23+   1.275  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>> >>> >>>>>>>>>>>> :| +func                 -22+   1.679  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>> >>> >>>>>>>>>>>> :| +func                 -20+   2.092  ucc_gpio_irqhandler+0x14 (__ipipe_dispatch_irq+0x1fc)
>> >>> >>>>>>>>>>>> :| +func                 -18+   1.413  irq_find_mapping+0x10 (ucc_gpio_irqhandler+0x84)
>> >>> >>>>>>>>>>>> :| +begin   0x000000e3   -17      0.757  ipipe_trace_begin+0x24 (ucc_gpio_irqhandler+0x8c)
>> >>> >>>>>>>>>>>> :| +func                 -16      0.778  __ipipe_dispatch_irq+0x10 (ucc_gpio_irqhandler+0x98)
>> >>> >>>>>>>>>>>> :| +func                 -15+   1.023  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x184)
>> >>> >>>>>>>>>>>> :| +func                 -14+   1.494  irq_to_desc+0x10 (__ipipe_dispatch_irq+0x198)
>> >>> >>>>>>>>>>>> :| +func                 -13+   1.014  __ipipe_ack_level_irq+0x10 (__ipipe_dispatch_irq+0x1fc)
>> >>> >>>>>>>>>>>> :| +func                 -12      0.763  ucc_gpio_irq_mask+0x10 (__ipipe_ack_level_irq+0x54)
>> >>> >>>>>>>>>>>> :| +func                 -11+   1.248  __ipipe_spin_lock_irqsave+0x10 (ucc_gpio_irq_mask+0x2c)
>> >>> >>>>>>>>>>>> :| #func                 -10+   1.619  __ipipe_spin_unlock_irqrestore+0x10 (ucc_gpio_irq_mask+0x4c)
>> >>> >>>>>>>>>>>> :| +func                  -8+   1.239  __ipipe_set_irq_pending+0x10 (__ipipe_dispatch_irq+0x3bc)
>> >>> >>>>>>>>>>>> :| +end     0x000000e3    -7      0.994  ipipe_trace_end+0x24 (ucc_gpio_irqhandler+0xa0)
>> >>> >>>>>>>>>>>> :| +func                  -6+   1.029  gic_eoi_irq+0x10 (ucc_gpio_irqhandler+0xd4)
>> >>> >>>>>>>>>>>> :| +func                  -5+   1.353  __ipipe_do_sync_pipeline+0x14 (__ipipe_dispatch_irq+0x17c)
>> >>> >>>>>>>>>>>> :|+ func                  -4+   1.449  __ipipe_do_sync_stage+0x14 (__ipipe_do_sync_pipeline+0xf0)
>> >>> >>>>>>>>>>>> :|# func                  -2+   1.191  xnintr_irq_handler+0x14 (__ipipe_do_sync_stage+0x200)
>> >>> >>>>>>>>>>>> :|# func                  -1+   1.455  ___xnlock_get+0x10 (xnintr_irq_handler+0xc0)
>> >>> >>>>>>>>>>>> <|# func                   0      1.107  gpio_pin_interrupt+0x10 (xnintr_irq_handler+0xf4)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> And here we start to process that GPIO interrupt in the RTDM handler,
>> >>> >>>>>>>>>>> roughly after 30 µs (which are also due to tracing overhead). So far
>> >>> >>>>>>>>>>> nothing suspicious.
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>>>  |# func                   1      1.236  xnclock_core_read_monotonic+0x10 (gpio_pin_interrupt+0x1c)
>> >>> >>>>>>>>>>>>  |# func                   2      1.224  rtdm_event_signal+0x10 (gpio_pin_interrupt+0x2c)
>> >>> >>>>>>>>>>>
>> >>> >>>>>>>>>>> Here some likely waiting RT task is signalled. Does that one notice a
>> >>> >>>>>>>>>>> too high latency?
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> I may have been wrong speaking of "latency". The problem I'm trying to fix is understanding why my reference instrument indicates a delay from network packet to GPIO on the DUT of about 1 to 2ms, whereas my xenomai powered realtime recorder application, given the same network stream and gpio access, gives me ~50 to 52 ms.
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>> I first thought there was something wrong/delayed on the GPIO interrupt side (the network packet could not have been received before it was sent, right?).
>> >>> >>>>>>>>>> The trace seems to demonstrate ~30µs of interrupt latency (a number I expected for interrupt latency on that system) for the GPIO. So something is wrong on my system, but I don't know what!
>> >>> >>>>>>>>>>
>> >>> >>>>>>>>>
>> >>> >>>>>>>>> Try to trace events, not functions, using regular ftrace ("trace-cmd
>> >>> >>>>>>>>> record -e cobalt* -e sched* -e irq* -e signal*" e.g.). Check when the
>> >>> >>>>>>>>> NIC gets the interrupt and compare that to when the GPIO event is
>> >>> >>>>>>>>> triggered (or whatever is trigger and reaction). Function tracing is for
>> >>> >>>>>>>>> zooming in when you know where to zoom.
>> >>> >>>>>>>>>
>> >>> >>>>>>>>
>> >>> >>>>>>>> So I think I found the answer to my problem.
>> >>> >>>>>>>> Is there any reason why NET events (in NIC irq functions) are dated with rtdm_clock_read, whereas GPIO events are dated with rtdm_clock_read_monotonic?
>> >>> >>>>>>>>
>> >>> >>>>>>>
>> >>> >>>>>>> Inconsistency of the GPIO drivers, UART drivers use rtdm_clock_read for
>> >>> >>>>>>> user-exposed timestamping as well. Maybe Philippe can comment on
>> >>> >>>>>>> thoughts behind this deviation.
>> >>> >>>>>>>
>> >>> >>>>>>
>> >>> >>>>>> When running over the I-pipe, rtdm_clock_read() is based on Xenomai's
>> >>> >>>>>> idea of real time, which is the Cobalt monotonic clock plus an arbitrary
>> >>> >>>>>> offset. For this reason, rtdm_clock_read() is not SMP-consistent
>> >>> >>>>>> (different CPUs might read different timestamps at the exact same time),
>> >>> >>>>>> and is not in sync with Linux's wall clock either. For these reasons, I
>> >>> >>>>>> don't see any practical way to synchronize multiple systems on the clock
>> >>> >>>>>> underlying rtdm_clock_read().
>> >>> >>>>>>
>> >>> >>>>>> Therefore, there is no upside in using rtdm_clock_read() for
>> >>> >>>>>> timestamping in this context, only adding the potential for even more
>> >>> >>>>>> surprising results due to the mono->real-time offset changing under our
>> >>> >>>>>> feet, since the epoch of the Xenomai real-time clock can be (re)set
>> >>> >>>>>> during runtime.
>> >>> >>>>>>
>> >>> >>>>>> I believe the UART driver is wrong here, it should return timestamps
>> >>> >>>>>> based on the monotonic source, which best fits the common need: getting
>> >>> >>>>>> timestamps from the local CPU for measuring delays between events
>> >>> >>>>>> received by drivers and the actions taken by the applications which
>> >>> >>>>>> consume them, immune from updates to the underlying clock
>> >>> >>>>>> epoch. Granted, there might be a catch when a timestamp is taken from
>> >>> >>>>>> IRQ context, which is then consumed from a thread living on a different
>> >>> >>>>>> CPU, if per-CPU clocks are not/badly synchronized. But that would happen
>> >>> >>>>>> the exact same way with rtdm_clock_read() anyway.
>> >>> >>>>>>
>> >>> >>>>>> The situation improves when running on top of Dovetail, since Xenomai
>> >>> >>>>>> now refers to the common linux clocks (mono / real) instead of providing
>> >>> >>>>>> its own idea of time, but the UART driver code predates the Dovetail
>> >>> >>>>>> port.
>> >>> >>>>>>  
>> >>> >>>>>
>> >>> >>>>> Well, it's not just UART. All hardware drivers - except for GPIO - use
>> >>> >>>>> rtdm_clock_read. That was no problem in practice for their use cases so
>> >>> >>>>> far. One problem is that GPIO timestamps are now not comparable to others.
>> >>> >>>>>
>> >>> >>>>> But IIRC, most Xenomai APIs using absolute timestamps are based on
>> >>> >>>>> Xenomai's real-time clock. Therefore, providing timestamps for that
>> >>> >>>>
>> >>> >>>> /*
>> >>> >>>>  * The Copperplate clock shall be monotonic unless the threading
>> >>> >>>>  * library has restrictions to support this over Mercury.
>> >>> >>>>  *
>> >>> >>>>  * In the normal case, this means that ongoing delays and timeouts
>> >>> >>>>  * won't be affected when the host system date is changed. In the
>> >>> >>>>  * restricted case by contrast, ongoing delays and timeouts may be
>> >>> >>>>  * impacted by changes to the host system date.
>> >>> >>>>  *
>> >>> >>>>  * The implementation maintains a per-clock epoch value, so that
>> >>> >>>>  * different emulators can have different (virtual) system dates.
>> >>> >>>>  */
>> >>> >>>> #ifdef CONFIG_XENO_COPPERPLATE_CLOCK_RESTRICTED
>> >>> >>>> #define CLOCK_COPPERPLATE  CLOCK_REALTIME
>> >>> >>>> #else
>> >>> >>>> #define CLOCK_COPPERPLATE  CLOCK_MONOTONIC
>> >>> >>>> #endif
>> >>> >>>>
>> >>> >>>> So no, only the POSIX API is using what the standard mandates, which is
>> >>> >>>> CLOCK_REALTIME. All other APIs are based on copperplate, and they are
>> >>> >>>> using a monotonic source as documented above.
>> >>> >>>
>> >>> >>> OK, but that changed in 3.x. At the time that RTDM API was originally
>> >>> >>> added and then promoted, it was the other way around. We became
>> >>> >>> inconsistent then.
>> >>> >>>
>> >>> >>>>
>> >>> >>>>> particular clock was the original idea of rtdm_clock_read (which
>> >>> >>>>> predates rtdm_clock_read_monotonic). GPIO breaks that and should be
>> >>> >>>>> fixed - unless I'm wrong with that assumption.
>> >>> >>>>>
>> >>> >>>>
>> >>> >>>> We cannot assume the epoch is going to remain stable with
>> >>> >>>> rtdm_clock_read() the way it is implemented, which is quite of a
>> >>> >>>> problem wrt the common use case. For timestamping, a majority of
>> >>> >>>> mainline drivers is using ktime_get() or a variant thereof which is
>> >>> >>>> based on the monotonic clock source, not the _real form. Why would the
>> >>> >>>> real-time I/O drivers be different?
>> >>> >>>
>> >>> >>> We have two cases here:
>> >>> >>>
>> >>> >>>  - I-pipe-based version where the realtime clock is under full
>> >>> >>>    application control -> no problem to use rtdm_clock_read
>> >>> >> 
>> >>> >> Well, there is still the issue that rtdm_clock_read() is not immune to
>> >>> >> some part of userland changing the CLOCK_REALTIME epoch Xenomai-wise
>> >>> >> e.g. via a call to clock_settime(), which is the same problem as Linux
>> >>> >> changing the epoch of CLOCK_REALTIME over Dovetail. This would break the
>> >>> >> application.
>> >>> >
>> >>> > Yes, but that's about the application(s) breaking themselves. Nothing
>> >>> > new, not going to change when we only avoid clock_realtime for stamps
>> >>> > but still use POSIX services basing timers on that clock. The key point
>> >>> > is that I-pipe gave that into RT application hands (with all related
>> >>> > downsides), with Dovetail it's in system hands.
>> >>> >
>> >>> >> 
>> >>> >>>  - Dovetail where we share the realtime clock - with all its tuning -
>> >>> >>>    with Linux -> here we have a problem with rtdm_clock_read and should
>> >>> >>>    reconsider its usage (and promotion!)
>> >>> >>>
>> >>> >>> For stable 3.1, the proper fix is with GPIO going to rtdm_clock_read.
>> >>> >> 
>> >>> >> Wait, you have downstream users already depending on GPIO returning
>> >>> >> monotonic timestamps, and this is a _stable_ release. So why not fixing
>> >>> >> other drivers based on the fact that timestamping with rtdm_clock_read()
>> >>> >> is wrong instead? Same issue, right?
>> >>> >
>> >>> > GPIO was broken, but you are right that we may have users relying on
>> >>> > that breakage now. Obviously, we can change the other drivers for the
>> >>> > very same reasons: They are working like they work for more than 10
>> >>> > years now.
>> >>> >
>> >>> 
>> >>> Sorry, but in my view, rtdm_clock_read() was broken since day one. GPIO
>> >>> had to work around the breakage... :)
>> >>> 
>> >>> >> 
>> >>> >> So the best course of action to sort this out for 3.1.x may be to extend
>> >>> >> GPIO_RTIOC_TS with say, GPIO_RTIOC_TS_REAL, which would log and return
>> >>> >> timestamps based on the Xenomai wallclock. Applications which do want to
>> >>> >> align on that clock would simply have to issue GPIO_RTIOC_TS_REAL
>> >>> >> instead of GPIO_RTIOC_TS. This would break backward ABI compat only for
>> >>> >> users of GPIO_RTIOC_TS_REAL, but that would be much better than
>> >>> >> introducing a sneaky change in behavior for the GPIO driver.
>> >>> >
>> >>> > Yeah, likely the way to go.
>> >>> >
>> >>> >> 
>> >>> >>> For 3.2, I'm not sure yet what to do with rtdm_clock_read.
>> >>> >>>
>> >>> >> 
>> >>> >> The decision looks pretty simple for the common use case: when
>> >>> >> timestamps are needed for performance/delay measurements, we want to get
>> >>> >> them from a clock source which won't play funny games, warping back to
>> >>> >> the future.
>> >>> >
>> >>> > You only think of measurements. The other drivers used the stamping also
>> >>> > for real applications, means to calculate absolute clock-realtime
>> >>> > timeouts and wait for them to arrive. We will likely need to enhance
>> >>> > also the other driver APIs to select the desired clocksource, just like
>> >>> > for GPIO.
>> >>> 
>> >>> I'm referring to what is in the GPIO code, which is the problem at hand:
>> >>> that timestamping was designed since day #1 to provide a way to measure
>> >>> the scheduling latency.
>> >>> 
>> >>> I agree that the only way out is to enable all timestamp consumers to
>> >>> pick their base clock of choice (mono or wallclock).
>> >>> 
>> >>> -- 
>> >>> Philippe.
>> >>  
>> >>  Hi there,
>> >>
>> >> as we speak of breaking the ABI, why not introduce a specific ioctl to select which time source to use for each driver, then use a function pointer to call the right rtdm_time_get_whatever function?
>> >>
>> >
>> > That would mean adding a generic ioctl, and many changes all over the
>> > place (we would need no function pointer with distinct ioctl
>> > codes). Doable for sure, but likely too much for 3.1.x though.
>> 
>> wrt function pointer: I mean that a plain simple branch on some selector
>> is likely to perform better than a function pointer, with the spectre
>> mitigations around.
>> 
>
> Maybe I did not make myself clear enough. I was thinking of an ioctl that would allow users to select between rtdm_clock_read/rtdm_clock_read_whatever/... and not to provide a user function pointer.
>

I got that. I was referring to the fact that using a function pointer
(in kernel space) to redirect the call to the proper clock reading
routine may not be as efficient as using a plain simple branch when only
a few basic conditions need to be tested, due to the counter-measures
the kernel may have to implement to sanitize indirect calls to
circumvent attacks (e.g. retpolines/x86, branch target identification
for arm64).

With this in mind, assuming that we have previously sanitized the clock
identifier, doing this:

#define get_timestamp(__clock) \
({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() : rtdm_clock_read(); })

may end up being faster than:

xnticks_t (*__get_timestamp)(clockid_t clock);
#define get_timestamp(__clock)	__get_timestamp(__clock)

-- 
Philippe.



* Re: Large gpio interrupt latency
  2021-06-22  7:38                                             ` Philippe Gerum
@ 2021-06-22  7:49                                               ` Julien Blanc
  2021-06-22  8:22                                                 ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Julien Blanc @ 2021-06-22  7:49 UTC (permalink / raw)
  To: rpm, francois.legal; +Cc: xenomai

On Tuesday, June 22, 2021 at 09:38 +0200, Philippe Gerum via Xenomai
wrote:
> 
> With this in mind, assuming that we have previously sanitized the
> clock
> identifier, doing this:
> 
> #define get_timestamp(__clock) \
> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
> rtdm_clock_read(); })
> 
> may end up being faster than:
> 
> xnticks_t (*__get_timestamp)(clockid_t clock);
> #define get_timestamp(__clock)	__get_timestamp(__clock)

Is a runtime switch really necessary? Since relying on the realtime
clock is inherently broken, my understanding is that it should be kept
for compatibility purposes only.

A compile-time define would suffice then, and that could also be used to
warn users that relying on the realtime clock is deprecated and will
disappear in future versions (if that's the plan).

Regards,

Julien


* Re: Large gpio interrupt latency
  2021-06-22  7:49                                               ` Julien Blanc
@ 2021-06-22  8:22                                                 ` Jan Kiszka
  2021-06-22  8:37                                                   ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-22  8:22 UTC (permalink / raw)
  To: Julien Blanc, rpm, francois.legal; +Cc: xenomai

On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
> écrit :
>>
>> With this in mind, assuming that we have previously sanitized the
>> clock
>> identifier, doing this:
>>
>> #define get_timestamp(__clock) \
>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>> rtdm_clock_read(); })
>>
>> may end up being faster than:
>>
>> xnticks_t (*__get_timestamp)(clockid_t clock);
>> #define get_timestamp(__clock)	__get_timestamp(__clock)
> 
> Is really a runtime switch necessary ? Since relying on the realtime
> clock is inherently broken, my understanding is that it should be kept
> as compatibility purpose only.

Again: The real-time clock is not a broken clock per se. It is the basis
of many services (POSIX...) and - if managed properly - it is as sound a
clock to build upon as any. If you need absolute timestamps to calculate
absolute timeouts (like users of the existing code do), this is the
clock to go with, also in future versions. Also if you want to use
PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
point, monotonic timestamps will lose relevance in practice.

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Large gpio interrupt latency
  2021-06-22  8:22                                                 ` Jan Kiszka
@ 2021-06-22  8:37                                                   ` Philippe Gerum
  2021-06-22  9:14                                                     ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-22  8:37 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Julien Blanc, francois.legal, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
>> écrit :
>>>
>>> With this in mind, assuming that we have previously sanitized the
>>> clock
>>> identifier, doing this:
>>>
>>> #define get_timestamp(__clock) \
>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>>> rtdm_clock_read(); })
>>>
>>> may end up being faster than:
>>>
>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>> 
>> Is really a runtime switch necessary ? Since relying on the realtime
>> clock is inherently broken, my understanding is that it should be kept
>> as compatibility purpose only.
>
> Again: The real-time clock is not a broken clock per se. It is the basis
> of many services (POSIX...) and - if managed properly - it is as sound
> clock to build upon. If you need absolute timestamps to calculate
> absolute timeouts (like users of the existing code do), this is the
> clock to go, also in future versions.

Definitely correct, for timeout specs.

> Also if you want to use
> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
> point, monotonic timestamps will lose relevance in practice.
>

We are not there yet. So, let's all agree that we need both clock bases,
and a flexible way to select the current one.

-- 
Philippe.



* Re: Large gpio interrupt latency
  2021-06-22  8:37                                                   ` Philippe Gerum
@ 2021-06-22  9:14                                                     ` Jan Kiszka
  2021-06-22  9:31                                                       ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-22  9:14 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Julien Blanc, francois.legal, xenomai

On 22.06.21 10:37, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
>>> écrit :
>>>>
>>>> With this in mind, assuming that we have previously sanitized the
>>>> clock
>>>> identifier, doing this:
>>>>
>>>> #define get_timestamp(__clock) \
>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>>>> rtdm_clock_read(); })
>>>>
>>>> may end up being faster than:
>>>>
>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>>>
>>> Is really a runtime switch necessary ? Since relying on the realtime
>>> clock is inherently broken, my understanding is that it should be kept
>>> as compatibility purpose only.
>>
>> Again: The real-time clock is not a broken clock per se. It is the basis
>> of many services (POSIX...) and - if managed properly - it is as sound
>> clock to build upon. If you need absolute timestamps to calculate
>> absolute timeouts (like users of the existing code do), this is the
>> clock to go, also in future versions.
> 
> Definitely correct, for timeout specs.
> 
>> Also if you want to use
>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
>> point, monotonic timestamps will lose relevance in practice.
>>
> 
> We are not there yet. So, let's all agree than we need both clock bases,
> and a flexible way to select the current one.
> 

What's still missing with Dovetail and a Linux-operated PTP sync for the
main clocksource? Hardening of software-based sync paths? Where it's
hw-based, that should already be fine (latest Intel SOCs).

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Large gpio interrupt latency
  2021-06-22  9:14                                                     ` Jan Kiszka
@ 2021-06-22  9:31                                                       ` Philippe Gerum
  2021-06-22  9:39                                                         ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-22  9:31 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Julien Blanc, francois.legal, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 22.06.21 10:37, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>>>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
>>>> écrit :
>>>>>
>>>>> With this in mind, assuming that we have previously sanitized the
>>>>> clock
>>>>> identifier, doing this:
>>>>>
>>>>> #define get_timestamp(__clock) \
>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>>>>> rtdm_clock_read(); })
>>>>>
>>>>> may end up being faster than:
>>>>>
>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>>>>
>>>> Is really a runtime switch necessary ? Since relying on the realtime
>>>> clock is inherently broken, my understanding is that it should be kept
>>>> as compatibility purpose only.
>>>
>>> Again: The real-time clock is not a broken clock per se. It is the basis
>>> of many services (POSIX...) and - if managed properly - it is as sound
>>> clock to build upon. If you need absolute timestamps to calculate
>>> absolute timeouts (like users of the existing code do), this is the
>>> clock to go, also in future versions.
>> 
>> Definitely correct, for timeout specs.
>> 
>>> Also if you want to use
>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
>>> point, monotonic timestamps will lose relevance in practice.
>>>
>> 
>> We are not there yet. So, let's all agree than we need both clock bases,
>> and a flexible way to select the current one.
>> 
>
> What's still missing with Dovetail and a Linux-operated PTP sync for the
> main clocksource? Hardending of software-based sync paths? Where it's
> hw-based, that should already be fine (latest Intel SOCs).
>

Again, we have two use cases: absolute timeouts based on a common epoch
which may change, and delays. We do know PTP with hw-based support is an
efficient way to maintain consistency among users of the former,
nevertheless we still need to express/measure strict delays in other
cases. So, we do need both clocks.

-- 
Philippe.



* Re: Large gpio interrupt latency
  2021-06-22  9:31                                                       ` Philippe Gerum
@ 2021-06-22  9:39                                                         ` Jan Kiszka
  2021-06-22 10:07                                                           ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-06-22  9:39 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Julien Blanc, francois.legal, xenomai

On 22.06.21 11:31, Philippe Gerum wrote:
> 
> Jan Kiszka <jan.kiszka@siemens.com> writes:
> 
>> On 22.06.21 10:37, Philippe Gerum wrote:
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>>>>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
>>>>> écrit :
>>>>>>
>>>>>> With this in mind, assuming that we have previously sanitized the
>>>>>> clock
>>>>>> identifier, doing this:
>>>>>>
>>>>>> #define get_timestamp(__clock) \
>>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>>>>>> rtdm_clock_read(); })
>>>>>>
>>>>>> may end up being faster than:
>>>>>>
>>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>>>>>
>>>>> Is really a runtime switch necessary ? Since relying on the realtime
>>>>> clock is inherently broken, my understanding is that it should be kept
>>>>> as compatibility purpose only.
>>>>
>>>> Again: The real-time clock is not a broken clock per se. It is the basis
>>>> of many services (POSIX...) and - if managed properly - it is as sound
>>>> clock to build upon. If you need absolute timestamps to calculate
>>>> absolute timeouts (like users of the existing code do), this is the
>>>> clock to go, also in future versions.
>>>
>>> Definitely correct, for timeout specs.
>>>
>>>> Also if you want to use
>>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
>>>> point, monotonic timestamps will lose relevance in practice.
>>>>
>>>
>>> We are not there yet. So, let's all agree than we need both clock bases,
>>> and a flexible way to select the current one.
>>>
>>
>> What's still missing with Dovetail and a Linux-operated PTP sync for the
>> main clocksource? Hardending of software-based sync paths? Where it's
>> hw-based, that should already be fine (latest Intel SOCs).
>>
> 
> Again, we have two use cases: absolute timeouts based on a common epoch
> which may change, and delays. We do know PTP with hw-based support is an
> efficient way to maintain consistency among users of the former,
> nevertheless we still need to express/measure strict delays in other
> cases. So, we do need both clocks.
> 

If you need cross-system event stamping, the unsync'ed monotonic clock
is the wrong choice as well.

Again: What is missing to use PTP-sync'ed clock-realtime under Dovetail
today?

Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Large gpio interrupt latency
  2021-06-22  9:39                                                         ` Jan Kiszka
@ 2021-06-22 10:07                                                           ` Philippe Gerum
  2021-07-23  7:02                                                             ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-06-22 10:07 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Julien Blanc, francois.legal, xenomai


Jan Kiszka <jan.kiszka@siemens.com> writes:

> On 22.06.21 11:31, Philippe Gerum wrote:
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>>> On 22.06.21 10:37, Philippe Gerum wrote:
>>>>
>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>
>>>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>>>>>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
>>>>>> écrit :
>>>>>>>
>>>>>>> With this in mind, assuming that we have previously sanitized the
>>>>>>> clock
>>>>>>> identifier, doing this:
>>>>>>>
>>>>>>> #define get_timestamp(__clock) \
>>>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>>>>>>> rtdm_clock_read(); })
>>>>>>>
>>>>>>> may end up being faster than:
>>>>>>>
>>>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>>>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>>>>>>
>>>>>> Is really a runtime switch necessary ? Since relying on the realtime
>>>>>> clock is inherently broken, my understanding is that it should be kept
>>>>>> as compatibility purpose only.
>>>>>
>>>>> Again: The real-time clock is not a broken clock per se. It is the basis
>>>>> of many services (POSIX...) and - if managed properly - it is as sound
>>>>> clock to build upon. If you need absolute timestamps to calculate
>>>>> absolute timeouts (like users of the existing code do), this is the
>>>>> clock to go, also in future versions.
>>>>
>>>> Definitely correct, for timeout specs.
>>>>
>>>>> Also if you want to use
>>>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
>>>>> point, monotonic timestamps will lose relevance in practice.
>>>>>
>>>>
>>>> We are not there yet. So, let's all agree than we need both clock bases,
>>>> and a flexible way to select the current one.
>>>>
>>>
>>> What's still missing with Dovetail and a Linux-operated PTP sync for the
>>> main clocksource? Hardending of software-based sync paths? Where it's
>>> hw-based, that should already be fine (latest Intel SOCs).
>>>
>> 
>> Again, we have two use cases: absolute timeouts based on a common epoch
>> which may change, and delays. We do know PTP with hw-based support is an
>> efficient way to maintain consistency among users of the former,
>> nevertheless we still need to express/measure strict delays in other
>> cases. So, we do need both clocks.
>> 
>
> If you need cross-system event stamping, the unsync'ed monotonic clock
> is the wrong choice as well.
>

As usual, it's a trade-off, a risk assessment between observing
non-coherent SMP readings and warp jumps. This is a call an application
should be allowed to make; I see no reason to impose a solution
arbitrarily without knowing what the app actually wants to do.

> Again: What is missing to use PTP-sync'ed clock-realtime under Dovetail
> today?
>

Nothing when it comes to reading timestamps, via the NMI-safe
ktime_get_real_fast_ns() service (updates will be non-preemptible while
holding the seqlock).

-- 
Philippe.



* Re: Large gpio interrupt latency
  2021-06-22 10:07                                                           ` Philippe Gerum
@ 2021-07-23  7:02                                                             ` François Legal
  2021-07-23  8:04                                                               ` Philippe Gerum
  0 siblings, 1 reply; 36+ messages in thread
From: François Legal @ 2021-07-23  7:02 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: Jan Kiszka, Julien Blanc, xenomai

On Tuesday, June 22, 2021 at 12:07 CEST, Philippe Gerum <rpm@xenomai.org> wrote:

>
> Jan Kiszka <jan.kiszka@siemens.com> writes:
>
> > On 22.06.21 11:31, Philippe Gerum wrote:
> >>
> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>
> >>> On 22.06.21 10:37, Philippe Gerum wrote:
> >>>>
> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>>
> >>>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
> >>>>>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
> >>>>>> écrit :
> >>>>>>>
> >>>>>>> With this in mind, assuming that we have previously sanitized the
> >>>>>>> clock
> >>>>>>> identifier, doing this:
> >>>>>>>
> >>>>>>> #define get_timestamp(__clock) \
> >>>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
> >>>>>>> rtdm_clock_read(); })
> >>>>>>>
> >>>>>>> may end up being faster than:
> >>>>>>>
> >>>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
> >>>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
> >>>>>>
> >>>>>> Is really a runtime switch necessary ? Since relying on the realtime
> >>>>>> clock is inherently broken, my understanding is that it should be kept
> >>>>>> as compatibility purpose only.
> >>>>>
> >>>>> Again: The real-time clock is not a broken clock per se. It is the basis
> >>>>> of many services (POSIX...) and - if managed properly - it is as sound
> >>>>> clock to build upon. If you need absolute timestamps to calculate
> >>>>> absolute timeouts (like users of the existing code do), this is the
> >>>>> clock to go, also in future versions.
> >>>>
> >>>> Definitely correct, for timeout specs.
> >>>>
> >>>>> Also if you want to use
> >>>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
> >>>>> point, monotonic timestamps will lose relevance in practice.
> >>>>>
> >>>>
> >>>> We are not there yet. So, let's all agree than we need both clock bases,
> >>>> and a flexible way to select the current one.
> >>>>
> >>>
> >>> What's still missing with Dovetail and a Linux-operated PTP sync for the
> >>> main clocksource? Hardending of software-based sync paths? Where it's
> >>> hw-based, that should already be fine (latest Intel SOCs).
> >>>
> >>
> >> Again, we have two use cases: absolute timeouts based on a common epoch
> >> which may change, and delays. We do know PTP with hw-based support is an
> >> efficient way to maintain consistency among users of the former,
> >> nevertheless we still need to express/measure strict delays in other
> >> cases. So, we do need both clocks.
> >>
> >
> > If you need cross-system event stamping, the unsync'ed monotonic clock
> > is the wrong choice as well.
> >
>
> As usual, it's a trade-off, a risk assessment between observing
> non-coherent SMP readings and warp jumps. This is a call an application
> should be allowed to make, I see no reason to impose a solution
> arbitrarily without knowing what the app actually wants to do.
>
> > Again: What is missing to use PTP-sync'ed clock-realtime under Dovetail
> > today?
> >
>
> Nothing when it comes to reading timestamps, via the NMI-safe
> ktime_get_real_fast_ns() service (updates will be non-preemptible while
> holding the seqlock).
>

Hi there,

to move this topic forward, I propose creating a patch to replace the call to rtdm_clock_read_monotonic with rtdm_clock_read in gpio_core.c,
plus another one that would introduce the possibility to switch between those two clocks.

François
> --
> Philippe.




* Re: Large gpio interrupt latency
  2021-07-23  7:02                                                             ` François Legal
@ 2021-07-23  8:04                                                               ` Philippe Gerum
  2021-08-02 12:24                                                                 ` Jan Kiszka
  0 siblings, 1 reply; 36+ messages in thread
From: Philippe Gerum @ 2021-07-23  8:04 UTC (permalink / raw)
  To: François Legal; +Cc: Jan Kiszka, Julien Blanc, xenomai


François Legal <devel@thom.fr.eu.org> writes:

> Le Mardi, Juin 22, 2021 12:07 CEST, Philippe Gerum <rpm@xenomai.org> a écrit: 
>  
>> 
>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> 
>> > On 22.06.21 11:31, Philippe Gerum wrote:
>> >> 
>> >> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >> 
>> >>> On 22.06.21 10:37, Philippe Gerum wrote:
>> >>>>
>> >>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>> >>>>
>> >>>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>> >>>>>> Le mardi 22 juin 2021 à 09:38 +0200, Philippe Gerum via Xenomai a
>> >>>>>> écrit :
>> >>>>>>>
>> >>>>>>> With this in mind, assuming that we have previously sanitized the
>> >>>>>>> clock
>> >>>>>>> identifier, doing this:
>> >>>>>>>
>> >>>>>>> #define get_timestamp(__clock) \
>> >>>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>> >>>>>>> rtdm_clock_read(); })
>> >>>>>>>
>> >>>>>>> may end up being faster than:
>> >>>>>>>
>> >>>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>> >>>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>> >>>>>>
>> >>>>>> Is really a runtime switch necessary ? Since relying on the realtime
>> >>>>>> clock is inherently broken, my understanding is that it should be kept
>> >>>>>> as compatibility purpose only.
>> >>>>>
>> >>>>> Again: The real-time clock is not a broken clock per se. It is the basis
>> >>>>> of many services (POSIX...) and - if managed properly - it is as sound
>> >>>>> clock to build upon. If you need absolute timestamps to calculate
>> >>>>> absolute timeouts (like users of the existing code do), this is the
>> >>>>> clock to go, also in future versions.
>> >>>>
>> >>>> Definitely correct, for timeout specs.
>> >>>>
>> >>>>> Also if you want to use
>> >>>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
>> >>>>> point, monotonic timestamps will lose relevance in practice.
>> >>>>>
>> >>>>
>> >>>> We are not there yet. So, let's all agree than we need both clock bases,
>> >>>> and a flexible way to select the current one.
>> >>>>
>> >>>
>> >>> What's still missing with Dovetail and a Linux-operated PTP sync for the
>> >>> main clocksource? Hardending of software-based sync paths? Where it's
>> >>> hw-based, that should already be fine (latest Intel SOCs).
>> >>>
>> >> 
>> >> Again, we have two use cases: absolute timeouts based on a common epoch
>> >> which may change, and delays. We do know PTP with hw-based support is an
>> >> efficient way to maintain consistency among users of the former,
>> >> nevertheless we still need to express/measure strict delays in other
>> >> cases. So, we do need both clocks.
>> >> 
>> >
>> > If you need cross-system event stamping, the unsync'ed monotonic clock
>> > is the wrong choice as well.
>> >
>> 
>> As usual, it's a trade-off, a risk assessment between observing
>> non-coherent SMP readings and warp jumps. This is a call an application
>> should be allowed to make, I see no reason to impose a solution
>> arbitrarily without knowing what the app actually wants to do.
>> 
>> > Again: What is missing to use PTP-sync'ed clock-realtime under Dovetail
>> > today?
>> >
>> 
>> Nothing when it comes to reading timestamps, via the NMI-safe
>> ktime_get_real_fast_ns() service (updates will be non-preemptible while
>> holding the seqlock).
>> 
>
> Hi there,
>
> to move this topic forward, I propose to create a patch to replace the call to rtdm_clock_read_monotonic by rtdm_clock_read in gpio_core.c,
> plus another one that would introduce the possibility to switch between those 2 different clocks.
>

There are no timed wait calls in this driver which might benefit from a
general "clock switch" feature, are there? So what about either of these
options instead:

- adding GPIO_RTIOC_TS_REAL only for the purpose of retrieving
  timestamps. That would leave the existing client code unaffected, and
  _might_ be backported to the stable tree since this is a mere
  extension of the ABI.

- renaming the current ioctl request to GPIO_RTIOC_GET_TS_MONO, and
  provide GPIO_RTIOC_GET_TS[_REAL] based on rtdm_clock_read() in the
  same move. That would force the existing client code to revisit their
  implementation wrt retrieving a timestamp, providing the desired
  alternative between mono/real too. That sort of change would have to
  wait for 3.2 though.

-- 
Philippe.



* Re: Large gpio interrupt latency
  2021-07-23  8:04                                                               ` Philippe Gerum
@ 2021-08-02 12:24                                                                 ` Jan Kiszka
  2021-08-05 15:11                                                                   ` François Legal
  0 siblings, 1 reply; 36+ messages in thread
From: Jan Kiszka @ 2021-08-02 12:24 UTC (permalink / raw)
  To: Philippe Gerum, François Legal; +Cc: Julien Blanc, xenomai

On 23.07.21 10:04, Philippe Gerum wrote:
> 
> François Legal <devel@thom.fr.eu.org> writes:
> 
>> On Tuesday, June 22, 2021 12:07 CEST, Philippe Gerum <rpm@xenomai.org> wrote: 
>>  
>>>
>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>
>>>> On 22.06.21 11:31, Philippe Gerum wrote:
>>>>>
>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>
>>>>>> On 22.06.21 10:37, Philippe Gerum wrote:
>>>>>>>
>>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>>
>>>>>>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
>>>>>>>>> On Tuesday, June 22, 2021 at 09:38 +0200, Philippe Gerum via Xenomai
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> With this in mind, assuming that we have previously sanitized the
>>>>>>>>>> clock
>>>>>>>>>> identifier, doing this:
>>>>>>>>>>
>>>>>>>>>> #define get_timestamp(__clock) \
>>>>>>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
>>>>>>>>>> rtdm_clock_read(); })
>>>>>>>>>>
>>>>>>>>>> may end up being faster than:
>>>>>>>>>>
>>>>>>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
>>>>>>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
>>>>>>>>>
>>>>>>>>> Is a runtime switch really necessary? Since relying on the realtime
>>>>>>>>> clock is inherently broken, my understanding is that it should be kept
>>>>>>>>> for compatibility purposes only.
>>>>>>>>
>>>>>>>> Again: The real-time clock is not a broken clock per se. It is the basis
>>>>>>>> of many services (POSIX...) and - if managed properly - it is as sound
>>>>>>>> clock to build upon. If you need absolute timestamps to calculate
>>>>>>>> absolute timeouts (like users of the existing code do), this is the
>>>>>>>> clock to go, also in future versions.
>>>>>>>
>>>>>>> Definitely correct, for timeout specs.
>>>>>>>
>>>>>>>> Also if you want to use
>>>>>>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
>>>>>>>> point, monotonic timestamps will lose relevance in practice.
>>>>>>>>
>>>>>>>
>>>>>>> We are not there yet. So, let's all agree that we need both clock bases,
>>>>>>> and a flexible way to select the current one.
>>>>>>>
>>>>>>
>>>>>> What's still missing with Dovetail and a Linux-operated PTP sync for the
>>>>>> main clocksource? Hardening of software-based sync paths? Where it's
>>>>>> hw-based, that should already be fine (latest Intel SOCs).
>>>>>>
>>>>>
>>>>> Again, we have two use cases: absolute timeouts based on a common epoch
>>>>> which may change, and delays. We do know PTP with hw-based support is an
>>>>> efficient way to maintain consistency among users of the former,
>>>>> nevertheless we still need to express/measure strict delays in other
>>>>> cases. So, we do need both clocks.
>>>>>
>>>>
>>>> If you need cross-system event stamping, the unsync'ed monotonic clock
>>>> is the wrong choice as well.
>>>>
>>>
>>> As usual, it's a trade-off, a risk assessment between observing
>>> non-coherent SMP readings and warp jumps. This is a call an application
>>> should be allowed to make, I see no reason to impose a solution
>>> arbitrarily without knowing what the app actually wants to do.
>>>
>>>> Again: What is missing to use PTP-sync'ed clock-realtime under Dovetail
>>>> today?
>>>>
>>>
>>> Nothing when it comes to reading timestamps, via the NMI-safe
>>> ktime_get_real_fast_ns() service (updates will be non-preemptible while
>>> holding the seqlock).
>>>
>>
>> Hi there,
>>
>> to move this topic forward, I propose to create a patch to replace the call to rtdm_clock_read_monotonic by rtdm_clock_read in gpio_core.c,
>> plus another one that would introduce the possibility to switch between those 2 different clocks.
>>
> 
> There are no timed wait calls in this driver that might benefit from a
> general "clock switch" feature, are there?  So what about either of these
> options instead:
> 
> - adding GPIO_RTIOC_TS_REAL only for the purpose of retrieving
>   timestamps. That would leave the existing client code unaffected, and
>   _might_ be backported to the stable tree since this is a mere
>   extension of the ABI.
> 
> - renaming the current ioctl request to GPIO_RTIOC_GET_TS_MONO, and
>   provide GPIO_RTIOC_GET_TS[_REAL] based on rtdm_clock_read() in the
>   same move. That would force the existing client code to revisit their
>   implementation wrt retrieving a timestamp, providing the desired
>   alternative between mono/real too. That sort of change would have to
>   wait for 3.2 though.
> 

What is the status here? This topic should really be addressed before the
3.2 release.

Thanks,
Jan

-- 
Siemens AG, T RDA IOT
Corporate Competence Center Embedded Linux



* Re: Large gpio interrupt latency
  2021-08-02 12:24                                                                 ` Jan Kiszka
@ 2021-08-05 15:11                                                                   ` François Legal
  0 siblings, 0 replies; 36+ messages in thread
From: François Legal @ 2021-08-05 15:11 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Philippe Gerum, Julien Blanc, xenomai

On Monday, August 2, 2021 14:24 CEST, Jan Kiszka <jan.kiszka@siemens.com> wrote:

> On 23.07.21 10:04, Philippe Gerum wrote:
> >
> > François Legal <devel@thom.fr.eu.org> writes:
> >
> >> On Tuesday, June 22, 2021 12:07 CEST, Philippe Gerum <rpm@xenomai.org> wrote:
> >>
> >>>
> >>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>
> >>>> On 22.06.21 11:31, Philippe Gerum wrote:
> >>>>>
> >>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>>>
> >>>>>> On 22.06.21 10:37, Philippe Gerum wrote:
> >>>>>>>
> >>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
> >>>>>>>
> >>>>>>>> On 22.06.21 09:49, Julien Blanc via Xenomai wrote:
> >>>>>>>>> On Tuesday, June 22, 2021 at 09:38 +0200, Philippe Gerum via Xenomai
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> With this in mind, assuming that we have previously sanitized the
> >>>>>>>>>> clock
> >>>>>>>>>> identifier, doing this:
> >>>>>>>>>>
> >>>>>>>>>> #define get_timestamp(__clock) \
> >>>>>>>>>> ({ (__clock) == CLOCK_MONOTONIC ? rtdm_clock_read_monotonic() :
> >>>>>>>>>> rtdm_clock_read(); })
> >>>>>>>>>>
> >>>>>>>>>> may end up being faster than:
> >>>>>>>>>>
> >>>>>>>>>> xnticks_t (*__get_timestamp)(clockid_t clock);
> >>>>>>>>>> #define get_timestamp(__clock)	__get_timestamp(__clock)
> >>>>>>>>>
> >>>>>>>>> Is a runtime switch really necessary? Since relying on the realtime
> >>>>>>>>> clock is inherently broken, my understanding is that it should be kept
> >>>>>>>>> for compatibility purposes only.
> >>>>>>>>
> >>>>>>>> Again: The real-time clock is not a broken clock per se. It is the basis
> >>>>>>>> of many services (POSIX...) and - if managed properly - it is as sound
> >>>>>>>> clock to build upon. If you need absolute timestamps to calculate
> >>>>>>>> absolute timeouts (like users of the existing code do), this is the
> >>>>>>>> clock to go, also in future versions.
> >>>>>>>
> >>>>>>> Definitely correct, for timeout specs.
> >>>>>>>
> >>>>>>>> Also if you want to use
> >>>>>>>> PTP-sync'ed clocks across systems (TSN...), it is THE way to go. At that
> >>>>>>>> point, monotonic timestamps will lose relevance in practice.
> >>>>>>>>
> >>>>>>>
> >>>>>>> We are not there yet. So, let's all agree that we need both clock bases,
> >>>>>>> and a flexible way to select the current one.
> >>>>>>>
> >>>>>>
> >>>>>> What's still missing with Dovetail and a Linux-operated PTP sync for the
> >>>>>> main clocksource? Hardening of software-based sync paths? Where it's
> >>>>>> hw-based, that should already be fine (latest Intel SOCs).
> >>>>>>
> >>>>>
> >>>>> Again, we have two use cases: absolute timeouts based on a common epoch
> >>>>> which may change, and delays. We do know PTP with hw-based support is an
> >>>>> efficient way to maintain consistency among users of the former,
> >>>>> nevertheless we still need to express/measure strict delays in other
> >>>>> cases. So, we do need both clocks.
> >>>>>
> >>>>
> >>>> If you need cross-system event stamping, the unsync'ed monotonic clock
> >>>> is the wrong choice as well.
> >>>>
> >>>
> >>> As usual, it's a trade-off, a risk assessment between observing
> >>> non-coherent SMP readings and warp jumps. This is a call an application
> >>> should be allowed to make, I see no reason to impose a solution
> >>> arbitrarily without knowing what the app actually wants to do.
> >>>
> >>>> Again: What is missing to use PTP-sync'ed clock-realtime under Dovetail
> >>>> today?
> >>>>
> >>>
> >>> Nothing when it comes to reading timestamps, via the NMI-safe
> >>> ktime_get_real_fast_ns() service (updates will be non-preemptible while
> >>> holding the seqlock).
> >>>
> >>
> >> Hi there,
> >>
> >> to move this topic forward, I propose to create a patch to replace the call to rtdm_clock_read_monotonic by rtdm_clock_read in gpio_core.c,
> >> plus another one that would introduce the possibility to switch between those 2 different clocks.
> >>
> >
> > There are no timed wait calls in this driver that might benefit from a
> > general "clock switch" feature, are there?  So what about either of these
> > options instead:
> >
> > - adding GPIO_RTIOC_TS_REAL only for the purpose of retrieving
> >   timestamps. That would leave the existing client code unaffected, and
> >   _might_ be backported to the stable tree since this is a mere
> >   extension of the ABI.
> >
> > - renaming the current ioctl request to GPIO_RTIOC_GET_TS_MONO, and
> >   provide GPIO_RTIOC_GET_TS[_REAL] based on rtdm_clock_read() in the
> >   same move. That would force the existing client code to revisit their
> >   implementation wrt retrieving a timestamp, providing the desired
> >   alternative between mono/real too. That sort of change would have to
> >   wait for 3.2 though.
> >
>
> What is the status here? This topic should really be addressed before the
> 3.2 release.
>

Yes, sorry, I was away on holiday.
I'm sending the patch right now.

> Thanks,
> Jan
>
> --
> Siemens AG, T RDA IOT
> Corporate Competence Center Embedded Linux




end of thread, other threads:[~2021-08-05 15:11 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-16  7:15 Large gpio interrupt latency François Legal
2021-06-16  8:10 ` Chen, Hongzhan
2021-06-16  8:18   ` François Legal
2021-06-16  9:05     ` Chen, Hongzhan
2021-06-16  9:12       ` François Legal
2021-06-16  9:40         ` Jan Kiszka
2021-06-16 13:29           ` François Legal
2021-06-16 13:38             ` Jan Kiszka
2021-06-16 13:51               ` François Legal
2021-06-16 15:10                 ` Jan Kiszka
2021-06-17  7:15                   ` François Legal
2021-06-18 18:41                   ` François Legal
2021-06-21  6:56                     ` Jan Kiszka
2021-06-21  9:39                       ` Philippe Gerum
2021-06-21 13:38                         ` Jan Kiszka
2021-06-21 13:54                           ` Philippe Gerum
2021-06-21 14:02                             ` Jan Kiszka
2021-06-21 14:28                               ` Philippe Gerum
2021-06-21 14:46                                 ` Jan Kiszka
2021-06-21 14:57                                   ` Philippe Gerum
2021-06-21 15:35                                     ` François Legal
2021-06-21 16:38                                       ` Philippe Gerum
2021-06-21 16:45                                         ` Philippe Gerum
2021-06-21 18:06                                           ` François Legal
2021-06-22  7:38                                             ` Philippe Gerum
2021-06-22  7:49                                               ` Julien Blanc
2021-06-22  8:22                                                 ` Jan Kiszka
2021-06-22  8:37                                                   ` Philippe Gerum
2021-06-22  9:14                                                     ` Jan Kiszka
2021-06-22  9:31                                                       ` Philippe Gerum
2021-06-22  9:39                                                         ` Jan Kiszka
2021-06-22 10:07                                                           ` Philippe Gerum
2021-07-23  7:02                                                             ` François Legal
2021-07-23  8:04                                                               ` Philippe Gerum
2021-08-02 12:24                                                                 ` Jan Kiszka
2021-08-05 15:11                                                                   ` François Legal
