All of lore.kernel.org
 help / color / mirror / Atom feed
* CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
@ 2013-10-24 13:48 Martin Kozusky
  2013-10-25 12:59 ` Martin Kozusky
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Kozusky @ 2013-10-24 13:48 UTC (permalink / raw)
  To: linux-can

Hello,
after more than year I'm back with CAN fifo overrun problems on i.MX25 board.
(it was good enough earlier, if some frames were lost, but not this time)

I have 2 flexcan interfaces, each receiving around 1100 msgs/s (situation is a little better if I use just one iface, but I need both)
I just configure them and then run:

I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
(simulate writing to SDcard with 100KB blocks in 1 sec intervals)

and start sending data from another device.

I am not running any other program (like candump etc) to read from CAN.

this is what is shown after I finish sending 35777 packets (both interfaces now connected to same bus so they should receive same data) with ip -d -s link show can0/1

2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
     link/can
     can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
     bitrate 250000 sample-point 0.857
     tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
     flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
     clock 66500000
     re-started bus-errors arbit-lost error-warn error-pass bus-off
     0          0          0          1          1          0
     RX: bytes  packets  errors  dropped overrun mcast
     151769     19000    1699    0       1699    0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0
root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
     link/can
     can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
     bitrate 250000 sample-point 0.857
     tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
     flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
     clock 66500000
     re-started bus-errors arbit-lost error-warn error-pass bus-off
     0          0          0          0          0          0
     RX: bytes  packets  errors  dropped overrun mcast
     157377     19696    2664    0       2664    0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0


With just one iface used:

2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
     link/can
     can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
     bitrate 250000 sample-point 0.857
     tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
     flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
     clock 66500000
     re-started bus-errors arbit-lost error-warn error-pass bus-off
     0          0          0          1          1          0
     RX: bytes  packets  errors  dropped overrun mcast
     233277     29201    1483    0       1483    0
     TX: bytes  packets  errors  dropped carrier collsns
     0          0        0       0       0       0


Too many packets are lost.

I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that didn't hepl too much, if I put it too high then the system response was slow and packets still lost, also tried to change priority of CAN interrupts with (don't know if correctly)
   // imx_irq_set_priority(43,14);
   // imx_irq_set_priority(44,14);

But it didn't help either.


Does anybody have any idea how not to lose any packets? :)


Thanks,
Martin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-24 13:48 CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19) Martin Kozusky
@ 2013-10-25 12:59 ` Martin Kozusky
  2013-10-25 17:58   ` Wolfgang Grandegger
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Kozusky @ 2013-10-25 12:59 UTC (permalink / raw)
  To: linux-can

Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
> Hello,
> after more than year I'm back with CAN fifo overrun problems on i.MX25 board.
> (it was good enough earlier, if some frames were lost, but not this time)
>
> I have 2 flexcan interfaces, each receiving around 1100 msgs/s (situation is a little better if I use just one iface, but I need both)
> I just configure them and then run:
>
> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>
> and start sending data from another device.
>
> I am not running any other program (like candump etc) to read from CAN.
>
> this is what is shown after I finish sending 35777 packets (both interfaces now connected to same bus so they should receive same data) with ip -d -s link show can0/1
>
> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>      link/can
>      can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>      bitrate 250000 sample-point 0.857
>      tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>      flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>      clock 66500000
>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>      0          0          0          1          1          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      151769     19000    1699    0       1699    0
>      TX: bytes  packets  errors  dropped carrier collsns
>      0          0        0       0       0       0
> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>      link/can
>      can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>      bitrate 250000 sample-point 0.857
>      tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>      flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>      clock 66500000
>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>      0          0          0          0          0          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      157377     19696    2664    0       2664    0
>      TX: bytes  packets  errors  dropped carrier collsns
>      0          0        0       0       0       0
>
>
> With just one iface used:
>
> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>      link/can
>      can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>      bitrate 250000 sample-point 0.857
>      tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>      flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>      clock 66500000
>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>      0          0          0          1          1          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      233277     29201    1483    0       1483    0
>      TX: bytes  packets  errors  dropped carrier collsns
>      0          0        0       0       0       0
>
>
> Too many packets are lost.
>
> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that didn't hepl too much, if I put it too high then the system response was slow and packets still lost, also tried to change priority of CAN interrupts with (don't know if correctly)
>    // imx_irq_set_priority(43,14);
>    // imx_irq_set_priority(44,14);
>
> But it didn't help either.
>
>
> Does anybody have any idea how not to lose any packets? :)


Hello,
I tried to disable
//netif_receive_skb(skb); in  flexcan_read_frame() and other functions so that data is not processed further in system

It didn't help.
So I tried to put time_start=ktime_get_real() at the begining of flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and add their difference to the global variable time_total+=time_stop-time_start;
I divided this time_total by rx_packets count at flexcan_chip_stop and wrote with dev_info into log (variables were initialized in flexcan_chi_start, so I could just do ifconfig can0 up/down and reset those counters and write them to log), so now I had average time spent int flexcan_read_frame.
This time it was around 100usec! just with one CAN used, if both were connected, it was more than twice. And many CAN frames were lost.

So I tried to disable
     /*
         skb = alloc_can_skb(dev, &cf);
         if (unlikely(!skb)) {
                 stats->rx_dropped++;
                 return 0;
         }
      */
and made "struct can_frame cf" (not pointer, so that I can use it in flexcan_read_fifo call)
And tried to send data again.
Now - average time in flexcan_read_frame was not 100usec, but just 2 usec! 50x less ...  no CAN frame was lost, even if I was using both CAN interfaces, each getting over 1100 msgs/sec and writing 100KB data to SD card.

So I am asking - how to make this alloc_can_skb faster (or is there any alternative)? Or if there is another way how to get data to user?

Thank you,
Martin

>
> Thanks,
> Martin
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-25 12:59 ` Martin Kozusky
@ 2013-10-25 17:58   ` Wolfgang Grandegger
  2013-10-26 18:18     ` Martin Kozusky
  0 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-25 17:58 UTC (permalink / raw)
  To: Martin Kozusky, linux-can

Hi Martin,

On 10/25/2013 02:59 PM, Martin Kozusky wrote:
> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>> Hello,
>> after more than year I'm back with CAN fifo overrun problems on i.MX25
>> board.
>> (it was good enough earlier, if some frames were lost, but not this time)
>>
>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>> (situation is a little better if I use just one iface, but I need both)
>> I just configure them and then run:
>>
>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>
>> and start sending data from another device.
>>
>> I am not running any other program (like candump etc) to read from CAN.
>>
>> this is what is shown after I finish sending 35777 packets (both
>> interfaces now connected to same bus so they should receive same data)
>> with ip -d -s link show can0/1
>>
>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>      link/can
>>      can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>> restart-ms 0
>>      bitrate 250000 sample-point 0.857
>>      tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>      flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>      clock 66500000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          1          0

Do you have electrical problems on the bus? Or is reaching error-passive
not related to this problem?

>>      RX: bytes  packets  errors  dropped overrun mcast
>>      151769     19000    1699    0       1699    0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      0          0        0       0       0       0
>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>      link/can
>>      can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>      bitrate 250000 sample-point 0.857
>>      tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>      flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>      clock 66500000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          0          0          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      157377     19696    2664    0       2664    0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      0          0        0       0       0       0
>>
>>
>> With just one iface used:
>>
>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>      link/can
>>      can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>> restart-ms 0
>>      bitrate 250000 sample-point 0.857
>>      tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>      flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>      clock 66500000
>>      re-started bus-errors arbit-lost error-warn error-pass bus-off
>>      0          0          0          1          1          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      233277     29201    1483    0       1483    0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      0          0        0       0       0       0
>>
>>
>> Too many packets are lost.
>>
>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>> didn't hepl too much, if I put it too high then the system response
>> was slow and packets still lost, also tried to change priority of CAN
>> interrupts with (don't know if correctly)
>>    // imx_irq_set_priority(43,14);
>>    // imx_irq_set_priority(44,14);
>>
>> But it didn't help either.
>>
>>
>> Does anybody have any idea how not to lose any packets? :)
> 
> 
> Hello,
> I tried to disable
> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
> so that data is not processed further in system

Well ...

> It didn't help.
> So I tried to put time_start=ktime_get_real() at the begining of
> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
> add their difference to the global variable
> time_total+=time_stop-time_start;
> I divided this time_total by rx_packets count at flexcan_chip_stop and
> wrote with dev_info into log (variables were initialized in
> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
> those counters and write them to log), so now I had average time spent
> int flexcan_read_frame.
> This time it was around 100usec! just with one CAN used, if both were
> connected, it was more than twice. And many CAN frames were lost.
> 
> So I tried to disable
>     /*
>         skb = alloc_can_skb(dev, &cf);
>         if (unlikely(!skb)) {
>                 stats->rx_dropped++;
>                 return 0;
>         }
>      */
> and made "struct can_frame cf" (not pointer, so that I can use it in
> flexcan_read_fifo call)
> And tried to send data again.
> Now - average time in flexcan_read_frame was not 100usec, but just 2
> usec! 50x less ...  no CAN frame was lost, even if I was using both CAN
> interfaces, each getting over 1100 msgs/sec and writing 100KB data to SD
> card.

... but the messages need to be allocated, queued, delivered to and even
processed by a user space task. What you messure it part of the network
stack overhead but 100us just for alloc_can_skb() seems quite a lot to
me. At what frequency is your CPU running? Is the system low of memory?
Maybe your system is simply not fast enough. To see what code is
involved just follow:

  http://lxr.free-electrons.com/ident?i=alloc_can_skb

> So I am asking - how to make this alloc_can_skb faster (or is there any
> alternative)? Or if there is another way how to get data to user?

Well, not with Linux-CAN. Anyway, messages arrive at a rate of approx. 1
KHz. So there is 1ms per message. I think it's a latency problem in the
first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
the queue is full you loose messages. This obviously happens when the
SDcard is accessed.

Could you take function traces on your system?

Wolfgang.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-25 17:58   ` Wolfgang Grandegger
@ 2013-10-26 18:18     ` Martin Kozusky
  2013-10-26 19:40       ` Wolfgang Grandegger
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Kozusky @ 2013-10-26 18:18 UTC (permalink / raw)
  To: linux-can; +Cc: wg

Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
> Hi Martin,
>
> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>> Hello,
>>> after more than year I'm back with CAN fifo overrun problems on i.MX25
>>> board.
>>> (it was good enough earlier, if some frames were lost, but not this time)
>>>
>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>> (situation is a little better if I use just one iface, but I need both)
>>> I just configure them and then run:
>>>
>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>
>>> and start sending data from another device.
>>>
>>> I am not running any other program (like candump etc) to read from CAN.
>>>
>>> this is what is shown after I finish sending 35777 packets (both
>>> interfaces now connected to same bus so they should receive same data)
>>> with ip -d -s link show can0/1
>>>
>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>       link/can
>>>       can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>> restart-ms 0
>>>       bitrate 250000 sample-point 0.857
>>>       tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>       flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>       clock 66500000
>>>       re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>       0          0          0          1          1          0
>
> Do you have electrical problems on the bus? Or is reaching error-passive
> not related to this problem?
It is not related to this problem - there is only RX pin connected on can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is doing TX)

  
>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>       151769     19000    1699    0       1699    0
>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>       0          0        0       0       0       0
>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>       link/can
>>>       can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>       bitrate 250000 sample-point 0.857
>>>       tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>       flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>       clock 66500000
>>>       re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>       0          0          0          0          0          0
>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>       157377     19696    2664    0       2664    0
>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>       0          0        0       0       0       0
>>>
>>>
>>> With just one iface used:
>>>
>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>       link/can
>>>       can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>> restart-ms 0
>>>       bitrate 250000 sample-point 0.857
>>>       tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>       flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>       clock 66500000
>>>       re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>       0          0          0          1          1          0
>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>       233277     29201    1483    0       1483    0
>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>       0          0        0       0       0       0
>>>
>>>
>>> Too many packets are lost.
>>>
>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>> didn't hepl too much, if I put it too high then the system response
>>> was slow and packets still lost, also tried to change priority of CAN
>>> interrupts with (don't know if correctly)
>>>     // imx_irq_set_priority(43,14);
>>>     // imx_irq_set_priority(44,14);
>>>
>>> But it didn't help either.
>>>
>>>
>>> Does anybody have any idea how not to lose any packets? :)
>>
>>
>> Hello,
>> I tried to disable
>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>> so that data is not processed further in system
>
> Well ...
>
>> It didn't help.
>> So I tried to put time_start=ktime_get_real() at the begining of
>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>> add their difference to the global variable
>> time_total+=time_stop-time_start;
>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>> wrote with dev_info into log (variables were initialized in
>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>> those counters and write them to log), so now I had average time spent
>> int flexcan_read_frame.
>> This time it was around 100usec! just with one CAN used, if both were
>> connected, it was more than twice. And many CAN frames were lost.
>>
>> So I tried to disable
>>      /*
>>          skb = alloc_can_skb(dev, &cf);
>>          if (unlikely(!skb)) {
>>                  stats->rx_dropped++;
>>                  return 0;
>>          }
>>       */
>> and made "struct can_frame cf" (not pointer, so that I can use it in
>> flexcan_read_fifo call)
>> And tried to send data again.
>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>> usec! 50x less ...  no CAN frame was lost, even if I was using both CAN
>> interfaces, each getting over 1100 msgs/sec and writing 100KB data to SD
>> card.
>
> ... but the messages need to be allocated, queued, delivered to and even
> processed by a user space task. What you messure it part of the network
> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
> me. At what frequency is your CPU running? Is the system low of memory?
> Maybe your system is simply not fast enough. To see what code is
> involved just follow:
CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally and free enough :(

>
>    http://lxr.free-electrons.com/ident?i=alloc_can_skb
>> So I am asking - how to make this alloc_can_skb faster (or is there any
>> alternative)? Or if there is another way how to get data to user?
>
> Well, not with Linux-CAN. Anyway, messages arrive at a rate of approx. 1
> KHz. So there is 1ms per message. I think it's a latency problem in the
> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
> the queue is full you loose messages. This obviously happens when the
> SDcard is accessed.
>
> Could you take function traces on your system?
Is there any special tool for this or should I use my start/stop timers?

Martin

> Wolfgang.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-26 18:18     ` Martin Kozusky
@ 2013-10-26 19:40       ` Wolfgang Grandegger
  2013-10-29 10:46         ` Martin Kozusky
  0 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-26 19:40 UTC (permalink / raw)
  To: Martin Kozusky, linux-can

On 10/26/2013 08:18 PM, Martin Kozusky wrote:
> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>> Hi Martin,
>>
>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>> Hello,
>>>> after more than year I'm back with CAN fifo overrun problems on i.MX25
>>>> board.
>>>> (it was good enough earlier, if some frames were lost, but not this
>>>> time)
>>>>
>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>> (situation is a little better if I use just one iface, but I need both)
>>>> I just configure them and then run:
>>>>
>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>
>>>> and start sending data from another device.
>>>>
>>>> I am not running any other program (like candump etc) to read from CAN.
>>>>
>>>> this is what is shown after I finish sending 35777 packets (both
>>>> interfaces now connected to same bus so they should receive same data)
>>>> with ip -d -s link show can0/1
>>>>
>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>       link/can
>>>>       can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>> restart-ms 0
>>>>       bitrate 250000 sample-point 0.857
>>>>       tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>       flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>       clock 66500000
>>>>       re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>       0          0          0          1          1          0
>>
>> Do you have electrical problems on the bus? Or is reaching error-passive
>> not related to this problem?
> It is not related to this problem - there is only RX pin connected on
> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
> doing TX)
> 
>  
>>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>>       151769     19000    1699    0       1699    0
>>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>>       0          0        0       0       0       0
>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>       link/can
>>>>       can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>       bitrate 250000 sample-point 0.857
>>>>       tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>       flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>       clock 66500000
>>>>       re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>       0          0          0          0          0          0
>>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>>       157377     19696    2664    0       2664    0
>>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>>       0          0        0       0       0       0
>>>>
>>>>
>>>> With just one iface used:
>>>>
>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>       link/can
>>>>       can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>> restart-ms 0
>>>>       bitrate 250000 sample-point 0.857
>>>>       tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>       flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>       clock 66500000
>>>>       re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>       0          0          0          1          1          0
>>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>>       233277     29201    1483    0       1483    0
>>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>>       0          0        0       0       0       0
>>>>
>>>>
>>>> Too many packets are lost.
>>>>
>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>> didn't hepl too much, if I put it too high then the system response
>>>> was slow and packets still lost, also tried to change priority of CAN
>>>> interrupts with (don't know if correctly)
>>>>     // imx_irq_set_priority(43,14);
>>>>     // imx_irq_set_priority(44,14);
>>>>
>>>> But it didn't help either.
>>>>
>>>>
>>>> Does anybody have any idea how not to lose any packets? :)
>>>
>>>
>>> Hello,
>>> I tried to disable
>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>> so that data is not processed further in system
>>
>> Well ...
>>
>>> It didn't help.
>>> So I tried to put time_start=ktime_get_real() at the begining of
>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>> add their difference to the global variable
>>> time_total+=time_stop-time_start;
>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>> wrote with dev_info into log (variables were initialized in
>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>> those counters and write them to log), so now I had average time spent
>>> int flexcan_read_frame.
>>> This time it was around 100usec! just with one CAN used, if both were
>>> connected, it was more than twice. And many CAN frames were lost.
>>>
>>> So I tried to disable
>>>      /*
>>>          skb = alloc_can_skb(dev, &cf);
>>>          if (unlikely(!skb)) {
>>>                  stats->rx_dropped++;
>>>                  return 0;
>>>          }
>>>       */
>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>> flexcan_read_fifo call)
>>> And tried to send data again.
>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>> usec! 50x less ...  no CAN frame was lost, even if I was using both CAN
>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data to SD
>>> card.
>>
>> ... but the messages need to be allocated, queued, delivered to and even
>> processed by a user space task. What you messure it part of the network
>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>> me. At what frequency is your CPU running? Is the system low of memory?
>> Maybe your system is simply not fast enough. To see what code is
>> involved just follow:
> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
> and free enough :(
> 
>>
>>    http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>> So I am asking - how to make this alloc_can_skb faster (or is there any
>>> alternative)? Or if there is another way how to get data to user?
>>
>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of approx. 1
>> KHz. So there is 1ms per message. I think it's a latency problem in the
>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>> the queue is full you loose messages. This obviously happens when the
>> SDcard is accessed.
>>
>> Could you take function traces on your system?
> Is there any special tool for this or should I use my start/stop timers?

Your start/stop timers will not show what other activity is disturbing
the CAN messages reception. There is the Linux function tracer:

  http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt

It need to be enabled in the kernel. Especially event and function
tracing could help to better understand your problems.

Wolfgang.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-26 19:40       ` Wolfgang Grandegger
@ 2013-10-29 10:46         ` Martin Kozusky
  2013-10-29 12:03           ` Wolfgang Grandegger
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Kozusky @ 2013-10-29 10:46 UTC (permalink / raw)
  To: linux-can; +Cc: Wolfgang Grandegger

Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>> Hi Martin,
>>>
>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>> Hello,
>>>>> after more than year I'm back with CAN fifo overrun problems on i.MX25
>>>>> board.
>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>> time)
>>>>>
>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>> (situation is a little better if I use just one iface, but I need both)
>>>>> I just configure them and then run:
>>>>>
>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>
>>>>> and start sending data from another device.
>>>>>
>>>>> I am not running any other program (like candump etc) to read from CAN.
>>>>>
>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>> interfaces now connected to same bus so they should receive same data)
>>>>> with ip -d -s link show can0/1
>>>>>
>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>        link/can
>>>>>        can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>> restart-ms 0
>>>>>        bitrate 250000 sample-point 0.857
>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>        clock 66500000
>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>        0          0          0          1          1          0
>>>
>>> Do you have electrical problems on the bus? Or is reaching error-passive
>>> not related to this problem?
>> It is not related to this problem - there is only RX pin connected on
>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>> doing TX)
>>
>>
>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>        151769     19000    1699    0       1699    0
>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>        0          0        0       0       0       0
>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>        link/can
>>>>>        can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>        bitrate 250000 sample-point 0.857
>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>        clock 66500000
>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>        0          0          0          0          0          0
>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>        157377     19696    2664    0       2664    0
>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>        0          0        0       0       0       0
>>>>>
>>>>>
>>>>> With just one iface used:
>>>>>
>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>        link/can
>>>>>        can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>> restart-ms 0
>>>>>        bitrate 250000 sample-point 0.857
>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>        clock 66500000
>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>        0          0          0          1          1          0
>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>        233277     29201    1483    0       1483    0
>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>        0          0        0       0       0       0
>>>>>
>>>>>
>>>>> Too many packets are lost.
>>>>>
>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>> didn't hepl too much, if I put it too high then the system response
>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>> interrupts with (don't know if correctly)
>>>>>      // imx_irq_set_priority(43,14);
>>>>>      // imx_irq_set_priority(44,14);
>>>>>
>>>>> But it didn't help either.
>>>>>
>>>>>
>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>
>>>>
>>>> Hello,
>>>> I tried to disable
>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>> so that data is not processed further in system
>>>
>>> Well ...
>>>
>>>> It didn't help.
>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>> add their difference to the global variable
>>>> time_total+=time_stop-time_start;
>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>> wrote with dev_info into log (variables were initialized in
>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>> those counters and write them to log), so now I had average time spent
>>>> int flexcan_read_frame.
>>>> This time it was around 100usec! just with one CAN used, if both were
>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>
>>>> So I tried to disable
>>>>       /*
>>>>           skb = alloc_can_skb(dev, &cf);
>>>>           if (unlikely(!skb)) {
>>>>                   stats->rx_dropped++;
>>>>                   return 0;
>>>>           }
>>>>        */
>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>> flexcan_read_fifo call)
>>>> And tried to send data again.
>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both CAN
>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data to SD
>>>> card.
>>>
>>> ... but the messages need to be allocated, queued, delivered to and even
>>> processed by a user space task. What you messure it part of the network
>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>> me. At what frequency is your CPU running? Is the system low of memory?
>>> Maybe your system is simply not fast enough. To see what code is
>>> involved just follow:
>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>> and free enough :(
>>
>>>
>>>     http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>> So I am asking - how to make this alloc_can_skb faster (or is there any
>>>> alternative)? Or if there is another way how to get data to user?
>>>
>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of approx. 1
>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>> the queue is full you loose messages. This obviously happens when the
>>> SDcard is accessed.
>>>
>>> Could you take function traces on your system?
>> Is there any special tool for this or should I use my start/stop timers?
>
> Your start/stop timers will not show what other activity is disturbing
> the CAN messages reception. There is the Linux function tracer:
>
>    http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>
> It need to be enabled in the kernel. Especially event and function
> tracing could help to better understand your problems.
Hello Wolfgang,
it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be that easy, will be?
Timestamps that ftrace is showing me are in 10 miliseconds resolution, that won't help me much :(

Martin
  
> Wolfgang.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 10:46         ` Martin Kozusky
@ 2013-10-29 12:03           ` Wolfgang Grandegger
  2013-10-29 12:22             ` Wolfgang Grandegger
  2013-10-29 12:40             ` Martin Kozusky
  0 siblings, 2 replies; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-29 12:03 UTC (permalink / raw)
  To: Martin Kozusky, linux-can

On 10/29/2013 11:46 AM, Martin Kozusky wrote:
> Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>>> Hi Martin,
>>>>
>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>>> Hello,
>>>>>> after more than year I'm back with CAN fifo overrun problems on
>>>>>> i.MX25
>>>>>> board.
>>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>>> time)
>>>>>>
>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>> both)
>>>>>> I just configure them and then run:
>>>>>>
>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>
>>>>>> and start sending data from another device.
>>>>>>
>>>>>> I am not running any other program (like candump etc) to read from
>>>>>> CAN.
>>>>>>
>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>> interfaces now connected to same bus so they should receive same
>>>>>> data)
>>>>>> with ip -d -s link show can0/1
>>>>>>
>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>        link/can
>>>>>>        can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>> restart-ms 0
>>>>>>        bitrate 250000 sample-point 0.857
>>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>        clock 66500000
>>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>        0          0          0          1          1          0
>>>>
>>>> Do you have electrical problems on the bus? Or is reaching
>>>> error-passive
>>>> not related to this problem?
>>> It is not related to this problem - there is only RX pin connected on
>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>> doing TX)
>>>
>>>
>>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>>        151769     19000    1699    0       1699    0
>>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>>        0          0        0       0       0       0
>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>        link/can
>>>>>>        can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>        bitrate 250000 sample-point 0.857
>>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>        clock 66500000
>>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>        0          0          0          0          0          0
>>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>>        157377     19696    2664    0       2664    0
>>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>>        0          0        0       0       0       0
>>>>>>
>>>>>>
>>>>>> With just one iface used:
>>>>>>
>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>        link/can
>>>>>>        can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>> restart-ms 0
>>>>>>        bitrate 250000 sample-point 0.857
>>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>        clock 66500000
>>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>        0          0          0          1          1          0
>>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>>        233277     29201    1483    0       1483    0
>>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>>        0          0        0       0       0       0
>>>>>>
>>>>>>
>>>>>> Too many packets are lost.
>>>>>>
>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>>> didn't hepl too much, if I put it too high then the system response
>>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>>> interrupts with (don't know if correctly)
>>>>>>      // imx_irq_set_priority(43,14);
>>>>>>      // imx_irq_set_priority(44,14);
>>>>>>
>>>>>> But it didn't help either.
>>>>>>
>>>>>>
>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>
>>>>>
>>>>> Hello,
>>>>> I tried to disable
>>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>>> so that data is not processed further in system
>>>>
>>>> Well ...
>>>>
>>>>> It didn't help.
>>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>> add their difference to the global variable
>>>>> time_total+=time_stop-time_start;
>>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>>> wrote with dev_info into log (variables were initialized in
>>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>>> those counters and write them to log), so now I had average time spent
>>>>> int flexcan_read_frame.
>>>>> This time it was around 100usec! just with one CAN used, if both were
>>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>>
>>>>> So I tried to disable
>>>>>       /*
>>>>>           skb = alloc_can_skb(dev, &cf);
>>>>>           if (unlikely(!skb)) {
>>>>>                   stats->rx_dropped++;
>>>>>                   return 0;
>>>>>           }
>>>>>        */
>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>> flexcan_read_fifo call)
>>>>> And tried to send data again.
>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both
>>>>> CAN
>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>> to SD
>>>>> card.
>>>>
>>>> ... but the messages need to be allocated, queued, delivered to and
>>>> even
>>>> processed by a user space task. What you messure it part of the network
>>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>>> me. At what frequency is your CPU running? Is the system low of memory?
>>>> Maybe your system is simply not fast enough. To see what code is
>>>> involved just follow:
>>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>>> and free enough :(
>>>
>>>>
>>>>     http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>> any
>>>>> alternative)? Or if there is another way how to get data to user?
>>>>
>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>> approx. 1
>>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>> the queue is full you loose messages. This obviously happens when the
>>>> SDcard is accessed.
>>>>
>>>> Could you take function traces on your system?
>>> Is there any special tool for this or should I use my start/stop timers?
>>
>> Your start/stop timers will not show what other activity is disturbing
>> the CAN messages reception. There is the Linux function tracer:
>>
>>    http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>
>> It need to be enabled in the kernel. Especially event and function
>> tracing could help to better understand your problems.
> Hello Wolfgang,
> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
> that easy, will be?
> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
> that won't help me much :(

Probably that version is to old for proper ftrace support. The 100us you
measured for alloc_can_skb() is worst case, right? What is the mean value?

Wolfgang.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 12:03           ` Wolfgang Grandegger
@ 2013-10-29 12:22             ` Wolfgang Grandegger
  2013-10-29 12:49               ` Martin Kozusky
  2013-10-29 12:40             ` Martin Kozusky
  1 sibling, 1 reply; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-29 12:22 UTC (permalink / raw)
  To: Martin Kozusky, linux-can

On 10/29/2013 01:03 PM, Wolfgang Grandegger wrote:
> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>> Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
>>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>>>> Hi Martin,
>>>>>
>>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>>>> Hello,
>>>>>>> after more than year I'm back with CAN fifo overrun problems on
>>>>>>> i.MX25
>>>>>>> board.
>>>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>>>> time)
>>>>>>>
>>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>>> both)
>>>>>>> I just configure them and then run:
>>>>>>>
>>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>>
>>>>>>> and start sending data from another device.
>>>>>>>
>>>>>>> I am not running any other program (like candump etc) to read from
>>>>>>> CAN.
>>>>>>>
>>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>>> interfaces now connected to same bus so they should receive same
>>>>>>> data)
>>>>>>> with ip -d -s link show can0/1
>>>>>>>
>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>        link/can
>>>>>>>        can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>> restart-ms 0
>>>>>>>        bitrate 250000 sample-point 0.857
>>>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>        clock 66500000
>>>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>        0          0          0          1          1          0
>>>>>
>>>>> Do you have electrical problems on the bus? Or is reaching
>>>>> error-passive
>>>>> not related to this problem?
>>>> It is not related to this problem - there is only RX pin connected on
>>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>>> doing TX)
>>>>
>>>>
>>>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>        151769     19000    1699    0       1699    0
>>>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>        0          0        0       0       0       0
>>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>        link/can
>>>>>>>        can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>>        bitrate 250000 sample-point 0.857
>>>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>        clock 66500000
>>>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>        0          0          0          0          0          0
>>>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>        157377     19696    2664    0       2664    0
>>>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>        0          0        0       0       0       0
>>>>>>>
>>>>>>>
>>>>>>> With just one iface used:
>>>>>>>
>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>        link/can
>>>>>>>        can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>> restart-ms 0
>>>>>>>        bitrate 250000 sample-point 0.857
>>>>>>>        tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>        flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>        clock 66500000
>>>>>>>        re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>        0          0          0          1          1          0
>>>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>        233277     29201    1483    0       1483    0
>>>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>        0          0        0       0       0       0
>>>>>>>
>>>>>>>
>>>>>>> Too many packets are lost.
>>>>>>>
>>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>>>> didn't hepl too much, if I put it too high then the system response
>>>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>>>> interrupts with (don't know if correctly)
>>>>>>>      // imx_irq_set_priority(43,14);
>>>>>>>      // imx_irq_set_priority(44,14);
>>>>>>>
>>>>>>> But it didn't help either.
>>>>>>>
>>>>>>>
>>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>> I tried to disable
>>>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>>>> so that data is not processed further in system
>>>>>
>>>>> Well ...
>>>>>
>>>>>> It didn't help.
>>>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>>> add their difference to the global variable
>>>>>> time_total+=time_stop-time_start;
>>>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>>>> wrote with dev_info into log (variables were initialized in
>>>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>>>> those counters and write them to log), so now I had average time spent
>>>>>> int flexcan_read_frame.
>>>>>> This time it was around 100usec! just with one CAN used, if both were
>>>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>>>
>>>>>> So I tried to disable
>>>>>>       /*
>>>>>>           skb = alloc_can_skb(dev, &cf);
>>>>>>           if (unlikely(!skb)) {
>>>>>>                   stats->rx_dropped++;
>>>>>>                   return 0;
>>>>>>           }
>>>>>>        */
>>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>>> flexcan_read_fifo call)
>>>>>> And tried to send data again.
>>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both
>>>>>> CAN
>>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>>> to SD
>>>>>> card.
>>>>>
>>>>> ... but the messages need to be allocated, queued, delivered to and
>>>>> even
>>>>> processed by a user space task. What you messure it part of the network
>>>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>>>> me. At what frequency is your CPU running? Is the system low of memory?
>>>>> Maybe your system is simply not fast enough. To see what code is
>>>>> involved just follow:
>>>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>>>> and free enough :(
>>>>
>>>>>
>>>>>     http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>>> any
>>>>>> alternative)? Or if there is another way how to get data to user?
>>>>>
>>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>>> approx. 1
>>>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>>> the queue is full you loose messages. This obviously happens when the
>>>>> SDcard is accessed.
>>>>>
>>>>> Could you take function traces on your system?
>>>> Is there any special tool for this or should I use my start/stop timers?
>>>
>>> Your start/stop timers will not show what other activity is disturbing
>>> the CAN messages reception. There is the Linux function tracer:
>>>
>>>    http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>>
>>> It need to be enabled in the kernel. Especially event and function
>>> tracing could help to better understand your problems.
>> Hello Wolfgang,
>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>> that easy, will be?
>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>> that won't help me much :(
> 
> Probably that version is to old for proper ftrace support. The 100us you
> measured for alloc_can_skb() is worst case, right? What is the mean value?

Flexcan support was added to the mainline kernel 2.6.36. Where did you
get your flexcan driver from? Could you post it please here? Any chance
to switch to a (more) recent version of the Linux kernel?

Wolfgang.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 12:03           ` Wolfgang Grandegger
  2013-10-29 12:22             ` Wolfgang Grandegger
@ 2013-10-29 12:40             ` Martin Kozusky
  2013-10-29 14:30               ` Wolfgang Grandegger
  1 sibling, 1 reply; 17+ messages in thread
From: Martin Kozusky @ 2013-10-29 12:40 UTC (permalink / raw)
  To: linux-can; +Cc: Wolfgang Grandegger

Dne 29.10.2013 13:03, Wolfgang Grandegger napsal(a):
> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>> Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
>>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>>>> Hi Martin,
>>>>>
>>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>>>> Hello,
>>>>>>> after more than year I'm back with CAN fifo overrun problems on
>>>>>>> i.MX25
>>>>>>> board.
>>>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>>>> time)
>>>>>>>
>>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>>> both)
>>>>>>> I just configure them and then run:
>>>>>>>
>>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>>
>>>>>>> and start sending data from another device.
>>>>>>>
>>>>>>> I am not running any other program (like candump etc) to read from
>>>>>>> CAN.
>>>>>>>
>>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>>> interfaces now connected to same bus so they should receive same
>>>>>>> data)
>>>>>>> with ip -d -s link show can0/1
>>>>>>>
>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>         link/can
>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>> restart-ms 0
>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>         clock 66500000
>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>         0          0          0          1          1          0
>>>>>
>>>>> Do you have electrical problems on the bus? Or is reaching
>>>>> error-passive
>>>>> not related to this problem?
>>>> It is not related to this problem - there is only RX pin connected on
>>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>>> doing TX)
>>>>
>>>>
>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>         151769     19000    1699    0       1699    0
>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>         0          0        0       0       0       0
>>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>         link/can
>>>>>>>         can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>         clock 66500000
>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>         0          0          0          0          0          0
>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>         157377     19696    2664    0       2664    0
>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>         0          0        0       0       0       0
>>>>>>>
>>>>>>>
>>>>>>> With just one iface used:
>>>>>>>
>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>         link/can
>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>> restart-ms 0
>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>         clock 66500000
>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>         0          0          0          1          1          0
>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>         233277     29201    1483    0       1483    0
>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>         0          0        0       0       0       0
>>>>>>>
>>>>>>>
>>>>>>> Too many packets are lost.
>>>>>>>
>>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>>>> didn't hepl too much, if I put it too high then the system response
>>>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>>>> interrupts with (don't know if correctly)
>>>>>>>       // imx_irq_set_priority(43,14);
>>>>>>>       // imx_irq_set_priority(44,14);
>>>>>>>
>>>>>>> But it didn't help either.
>>>>>>>
>>>>>>>
>>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>>
>>>>>>
>>>>>> Hello,
>>>>>> I tried to disable
>>>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>>>> so that data is not processed further in system
>>>>>
>>>>> Well ...
>>>>>
>>>>>> It didn't help.
>>>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>>> add their difference to the global variable
>>>>>> time_total+=time_stop-time_start;
>>>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>>>> wrote with dev_info into log (variables were initialized in
>>>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>>>> those counters and write them to log), so now I had average time spent
>>>>>> int flexcan_read_frame.
>>>>>> This time it was around 100usec! just with one CAN used, if both were
>>>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>>>
>>>>>> So I tried to disable
>>>>>>        /*
>>>>>>            skb = alloc_can_skb(dev, &cf);
>>>>>>            if (unlikely(!skb)) {
>>>>>>                    stats->rx_dropped++;
>>>>>>                    return 0;
>>>>>>            }
>>>>>>         */
>>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>>> flexcan_read_fifo call)
>>>>>> And tried to send data again.
>>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both
>>>>>> CAN
>>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>>> to SD
>>>>>> card.
>>>>>
>>>>> ... but the messages need to be allocated, queued, delivered to and
>>>>> even
>>>>> processed by a user space task. What you messure it part of the network
>>>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>>>> me. At what frequency is your CPU running? Is the system low of memory?
>>>>> Maybe your system is simply not fast enough. To see what code is
>>>>> involved just follow:
>>>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>>>> and free enough :(
>>>>
>>>>>
>>>>>      http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>>> any
>>>>>> alternative)? Or if there is another way how to get data to user?
>>>>>
>>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>>> approx. 1
>>>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>>> the queue is full you loose messages. This obviously happens when the
>>>>> SDcard is accessed.
>>>>>
>>>>> Could you take function traces on your system?
>>>> Is there any special tool for this or should I use my start/stop timers?
>>>
>>> Your start/stop timers will not show what other activity is disturbing
>>> the CAN messages reception. There is the Linux function tracer:
>>>
>>>     http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>>
>>> It need to be enabled in the kernel. Especially event and function
>>> tracing could help to better understand your problems.
>> Hello Wolfgang,
>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>> that easy, will be?
>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>> that won't help me much :(
>
> Probably that version is to old for proper ftrace support. The 100us you
> measured for alloc_can_skb() is worst case, right? What is the mean value?

Now I checked again and logged every call (don't know if realy everything was logged but something was :) and I see that it is not 100usec, but only around 20usec (mean value - checked by eye). There were some very long calls (around 2ms!) that were puttings errors in my  sum/count formula (may be I should filter out calls longer that 200usec), with this error it was not 100usec, but almost 1ms (my value was only 32bit and was overflowing when I wrote it first time). So normally around 20usec, but with very long calls around 2-3 ms (looks like those long are periodic - each 6th - 8th call is much longer, but not all the time)
But still with those 20usec call, there are many RX overflows, if I disable the call alloc_can_skb, there are no overflows and all is received (but still not processed further, because I don't have skb ... )

Martin

> Wolfgang.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 12:22             ` Wolfgang Grandegger
@ 2013-10-29 12:49               ` Martin Kozusky
  2013-10-29 12:54                 ` Gary Thomas
  0 siblings, 1 reply; 17+ messages in thread
From: Martin Kozusky @ 2013-10-29 12:49 UTC (permalink / raw)
  To: linux-can; +Cc: Wolfgang Grandegger

Dne 29.10.2013 13:22, Wolfgang Grandegger napsal(a):
> On 10/29/2013 01:03 PM, Wolfgang Grandegger wrote:
>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>>> Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
>>>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>>>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>>>>> Hi Martin,
>>>>>>
>>>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>>>>> Hello,
>>>>>>>> after more than year I'm back with CAN fifo overrun problems on
>>>>>>>> i.MX25
>>>>>>>> board.
>>>>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>>>>> time)
>>>>>>>>
>>>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>>>> both)
>>>>>>>> I just configure them and then run:
>>>>>>>>
>>>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>>>
>>>>>>>> and start sending data from another device.
>>>>>>>>
>>>>>>>> I am not running any other program (like candump etc) to read from
>>>>>>>> CAN.
>>>>>>>>
>>>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>>>> interfaces now connected to same bus so they should receive same
>>>>>>>> data)
>>>>>>>> with ip -d -s link show can0/1
>>>>>>>>
>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>         link/can
>>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>> restart-ms 0
>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>         clock 66500000
>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>         0          0          0          1          1          0
>>>>>>
>>>>>> Do you have electrical problems on the bus? Or is reaching
>>>>>> error-passive
>>>>>> not related to this problem?
>>>>> It is not related to this problem - there is only RX pin connected on
>>>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>>>> doing TX)
>>>>>
>>>>>
>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>         151769     19000    1699    0       1699    0
>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>         0          0        0       0       0       0
>>>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>         link/can
>>>>>>>>         can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>         clock 66500000
>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>         0          0          0          0          0          0
>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>         157377     19696    2664    0       2664    0
>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>         0          0        0       0       0       0
>>>>>>>>
>>>>>>>>
>>>>>>>> With just one iface used:
>>>>>>>>
>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>         link/can
>>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>> restart-ms 0
>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>         clock 66500000
>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>         0          0          0          1          1          0
>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>         233277     29201    1483    0       1483    0
>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>         0          0        0       0       0       0
>>>>>>>>
>>>>>>>>
>>>>>>>> Too many packets are lost.
>>>>>>>>
>>>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>>>>> didn't hepl too much, if I put it too high then the system response
>>>>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>>>>> interrupts with (don't know if correctly)
>>>>>>>>       // imx_irq_set_priority(43,14);
>>>>>>>>       // imx_irq_set_priority(44,14);
>>>>>>>>
>>>>>>>> But it didn't help either.
>>>>>>>>
>>>>>>>>
>>>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>>>
>>>>>>>
>>>>>>> Hello,
>>>>>>> I tried to disable
>>>>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>>>>> so that data is not processed further in system
>>>>>>
>>>>>> Well ...
>>>>>>
>>>>>>> It didn't help.
>>>>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>>>> add their difference to the global variable
>>>>>>> time_total+=time_stop-time_start;
>>>>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>>>>> wrote with dev_info into log (variables were initialized in
>>>>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>>>>> those counters and write them to log), so now I had average time spent
>>>>>>> int flexcan_read_frame.
>>>>>>> This time it was around 100usec! just with one CAN used, if both were
>>>>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>>>>
>>>>>>> So I tried to disable
>>>>>>>        /*
>>>>>>>            skb = alloc_can_skb(dev, &cf);
>>>>>>>            if (unlikely(!skb)) {
>>>>>>>                    stats->rx_dropped++;
>>>>>>>                    return 0;
>>>>>>>            }
>>>>>>>         */
>>>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>>>> flexcan_read_fifo call)
>>>>>>> And tried to send data again.
>>>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both
>>>>>>> CAN
>>>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>>>> to SD
>>>>>>> card.
>>>>>>
>>>>>> ... but the messages need to be allocated, queued, delivered to and
>>>>>> even
>>>>>> processed by a user space task. What you messure it part of the network
>>>>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>>>>> me. At what frequency is your CPU running? Is the system low of memory?
>>>>>> Maybe your system is simply not fast enough. To see what code is
>>>>>> involved just follow:
>>>>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>>>>> and free enough :(
>>>>>
>>>>>>
>>>>>>      http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>>>> any
>>>>>>> alternative)? Or if there is another way how to get data to user?
>>>>>>
>>>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>>>> approx. 1
>>>>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>>>> the queue is full you loose messages. This obviously happens when the
>>>>>> SDcard is accessed.
>>>>>>
>>>>>> Could you take function traces on your system?
>>>>> Is there any special tool for this or should I use my start/stop timers?
>>>>
>>>> Your start/stop timers will not show what other activity is disturbing
>>>> the CAN messages reception. There is the Linux function tracer:
>>>>
>>>>     http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>>>
>>>> It need to be enabled in the kernel. Especially event and function
>>>> tracing could help to better understand your problems.
>>> Hello Wolfgang,
>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>> that easy, will be?
>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>> that won't help me much :(
>>
>> Probably that version is to old for proper ftrace support. The 100us you
>> measured for alloc_can_skb() is worst case, right? What is the mean value?
>
> Flexcan support was added to the mainline kernel 2.6.36. Where did you
> get your flexcan driver from? Could you post it please here? Any chance
> to switch to a (more) recent version of the Linux kernel?

It is 2.6.35.9 kernel, I think flexcan was backported from 2.6.36 by board developer who made patch for this kernel. But I am keeping it "updated" with latest updates from 3.x kernel, so I think there should be no errors in this driver.
I tried to switch to 2.6.39 but I think there were some errors with the kernel patch that adds support for this board so I gave up. May be I should try again.

Martin

> Wolfgang.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 12:49               ` Martin Kozusky
@ 2013-10-29 12:54                 ` Gary Thomas
  2013-10-29 13:00                   ` "Martin Kožuský [KK micro s.r.o.]"
  0 siblings, 1 reply; 17+ messages in thread
From: Gary Thomas @ 2013-10-29 12:54 UTC (permalink / raw)
  To: Martin Kozusky, linux-can; +Cc: Wolfgang Grandegger

On 2013-10-29 06:49, Martin Kozusky wrote:
> Dne 29.10.2013 13:22, Wolfgang Grandegger napsal(a):
>> On 10/29/2013 01:03 PM, Wolfgang Grandegger wrote:
>>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>>>> Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
>>>>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>>>>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>>>>>> Hi Martin,
>>>>>>>
>>>>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>>>>>> Hello,
>>>>>>>>> after more than year I'm back with CAN fifo overrun problems on
>>>>>>>>> i.MX25
>>>>>>>>> board.
>>>>>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>>>>>> time)
>>>>>>>>>
>>>>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>>>>> both)
>>>>>>>>> I just configure them and then run:
>>>>>>>>>
>>>>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>>>>
>>>>>>>>> and start sending data from another device.
>>>>>>>>>
>>>>>>>>> I am not running any other program (like candump etc) to read from
>>>>>>>>> CAN.
>>>>>>>>>
>>>>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>>>>> interfaces now connected to same bus so they should receive same
>>>>>>>>> data)
>>>>>>>>> with ip -d -s link show can0/1
>>>>>>>>>
>>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>>         link/can
>>>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>>> restart-ms 0
>>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>>         clock 66500000
>>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>>         0          0          0          1          1          0
>>>>>>>
>>>>>>> Do you have electrical problems on the bus? Or is reaching
>>>>>>> error-passive
>>>>>>> not related to this problem?
>>>>>> It is not related to this problem - there is only RX pin connected on
>>>>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>>>>> doing TX)
>>>>>>
>>>>>>
>>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>>         151769     19000    1699    0       1699    0
>>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>>         0          0        0       0       0       0
>>>>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>>         link/can
>>>>>>>>>         can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>>         clock 66500000
>>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>>         0          0          0          0          0          0
>>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>>         157377     19696    2664    0       2664    0
>>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>>         0          0        0       0       0       0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With just one iface used:
>>>>>>>>>
>>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>>         link/can
>>>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>>> restart-ms 0
>>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>>         clock 66500000
>>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>>         0          0          0          1          1          0
>>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>>         233277     29201    1483    0       1483    0
>>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>>         0          0        0       0       0       0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Too many packets are lost.
>>>>>>>>>
>>>>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>>>>>> didn't hepl too much, if I put it too high then the system response
>>>>>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>>>>>> interrupts with (don't know if correctly)
>>>>>>>>>       // imx_irq_set_priority(43,14);
>>>>>>>>>       // imx_irq_set_priority(44,14);
>>>>>>>>>
>>>>>>>>> But it didn't help either.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>>>>
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>> I tried to disable
>>>>>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>>>>>> so that data is not processed further in system
>>>>>>>
>>>>>>> Well ...
>>>>>>>
>>>>>>>> It didn't help.
>>>>>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>>>>> add their difference to the global variable
>>>>>>>> time_total+=time_stop-time_start;
>>>>>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>>>>>> wrote with dev_info into log (variables were initialized in
>>>>>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>>>>>> those counters and write them to log), so now I had average time spent
>>>>>>>> int flexcan_read_frame.
>>>>>>>> This time it was around 100usec! just with one CAN used, if both were
>>>>>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>>>>>
>>>>>>>> So I tried to disable
>>>>>>>>        /*
>>>>>>>>            skb = alloc_can_skb(dev, &cf);
>>>>>>>>            if (unlikely(!skb)) {
>>>>>>>>                    stats->rx_dropped++;
>>>>>>>>                    return 0;
>>>>>>>>            }
>>>>>>>>         */
>>>>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>>>>> flexcan_read_fifo call)
>>>>>>>> And tried to send data again.
>>>>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both
>>>>>>>> CAN
>>>>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>>>>> to SD
>>>>>>>> card.
>>>>>>>
>>>>>>> ... but the messages need to be allocated, queued, delivered to and
>>>>>>> even
>>>>>>> processed by a user space task. What you messure it part of the network
>>>>>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>>>>>> me. At what frequency is your CPU running? Is the system low of memory?
>>>>>>> Maybe your system is simply not fast enough. To see what code is
>>>>>>> involved just follow:
>>>>>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>>>>>> and free enough :(
>>>>>>
>>>>>>>
>>>>>>>      http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>>>>> any
>>>>>>>> alternative)? Or if there is another way how to get data to user?
>>>>>>>
>>>>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>>>>> approx. 1
>>>>>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>>>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>>>>> the queue is full you loose messages. This obviously happens when the
>>>>>>> SDcard is accessed.
>>>>>>>
>>>>>>> Could you take function traces on your system?
>>>>>> Is there any special tool for this or should I use my start/stop timers?
>>>>>
>>>>> Your start/stop timers will not show what other activity is disturbing
>>>>> the CAN messages reception. There is the Linux function tracer:
>>>>>
>>>>>     http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>>>>
>>>>> It need to be enabled in the kernel. Especially event and function
>>>>> tracing could help to better understand your problems.
>>>> Hello Wolfgang,
>>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>>> that easy, will be?
>>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>>> that won't help me much :(
>>>
>>> Probably that version is to old for proper ftrace support. The 100us you
>>> measured for alloc_can_skb() is worst case, right? What is the mean value?
>>
>> Flexcan support was added to the mainline kernel 2.6.36. Where did you
>> get your flexcan driver from? Could you post it please here? Any chance
>> to switch to a (more) recent version of the Linux kernel?
>
> It is 2.6.35.9 kernel, I think flexcan was backported from 2.6.36 by board developer who made patch for this kernel. But I am keeping it "updated" with latest updates from 3.x
> kernel, so I think there should be no errors in this driver.
> I tried to switch to 2.6.39 but I think there were some errors with the kernel patch that adds support for this board so I gave up. May be I should try again.

What board are you using?

I've had good success with the mainline kernel 3.4 on i.MX25
n.b. I haven't tried CAN with that board yet, but the flexcan
driver is standard in that version.

-- 
------------------------------------------------------------
Gary Thomas                 |  Consulting for the
MLB Associates              |    Embedded world
------------------------------------------------------------

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 12:54                 ` Gary Thomas
@ 2013-10-29 13:00                   ` "Martin Kožuský [KK micro s.r.o.]"
  0 siblings, 0 replies; 17+ messages in thread
From: "Martin Kožuský [KK micro s.r.o.]" @ 2013-10-29 13:00 UTC (permalink / raw)
  To: Gary Thomas, linux-can; +Cc: Wolfgang Grandegger

-------- Original Message  --------
Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
From: Gary Thomas
To: Martin Kozusky, linux-can@vger.kernel.org
Date: 29. Říjen 2013 13:54:01

> On 2013-10-29 06:49, Martin Kozusky wrote:
>> Dne 29.10.2013 13:22, Wolfgang Grandegger napsal(a):
>>> On 10/29/2013 01:03 PM, Wolfgang Grandegger wrote:
>>>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>>>>> Dne 26.10.2013 21:40, Wolfgang Grandegger napsal(a):
>>>>>> On 10/26/2013 08:18 PM, Martin Kozusky wrote:
>>>>>>> Dne 25.10.2013 19:58, Wolfgang Grandegger napsal(a):
>>>>>>>> Hi Martin,
>>>>>>>>
>>>>>>>> On 10/25/2013 02:59 PM, Martin Kozusky wrote:
>>>>>>>>> Dne 24.10.2013 15:48, Martin Kozusky napsal(a):
>>>>>>>>>> Hello,
>>>>>>>>>> after more than year I'm back with CAN fifo overrun problems on
>>>>>>>>>> i.MX25
>>>>>>>>>> board.
>>>>>>>>>> (it was good enough earlier, if some frames were lost, but not this
>>>>>>>>>> time)
>>>>>>>>>>
>>>>>>>>>> I have 2 flexcan interfaces, each receiving around 1100 msgs/s
>>>>>>>>>> (situation is a little better if I use just one iface, but I need
>>>>>>>>>> both)
>>>>>>>>>> I just configure them and then run:
>>>>>>>>>>
>>>>>>>>>> I=0; while [ $I -le 20 ]; do dd if=/dev/zero of=/mnt/mmcblk0p1/test
>>>>>>>>>> bs=512 count=200; sync; sleep 1; I=$(($I+1)); done
>>>>>>>>>> (simulate writing to SDcard with 100KB blocks in 1 sec intervals)
>>>>>>>>>>
>>>>>>>>>> and start sending data from another device.
>>>>>>>>>>
>>>>>>>>>> I am not running any other program (like candump etc) to read from
>>>>>>>>>> CAN.
>>>>>>>>>>
>>>>>>>>>> this is what is shown after I finish sending 35777 packets (both
>>>>>>>>>> interfaces now connected to same bus so they should receive same
>>>>>>>>>> data)
>>>>>>>>>> with ip -d -s link show can0/1
>>>>>>>>>>
>>>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>>>         link/can
>>>>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>>>> restart-ms 0
>>>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>>>         clock 66500000
>>>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>>>         0          0          0          1          1          0
>>>>>>>>
>>>>>>>> Do you have electrical problems on the bus? Or is reaching
>>>>>>>> error-passive
>>>>>>>> not related to this problem?
>>>>>>> It is not related to this problem - there is only RX pin connected on
>>>>>>> can0 (RX is connected in parallel with Coldfire V1 MCU CAN, which is
>>>>>>> doing TX)
>>>>>>>
>>>>>>>
>>>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>>>         151769     19000    1699    0       1699    0
>>>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>>>         0          0        0       0       0       0
>>>>>>>>>> root@vmx25 /opt/waytracer$ /root/utils/ip -d -s link show can1
>>>>>>>>>> 3: can1: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>>>         link/can
>>>>>>>>>>         can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
>>>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>>>         clock 66500000
>>>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>>>         0          0          0          0          0          0
>>>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>>>         157377     19696    2664    0       2664    0
>>>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>>>         0          0        0       0       0       0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> With just one iface used:
>>>>>>>>>>
>>>>>>>>>> 2: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN qlen 10
>>>>>>>>>>         link/can
>>>>>>>>>>         can <LISTEN-ONLY> state STOPPED (berr-counter tx 0 rx 0)
>>>>>>>>>> restart-ms 0
>>>>>>>>>>         bitrate 250000 sample-point 0.857
>>>>>>>>>>         tq 285 prop-seg 5 phase-seg1 6 phase-seg2 2 sjw 1
>>>>>>>>>>         flexcan: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 1..256 brp-inc 1
>>>>>>>>>>         clock 66500000
>>>>>>>>>>         re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>>>>>>>>         0          0          0          1          1          0
>>>>>>>>>>         RX: bytes  packets  errors  dropped overrun mcast
>>>>>>>>>>         233277     29201    1483    0       1483    0
>>>>>>>>>>         TX: bytes  packets  errors  dropped carrier collsns
>>>>>>>>>>         0          0        0       0       0       0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Too many packets are lost.
>>>>>>>>>>
>>>>>>>>>> I tried to play with FLEXCAN_NAPI_WEIGHT (quota for napi) and that
>>>>>>>>>> didn't hepl too much, if I put it too high then the system response
>>>>>>>>>> was slow and packets still lost, also tried to change priority of CAN
>>>>>>>>>> interrupts with (don't know if correctly)
>>>>>>>>>>       // imx_irq_set_priority(43,14);
>>>>>>>>>>       // imx_irq_set_priority(44,14);
>>>>>>>>>>
>>>>>>>>>> But it didn't help either.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Does anybody have any idea how not to lose any packets? :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> I tried to disable
>>>>>>>>> //netif_receive_skb(skb); in  flexcan_read_frame() and other functions
>>>>>>>>> so that data is not processed further in system
>>>>>>>>
>>>>>>>> Well ...
>>>>>>>>
>>>>>>>>> It didn't help.
>>>>>>>>> So I tried to put time_start=ktime_get_real() at the begining of
>>>>>>>>> flexcan_read_frame(), then time_stop=ktime_get_real(); at the end and
>>>>>>>>> add their difference to the global variable
>>>>>>>>> time_total+=time_stop-time_start;
>>>>>>>>> I divided this time_total by rx_packets count at flexcan_chip_stop and
>>>>>>>>> wrote with dev_info into log (variables were initialized in
>>>>>>>>> flexcan_chi_start, so I could just do ifconfig can0 up/down and reset
>>>>>>>>> those counters and write them to log), so now I had average time spent
>>>>>>>>> int flexcan_read_frame.
>>>>>>>>> This time it was around 100usec! just with one CAN used, if both were
>>>>>>>>> connected, it was more than twice. And many CAN frames were lost.
>>>>>>>>>
>>>>>>>>> So I tried to disable
>>>>>>>>>        /*
>>>>>>>>>            skb = alloc_can_skb(dev, &cf);
>>>>>>>>>            if (unlikely(!skb)) {
>>>>>>>>>                    stats->rx_dropped++;
>>>>>>>>>                    return 0;
>>>>>>>>>            }
>>>>>>>>>         */
>>>>>>>>> and made "struct can_frame cf" (not pointer, so that I can use it in
>>>>>>>>> flexcan_read_fifo call)
>>>>>>>>> And tried to send data again.
>>>>>>>>> Now - average time in flexcan_read_frame was not 100usec, but just 2
>>>>>>>>> usec! 50x less ...  no CAN frame was lost, even if I was using both
>>>>>>>>> CAN
>>>>>>>>> interfaces, each getting over 1100 msgs/sec and writing 100KB data
>>>>>>>>> to SD
>>>>>>>>> card.
>>>>>>>>
>>>>>>>> ... but the messages need to be allocated, queued, delivered to and
>>>>>>>> even
>>>>>>>> processed by a user space task. What you messure it part of the network
>>>>>>>> stack overhead but 100us just for alloc_can_skb() seems quite a lot to
>>>>>>>> me. At what frequency is your CPU running? Is the system low of memory?
>>>>>>>> Maybe your system is simply not fast enough. To see what code is
>>>>>>>> involved just follow:
>>>>>>> CPU is i.MX25, should be running at 400MHz. There is 64MB RAM totally
>>>>>>> and free enough :(
>>>>>>>
>>>>>>>>
>>>>>>>>      http://lxr.free-electrons.com/ident?i=alloc_can_skb
>>>>>>>>> So I am asking - how to make this alloc_can_skb faster (or is there
>>>>>>>>> any
>>>>>>>>> alternative)? Or if there is another way how to get data to user?
>>>>>>>>
>>>>>>>> Well, not with Linux-CAN. Anyway, messages arrive at a rate of
>>>>>>>> approx. 1
>>>>>>>> KHz. So there is 1ms per message. I think it's a latency problem in the
>>>>>>>> first place. The Flexcan on the i.MX25 can queue up to 5 messages. If
>>>>>>>> the queue is full you loose messages. This obviously happens when the
>>>>>>>> SDcard is accessed.
>>>>>>>>
>>>>>>>> Could you take function traces on your system?
>>>>>>> Is there any special tool for this or should I use my start/stop timers?
>>>>>>
>>>>>> Your start/stop timers will not show what other activity is disturbing
>>>>>> the CAN messages reception. There is the Linux function tracer:
>>>>>>
>>>>>>     http://lxr.free-electrons.com/source/Documentation/trace/ftrace.txt
>>>>>>
>>>>>> It need to be enabled in the kernel. Especially event and function
>>>>>> tracing could help to better understand your problems.
>>>>> Hello Wolfgang,
>>>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>>>> that easy, will be?
>>>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>>>> that won't help me much :(
>>>>
>>>> Probably that version is to old for proper ftrace support. The 100us you
>>>> measured for alloc_can_skb() is worst case, right? What is the mean value?
>>>
>>> Flexcan support was added to the mainline kernel 2.6.36. Where did you
>>> get your flexcan driver from? Could you post it please here? Any chance
>>> to switch to a (more) recent version of the Linux kernel?
>>
>> It is 2.6.35.9 kernel, I think flexcan was backported from 2.6.36 by board developer who made patch for this kernel. But I am keeping it "updated" with latest updates from 3.x
>> kernel, so I think there should be no errors in this driver.
>> I tried to switch to 2.6.39 but I think there were some errors with the kernel patch that adds support for this board so I gave up. May be I should try again.
>
> What board are you using?
>
> I've had good success with the mainline kernel 3.4 on i.MX25
> n.b. I haven't tried CAN with that board yet, but the flexcan
> driver is standard in that version.
>
This one: http://www.voipac.com/#X25-DMM-254
They don't have patch for 3.x yet, they are working on WinCE for this one now :) So may be later ...

Martin


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 12:40             ` Martin Kozusky
@ 2013-10-29 14:30               ` Wolfgang Grandegger
  2013-10-30  8:43                 ` "Martin Kožuský [KK micro s.r.o.]"
  0 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-29 14:30 UTC (permalink / raw)
  To: Martin Kozusky, linux-can

On 10/29/2013 01:40 PM, Martin Kozusky wrote:
> Dne 29.10.2013 13:03, Wolfgang Grandegger napsal(a):
>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
...
>>> Hello Wolfgang,
>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>> that easy, will be?
>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>> that won't help me much :(

Are high resolution timers enabled in the kernel? Still, event tracing
could would already be useful.

>> Probably that version is to old for proper ftrace support. The 100us you
>> measured for alloc_can_skb() is worst case, right? What is the mean
>> value?
> 
> Now I checked again and logged every call (don't know if realy
> everything was logged but something was :) and I see that it is not
> 100usec, but only around 20usec (mean value - checked by eye). There
> were some very long calls (around 2ms!) that were puttings errors in my 
> sum/count formula (may be I should filter out calls longer that
> 200usec), with this error it was not 100usec, but almost 1ms (my value
> was only 32bit and was overflowing when I wrote it first time). So
> normally around 20usec, but with very long calls around 2-3 ms (looks
> like those long are periodic - each 6th - 8th call is much longer, but
> not all the time)

Are these long latencies related to the SDcard accesses? I think the
problem is the rather long latencies caused by other kernel activity. In
your case caused by the MMC (SDcard) driver, I assume. The Flexcan
controller does buffer up to 5 messages before loosing packets.

> But still with those 20usec call, there are many RX overflows, if I
> disable the call alloc_can_skb, there are no overflows and all is
> received (but still not processed further, because I don't have skb ... )

Could you run this test for a longer time? The probability is just lower
that RX gets interrupted for a longer time, I think.

I see a few approaches to overcome this problem:

- fix the MMC driver to cause less latency. If you are lucky this is
  already the case in more recent versions of the kernel.

- Use the "-rt" extension (CONFIG_PREEPMT_RT). It will then allow to
  adjust priorities of soft and hard irq threads.

- Do the RX processing in the interrupt context, which would require
  some custom modifications.

Wolfgang.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-29 14:30               ` Wolfgang Grandegger
@ 2013-10-30  8:43                 ` "Martin Kožuský [KK micro s.r.o.]"
  2013-10-30  9:04                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 17+ messages in thread
From: "Martin Kožuský [KK micro s.r.o.]" @ 2013-10-30  8:43 UTC (permalink / raw)
  To: Wolfgang Grandegger, linux-can

-------- Original Message  --------
Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
From: Wolfgang Grandegger
To: Martin Kozusky, linux-can@vger.kernel.org
Date: 29. Říjen 2013 15:30:08

> On 10/29/2013 01:40 PM, Martin Kozusky wrote:
>> Dne 29.10.2013 13:03, Wolfgang Grandegger napsal(a):
>>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
> ...
>>>> Hello Wolfgang,
>>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>>> that easy, will be?
>>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>>> that won't help me much :(
>
> Are high resolution timers enabled in the kernel? Still, event tracing
> could would already be useful.
>
>>> Probably that version is to old for proper ftrace support. The 100us you
>>> measured for alloc_can_skb() is worst case, right? What is the mean
>>> value?
>>
>> Now I checked again and logged every call (don't know if realy
>> everything was logged but something was :) and I see that it is not
>> 100usec, but only around 20usec (mean value - checked by eye). There
>> were some very long calls (around 2ms!) that were puttings errors in my
>> sum/count formula (may be I should filter out calls longer that
>> 200usec), with this error it was not 100usec, but almost 1ms (my value
>> was only 32bit and was overflowing when I wrote it first time). So
>> normally around 20usec, but with very long calls around 2-3 ms (looks
>> like those long are periodic - each 6th - 8th call is much longer, but
>> not all the time)
>
> Are these long latencies related to the SDcard accesses? I think the
> problem is the rather long latencies caused by other kernel activity. In
> your case caused by the MMC (SDcard) driver, I assume. The Flexcan
> controller does buffer up to 5 messages before loosing packets.
I think it is not primary related to SD card, those tests I was doing lately were done when system was idle, no special processes were running. But when I do access SD card, problems get bigger.

>> But still with those 20usec call, there are many RX overflows, if I
>> disable the call alloc_can_skb, there are no overflows and all is
>> received (but still not processed further, because I don't have skb ... )
>
> Could you run this test for a longer time? The probability is just lower
> that RX gets interrupted for a longer time, I think.
I have run it for a few minutes, problem is still the same :(
  
> I see a few approaches to overcome this problem:
>
> - fix the MMC driver to cause less latency. If you are lucky this is
>    already the case in more recent versions of the kernel.
Hard to say if this helps when it's also doing in idle.

> - Use the "-rt" extension (CONFIG_PREEPMT_RT). It will then allow to
>    adjust priorities of soft and hard irq threads.
I don't know if there is a patch for this :(
  
> - Do the RX processing in the interrupt context, which would require
>    some custom modifications.
I was thinking about writing data directly to fifo file, my program would read from it
  
Martin

> Wolfgang.
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-30  8:43                 ` "Martin Kožuský [KK micro s.r.o.]"
@ 2013-10-30  9:04                   ` Wolfgang Grandegger
  2013-10-30  9:14                     ` "Martin Kožuský [KK micro s.r.o.]"
  0 siblings, 1 reply; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-30  9:04 UTC (permalink / raw)
  To: "Martin Kožuský [KK micro s.r.o.]", linux-can

On 10/30/2013 09:43 AM, "Martin Kožuský [KK micro s.r.o.]" wrote:
> -------- Original Message  --------
> Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued
> (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
> From: Wolfgang Grandegger
> To: Martin Kozusky, linux-can@vger.kernel.org
> Date: 29. Říjen 2013 15:30:08
> 
>> On 10/29/2013 01:40 PM, Martin Kozusky wrote:
>>> Dne 29.10.2013 13:03, Wolfgang Grandegger napsal(a):
>>>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>> ...
>>>>> Hello Wolfgang,
>>>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>>>> that easy, will be?
>>>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>>>> that won't help me much :(
>>
>> Are high resolution timers enabled in the kernel? Still, event tracing
>> could would already be useful.
>>
>>>> Probably that version is to old for proper ftrace support. The 100us
>>>> you
>>>> measured for alloc_can_skb() is worst case, right? What is the mean
>>>> value?
>>>
>>> Now I checked again and logged every call (don't know if realy
>>> everything was logged but something was :) and I see that it is not
>>> 100usec, but only around 20usec (mean value - checked by eye). There
>>> were some very long calls (around 2ms!) that were puttings errors in my
>>> sum/count formula (may be I should filter out calls longer that
>>> 200usec), with this error it was not 100usec, but almost 1ms (my value
>>> was only 32bit and was overflowing when I wrote it first time). So
>>> normally around 20usec, but with very long calls around 2-3 ms (looks
>>> like those long are periodic - each 6th - 8th call is much longer, but
>>> not all the time)
>>
>> Are these long latencies related to the SDcard accesses? I think the
>> problem is the rather long latencies caused by other kernel activity. In
>> your case caused by the MMC (SDcard) driver, I assume. The Flexcan
>> controller does buffer up to 5 messages before loosing packets.
> I think it is not primary related to SD card, those tests I was doing
> lately were done when system was idle, no special processes were
> running. But when I do access SD card, problems get bigger.
> 
>>> But still with those 20usec call, there are many RX overflows, if I
>>> disable the call alloc_can_skb, there are no overflows and all is
>>> received (but still not processed further, because I don't have skb
>>> ... )
>>
>> Could you run this test for a longer time? The probability is just lower
>> that RX gets interrupted for a longer time, I think.
> I have run it for a few minutes, problem is still the same :(
>  
>> I see a few approaches to overcome this problem:
>>
>> - fix the MMC driver to cause less latency. If you are lucky this is
>>    already the case in more recent versions of the kernel.
> Hard to say if this helps when it's also doing in idle.

OK, then I misinterpreted your error description.

>> - Use the "-rt" extension (CONFIG_PREEPMT_RT). It will then allow to
>>    adjust priorities of soft and hard irq threads.
> I don't know if there is a patch for this :(
>  
>> - Do the RX processing in the interrupt context, which would require
>>    some custom modifications.
> I was thinking about writing data directly to fifo file, my program
> would read from it

Well, before hacking something you should try to find out what is
provoking the long latencies (> 1ms). FTrace is your friend. Therefore I
would try to get a more recent version of the kernel running somehow.
2.6.39 should already be much better.

Wolfgang.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-30  9:04                   ` Wolfgang Grandegger
@ 2013-10-30  9:14                     ` "Martin Kožuský [KK micro s.r.o.]"
  2013-10-30  9:27                       ` Wolfgang Grandegger
  0 siblings, 1 reply; 17+ messages in thread
From: "Martin Kožuský [KK micro s.r.o.]" @ 2013-10-30  9:14 UTC (permalink / raw)
  To: Wolfgang Grandegger, linux-can

-------- Original Message  --------
Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
From: Wolfgang Grandegger
To: Martin Kožuský [KK micro s.r.o.], linux-can@vger.kernel.org
Date: 30. Říjen 2013 10:04:13

> On 10/30/2013 09:43 AM, "Martin Kožuský [KK micro s.r.o.]" wrote:
>> -------- Original Message  --------
>> Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued
>> (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
>> From: Wolfgang Grandegger
>> To: Martin Kozusky, linux-can@vger.kernel.org
>> Date: 29. Říjen 2013 15:30:08
>>
>>> On 10/29/2013 01:40 PM, Martin Kozusky wrote:
>>>> Dne 29.10.2013 13:03, Wolfgang Grandegger napsal(a):
>>>>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>>> ...
>>>>>> Hello Wolfgang,
>>>>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>>>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it won't be
>>>>>> that easy, will be?
>>>>>> Timestamps that ftrace is showing me are in 10 miliseconds resolution,
>>>>>> that won't help me much :(
>>>
>>> Are high resolution timers enabled in the kernel? Still, event tracing
>>> could would already be useful.
>>>
>>>>> Probably that version is to old for proper ftrace support. The 100us
>>>>> you
>>>>> measured for alloc_can_skb() is worst case, right? What is the mean
>>>>> value?
>>>>
>>>> Now I checked again and logged every call (don't know if realy
>>>> everything was logged but something was :) and I see that it is not
>>>> 100usec, but only around 20usec (mean value - checked by eye). There
>>>> were some very long calls (around 2ms!) that were puttings errors in my
>>>> sum/count formula (may be I should filter out calls longer that
>>>> 200usec), with this error it was not 100usec, but almost 1ms (my value
>>>> was only 32bit and was overflowing when I wrote it first time). So
>>>> normally around 20usec, but with very long calls around 2-3 ms (looks
>>>> like those long are periodic - each 6th - 8th call is much longer, but
>>>> not all the time)
>>>
>>> Are these long latencies related to the SDcard accesses? I think the
>>> problem is the rather long latencies caused by other kernel activity. In
>>> your case caused by the MMC (SDcard) driver, I assume. The Flexcan
>>> controller does buffer up to 5 messages before loosing packets.
>> I think it is not primary related to SD card, those tests I was doing
>> lately were done when system was idle, no special processes were
>> running. But when I do access SD card, problems get bigger.
>>
>>>> But still with those 20usec call, there are many RX overflows, if I
>>>> disable the call alloc_can_skb, there are no overflows and all is
>>>> received (but still not processed further, because I don't have skb
>>>> ... )
>>>
>>> Could you run this test for a longer time? The probability is just lower
>>> that RX gets interrupted for a longer time, I think.
>> I have run it for a few minutes, problem is still the same :(
>>
>>> I see a few approaches to overcome this problem:
>>>
>>> - fix the MMC driver to cause less latency. If you are lucky this is
>>>     already the case in more recent versions of the kernel.
>> Hard to say if this helps when it's also doing in idle.
>
> OK, then I misinterpreted your error description.
>
>>> - Use the "-rt" extension (CONFIG_PREEPMT_RT). It will then allow to
>>>     adjust priorities of soft and hard irq threads.
>> I don't know if there is a patch for this :(
>>
>>> - Do the RX processing in the interrupt context, which would require
>>>     some custom modifications.
>> I was thinking about writing data directly to fifo file, my program
>> would read from it
>
> Well, before hacking something you should try to find out what is
> provoking the long latencies (> 1ms). FTrace is your friend. Therefore I
> would try to get a more recent version of the kernel running somehow.
> 2.6.39 should already be much better.
I will try to get new kernel running, but when I checked the patch for 2.6.35, I see some incompatibilites in directory structure and files, so I will have to go line-by-line and make my patch that will fit the new kernel. And when I do that, I will try it on latest 3.x kernel. I hope I will make it work.

Martin
  
> Wolfgang.
>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
  2013-10-30  9:14                     ` "Martin Kožuský [KK micro s.r.o.]"
@ 2013-10-30  9:27                       ` Wolfgang Grandegger
  0 siblings, 0 replies; 17+ messages in thread
From: Wolfgang Grandegger @ 2013-10-30  9:27 UTC (permalink / raw)
  To: "Martin Kožuský [KK micro s.r.o.]", linux-can

On 10/30/2013 10:14 AM, "Martin Kožuský [KK micro s.r.o.]" wrote:
> -------- Original Message  --------
> Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued
> (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
> From: Wolfgang Grandegger
> To: Martin Kožuský [KK micro s.r.o.], linux-can@vger.kernel.org
> Date: 30. Říjen 2013 10:04:13
> 
>> On 10/30/2013 09:43 AM, "Martin Kožuský [KK micro s.r.o.]" wrote:
>>> -------- Original Message  --------
>>> Subject: Re: CAN messages being lost on i.MX25 with flexcan - continued
>>> (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19)
>>> From: Wolfgang Grandegger
>>> To: Martin Kozusky, linux-can@vger.kernel.org
>>> Date: 29. Říjen 2013 15:30:08
>>>
>>>> On 10/29/2013 01:40 PM, Martin Kozusky wrote:
>>>>> Dne 29.10.2013 13:03, Wolfgang Grandegger napsal(a):
>>>>>> On 10/29/2013 11:46 AM, Martin Kozusky wrote:
>>>> ...
>>>>>>> Hello Wolfgang,
>>>>>>> it seems that my architecture (arm/mx25 on 2.6.35 kernel) is missing
>>>>>>> HAVE_FUNCTION_GRAPH_TRACER, HAVE_DYNAMIC_FTRACE options so it
>>>>>>> won't be
>>>>>>> that easy, will be?
>>>>>>> Timestamps that ftrace is showing me are in 10 miliseconds
>>>>>>> resolution,
>>>>>>> that won't help me much :(
>>>>
>>>> Are high resolution timers enabled in the kernel? Still, event tracing
>>>> could would already be useful.
>>>>
>>>>>> Probably that version is to old for proper ftrace support. The 100us
>>>>>> you
>>>>>> measured for alloc_can_skb() is worst case, right? What is the mean
>>>>>> value?
>>>>>
>>>>> Now I checked again and logged every call (don't know if realy
>>>>> everything was logged but something was :) and I see that it is not
>>>>> 100usec, but only around 20usec (mean value - checked by eye). There
>>>>> were some very long calls (around 2ms!) that were puttings errors
>>>>> in my
>>>>> sum/count formula (may be I should filter out calls longer that
>>>>> 200usec), with this error it was not 100usec, but almost 1ms (my value
>>>>> was only 32bit and was overflowing when I wrote it first time). So
>>>>> normally around 20usec, but with very long calls around 2-3 ms (looks
>>>>> like those long are periodic - each 6th - 8th call is much longer, but
>>>>> not all the time)
>>>>
>>>> Are these long latencies related to the SDcard accesses? I think the
>>>> problem is the rather long latencies caused by other kernel
>>>> activity. In
>>>> your case caused by the MMC (SDcard) driver, I assume. The Flexcan
>>>> controller does buffer up to 5 messages before loosing packets.
>>> I think it is not primary related to SD card, those tests I was doing
>>> lately were done when system was idle, no special processes were
>>> running. But when I do access SD card, problems get bigger.
>>>
>>>>> But still with those 20usec call, there are many RX overflows, if I
>>>>> disable the call alloc_can_skb, there are no overflows and all is
>>>>> received (but still not processed further, because I don't have skb
>>>>> ... )
>>>>
>>>> Could you run this test for a longer time? The probability is just
>>>> lower
>>>> that RX gets interrupted for a longer time, I think.
>>> I have run it for a few minutes, problem is still the same :(

BTW: with long I was thinking about hours rather than minutes.

>>>
>>>> I see a few approaches to overcome this problem:
>>>>
>>>> - fix the MMC driver to cause less latency. If you are lucky this is
>>>>     already the case in more recent versions of the kernel.
>>> Hard to say if this helps when it's also doing in idle.
>>
>> OK, then I misinterpreted your error description.
>>
>>>> - Use the "-rt" extension (CONFIG_PREEPMT_RT). It will then allow to
>>>>     adjust priorities of soft and hard irq threads.
>>> I don't know if there is a patch for this :(
>>>
>>>> - Do the RX processing in the interrupt context, which would require
>>>>     some custom modifications.
>>> I was thinking about writing data directly to fifo file, my program
>>> would read from it
>>
>> Well, before hacking something you should try to find out what is
>> provoking the long latencies (> 1ms). FTrace is your friend. Therefore I
>> would try to get a more recent version of the kernel running somehow.
>> 2.6.39 should already be much better.
> I will try to get new kernel running, but when I checked the patch for
> 2.6.35, I see some incompatibilites in directory structure and files, so
> I will have to go line-by-line and make my patch that will fit the new
> kernel. And when I do that, I will try it on latest 3.x kernel. I hope I
> will make it work.

Porting board support to a recent kernel version would be ideal, of
course. But it might be much less straight-forward than porting to a
close version, e.g. 2.6.39. Note that version 2.6.35 is more 3 years old.

Wolfgang.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2013-10-30  9:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-24 13:48 CAN messages being lost on i.MX25 with flexcan - continued (was CAN messages being lost on i.MX25 with flexcan - 2012-04-19) Martin Kozusky
2013-10-25 12:59 ` Martin Kozusky
2013-10-25 17:58   ` Wolfgang Grandegger
2013-10-26 18:18     ` Martin Kozusky
2013-10-26 19:40       ` Wolfgang Grandegger
2013-10-29 10:46         ` Martin Kozusky
2013-10-29 12:03           ` Wolfgang Grandegger
2013-10-29 12:22             ` Wolfgang Grandegger
2013-10-29 12:49               ` Martin Kozusky
2013-10-29 12:54                 ` Gary Thomas
2013-10-29 13:00                   ` "Martin Kožuský [KK micro s.r.o.]"
2013-10-29 12:40             ` Martin Kozusky
2013-10-29 14:30               ` Wolfgang Grandegger
2013-10-30  8:43                 ` "Martin Kožuský [KK micro s.r.o.]"
2013-10-30  9:04                   ` Wolfgang Grandegger
2013-10-30  9:14                     ` "Martin Kožuský [KK micro s.r.o.]"
2013-10-30  9:27                       ` Wolfgang Grandegger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.