* Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
@ 2016-04-05 13:10 Amr Bekhit
  2016-04-08  7:39 ` Wolfgang Grandegger
  2016-05-02  6:23 ` Alexander Stein
  0 siblings, 2 replies; 16+ messages in thread
From: Amr Bekhit @ 2016-04-05 13:10 UTC
  To: wg, mkl; +Cc: linux-can

Hello,

<Sorry for the re-send. I initially sent this as HTML, but then found
out that it was recommended to send all emails as plaintext, hence the
resend>

I am working on a board based on the AT91SAM9X25 SoC and I'm using the
integrated CAN peripheral. I seem to have run into an issue whereby
sending lots of messages in quick succession causes the CAN peripheral
to stop receiving any messages at all. The only way to bring it back to
a functional state is to bring the network interface down and then back
up again.

The problem can be replicated as follows:

The CAN interface is initialised using:

ip link set can0 type can bitrate 100000 restart-ms 100
ifconfig can0 up

I then start sending CAN messages to the unit using a PCAN-USB adapter
that is plugged into a test Linux PC. After bringing up the CAN
interface on the test PC, messages can be continuously sent using the
following bash script:

#!/bin/bash

while :
do
cansend can0 123#DEADBEEFDEADBEEF
done
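
For anyone who wants to reproduce this without spawning one cansend
process per frame, the loop above could be sketched in Python roughly as
follows. This is only an illustration, not what was used for the test:
build_frame() and flood() are hypothetical helper names, and the frame
layout assumes the classic struct can_frame from linux/can.h. A tight
Python loop will probably load the bus differently from the bash loop.

```python
import errno
import socket
import struct
import time

# Classic struct can_frame: 32-bit CAN ID, 8-bit length, 3 padding bytes,
# then 8 data bytes (16 bytes in total, matching the interface MTU of 16).
CAN_FRAME_FMT = "=IB3x8s"


def build_frame(can_id: int, data: bytes) -> bytes:
    """Pack one classic CAN frame for a raw SocketCAN socket."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FMT, can_id, len(data), data.ljust(8, b"\x00"))


def flood(ifname: str = "can0") -> None:
    """Send 123#DEADBEEFDEADBEEF back to back until interrupted."""
    frame = build_frame(0x123, bytes.fromhex("DEADBEEFDEADBEEF"))
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as sock:
        sock.bind((ifname,))
        while True:
            try:
                sock.send(frame)
            except OSError as exc:
                # A full TX queue typically shows up as ENOBUFS; back off
                # briefly and retry, re-raise anything else.
                if exc.errno != errno.ENOBUFS:
                    raise
                time.sleep(0.001)
```

Calling flood("can0") on the test PC would then play the role of the
bash loop. It needs Linux with SocketCAN and the interface already up.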

After running the script, I check that messages are being received on
the AT91 target by running

ifconfig can0

and checking that the number of received packets is increasing.
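
The manual check could also be automated. The sketch below polls the
same RX packet counter through sysfs and reports when it stops
increasing. It is a hedged illustration only: the helper names and the
10 second interval are assumptions, and it cannot tell a stuck receiver
from a sender that has simply gone quiet.

```python
import time
from pathlib import Path


def read_rx_packets(ifname: str = "can0") -> int:
    """Read the RX packet counter that ifconfig reports, via sysfs."""
    path = Path("/sys/class/net") / ifname / "statistics" / "rx_packets"
    return int(path.read_text())


def first_stall(samples):
    """Return the index of the first sample that did not increase, or None."""
    for i in range(1, len(samples)):
        if samples[i] <= samples[i - 1]:
            return i
    return None


def watch(ifname: str = "can0", interval_s: float = 10.0) -> None:
    """Poll the counter and report the moment it stops increasing."""
    samples = [read_rx_packets(ifname)]
    while first_stall(samples) is None:
        time.sleep(interval_s)
        # Keep only the previous and the current reading.
        samples = [samples[-1], read_rx_packets(ifname)]
    print(time.strftime("%H:%M:%S"), "rx_packets stuck at", samples[-1])
```

Running watch("can0") on the AT91 target while the sender is active
would print a timestamp once the counter stops moving.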

I then leave the system running for some time (typically 1.5 hours,
but this varies), periodically running ifconfig can0 to check whether
new packets are being received. After a while, the CAN interface stops
receiving new packets, even though the test PC is still transmitting
them. Stopping and restarting the CAN transmissions on the test PC
does not solve the problem. The interface does not appear to be in the
bus-off state, as shown by running the following:

# ip -details -statistics link show can0
2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
          bitrate 99950 sample-point 0.739
          tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
          at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
          clock 133333333
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    12609768   1576221  5       0       5       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
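
The bit-timing figures in this output can be cross-checked by hand. The
sketch below recomputes tq, bitrate and sample point from the segment
lengths and the clock. Note that brp is not printed here; the value 58
is an assumption inferred from tq 435 ns at a 133333333 Hz clock, and
bit_timing() is just an illustrative helper.

```python
def bit_timing(clock_hz, brp, prop_seg, phase_seg1, phase_seg2):
    """Return (tq in ns, bitrate in bit/s, sample point) for a CAN bit."""
    # One bit = 1 tq sync segment + propagation + phase 1 + phase 2.
    tq_per_bit = 1 + prop_seg + phase_seg1 + phase_seg2
    tq_ns = brp * 1e9 / clock_hz
    bitrate = clock_hz / (brp * tq_per_bit)
    # The sample point sits at the end of phase segment 1.
    sample_point = (1 + prop_seg + phase_seg1) / tq_per_bit
    return tq_ns, bitrate, sample_point


tq_ns, bitrate, sample_point = bit_timing(
    clock_hz=133333333, brp=58, prop_seg=8, phase_seg1=8, phase_seg2=6
)
print(f"tq {tq_ns:.0f} ns, bitrate {bitrate:.0f}, sample-point {sample_point:.3f}")
```

With these inputs the result agrees with the reported tq 435, bitrate
99950 and sample-point 0.739, so the requested 100000 bit/s was rounded
to the nearest rate the divider can produce.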


# ifconfig can0
can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:1576221 errors:5 dropped:0 overruns:0 frame:5
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10
          RX bytes:12609768 (12.0 MiB)  TX bytes:0 (0.0 B)
          Interrupt:40

Using the devmem command line program and a custom python script, I
dumped the contents of the CAN peripheral registers to a file. When
the AT91 CAN peripheral is in the failed state, here is what the
peripheral memory looks like:

Dumping memory from 0xF8004000 to 0xF8004000:
0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF800400C: [1F040000] 0001 1111 0000 0100 0000 0000 0000 0000
0xF8004010: [20E000FF] 0010 0000 1110 0000 0000 0000 1111 1111
0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
0xF8004018: [000090EF] 0000 0000 0000 0000 1001 0000 1110 1111
0xF800401C: [00009C81] 0000 0000 0000 0000 1001 1100 1000 0001
0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004208: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
0xF800420C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
0xF8004210: [01881273] 0000 0001 1000 1000 0001 0010 0111 0011
0xF8004214: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF8004218: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004228: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
0xF800422C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
0xF8004230: [018812E6] 0000 0001 1000 1000 0001 0010 1110 0110
0xF8004234: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF8004238: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004248: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
0xF800424C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
0xF8004250: [01881359] 0000 0001 1000 1000 0001 0011 0101 1001
0xF8004254: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF8004258: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004268: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
0xF800426C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
0xF8004270: [018813CC] 0000 0001 1000 1000 0001 0011 1100 1100
0xF8004274: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF8004278: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004288: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
0xF800428C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
0xF8004290: [0188143F] 0000 0001 1000 1000 0001 0100 0011 1111
0xF8004294: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF8004298: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042A8: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
0xF80042AC: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
0xF80042B0: [0188E950] 0000 0001 1000 1000 1110 1001 0101 0000
0xF80042B4: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF80042B8: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80042C0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042C8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042CC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042D0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
0xF80042D4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042D8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80042E0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042EC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042F0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
0xF80042F4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042F8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

I noticed that the RBSY flag is set, even though there was nothing
transmitted to the CAN bus. All of the message boxes had data inside
ready to be retrieved.
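
To make the dump easier to read, here is a small decoding sketch for the
status register at offset 0x10 and for one mailbox. The bit positions
(RBSY at bit 29, MRDY at bit 23, MMI at bit 24, DLC in bits 16..19 and
so on) are assumptions based on the SAM9X25 datasheet register
descriptions and should be double-checked there; the function names are
purely illustrative.

```python
# Assumed CAN_SR flag positions above the per-mailbox bits 0..7.
SR_FLAGS = {
    16: "ERRA", 17: "WARN", 18: "ERRP", 19: "BOFF", 20: "SLEEP",
    21: "WAKEUP", 22: "TOVF", 23: "TSTP", 24: "CERR", 25: "SERR",
    26: "AERR", 27: "FERR", 28: "BERR", 29: "RBSY", 30: "TBSY", 31: "OVLSY",
}

# Assumed mailbox object types from CAN_MMRx bits 24..26.
MB_MODES = {0: "disabled", 1: "rx", 2: "rx-overwrite", 3: "tx"}


def decode_sr(value):
    """Split CAN_SR into mailbox event bits and named status flags."""
    mailboxes = [n for n in range(8) if value & (1 << n)]
    flags = [name for bit, name in sorted(SR_FLAGS.items()) if value & (1 << bit)]
    return mailboxes, flags


def decode_mailbox(mmr, mid, msr, mdl, mdh):
    """Decode one mailbox from its MMR, MID, MSR, MDL and MDH words."""
    dlc = (msr >> 16) & 0xF
    payload = mdl.to_bytes(4, "little") + mdh.to_bytes(4, "little")
    return {
        "mode": MB_MODES.get((mmr >> 24) & 0x7, "other"),
        "std_id": (mid >> 18) & 0x7FF,      # 11-bit identifier field
        "timestamp": msr & 0xFFFF,
        "dlc": dlc,
        "mrdy": bool(msr & (1 << 23)),      # mailbox ready
        "mmi": bool(msr & (1 << 24)),       # mailbox message ignored
        "data": payload[:dlc].hex().upper(),
    }


print(decode_sr(0x20E000FF))
print(decode_mailbox(0x01000000, 0x048C0000, 0x01881273, 0xEFBEADDE, 0xEFBEADDE))
```

Applied to the values above, this reports events on all eight
mailboxes plus WAKEUP, TOVF, TSTP and RBSY, and mailbox 0 decodes to a
ready receive mailbox holding ID 0x123 with data DEADBEEFDEADBEEF.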

If there are any other tests you would like me to carry out, just let me know.

Regards,

Amr Bekhit


* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-05 13:10 Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages Amr Bekhit
@ 2016-04-08  7:39 ` Wolfgang Grandegger
  2016-04-29  8:04   ` Amr Bekhit
  2016-05-02  6:23 ` Alexander Stein
  1 sibling, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-04-08  7:39 UTC
  To: Amr Bekhit, mkl; +Cc: linux-can

Hello,

Am 05.04.2016 um 15:10 schrieb Amr Bekhit:
> [...]
> I noticed that the RBSY flag is set, even though there was nothing
> transmitted to the CAN bus. All of the message boxes had data inside
> ready to be retrieved.
>
> If there are any other tests you would like me to carry out, just let me know.

Where did your Linux kernel come from, and what version are you using?
The following would also be interesting:

- how fast is your CPU (frequency)?
- the output of "/proc/interrupts".
- run "candump any,0:0,#FFFFFFFF" on the AT91 while the test is running
- use "cangen" or even better "canfdtest" for testing.

Wolfgang.


* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-08  7:39 ` Wolfgang Grandegger
@ 2016-04-29  8:04   ` Amr Bekhit
       [not found]     ` <CAOLz05oo=EGqvmCaXXBhXs5McMmJDPKCzuiij7Pv22fj5hPB_g@mail.gmail.com>
  2016-04-29 11:18     ` Wolfgang Grandegger
  0 siblings, 2 replies; 16+ messages in thread
From: Amr Bekhit @ 2016-04-29  8:04 UTC
  To: Wolfgang Grandegger; +Cc: mkl, linux-can

Hello,

Thanks to both of you for your responses.

@Menchel: From my observations, the problem occurs when several CAN
packets are sent in quick succession. I initially started seeing this
problem while trying to interface my embedded board to a 3rd party CAN
device. The CAN device only sent 10 messages every second at 100 kBaud
(so not a huge amount of traffic), but they were all sent in one burst,
one immediately after the other. If I modify my test script to send the
CAN messages with a 5 ms delay in between, I don't seem to have this
problem.
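
To put the burst spacing and the 5 ms delay in perspective, the time one
frame occupies the bus can be estimated. The sketch below assumes
standard 11-bit-ID data frames with 8 data bytes, as in the test script
(the 3rd party device's frame length is not known here), and uses the
usual worst-case bit-stuffing bound. The function names are
illustrative.

```python
def classic_frame_bits(data_len, worst_case_stuffing=False):
    """Bits on the wire for a standard-ID classic CAN data frame."""
    # 44 fixed bits (SOF to EOF) + 8 per data byte + 3 bits interframe space.
    bits = 44 + 8 * data_len + 3
    if worst_case_stuffing:
        # Stuff bits can only appear in the 34 + 8*n bits from SOF to CRC.
        bits += (34 + 8 * data_len - 1) // 4
    return bits


def frame_time_ms(bitrate, data_len, worst_case_stuffing=False):
    """Time in milliseconds that one frame occupies the bus."""
    return classic_frame_bits(data_len, worst_case_stuffing) * 1000 / bitrate


print(frame_time_ms(100000, 8))                             # no stuff bits
print(frame_time_ms(100000, 8, worst_case_stuffing=True))   # upper bound
```

Under these assumptions a back-to-back burst delivers a frame roughly
every 1.1 to 1.35 ms at 100 kbit/s, so a 5 ms gap leaves the receiver
several frame times of idle bus between messages.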

@Wolfgang: The CPU is an AT91SAM9X25 SoC running at 400 MHz.

I've carried out another test where I've run "canfdtest -vv -g can0"
on a host PC with a PCAN-USB device attached and "canfdtest -vv can0"
on the embedded device and let it run overnight (from 5pm till 8.30am
the following day). At the end of the test, the host PC had stopped
sending any more data (there was no more terminal output indicating
that bytes had been sent) and likewise, the embedded system was not
receiving any data.

(As a side note, I could not Ctrl-C out of the running canfdtest on
the embedded system - I ended up having to SSH into the embedded
system to get access to another terminal so I could run some commands
- only kill -9 would kill the process)

Even after killing and restarting the canfdtest processes on both host
and embedded computers, no CAN messages were sent. I had to bring the
interface down then back up again on the embedded system before the
two programs started showing that messages were being sent and
received.

I've run the following commands on the embedded system at the end of the test:

# cat /proc/interrupts
           CPU0
 16:  102823451  atmel-aic   1 Level     pmc, at91_tick, ttyS0
 17:          0       PMC  17 Level     main_rc_osc
 18:          0       PMC   0 Level     main_osc
 19:          0       PMC  16 Level     mainck
 20:          0       PMC   1 Level     clk-plla
 21:          1       PMC   6 Level     clk-utmi
 22:          0       PMC   3 Level     clk-master
 23:    7352177  atmel-aic  17 Level     tc_clkevt
 24:      24128  atmel-aic  20 Level     at_hdmac
 25:          0  atmel-aic  21 Level     at_hdmac
 29:         42  atmel-aic  12 Level     f0008000.mmc
 32:    3134371  atmel-aic   9 Level     f8010000.i2c
 34:          3  atmel-aic  16 Level     ttyS6
 35:          0  atmel-aic  19 Level     at91_adc
 36:       6881  atmel-aic  13 Level     f0000000.spi
 37:          0  atmel-aic  23 Level     atmel_usba_udc
 39:          0  atmel-aic  24 Level     eth0
 40:    4952869  atmel-aic  30 Level     can0
 41:     348546  atmel-aic  22 Level     ehci_hcd:usb1, ohci_hcd:usb2
 90:          0      GPIO  16 Edge      atmel_usba_udc
140:          0      GPIO  15 Edge      mmc-detect
Err:          0

# ifconfig can0
can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          UP RUNNING NOARP  MTU:16  Metric:1
          RX packets:2476577 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2476527 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:10
          RX bytes:19812616 (18.8 MiB)  TX bytes:19812216 (18.8 MiB)
          Interrupt:40

# ip -details -statistics link show can0
2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
          bitrate 99950 sample-point 0.739
          tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
          at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
          clock 133333333
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    19812616   2476577  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    19812216   2476527  0       0       0       0

I did another dump of the CAN peripheral register memory after the
test. Here are the results:

Dumping memory from 0xF8004000 to 0xF8004000:
0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111
0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
0xF8004018: [0000B2E4] 0000 0000 0000 0000 1011 0010 1110 0100
0xF800401C: [000052D3] 0000 0000 0000 0000 0101 0010 1101 0011
0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004208: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
0xF800420C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004210: [01885139] 0000 0001 1000 1000 0101 0001 0011 1001
0xF8004214: [21201F1E] 0010 0001 0010 0000 0001 1111 0001 1110
0xF8004218: [25242322] 0010 0101 0010 0100 0010 0011 0010 0010
0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004228: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
0xF800422C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004230: [018851AD] 0000 0001 1000 1000 0101 0001 1010 1101
0xF8004234: [2221201F] 0010 0010 0010 0001 0010 0000 0001 1111
0xF8004238: [26252423] 0010 0110 0010 0101 0010 0100 0010 0011
0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004248: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
0xF800424C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004250: [008852D3] 0000 0000 1000 1000 0101 0010 1101 0011
0xF8004254: [23222120] 0010 0011 0010 0010 0010 0001 0010 0000
0xF8004258: [27262524] 0010 0111 0010 0110 0010 0101 0010 0100
0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004268: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
0xF800426C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004270: [00084FD9] 0000 0000 0000 1000 0100 1111 1101 1001
0xF8004274: [201F1E1D] 0010 0000 0001 1111 0001 1110 0001 1101
0xF8004278: [24232221] 0010 0100 0010 0011 0010 0010 0010 0001
0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004288: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
0xF800428C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF8004290: [000876D4] 0000 0000 0000 1000 0111 0110 1101 0100
0xF8004294: [78777675] 0111 1000 0111 0111 0111 0110 0111 0101
0xF8004298: [7C7B7A79] 0111 1100 0111 1011 0111 1010 0111 1001
0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042A8: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
0xF80042AC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042B0: [0008C8D2] 0000 0000 0000 1000 1100 1000 1101 0010
0xF80042B4: [05040302] 0000 0101 0000 0100 0000 0011 0000 0010
0xF80042B8: [09080706] 0000 1001 0000 1000 0000 0111 0000 0110
0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80042C0: [03070000] 0000 0011 0000 0111 0000 0000 0000 0000
0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042C8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
0xF80042CC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
0xF80042D0: [00885220] 0000 0000 1000 1000 0101 0010 0010 0000
0xF80042D4: [F2F1F0EF] 1111 0010 1111 0001 1111 0000 1110 1111
0xF80042D8: [F6F5F4F3] 1111 0110 1111 0101 1111 0100 1111 0011
0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000

0xF80042E0: [03060000] 0000 0011 0000 0110 0000 0000 0000 0000
0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
0xF80042E8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
0xF80042EC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
0xF80042F0: [008850C3] 0000 0000 1000 1000 0101 0000 1100 0011
0xF80042F4: [F1F0EFEE] 1111 0001 1111 0000 1110 1111 1110 1110
0xF80042F8: [F5F4F3F2] 1111 0101 1111 0100 1111 0011 1111 0010
0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
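
One comparison the two dumps allow is between the mailbox bits of the
interrupt mask register (offset 0x0C) and of the status register
(offset 0x10). The sketch below lists, for each dump, which mailboxes
have their interrupt enabled, which report an event, and which report
an event while masked. The register roles and the use of the low 8 bits
as per-mailbox bits are assumptions based on the SAM9X25 datasheet; the
sketch only tabulates the values and draws no conclusion about the
cause.

```python
MAILBOX_MASK = 0xFF  # 8 mailboxes, matching the 0x200..0x2FC range dumped


def mailboxes(bits):
    """List the mailbox numbers whose bit is set in the low byte."""
    return [n for n in range(8) if bits & (1 << n)]


def pending_but_masked(sr, imr):
    """Mailboxes that signal an event in CAN_SR while masked in CAN_IMR."""
    return mailboxes(sr & ~imr & MAILBOX_MASK)


# Values copied from the two register dumps in this thread.
DUMPS = {
    "cansend flood": {"imr": 0x1F040000, "sr": 0x20E000FF},
    "canfdtest": {"imr": 0x1F040038, "sr": 0x006000C7},
}

for name, regs in DUMPS.items():
    print(
        name,
        "| irq enabled:", mailboxes(regs["imr"]),
        "| event:", mailboxes(regs["sr"]),
        "| event but masked:", pending_but_masked(regs["sr"], regs["imr"]),
    )
```

For the first dump this lists all eight mailboxes as "event but
masked"; for the second it lists mailboxes 0, 1, 2, 6 and 7, with
interrupts enabled only for mailboxes 3, 4 and 5.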

On 8 April 2016 at 08:39, Wolfgang Grandegger <wg@grandegger.com> wrote:
> Hello,
>
>
> Am 05.04.2016 um 15:10 schrieb Amr Bekhit:
>>
>> Hello,
>>
>> <Sorry for the re-send. I initially sent this as HTML, but then found
>> out that it was recommended to send all emails as plaintext, hence the
>> resend>
>>
>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>> integrated CAN peripheral. I seem to have run into an issue whereby
>> sending lots of messages in quick succession causes the CAN peripheral
>> to stop receiving any messages at all. The only way to bring it back to
>> a functional state is to bring the network interface down and then back
>> up again.
>>
>> The problem can be replicated as follows:
>>
>> The CAN interface is initialised using:
>>
>> ip link set can0 type can bitrate 100000 restart-ms 100
>> ifconfig can0 up
>>
>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>> that is plugged into a test Linux PC. After bringing up the CAN
>> interface on the test PC, messages can be continuously sent using the
>> following bash script:
>>
>> #!/bin/bash
>>
>> while :
>> do
>> cansend can0 123#DEADBEEFDEADBEEF
>> done
>>
>> After running the script, I check that messages are being received on
>> the AT91 target by running
>>
>> ifconfig can0
>>
>> and checking that the number of received packets is increasing.
>>
>> I then leave the system running for some time (typically 1.5 hours,
>> but this varies), periodically running ifconfig can0 to check whether
>> new packets are being received. After a while, the CAN interface stops
>> receiving new packets, even though the test PC is still transmitting
>> them. Stopping and restarting the CAN transmissions on the test PC
>> does not solve the problem. The interface does not appear to be in the
>> bus-off state, as shown by running the following:
>>
>> # ip -details -statistics link show can0
>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>> UNKNOWN mode DEFAULT group default qlen 10
>>      link/can  promiscuity 0
>>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>            bitrate 99950 sample-point 0.739
>>            tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>            at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
>>            clock 133333333
>>            re-started bus-errors arbit-lost error-warn error-pass bus-off
>>            0          0          0          0          0          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      12609768   1576221  5       0       5       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      0          0        0       0       0       0
>>
>>
>> # ifconfig can0
>> can0      Link encap:UNSPEC  HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>            UP RUNNING NOARP  MTU:16  Metric:1
>>            RX packets:1576221 errors:5 dropped:0 overruns:0 frame:5
>>            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:10
>>            RX bytes:12609768 (12.0 MiB)  TX bytes:0 (0.0 B)
>>            Interrupt:40
>>
>> Using the devmem command line program and a custom python script, I
>> dumped the contents of the CAN peripheral registers to a file. When
>> the AT91 CAN peripheral is in the failed state, here is what the
>> peripheral memory looks like:
>>
>> Dumping memory from 0xF8004000 to 0xF8004000:
>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF800400C: [1F040000] 0001 1111 0000 0100 0000 0000 0000 0000
>> 0xF8004010: [20E000FF] 0010 0000 1110 0000 0000 0000 1111 1111
>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>> 0xF8004018: [000090EF] 0000 0000 0000 0000 1001 0000 1110 1111
>> 0xF800401C: [00009C81] 0000 0000 0000 0000 1001 1100 1000 0001
>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004208: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>> 0xF800420C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>> 0xF8004210: [01881273] 0000 0001 1000 1000 0001 0010 0111 0011
>> 0xF8004214: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF8004218: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004228: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>> 0xF800422C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>> 0xF8004230: [018812E6] 0000 0001 1000 1000 0001 0010 1110 0110
>> 0xF8004234: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF8004238: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004248: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>> 0xF800424C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>> 0xF8004250: [01881359] 0000 0001 1000 1000 0001 0011 0101 1001
>> 0xF8004254: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF8004258: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004268: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>> 0xF800426C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>> 0xF8004270: [018813CC] 0000 0001 1000 1000 0001 0011 1100 1100
>> 0xF8004274: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF8004278: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004288: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>> 0xF800428C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>> 0xF8004290: [0188143F] 0000 0001 1000 1000 0001 0100 0011 1111
>> 0xF8004294: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF8004298: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042A8: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>> 0xF80042AC: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>> 0xF80042B0: [0188E950] 0000 0001 1000 1000 1110 1001 0101 0000
>> 0xF80042B4: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF80042B8: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042C0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042C8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042CC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042D0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>> 0xF80042D4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042D8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042E0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042EC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042F0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>> 0xF80042F4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042F8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> I noticed that the RBSY flag is set, even though there was nothing
>> transmitted to the CAN bus. All of the mailboxes had data inside
>> ready to be retrieved.
>>
>> If there are any other tests you would like me to carry out, just let me
>> know.
>
>
> Where did your Linux kernel come from and what version are you using? Also
> interesting is:
>
> - how fast is your CPU (frequency)?
> - the output of "/proc/interrupts".
> - run "candump any,0:0,#FFFFFFFF" on the AT91 while the test is running
> - use "cangen" or even better "canfdtest" for testing.
>
> Wolfgang.
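
The register dump quoted above can be decoded mechanically, which makes the
RBSY observation easier to check. The Python sketch below is not part of the
original report. The bit positions (status flags in bits 16 to 31 of CAN_SR,
MOT in bits 24 to 26 of CAN_MMRx, MRDY and MMI in bits 23 and 24 of CAN_MSRx,
the standard identifier in bits 18 to 28 of CAN_MIDx) are assumptions taken
from the SAM9X25 datasheet and the mainline at91_can driver, and the helper
names decode_can_sr and decode_mailbox are hypothetical.

```python
# Sketch only: bit positions are assumed from the SAM9X25 datasheet and the
# mainline at91_can driver; check them against your own reference.

SR_FLAGS = {
    16: "ERRA", 17: "WARN", 18: "ERRP", 19: "BOFF",
    20: "SLEEP", 21: "WAKEUP", 22: "TOVF", 23: "TSTP",
    24: "CERR", 25: "SERR", 26: "AERR", 27: "FERR",
    28: "BERR", 29: "RBSY", 30: "TBSY", 31: "OVLSY",
}
NUM_MAILBOXES = 8  # the dump shows mailboxes 0..7 at 0x200..0x2FC
MOT_NAMES = {0: "disabled", 1: "rx", 2: "rx-overwrite", 3: "tx",
             4: "consumer", 5: "producer"}


def decode_can_sr(value):
    """Return (mailbox numbers flagged, status flag names) for a CAN_SR word."""
    mailboxes = [n for n in range(NUM_MAILBOXES) if value & (1 << n)]
    flags = [name for bit, name in sorted(SR_FLAGS.items()) if value & (1 << bit)]
    return mailboxes, flags


def decode_mailbox(mmr, mid, msr, mdl, mdh):
    """Decode one mailbox from its MMR, MID, MSR, MDL and MDH words."""
    extended = bool(mid & (1 << 29))  # MIDE: 29-bit identifier in use
    can_id = mid & 0x1FFFFFFF if extended else (mid >> 18) & 0x7FF
    dlc = (msr >> 16) & 0xF
    # Data byte 0 sits in the low byte of MDL, byte 4 in the low byte of MDH.
    data = (mdl.to_bytes(4, "little") + mdh.to_bytes(4, "little"))[:min(dlc, 8)]
    return {
        "type": MOT_NAMES.get((mmr >> 24) & 0x7, "reserved"),
        "extended": extended,
        "id": can_id,
        "dlc": dlc,
        "ready": bool(msr & (1 << 23)),    # MRDY: mailbox holds a message
        "ignored": bool(msr & (1 << 24)),  # MMI: a later message was ignored
        "timestamp": msr & 0xFFFF,
        "data": data,
    }


if __name__ == "__main__":
    # CAN_SR at 0xF8004010 and mailbox 0 at 0xF8004200..0xF8004218.
    print(decode_can_sr(0x20E000FF))
    print(decode_mailbox(0x01000000, 0x048C0000, 0x01881273,
                         0xEFBEADDE, 0xEFBEADDE))
```

Under these assumptions, the CAN_SR word from the dump reports RBSY with all
eight mailbox flags raised, and mailbox 0 decodes to a standard frame with ID
0x123 carrying DE AD BE EF DE AD BE EF, which matches the cansend test pattern.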

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
       [not found]     ` <CAOLz05oo=EGqvmCaXXBhXs5McMmJDPKCzuiij7Pv22fj5hPB_g@mail.gmail.com>
@ 2016-04-29  8:15       ` Amr Bekhit
  0 siblings, 0 replies; 16+ messages in thread
From: Amr Bekhit @ 2016-04-29  8:15 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: mkl, linux-can

The Linux kernel is mainline version 4.3 from kernel.org, fetched and
built via Buildroot 2015.11.1.

> On 29 April 2016 at 09:04, Amr Bekhit <amrbekhit@gmail.com> wrote:
>>
>> Hello,
>>
>> Thanks to both of you for your responses.
>>
>> @Menchel: From my observations, the problem occurs when several CAN
>> packets are sent in quick succession. I first saw it while trying to
>> interface my embedded board to a 3rd party CAN device. That device
>> only sent 10 messages every second at 100 kbit/s (so not a huge amount
>> of traffic), but they were all sent in one burst, one immediately
>> after the other. If I modify my test script to send the CAN messages
>> with a 5 ms delay in between, I don't seem to have this problem.
>>
>> @Wolfgang: The CPU is an AT91SAM9X25 SoC running at 400 MHz.
>>
>> I've carried out another test where I've run "canfdtest -vv -g can0"
>> on a host PC with a PCAN-USB device attached and "canfdtest -vv can0"
>> on the embedded device and let it run overnight (from 5pm till 8.30am
>> the following day). At the end of the test, the host PC had stopped
>> sending any more data (there was no more terminal output indicating
>> that bytes had been sent) and likewise, the embedded system was not
>> receiving any data.
>>
>> (As a side note, I could not Ctrl-C out of the running canfdtest on
>> the embedded system - I ended up having to SSH into the embedded
>> system to get access to another terminal so I could run some commands
>> - only kill -9 would kill the process)
>>
>> Even after killing and restarting the canfdtest processes on both host
>> and embedded computers, no CAN messages were sent. I had to bring the
>> interface down then back up again on the embedded system before the
>> two programs started showing that messages were being sent and
>> received.
>>
>> I've run the following commands on the embedded system at the end of the
>> test:
>>
>> # cat /proc/interrupts
>>            CPU0
>>  16:  102823451  atmel-aic   1 Level     pmc, at91_tick, ttyS0
>>  17:          0       PMC  17 Level     main_rc_osc
>>  18:          0       PMC   0 Level     main_osc
>>  19:          0       PMC  16 Level     mainck
>>  20:          0       PMC   1 Level     clk-plla
>>  21:          1       PMC   6 Level     clk-utmi
>>  22:          0       PMC   3 Level     clk-master
>>  23:    7352177  atmel-aic  17 Level     tc_clkevt
>>  24:      24128  atmel-aic  20 Level     at_hdmac
>>  25:          0  atmel-aic  21 Level     at_hdmac
>>  29:         42  atmel-aic  12 Level     f0008000.mmc
>>  32:    3134371  atmel-aic   9 Level     f8010000.i2c
>>  34:          3  atmel-aic  16 Level     ttyS6
>>  35:          0  atmel-aic  19 Level     at91_adc
>>  36:       6881  atmel-aic  13 Level     f0000000.spi
>>  37:          0  atmel-aic  23 Level     atmel_usba_udc
>>  39:          0  atmel-aic  24 Level     eth0
>>  40:    4952869  atmel-aic  30 Level     can0
>>  41:     348546  atmel-aic  22 Level     ehci_hcd:usb1, ohci_hcd:usb2
>>  90:          0      GPIO  16 Edge      atmel_usba_udc
>> 140:          0      GPIO  15 Edge      mmc-detect
>> Err:          0
>>
>> # ifconfig can0
>> can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>           UP RUNNING NOARP  MTU:16  Metric:1
>>           RX packets:2476577 errors:0 dropped:0 overruns:0 frame:0
>>           TX packets:2476527 errors:0 dropped:0 overruns:0 carrier:0
>>           collisions:0 txqueuelen:10
>>           RX bytes:19812616 (18.8 MiB)  TX bytes:19812216 (18.8 MiB)
>>           Interrupt:40
>>
>> # ip -details -statistics link show can0
>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>>     link/can  promiscuity 0
>>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>           bitrate 99950 sample-point 0.739
>>           tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>           at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
>>           clock 133333333
>>           re-started bus-errors arbit-lost error-warn error-pass bus-off
>>           0          0          0          0          0          0
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     19812616   2476577  0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     19812216   2476527  0       0       0       0
>>
>> I did another dump of the CAN peripheral register memory after the
>> test. Here are the results:
>>
>> Dumping memory from 0xF8004000 to 0xF8004000:
>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
>> 0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111
>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>> 0xF8004018: [0000B2E4] 0000 0000 0000 0000 1011 0010 1110 0100
>> 0xF800401C: [000052D3] 0000 0000 0000 0000 0101 0010 1101 0011
>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004208: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800420C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004210: [01885139] 0000 0001 1000 1000 0101 0001 0011 1001
>> 0xF8004214: [21201F1E] 0010 0001 0010 0000 0001 1111 0001 1110
>> 0xF8004218: [25242322] 0010 0101 0010 0100 0010 0011 0010 0010
>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004228: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800422C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004230: [018851AD] 0000 0001 1000 1000 0101 0001 1010 1101
>> 0xF8004234: [2221201F] 0010 0010 0010 0001 0010 0000 0001 1111
>> 0xF8004238: [26252423] 0010 0110 0010 0101 0010 0100 0010 0011
>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004248: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800424C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004250: [008852D3] 0000 0000 1000 1000 0101 0010 1101 0011
>> 0xF8004254: [23222120] 0010 0011 0010 0010 0010 0001 0010 0000
>> 0xF8004258: [27262524] 0010 0111 0010 0110 0010 0101 0010 0100
>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004268: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800426C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004270: [00084FD9] 0000 0000 0000 1000 0100 1111 1101 1001
>> 0xF8004274: [201F1E1D] 0010 0000 0001 1111 0001 1110 0001 1101
>> 0xF8004278: [24232221] 0010 0100 0010 0011 0010 0010 0010 0001
>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004288: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800428C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004290: [000876D4] 0000 0000 0000 1000 0111 0110 1101 0100
>> 0xF8004294: [78777675] 0111 1000 0111 0111 0111 0110 0111 0101
>> 0xF8004298: [7C7B7A79] 0111 1100 0111 1011 0111 1010 0111 1001
>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042A8: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042AC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042B0: [0008C8D2] 0000 0000 0000 1000 1100 1000 1101 0010
>> 0xF80042B4: [05040302] 0000 0101 0000 0100 0000 0011 0000 0010
>> 0xF80042B8: [09080706] 0000 1001 0000 1000 0000 0111 0000 0110
>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042C0: [03070000] 0000 0011 0000 0111 0000 0000 0000 0000
>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042C8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>> 0xF80042CC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>> 0xF80042D0: [00885220] 0000 0000 1000 1000 0101 0010 0010 0000
>> 0xF80042D4: [F2F1F0EF] 1111 0010 1111 0001 1111 0000 1110 1111
>> 0xF80042D8: [F6F5F4F3] 1111 0110 1111 0101 1111 0100 1111 0011
>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042E0: [03060000] 0000 0011 0000 0110 0000 0000 0000 0000
>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042E8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>> 0xF80042EC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>> 0xF80042F0: [008850C3] 0000 0000 1000 1000 0101 0000 1100 0011
>> 0xF80042F4: [F1F0EFEE] 1111 0001 1111 0000 1110 1111 1110 1110
>> 0xF80042F8: [F5F4F3F2] 1111 0101 1111 0100 1111 0011 1111 0010
>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
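
The bit timing reported by "ip -details" can be cross-checked against the
CAN_BR word at 0xF8004014 in the dump. The Python sketch below is an
illustration, not something from the original report: the field layout
(BRP in bits 16 to 22, SJW in bits 12 to 13, PROPAG in bits 8 to 10, PHASE1
in bits 4 to 6, PHASE2 in bits 0 to 2, each stored as value minus one) is an
assumption based on how the mainline at91_can driver programs the register,
and decode_can_br is a hypothetical helper name.

```python
# Sketch only: the CAN_BR field layout is assumed from the at91_can driver.

def decode_can_br(br, clock_hz):
    """Recover bit-timing parameters from a CAN_BR word and the CAN clock."""
    brp = ((br >> 16) & 0x7F) + 1
    sjw = ((br >> 12) & 0x3) + 1
    prop_seg = ((br >> 8) & 0x7) + 1
    phase_seg1 = ((br >> 4) & 0x7) + 1
    phase_seg2 = (br & 0x7) + 1
    # One bit time = sync segment (1 tq) + prop_seg + phase_seg1 + phase_seg2.
    tq_per_bit = 1 + prop_seg + phase_seg1 + phase_seg2
    return {
        "brp": brp,
        "sjw": sjw,
        "prop_seg": prop_seg,
        "phase_seg1": phase_seg1,
        "phase_seg2": phase_seg2,
        "tq_ns": round(brp * 1e9 / clock_hz),
        "bitrate": clock_hz // (brp * tq_per_bit),
        "sample_point": (1 + prop_seg + phase_seg1) / tq_per_bit,
    }


if __name__ == "__main__":
    timing = decode_can_br(0x00390775, 133333333)
    for key, value in timing.items():
        print(key, value)
```

With the dumped value 0x00390775 and the 133333333 Hz clock, this yields
prop-seg 8, phase-seg1 8, phase-seg2 6, sjw 1, a 435 ns time quantum, a
bitrate of 99950 and a sample point of about 0.739, in line with the ip
output, so the requested 100000 bit/s is only approximated by the hardware.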

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-29  8:04   ` Amr Bekhit
       [not found]     ` <CAOLz05oo=EGqvmCaXXBhXs5McMmJDPKCzuiij7Pv22fj5hPB_g@mail.gmail.com>
@ 2016-04-29 11:18     ` Wolfgang Grandegger
  2016-04-29 11:29       ` Amr Bekhit
  1 sibling, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-04-29 11:18 UTC (permalink / raw)
  To: Amr Bekhit; +Cc: mkl, linux-can

Hello Amr,

Is function tracing (ftrace) working on your system?

Wolfgang.
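
One way to answer the ftrace question is to look for the "function" tracer
in the tracefs file available_tracers. The Python sketch below is only an
illustration: it assumes tracefs is mounted at /sys/kernel/tracing or under
debugfs at /sys/kernel/debug/tracing, that the script runs with enough
privileges to read it, and the helper names are hypothetical.

```python
# Sketch only: mount points are assumptions; reading them usually needs root.
from pathlib import Path

TRACEFS_CANDIDATES = ("/sys/kernel/tracing", "/sys/kernel/debug/tracing")


def available_tracers(tracefs_dir):
    """Return the tracer names listed in <tracefs_dir>/available_tracers."""
    try:
        return (Path(tracefs_dir) / "available_tracers").read_text().split()
    except OSError:
        # Not mounted, not readable, or tracing not built into the kernel.
        return []


def function_tracer_available(candidates=TRACEFS_CANDIDATES):
    """True if any candidate tracefs directory offers the 'function' tracer."""
    return any("function" in available_tracers(d) for d in candidates)


if __name__ == "__main__":
    if function_tracer_available():
        print("function tracer is available")
    else:
        print("function tracer not found (tracefs unmounted or not built in)")
```

A negative result does not by itself prove that the kernel lacks ftrace
support; debugfs or tracefs may simply not be mounted on a minimal
Buildroot image.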

Am 29.04.2016 um 10:04 schrieb Amr Bekhit:
> Hello,
>
> Thanks to both of you for your responses.
>
> @Menchel: From my observations, the problem occurs when several CAN
> packets are sent in quick succession. I first saw it while trying to
> interface my embedded board to a 3rd party CAN device. That device
> only sent 10 messages every second at 100 kbit/s (so not a huge amount
> of traffic), but they were all sent in one burst, one immediately
> after the other. If I modify my test script to send the CAN messages
> with a 5 ms delay in between, I don't seem to have this problem.
>
> @Wolfgang: The CPU is an AT91SAM9X25 SoC running at 400 MHz.
>
> I've carried out another test where I've run "canfdtest -vv -g can0"
> on a host PC with a PCAN-USB device attached and "canfdtest -vv can0"
> on the embedded device and let it run overnight (from 5pm till 8.30am
> the following day). At the end of the test, the host PC had stopped
> sending any more data (there was no more terminal output indicating
> that bytes had been sent) and likewise, the embedded system was not
> receiving any data.
>
> (As a side note, I could not Ctrl-C out of the running canfdtest on
> the embedded system - I ended up having to SSH into the embedded
> system to get access to another terminal so I could run some commands
> - only kill -9 would kill the process)
>
> Even after killing and restarting the canfdtest processes on both host
> and embedded computers, no CAN messages were sent. I had to bring the
> interface down then back up again on the embedded system before the
> two programs started showing that messages were being sent and
> received.
>
> I've run the following commands on the embedded system at the end of the test:
>
> # cat /proc/interrupts
>             CPU0
>   16:  102823451  atmel-aic   1 Level     pmc, at91_tick, ttyS0
>   17:          0       PMC  17 Level     main_rc_osc
>   18:          0       PMC   0 Level     main_osc
>   19:          0       PMC  16 Level     mainck
>   20:          0       PMC   1 Level     clk-plla
>   21:          1       PMC   6 Level     clk-utmi
>   22:          0       PMC   3 Level     clk-master
>   23:    7352177  atmel-aic  17 Level     tc_clkevt
>   24:      24128  atmel-aic  20 Level     at_hdmac
>   25:          0  atmel-aic  21 Level     at_hdmac
>   29:         42  atmel-aic  12 Level     f0008000.mmc
>   32:    3134371  atmel-aic   9 Level     f8010000.i2c
>   34:          3  atmel-aic  16 Level     ttyS6
>   35:          0  atmel-aic  19 Level     at91_adc
>   36:       6881  atmel-aic  13 Level     f0000000.spi
>   37:          0  atmel-aic  23 Level     atmel_usba_udc
>   39:          0  atmel-aic  24 Level     eth0
>   40:    4952869  atmel-aic  30 Level     can0
>   41:     348546  atmel-aic  22 Level     ehci_hcd:usb1, ohci_hcd:usb2
>   90:          0      GPIO  16 Edge      atmel_usba_udc
> 140:          0      GPIO  15 Edge      mmc-detect
> Err:          0
>
> # ifconfig can0
> can0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>            UP RUNNING NOARP  MTU:16  Metric:1
>            RX packets:2476577 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:2476527 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:10
>            RX bytes:19812616 (18.8 MiB)  TX bytes:19812216 (18.8 MiB)
>            Interrupt:40
>
> # ip -details -statistics link show can0
> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>      link/can  promiscuity 0
>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>            bitrate 99950 sample-point 0.739
>            tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>            at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
>            clock 133333333
>            re-started bus-errors arbit-lost error-warn error-pass bus-off
>            0          0          0          0          0          0
>      RX: bytes  packets  errors  dropped overrun mcast
>      19812616   2476577  0       0       0       0
>      TX: bytes  packets  errors  dropped carrier collsns
>      19812216   2476527  0       0       0       0
>
> I did another dump of the CAN peripheral register memory after the
> test. Here are the results:
>
> Dumping memory from 0xF8004000 to 0xF8004000:
> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
> 0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111
> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
> 0xF8004018: [0000B2E4] 0000 0000 0000 0000 1011 0010 1110 0100
> 0xF800401C: [000052D3] 0000 0000 0000 0000 0101 0010 1101 0011
> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004208: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
> 0xF800420C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004210: [01885139] 0000 0001 1000 1000 0101 0001 0011 1001
> 0xF8004214: [21201F1E] 0010 0001 0010 0000 0001 1111 0001 1110
> 0xF8004218: [25242322] 0010 0101 0010 0100 0010 0011 0010 0010
> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004228: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
> 0xF800422C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004230: [018851AD] 0000 0001 1000 1000 0101 0001 1010 1101
> 0xF8004234: [2221201F] 0010 0010 0010 0001 0010 0000 0001 1111
> 0xF8004238: [26252423] 0010 0110 0010 0101 0010 0100 0010 0011
> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004248: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
> 0xF800424C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004250: [008852D3] 0000 0000 1000 1000 0101 0010 1101 0011
> 0xF8004254: [23222120] 0010 0011 0010 0010 0010 0001 0010 0000
> 0xF8004258: [27262524] 0010 0111 0010 0110 0010 0101 0010 0100
> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004268: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
> 0xF800426C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004270: [00084FD9] 0000 0000 0000 1000 0100 1111 1101 1001
> 0xF8004274: [201F1E1D] 0010 0000 0001 1111 0001 1110 0001 1101
> 0xF8004278: [24232221] 0010 0100 0010 0011 0010 0010 0010 0001
> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004288: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
> 0xF800428C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF8004290: [000876D4] 0000 0000 0000 1000 0111 0110 1101 0100
> 0xF8004294: [78777675] 0111 1000 0111 0111 0111 0110 0111 0101
> 0xF8004298: [7C7B7A79] 0111 1100 0111 1011 0111 1010 0111 1001
> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF80042A8: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
> 0xF80042AC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF80042B0: [0008C8D2] 0000 0000 0000 1000 1100 1000 1101 0010
> 0xF80042B4: [05040302] 0000 0101 0000 0100 0000 0011 0000 0010
> 0xF80042B8: [09080706] 0000 1001 0000 1000 0000 0111 0000 0110
> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF80042C0: [03070000] 0000 0011 0000 0111 0000 0000 0000 0000
> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF80042C8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
> 0xF80042CC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
> 0xF80042D0: [00885220] 0000 0000 1000 1000 0101 0010 0010 0000
> 0xF80042D4: [F2F1F0EF] 1111 0010 1111 0001 1111 0000 1110 1111
> 0xF80042D8: [F6F5F4F3] 1111 0110 1111 0101 1111 0100 1111 0011
> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
> 0xF80042E0: [03060000] 0000 0011 0000 0110 0000 0000 0000 0000
> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
> 0xF80042E8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
> 0xF80042EC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
> 0xF80042F0: [008850C3] 0000 0000 1000 1000 0101 0000 1100 0011
> 0xF80042F4: [F1F0EFEE] 1111 0001 1111 0000 1110 1111 1110 1110
> 0xF80042F8: [F5F4F3F2] 1111 0101 1111 0100 1111 0011 1111 0010
> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>
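For reading the status word in dumps like the one above, here is a minimal sketch that is not taken from the thread. It decodes the value at offset 0x10 (CAN_SR). The flag names and bit positions are an assumption based on my reading of the SAM9X25 datasheet and should be checked against it; `decode_can_sr` is a made-up helper name.

```python
# Flag names and bit positions for CAN_SR (offset 0x10) are an assumption
# taken from my reading of the datasheet; verify them before trusting this.
CAN_SR_FLAGS = {
    16: "ERRA", 17: "WARN", 18: "ERRP", 19: "BOFF",
    20: "SLEEP", 21: "WAKEUP", 22: "TOVF", 23: "TSTP",
    24: "CERR", 25: "SERR", 26: "AERR", 27: "FERR", 28: "BERR",
    29: "RBSY", 30: "TBSY", 31: "OVLSY",
}


def decode_can_sr(value):
    """Return (mailboxes whose event bit is set, names of the other set flags)."""
    # Bits 0..7 are the per-mailbox event bits MB0..MB7.
    mailboxes = [n for n in range(8) if value & (1 << n)]
    flags = [name for bit, name in sorted(CAN_SR_FLAGS.items())
             if value & (1 << bit)]
    return mailboxes, flags


if __name__ == "__main__":
    # Values read at 0xF8004010 in the two dumps posted in this thread.
    for label, sr in (("cansend flood", 0x20E000FF),
                      ("canfdtest run", 0x006000C7)):
        mailboxes, flags = decode_can_sr(sr)
        print(f"{label}: 0x{sr:08X} mailboxes={mailboxes} flags={flags}")
```

Under those assumptions, the value from the first report decodes to all eight mailbox bits plus RBSY, while the value from the canfdtest run shows mailboxes 0, 1, 2, 6 and 7 with RBSY clear.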


* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-29 11:18     ` Wolfgang Grandegger
@ 2016-04-29 11:29       ` Amr Bekhit
  2016-04-29 14:27         ` Wolfgang Grandegger
  2016-04-30 13:34         ` Wolfgang Grandegger
  0 siblings, 2 replies; 16+ messages in thread
From: Amr Bekhit @ 2016-04-29 11:29 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: mkl, linux-can

Hello Wolfgang,

I can enable ftrace (and any other config) in the kernel. Just let me
know what other options you want enabling and I can configure the
kernel accordingly.

Amr
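
As a rough aid for the kernel configuration step, here is a small sketch that reports which function-tracing options are missing from a kernel .config. The option list (CONFIG_FTRACE, CONFIG_FUNCTION_TRACER, CONFIG_DYNAMIC_FTRACE) is my assumption of a reasonable minimum, not something requested in this thread, and `missing_options` is a hypothetical helper name.

```python
# Assumed minimum set of options for function tracing with per-function
# filtering; this list is an illustration, not a requirement from the thread.
REQUIRED = ("CONFIG_FTRACE", "CONFIG_FUNCTION_TRACER", "CONFIG_DYNAMIC_FTRACE")


def missing_options(config_text, required=REQUIRED):
    """Return the options from `required` that are not set to y or m."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


if __name__ == "__main__":
    sample = (
        "CONFIG_FTRACE=y\n"
        "# CONFIG_FUNCTION_TRACER is not set\n"
        "CONFIG_DYNAMIC_FTRACE=y\n"
    )
    print(missing_options(sample))
```

The function takes the text of the .config file from the kernel build tree, so it can be run on the build host before flashing a new kernel.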

On 29 April 2016 at 12:18, Wolfgang Grandegger <wg@grandegger.com> wrote:
> Hello Amr,
>
> is function tracing (ftrace) working on your system?
>
> Wolfgang.
>
>
> Am 29.04.2016 um 10:04 schrieb Amr Bekhit:
>>
>> Hello,
>>
>> Thanks to both of you for your responses.
>>
>> @Menchel: From my observations, the problem is when you have several
>> CAN packets that are sent in quick succession. I initially started
>> seeing this problem while trying to interface my embedded board to a
>> 3rd party CAN device. The CAN device only sent 10 messages every
>> second at 100kBaud (so not a huge amount of traffic), but they were
>> all sent in one burst, one immediately after the other. If I modify my
>> test script to send the CAN messages with a 5ms delay in between, I
>> don't seem to have this problem.
>>
>> @Wolfgang: The CPU is an AT91SAM9X25 SoC running at 400 MHz.
>>
>> I've carried out another test where I've run "canfdtest -vv -g can0"
>> on a host PC with a PCAN-USB device attached and "canfdtest -vv can0"
>> on the embedded device and let it run overnight (from 5pm till 8.30am
>> the following day). At the end of the test, the host PC had stopped
>> sending any more data (there was no more terminal output indicating
>> that bytes had been sent) and likewise, the embedded system was not
>> receiving any data.
>>
>> (As a side note, I could not Ctrl-C out of the running canfdtest on
>> the embedded system - I ended up having to SSH into the embedded
>> system to get access to another terminal so I could run some commands
>> - only kill -9 would kill the process)
>>
>> Even after killing and restarting the canfdtest processes on both host
>> and embedded computers, no CAN messages were sent. I had to bring the
>> interface down then back up again on the embedded system before the
>> two programs started showing that messages were being sent and
>> received.
>>
>> I've run the following commands on the embedded system at the end of the
>> test:
>>
>> # cat /proc/interrupts
>>             CPU0
>>   16:  102823451  atmel-aic   1 Level     pmc, at91_tick, ttyS0
>>   17:          0       PMC  17 Level     main_rc_osc
>>   18:          0       PMC   0 Level     main_osc
>>   19:          0       PMC  16 Level     mainck
>>   20:          0       PMC   1 Level     clk-plla
>>   21:          1       PMC   6 Level     clk-utmi
>>   22:          0       PMC   3 Level     clk-master
>>   23:    7352177  atmel-aic  17 Level     tc_clkevt
>>   24:      24128  atmel-aic  20 Level     at_hdmac
>>   25:          0  atmel-aic  21 Level     at_hdmac
>>   29:         42  atmel-aic  12 Level     f0008000.mmc
>>   32:    3134371  atmel-aic   9 Level     f8010000.i2c
>>   34:          3  atmel-aic  16 Level     ttyS6
>>   35:          0  atmel-aic  19 Level     at91_adc
>>   36:       6881  atmel-aic  13 Level     f0000000.spi
>>   37:          0  atmel-aic  23 Level     atmel_usba_udc
>>   39:          0  atmel-aic  24 Level     eth0
>>   40:    4952869  atmel-aic  30 Level     can0
>>   41:     348546  atmel-aic  22 Level     ehci_hcd:usb1, ohci_hcd:usb2
>>   90:          0      GPIO  16 Edge      atmel_usba_udc
>> 140:          0      GPIO  15 Edge      mmc-detect
>> Err:          0
>>
>> # ifconfig can0
>> can0      Link encap:UNSPEC  HWaddr
>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>            UP RUNNING NOARP  MTU:16  Metric:1
>>            RX packets:2476577 errors:0 dropped:0 overruns:0 frame:0
>>            TX packets:2476527 errors:0 dropped:0 overruns:0 carrier:0
>>            collisions:0 txqueuelen:10
>>            RX bytes:19812616 (18.8 MiB)  TX bytes:19812216 (18.8 MiB)
>>            Interrupt:40
>>
>> # ip -details -statistics link show can0
>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>> UNKNOWN mode DEFAULT group default qlen 10
>>      link/can  promiscuity 0
>>      can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>            bitrate 99950 sample-point 0.739
>>            tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>            at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
>>            clock 133333333
>>            re-started bus-errors arbit-lost error-warn error-pass bus-off
>>            0          0          0          0          0          0
>>      RX: bytes  packets  errors  dropped overrun mcast
>>      19812616   2476577  0       0       0       0
>>      TX: bytes  packets  errors  dropped carrier collsns
>>      19812216   2476527  0       0       0       0
>>
>> I did another dump of the CAN peripheral register memory after the
>> test. Here are the results:
>>
>> Dumping memory from 0xF8004000 to 0xF8004000:
>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
>> 0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111
>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>> 0xF8004018: [0000B2E4] 0000 0000 0000 0000 1011 0010 1110 0100
>> 0xF800401C: [000052D3] 0000 0000 0000 0000 0101 0010 1101 0011
>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004208: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800420C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004210: [01885139] 0000 0001 1000 1000 0101 0001 0011 1001
>> 0xF8004214: [21201F1E] 0010 0001 0010 0000 0001 1111 0001 1110
>> 0xF8004218: [25242322] 0010 0101 0010 0100 0010 0011 0010 0010
>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004228: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800422C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004230: [018851AD] 0000 0001 1000 1000 0101 0001 1010 1101
>> 0xF8004234: [2221201F] 0010 0010 0010 0001 0010 0000 0001 1111
>> 0xF8004238: [26252423] 0010 0110 0010 0101 0010 0100 0010 0011
>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004248: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800424C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004250: [008852D3] 0000 0000 1000 1000 0101 0010 1101 0011
>> 0xF8004254: [23222120] 0010 0011 0010 0010 0010 0001 0010 0000
>> 0xF8004258: [27262524] 0010 0111 0010 0110 0010 0101 0010 0100
>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004268: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800426C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004270: [00084FD9] 0000 0000 0000 1000 0100 1111 1101 1001
>> 0xF8004274: [201F1E1D] 0010 0000 0001 1111 0001 1110 0001 1101
>> 0xF8004278: [24232221] 0010 0100 0010 0011 0010 0010 0010 0001
>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004288: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF800428C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF8004290: [000876D4] 0000 0000 0000 1000 0111 0110 1101 0100
>> 0xF8004294: [78777675] 0111 1000 0111 0111 0111 0110 0111 0101
>> 0xF8004298: [7C7B7A79] 0111 1100 0111 1011 0111 1010 0111 1001
>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042A8: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042AC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042B0: [0008C8D2] 0000 0000 0000 1000 1100 1000 1101 0010
>> 0xF80042B4: [05040302] 0000 0101 0000 0100 0000 0011 0000 0010
>> 0xF80042B8: [09080706] 0000 1001 0000 1000 0000 0111 0000 0110
>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042C0: [03070000] 0000 0011 0000 0111 0000 0000 0000 0000
>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042C8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>> 0xF80042CC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>> 0xF80042D0: [00885220] 0000 0000 1000 1000 0101 0010 0010 0000
>> 0xF80042D4: [F2F1F0EF] 1111 0010 1111 0001 1111 0000 1110 1111
>> 0xF80042D8: [F6F5F4F3] 1111 0110 1111 0101 1111 0100 1111 0011
>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> 0xF80042E0: [03060000] 0000 0011 0000 0110 0000 0000 0000 0000
>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>> 0xF80042E8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>> 0xF80042EC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>> 0xF80042F0: [008850C3] 0000 0000 1000 1000 0101 0000 1100 0011
>> 0xF80042F4: [F1F0EFEE] 1111 0001 1111 0000 1110 1111 1110 1110
>> 0xF80042F8: [F5F4F3F2] 1111 0101 1111 0100 1111 0011 1111 0010
>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>
>> On 8 April 2016 at 08:39, Wolfgang Grandegger <wg@grandegger.com> wrote:
>>>
>>> Hello,
>>>
>>>
>>> Am 05.04.2016 um 15:10 schrieb Amr Bekhit:
>>>>
>>>>
>>>> Hello,
>>>>
>>>> <Sorry for the re-send. I initially sent this as HTML, but then found
>>>> out that it was recommended to send all emails as plaintext, hence the
>>>> resend>
>>>>
>>>> I working on a board based on the AT91SAM9X25 SoC and I'm using
>>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>>> sending lots of messages very rapidly in quick succession causes the
>>>> CAN peripheral to then stop receiving any messages at all. The only
>>>> way to bring it back to a functional state is to bring the network
>>>> interface down and then back up again.
>>>>
>>>> The problem can be replicated as follows:
>>>>
>>>> The CAN interface is initialised using:
>>>>
>>>> ip link set can0 type can bitrate 100000 restart-ms 100
>>>> ifconfig can0 up
>>>>
>>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>>> that is plugged into a test Linux PC. After bringing up the CAN
>>>> interface on the test PC, messages can be continuously sent using the
>>>> following bash script:
>>>>
>>>> #!/bin/bash
>>>>
>>>> while :
>>>> do
>>>> cansend can0 123#DEADBEEFDEADBEEF
>>>> done
>>>>
>>>> After running the script, I check that messages are being received on
>>>> the AT91 target by running
>>>>
>>>> ifconfig can0
>>>>
>>>> and checking that the number of received packets is increasing.
>>>>
>>>> I then leave the system running for some time (1.5 hours typically,
>>>> may vary), periodically running ifconfig can0 to check to see if new
>>>> packets are being received. After a while, the can interface will stop
>>>> receiving new packets, even though the test PC is still transmitting
>>>> them. Stopping and restarting the CAN transmissions on the test PC
>>>> does not solve the problem. The interface does not appear to be in the
>>>> bus off state, as shown by running the following:
>>>>
>>>> # ip -details -statistics link show can0
>>>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>>>> UNKNOWN mode DEFAULT group default qlen 10
>>>>       link/can  promiscuity 0
>>>>       can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>>>             bitrate 99950 sample-point 0.739
>>>>             tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>>>             at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc
>>>> 1
>>>>             clock 133333333
>>>>             re-started bus-errors arbit-lost error-warn error-pass
>>>> bus-off
>>>>             0          0          0          0          0          0
>>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>>       12609768   1576221  5       0       5       0
>>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>>       0          0        0       0       0       0
>>>>
>>>>
>>>> # ifconfig can0
>>>> can0      Link encap:UNSPEC  HWaddr
>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>>             UP RUNNING NOARP  MTU:16  Metric:1
>>>>             RX packets:1576221 errors:5 dropped:0 overruns:0 frame:5
>>>>             TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>             collisions:0 txqueuelen:10
>>>>             RX bytes:12609768 (12.0 MiB)  TX bytes:0 (0.0 B)
>>>>             Interrupt:40
>>>>
>>>> Using the devmem command line program and a custom python script, I
>>>> dumped the contents of the CAN peripheral registers to a file. When
>>>> the AT91 CAN peripheral is in the failed state, here is what the
>>>> peripheral memory looks like:
>>>>
>>>> Dumping memory from 0xF8004000 to 0xF8004000:
>>>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>>>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF800400C: [1F040000] 0001 1111 0000 0100 0000 0000 0000 0000
>>>> 0xF8004010: [20E000FF] 0010 0000 1110 0000 0000 0000 1111 1111
>>>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>>>> 0xF8004018: [000090EF] 0000 0000 0000 0000 1001 0000 1110 1111
>>>> 0xF800401C: [00009C81] 0000 0000 0000 0000 1001 1100 1000 0001
>>>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004208: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>> 0xF800420C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>> 0xF8004210: [01881273] 0000 0001 1000 1000 0001 0010 0111 0011
>>>> 0xF8004214: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF8004218: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004228: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>> 0xF800422C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>> 0xF8004230: [018812E6] 0000 0001 1000 1000 0001 0010 1110 0110
>>>> 0xF8004234: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF8004238: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004248: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>> 0xF800424C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>> 0xF8004250: [01881359] 0000 0001 1000 1000 0001 0011 0101 1001
>>>> 0xF8004254: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF8004258: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004268: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>> 0xF800426C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>> 0xF8004270: [018813CC] 0000 0001 1000 1000 0001 0011 1100 1100
>>>> 0xF8004274: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF8004278: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF8004288: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>> 0xF800428C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>> 0xF8004290: [0188143F] 0000 0001 1000 1000 0001 0100 0011 1111
>>>> 0xF8004294: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF8004298: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>>>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042A8: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>> 0xF80042AC: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>> 0xF80042B0: [0188E950] 0000 0001 1000 1000 1110 1001 0101 0000
>>>> 0xF80042B4: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF80042B8: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF80042C0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>>>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042C8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042CC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042D0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>>>> 0xF80042D4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042D8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> 0xF80042E0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>>>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042EC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042F0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>>>> 0xF80042F4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042F8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> I noticed that the RBSY flag is set, even though there was nothing
>>>> transmitted to the CAN bus. All of the message boxes had data inside
>>>> ready to be retrieved.
>>>>
>>>> If there are any other tests you would like me to carry out, just let me
>>>> know.
>>>
>>>
>>>
>>> Where did your Linux kernel come from and what version are you using?
>>> Also interesting is:
>>>
>>> - how fast is your CPU (frequency)?
>>> - the output of "/proc/interrupts".
>>> - run "candump any,0:0,#FFFFFFFF" on the AT91 while the test is running
>>> - use "cangen" or even better "canfdtest" for testing.
>>>
>>> Wolfgang.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-can" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-29 11:29       ` Amr Bekhit
@ 2016-04-29 14:27         ` Wolfgang Grandegger
  2016-04-30 13:34         ` Wolfgang Grandegger
  1 sibling, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-04-29 14:27 UTC (permalink / raw)
  To: Amr Bekhit; +Cc: mkl, linux-can

Hello,

Am 29.04.2016 um 13:29 schrieb Amr Bekhit:
> Hello Wolfgang,
>
> I can enable ftrace (and any other config) in the kernel. Just let me
> know what other options you want enabling and I can configure the
> kernel accordingly.

OK, first another question: how is the CAN interface defined in the DTS
files?

Wolfgang.

> On 29 April 2016 at 12:18, Wolfgang Grandegger <wg@grandegger.com> wrote:
>> Hello Amr,
>>
>> is function tracing (ftrace) working on your system?
>>
>> Wolfgang.
>>
>>
>> Am 29.04.2016 um 10:04 schrieb Amr Bekhit:
>>>
>>> Hello,
>>>
>>> Thanks to both of you for your responses.
>>>
>>> @Menchel: From my observations, the problem is when you have several
>>> CAN packets that are sent in quick succession. I initially started
>>> seeing this problem while trying to interface my embedded board to a
>>> 3rd party CAN device. The CAN device only sent 10 messages every
>>> second at 100kBaud (so not a huge amount of traffic), but they were
>>> all sent in one burst, one immediately after the other. If I modify my
>>> test script to send the CAN messages with a 5ms delay in between, I
>>> don't seem to have this problem.
>>>
>>> @Wolfgang: The CPU is an AT91SAM9X25 SoC running at 400 MHz.
>>>
>>> I've carried out another test where I've run "canfdtest -vv -g can0"
>>> on a host PC with a PCAN-USB device attached and "canfdtest -vv can0"
>>> on the embedded device and let it run overnight (from 5pm till 8.30am
>>> the following day). At the end of the test, the host PC had stopped
>>> sending any more data (there was no more terminal output indicating
>>> that bytes had been sent) and likewise, the embedded system was not
>>> receiving any data.
>>>
>>> (As a side note, I could not Ctrl-C out of the running canfdtest on
>>> the embedded system - I ended up having to SSH into the embedded
>>> system to get access to another terminal so I could run some commands
>>> - only kill -9 would kill the process)
>>>
>>> Even after killing and restarting the canfdtest processes on both host
>>> and embedded computers, no CAN messages were sent. I had to bring the
>>> interface down then back up again on the embedded system before the
>>> two programs started showing that messages were being sent and
>>> received.
>>>
>>> I've run the following commands on the embedded system at the end of the
>>> test:
>>>
>>> # cat /proc/interrupts
>>>              CPU0
>>>    16:  102823451  atmel-aic   1 Level     pmc, at91_tick, ttyS0
>>>    17:          0       PMC  17 Level     main_rc_osc
>>>    18:          0       PMC   0 Level     main_osc
>>>    19:          0       PMC  16 Level     mainck
>>>    20:          0       PMC   1 Level     clk-plla
>>>    21:          1       PMC   6 Level     clk-utmi
>>>    22:          0       PMC   3 Level     clk-master
>>>    23:    7352177  atmel-aic  17 Level     tc_clkevt
>>>    24:      24128  atmel-aic  20 Level     at_hdmac
>>>    25:          0  atmel-aic  21 Level     at_hdmac
>>>    29:         42  atmel-aic  12 Level     f0008000.mmc
>>>    32:    3134371  atmel-aic   9 Level     f8010000.i2c
>>>    34:          3  atmel-aic  16 Level     ttyS6
>>>    35:          0  atmel-aic  19 Level     at91_adc
>>>    36:       6881  atmel-aic  13 Level     f0000000.spi
>>>    37:          0  atmel-aic  23 Level     atmel_usba_udc
>>>    39:          0  atmel-aic  24 Level     eth0
>>>    40:    4952869  atmel-aic  30 Level     can0
>>>    41:     348546  atmel-aic  22 Level     ehci_hcd:usb1, ohci_hcd:usb2
>>>    90:          0      GPIO  16 Edge      atmel_usba_udc
>>> 140:          0      GPIO  15 Edge      mmc-detect
>>> Err:          0
>>>
>>> # ifconfig can0
>>> can0      Link encap:UNSPEC  HWaddr
>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>             UP RUNNING NOARP  MTU:16  Metric:1
>>>             RX packets:2476577 errors:0 dropped:0 overruns:0 frame:0
>>>             TX packets:2476527 errors:0 dropped:0 overruns:0 carrier:0
>>>             collisions:0 txqueuelen:10
>>>             RX bytes:19812616 (18.8 MiB)  TX bytes:19812216 (18.8 MiB)
>>>             Interrupt:40
>>>
>>> # ip -details -statistics link show can0
>>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>>> UNKNOWN mode DEFAULT group default qlen 10
>>>       link/can  promiscuity 0
>>>       can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>>             bitrate 99950 sample-point 0.739
>>>             tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>>             at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
>>>             clock 133333333
>>>             re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>             0          0          0          0          0          0
>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>       19812616   2476577  0       0       0       0
>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>       19812216   2476527  0       0       0       0
>>>
>>> I did another dump of the can peripheral register memory after the
>>> test. Here are the results:
>>>
>>> Dumping memory from 0xF8004000 to 0xF8004000:
>>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
>>> 0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111
>>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>>> 0xF8004018: [0000B2E4] 0000 0000 0000 0000 1011 0010 1110 0100
>>> 0xF800401C: [000052D3] 0000 0000 0000 0000 0101 0010 1101 0011
>>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004208: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800420C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004210: [01885139] 0000 0001 1000 1000 0101 0001 0011 1001
>>> 0xF8004214: [21201F1E] 0010 0001 0010 0000 0001 1111 0001 1110
>>> 0xF8004218: [25242322] 0010 0101 0010 0100 0010 0011 0010 0010
>>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004228: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800422C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004230: [018851AD] 0000 0001 1000 1000 0101 0001 1010 1101
>>> 0xF8004234: [2221201F] 0010 0010 0010 0001 0010 0000 0001 1111
>>> 0xF8004238: [26252423] 0010 0110 0010 0101 0010 0100 0010 0011
>>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004248: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800424C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004250: [008852D3] 0000 0000 1000 1000 0101 0010 1101 0011
>>> 0xF8004254: [23222120] 0010 0011 0010 0010 0010 0001 0010 0000
>>> 0xF8004258: [27262524] 0010 0111 0010 0110 0010 0101 0010 0100
>>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004268: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800426C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004270: [00084FD9] 0000 0000 0000 1000 0100 1111 1101 1001
>>> 0xF8004274: [201F1E1D] 0010 0000 0001 1111 0001 1110 0001 1101
>>> 0xF8004278: [24232221] 0010 0100 0010 0011 0010 0010 0010 0001
>>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004288: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800428C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004290: [000876D4] 0000 0000 0000 1000 0111 0110 1101 0100
>>> 0xF8004294: [78777675] 0111 1000 0111 0111 0111 0110 0111 0101
>>> 0xF8004298: [7C7B7A79] 0111 1100 0111 1011 0111 1010 0111 1001
>>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042A8: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042AC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042B0: [0008C8D2] 0000 0000 0000 1000 1100 1000 1101 0010
>>> 0xF80042B4: [05040302] 0000 0101 0000 0100 0000 0011 0000 0010
>>> 0xF80042B8: [09080706] 0000 1001 0000 1000 0000 0111 0000 0110
>>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80042C0: [03070000] 0000 0011 0000 0111 0000 0000 0000 0000
>>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042C8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>>> 0xF80042CC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>>> 0xF80042D0: [00885220] 0000 0000 1000 1000 0101 0010 0010 0000
>>> 0xF80042D4: [F2F1F0EF] 1111 0010 1111 0001 1111 0000 1110 1111
>>> 0xF80042D8: [F6F5F4F3] 1111 0110 1111 0101 1111 0100 1111 0011
>>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80042E0: [03060000] 0000 0011 0000 0110 0000 0000 0000 0000
>>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042E8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>>> 0xF80042EC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>>> 0xF80042F0: [008850C3] 0000 0000 1000 1000 0101 0000 1100 0011
>>> 0xF80042F4: [F1F0EFEE] 1111 0001 1111 0000 1110 1111 1110 1110
>>> 0xF80042F8: [F5F4F3F2] 1111 0101 1111 0100 1111 0011 1111 0010
>>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> On 8 April 2016 at 08:39, Wolfgang Grandegger <wg@grandegger.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> Am 05.04.2016 um 15:10 schrieb Amr Bekhit:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> <Sorry for the re-send. I initially sent this as HTML, but then found
>>>>> out that it was recommended to send all emails as plaintext, hence the
>>>>> resend>
>>>>>
>>>>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>>>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>>>> sending lots of messages very rapidly in quick succession causes the
>>>>> CAN peripheral to then stop receiving any messages at all. The only
>>>>> way to bring it back to a functional state is to bring the network
>>>>> interface down and then back up again.
>>>>>
>>>>> The problem can be replicated as follows:
>>>>>
>>>>> The CAN interface is initialised using:
>>>>>
>>>>> ip link set can0 type can bitrate 100000 restart-ms 100
>>>>> ifconfig can0 up
>>>>>
>>>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>>>> that is plugged into a test Linux PC. After bringing up the CAN
>>>>> interface on the test PC, messages can be continuously sent using the
>>>>> following bash script:
>>>>>
>>>>> #!/bin/bash
>>>>>
>>>>> while :
>>>>> do
>>>>> cansend can0 123#DEADBEEFDEADBEEF
>>>>> done
>>>>>
>>>>> After running the script, I check that messages are being received on
>>>>> the AT91 target by running
>>>>>
>>>>> ifconfig can0
>>>>>
>>>>> and checking that the number of received packets is increasing.
>>>>>
>>>>> I then leave the system running for some time (1.5 hours typically,
>>>>> may vary), periodically running ifconfig can0 to check to see if new
>>>>> packets are being received. After a while, the can interface will stop
>>>>> receiving new packets, even though the test PC is still transmitting
>>>>> them. Stopping and restarting the CAN transmissions on the test PC
>>>>> does not solve the problem. The interface does not appear to be in the
>>>>> bus off state, as shown by running the following:
>>>>>
>>>>> # ip -details -statistics link show can0
>>>>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>>>>> UNKNOWN mode DEFAULT group default qlen 10
>>>>>        link/can  promiscuity 0
>>>>>        can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>>>>              bitrate 99950 sample-point 0.739
>>>>>              tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>>>>              at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc
>>>>> 1
>>>>>              clock 133333333
>>>>>              re-started bus-errors arbit-lost error-warn error-pass
>>>>> bus-off
>>>>>              0          0          0          0          0          0
>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>        12609768   1576221  5       0       5       0
>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>        0          0        0       0       0       0
>>>>>
>>>>>
>>>>> # ifconfig can0
>>>>> can0      Link encap:UNSPEC  HWaddr
>>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>>>              UP RUNNING NOARP  MTU:16  Metric:1
>>>>>              RX packets:1576221 errors:5 dropped:0 overruns:0 frame:5
>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>              collisions:0 txqueuelen:10
>>>>>              RX bytes:12609768 (12.0 MiB)  TX bytes:0 (0.0 B)
>>>>>              Interrupt:40
>>>>>
>>>>> Using the devmem command line program and a custom python script, I
>>>>> dumped the contents of the CAN peripheral registers to a file. When
>>>>> the AT91 CAN peripheral is in the failed state, here is what the
>>>>> peripheral memory looks like:
>>>>>
>>>>> Dumping memory from 0xF8004000 to 0xF8004000:
>>>>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>>>>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF800400C: [1F040000] 0001 1111 0000 0100 0000 0000 0000 0000
>>>>> 0xF8004010: [20E000FF] 0010 0000 1110 0000 0000 0000 1111 1111
>>>>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>>>>> 0xF8004018: [000090EF] 0000 0000 0000 0000 1001 0000 1110 1111
>>>>> 0xF800401C: [00009C81] 0000 0000 0000 0000 1001 1100 1000 0001
>>>>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004208: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800420C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004210: [01881273] 0000 0001 1000 1000 0001 0010 0111 0011
>>>>> 0xF8004214: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004218: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004228: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800422C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004230: [018812E6] 0000 0001 1000 1000 0001 0010 1110 0110
>>>>> 0xF8004234: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004238: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004248: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800424C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004250: [01881359] 0000 0001 1000 1000 0001 0011 0101 1001
>>>>> 0xF8004254: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004258: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004268: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800426C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004270: [018813CC] 0000 0001 1000 1000 0001 0011 1100 1100
>>>>> 0xF8004274: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004278: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004288: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800428C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004290: [0188143F] 0000 0001 1000 1000 0001 0100 0011 1111
>>>>> 0xF8004294: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004298: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042A8: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF80042AC: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF80042B0: [0188E950] 0000 0001 1000 1000 1110 1001 0101 0000
>>>>> 0xF80042B4: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF80042B8: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80042C0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042C8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042CC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042D0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>>>>> 0xF80042D4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042D8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80042E0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042EC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042F0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>>>>> 0xF80042F4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042F8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> I noticed that the RBSY flag is set, even though there was nothing
>>>>> transmitted to the CAN bus. All of the message boxes had data inside
>>>>> ready to be retrieved.
>>>>>
>>>>> If there are any other tests you would like me to carry out, just let me
>>>>> know.
>>>>
>>>>
>>>>
>>>> Where did your Linux kernel come from and what version are you using?
>>>> Also interesting is:
>>>>
>>>> - how fast is your CPU (frequency)?
>>>> - the output of "/proc/interrupts".
>>>> - run "candump any,0:0,#FFFFFFFF" on the AT91 while the test is running
>>>> - use "cangen" or even better "canfdtest" for testing.
>>>>
>>>> Wolfgang.
>>>
>>

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-29 11:29       ` Amr Bekhit
  2016-04-29 14:27         ` Wolfgang Grandegger
@ 2016-04-30 13:34         ` Wolfgang Grandegger
  1 sibling, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-04-30 13:34 UTC (permalink / raw)
  To: Amr Bekhit; +Cc: mkl, linux-can

[-- Attachment #1: Type: text/plain, Size: 20832 bytes --]

Hello Amr,

Am 29.04.2016 um 13:29 schrieb Amr Bekhit:
> Hello Wolfgang,
>
> I can enable ftrace (and any other config) in the kernel. Just let me
> know what other options you want enabling and I can configure the
> kernel accordingly.

Could you please apply the attached patch first (it's untested)? It
prints out some variables when the device is closed (via ifconfig down).

From your two dumps:

  0xF800400C: [1F040000] 0001 1111 0000 0100 0000 0000 0000 0000
  0xF8004010: [20E000FF] 0010 0000 1110 0000 0000 0000 1111 1111

  0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
  0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111

There are mailbox events pending but the interrupt sources are not 
enabled for them... strange!
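
To make that reading of the two dumps easier to check, here is a rough
sketch in Python. It assumes that 0x0C is the interrupt mask register,
that 0x10 is the status register, and that the low eight bits of each
word are the per-mailbox flags (MB0..MB7); those offsets and the bit
layout are taken from my reading of the datasheet and are not confirmed
by the dumps themselves. The helper name masked_pending is made up for
this example.

```python
# Assumption: bits 0..7 of the mask and status words are the per-mailbox flags.
MAILBOX_BITS = 0xFF


def masked_pending(imr, sr):
    """Return the mailbox numbers flagged in sr whose interrupt is not enabled in imr."""
    stuck = sr & ~imr & MAILBOX_BITS
    return [n for n in range(8) if stuck & (1 << n)]


# (mask at 0x0C, status at 0x10), copied from the two register dumps.
dumps = {
    "first dump": (0x1F040000, 0x20E000FF),
    "second dump": (0x1F040038, 0x006000C7),
}

for name, (imr, sr) in dumps.items():
    print(name, "pending but masked mailboxes:", masked_pending(imr, sr))
```

Under those assumptions every pending mailbox in both dumps has its
interrupt masked, so the handler would never be called to drain them.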

Wolfgang.

> Amr
>
> On 29 April 2016 at 12:18, Wolfgang Grandegger <wg@grandegger.com> wrote:
>> Hello Amr,
>>
>> is function tracing (ftrace) working on your system?
>>
>> Wolfgang.
>>
>>
>> Am 29.04.2016 um 10:04 schrieb Amr Bekhit:
>>>
>>> Hello,
>>>
>>> Thanks to both of you for your responses.
>>>
>>> @Menchel: From my observations, the problem is when you have several
>>> CAN packets that are sent in quick succession. I initially started
>>> seeing this problem while trying to interface my embedded board to a
>>> 3rd party CAN device. The CAN device only sent 10 messages every
>>> second at 100kBaud (so not a huge amount of traffic), but they were
>>> all sent in one burst, one immediately after the other. If I modify my
>>> test script to send the CAN messages with a 5ms delay in between, I
>>> don't seem to have this problem.
>>>
>>> @Wolfgang: The CPU is an AT91SAM9X25 SoC running at 400 MHz.
>>>
>>> I've carried out another test where I've run "canfdtest -vv -g can0"
>>> on a host PC with a PCAN-USB device attached and "canfdtest -vv can0"
>>> on the embedded device and let it run overnight (from 5pm till 8.30am
>>> the following day). At the end of the test, the host PC had stopped
>>> sending any more data (there was no more terminal output indicating
>>> that bytes had been sent) and likewise, the embedded system was not
>>> receiving any data.
>>>
>>> (As a side note, I could not Ctrl-C out of the running canfdtest on
>>> the embedded system - I ended up having to SSH into the embedded
>>> system to get access to another terminal so I could run some commands
>>> - only kill -9 would kill the process)
>>>
>>> Even after killing and restarting the canfdtest processes on both host
>>> and embedded computers, no CAN messages were sent. I had to bring the
>>> interface down then back up again on the embedded system before the
>>> two programs started showing that messages were being sent and
>>> received.
>>>
>>> I've run the following commands on the embedded system at the end of the
>>> test:
>>>
>>> # cat /proc/interrupts
>>>              CPU0
>>>    16:  102823451  atmel-aic   1 Level     pmc, at91_tick, ttyS0
>>>    17:          0       PMC  17 Level     main_rc_osc
>>>    18:          0       PMC   0 Level     main_osc
>>>    19:          0       PMC  16 Level     mainck
>>>    20:          0       PMC   1 Level     clk-plla
>>>    21:          1       PMC   6 Level     clk-utmi
>>>    22:          0       PMC   3 Level     clk-master
>>>    23:    7352177  atmel-aic  17 Level     tc_clkevt
>>>    24:      24128  atmel-aic  20 Level     at_hdmac
>>>    25:          0  atmel-aic  21 Level     at_hdmac
>>>    29:         42  atmel-aic  12 Level     f0008000.mmc
>>>    32:    3134371  atmel-aic   9 Level     f8010000.i2c
>>>    34:          3  atmel-aic  16 Level     ttyS6
>>>    35:          0  atmel-aic  19 Level     at91_adc
>>>    36:       6881  atmel-aic  13 Level     f0000000.spi
>>>    37:          0  atmel-aic  23 Level     atmel_usba_udc
>>>    39:          0  atmel-aic  24 Level     eth0
>>>    40:    4952869  atmel-aic  30 Level     can0
>>>    41:     348546  atmel-aic  22 Level     ehci_hcd:usb1, ohci_hcd:usb2
>>>    90:          0      GPIO  16 Edge      atmel_usba_udc
>>> 140:          0      GPIO  15 Edge      mmc-detect
>>> Err:          0
>>>
>>> # ifconfig can0
>>> can0      Link encap:UNSPEC  HWaddr
>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>             UP RUNNING NOARP  MTU:16  Metric:1
>>>             RX packets:2476577 errors:0 dropped:0 overruns:0 frame:0
>>>             TX packets:2476527 errors:0 dropped:0 overruns:0 carrier:0
>>>             collisions:0 txqueuelen:10
>>>             RX bytes:19812616 (18.8 MiB)  TX bytes:19812216 (18.8 MiB)
>>>             Interrupt:40
>>>
>>> # ip -details -statistics link show can0
>>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>>> UNKNOWN mode DEFAULT group default qlen 10
>>>       link/can  promiscuity 0
>>>       can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>>             bitrate 99950 sample-point 0.739
>>>             tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>>             at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc 1
>>>             clock 133333333
>>>             re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>             0          0          0          0          0          0
>>>       RX: bytes  packets  errors  dropped overrun mcast
>>>       19812616   2476577  0       0       0       0
>>>       TX: bytes  packets  errors  dropped carrier collsns
>>>       19812216   2476527  0       0       0       0
>>>
>>> I did another dump of the can peripheral register memory after the
>>> test. Here are the results:
>>>
>>> Dumping memory from 0xF8004000 to 0xF8004000:
>>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800400C: [1F040038] 0001 1111 0000 0100 0000 0000 0011 1000
>>> 0xF8004010: [006000C7] 0000 0000 0110 0000 0000 0000 1100 0111
>>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>>> 0xF8004018: [0000B2E4] 0000 0000 0000 0000 1011 0010 1110 0100
>>> 0xF800401C: [000052D3] 0000 0000 0000 0000 0101 0010 1101 0011
>>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004208: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800420C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004210: [01885139] 0000 0001 1000 1000 0101 0001 0011 1001
>>> 0xF8004214: [21201F1E] 0010 0001 0010 0000 0001 1111 0001 1110
>>> 0xF8004218: [25242322] 0010 0101 0010 0100 0010 0011 0010 0010
>>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004228: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800422C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004230: [018851AD] 0000 0001 1000 1000 0101 0001 1010 1101
>>> 0xF8004234: [2221201F] 0010 0010 0010 0001 0010 0000 0001 1111
>>> 0xF8004238: [26252423] 0010 0110 0010 0101 0010 0100 0010 0011
>>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004248: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800424C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004250: [008852D3] 0000 0000 1000 1000 0101 0010 1101 0011
>>> 0xF8004254: [23222120] 0010 0011 0010 0010 0010 0001 0010 0000
>>> 0xF8004258: [27262524] 0010 0111 0010 0110 0010 0101 0010 0100
>>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004268: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800426C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004270: [00084FD9] 0000 0000 0000 1000 0100 1111 1101 1001
>>> 0xF8004274: [201F1E1D] 0010 0000 0001 1111 0001 1110 0001 1101
>>> 0xF8004278: [24232221] 0010 0100 0010 0011 0010 0010 0010 0001
>>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004288: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF800428C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF8004290: [000876D4] 0000 0000 0000 1000 0111 0110 1101 0100
>>> 0xF8004294: [78777675] 0111 1000 0111 0111 0111 0110 0111 0101
>>> 0xF8004298: [7C7B7A79] 0111 1100 0111 1011 0111 1010 0111 1001
>>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042A8: [20000000] 0010 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042AC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042B0: [0008C8D2] 0000 0000 0000 1000 1100 1000 1101 0010
>>> 0xF80042B4: [05040302] 0000 0101 0000 0100 0000 0011 0000 0010
>>> 0xF80042B8: [09080706] 0000 1001 0000 1000 0000 0111 0000 0110
>>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80042C0: [03070000] 0000 0011 0000 0111 0000 0000 0000 0000
>>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042C8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>>> 0xF80042CC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>>> 0xF80042D0: [00885220] 0000 0000 1000 1000 0101 0010 0010 0000
>>> 0xF80042D4: [F2F1F0EF] 1111 0010 1111 0001 1111 0000 1110 1111
>>> 0xF80042D8: [F6F5F4F3] 1111 0110 1111 0101 1111 0100 1111 0011
>>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> 0xF80042E0: [03060000] 0000 0011 0000 0110 0000 0000 0000 0000
>>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>> 0xF80042E8: [01E00000] 0000 0001 1110 0000 0000 0000 0000 0000
>>> 0xF80042EC: [00000078] 0000 0000 0000 0000 0000 0000 0111 1000
>>> 0xF80042F0: [008850C3] 0000 0000 1000 1000 0101 0000 1100 0011
>>> 0xF80042F4: [F1F0EFEE] 1111 0001 1111 0000 1110 1111 1110 1110
>>> 0xF80042F8: [F5F4F3F2] 1111 0101 1111 0100 1111 0011 1111 0010
>>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> On 8 April 2016 at 08:39, Wolfgang Grandegger <wg@grandegger.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> Am 05.04.2016 um 15:10 schrieb Amr Bekhit:
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> <Sorry for the re-send. I initially sent this as HTML, but then found
>>>>> out that it was recommended to send all emails as plaintext, hence the
>>>>> resend>
>>>>>
>>>>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>>>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>>>> sending lots of messages very rapidly in quick succession causes the
>>>>> CAN peripheral to then stop receiving any messages at all. The only
>>>>> way to bring it back to a functional state is to bring the network
>>>>> interface down and then back up again.
>>>>>
>>>>> The problem can be replicated as follows:
>>>>>
>>>>> The CAN interface is initialised using:
>>>>>
>>>>> ip link set can0 type can bitrate 100000 restart-ms 100
>>>>> ifconfig can0 up
>>>>>
>>>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>>>> that is plugged into a test Linux PC. After bringing up the CAN
>>>>> interface on the test PC, messages can be continuously sent using the
>>>>> following bash script:
>>>>>
>>>>> #!/bin/bash
>>>>>
>>>>> while :
>>>>> do
>>>>> cansend can0 123#DEADBEEFDEADBEEF
>>>>> done
>>>>>
>>>>> After running the script, I check that messages are being received on
>>>>> the AT91 target by running
>>>>>
>>>>> ifconfig can0
>>>>>
>>>>> and checking that the number of received packets is increasing.
>>>>>
>>>>> I then leave the system running for some time (1.5 hours typically,
>>>>> may vary), periodically running ifconfig can0 to check to see if new
>>>>> packets are being received. After a while, the can interface will stop
>>>>> receiving new packets, even though the test PC is still transmitting
>>>>> them. Stopping and restarting the CAN transmissions on the test PC
>>>>> does not solve the problem. The interface does not appear to be in the
>>>>> bus off state, as shown by running the following:
>>>>>
>>>>> # ip -details -statistics link show can0
>>>>> 2: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state
>>>>> UNKNOWN mode DEFAULT group default qlen 10
>>>>>        link/can  promiscuity 0
>>>>>        can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>>>>>              bitrate 99950 sample-point 0.739
>>>>>              tq 435 prop-seg 8 phase-seg1 8 phase-seg2 6 sjw 1
>>>>>              at91_can: tseg1 4..16 tseg2 2..8 sjw 1..4 brp 2..128 brp-inc
>>>>> 1
>>>>>              clock 133333333
>>>>>              re-started bus-errors arbit-lost error-warn error-pass
>>>>> bus-off
>>>>>              0          0          0          0          0          0
>>>>>        RX: bytes  packets  errors  dropped overrun mcast
>>>>>        12609768   1576221  5       0       5       0
>>>>>        TX: bytes  packets  errors  dropped carrier collsns
>>>>>        0          0        0       0       0       0
>>>>>
>>>>>
>>>>> # ifconfig can0
>>>>> can0      Link encap:UNSPEC  HWaddr
>>>>> 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
>>>>>              UP RUNNING NOARP  MTU:16  Metric:1
>>>>>              RX packets:1576221 errors:5 dropped:0 overruns:0 frame:5
>>>>>              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
>>>>>              collisions:0 txqueuelen:10
>>>>>              RX bytes:12609768 (12.0 MiB)  TX bytes:0 (0.0 B)
>>>>>              Interrupt:40
>>>>>
>>>>> Using the devmem command line program and a custom python script, I
>>>>> dumped the contents of the CAN peripheral registers to a file. When
>>>>> the AT91 CAN peripheral is in the failed state, here is what the
>>>>> peripheral memory looks like:
>>>>>
>>>>> Dumping memory from 0xF8004000 to 0xF8004000:
>>>>> 0xF8004000: [00000001] 0000 0000 0000 0000 0000 0000 0000 0001
>>>>> 0xF8004004: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004008: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF800400C: [1F040000] 0001 1111 0000 0100 0000 0000 0000 0000
>>>>> 0xF8004010: [20E000FF] 0010 0000 1110 0000 0000 0000 1111 1111
>>>>> 0xF8004014: [00390775] 0000 0000 0011 1001 0000 0111 0111 0101
>>>>> 0xF8004018: [000090EF] 0000 0000 0000 0000 1001 0000 1110 1111
>>>>> 0xF800401C: [00009C81] 0000 0000 0000 0000 1001 1100 1000 0001
>>>>> 0xF8004020: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004024: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004028: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80040E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80040E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004200: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004204: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004208: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800420C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004210: [01881273] 0000 0001 1000 1000 0001 0010 0111 0011
>>>>> 0xF8004214: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004218: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800421C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004220: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004224: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004228: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800422C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004230: [018812E6] 0000 0001 1000 1000 0001 0010 1110 0110
>>>>> 0xF8004234: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004238: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800423C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004240: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004244: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004248: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800424C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004250: [01881359] 0000 0001 1000 1000 0001 0011 0101 1001
>>>>> 0xF8004254: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004258: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800425C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004260: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004264: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004268: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800426C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004270: [018813CC] 0000 0001 1000 1000 0001 0011 1100 1100
>>>>> 0xF8004274: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004278: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800427C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF8004280: [01000000] 0000 0001 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004284: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF8004288: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF800428C: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF8004290: [0188143F] 0000 0001 1000 1000 0001 0100 0011 1111
>>>>> 0xF8004294: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF8004298: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF800429C: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80042A0: [02000000] 0000 0010 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042A4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042A8: [048C0000] 0000 0100 1000 1100 0000 0000 0000 0000
>>>>> 0xF80042AC: [00000123] 0000 0000 0000 0000 0000 0001 0010 0011
>>>>> 0xF80042B0: [0188E950] 0000 0001 1000 1000 1110 1001 0101 0000
>>>>> 0xF80042B4: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF80042B8: [EFBEADDE] 1110 1111 1011 1110 1010 1101 1101 1110
>>>>> 0xF80042BC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80042C0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042C4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042C8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042CC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042D0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>>>>> 0xF80042D4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042D8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042DC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> 0xF80042E0: [03000000] 0000 0011 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042E4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042E8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042EC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042F0: [00800000] 0000 0000 1000 0000 0000 0000 0000 0000
>>>>> 0xF80042F4: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042F8: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>> 0xF80042FC: [00000000] 0000 0000 0000 0000 0000 0000 0000 0000
>>>>>
>>>>> I noticed that the RBSY flag is set, even though there was nothing
>>>>> transmitted to the CAN bus. All of the message boxes had data inside
>>>>> ready to be retrieved.
>>>>>
>>>>> If there are any other test you would like me to carry out, just let me
>>>>> know.
>>>>
>>>>
>>>>
>>>> Where did your Linux kernel come from and what version are you using?
>>>> Also
>>>> interesting is:
>>>>
>>>> - how fast is your CPU (frequency)?
>>>> - the output of "/proc/interrupts".
>>>> - run "candump any,0:0,#FFFFFFFF" on the AT91 while the test is running
>>>> - use "cangen" or even better "canfdtest" for testing.
>>>>
>>>> Wolfgang.
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-can" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>
>
>

[-- Attachment #2: at91_can_debug.patch --]
[-- Type: text/x-patch, Size: 580 bytes --]

diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
index 945c095..7fb2f09 100644
--- a/drivers/net/can/at91_can.c
+++ b/drivers/net/can/at91_can.c
@@ -1162,6 +1162,11 @@ static int at91_close(struct net_device *dev)
 {
 	struct at91_priv *priv = netdev_priv(dev);
 
+	netdev_info(dev, "reg_sr=%d\n", priv->reg_sr);
+	netdev_info(dev, "tx_next=%d\n", priv->tx_next);
+	netdev_info(dev, "tx_echo=%d\n", priv->tx_echo);
+	netdev_info(dev, "rx_next=%d\n", priv->rx_next);
+
 	netif_stop_queue(dev);
 	napi_disable(&priv->napi);
 	at91_chip_stop(dev, CAN_STATE_STOPPED);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-04-05 13:10 Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages Amr Bekhit
  2016-04-08  7:39 ` Wolfgang Grandegger
@ 2016-05-02  6:23 ` Alexander Stein
  2016-05-02 13:53   ` Wolfgang Grandegger
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Stein @ 2016-05-02  6:23 UTC (permalink / raw)
  To: Amr Bekhit; +Cc: wg, mkl, linux-can

On Tuesday 05 April 2016 14:10:48, Amr Bekhit wrote:
> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
> integrated CAN peripheral. I seem to have run into an issue whereby
> sending lots of messages very rapidly in quick succession causes the
> CAN peripheral to then stop receiving any messages at all. The only
> way to bring it back to a functional state is to bring the network
> interface down and then back up again.
> [...]
> I then start sending CAN messages to the unit using a PCAN-USB adapter
> that is plugged into a test Linux PC. After bringing up the CAN
> interface on the test PC, messages can be continuously sent using the
> following bash script:
> [...]
> I then leave the system running for some time (1.5 hours typically,
> may vary), periodically running ifconfig can0 to check to see if new
> packets are being received. After a while, the can interface will stop
> receiving new packets, even though the test PC is still transmitting
> them. Stopping and restarting the CAN transmissions on the test PC
> does not solve the problem. The interface does not appear to be in the
> bus off state, as shown by running the following:

That sounds a bit like my getting stuck problem in http://linux-can.vger.kernel.narkive.com/bBQqK84G/resend-patch-net-can-at91-can-c-decrease-likelyhood-of-rx-overruns#post2

The patch post1 at least keeps the driver working. Although I don't know what 
has changed in at91_can meanwhile.

Best regards,
Alexander


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-05-02  6:23 ` Alexander Stein
@ 2016-05-02 13:53   ` Wolfgang Grandegger
  2016-05-03  8:27     ` Amr Bekhit
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-05-02 13:53 UTC (permalink / raw)
  To: Alexander Stein, Amr Bekhit; +Cc: mkl, linux-can

Hello Alexander,

Am 02.05.2016 um 08:23 schrieb Alexander Stein:
> On Tuesday 05 April 2016 14:10:48, Amr Bekhit wrote:
>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>> integrated CAN peripheral. I seem to have run into an issue whereby
>> sending lots of messages very rapidly in quick succession causes the
>> CAN peripheral to then stop receiving any messages at all. The only
>> way to bring it back to a functional state is to bring the network
>> interface down and then back up again.
>> [...]
>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>> that is plugged into a test Linux PC. After bringing up the CAN
>> interface on the test PC, messages can be continuously sent using the
>> following bash script:
>> [...]
>> I then leave the system running for some time (1.5 hours typically,
>> may vary), periodically running ifconfig can0 to check to see if new
>> packets are being received. After a while, the can interface will stop
>> receiving new packets, even though the test PC is still transmitting
>> them. Stopping and restarting the CAN transmissions on the test PC
>> does not solve the problem. The interface does not appear to be in the
>> bus off state, as shown by running the following:
>
> That sounds a bit like my getting stuck problem in http://linux-can.vger.kernel.narkive.com/bBQqK84G/resend-patch-net-can-at91-can-c-decrease-likelyhood-of-rx-overruns#post2
>
> The patch post1 at least keeps the driver working. Although I don't know what
> has changed in at91_can meanwhile.

Thanks for pointing me to that patch. It still applies to Linux 4.1 with 
some minor fixes. Amr, could you please give it a try. Please let me 
know if you need help.

Anyway, I think the driver should not hang even in case of overflows. I 
will have a closer look later this week.

Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-05-02 13:53   ` Wolfgang Grandegger
@ 2016-05-03  8:27     ` Amr Bekhit
  2016-06-01 13:21       ` Amr Bekhit
  0 siblings, 1 reply; 16+ messages in thread
From: Amr Bekhit @ 2016-05-03  8:27 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Alexander Stein, mkl, linux-can

Hi Wolfgang and Alexander,

Thanks for both of your responses.


@Alexander: Thanks for pointing out the patch.

@Wolfgang: In response to your earlier request, I've uploaded my dts
file to pastebin, which you can find at http://pastebin.com/tNp2PnW4. I'll
give the patch mentioned by Alexander and your one a try and let you
know how it goes.

Amr

On 2 May 2016 at 14:53, Wolfgang Grandegger <wg@grandegger.com> wrote:
> Hello Alexander,
>
> Am 02.05.2016 um 08:23 schrieb Alexander Stein:
>>
>> On Tuesday 05 April 2016 14:10:48, Amr Bekhit wrote:
>>>
>>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>> sending lots of messages very rapidly in quick succession causes the
>>> CAN peripheral to then stop receiving any messages at all. The only
>>> way to bring it back to a functional state is to bring the network
>>> interface down and then back up again.
>>> [...]
>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>> that is plugged into a test Linux PC. After bringing up the CAN
>>> interface on the test PC, messages can be continuously sent using the
>>> following bash script:
>>> [...]
>>> I then leave the system running for some time (1.5 hours typically,
>>> may vary), periodically running ifconfig can0 to check to see if new
>>> packets are being received. After a while, the can interface will stop
>>> receiving new packets, even though the test PC is still transmitting
>>> them. Stopping and restarting the CAN transmissions on the test PC
>>> does not solve the problem. The interface does not appear to be in the
>>> bus off state, as shown by running the following:
>>
>>
>> That sounds a bit like my getting stuck problem in
>> http://linux-can.vger.kernel.narkive.com/bBQqK84G/resend-patch-net-can-at91-can-c-decrease-likelyhood-of-rx-overruns#post2
>>
>> The patch post1 at least keeps the driver working. Although I don't know
>> what
>> has changed in at91_can meanwhile.
>
>
> Thanks for pointing me to that patch. It still applies to Linux 4.1 with
> some minor fixes. Amr, could you please give it a try. Please let me know if
> you need help.
>
> Anyway, I think the driver should not hang even in case of overflows. I will
> have a closer look later this week.
>
> Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-05-03  8:27     ` Amr Bekhit
@ 2016-06-01 13:21       ` Amr Bekhit
  2016-06-03  7:22         ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Amr Bekhit @ 2016-06-01 13:21 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Alexander Stein, mkl, linux-can

Hi Wolfgang and Alexander,

@Wolfgang: using the patch you sent to me, I ran the test twice until
the unit stopped responding to messages. After taking the can
interface down, here is the output from the console for both tests:

# ifconfig can0 down
at91_can f8004000.can can0: reg_sr=1
at91_can f8004000.can can0: tx_next=0
at91_can f8004000.can can0: tx_echo=0
at91_can f8004000.can can0: rx_next=6

# ifconfig can0 down
at91_can f8004000.can can0: reg_sr=1
at91_can f8004000.can can0: tx_next=8042
at91_can f8004000.can can0: tx_echo=8042
at91_can f8004000.can can0: rx_next=6

I've also tried out the patch suggested by Alexander and that seems to
work fine - I was unable to get the CAN device to lock up after
running it for over a day continuously (test repeated twice). As I
understood it, the aim of the patch was to get the messages out of the
CAN peripheral immediately during the interrupt and store them in a
kfifo for later processing. From my testing, this does appear to have
solved the problem (or severely reduced the probability of it
happening).
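
To illustrate the idea as I understood it, here is a small user-space
sketch. It is not the patch itself and does not use the kernel's kfifo;
the names rx_frame, frame_fifo, fifo_put and fifo_get are made up for
this example. The interrupt side copies each frame out of the mailbox
straight away, and the slower processing side drains the queue later:

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 16u                 /* must be a power of two */

struct rx_frame {                     /* hypothetical, trimmed-down frame */
	uint32_t id;
	uint8_t  dlc;
	uint8_t  data[8];
};

struct frame_fifo {
	struct rx_frame slot[FIFO_SIZE];
	unsigned int head;            /* advanced by the interrupt side */
	unsigned int tail;            /* advanced by the processing side */
	unsigned int dropped;         /* frames lost because the FIFO was full */
};

/* Interrupt side: copy the frame out of the mailbox right away.
 * Returns 1 on success, 0 if the FIFO was full and the frame was dropped. */
static int fifo_put(struct frame_fifo *f, const struct rx_frame *fr)
{
	assert(f && fr);
	if (f->head - f->tail == FIFO_SIZE) {
		f->dropped++;
		return 0;
	}
	f->slot[f->head++ & (FIFO_SIZE - 1u)] = *fr;
	return 1;
}

/* Processing side: take the oldest frame. Returns 0 if the FIFO is empty. */
static int fifo_get(struct frame_fifo *f, struct rx_frame *out)
{
	assert(f && out);
	if (f->head == f->tail)
		return 0;
	*out = f->slot[f->tail++ & (FIFO_SIZE - 1u)];
	return 1;
}
```

In this sketch a full FIFO only costs the newest frame, which is
counted in "dropped"; reception itself carries on. The sketch has no
locking or memory barriers, so it is only meant to show the data flow.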

Amr

On 3 May 2016 at 09:27, Amr Bekhit <amrbekhit@gmail.com> wrote:
> Hi Wolfgang and Alexander,
>
> Thanks for both of your responses.
>
>
> @Alexander: Thanks for pointing out the patch.
>
> @Wolfgang: In response to your earlier request, I've uploaded my dts
> file to pastebin, which you can find at http://pastebin.com/tNp2PnW4. I'll
> give the patch mentioned by Alexander and your one a try and let you
> know how it goes.
>
> Amr
>
> On 2 May 2016 at 14:53, Wolfgang Grandegger <wg@grandegger.com> wrote:
>> Hello Alexander,
>>
>> Am 02.05.2016 um 08:23 schrieb Alexander Stein:
>>>
>>> On Tuesday 05 April 2016 14:10:48, Amr Bekhit wrote:
>>>>
>>>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>>> sending lots of messages very rapidly in quick succession causes the
>>>> CAN peripheral to then stop receiving any messages at all. The only
>>>> way to bring it back to a functional state is to bring the network
>>>> interface down and then back up again.
>>>> [...]
>>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>>> that is plugged into a test Linux PC. After bringing up the CAN
>>>> interface on the test PC, messages can be continuously sent using the
>>>> following bash script:
>>>> [...]
>>>> I then leave the system running for some time (1.5 hours typically,
>>>> may vary), periodically running ifconfig can0 to check to see if new
>>>> packets are being received. After a while, the can interface will stop
>>>> receiving new packets, even though the test PC is still transmitting
>>>> them. Stopping and restarting the CAN transmissions on the test PC
>>>> does not solve the problem. The interface does not appear to be in the
>>>> bus off state, as shown by running the following:
>>>
>>>
>>> That sounds a bit like my getting stuck problem in
>>> http://linux-can.vger.kernel.narkive.com/bBQqK84G/resend-patch-net-can-at91-can-c-decrease-likelyhood-of-rx-overruns#post2
>>>
>>> The patch post1 at least keeps the driver working. Although I don't know
>>> what
>>> has changed in at91_can meanwhile.
>>
>>
>> Thanks for pointing me to that patch. It still applies to Linux 4.1 with
>> some minor fixes. Amr, could you please give it a try. Please let me know if
>> you need help.
>>
>> Anyway, I think the driver should not hang even in case of overflows. I will
>> have a closer look later this week.
>>
>> Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-06-01 13:21       ` Amr Bekhit
@ 2016-06-03  7:22         ` Wolfgang Grandegger
  2016-06-08  8:17           ` Amr Bekhit
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-06-03  7:22 UTC (permalink / raw)
  To: Amr Bekhit; +Cc: Alexander Stein, mkl, linux-can

Hello Amr,

I'm resending this message because it did not show up on the linux-can 
mailing list archive...

Am 01.06.2016 um 15:21 schrieb Amr Bekhit:
> Hi Wolfgang and Alexander,
>
> @Wolfgang: using the patch you sent to me, I ran the test twice until
> the unit stopped responding to messages. After taking the can
> interface down, here is the output from the console for both tests:
>
> # ifconfig can0 down
> at91_can f8004000.can can0: reg_sr=1
> at91_can f8004000.can can0: tx_next=0
> at91_can f8004000.can can0: tx_echo=0
> at91_can f8004000.can can0: rx_next=6
>
> # ifconfig can0 down
> at91_can f8004000.can can0: reg_sr=1
> at91_can f8004000.can can0: tx_next=8042
> at91_can f8004000.can can0: tx_echo=8042
> at91_can f8004000.can can0: rx_next=6

Trying to understand why RX stopped: at91_poll() was entered with all RX
mailboxes filled (reg_sr=1, rx_next=6). Because the "quota" is used up,
the following if block is not executed:

http://lxr.free-electrons.com/source/drivers/net/can/at91_can.c#L713

On the next entry into at91_poll(), at91_poll_rx() is *not* called
because reg_sr is 0, and the RX MB interrupts are not re-enabled because
rx_next is still 6. The RX interrupts stay *disabled*.
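
The mask arithmetic behind that last step can be sketched stand-alone.
This is a model, not driver code: the mailbox layout (0..5 receive,
6..7 transmit) is an assumption read off the register dump earlier in
the thread, and the helper names are made up for the example. The idea
is that only RX mailboxes at or above rx_next get their interrupt
re-enabled:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: mailboxes 0..5 receive, 6..7 transmit. */
#define MB_RX_FIRST 0u
#define MB_RX_LAST  5u

/* Bits 0..(i-1) set. */
static uint32_t mb_mask(unsigned int i)
{
	assert(i < 32u);              /* avoid an undefined shift */
	return (1u << i) - 1u;
}

/* Interrupt bits of all RX mailboxes. */
static uint32_t irq_mb_rx(void)
{
	return mb_mask(MB_RX_LAST + 1u) & ~mb_mask(MB_RX_FIRST);
}

/* RX mailbox interrupts that get re-enabled once polling is finished:
 * only those of mailboxes >= rx_next. */
static uint32_t rx_irqs_reenabled(unsigned int rx_next)
{
	return irq_mb_rx() & ~mb_mask(rx_next);
}
```

With rx_next = 0 this gives 0x3f (all six RX mailboxes), with
rx_next = 4 it gives 0x30, and with rx_next = 6 it gives 0, i.e. no RX
mailbox interrupt is switched back on.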

If I'm not wrong, the following patch should fix that problem:

diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
index 945c095..c9f36a4 100644
--- a/drivers/net/can/at91_can.c
+++ b/drivers/net/can/at91_can.c
@@ -733,9 +733,10 @@ static int at91_poll_rx(struct net_device *dev, int quota)

         /* upper group completed, look again in lower */
         if (priv->rx_next > get_mb_rx_low_last(priv) &&
-           quota > 0 && mb > get_mb_rx_last(priv)) {
+           mb > get_mb_rx_last(priv)) {
                 priv->rx_next = get_mb_rx_first(priv);
-               goto again;
+               if (quota > 0)
+                       goto again;
         }

         return received;

Could you give this patch a try, please.
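
For anyone who wants to play with the logic without hardware, here is a
simplified user-space model of the receive loop, with a switch between
the old and the patched end-of-loop handling. It rests on several
assumptions: the mailbox layout (0..3 lower RX group, 4..5 upper RX
group, 6 first TX mailbox) is assumed, "pending" stands in for the
status register, reading a mailbox simply clears its bit (the real
hardware re-activates mailboxes in groups), and the names model,
next_pending and poll_rx are invented for this sketch:

```c
#include <assert.h>
#include <stdint.h>

#define MB_RX_FIRST    0u
#define MB_RX_LOW_LAST 3u   /* assumed end of the lower RX group */
#define MB_RX_LAST     5u
#define MB_TX_FIRST    6u

struct model {
	uint32_t pending;       /* one bit per mailbox holding a frame */
	unsigned int rx_next;   /* next mailbox the loop expects to read */
};

/* First pending mailbox at or after 'from', or MB_TX_FIRST if none. */
static unsigned int next_pending(uint32_t pending, unsigned int from)
{
	unsigned int mb;

	for (mb = from; mb < MB_TX_FIRST; mb++)
		if (pending & (1u << mb))
			return mb;
	return MB_TX_FIRST;
}

/* Returns the number of frames read; 'patched' selects the new logic. */
static int poll_rx(struct model *m, int quota, int patched)
{
	unsigned int mb;
	int received = 0;

	assert(m && quota >= 0);
again:
	for (mb = next_pending(m->pending, m->rx_next);
	     mb < MB_TX_FIRST && quota > 0;
	     mb = next_pending(m->pending, ++m->rx_next)) {
		m->pending &= ~(1u << mb);      /* "read" the frame */
		received++;
		quota--;
	}

	/* upper group completed, look again in lower */
	if (m->rx_next > MB_RX_LOW_LAST && mb > MB_RX_LAST) {
		/* old code reset rx_next only while quota was left */
		if (patched || quota > 0)
			m->rx_next = MB_RX_FIRST;
		if (quota > 0)
			goto again;
	}
	return received;
}
```

Starting from pending = 0x3f and rx_next = 0 with a quota of 6 (a quota
equal to the number of RX mailboxes is an assumption here), both
variants read six frames, but the old logic leaves rx_next at 6 while
the patched one puts it back to 0.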

> I've also tried out the patch suggested by Alexander and that seems to
> work fine - I was unable to get the CAN device to lock up after
> running it for over a day continuously (test repeated twice). As I
> understood it, the aim of the patch was to get the messages out of the
> CAN peripheral immediately during the interrupt and store them in a
> kfifo for later processing. From my testing, this does appear to have
> solved the problem (or severely reduced the probability of it
> happening).

The existing driver may lose messages due to latency, but it should not 
stop working.

Wolfgang.

> On 3 May 2016 at 09:27, Amr Bekhit <amrbekhit@gmail.com> wrote:
>> Hi Wolfgang and Alexander,
>>
>> Thanks for both of your responses.
>>
>>
>> @Alexander: Thanks for pointing out the patch.
>>
>> @Wolfgang: In response to your earlier request, I've uploaded my dts
>> file to pastebin, which you can find at http://pastebin.com/tNp2PnW4. I'll
>> give the patch mentioned by Alexander and your one a try and let you
>> know how it goes.
>>
>> Amr
>>
>> On 2 May 2016 at 14:53, Wolfgang Grandegger <wg@grandegger.com> wrote:
>>> Hello Alexander,
>>>
>>> Am 02.05.2016 um 08:23 schrieb Alexander Stein:
>>>>
>>>> On Tuesday 05 April 2016 14:10:48, Amr Bekhit wrote:
>>>>>
>>>>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>>>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>>>> sending lots of messages very rapidly in quick succession causes the
>>>>> CAN peripheral to then stop receiving any messages at all. The only
>>>>> way to bring it back to a functional state is to bring the network
>>>>> interface down and then back up again.
>>>>> [...]
>>>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>>>> that is plugged into a test Linux PC. After bringing up the CAN
>>>>> interface on the test PC, messages can be continuously sent using the
>>>>> following bash script:
>>>>> [...]
>>>>> I then leave the system running for some time (1.5 hours typically,
>>>>> may vary), periodically running ifconfig can0 to check to see if new
>>>>> packets are being received. After a while, the can interface will stop
>>>>> receiving new packets, even though the test PC is still transmitting
>>>>> them. Stopping and restarting the CAN transmissions on the test PC
>>>>> does not solve the problem. The interface does not appear to be in the
>>>>> bus off state, as shown by running the following:
>>>>
>>>>
>>>> That sounds a bit like my getting stuck problem in
>>>> http://linux-can.vger.kernel.narkive.com/bBQqK84G/resend-patch-net-can-at91-can-c-decrease-likelyhood-of-rx-overruns#post2
>>>>
>>>> The patch post1 at least keeps the driver working. Although I don't know
>>>> what
>>>> has changed in at91_can meanwhile.
>>>
>>>
>>> Thanks for pointing me to that patch. It still applies to Linux 4.1 with
>>> some minor fixes. Amr, could you please give it a try. Please let me know if
>>> you need help.
>>>
>>> Anyway, I think the driver should not hang even in case of overflows. I will
>>> have a closer look later this week.
>>>
>>> Wolfgang.

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-06-03  7:22         ` Wolfgang Grandegger
@ 2016-06-08  8:17           ` Amr Bekhit
  2016-06-08  8:37             ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Amr Bekhit @ 2016-06-08  8:17 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Alexander Stein, mkl, linux-can

Hi Wolfgang,

I've implemented the patch that you suggested and have been testing
the unit for almost 48 hours, sending CAN messages continuously and
the unit has continued to operate fine with no problems whatsoever.
So, it appears that your patch has fixed the problem. Thanks!

Amr

On 3 June 2016 at 08:22, Wolfgang Grandegger <wg@grandegger.com> wrote:
> Hello Amr,
>
> I'm resending this message because it did not show up on the linux-can
> mailing list archive...
>
> Am 01.06.2016 um 15:21 schrieb Amr Bekhit:
>>
>> Hi Wolfgang and Alexander,
>>
>> @Wolfgang: using the patch you sent to me, I ran the test twice until
>> the unit stopped responding to messages. After taking the can
>> interface down, here is the output from the console for both tests:
>>
>> # ifconfig can0 down
>> at91_can f8004000.can can0: reg_sr=1
>> at91_can f8004000.can can0: tx_next=0
>> at91_can f8004000.can can0: tx_echo=0
>> at91_can f8004000.can can0: rx_next=6
>>
>> # ifconfig can0 down
>> at91_can f8004000.can can0: reg_sr=1
>> at91_can f8004000.can can0: tx_next=8042
>> at91_can f8004000.can can0: tx_echo=8042
>> at91_can f8004000.can can0: rx_next=6
>
>
> Trying to understand why RX stopped: at91_poll() entered with all RX message
> boxes filled (reg_sr=1, rx_next=6). Because "quota" is exceeded, the
> following if block is not executed:
>
> http://lxr.free-electrons.com/source/drivers/net/can/at91_can.c#L713
>
> At the next entrance of at91_poll(), at91_poll_rx() is *not* called, because
> reg_sr is 0 and the RX MB interrupts are not re-enabled, because rx_next is
> still 6. The RX interrupts stay *disabled*.
>
> If I'm not wrong, the following patch should fix that problem:
>
> diff --git a/drivers/net/can/at91_can.c b/drivers/net/can/at91_can.c
> index 945c095..c9f36a4 100644
> --- a/drivers/net/can/at91_can.c
> +++ b/drivers/net/can/at91_can.c
> @@ -733,9 +733,10 @@ static int at91_poll_rx(struct net_device *dev, int quota)
>
>         /* upper group completed, look again in lower */
>         if (priv->rx_next > get_mb_rx_low_last(priv) &&
> -           quota > 0 && mb > get_mb_rx_last(priv)) {
> +           mb > get_mb_rx_last(priv)) {
>                 priv->rx_next = get_mb_rx_first(priv);
> -               goto again;
> +               if (quota > 0)
> +                       goto again;
>         }
>
>         return received;
>
> Could you give this patch a try, please.
>
>> I've also tried out the patch suggested by Alexander and that seems to
>> work fine - I was unable to get the CAN device to lock up after
>> running it for over a day continuously (test repeated twice). As I
>> understood it, the aim of the patch was to get the messages out of the
>> CAN peripheral immediately during the interrupt and store them in a
>> kfifo for later processing. From my testing, this does appear to have
>> solved the problem (or severely reduced the probability of it
>> happening).
>
>
> The existing driver may lose messages due to latency, but it should not
> stop working.
>
> Wolfgang.
>
>> On 3 May 2016 at 09:27, Amr Bekhit <amrbekhit@gmail.com> wrote:
>>>
>>> Hi Wolfgang and Alexander,
>>>
>>> Thanks for both of your responses.
>>>
>>>
>>> @Alexander: Thanks for pointing out the patch.
>>>
>>> @Wolfgang: In response to your earlier request, I've uploaded my dts
>>> file to pastebin, which you can find at http://pastebin.com/tNp2PnW4. I'll
>>> give the patch mentioned by Alexander and yours a try and let you
>>> know how it goes.
>>>
>>> Amr
>>>
>>> On 2 May 2016 at 14:53, Wolfgang Grandegger <wg@grandegger.com> wrote:
>>>>
>>>> Hello Alexander,
>>>>
>>>> Am 02.05.2016 um 08:23 schrieb Alexander Stein:
>>>>>
>>>>>
>>>>> On Tuesday 05 April 2016 14:10:48, Amr Bekhit wrote:
>>>>>>
>>>>>>
>>>>>> I am working on a board based on the AT91SAM9X25 SoC and I'm using the
>>>>>> integrated CAN peripheral. I seem to have run into an issue whereby
>>>>>> sending lots of messages very rapidly in quick succession causes the
>>>>>> CAN peripheral to then stop receiving any messages at all. The only
>>>>>> way to bring it back to a functional state is to bring the network
>>>>>> interface down and then back up again.
>>>>>> [...]
>>>>>> I then start sending CAN messages to the unit using a PCAN-USB adapter
>>>>>> that is plugged into a test Linux PC. After bringing up the CAN
>>>>>> interface on the test PC, messages can be continuously sent using the
>>>>>> following bash script:
>>>>>> [...]
>>>>>> I then leave the system running for some time (1.5 hours typically,
>>>>>> may vary), periodically running ifconfig can0 to check to see if new
>>>>>> packets are being received. After a while, the can interface will stop
>>>>>> receiving new packets, even though the test PC is still transmitting
>>>>>> them. Stopping and restarting the CAN transmissions on the test PC
>>>>>> does not solve the problem. The interface does not appear to be in the
>>>>>> bus off state, as shown by running the following:
>>>>>
>>>>>
>>>>>
>>>>> That sounds a bit like my getting stuck problem in
>>>>>
>>>>> http://linux-can.vger.kernel.narkive.com/bBQqK84G/resend-patch-net-can-at91-can-c-decrease-likelyhood-of-rx-overruns#post2
>>>>>
>>>>> The patch in post1 at least keeps the driver working, although I don't
>>>>> know what has changed in at91_can in the meantime.
>>>>
>>>>
>>>>
>>>> Thanks for pointing me to that patch. It still applies to Linux 4.1 with
>>>> some minor fixes. Amr, could you please give it a try. Please let me
>>>> know if you need help.
>>>>
>>>> Anyway, I think the driver should not hang even in case of overflows. I
>>>> will have a closer look later this week.
>>>>
>>>> Wolfgang.
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-can" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-06-08  8:17           ` Amr Bekhit
@ 2016-06-08  8:37             ` Wolfgang Grandegger
  2016-06-10 13:34               ` Amr Bekhit
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2016-06-08  8:37 UTC (permalink / raw)
  To: Amr Bekhit; +Cc: Alexander Stein, mkl, linux-can

Hello Amr,

Am 08.06.2016 um 10:17 schrieb Amr Bekhit:
> Hi Wolfgang,
>
> I've implemented the patch that you suggested and have been testing
> the unit for almost 48 hours, sending CAN messages continuously and
> the unit has continued to operate fine with no problems whatsoever.
> So, it appears that your patch has fixed the problem. Thanks!

Good news. I'm going to prepare a patch for mainline inclusion. Can I 
add your "Tested-by: Amr Bekhit <amrbekhit@gmail.com>" ?

BTW, did you notice any message overflows (with "ip -d -s link show") or
related out-of-order messages in the kernel log ("dmesg")?

Thanks,

Wolfgang.

> Amr
>
> On 3 June 2016 at 08:22, Wolfgang Grandegger <wg@grandegger.com> wrote:
>> Hello Amr,
>>
>> I'm resending this message because it did not show up on the linux-can
>> mailing list archive...
>>
>> Am 01.06.2016 um 15:21 schrieb Amr Bekhit:
>>>
>>> Hi Wolfgang and Alexander,
>>>
>>> @Wolfgang: using the patch you sent to me, I ran the test twice until
>>> the unit stopped responding to messages. After taking the can
>>> interface down, here is the output from the console for both tests:
>>>
>>> # ifconfig can0 down
>>> at91_can f8004000.can can0: reg_sr=1
>>> at91_can f8004000.can can0: tx_next=0
>>> at91_can f8004000.can can0: tx_echo=0
>>> at91_can f8004000.can can0: rx_next=6
>>>
>>> # ifconfig can0 down
>>> at91_can f8004000.can can0: reg_sr=1
>>> at91_can f8004000.can can0: tx_next=8042
>>> at91_can f8004000.can can0: tx_echo=8042
>>> at91_can f8004000.can can0: rx_next=6

* Re: Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages
  2016-06-08  8:37             ` Wolfgang Grandegger
@ 2016-06-10 13:34               ` Amr Bekhit
  0 siblings, 0 replies; 16+ messages in thread
From: Amr Bekhit @ 2016-06-10 13:34 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: Alexander Stein, mkl, linux-can

> Good news. I'm going to prepare a patch for mainline inclusion. Can I add
> your "Tested-by: Amr Bekhit <amrbekhit@gmail.com>" ?

Yes, that's fine.

> BTW, did you notice any message overflows (with "ip -d -s link show") or
> related out-of-order messages in the kernel log ("dmesg")?

Not sure what you mean - I have seen the "out-of-order" messages
before. Basically, I'm using a PCAN-USB to send messages continuously
to my unit using the packaged Windows software. After the unit has
been running for a while, the TX buffer gets full. I can press Esc to
clear it and then I get an "order of packets cannot be guaranteed"
message on my unit. I haven't used the ip -d -s link show command
before. I'll check it out, thanks.

end of thread, other threads:[~2016-06-10 13:35 UTC | newest]

Thread overview: 16+ messages
2016-04-05 13:10 Flooding AT91_CAN peripheral with messages causes it to stop receiving any more messages Amr Bekhit
2016-04-08  7:39 ` Wolfgang Grandegger
2016-04-29  8:04   ` Amr Bekhit
     [not found]     ` <CAOLz05oo=EGqvmCaXXBhXs5McMmJDPKCzuiij7Pv22fj5hPB_g@mail.gmail.com>
2016-04-29  8:15       ` Amr Bekhit
2016-04-29 11:18     ` Wolfgang Grandegger
2016-04-29 11:29       ` Amr Bekhit
2016-04-29 14:27         ` Wolfgang Grandegger
2016-04-30 13:34         ` Wolfgang Grandegger
2016-05-02  6:23 ` Alexander Stein
2016-05-02 13:53   ` Wolfgang Grandegger
2016-05-03  8:27     ` Amr Bekhit
2016-06-01 13:21       ` Amr Bekhit
2016-06-03  7:22         ` Wolfgang Grandegger
2016-06-08  8:17           ` Amr Bekhit
2016-06-08  8:37             ` Wolfgang Grandegger
2016-06-10 13:34               ` Amr Bekhit
