All of lore.kernel.org
 help / color / mirror / Atom feed
* Dropped Tx frames with SJA1000
@ 2016-07-27  6:23 Damien Dusha
  2016-07-27  7:17 ` Tom Evans
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Dusha @ 2016-07-27  6:23 UTC (permalink / raw)
  To: linux-can

Dear linux-can,

I am attempting to debug an issue with the SJA1000 driver. The symptom
is the occasional dropped frame when transmitting a burst of frames.
Reception is unaffected.


Hardware and Setup:

I am using an ESD CAN-PCI200 [1], which uses the sja1000_plx_pci
driver, on a standard Core 2 Duo machine running Fedora 24 i686 with a
freshly compiled vanilla linux-4.6.4.

The SJA1000 driver is patched with some debug (patch is shown below).
The problem appears without the debug, but it seems to appear more
frequently with debug (??).

07:04.0 Bridge: PLX Technology, Inc. PCI <-> IOBus Bridge (rev 02)
    Subsystem: ESD Electronic System Design GmbH Device 0004
    Flags: medium devsel, IRQ 20
    Memory at f0200400 (32-bit, non-prefetchable) [size=128]
    I/O ports at 1100 [size=128]
    Memory at f0200000 (32-bit, non-prefetchable) [size=1K]
    Kernel driver in use: sja1000_plx_pci
    Kernel modules: plx_pci


Symptoms:

The missing frames are not received with ./candump on the same
machine, nor is it seen on the bus externally; two other machines are
listening to the bus, both not receiving the same packet (PEAK USB
adaptors).

Dropped packets are not counted as a TX error in ifconfig, nor are
they counted as being sent.

can0: flags=193<UP,RUNNING,NOARP>  mtu 16
        unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
txqueuelen 1000  (UNSPEC)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 10539  bytes 40178 (39.2 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 20

It appears to be unrelated to the SJA1000 Tx drop issue observed in
Peak PCI devices earlier in the year [2].


Reproduction:

The easiest way to reproduce the problem is with the following
command, although it may take several attempts to do so:

for i in `seq 1 20`; do strace ./cangen can0 -g 0 -Ii -Di -Li -n 32; done

For example, it can be seen that ID 0x0A is missing with the following
run of ./candump (on a remote machine).

  can0  000   [1]  00
  can0  001   [1]  01
  can0  002   [2]  02 00
  can0  003   [3]  03 00 00
  can0  004   [4]  04 00 00 00
  can0  005   [5]  05 00 00 00 00
  can0  006   [6]  06 00 00 00 00 00
  can0  007   [7]  07 00 00 00 00 00 00
  can0  008   [8]  08 00 00 00 00 00 00 00
  can0  009   [1]  09
*** Missing 0x0A ***
  can0  00B   [2]  0B 00
  can0  00C   [3]  0C 00 00
  can0  00D   [4]  0D 00 00 00
  can0  00E   [5]  0E 00 00 00 00
  can0  00F   [6]  0F 00 00 00 00 00
  can0  010   [7]  10 00 00 00 00 00 00
  can0  011   [8]  11 00 00 00 00 00 00 00
  can0  012   [1]  12
  can0  013   [1]  13
  can0  014   [2]  14 00
  can0  015   [3]  15 00 00
  can0  016   [4]  16 00 00 00
  can0  017   [5]  17 00 00 00 00
  can0  018   [6]  18 00 00 00 00 00
  can0  019   [7]  19 00 00 00 00 00 00
  can0  01A   [8]  1A 00 00 00 00 00 00 00
  can0  01B   [1]  1B
  can0  01C   [1]  1C
  can0  01D   [2]  1D 00
  can0  01E   [3]  1E 00 00
  can0  01F   [4]  1F 00 00 00

Running strace is not necessary to reproduce the problem, but it does
confirm that the packet is received by the kernel.

write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16
write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16
write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16
write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16
write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16
write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16
write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16
write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16
write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16
** Note that 0x0A is indeed sent without error ***
write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16
write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16
write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16
write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16
write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16
write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16
write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16
write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16
write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16
write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16
write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16
write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16
write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16
write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16
write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16
write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16
write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16
write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16
write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16
write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16
write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16
write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16

Examining dmesg debug which was added to the SJA1000 driver, it
appears that 0x0A was never sent to the device.

[ 5223.770685] [sja1000_interrupt] start
[ 5223.770692] [sja1000_interrupt] tx_packets++
[ 5223.770693] [sja1000_interrupt] tx processing complete
[ 5223.770696] [sja1000_interrupt] end
[ 5223.770698] [sja1000_start_xmit] start
[ 5223.770700] Start tx, ID = 0x07
[ 5223.770704] [sja1000_start_xmit] end
[ 5223.771157] [sja1000_interrupt] start
[ 5223.771164] [sja1000_interrupt] tx_packets++
[ 5223.771166] [sja1000_interrupt] tx processing complete
[ 5223.771168] [sja1000_interrupt] end
[ 5223.771171] [sja1000_start_xmit] start
[ 5223.771172] Start tx, ID = 0x08
[ 5223.771177] [sja1000_start_xmit] end
[ 5223.771675] [sja1000_interrupt] start
[ 5223.771682] [sja1000_interrupt] tx_packets++
[ 5223.771684] [sja1000_interrupt] tx processing complete
[ 5223.771686] [sja1000_interrupt] end
[ 5223.771692] [sja1000_start_xmit] start
[ 5223.771694] Start tx, ID = 0x09
[ 5223.771698] [sja1000_start_xmit] end
[ 5223.771925] [sja1000_interrupt] start
[ 5223.771931] [sja1000_interrupt] tx_packets++
[ 5223.771932] [sja1000_interrupt] tx processing complete
[ 5223.771934] [sja1000_interrupt] end
*** Note that ID = 0x0A is never sent to the device ***
[ 5223.771938] [sja1000_start_xmit] start
[ 5223.771940] Start tx, ID = 0x0B
[ 5223.771944] [sja1000_start_xmit] end
[ 5223.772209] [sja1000_interrupt] start
[ 5223.772215] [sja1000_interrupt] tx_packets++
[ 5223.772217] [sja1000_interrupt] tx processing complete
[ 5223.772219] [sja1000_interrupt] end
[ 5223.772222] [sja1000_start_xmit] start
[ 5223.772223] Start tx, ID = 0x0C
[ 5223.772227] [sja1000_start_xmit] end
[ 5223.772529] [sja1000_interrupt] start
[ 5223.772536] [sja1000_interrupt] tx_packets++
[ 5223.772538] [sja1000_interrupt] tx processing complete
[ 5223.772540] [sja1000_interrupt] end
[ 5223.772542] [sja1000_start_xmit] start
[ 5223.772544] Start tx, ID = 0x0D


I am unsure how to proceed here to help debug the problem. Ideas are welcome.

Kind regards,
Damien.

References:

[1] https://esd.eu/en/products/can-pci200
[2] http://article.gmane.org/gmane.linux.can/8923



Patch with debug:


--- linux-4.6.4/drivers/net/can/sja1000/sja1000.c    2016-07-27
13:58:25.343706506 +1000
+++ clean/linux-4.6.4/drivers/net/can/sja1000/sja1000.c    2016-07-12
02:30:07.000000000 +1000
@@ -290,8 +290,6 @@
     u8 cmd_reg_val = 0x00;
     int i;

-    printk(KERN_NOTICE "[sja1000_start_xmit] start\n");
-
     if (can_dropped_invalid_skb(dev, skb))
         return NETDEV_TX_OK;

@@ -300,8 +298,6 @@
     fi = dlc = cf->can_dlc;
     id = cf->can_id;

-    printk(KERN_NOTICE "Start tx, ID = 0x%02X\n", id);
-
     if (id & CAN_RTR_FLAG)
         fi |= SJA1000_FI_RTR;

@@ -335,7 +331,6 @@

     sja1000_write_cmdreg(priv, cmd_reg_val);

-    printk("[sja1000_start_xmit] end\n");
     return NETDEV_TX_OK;
 }

@@ -505,8 +500,6 @@
     uint8_t isrc, status;
     int n = 0;

-    printk(KERN_NOTICE "[sja1000_interrupt] start\n");
-
     if (priv->pre_irq)
         priv->pre_irq(priv);

@@ -537,12 +530,9 @@
                     priv->read_reg(priv, SJA1000_FI) & 0xf;
                 stats->tx_packets++;
                 can_get_echo_skb(dev, 0);
-                printk(KERN_NOTICE "[sja1000_interrupt] tx_packets++\n");
             }
             netif_wake_queue(dev);
             can_led_event(dev, CAN_LED_EVENT_TX);
-
-            printk(KERN_NOTICE "[sja1000_interrupt] tx processing complete\n");
         }
         if (isrc & IRQ_RI) {
             /* receive interrupt */
@@ -568,8 +558,6 @@
     if (n >= SJA1000_MAX_IRQ)
         netdev_dbg(dev, "%d messages handled in ISR", n);

-    printk(KERN_NOTICE "[sja1000_interrupt] end\n");
-
     return (n) ? IRQ_HANDLED : IRQ_NONE;
 }
 EXPORT_SYMBOL_GPL(sja1000_interrupt);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dropped Tx frames with SJA1000
  2016-07-27  6:23 Dropped Tx frames with SJA1000 Damien Dusha
@ 2016-07-27  7:17 ` Tom Evans
  2016-07-27 22:47   ` Damien Dusha
  0 siblings, 1 reply; 6+ messages in thread
From: Tom Evans @ 2016-07-27  7:17 UTC (permalink / raw)
  To: Damien Dusha, linux-can

On 27/07/16 16:23, Damien Dusha wrote:
> Dear linux-can,
>
> I am attempting to debug an issue with the SJA1000 driver. The symptom
> is the occasional dropped frame when transmitting a burst of frames.
> Reception is unaffected.

I've sent a detailed response in email. A summary:

"Pattern matching" this problem (not reading all of it, and doing exactly what 
we hate when we see "help desk" people do this) I'll ask:

* What are you setting "/sys/class/net/can0/tx_queue_len" to?

* Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET, SO_SNDBUF, 
&buflen)" call with a sensible value? What do they do when they get ENOBUFS?

http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html

http://www.arm.com/files/pdf/cachecoherencywhitepaper_6june2011.pdf

Tom


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dropped Tx frames with SJA1000
  2016-07-27  7:17 ` Tom Evans
@ 2016-07-27 22:47   ` Damien Dusha
  2016-07-28  6:08     ` Wolfgang Grandegger
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Dusha @ 2016-07-27 22:47 UTC (permalink / raw)
  To: tom_usenet; +Cc: linux-can

Hi Tom,

On Wed, Jul 27, 2016 at 5:17 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
> On 27/07/16 16:23, Damien Dusha wrote:
>>
>> Dear linux-can,
>>
>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>> is the occasional dropped frame when transmitting a burst of frames.
>> Reception is unaffected.
>
> * What are you setting "/sys/class/net/can0/tx_queue_len" to?

[damien@localhost can-utils]$ cat /sys/class/net/can0/tx_queue_len
1000

The test sends 640 messages (20 bursts of 32 messages).

> * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET,
> SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they
> get ENOBUFS?

cangen is from the standard CAN utilities, which may be found at:
git clone https://github.com/linux-can/can-utils.git

> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html

I'm not convinced is the ENOBUFS problem.  When strace is performed on
the process, all writes are returned successfully, but the message
never makes it to the hardware.

The entire strace is shown below where message with ID = 0x0A was
dropped, Note all 32 write() calls are successful.

execve("./cangen", ["./cangen", "can0", "-g", "0", "-Ii", "-Di",
"-Li", "-n", "32"], [/* 33 vars */]) = 0
brk(NULL)                               = 0x9606000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7723000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=98687, ...}) = 0
mmap2(NULL, 98687, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76e3000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\207\1\0004\0\0\0"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=2111020, ...}) = 0
mmap2(NULL, 1894876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7514000
mprotect(0xb76dc000, 4096, PROT_NONE)   = 0
mmap2(0xb76dd000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c8000) = 0xb76dd000
mmap2(0xb76e0000, 10716, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e0000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7513000
set_thread_area({entry_number:-1, base_addr:0xb7513700, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1,
seg_not_present:0, useable:1}) = 0 (entry_number:6)
mprotect(0xb76dd000, 8192, PROT_READ)   = 0
mprotect(0x804d000, 4096, PROT_READ)    = 0
mprotect(0xb7724000, 4096, PROT_READ)   = 0
munmap(0xb76e3000, 98687)               = 0
rt_sigaction(SIGTERM, {0x8049610, [TERM], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGHUP, {0x8049610, [HUP], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {0x8049610, [INT], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
socket(AF_CAN, SOCK_RAW, 1)             = 3
ioctl(3, SIOCGIFINDEX, {ifr_name="can0", }) = 0
setsockopt(3, 0x65 /* SOL_?? */, 1, NULL, 0) = 0
bind(3, {sa_family=AF_CAN,
sa_data="\0\0\3\0\0\0\320\257p\267\350[r\267"}, 16) = 0
write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16
write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16
write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16
write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16
write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16
write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16
write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16
write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16
write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16
write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16
write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16
write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16
write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16
write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16
write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16
write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16
write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16
write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16
write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16
write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16
write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16
write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16
write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16
write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16
write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16
write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16
write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16
write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16
write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16
write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16
write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dropped Tx frames with SJA1000
  2016-07-27 22:47   ` Damien Dusha
@ 2016-07-28  6:08     ` Wolfgang Grandegger
  2016-07-28 23:19       ` Damien Dusha
  0 siblings, 1 reply; 6+ messages in thread
From: Wolfgang Grandegger @ 2016-07-28  6:08 UTC (permalink / raw)
  To: Damien Dusha, tom_usenet; +Cc: linux-can

Hello,

Am 28.07.2016 um 00:47 schrieb Damien Dusha:
> Hi Tom,
>
> On Wed, Jul 27, 2016 at 5:17 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
>> On 27/07/16 16:23, Damien Dusha wrote:
>>>
>>> Dear linux-can,
>>>
>>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>>> is the occasional dropped frame when transmitting a burst of frames.
>>> Reception is unaffected.
>>
>> * What are you setting "/sys/class/net/can0/tx_queue_len" to?
>
> [damien@localhost can-utils]$ cat /sys/class/net/can0/tx_queue_len
> 1000
>
> The test sends 640 messages (20 bursts of 32 messages).
>
>> * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET,
>> SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they
>> get ENOBUFS?
>
> cangen is from the standard CAN utilities, which may be found at:
> git clone https://github.com/linux-can/can-utils.git
>
>> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
>
> I'm not convinced is the ENOBUFS problem.  When strace is performed on
> the process, all writes are returned successfully, but the message
> never makes it to the hardware.
>
> The entire strace is shown below where message with ID = 0x0A was
> dropped, Note all 32 write() calls are successful.
>
> execve("./cangen", ["./cangen", "can0", "-g", "0", "-Ii", "-Di",
> "-Li", "-n", "32"], [/* 33 vars */]) = 0
> brk(NULL)                               = 0x9606000
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0xb7723000
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> fstat64(3, {st_mode=S_IFREG|0644, st_size=98687, ...}) = 0
> mmap2(NULL, 98687, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76e3000
> close(3)                                = 0
> open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\207\1\0004\0\0\0"...,
> 512) = 512
> fstat64(3, {st_mode=S_IFREG|0755, st_size=2111020, ...}) = 0
> mmap2(NULL, 1894876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
> 3, 0) = 0xb7514000
> mprotect(0xb76dc000, 4096, PROT_NONE)   = 0
> mmap2(0xb76dd000, 12288, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c8000) = 0xb76dd000
> mmap2(0xb76e0000, 10716, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e0000
> close(3)                                = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0xb7513000
> set_thread_area({entry_number:-1, base_addr:0xb7513700, limit:1048575,
> seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1,
> seg_not_present:0, useable:1}) = 0 (entry_number:6)
> mprotect(0xb76dd000, 8192, PROT_READ)   = 0
> mprotect(0x804d000, 4096, PROT_READ)    = 0
> mprotect(0xb7724000, 4096, PROT_READ)   = 0
> munmap(0xb76e3000, 98687)               = 0
> rt_sigaction(SIGTERM, {0x8049610, [TERM], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
> rt_sigaction(SIGHUP, {0x8049610, [HUP], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
> rt_sigaction(SIGINT, {0x8049610, [INT], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
> socket(AF_CAN, SOCK_RAW, 1)             = 3
> ioctl(3, SIOCGIFINDEX, {ifr_name="can0", }) = 0
> setsockopt(3, 0x65 /* SOL_?? */, 1, NULL, 0) = 0
> bind(3, {sa_family=AF_CAN,
> sa_data="\0\0\3\0\0\0\320\257p\267\350[r\267"}, 16) = 0
> write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
> write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16
> write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16
> write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16
> write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16
> write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16
> write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16
> write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16
> write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16
> write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16
> write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16
> write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16
> write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16
> write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16
> write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16
> write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16
> write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16
> write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16
> write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16
> write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16
> write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16
> write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16
> write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16
> write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16
> write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16
> write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16
> write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16
> write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16
> write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16
> write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16
> write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16
> write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16
> close(3)                                = 0
> exit_group(0)                           = ?
> +++ exited with 0 +++

What does "ip -d -s link show can0" report? What device are you using to 
monitor/trace the traffic?

Wolfgang.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dropped Tx frames with SJA1000
  2016-07-28  6:08     ` Wolfgang Grandegger
@ 2016-07-28 23:19       ` Damien Dusha
  2016-07-29  0:35         ` Damien Dusha
  0 siblings, 1 reply; 6+ messages in thread
From: Damien Dusha @ 2016-07-28 23:19 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: tom_usenet, linux-can

Hello Wolfgang,

>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>> is the occasional dropped frame when transmitting a burst of frames.
>> Reception is unaffected.

> What does "ip -d -s link show can0" report?

[damien@localhost can-utils]$ ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc fq_codel state UNKNOWN
mode DEFAULT group default qlen 1000
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
      bitrate 250000 sample-point 0.875
      tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
      sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
      clock 8000000
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    747304     183494   0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    14767      3859     0       0       0       0


> What device are you using to monitor/trace the traffic?

On a second machine, I have a PEAK USB-CAN adaptor.  I don't believe
this is the problem because:
 - To test, I had a second USB-CAN adaptor, and they both happened to
drop exactly the same frame.
 - On the machine with the SJA1000 which drops frames on transmit, I
can read back the traffic via ./candump can0, which results in the
same dropped frame as the remote machine with the PEAK adaptor.
 - Yesterday I tried to reproduce the problem with a TI Sitara AM3352
(single core ARM Cortex-A8 300MHz, with Bosch D_CAN, running Linux
4.1.8) but was unable to do so (i.e. the PEAK USB works flawlessly).

Incidentally, I bring up the driver in the following way, in the
off-chance it may help:

[damien@localhost ~]$ cat enable_can0.sh
#!/bin/sh

CAN_DEVICE=can0
CAN_BITRATE=250000

sudo ip link set $CAN_DEVICE type can bitrate $CAN_BITRATE
sudo ifconfig $CAN_DEVICE up
sudo ifconfig $CAN_DEVICE txqueuelen 1000

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Dropped Tx frames with SJA1000
  2016-07-28 23:19       ` Damien Dusha
@ 2016-07-29  0:35         ` Damien Dusha
  0 siblings, 0 replies; 6+ messages in thread
From: Damien Dusha @ 2016-07-29  0:35 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: tom_usenet, linux-can

Hello All,

>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>> is the occasional dropped frame when transmitting a burst of frames.
>> Reception is unaffected.
>
> What does "ip -d -s link show can0" report?

Thanks to Wolfgang's hint, I have managed to find the problem.  The
clue comes from the command that was suggested. From this command, it
can be seen that the queue discipline (qdisc) used by default in
Fedora 24 is fq_codel.
can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc *fq_codel* state UNKNOWN

This is the problem; Wikipedia [1] makes it clear that the queuing
algorithm will drop packets from the queue in particular
circumstances, especially if it has been waiting too long.  If I
change the qdisc to pfifo, then transmit works perfectly:
sudo tc qdisc replace dev can0 root pfifo

Most other machines that I have access to - including Ubuntu 14.04 -
uses pfifo_fast as the qdisc.

This is certainly one to add to the "checklist" for debugging problems
like this.

Thanks again to everyone for their assistance.

Kind regards,
Damien.

[1] https://en.wikipedia.org/wiki/CoDel

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-07-29  0:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-27  6:23 Dropped Tx frames with SJA1000 Damien Dusha
2016-07-27  7:17 ` Tom Evans
2016-07-27 22:47   ` Damien Dusha
2016-07-28  6:08     ` Wolfgang Grandegger
2016-07-28 23:19       ` Damien Dusha
2016-07-29  0:35         ` Damien Dusha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.