* Dropped Tx frames with SJA1000 @ 2016-07-27 6:23 Damien Dusha 2016-07-27 7:17 ` Tom Evans 0 siblings, 1 reply; 6+ messages in thread From: Damien Dusha @ 2016-07-27 6:23 UTC (permalink / raw) To: linux-can Dear linux-can, I am attempting to debug an issue with the SJA1000 driver. The symptom is the occasional dropped frame when transmitting a burst of frames. Reception is unaffected. Hardware and Setup: I am using an ESD CAN-PCI200 [1], which uses the sja1000_plx_pci driver, on a standard Core 2 Duo machine running Fedora 24 i686 with a freshly compiled vanilla linux-4.6.4. The SJA1000 driver is patched with some debug (patch is shown below). The problem appears without the debug, but it seems to appear more frequently with debug (??). 07:04.0 Bridge: PLX Technology, Inc. PCI <-> IOBus Bridge (rev 02) Subsystem: ESD Electronic System Design GmbH Device 0004 Flags: medium devsel, IRQ 20 Memory at f0200400 (32-bit, non-prefetchable) [size=128] I/O ports at 1100 [size=128] Memory at f0200000 (32-bit, non-prefetchable) [size=1K] Kernel driver in use: sja1000_plx_pci Kernel modules: plx_pci Symptoms: The missing frames are not received with ./candump on the same machine, nor is it seen on the bus externally; two other machines are listening to the bus, both not receiving the same packet (PEAK USB adaptors). Dropped packets are not counted as a TX error in ifconfig, nor are they counted as being sent. can0: flags=193<UP,RUNNING,NOARP> mtu 16 unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 10539 bytes 40178 (39.2 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 device interrupt 20 It appears to be unrelated to the SJA1000 Tx drop issue observed in Peak PCI devices earlier in the year [2]. Reproduction: The easiest way to reproduce the problem is with the following command, although it may take several attempts to do so: for i in `seq 1 20`; do strace ./cangen can0 -g 0 -Ii -Di -Li -n 32; done For example, it can be seen that ID 0x0A is missing with the following run of ./candump (on a remote machine). can0 000 [1] 00 can0 001 [1] 01 can0 002 [2] 02 00 can0 003 [3] 03 00 00 can0 004 [4] 04 00 00 00 can0 005 [5] 05 00 00 00 00 can0 006 [6] 06 00 00 00 00 00 can0 007 [7] 07 00 00 00 00 00 00 can0 008 [8] 08 00 00 00 00 00 00 00 can0 009 [1] 09 *** Missing 0x0A *** can0 00B [2] 0B 00 can0 00C [3] 0C 00 00 can0 00D [4] 0D 00 00 00 can0 00E [5] 0E 00 00 00 00 can0 00F [6] 0F 00 00 00 00 00 can0 010 [7] 10 00 00 00 00 00 00 can0 011 [8] 11 00 00 00 00 00 00 00 can0 012 [1] 12 can0 013 [1] 13 can0 014 [2] 14 00 can0 015 [3] 15 00 00 can0 016 [4] 16 00 00 00 can0 017 [5] 17 00 00 00 00 can0 018 [6] 18 00 00 00 00 00 can0 019 [7] 19 00 00 00 00 00 00 can0 01A [8] 1A 00 00 00 00 00 00 00 can0 01B [1] 1B can0 01C [1] 1C can0 01D [2] 1D 00 can0 01E [3] 1E 00 00 can0 01F [4] 1F 00 00 00 Running strace is not necessary to reproduce the problem, but it does confirm that the packet is received by the kernel. write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16 write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16 write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16 write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16 write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16 write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16 write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16 write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16 ** Note that 0x0A is indeed sent without error *** write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16 write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16 write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16 write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16 write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16 write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16 write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16 write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16 write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16 write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16 write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16 write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16 write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16 write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16 write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16 write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16 write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16 write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16 write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16 write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16 write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16 write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16 Examining dmesg debug which was added to the SJA1000 driver, it appears that 0x0A was never sent to the device. [ 5223.770685] [sja1000_interrupt] start [ 5223.770692] [sja1000_interrupt] tx_packets++ [ 5223.770693] [sja1000_interrupt] tx processing complete [ 5223.770696] [sja1000_interrupt] end [ 5223.770698] [sja1000_start_xmit] start [ 5223.770700] Start tx, ID = 0x07 [ 5223.770704] [sja1000_start_xmit] end [ 5223.771157] [sja1000_interrupt] start [ 5223.771164] [sja1000_interrupt] tx_packets++ [ 5223.771166] [sja1000_interrupt] tx processing complete [ 5223.771168] [sja1000_interrupt] end [ 5223.771171] [sja1000_start_xmit] start [ 5223.771172] Start tx, ID = 0x08 [ 5223.771177] [sja1000_start_xmit] end [ 5223.771675] [sja1000_interrupt] start [ 5223.771682] [sja1000_interrupt] tx_packets++ [ 5223.771684] [sja1000_interrupt] tx processing complete [ 5223.771686] [sja1000_interrupt] end [ 5223.771692] [sja1000_start_xmit] start [ 5223.771694] Start tx, ID = 0x09 [ 5223.771698] [sja1000_start_xmit] end [ 5223.771925] [sja1000_interrupt] start [ 5223.771931] [sja1000_interrupt] tx_packets++ [ 5223.771932] [sja1000_interrupt] tx processing complete [ 5223.771934] [sja1000_interrupt] end *** Note that ID = 0x0A is never sent to the device *** [ 5223.771938] [sja1000_start_xmit] start [ 5223.771940] Start tx, ID = 0x0B [ 5223.771944] [sja1000_start_xmit] end [ 5223.772209] [sja1000_interrupt] start [ 5223.772215] [sja1000_interrupt] tx_packets++ [ 5223.772217] [sja1000_interrupt] tx processing complete [ 5223.772219] [sja1000_interrupt] end [ 5223.772222] [sja1000_start_xmit] start [ 5223.772223] Start tx, ID = 0x0C [ 5223.772227] [sja1000_start_xmit] end [ 5223.772529] [sja1000_interrupt] start [ 5223.772536] [sja1000_interrupt] tx_packets++ [ 5223.772538] [sja1000_interrupt] tx processing complete [ 5223.772540] [sja1000_interrupt] end [ 5223.772542] [sja1000_start_xmit] start [ 5223.772544] Start tx, ID = 0x0D I am unsure how to proceed here to help debug the problem. Ideas are welcome. Kind regards, Damien. References: [1] https://esd.eu/en/products/can-pci200 [2] http://article.gmane.org/gmane.linux.can/8923 Patch with debug: --- linux-4.6.4/drivers/net/can/sja1000/sja1000.c 2016-07-27 13:58:25.343706506 +1000 +++ clean/linux-4.6.4/drivers/net/can/sja1000/sja1000.c 2016-07-12 02:30:07.000000000 +1000 @@ -290,8 +290,6 @@ u8 cmd_reg_val = 0x00; int i; - printk(KERN_NOTICE "[sja1000_start_xmit] start\n"); - if (can_dropped_invalid_skb(dev, skb)) return NETDEV_TX_OK; @@ -300,8 +298,6 @@ fi = dlc = cf->can_dlc; id = cf->can_id; - printk(KERN_NOTICE "Start tx, ID = 0x%02X\n", id); - if (id & CAN_RTR_FLAG) fi |= SJA1000_FI_RTR; @@ -335,7 +331,6 @@ sja1000_write_cmdreg(priv, cmd_reg_val); - printk("[sja1000_start_xmit] end\n"); return NETDEV_TX_OK; } @@ -505,8 +500,6 @@ uint8_t isrc, status; int n = 0; - printk(KERN_NOTICE "[sja1000_interrupt] start\n"); - if (priv->pre_irq) priv->pre_irq(priv); @@ -537,12 +530,9 @@ priv->read_reg(priv, SJA1000_FI) & 0xf; stats->tx_packets++; can_get_echo_skb(dev, 0); - printk(KERN_NOTICE "[sja1000_interrupt] tx_packets++\n"); } netif_wake_queue(dev); can_led_event(dev, CAN_LED_EVENT_TX); - - printk(KERN_NOTICE "[sja1000_interrupt] tx processing complete\n"); } if (isrc & IRQ_RI) { /* receive interrupt */ @@ -568,8 +558,6 @@ if (n >= SJA1000_MAX_IRQ) netdev_dbg(dev, "%d messages handled in ISR", n); - printk(KERN_NOTICE "[sja1000_interrupt] end\n"); - return (n) ? IRQ_HANDLED : IRQ_NONE; } EXPORT_SYMBOL_GPL(sja1000_interrupt); ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000 2016-07-27 6:23 Dropped Tx frames with SJA1000 Damien Dusha @ 2016-07-27 7:17 ` Tom Evans 2016-07-27 22:47 ` Damien Dusha 0 siblings, 1 reply; 6+ messages in thread From: Tom Evans @ 2016-07-27 7:17 UTC (permalink / raw) To: Damien Dusha, linux-can On 27/07/16 16:23, Damien Dusha wrote: > Dear linux-can, > > I am attempting to debug an issue with the SJA1000 driver. The symptom > is the occasional dropped frame when transmitting a burst of frames. > Reception is unaffected. I've sent a detailed response in email. A summary: "Pattern matching" this problem (not reading all of it, and doing exactly what we hate when we see "help desk" people do this) I'll ask: * What are you setting "/sys/class/net/can0/tx_queue_len" to? * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET, SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they get ENOBUFS? http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html http://www.arm.com/files/pdf/cachecoherencywhitepaper_6june2011.pdf Tom ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000 2016-07-27 7:17 ` Tom Evans @ 2016-07-27 22:47 ` Damien Dusha 2016-07-28 6:08 ` Wolfgang Grandegger 0 siblings, 1 reply; 6+ messages in thread From: Damien Dusha @ 2016-07-27 22:47 UTC (permalink / raw) To: tom_usenet; +Cc: linux-can Hi Tom, On Wed, Jul 27, 2016 at 5:17 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote: > On 27/07/16 16:23, Damien Dusha wrote: >> >> Dear linux-can, >> >> I am attempting to debug an issue with the SJA1000 driver. The symptom >> is the occasional dropped frame when transmitting a burst of frames. >> Reception is unaffected. > > * What are you setting "/sys/class/net/can0/tx_queue_len" to? [damien@localhost can-utils]$ cat /sys/class/net/can0/tx_queue_len 1000 The test sends 640 messages (20 bursts of 32 messages). > * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET, > SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they > get ENOBUFS? cangen is from the standard CAN utilities, which may be found at: git clone https://github.com/linux-can/can-utils.git > http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html I'm not convinced is the ENOBUFS problem. When strace is performed on the process, all writes are returned successfully, but the message never makes it to the hardware. The entire strace is shown below where message with ID = 0x0A was dropped, Note all 32 write() calls are successful. execve("./cangen", ["./cangen", "can0", "-g", "0", "-Ii", "-Di", "-Li", "-n", "32"], [/* 33 vars */]) = 0 brk(NULL) = 0x9606000 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7723000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=98687, ...}) = 0 mmap2(NULL, 98687, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76e3000 close(3) = 0 open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\207\1\0004\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=2111020, ...}) = 0 mmap2(NULL, 1894876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7514000 mprotect(0xb76dc000, 4096, PROT_NONE) = 0 mmap2(0xb76dd000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c8000) = 0xb76dd000 mmap2(0xb76e0000, 10716, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e0000 close(3) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7513000 set_thread_area({entry_number:-1, base_addr:0xb7513700, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 (entry_number:6) mprotect(0xb76dd000, 8192, PROT_READ) = 0 mprotect(0x804d000, 4096, PROT_READ) = 0 mprotect(0xb7724000, 4096, PROT_READ) = 0 munmap(0xb76e3000, 98687) = 0 rt_sigaction(SIGTERM, {0x8049610, [TERM], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGHUP, {0x8049610, [HUP], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0 rt_sigaction(SIGINT, {0x8049610, [INT], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0 socket(AF_CAN, SOCK_RAW, 1) = 3 ioctl(3, SIOCGIFINDEX, {ifr_name="can0", }) = 0 setsockopt(3, 0x65 /* SOL_?? */, 1, NULL, 0) = 0 bind(3, {sa_family=AF_CAN, sa_data="\0\0\3\0\0\0\320\257p\267\350[r\267"}, 16) = 0 write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16 write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16 write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16 write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16 write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16 write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16 write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16 write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16 write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16 write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16 write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16 write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16 write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16 write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16 write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16 write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16 write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16 write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16 write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16 write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16 write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16 write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16 write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16 write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16 write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16 write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16 write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16 write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16 write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16 write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16 close(3) = 0 exit_group(0) = ? +++ exited with 0 +++ ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000 2016-07-27 22:47 ` Damien Dusha @ 2016-07-28 6:08 ` Wolfgang Grandegger 2016-07-28 23:19 ` Damien Dusha 0 siblings, 1 reply; 6+ messages in thread From: Wolfgang Grandegger @ 2016-07-28 6:08 UTC (permalink / raw) To: Damien Dusha, tom_usenet; +Cc: linux-can Hello, Am 28.07.2016 um 00:47 schrieb Damien Dusha: > Hi Tom, > > On Wed, Jul 27, 2016 at 5:17 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote: >> On 27/07/16 16:23, Damien Dusha wrote: >>> >>> Dear linux-can, >>> >>> I am attempting to debug an issue with the SJA1000 driver. The symptom >>> is the occasional dropped frame when transmitting a burst of frames. >>> Reception is unaffected. >> >> * What are you setting "/sys/class/net/can0/tx_queue_len" to? > > [damien@localhost can-utils]$ cat /sys/class/net/can0/tx_queue_len > 1000 > > The test sends 640 messages (20 bursts of 32 messages). > >> * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET, >> SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they >> get ENOBUFS? > > cangen is from the standard CAN utilities, which may be found at: > git clone https://github.com/linux-can/can-utils.git > >> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html > > I'm not convinced is the ENOBUFS problem. When strace is performed on > the process, all writes are returned successfully, but the message > never makes it to the hardware. > > The entire strace is shown below where message with ID = 0x0A was > dropped, Note all 32 write() calls are successful. > > execve("./cangen", ["./cangen", "can0", "-g", "0", "-Ii", "-Di", > "-Li", "-n", "32"], [/* 33 vars */]) = 0 > brk(NULL) = 0x9606000 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0xb7723000 > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) > open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > fstat64(3, {st_mode=S_IFREG|0644, st_size=98687, ...}) = 0 > mmap2(NULL, 98687, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76e3000 > close(3) = 0 > open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\207\1\0004\0\0\0"..., > 512) = 512 > fstat64(3, {st_mode=S_IFREG|0755, st_size=2111020, ...}) = 0 > mmap2(NULL, 1894876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, > 3, 0) = 0xb7514000 > mprotect(0xb76dc000, 4096, PROT_NONE) = 0 > mmap2(0xb76dd000, 12288, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c8000) = 0xb76dd000 > mmap2(0xb76e0000, 10716, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e0000 > close(3) = 0 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0xb7513000 > set_thread_area({entry_number:-1, base_addr:0xb7513700, limit:1048575, > seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, > seg_not_present:0, useable:1}) = 0 (entry_number:6) > mprotect(0xb76dd000, 8192, PROT_READ) = 0 > mprotect(0x804d000, 4096, PROT_READ) = 0 > mprotect(0xb7724000, 4096, PROT_READ) = 0 > munmap(0xb76e3000, 98687) = 0 > rt_sigaction(SIGTERM, {0x8049610, [TERM], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0 > rt_sigaction(SIGHUP, {0x8049610, [HUP], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0 > rt_sigaction(SIGINT, {0x8049610, [INT], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0 > socket(AF_CAN, SOCK_RAW, 1) = 3 > ioctl(3, SIOCGIFINDEX, {ifr_name="can0", }) = 0 > setsockopt(3, 0x65 /* SOL_?? */, 1, NULL, 0) = 0 > bind(3, {sa_family=AF_CAN, > sa_data="\0\0\3\0\0\0\320\257p\267\350[r\267"}, 16) = 0 > write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 > write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16 > write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16 > write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16 > write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16 > write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16 > write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16 > write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16 > write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16 > write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16 > write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16 > write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16 > write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16 > write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16 > write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16 > write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16 > write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16 > write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16 > write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16 > write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16 > write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16 > write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16 > write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16 > write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16 > write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16 > write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16 > write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16 > write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16 > write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16 > write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16 > write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16 > write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16 > close(3) = 0 > exit_group(0) = ? > +++ exited with 0 +++ What does "ip -d -s link show can0" report? What device are you using to monitor/trace the traffic? Wolfgang. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000 2016-07-28 6:08 ` Wolfgang Grandegger @ 2016-07-28 23:19 ` Damien Dusha 2016-07-29 0:35 ` Damien Dusha 0 siblings, 1 reply; 6+ messages in thread From: Damien Dusha @ 2016-07-28 23:19 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: tom_usenet, linux-can Hello Wolfgang, >> I am attempting to debug an issue with the SJA1000 driver. The symptom >> is the occasional dropped frame when transmitting a burst of frames. >> Reception is unaffected. > What does "ip -d -s link show can0" report? [damien@localhost can-utils]$ ip -d -s link show can0 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc fq_codel state UNKNOWN mode DEFAULT group default qlen 1000 link/can promiscuity 0 can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0 bitrate 250000 sample-point 0.875 tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1 sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1 clock 8000000 re-started bus-errors arbit-lost error-warn error-pass bus-off 0 0 0 0 0 0 RX: bytes packets errors dropped overrun mcast 747304 183494 0 0 0 0 TX: bytes packets errors dropped carrier collsns 14767 3859 0 0 0 0 > What device are you using to monitor/trace the traffic? On a second machine, I have a PEAK USB-CAN adaptor. I don't believe this is the problem because: - To test, I had a second USB-CAN adaptor, and they both happened to drop exactly the same frame. - On the machine with the SJA1000 which drops frames on transmit, I can read back the traffic via ./candump can0, which results in the same dropped frame as the remote machine with the PEAK adaptor. - Yesterday I tried to reproduce the problem with a TI Sitara AM3352 (single core ARM Cortex-A8 300MHz, with Bosch D_CAN, running Linux 4.1.8) but was unable to do so (i.e. the PEAK USB works flawlessly). Incidentally, I bring up the driver in the following way, in the off-chance it may help: [damien@localhost ~]$ cat enable_can0.sh #!/bin/sh CAN_DEVICE=can0 CAN_BITRATE=250000 sudo ip link set $CAN_DEVICE type can bitrate $CAN_BITRATE sudo ifconfig $CAN_DEVICE up sudo ifconfig $CAN_DEVICE txqueuelen 1000 ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000 2016-07-28 23:19 ` Damien Dusha @ 2016-07-29 0:35 ` Damien Dusha 0 siblings, 0 replies; 6+ messages in thread From: Damien Dusha @ 2016-07-29 0:35 UTC (permalink / raw) To: Wolfgang Grandegger; +Cc: tom_usenet, linux-can Hello All, >> I am attempting to debug an issue with the SJA1000 driver. The symptom >> is the occasional dropped frame when transmitting a burst of frames. >> Reception is unaffected. > > What does "ip -d -s link show can0" report? Thanks to Wolfgang's hint, I have managed to find the problem. The clue comes from the command that was suggested. From this command, it can be seen that the queue discipline (qdisc) used by default in Fedora 24 is fq_codel. can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc *fq_codel* state UNKNOWN This is the problem; Wikipedia [1] makes it clear that the queuing algorithm will drop packets from the queue in particular circumstances, especially if it has been waiting too long. If I change the qdisc to pfifo, then transmit works perfectly: sudo tc qdisc replace dev can0 root pfifo Most other machines that I have access to - including Ubuntu 14.04 - uses pfifo_fast as the qdisc. This is certainly one to add to the "checklist" for debugging problems like this. Thanks again to everyone for their assistance. Kind regards, Damien. [1] https://en.wikipedia.org/wiki/CoDel ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-07-29 0:35 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-07-27 6:23 Dropped Tx frames with SJA1000 Damien Dusha 2016-07-27 7:17 ` Tom Evans 2016-07-27 22:47 ` Damien Dusha 2016-07-28 6:08 ` Wolfgang Grandegger 2016-07-28 23:19 ` Damien Dusha 2016-07-29 0:35 ` Damien Dusha
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.