* Dropped Tx frames with SJA1000
@ 2016-07-27 6:23 Damien Dusha
2016-07-27 7:17 ` Tom Evans
0 siblings, 1 reply; 6+ messages in thread
From: Damien Dusha @ 2016-07-27 6:23 UTC (permalink / raw)
To: linux-can
Dear linux-can,
I am attempting to debug an issue with the SJA1000 driver. The symptom
is the occasional dropped frame when transmitting a burst of frames.
Reception is unaffected.
Hardware and Setup:
I am using an ESD CAN-PCI200 [1], which uses the sja1000_plx_pci
driver, on a standard Core 2 Duo machine running Fedora 24 i686 with a
freshly compiled vanilla linux-4.6.4.
The SJA1000 driver is patched with some debug (patch is shown below).
The problem appears without the debug, but it seems to appear more
frequently with debug (??).
07:04.0 Bridge: PLX Technology, Inc. PCI <-> IOBus Bridge (rev 02)
Subsystem: ESD Electronic System Design GmbH Device 0004
Flags: medium devsel, IRQ 20
Memory at f0200400 (32-bit, non-prefetchable) [size=128]
I/O ports at 1100 [size=128]
Memory at f0200000 (32-bit, non-prefetchable) [size=1K]
Kernel driver in use: sja1000_plx_pci
Kernel modules: plx_pci
Symptoms:
The missing frames are not received with ./candump on the same
machine, nor is it seen on the bus externally; two other machines are
listening to the bus, both not receiving the same packet (PEAK USB
adaptors).
Dropped packets are not counted as a TX error in ifconfig, nor are
they counted as being sent.
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
txqueuelen 1000 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 10539 bytes 40178 (39.2 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 20
It appears to be unrelated to the SJA1000 Tx drop issue observed in
Peak PCI devices earlier in the year [2].
Reproduction:
The easiest way to reproduce the problem is with the following
command, although it may take several attempts to do so:
for i in `seq 1 20`; do strace ./cangen can0 -g 0 -Ii -Di -Li -n 32; done
For example, it can be seen that ID 0x0A is missing with the following
run of ./candump (on a remote machine).
can0 000 [1] 00
can0 001 [1] 01
can0 002 [2] 02 00
can0 003 [3] 03 00 00
can0 004 [4] 04 00 00 00
can0 005 [5] 05 00 00 00 00
can0 006 [6] 06 00 00 00 00 00
can0 007 [7] 07 00 00 00 00 00 00
can0 008 [8] 08 00 00 00 00 00 00 00
can0 009 [1] 09
*** Missing 0x0A ***
can0 00B [2] 0B 00
can0 00C [3] 0C 00 00
can0 00D [4] 0D 00 00 00
can0 00E [5] 0E 00 00 00 00
can0 00F [6] 0F 00 00 00 00 00
can0 010 [7] 10 00 00 00 00 00 00
can0 011 [8] 11 00 00 00 00 00 00 00
can0 012 [1] 12
can0 013 [1] 13
can0 014 [2] 14 00
can0 015 [3] 15 00 00
can0 016 [4] 16 00 00 00
can0 017 [5] 17 00 00 00 00
can0 018 [6] 18 00 00 00 00 00
can0 019 [7] 19 00 00 00 00 00 00
can0 01A [8] 1A 00 00 00 00 00 00 00
can0 01B [1] 1B
can0 01C [1] 1C
can0 01D [2] 1D 00
can0 01E [3] 1E 00 00
can0 01F [4] 1F 00 00 00
Running strace is not necessary to reproduce the problem, but it does
confirm that the packet is received by the kernel.
write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16
write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16
write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16
write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16
write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16
write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16
write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16
write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16
write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16
** Note that 0x0A is indeed sent without error ***
write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16
write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16
write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16
write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16
write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16
write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16
write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16
write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16
write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16
write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16
write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16
write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16
write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16
write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16
write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16
write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16
write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16
write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16
write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16
write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16
write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16
write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16
Examining dmesg debug which was added to the SJA1000 driver, it
appears that 0x0A was never sent to the device.
[ 5223.770685] [sja1000_interrupt] start
[ 5223.770692] [sja1000_interrupt] tx_packets++
[ 5223.770693] [sja1000_interrupt] tx processing complete
[ 5223.770696] [sja1000_interrupt] end
[ 5223.770698] [sja1000_start_xmit] start
[ 5223.770700] Start tx, ID = 0x07
[ 5223.770704] [sja1000_start_xmit] end
[ 5223.771157] [sja1000_interrupt] start
[ 5223.771164] [sja1000_interrupt] tx_packets++
[ 5223.771166] [sja1000_interrupt] tx processing complete
[ 5223.771168] [sja1000_interrupt] end
[ 5223.771171] [sja1000_start_xmit] start
[ 5223.771172] Start tx, ID = 0x08
[ 5223.771177] [sja1000_start_xmit] end
[ 5223.771675] [sja1000_interrupt] start
[ 5223.771682] [sja1000_interrupt] tx_packets++
[ 5223.771684] [sja1000_interrupt] tx processing complete
[ 5223.771686] [sja1000_interrupt] end
[ 5223.771692] [sja1000_start_xmit] start
[ 5223.771694] Start tx, ID = 0x09
[ 5223.771698] [sja1000_start_xmit] end
[ 5223.771925] [sja1000_interrupt] start
[ 5223.771931] [sja1000_interrupt] tx_packets++
[ 5223.771932] [sja1000_interrupt] tx processing complete
[ 5223.771934] [sja1000_interrupt] end
*** Note that ID = 0x0A is never sent to the device ***
[ 5223.771938] [sja1000_start_xmit] start
[ 5223.771940] Start tx, ID = 0x0B
[ 5223.771944] [sja1000_start_xmit] end
[ 5223.772209] [sja1000_interrupt] start
[ 5223.772215] [sja1000_interrupt] tx_packets++
[ 5223.772217] [sja1000_interrupt] tx processing complete
[ 5223.772219] [sja1000_interrupt] end
[ 5223.772222] [sja1000_start_xmit] start
[ 5223.772223] Start tx, ID = 0x0C
[ 5223.772227] [sja1000_start_xmit] end
[ 5223.772529] [sja1000_interrupt] start
[ 5223.772536] [sja1000_interrupt] tx_packets++
[ 5223.772538] [sja1000_interrupt] tx processing complete
[ 5223.772540] [sja1000_interrupt] end
[ 5223.772542] [sja1000_start_xmit] start
[ 5223.772544] Start tx, ID = 0x0D
I am unsure how to proceed here to help debug the problem. Ideas are welcome.
Kind regards,
Damien.
References:
[1] https://esd.eu/en/products/can-pci200
[2] http://article.gmane.org/gmane.linux.can/8923
Patch with debug:
--- linux-4.6.4/drivers/net/can/sja1000/sja1000.c 2016-07-27
13:58:25.343706506 +1000
+++ clean/linux-4.6.4/drivers/net/can/sja1000/sja1000.c 2016-07-12
02:30:07.000000000 +1000
@@ -290,8 +290,6 @@
u8 cmd_reg_val = 0x00;
int i;
- printk(KERN_NOTICE "[sja1000_start_xmit] start\n");
-
if (can_dropped_invalid_skb(dev, skb))
return NETDEV_TX_OK;
@@ -300,8 +298,6 @@
fi = dlc = cf->can_dlc;
id = cf->can_id;
- printk(KERN_NOTICE "Start tx, ID = 0x%02X\n", id);
-
if (id & CAN_RTR_FLAG)
fi |= SJA1000_FI_RTR;
@@ -335,7 +331,6 @@
sja1000_write_cmdreg(priv, cmd_reg_val);
- printk("[sja1000_start_xmit] end\n");
return NETDEV_TX_OK;
}
@@ -505,8 +500,6 @@
uint8_t isrc, status;
int n = 0;
- printk(KERN_NOTICE "[sja1000_interrupt] start\n");
-
if (priv->pre_irq)
priv->pre_irq(priv);
@@ -537,12 +530,9 @@
priv->read_reg(priv, SJA1000_FI) & 0xf;
stats->tx_packets++;
can_get_echo_skb(dev, 0);
- printk(KERN_NOTICE "[sja1000_interrupt] tx_packets++\n");
}
netif_wake_queue(dev);
can_led_event(dev, CAN_LED_EVENT_TX);
-
- printk(KERN_NOTICE "[sja1000_interrupt] tx processing complete\n");
}
if (isrc & IRQ_RI) {
/* receive interrupt */
@@ -568,8 +558,6 @@
if (n >= SJA1000_MAX_IRQ)
netdev_dbg(dev, "%d messages handled in ISR", n);
- printk(KERN_NOTICE "[sja1000_interrupt] end\n");
-
return (n) ? IRQ_HANDLED : IRQ_NONE;
}
EXPORT_SYMBOL_GPL(sja1000_interrupt);
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000
2016-07-27 6:23 Dropped Tx frames with SJA1000 Damien Dusha
@ 2016-07-27 7:17 ` Tom Evans
2016-07-27 22:47 ` Damien Dusha
0 siblings, 1 reply; 6+ messages in thread
From: Tom Evans @ 2016-07-27 7:17 UTC (permalink / raw)
To: Damien Dusha, linux-can
On 27/07/16 16:23, Damien Dusha wrote:
> Dear linux-can,
>
> I am attempting to debug an issue with the SJA1000 driver. The symptom
> is the occasional dropped frame when transmitting a burst of frames.
> Reception is unaffected.
I've sent a detailed response in email. A summary:
"Pattern matching" this problem (not reading all of it, and doing exactly what
we hate when we see "help desk" people do this) I'll ask:
* What are you setting "/sys/class/net/can0/tx_queue_len" to?
* Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET, SO_SNDBUF,
&buflen)" call with a sensible value? What do they do when they get ENOBUFS?
http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
http://www.arm.com/files/pdf/cachecoherencywhitepaper_6june2011.pdf
Tom
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000
2016-07-27 7:17 ` Tom Evans
@ 2016-07-27 22:47 ` Damien Dusha
2016-07-28 6:08 ` Wolfgang Grandegger
0 siblings, 1 reply; 6+ messages in thread
From: Damien Dusha @ 2016-07-27 22:47 UTC (permalink / raw)
To: tom_usenet; +Cc: linux-can
Hi Tom,
On Wed, Jul 27, 2016 at 5:17 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
> On 27/07/16 16:23, Damien Dusha wrote:
>>
>> Dear linux-can,
>>
>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>> is the occasional dropped frame when transmitting a burst of frames.
>> Reception is unaffected.
>
> * What are you setting "/sys/class/net/can0/tx_queue_len" to?
[damien@localhost can-utils]$ cat /sys/class/net/can0/tx_queue_len
1000
The test sends 640 messages (20 bursts of 32 messages).
> * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET,
> SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they
> get ENOBUFS?
cangen is from the standard CAN utilities, which may be found at:
git clone https://github.com/linux-can/can-utils.git
> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
I'm not convinced is the ENOBUFS problem. When strace is performed on
the process, all writes are returned successfully, but the message
never makes it to the hardware.
The entire strace is shown below where message with ID = 0x0A was
dropped, Note all 32 write() calls are successful.
execve("./cangen", ["./cangen", "can0", "-g", "0", "-Ii", "-Di",
"-Li", "-n", "32"], [/* 33 vars */]) = 0
brk(NULL) = 0x9606000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7723000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=98687, ...}) = 0
mmap2(NULL, 98687, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76e3000
close(3) = 0
open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\207\1\0004\0\0\0"...,
512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=2111020, ...}) = 0
mmap2(NULL, 1894876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
3, 0) = 0xb7514000
mprotect(0xb76dc000, 4096, PROT_NONE) = 0
mmap2(0xb76dd000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c8000) = 0xb76dd000
mmap2(0xb76e0000, 10716, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e0000
close(3) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb7513000
set_thread_area({entry_number:-1, base_addr:0xb7513700, limit:1048575,
seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1,
seg_not_present:0, useable:1}) = 0 (entry_number:6)
mprotect(0xb76dd000, 8192, PROT_READ) = 0
mprotect(0x804d000, 4096, PROT_READ) = 0
mprotect(0xb7724000, 4096, PROT_READ) = 0
munmap(0xb76e3000, 98687) = 0
rt_sigaction(SIGTERM, {0x8049610, [TERM], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGHUP, {0x8049610, [HUP], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {0x8049610, [INT], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
socket(AF_CAN, SOCK_RAW, 1) = 3
ioctl(3, SIOCGIFINDEX, {ifr_name="can0", }) = 0
setsockopt(3, 0x65 /* SOL_?? */, 1, NULL, 0) = 0
bind(3, {sa_family=AF_CAN,
sa_data="\0\0\3\0\0\0\320\257p\267\350[r\267"}, 16) = 0
write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16
write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16
write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16
write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16
write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16
write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16
write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16
write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16
write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16
write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16
write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16
write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16
write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16
write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16
write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16
write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16
write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16
write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16
write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16
write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16
write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16
write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16
write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16
write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16
write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16
write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16
write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16
write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16
write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16
write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16
write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16
close(3) = 0
exit_group(0) = ?
+++ exited with 0 +++
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000
2016-07-27 22:47 ` Damien Dusha
@ 2016-07-28 6:08 ` Wolfgang Grandegger
2016-07-28 23:19 ` Damien Dusha
0 siblings, 1 reply; 6+ messages in thread
From: Wolfgang Grandegger @ 2016-07-28 6:08 UTC (permalink / raw)
To: Damien Dusha, tom_usenet; +Cc: linux-can
Hello,
Am 28.07.2016 um 00:47 schrieb Damien Dusha:
> Hi Tom,
>
> On Wed, Jul 27, 2016 at 5:17 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
>> On 27/07/16 16:23, Damien Dusha wrote:
>>>
>>> Dear linux-can,
>>>
>>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>>> is the occasional dropped frame when transmitting a burst of frames.
>>> Reception is unaffected.
>>
>> * What are you setting "/sys/class/net/can0/tx_queue_len" to?
>
> [damien@localhost can-utils]$ cat /sys/class/net/can0/tx_queue_len
> 1000
>
> The test sends 640 messages (20 bursts of 32 messages).
>
>> * Are your tools (cangen et al) making a "setsockopt(s, SOL_SOCKET,
>> SO_SNDBUF, &buflen)" call with a sensible value? What do they do when they
>> get ENOBUFS?
>
> cangen is from the standard CAN utilities, which may be found at:
> git clone https://github.com/linux-can/can-utils.git
>
>> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
>
> I'm not convinced is the ENOBUFS problem. When strace is performed on
> the process, all writes are returned successfully, but the message
> never makes it to the hardware.
>
> The entire strace is shown below where message with ID = 0x0A was
> dropped, Note all 32 write() calls are successful.
>
> execve("./cangen", ["./cangen", "can0", "-g", "0", "-Ii", "-Di",
> "-Li", "-n", "32"], [/* 33 vars */]) = 0
> brk(NULL) = 0x9606000
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0xb7723000
> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> fstat64(3, {st_mode=S_IFREG|0644, st_size=98687, ...}) = 0
> mmap2(NULL, 98687, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb76e3000
> close(3) = 0
> open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000\207\1\0004\0\0\0"...,
> 512) = 512
> fstat64(3, {st_mode=S_IFREG|0755, st_size=2111020, ...}) = 0
> mmap2(NULL, 1894876, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
> 3, 0) = 0xb7514000
> mprotect(0xb76dc000, 4096, PROT_NONE) = 0
> mmap2(0xb76dd000, 12288, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c8000) = 0xb76dd000
> mmap2(0xb76e0000, 10716, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb76e0000
> close(3) = 0
> mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0xb7513000
> set_thread_area({entry_number:-1, base_addr:0xb7513700, limit:1048575,
> seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1,
> seg_not_present:0, useable:1}) = 0 (entry_number:6)
> mprotect(0xb76dd000, 8192, PROT_READ) = 0
> mprotect(0x804d000, 4096, PROT_READ) = 0
> mprotect(0xb7724000, 4096, PROT_READ) = 0
> munmap(0xb76e3000, 98687) = 0
> rt_sigaction(SIGTERM, {0x8049610, [TERM], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
> rt_sigaction(SIGHUP, {0x8049610, [HUP], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
> rt_sigaction(SIGINT, {0x8049610, [INT], SA_RESTART}, {SIG_DFL, [], 0}, 8) = 0
> socket(AF_CAN, SOCK_RAW, 1) = 3
> ioctl(3, SIOCGIFINDEX, {ifr_name="can0", }) = 0
> setsockopt(3, 0x65 /* SOL_?? */, 1, NULL, 0) = 0
> bind(3, {sa_family=AF_CAN,
> sa_data="\0\0\3\0\0\0\320\257p\267\350[r\267"}, 16) = 0
> write(3, "\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16
> write(3, "\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0", 16) = 16
> write(3, "\2\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0", 16) = 16
> write(3, "\3\0\0\0\3\0\0\0\3\0\0\0\0\0\0\0", 16) = 16
> write(3, "\4\0\0\0\4\0\0\0\4\0\0\0\0\0\0\0", 16) = 16
> write(3, "\5\0\0\0\5\0\0\0\5\0\0\0\0\0\0\0", 16) = 16
> write(3, "\6\0\0\0\6\0\0\0\6\0\0\0\0\0\0\0", 16) = 16
> write(3, "\7\0\0\0\7\0\0\0\7\0\0\0\0\0\0\0", 16) = 16
> write(3, "\10\0\0\0\10\0\0\0\10\0\0\0\0\0\0\0", 16) = 16
> write(3, "\t\0\0\0\1\0\0\0\t\0\0\0\0\0\0\0", 16) = 16
> write(3, "\n\0\0\0\1\0\0\0\n\0\0\0\0\0\0\0", 16) = 16
> write(3, "\v\0\0\0\2\0\0\0\v\0\0\0\0\0\0\0", 16) = 16
> write(3, "\f\0\0\0\3\0\0\0\f\0\0\0\0\0\0\0", 16) = 16
> write(3, "\r\0\0\0\4\0\0\0\r\0\0\0\0\0\0\0", 16) = 16
> write(3, "\16\0\0\0\5\0\0\0\16\0\0\0\0\0\0\0", 16) = 16
> write(3, "\17\0\0\0\6\0\0\0\17\0\0\0\0\0\0\0", 16) = 16
> write(3, "\20\0\0\0\7\0\0\0\20\0\0\0\0\0\0\0", 16) = 16
> write(3, "\21\0\0\0\10\0\0\0\21\0\0\0\0\0\0\0", 16) = 16
> write(3, "\22\0\0\0\1\0\0\0\22\0\0\0\0\0\0\0", 16) = 16
> write(3, "\23\0\0\0\1\0\0\0\23\0\0\0\0\0\0\0", 16) = 16
> write(3, "\24\0\0\0\2\0\0\0\24\0\0\0\0\0\0\0", 16) = 16
> write(3, "\25\0\0\0\3\0\0\0\25\0\0\0\0\0\0\0", 16) = 16
> write(3, "\26\0\0\0\4\0\0\0\26\0\0\0\0\0\0\0", 16) = 16
> write(3, "\27\0\0\0\5\0\0\0\27\0\0\0\0\0\0\0", 16) = 16
> write(3, "\30\0\0\0\6\0\0\0\30\0\0\0\0\0\0\0", 16) = 16
> write(3, "\31\0\0\0\7\0\0\0\31\0\0\0\0\0\0\0", 16) = 16
> write(3, "\32\0\0\0\10\0\0\0\32\0\0\0\0\0\0\0", 16) = 16
> write(3, "\33\0\0\0\1\0\0\0\33\0\0\0\0\0\0\0", 16) = 16
> write(3, "\34\0\0\0\1\0\0\0\34\0\0\0\0\0\0\0", 16) = 16
> write(3, "\35\0\0\0\2\0\0\0\35\0\0\0\0\0\0\0", 16) = 16
> write(3, "\36\0\0\0\3\0\0\0\36\0\0\0\0\0\0\0", 16) = 16
> write(3, "\37\0\0\0\4\0\0\0\37\0\0\0\0\0\0\0", 16) = 16
> close(3) = 0
> exit_group(0) = ?
> +++ exited with 0 +++
What does "ip -d -s link show can0" report? What device are you using to
monitor/trace the traffic?
Wolfgang.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000
2016-07-28 6:08 ` Wolfgang Grandegger
@ 2016-07-28 23:19 ` Damien Dusha
2016-07-29 0:35 ` Damien Dusha
0 siblings, 1 reply; 6+ messages in thread
From: Damien Dusha @ 2016-07-28 23:19 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: tom_usenet, linux-can
Hello Wolfgang,
>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>> is the occasional dropped frame when transmitting a burst of frames.
>> Reception is unaffected.
> What does "ip -d -s link show can0" report?
[damien@localhost can-utils]$ ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc fq_codel state UNKNOWN
mode DEFAULT group default qlen 1000
link/can promiscuity 0
can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
bitrate 250000 sample-point 0.875
tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
clock 8000000
re-started bus-errors arbit-lost error-warn error-pass bus-off
0 0 0 0 0 0
RX: bytes packets errors dropped overrun mcast
747304 183494 0 0 0 0
TX: bytes packets errors dropped carrier collsns
14767 3859 0 0 0 0
> What device are you using to monitor/trace the traffic?
On a second machine, I have a PEAK USB-CAN adaptor. I don't believe
this is the problem because:
- To test, I had a second USB-CAN adaptor, and they both happened to
drop exactly the same frame.
- On the machine with the SJA1000 which drops frames on transmit, I
can read back the traffic via ./candump can0, which results in the
same dropped frame as the remote machine with the PEAK adaptor.
- Yesterday I tried to reproduce the problem with a TI Sitara AM3352
(single core ARM Cortex-A8 300MHz, with Bosch D_CAN, running Linux
4.1.8) but was unable to do so (i.e. the PEAK USB works flawlessly).
Incidentally, I bring up the driver in the following way, in the
off-chance it may help:
[damien@localhost ~]$ cat enable_can0.sh
#!/bin/sh
CAN_DEVICE=can0
CAN_BITRATE=250000
sudo ip link set $CAN_DEVICE type can bitrate $CAN_BITRATE
sudo ifconfig $CAN_DEVICE up
sudo ifconfig $CAN_DEVICE txqueuelen 1000
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Dropped Tx frames with SJA1000
2016-07-28 23:19 ` Damien Dusha
@ 2016-07-29 0:35 ` Damien Dusha
0 siblings, 0 replies; 6+ messages in thread
From: Damien Dusha @ 2016-07-29 0:35 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: tom_usenet, linux-can
Hello All,
>> I am attempting to debug an issue with the SJA1000 driver. The symptom
>> is the occasional dropped frame when transmitting a burst of frames.
>> Reception is unaffected.
>
> What does "ip -d -s link show can0" report?
Thanks to Wolfgang's hint, I have managed to find the problem. The
clue comes from the command that was suggested. From this command, it
can be seen that the queue discipline (qdisc) used by default in
Fedora 24 is fq_codel.
can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc *fq_codel* state UNKNOWN
This is the problem; Wikipedia [1] makes it clear that the queuing
algorithm will drop packets from the queue in particular
circumstances, especially if it has been waiting too long. If I
change the qdisc to pfifo, then transmit works perfectly:
sudo tc qdisc replace dev can0 root pfifo
Most other machines that I have access to - including Ubuntu 14.04 -
uses pfifo_fast as the qdisc.
This is certainly one to add to the "checklist" for debugging problems
like this.
Thanks again to everyone for their assistance.
Kind regards,
Damien.
[1] https://en.wikipedia.org/wiki/CoDel
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-07-29 0:35 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-27 6:23 Dropped Tx frames with SJA1000 Damien Dusha
2016-07-27 7:17 ` Tom Evans
2016-07-27 22:47 ` Damien Dusha
2016-07-28 6:08 ` Wolfgang Grandegger
2016-07-28 23:19 ` Damien Dusha
2016-07-29 0:35 ` Damien Dusha
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.