* Troubleshooting bus errors, bus turns off during canplayer playback
@ 2016-12-20  4:53 Hans L
  2016-12-20  5:01 ` Tom Evans
  2016-12-20  5:23 ` Brian Silverman
  0 siblings, 2 replies; 6+ messages in thread
From: Hans L @ 2016-12-20  4:53 UTC (permalink / raw)
  To: linux-can

I'm a developer writing firmware for a J1939 device and I have some
logs recorded from vehicles that I want to use as test data, playing
these on a "bench" setup.

I am using a Beaglebone black running debian, with a PEAK PCAN-USB
adapter to play these logs on my test bus.

When I attempt to play a log that has pretty high bus utilization, it
just quits after sending a few frames with "No buffer space available".
I found that default txqueuelen was 10 and increasing it to 1000 helps
get rid of the buffer space issues, but then the canplayer process
still just stops transmitting after 300 frames or so.
I guess the bus is encountering some sort of error, and turning off,
so I set restart-ms 1 to have minimal downtime.
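canplayer stops at the first ENOBUFS. A sender that you control could instead treat ENOBUFS as "tx queue full, try again shortly". Below is a minimal sketch. It assumes a CAN_RAW socket fd and uses a hypothetical helper called write_with_retry. The retry count and back-off are arbitrary choices, not values from this thread. It only papers over queue overflow and does nothing about the bus errors themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of canplayer): write one frame and, if
 * the device tx queue is full (ENOBUFS), sleep briefly and try again
 * instead of giving up. Returns the number of bytes written, or -1 with
 * errno set on any other error or after max_retries extra attempts.
 */
ssize_t write_with_retry(int fd, const void *buf, size_t len,
                         int max_retries, long backoff_us)
{
    struct timespec delay = {
        .tv_sec = backoff_us / 1000000L,
        .tv_nsec = (backoff_us % 1000000L) * 1000L,
    };

    for (int attempt = 0;; attempt++) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0)
            return n;                   /* frame accepted */
        if (errno != ENOBUFS || attempt >= max_retries)
            return -1;                  /* real error, or gave up */
        nanosleep(&delay, NULL);        /* give the queue time to drain */
    }
}
```

Only ENOBUFS is treated as retryable, so other errors, such as EBADF, still end playback immediately.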

With these settings it seems that canplayer is able to get through the
whole log and transmit all frames, but I don't understand why the bus
has to restart so often in the first place, or what causes the errors
that regularly kill the bus.

Also many options that I tried such as "berr-reporting on" result in
"RTNETLINK answers: Operation not supported".
And where are the descriptions of what these settings do? I searched
through the documentation, but berr-reporting is only mentioned once in
the options list, with no explanation (the same goes for most other settings):
https://www.kernel.org/doc/Documentation/networking/can.txt

Below is some system info:

debian@beaglebone:~$ uname -a
Linux beaglebone 3.8.13-bone79 #1 SMP Tue Oct 13 20:44:55 UTC 2015 armv7l GNU/Linux

debian@beaglebone:~$ sudo modinfo peak_usb
filename:       /lib/modules/3.8.13-bone79/kernel/drivers/net/can/usb/peak_usb/peak_usb.ko
license:        GPL v2
description:    CAN driver for PEAK-System USB adapters
author:         Stephane Grosjean <s.grosjean@peak-system.com>
srcversion:     E230D7B8207864FEABBECC6
alias:          usb:v0C72p000Dd*dc*dsc*dp*ic*isc*ip*in*
alias:          usb:v0C72p000Cd*dc*dsc*dp*ic*isc*ip*in*
depends:        can-dev
intree:         Y
vermagic:       3.8.13-bone79 SMP mod_unload modversions ARMv7 thumb2 p2v8

debian@beaglebone:~$ ip -d -s link show can0
5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT qlen 1000
    link/can
    can state ERROR-PASSIVE restart-ms 1
    bitrate 250000 sample-point 0.875
    tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    pcan_usb: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 8000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    18         0          0          36         41         20
    RX: bytes  packets  errors  dropped overrun mcast
    10167      2709     0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    1663258    207911   0       46      0       0


* Re: Troubleshooting bus errors, bus turns off during canplayer playback
  2016-12-20  4:53 Troubleshooting bus errors, bus turns off during canplayer playback Hans L
@ 2016-12-20  5:01 ` Tom Evans
  2016-12-21 18:08   ` Hans L
  2016-12-20  5:23 ` Brian Silverman
  1 sibling, 1 reply; 6+ messages in thread
From: Tom Evans @ 2016-12-20  5:01 UTC (permalink / raw)
  To: Hans L, linux-can

On 20/12/16 15:53, Hans L wrote:
> I'm a developer writing firmware for a J1939 device and I have some
> logs recorded from vehicles that I want to use as test data, playing
> these on a "bench" setup.
>
> I am using a Beaglebone black running debian, with a PEAK PCAN-USB
> adapter to play these logs on my test bus.
>
> When I attempt to play a log that has pretty high bus utilization, it
> just quits after sending a few frames with "No buffer space available".

That's an old problem that everyone runs into eventually.

It has come up before on this list.

Here's something I posted in March 2015:

====

Sockets can block or return ENOBUFS. Ethernet blocks before it returns ENOBUFS 
like you'd expect. With CAN it does the opposite "out of the box" and needs to 
be fixed.

http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html

     With Ethernet, the transmit queue length is 1000 (which would
     return ENOBUFS) but before that happens it hits SO_SNDBUF,
     which may be 108544, which is the total Data plus SKB, and
     with an SKB size of about 200 that means it blocks at about
     500 before it ENOBUFs at 1000.

     With CAN, it would block at 500, but it ENOBUFs at 10 first with
     the default queue depth!

I do the following to get a 256-deep queue that blocks before it overflows:

     /bin/echo 256 > /sys/class/net/can0/tx_queue_len
     /bin/echo 256 > /sys/class/net/can1/tx_queue_len

     /* Minimum socket buffer to try and get it blocking */
     int sndbuf = (250 + 8) * 256;
     rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
                     &sndbuf, sizeof(sndbuf));
     if (rc < 0)
         perror("setsockopt(SO_SNDBUF)");
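Tom's fragment depends on surrounding code (pSkt, rc). A self-contained version might look like the sketch below. limit_sndbuf and FRAME_COST_BYTES are hypothetical names, and the 250-byte per-frame overhead is his estimate, not a kernel constant. On a real system you would pass the CAN_RAW socket. Per socket(7), Linux doubles the value given to SO_SNDBUF, and getsockopt() returns the doubled value. It is therefore worth comparing the value read back, not the value requested, against tx_queue_len.

```c
#include <stdio.h>
#include <sys/socket.h>

/* Assumed per-frame socket-buffer cost from Tom's example: roughly
 * 250 bytes of sk_buff overhead plus 8 data bytes (an estimate). */
#define FRAME_COST_BYTES (250 + 8)

/*
 * Shrink SO_SNDBUF so that a writer blocks after about `frames` queued
 * frames, then read back what the kernel actually granted. Linux
 * doubles the requested value and enforces a minimum, so the result is
 * normally larger than requested. Returns the effective size or -1.
 */
int limit_sndbuf(int fd, int frames)
{
    int sndbuf = FRAME_COST_BYTES * frames;
    int effective = 0;
    socklen_t optlen = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf)) < 0) {
        perror("setsockopt(SO_SNDBUF)");
        return -1;
    }
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &effective, &optlen) < 0) {
        perror("getsockopt(SO_SNDBUF)");
        return -1;
    }
    return effective;
}
```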

You might also like to read:

http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

     SocketCAN and queueing disciplines:
     Final Report
     M. Sojka, R. Lisový, P. Píša
     Czech Technical University in Prague
     July 20, 2012
     Version 1.2

Tom



* Re: Troubleshooting bus errors, bus turns off during canplayer playback
  2016-12-20  4:53 Troubleshooting bus errors, bus turns off during canplayer playback Hans L
  2016-12-20  5:01 ` Tom Evans
@ 2016-12-20  5:23 ` Brian Silverman
  1 sibling, 0 replies; 6+ messages in thread
From: Brian Silverman @ 2016-12-20  5:23 UTC (permalink / raw)
  To: Hans L; +Cc: linux-can

Are you sending messages from the log file with the same source
address that another node currently on the bus is sending from? I get
those same symptoms when I do that (I'm replaying logs too). I'm
pretty sure the problem is both nodes think they won the arbitration
because they're using the same address, but then the first to try
sending a recessive data bit when the other one's sending a dominant
one gets a bit error. The first node which detected the error then
sends an error frame so all nodes on the bus ignore the frame and see
an error.
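Brian's explanation hinges on the J1939 source address being the low 8 bits of the 29-bit identifier. Two nodes that send the same PGN with the same priority and the same source address therefore produce identical IDs, and both "win" arbitration. One way to avoid that during playback is to drop logged frames whose source address belongs to a node that is live on the bench. The sketch below assumes the standard J1939 identifier layout (priority, EDP, DP, PF, PS, SA). skip_for_live_node is a hypothetical filter. The flag and mask constants mirror CAN_EFF_FLAG and CAN_EFF_MASK from <linux/can.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as CAN_EFF_FLAG / CAN_EFF_MASK in <linux/can.h>. */
#define EFF_FLAG 0x80000000U
#define EFF_MASK 0x1FFFFFFFU

/* J1939 29-bit ID: priority(3) | EDP(1) | DP(1) | PF(8) | PS(8) | SA(8) */
uint8_t j1939_source_address(uint32_t can_id)
{
    return (uint8_t)(can_id & 0xFF);
}

uint32_t j1939_pgn(uint32_t can_id)
{
    uint32_t id = can_id & EFF_MASK;
    uint32_t pgn = (id >> 8) & 0x3FFFF;        /* EDP, DP, PF, PS */
    uint8_t pf = (uint8_t)((id >> 16) & 0xFF);

    if (pf < 240)
        pgn &= 0x3FF00;   /* PDU1: PS is a destination address, not part of the PGN */
    return pgn;
}

/*
 * Hypothetical playback filter: skip logged frames whose source address
 * belongs to a node that is live on the bench bus, so two transmitters
 * never send identical identifiers.
 */
bool skip_for_live_node(uint32_t can_id, uint8_t live_sa)
{
    if (!(can_id & EFF_FLAG))
        return false;                          /* 11-bit frame: not J1939 */
    return j1939_source_address(can_id) == live_sa;
}
```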

I'm using a PEAK PCAN-miniPCIe card with 3.14.43-rt42, and I have a
"berr-counter tx NNN rx NNN" in the `ip -d -s link show` output. I
don't recall whether I've ever looked at those when two nodes are
sending with the same address to see if they increment, but I can
check in a few days after I'm done travelling. For reference, this is
what that output looks like on a random box of mine (I think it has no
CAN wires attached):
brian@box:~$ sudo ip -d -s link show can0
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can <TRIPLE-SAMPLING> state ERROR-PASSIVE (berr-counter tx 0 rx 135) restart-ms 1000
        bitrate 250000 sample-point 0.875
        tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
        sja1000: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
        clock 8000000
        re-started bus-errors arbit-lost error-warn error-pass bus-off
        0          0          -688250473 1          1          0
    RX: bytes  packets  errors  dropped overrun mcast
    28853734600 3606716825 0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        3606716823 0       0       0

Interestingly, that appears to be without berr-reporting on. When I
explicitly pass "berr-reporting on" in the `ip link set can0 up ...`
command, BERR-REPORTING shows up next to TRIPLE-SAMPLING as expected.
I'll have to check if that has any noticeable effect on the numbers
reported. I too would appreciate a pointer to more detailed
documentation on what exactly that option does.
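On berr-reporting: it is a controller mode that, on drivers supporting it, makes the controller report individual bus errors in addition to error-state changes. The "Operation not supported" that Hans saw most likely means the peak_usb driver in that kernel does not advertise the mode. Either way, an application only receives such reports as error frames after enabling them with the CAN_RAW_ERR_FILTER socket option described in can.txt. The sketch below uses the error-class bits from <linux/can/error.h>. The two function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/can.h>
#include <linux/can/raw.h>
#include <linux/can/error.h>

/* Ask a CAN_RAW socket to deliver error frames of every class. */
int enable_error_frames(int fd)
{
    can_err_mask_t mask = CAN_ERR_MASK;
    return setsockopt(fd, SOL_CAN_RAW, CAN_RAW_ERR_FILTER, &mask, sizeof(mask));
}

/* Write the error classes flagged in can_id into out; return how many. */
int describe_error_frame(canid_t can_id, char *out, size_t len)
{
    static const struct { canid_t bit; const char *name; } classes[] = {
        { CAN_ERR_TX_TIMEOUT, "tx-timeout" },
        { CAN_ERR_LOSTARB,    "arbitration-lost" },
        { CAN_ERR_CRTL,       "controller" },
        { CAN_ERR_PROT,       "protocol-violation" },
        { CAN_ERR_TRX,        "transceiver" },
        { CAN_ERR_ACK,        "no-ack" },
        { CAN_ERR_BUSOFF,     "bus-off" },
        { CAN_ERR_BUSERROR,   "bus-error" },
        { CAN_ERR_RESTARTED,  "restarted" },
    };
    int found = 0;
    size_t used = 0;

    if (len > 0)
        out[0] = '\0';
    if (!(can_id & CAN_ERR_FLAG))
        return 0;                       /* ordinary data frame */

    for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++) {
        if (!(can_id & classes[i].bit))
            continue;
        found++;
        if (used < len) {               /* stop appending once out is full */
            int n = snprintf(out + used, len - used, "%s%s",
                             found > 1 ? "," : "", classes[i].name);
            if (n > 0)
                used += (size_t)n;
        }
    }
    return found;
}
```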


* Re: Troubleshooting bus errors, bus turns off during canplayer playback
  2016-12-20  5:01 ` Tom Evans
@ 2016-12-21 18:08   ` Hans L
  2016-12-22 18:03     ` Oliver Hartkopp
  2016-12-23 12:03     ` Tom Evans
  0 siblings, 2 replies; 6+ messages in thread
From: Hans L @ 2016-12-21 18:08 UTC (permalink / raw)
  To: linux-can

I'm reading some of these links talking about the issue of dealing
with multiple programs accessing the socket, but in the case I'm
dealing with, there's only one program connecting to the can socket on
this device.  That program is canplayer.  So I would need to patch
canplayer with that setsockopt code you are suggesting?


* Re: Troubleshooting bus errors, bus turns off during canplayer playback
  2016-12-21 18:08   ` Hans L
@ 2016-12-22 18:03     ` Oliver Hartkopp
  2016-12-23 12:03     ` Tom Evans
  1 sibling, 0 replies; 6+ messages in thread
From: Oliver Hartkopp @ 2016-12-22 18:03 UTC (permalink / raw)
  To: Hans L, linux-can

Hello Hans,

On 12/21/2016 07:08 PM, Hans L wrote:
> I'm reading some of these links talking about the issue of dealing
> with multiple programs accessing the socket, but in the case I'm
> dealing with, there's only one program connecting to the can socket on
> this device.  That program is canplayer.  So I would need to patch
> canplayer with that setsockopt code you are suggesting?

The problem may arise when the time gaps between CAN frames, as recorded
in the log file's timestamps, are too short.

When you try to send data at a higher rate than the CAN bus bandwidth
can carry, the tx queue will overflow.

When you have a real CAN dump created from a real CAN interface with the
same bitrate, there can be a pile-up of CAN frames in the tx queue.
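The bandwidth point can be made concrete by computing how long one frame occupies the bus. The sketch below uses the common worst-case bit-stuffing estimate for classic CAN frames, with the 3-bit interframe space included. The function names are hypothetical.

```c
/*
 * Bits on the wire for one classic CAN data frame, including the 3-bit
 * interframe space. ext selects a 29-bit identifier; dlc is 0..8.
 *   bits = g + 8*dlc + 13 (+ floor((g + 8*dlc - 1) / 4) stuff bits)
 * where g = 34 for 11-bit IDs and 54 for 29-bit IDs.
 */
unsigned can_frame_bits(int ext, unsigned dlc, int worst_case)
{
    unsigned g = ext ? 54u : 34u;
    unsigned bits = g + 8u * dlc + 13u;

    if (worst_case)
        bits += (g + 8u * dlc - 1u) / 4u;   /* maximum stuff bits */
    return bits;
}

/* Time on the wire in microseconds at the given bitrate (bit/s). */
double can_frame_time_us(int ext, unsigned dlc, int worst_case, unsigned bitrate)
{
    return can_frame_bits(ext, dlc, worst_case) * 1e6 / bitrate;
}
```

At 250 kbit/s, an 8-byte extended (J1939) frame takes 524 µs without stuffing and up to 640 µs with worst-case stuffing. The bus therefore carries at most roughly 1560 to 1900 such frames per second. Logged frames spaced closer than that have to wait in the queue.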

Did you try to increase the tx queue length to create some extra buffer
to handle these pile-ups?

To increase the available tx queue length in the CAN driver, e.g.:

     echo 100 > /sys/class/net/can0/tx_queue_len

   or with the 'ip' tool from the iproute2 package

     ip link set can0 txqueuelen 100

The default is 10, so you might experiment yourself to find what is
sufficient for your use case.
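To get a starting point for that experiment, you could replay the log's timestamps through a simple single-server queue model. This is a rough sketch. max_tx_backlog is a hypothetical helper, timestamps are in seconds, and frame_time would come from the bitrate (for example, the on-wire time of an 8-byte extended frame). The model ignores arbitration with other nodes, error frames and buffering inside the adapter, so leave headroom above its result.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Frames arrive at the logged timestamps (non-decreasing, in seconds)
 * and leave one at a time, each taking frame_time seconds on the bus.
 * Returns the largest number of frames queued or in flight at any
 * arrival, or 0 for an empty log or on allocation failure.
 */
size_t max_tx_backlog(const double *ts, size_t n, double frame_time)
{
    double *done = malloc(n * sizeof *done);   /* completion time per frame */
    size_t head = 0, worst = 0;

    if (n == 0 || done == NULL) {
        free(done);
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        /* A frame starts when it arrives or when the previous one finishes. */
        double start = (i > 0 && done[i - 1] > ts[i]) ? done[i - 1] : ts[i];
        done[i] = start + frame_time;

        /* Drop frames that have finished by the time frame i arrives. */
        while (head < i && done[head] <= ts[i])
            head++;
        if (i - head + 1 > worst)
            worst = i - head + 1;
    }
    free(done);
    return worst;
}
```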

Regards,
Oliver



* Re: Troubleshooting bus errors, bus turns off during canplayer playback
  2016-12-21 18:08   ` Hans L
  2016-12-22 18:03     ` Oliver Hartkopp
@ 2016-12-23 12:03     ` Tom Evans
  1 sibling, 0 replies; 6+ messages in thread
From: Tom Evans @ 2016-12-23 12:03 UTC (permalink / raw)
  To: Hans L, linux-can

On 22/12/2016 5:08 AM, Hans L wrote:
 > I'm reading some of these links talking about the issue of dealing
 > with multiple programs accessing the socket, but in the case I'm
 > dealing with, there's only one program connecting to the can socket on
 > this device.

That's not the problem. The network layer is set up to assume 1500-byte
Ethernet packets. There are two limits: the number of packets on the
lower, just-above-the-hardware queue, and the number of buffers taken up
by each socket. The default settings mean that programs sending "as fast
as they can", like an FTP file transfer, will block before they hit the
"start dumping buffers" limit. By default, CAN is set up completely the
wrong way for this. You have to increase the hardware queue (whoever
thought "10" was a good number?) and change the socket limit.
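These two limits can be compared with back-of-the-envelope arithmetic. The minimal sketch below uses Tom's approximate figures. which_limit_first is a hypothetical name, and the per-packet cost is an estimate that varies with kernel version and architecture.

```c
/*
 * A fast sender blocks once its socket buffer is full (about
 * sndbuf / per_packet_cost packets), but gets ENOBUFS once the device
 * tx queue (txqueuelen packets) is full. Whichever limit is smaller is
 * hit first; a tie is reported as ENOBUFS_FIRST to stay on the safe side.
 */
enum first_limit { BLOCKS_FIRST, ENOBUFS_FIRST };

enum first_limit which_limit_first(long sndbuf, long per_packet_cost,
                                   long txqueuelen)
{
    long packets_before_block = sndbuf / per_packet_cost;

    return packets_before_block < txqueuelen ? BLOCKS_FIRST : ENOBUFS_FIRST;
}
```

With the Ethernet numbers above, 108544 / 200 gives about 542 packets before blocking. That is well under a 1000-packet queue, so the sender blocks. With the same socket buffer and CAN's default queue of 10, ENOBUFS comes first.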

 > That program is canplayer.  So I would need to patch
 > canplayer with that setsockopt code you are suggesting?

If it doesn't have an option to rate-limit its transmissions, and it is
written assuming the lower layers will properly flow-control it, then
I'd say "yes". There may be a way to change the OS defaults so you don't
have to patch it. It might also make it a better program if you add
explicit, command-line-specified rate limiting to it.
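One way to add such rate limiting is to sleep until absolute deadlines spaced at a fixed interval. The sketch below assumes CLOCK_MONOTONIC and a hypothetical struct pacer. frames_per_second would come from a new command-line option and should stay below what the bus can carry at its bitrate.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical fixed-rate pacer: call pace_wait() before each send. */
struct pacer {
    struct timespec next;   /* absolute CLOCK_MONOTONIC deadline */
    long interval_ns;       /* nanoseconds between frames */
};

/* frames_per_second must be > 0. */
void pace_init(struct pacer *p, long frames_per_second)
{
    clock_gettime(CLOCK_MONOTONIC, &p->next);
    p->interval_ns = 1000000000L / frames_per_second;
}

void pace_wait(struct pacer *p)
{
    /* Advance the deadline by one interval, normalising tv_nsec. */
    p->next.tv_nsec += p->interval_ns;
    while (p->next.tv_nsec >= 1000000000L) {
        p->next.tv_nsec -= 1000000000L;
        p->next.tv_sec++;
    }
    /* Sleep until the absolute deadline; resume if a signal interrupts. */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &p->next, NULL) == EINTR)
        ;
}
```

Sleeping to absolute deadlines (TIMER_ABSTIME) instead of relative intervals keeps the average rate stable, even when individual write() calls take a variable amount of time.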

Tom


