All of lore.kernel.org
 help / color / mirror / Atom feed
* Fwd: Querying current tx_queue usage of a SocketCAN interface
       [not found] <CAE+ymru296P+LjkT7_ONVc2OGMP9mtXW46Nq5aSnm1etauj9Aw@mail.gmail.com>
@ 2015-03-28 20:26 ` Paarvai Naai
  2015-03-29 22:42   ` Tom Evans
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-03-28 20:26 UTC (permalink / raw)
  To: linux-can

Hi,

I have been looking into how to query the SocketCAN interface for its
current tx_queue usage.  I found the following posts touching on this:
http://comments.gmane.org/gmane.network.socketcan.user/1255
http://socket-can.996257.n3.nabble.com/queuesize-manipulation-td310.html

It appears that there is no easy way to find out how much of the
interface's tx_queue has been filled up, but these posts are
relatively old.  Have there been any improvements in this regard
in SocketCAN?

Also, the tx_queue appears to be shared between clients of the
SocketCAN interface.  Even if one process checked the fill state of
the tx_queue, another process could consume the remaining space in the
tx_queue before the first process submits its message for
transmission.  This means that clients have to implement their own TX
queues and use poll() (either in a thread or in the application's main
loop) to occasionally service their private TX queues when the
SocketCAN interface is able to accept new messages.

I saw the following article about using qdisc with SocketCAN but
haven't looked at it in detail yet:
http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
Has anyone else used qdisc to get around the problem I mention?

Thanks in advance!
Paarvai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-03-28 20:26 ` Fwd: Querying current tx_queue usage of a SocketCAN interface Paarvai Naai
@ 2015-03-29 22:42   ` Tom Evans
  2015-03-30 21:55     ` Paarvai Naai
  0 siblings, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-03-29 22:42 UTC (permalink / raw)
  To: Paarvai Naai, linux-can

On 29/03/15 07:26, Paarvai Naai wrote:
> Hi,
>
> I have been looking into how to query the SocketCAN interface for its
> current tx_queue usage.

Just letting you know of a (what was for me) unexpected behaviour of the queue.

Sockets can block or return ENOBUFS. Ethernet blocks before it returns ENOBUFS 
like you'd expect. With CAN it does the opposite "out of the box" and needs to 
be fixed.

http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html

     With Ethernet, the transmit queue length is 1000 (which would
     return ENOBUFS) but before that happens it hits SO_SNDBUF,
     which may be 108544, which is the total Data plus SKB, and
     with an SKB size of about 200 that means it blocks at about
     500 before it ENOBUFs at 1000.

     With CAN, it would block at 500, but it ENOBUFs at 10 first with
     the default queue depth!

I do the following to get a 256-deep queue that blocks before it overflows:

     /bin/echo 256 > /sys/class/net/can0/tx_queue_len
     /bin/echo 256 > /sys/class/net/can1/tx_queue_len

     int sndbuf = (250 + 8) * 256;
     /* Minimum socket buffer to try and get it blocking */
     rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
                     &sndbuf, sizeof(sndbuf));
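One thing worth knowing when sizing this: the kernel doubles the SO_SNDBUF value you request (to cover per-skb bookkeeping) and clamps it against net.core.wmem_max, so it pays to read the effective value back rather than trust the number you set. A rough helper, usable on any socket type, not just CAN:

```c
#include <sys/socket.h>

/* Request an SO_SNDBUF of `request` bytes on socket `s` and return the
 * value the kernel actually applied. Linux doubles the request to cover
 * per-skb bookkeeping and clamps it to net.core.wmem_max, so expect
 * roughly 2 * request back. Returns -1 on error. */
static int set_and_read_sndbuf(int s, int request)
{
    int val = request;
    socklen_t len = sizeof(val);

    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &val, sizeof(val)) < 0)
        return -1;
    if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &val, &len) < 0)
        return -1;
    return val;
}
```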

You might also like to read:

http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

     SocketCAN and queueing disciplines:
     Final Report
     M. Sojka, R. Lisový, P. Píša
     Czech Technical University in Prague
     July 20, 2012
     Version 1.2

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-03-29 22:42   ` Tom Evans
@ 2015-03-30 21:55     ` Paarvai Naai
       [not found]       ` <5519E5A9.7080104@optusnet.com.au>
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-03-30 21:55 UTC (permalink / raw)
  To: tom_usenet; +Cc: linux-can

Hi Tom,

Thanks for that pointer regarding ENOBUFS.  I will look into it for my
application.

Regarding the qdisc reference, that does look like an interesting
whitepaper -- I mentioned it in my original post in this thread as
well.

If anyone has pointers regarding querying the number of available
buffers on the TX side, that would be much appreciated.
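The closest thing I have found so far is the qdisc backlog counter, which at least shows how much is queued ahead of the driver (only an approximation, and still racy in the way described above):

```shell
# Show the root qdisc on can0 with statistics; the "backlog" line
# reports the bytes/packets currently queued for transmission.
tc -s qdisc show dev can0

# The same race applies as with any polled value: another process can
# consume the remaining space between reading this and calling write().
```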

Best regards,
Paarvai


On Sun, Mar 29, 2015 at 3:42 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
> On 29/03/15 07:26, Paarvai Naai wrote:
>>
>> Hi,
>>
>> I have been looking into how to query the SocketCAN interface for its
>> current tx_queue usage.
>
>
> Just letting you know of a (what was for me) unexpected behaviour of the
> queue.
>
> Sockets can block or return ENOBUFS. Ethernet blocks before it returns
> ENOBUFS like you'd expect. With CAN it does the opposite "out of the box"
> and needs to be fixed.
>
> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
>
>     With Ethernet, the transmit queue length is 1000 (which would
>     return ENOBUFS) but before that happens it hits SO_SNDBUF,
>     which may be 108544, which is the total Data plus SKB, and
>     with an SKB size of about 200 that means it blocks at about
>     500 before it ENOBUFs at 1000.
>
>     With CAN, it would block at 500, but it ENOBUFs at 10 first with
>     the default queue depth!
>
> I do the following to get a 256-deep queue that blocks before it overflows:
>
>     /bin/echo 256 > /sys/class/net/can0/tx_queue_len
>     /bin/echo 256 > /sys/class/net/can1/tx_queue_len
>
>     int sndbuf = (250 + 8) * 256;
>     /* Minimum socket buffer to try and get it blocking */
>     rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
>                     &sndbuf, sizeof(sndbuf));
>
> You might also like to read:
>
> http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf
>
>     SocketCAN and queueing disciplines:
>     Final Report
>     M. Sojka, R. Lisový, P. Píša
>     Czech Technical University in Prague
>     July 20, 2012
>     Version 1.2
>
> Tom
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
       [not found]       ` <5519E5A9.7080104@optusnet.com.au>
@ 2015-03-31  0:26         ` Paarvai Naai
  2015-03-31  3:09           ` Tom Evans
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-03-31  0:26 UTC (permalink / raw)
  To: tom_usenet; +Cc: linux-can

Hi Tom,

Right, my understanding is that if there is only one client for the
SocketCAN interface, a simple wrapping of the SocketCAN tx_queue with
a qdisc might(?) work.  I need to understand the complexities of qdisc
first, though.

If there are multiple clients, there is indeed the priority inversion
problem you mention.  Even if one has a single process designated to
read/write to the interface, managing priority is not foolproof.  A
low priority message could get submitted to the underlying SocketCAN
interface and be unable to go out onto the bus because of other
third-party nodes on the bus sending higher priority traffic at high
frequency.  This would block further messages from going out on the
SocketCAN interfaces, even though they have higher priority, no?

Paarvai


> The whitepaper is "promising". All you have to do is to implement it in your
> drivers to gain the functionality. I don't expect anyone else will any time
> soon.
>
> Otherwise, this looks "promising", but I can't get it to show anything other
> than zero:
>
>   # cat /sys/class/net/can0/queues/tx-0/byte_queue_limits/inflight
>   0
>
> Maybe I need to configure something else in the kernel or make some other
> configuration change to use it. I don't need this, but as you do I'd suggest
> you look at it.
>
> The code that allows access to "inflight" is in:
>
>     net/core/net-sysfs.c
>
> The definition of "inflight" in that file is:
>
> static ssize_t bql_show_inflight(struct netdev_queue *queue,
>                  struct netdev_queue_attribute *attr,
>                  char *buf)
> {
>     struct dql *dql = &queue->dql;
>
>     return sprintf(buf, "%u\n", dql->num_queued - dql->num_completed);
> }
>
> If you dig around enough through the Linux source code you may be able to
> get the queueing to work and be able to use that sysfs variable. The code is
> in lib/dynamic_queue_limits.c. It seems to be used by
> include/linux/netdevice.h.
>
> That still isn't going to solve your real problem, which (I'm assuming) is
> to try and get multiple senders through the CAN interface to "play nice"
> with each other in an attempt to establish normal ID-based transmission
> priority. That's what the whitepaper is all about.
>
> If you can read the queue depth I'm guessing all of your programs will have
> to queue internally and "promise" not to fill the queue if a higher priority
> ID message (from another thread/process) might need the slot. That's quite a
> change to the interface. If you're going to do that I'd suggest instead that
> you designate one process (or thread as appropriate) that is the only one
> allowed to open the CAN driver for transmission. It implements multiple
> internal priority queues. Have it provide a socket interface for the other
> programs that wish to send and have them all transmit via that process.
>
> Tom
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-03-31  0:26         ` Paarvai Naai
@ 2015-03-31  3:09           ` Tom Evans
  2015-04-01 20:33             ` Paarvai Naai
  0 siblings, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-03-31  3:09 UTC (permalink / raw)
  To: Paarvai Naai; +Cc: linux-can

On 31/03/15 11:26, Paarvai Naai wrote:
> Hi Tom,
 > ...
> If there are multiple clients, there is indeed the priority inversion
> problem you mention.  Even if one has a single process designated to
> read/write to the interface, managing priority is not foolproof.  A
> low priority message could get submitted to the underlying SocketCAN
> interface and be unable to go out onto the bus because of other
> third-party nodes on the bus sending higher priority traffic at high
> frequency.  This would block further messages from going out on the
> SocketCAN interfaces, even though they have higher priority, no?

Of course. That's the working definition of "PRIORITY".

If that happens the system is behaving exactly as it has been designed to do, 
both by the hardware and software vendors, and by YOU, the default "System 
Architect".

If that isn't doing what you want it to do, reassign the IDs so the "Higher 
Priority" or time sensitive messages have a higher priority, and the "high 
frequency" traffic has a sensibly lower priority.

CAN was never meant to be used in an "Open System" where different and 
unrelated devices can be plugged onto a common bus. It is meant to be used in a 
single car where the whole system has been designed, specified (including 
assignment of all IDs with appropriate priorities), and rigorously tested to 
ensure the situation you're describing can't happen.

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-03-31  3:09           ` Tom Evans
@ 2015-04-01 20:33             ` Paarvai Naai
  2015-04-01 20:57               ` Dan Egnor
  2015-04-01 23:21               ` Tom Evans
  0 siblings, 2 replies; 29+ messages in thread
From: Paarvai Naai @ 2015-04-01 20:33 UTC (permalink / raw)
  To: tom_usenet; +Cc: linux-can

Hi Tom,

I'm not sure if you quite followed my concern with multiple clients to
the SocketCAN interface.  All of the clients of a particular SocketCAN
interface (i.e., "can0") are piggybacking on the same electrical node
of the CAN bus -- this is their shared pipe to get their frames out on
the bus.  Let me try to do a better job of explaining:

1) Lets call presuppose a node in our network, node "A", that runs
Linux and has uses SocketCAN.
2) Client #1 of node A's SocketCAN-based interface submits a frame to
its socket with ID=0x7ff.
3) Other non-Linux-based nodes on the bus, say nodes "B", "C", and "D"
are spewing higher priority messages out on the bus.
4) As such, node A is unable to send the frame with ID=0x7ff for a while.
5) Now, some small amount of time later, Client #2 of the
SocketCAN-based interface to node "A" submits a frame to its socket
with ID=0x000.  Client #2 might be another thread or process running
on the Linux OS of node A.
6) On the next arbitration loss of ID=0x7ff, node A should ideally
reprioritize its TX queue such that ID=0x000 has a chance to get out.

I have a suspicion that SocketCAN implementations are not careful
about doing this, even though particular controllers may support this
type of re-prioritization of frames (with their own internal TX
queues).

Does this make sense?  Do you have any thoughts?

Thanks,
Paarvai


On Mon, Mar 30, 2015 at 8:09 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
> On 31/03/15 11:26, Paarvai Naai wrote:
>>
>> Hi Tom,
>
>> ...
>>
>> If there are multiple clients, there is indeed the priority inversion
>> problem you mention.  Even if one has a single process designated to
>> read/write to the interface, managing priority is not foolproof.  A
>> low priority message could get submitted to the underlying SocketCAN
>> interface and be unable to go out onto the bus because of other
>> third-party nodes on the bus sending higher priority traffic at high
>> frequency.  This would block further messages from going out on the
>> SocketCAN interfaces, even though they have higher priority, no?
>
>
> Of course. That's the working definition of "PRIORITY".
>
> If that happens the system is behaving exactly as it has been designed to
> do, both by the hardware and software vendors, and by YOU, the default
> "System Architect".
>
> If that isn't doing what you want it to do, reassign the IDs so the "Higher
> Priority" or time sensitive messages have a higher priority, and the "high
> frequency" traffic has a sensibly lower priority.
>
> CAN was never meant to be used in an "Open System" where different and
> unrelated devices can be plugged onto a common bus. It is meant to be used in
> a single car where the whole system has been designed, specified (including
> assignment of all IDs with appropriate priorities), and rigorously tested to
> ensure the situation you're describing can't happen.
>
> Tom
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-01 20:33             ` Paarvai Naai
@ 2015-04-01 20:57               ` Dan Egnor
  2015-04-02  2:20                 ` Tom Evans
  2015-04-01 23:21               ` Tom Evans
  1 sibling, 1 reply; 29+ messages in thread
From: Dan Egnor @ 2015-04-01 20:57 UTC (permalink / raw)
  To: linux-can

Paarvai Naai <opensource3141 <at> gmail.com> writes:
> I have a suspicion that SocketCAN implementations are not careful
> about doing this, even though particular controllers may support this
> type of re-prioritization of frames (with their own internal TX
> queues).

Actually I think this is a problem even if you have one process -- a higher 
priority message can get stuck behind a lower priority message (priority 
inversion).

Is there any way to deal with this, assuming that one node wants to send both 
high priority and low priority messages? There must be a solution, otherwise 
half the point of CAN (prioritized messages) would be lost...!

-- egnor


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-01 20:33             ` Paarvai Naai
  2015-04-01 20:57               ` Dan Egnor
@ 2015-04-01 23:21               ` Tom Evans
  2015-04-02  0:33                 ` Dan Egnor
  1 sibling, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-04-01 23:21 UTC (permalink / raw)
  To: Paarvai Naai; +Cc: linux-can

On 02/04/15 07:33, Paarvai Naai wrote:
> Hi Tom,
>
> I'm not sure if you quite followed my concern with multiple clients to
> the SocketCAN interface.  All of the clients of a particular SocketCAN
> interface (i.e., "can0") are piggybacking on the same electrical node
> of the CAN bus -- this is their shared pipe to get their frames out on
> the bus.  Let me try to do a better job of explaining:
>
> 1) Let's presuppose a node in our network, node "A", that runs
> Linux and uses SocketCAN.
> 2) Client #1 of node A's SocketCAN-based interface submits a frame to
> its socket with ID=0x7ff.
> 3) Other non-Linux-based nodes on the bus, say nodes "B", "C", and "D"
> are spewing higher priority messages out on the bus.
> 4) As such, node A is unable to send the frame with ID=0x7ff for a while.
> 5) Now, some small amount of time later, Client #2 of the
> SocketCAN-based interface to node "A" submits a frame to its socket
> with ID=0x000.  Client #2 might be another thread or process running
> on the Linux OS of node A.
> 6) On the next arbitration loss of ID=0x7ff, node A should ideally
> reprioritize its TX queue such that ID=0x000 has a chance to get out.
>
> I have a suspicion that SocketCAN implementations are not careful
> about doing this, even though particular controllers may support this
> type of re-prioritization of frames (with their own internal TX
> queues).
>
> Does this make sense?  Do you have any thoughts?

You've given a very good and clear description of this problem.

The same thing can happen with a "deeply embedded" device. It depends on how 
many hardware transmit buffers there are, and how these are used by the 
controlling software.

There are two basic classes of CAN hardware. One has a lot of separate 
transmit and receive buffers, and is expected to be set up with a single ID 
per buffer, inbound and outbound. This can be so "bare metal" that the buffer 
memory in the CAN controller is used as the ONLY place where the "variables" 
in the CAN messages are stored. The software reads and writes these variables 
in the buffers. The other sort of CAN controller implements FIFOs and better 
supports "streams" of packets. Many controllers are a bit of both.

I'm familiar with a few different CAN controllers.

The early/original Philips PCA82C200 and the compatible NXP SJA1000. They only 
have ONE hardware transmit buffer (even with the SJA1000 in PeliCAN mode), so 
this hardware suffers from the exact same problem you mention above. It can 
only handle priority properly if it is sending a single ID.

The Microchip MCP2515 has 3 transmit buffers, so can automatically prioritise 
three different IDs in hardware if the software supports that. But a maximum 
of three.

The Microchip ECAN controllers support the MCP2515 mode as well as a mode with 
(up to) 8 hardware transmit buffers.

Freescale's MSCAN has three transmit buffers.

Freescale's FlexCAN controller has 16 buffers that can be split between 
transmit and receive.

As the "System Designer", if your system (your car) uses SJA1000 controllers, 
or other ones, but with software that doesn't support the multiple hardware 
buffers, or has more IDs in use than the number of buffers, then you have to 
design the *SYSTEM* so it works within those constraints.

 > say nodes "B", "C", and "D" are spewing higher priority
 > messages out on the bus.

Then as part of the "system design" they're not allowed to do that. They have 
to rate-limit their transmissions to allow the other devices to meet the 
system requirements for latency, jitter or whatever.

TTCAN was invented in 2002 to try and solve this (and other) problems, but it 
looks to have been the "Betamax" of CAN protocols. Details here if you're 
interested:

http://www.microchip.com/forums/m849480.aspx

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-01 23:21               ` Tom Evans
@ 2015-04-02  0:33                 ` Dan Egnor
  2015-04-02  2:20                   ` Tom Evans
  2015-04-02  6:46                   ` Marc Kleine-Budde
  0 siblings, 2 replies; 29+ messages in thread
From: Dan Egnor @ 2015-04-02  0:33 UTC (permalink / raw)
  To: linux-can

Tom Evans <tom_usenet <at> optusnet.com.au> writes:
> As the "System Designer", if your system (your car) uses SJA1000 controllers,
> or other ones, but with software that doesn't support the multiple hardware
> buffers, or has more IDs in use than the number of buffers, then you have to
> design the *SYSTEM* so it works within those constraints.

That's all well and good, but what you're saying is that Linux SocketCAN is 
incapable of supporting anything but the most absolutely basic single-mailbox 
operation.

That seems surprisingly awful for something that's an established generic 
interface for all CAN hardware on this operating system! No matter how many 
mailboxes the controller might support, the operating system simply does not 
support prioritized transmission.

This means that in our system architecture, Linux nodes must be treated as 
substantially less capable than most embedded controllers. (The TI MCUs we're 
using have 64 mailboxes, for example.)

This feels like a massive functional deficit that's acknowledged nowhere in 
the documentation. Is this really the case??

-- egnor


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-02  0:33                 ` Dan Egnor
@ 2015-04-02  2:20                   ` Tom Evans
  2015-04-02  6:28                     ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Marc Kleine-Budde
  2015-04-02 18:23                     ` Fwd: Querying current tx_queue usage of a SocketCAN interface Paarvai Naai
  2015-04-02  6:46                   ` Marc Kleine-Budde
  1 sibling, 2 replies; 29+ messages in thread
From: Tom Evans @ 2015-04-02  2:20 UTC (permalink / raw)
  To: Dan Egnor, linux-can

On 02/04/15 11:33, Dan Egnor wrote:
> Tom Evans <tom_usenet <at> optusnet.com.au> writes:
>> As the "System Designer", if your system (your car) uses SJA1000
>> controllers, or other ones, but with software that doesn't support
 >> the multiple hardware buffers, or has more IDs in use than the
 >> number of buffers, then you have to design the *SYSTEM* so it
 >> works within those constraints.
>
> That's all well and good, but what you're saying is that Linux SocketCAN is
> incapable of supporting anything but the most absolutely basic single-mailbox
> operation.

That's a restriction of the socket interface. It gives you a lot "for free" in 
all the demultiplexing of receive data and the transmit FIFOs, but at a cost.

If you read this paper you'll see that you can use QDISC to implement priority 
queues. That fixes MOST of the problems, but not all, as the priority queues 
don't connect through the driver to individual hardware transmit buffers. So 
there's always going to be a "single pipe" somewhere in the system where 
you'll get the priority inversion.

http://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

If you wanted a "perfect" solution you'd probably need a separate CAN driver 
instance for each hardware buffer.

There's a lot of manual work and overhead setting up those priority queues. 
There's nothing "automatic" about it.
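To give a flavour, the sort of setup the paper describes looks roughly like this (an untested sketch on my part; the band count, priomap and filter are assumptions, and you should check how the u32 offsets line up with struct can_frame on your platform):

```shell
# Replace the default FIFO on can0 with a 3-band priority qdisc.
# The priomap of all 1s sends unclassified traffic to the middle band.
tc qdisc add dev can0 root handle 1: prio bands 3 \
    priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

# Steer frames with CAN ID 0x000 (exact match on the whole 32-bit
# can_id word, so no EFF/RTR flags either) into the highest band.
# An exact all-zero match avoids the host-byte-order pitfalls that
# partial can_id masks run into.
tc filter add dev can0 parent 1: prio 1 u32 \
    match u32 0x00000000 0xffffffff at 0 flowid 1:1
```

Even with this, the frames from all bands still merge into the driver's single TX path, which is the residual "single pipe" mentioned above.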

> That seems surprisingly awful for something that's an established generic
> interface for all CAN hardware on this operating system! No matter how many
> mailboxes the controller might support, the operating system simply does not
> support prioritized transmission.

Generic and simple versus custom and complicated. Give me simple any day.

> This means that in our system architecture, Linux nodes must be treated as
> substantially less capable than most embedded controllers. (The TI MCUs we're
> using have 64 mailboxes, for example.)

Linux nodes are by definition a lot less capable. They're not "real time" for 
a start, unless you want to use some "Real Time Linux" (Xenomai etc.). But 
those need the CAN drivers rewritten to support them (Xenomai, at least).

I had to seriously rewrite the FlexCAN drivers for our i.MX53 because the 
Ethernet driver blocked the CAN driver for so long the CAN hardware receive 
FIFO overflowed. Linux doesn't even support setting different interrupt 
priorities on ARM (but my problem was with NAPI).

Even if you're running SMP and multicore with ARM, if the framebuffer driver 
does a cache flush it can lock all cores out of L2 for a millisecond; that's 
about 10 CAN messages at 1 Mbit/s:

http://www.arm.com/files/pdf/cachecoherencywhitepaper_6june2011.pdf

> This feels like a massive functional deficit that's acknowledged
 > nowhere in the documentation. Is this really the case??

And everything else in Linux is "perfect"?

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-01 20:57               ` Dan Egnor
@ 2015-04-02  2:20                 ` Tom Evans
  2015-04-02  2:33                   ` Daniel Egnor
  0 siblings, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-04-02  2:20 UTC (permalink / raw)
  To: Dan Egnor, linux-can

On 02/04/15 07:57, Dan Egnor wrote:
> Paarvai Naai <opensource3141 <at> gmail.com> writes:
>> I have a suspicion that SocketCAN implementations are not careful
>> about doing this, even though particular controllers may support this
>> type of re-prioritization of frames (with their own internal TX
>> queues).
>
> Actually I think this is a problem even if you have one process -- a higher
> priority message can get stuck behind a lower priority message (priority
> inversion).

If there's a "single pipe" anywhere - a queue in the application, the OS, 
the driver or the hardware - then you get this problem.

> Is there any way to deal with this, assuming that one node wants to send both
> high priority and low priority messages?

You need separate pipes all the way down to the CAN bus, through separate 
hardware "transmit buffers" that compete for the wire based on the IDs. You 
can probably get 90% of this by using the QDISC system though (in my other email).

> There must be a solution, otherwise
> half the point of CAN (prioritized messages) would be lost...!

That's right, it is lost. Unlike a more "generic" protocol like Ethernet, 
there's no requirement for every implementation to support every operating 
mode. There's no "testing authority" other than what vehicle manufacturers 
require suppliers to meet.

Every hardware and software implementation is a "good enough subset for some 
purpose", but can't be expected to meet every requirement.

In most cases it doesn't matter. As long as all devices have controlled and 
limited transmit rates and don't saturate the bus it all works well enough.

If your system requires the priority system to work for some reason, then YOU 
(as the system designer) have to be across every piece of hardware and 
software in the system to prove that it does work. That means binning all 
devices that use SJA1000s and replacing them with something better. That means 
replacing all devices running Linux with something else, or at least replacing 
the software and the drivers to work the way you need them to, probably 
bypassing socketcan and writing your own custom driver interface using ioctls.

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-02  2:20                 ` Tom Evans
@ 2015-04-02  2:33                   ` Daniel Egnor
  0 siblings, 0 replies; 29+ messages in thread
From: Daniel Egnor @ 2015-04-02  2:33 UTC (permalink / raw)
  To: tom_usenet; +Cc: linux-can

On Wed, Apr 1, 2015 at 7:20 PM, Tom Evans <tom_usenet@optusnet.com.au> wrote:
> If your system requires the priority system to work for some reason, ...

Yeah, when you get to 80% nominal utilization on some of your buses
with occasional low priority "background" transfers pushing it to
100%, you really start to care about prioritization :-/.

We are indeed fighting the battle on many fronts to make sure that all
the nodes either send at a single priority, or don't mind lots of
delay, or have good CAN transmit buffering. It's just something of a
setback to realize that the Linux nodes have to be put in one of the
first two buckets.

Or maybe QDISC will save us -- we are indeed looking into that.

-- egnor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface)
  2015-04-02  2:20                   ` Tom Evans
@ 2015-04-02  6:28                     ` Marc Kleine-Budde
  2015-04-02 11:35                       ` Tom Evans
  2015-04-02 18:23                     ` Fwd: Querying current tx_queue usage of a SocketCAN interface Paarvai Naai
  1 sibling, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2015-04-02  6:28 UTC (permalink / raw)
  To: tom_usenet, Dan Egnor, linux-can

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On 04/02/2015 04:20 AM, Tom Evans wrote:
> I had to seriously rewrite the FlexCAN drivers for our i.MX53 because the 
> Ethernet driver blocked the CAN driver for so long the CAN hardware receive 
> FIFO overflowed. Linux doesn't even support setting different interrupt 
> priorities on ARM (but my problem was with NAPI).

Can you share your changes? I'll have some days to hack on a better
generic RX-FIFO implementation for the at91 and flexcan.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-02  0:33                 ` Dan Egnor
  2015-04-02  2:20                   ` Tom Evans
@ 2015-04-02  6:46                   ` Marc Kleine-Budde
  2015-04-02 18:28                     ` Paarvai Naai
  1 sibling, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2015-04-02  6:46 UTC (permalink / raw)
  To: Dan Egnor, linux-can

[-- Attachment #1: Type: text/plain, Size: 1954 bytes --]

On 04/02/2015 02:33 AM, Dan Egnor wrote:
> That seems surprisingly awful for something that's an established generic 
> interface for all CAN hardware on this operating system! No matter how many 
> mailboxes the controller might support, the operating system simply does not 
> support prioritized transmission.

You have access to all queueing disciplines in Linux to prioritize your CAN
frames until they are sent via the CAN driver into the hardware. This is
currently done in all drivers via a single TX queue. In the default
configuration (without a special queueing discipline), you have multiple
FIFOs from userspace (the sockets) that are combined into a single
FIFO (queueing disc + TX routine in driver + hardware).

However, if you need more control, first make use of a queueing disc.
Then, if you still suffer from priority inversion, we can have a look at
your hardware. For now, all drivers, even those with multiple TX
buffers, configure them into one large TX-FIFO. We can define and
implement a new per-interface option (or options, if needed) that
switches the hardware between FIFO and priority-based sending. This
is still a problem, however, if all your HW buffers are filled with
low-prio CAN frames before a high-prio one has the chance to be transferred
into the hardware. This leads us to the implementation of multiple TX
queues from the Linux stack into the driver. This is mainly used on big
server Ethernet cards, so that more than one processor can send packets.
We have to check whether a) we can attach a queueing disc to a TX queue and
b) these multiple TX queues are properly used on a single-processor
system.
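As an illustration, attaching an explicit 3-band prio queueing discipline
to a CAN interface looks like this (a sketch only; the device name is an
assumption and the commands need root):

```shell
# Replace the default qdisc on can0 with a 3-band prio qdisc.
# With no filters attached, the band is chosen from skb->priority
# (via the qdisc's priomap), which a sender can set per socket
# with setsockopt(SO_PRIORITY).
tc qdisc add dev can0 root handle 1: prio bands 3

# Show per-band statistics (backlog, drops) to see the effect.
tc -s qdisc show dev can0
```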

regards,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface)
  2015-04-02  6:28                     ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Marc Kleine-Budde
@ 2015-04-02 11:35                       ` Tom Evans
  2015-04-02 12:07                         ` Flexcan Marc Kleine-Budde
  2015-04-04  3:32                         ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Tom Evans
  0 siblings, 2 replies; 29+ messages in thread
From: Tom Evans @ 2015-04-02 11:35 UTC (permalink / raw)
  To: Marc Kleine-Budde, dan.egnor, linux-can

[-- Attachment #1: Type: text/plain, Size: 8279 bytes --]

On 2/04/2015 5:28 PM, Marc Kleine-Budde wrote:
> On 04/02/2015 04:20 AM, Tom Evans wrote:
>> I had to seriously rewrite the FlexCAN drivers for our i.MX53 because the
>> Ethernet driver blocked the CAN driver for so long the CAN hardware receive
>> FIFO overflowed. Linux doesn't even support setting different interrupt
>> priorities on ARM (but my problem was with NAPI).
>
> Can you share your changes? I'll have some days to hack on a better
> generic RX-FIFO implementation for the at91 and flexcan.

Sure. "git diff -p" based on the 3.4 kernel sources.

Working from home over Easter using git under cygwin under Windows via 
Thunderbird, so the CRLFs and line breaks could be messed up, sorry. 
Patches inline and attached. If this lot is useless, tell me what to 
type to do it properly and I'll resend on Tuesday.

flexcan.c.patch.buffer

The problem with the FlexCan driver is that it takes the interrupt, then 
throws to NAPI to actually read the hardware buffers. How messed up is 
that? It isn't as if reading 16 bytes or so is going to take THAT long, 
and as there are only six message buffers it overflows in 600us at 1MHz. 
In the same kernel, the Ethernet driver DOESN'T throw to NAPI but 
handles as many 1500-byte Ethernet frames as are coming in, all in the 
one ISR. Other things can delay NAPI as well. So I fixed it with a 
sledgehammer: it reads the messages during the ISR into an internal ring 
buffer (sized to 100, later to 128).

flexcan.c.patch.flood

A disconnected CAN bus generates an interrupt storm. This patch fixes it 
the same way it is apparently done in some 3.18 drivers. This may 
already be in there.

flexcan.c.patch.buffer

---------------------------------------------------

diff --git a/arm-linux/drivers/net/can/flexcan.c b/arm-linux/drivers/net/can/flexcan.c
old mode 100644
new mode 100755
index 14c8a86..96e0b55
--- a/arm-linux/drivers/net/can/flexcan.c
+++ b/arm-linux/drivers/net/can/flexcan.c
@@ -43,6 +43,9 @@
  /* 8 for RX fifo and 2 error handling */
  #define FLEXCAN_NAPI_WEIGHT		(8 + 2)

+/* Buffers in the interrupt-read ring, 10ms@100us per. */
+#define FLEXCAN_READ_RING_SIZE		(100)
+
  /* FLEXCAN module configuration register (CANMCR) bits */
  #define FLEXCAN_MCR_MDIS		BIT(31)
  #define FLEXCAN_MCR_FRZ			BIT(30)
@@ -170,6 +173,14 @@ struct flexcan_regs {
  	struct flexcan_mb cantxfg[64];
  };

+struct flexcan_ring {
+	u32 input;		/* Ring input index */
+	u32 output;		/* Ring output index */
+	u32 full;		/* ring full/wrapped flag */
+	spinlock_t ring_lock;
+	struct can_frame frames[FLEXCAN_READ_RING_SIZE];
+};
+
  struct flexcan_priv {
  	struct can_priv can;
  	struct net_device *dev;
@@ -182,6 +193,7 @@ struct flexcan_priv {
  	struct clk *clk;
  	struct flexcan_platform_data *pdata;
  	struct gpio_sw *xcvr_switch;
+	struct flexcan_ring read_ring;
  };

  static struct can_bittiming_const flexcan_bittiming_const = {
@@ -532,12 +544,70 @@ static int flexcan_read_frame(struct net_device *dev)
  	return 1;
  }

+static int flexcan_read_frame_to_ring(struct net_device *dev)
+{
+	struct net_device_stats *stats = &dev->stats;
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct can_frame *cf;
+
+	cf = &priv->read_ring.frames[priv->read_ring.input];
+
+	/* Read even if ring full to clear the interrupt request */
+	flexcan_read_fifo(dev, cf);
+
+	if (priv->read_ring.full) {
+		stats->rx_dropped++;
+		return 0;
+	}
+	priv->read_ring.input += 1;
+	if (priv->read_ring.input >= FLEXCAN_READ_RING_SIZE) {
+		priv->read_ring.input  = 0;
+	}
+	if (priv->read_ring.input == priv->read_ring.output) {
+		priv->read_ring.full = 1;
+	}
+
+	stats->rx_packets++;
+	stats->rx_bytes += cf->can_dlc;
+
+	return 1;
+}
+
+static int flexcan_read_frame_from_ring(struct net_device *dev)
+{
+	struct net_device_stats *stats = &dev->stats;
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct can_frame *cf;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	skb = alloc_can_skb(dev, &cf);
+	if (unlikely(!skb)) {
+		stats->rx_dropped++;
+		return 1;	/* was 0, but that's an infinite loop bug */
+	}
+	/* Assumes that the ring-empty test already done */
+	memcpy(cf, &priv->read_ring.frames[priv->read_ring.output], sizeof(*cf));
+
+	spin_lock_irqsave(&priv->read_ring.ring_lock, flags);
+	priv->read_ring.full = 0;
+	priv->read_ring.output += 1;
+	if (priv->read_ring.output >= FLEXCAN_READ_RING_SIZE) {
+		priv->read_ring.output = 0;
+	}
+	spin_unlock_irqrestore(&priv->read_ring.ring_lock, flags);
+
+	netif_receive_skb(skb);
+
+	return 1;
+}
+
  static int flexcan_poll(struct napi_struct *napi, int quota)
  {
  	struct net_device *dev = napi->dev;
-	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_priv *priv = netdev_priv(dev);
  	struct flexcan_regs __iomem *regs = priv->base;
-	u32 reg_iflag1, reg_esr;
+	u32 reg_esr;
  	int work_done = 0;

  	/*
@@ -545,16 +615,16 @@ static int flexcan_poll(struct napi_struct *napi, int quota)
  	 * use saved value from irq handler.
  	 */
  	reg_esr = flexcan_read(&regs->esr) | priv->reg_esr;
+	priv->reg_esr = 0;

  	/* handle state changes */
  	work_done += flexcan_poll_state(dev, reg_esr);

-	/* handle RX-FIFO */
-	reg_iflag1 = flexcan_read(&regs->iflag1);
-	while (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE &&
+	/* handle RX-RING */
+	while (((priv->read_ring.full) ||
+		(priv->read_ring.input != priv->read_ring.output)) &&
  	       work_done < quota) {
-		work_done += flexcan_read_frame(dev);
-		reg_iflag1 = flexcan_read(&regs->iflag1);
+		work_done += flexcan_read_frame_from_ring(dev);
  	}

  	/* report bus errors */
@@ -563,9 +633,6 @@ static int flexcan_poll(struct napi_struct *napi, int quota)

  	if (work_done < quota) {
  		napi_complete(napi);
-		/* enable IRQs */
-		flexcan_write(FLEXCAN_IFLAG_DEFAULT, &regs->imask1);
-		flexcan_write(priv->reg_ctrl_default, &regs->ctrl);
  	}

  	return work_done;
@@ -598,11 +665,14 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
  		 * The error bits are cleared on read,
  		 * save them for later use.
  		 */
-		priv->reg_esr = reg_esr & FLEXCAN_ESR_ERR_BUS;
-		flexcan_write(FLEXCAN_IFLAG_DEFAULT &
-			~FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, &regs->imask1);
-		flexcan_write(priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_ALL,
-		       &regs->ctrl);
+		priv->reg_esr |= reg_esr & FLEXCAN_ESR_ERR_BUS;
+		/*
+		 * Read the FIFO into the rings.
+		 */
+		while (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) {
+			flexcan_read_frame_to_ring(dev);
+			reg_iflag1 = flexcan_read(&regs->iflag1);
+		}
  		napi_schedule(&priv->napi);
  	}

@@ -1012,6 +1082,7 @@ static int __devinit flexcan_probe(struct platform_device *pdev)
  	priv->dev = dev;
  	priv->clk = clk;
  	priv->pdata = pdev->dev.platform_data;
+	spin_lock_init(&priv->read_ring.ring_lock);

  	netif_napi_add(dev, &priv->napi, flexcan_poll, FLEXCAN_NAPI_WEIGHT);

---------------------------------------------------

flexcan.c.patch.flood

---------------------------------------------------

diff --git a/arm-linux/drivers/net/can/flexcan.c b/arm-linux/drivers/net/can/flexcan.c
index 2b25548..f773cb6 100644
--- a/arm-linux/drivers/net/can/flexcan.c
+++ b/arm-linux/drivers/net/can/flexcan.c
@@ -857,11 +857,26 @@ static int flexcan_chip_start(struct net_device *dev)
  	 * _note_: we enable the "error interrupt"
  	 * (FLEXCAN_CTRL_ERR_MSK), too. Otherwise we don't get any
  	 * warning or bus passive interrupts.
+	 *
+	 * __note__: The above is rubbish and causes interrupt
+	 * flooding. Change it to the 3.18 way without obeying
+	 * CAN_CTRLMODE_BERR_REPORTING and without enabling the
+	 * Error interrupts.
  	 */
  	reg_ctrl = flexcan_read(&regs->ctrl);
  	reg_ctrl &= ~FLEXCAN_CTRL_TSYN;
  	reg_ctrl |= FLEXCAN_CTRL_BOFF_REC | FLEXCAN_CTRL_LBUF |
-		FLEXCAN_CTRL_ERR_STATE | FLEXCAN_CTRL_ERR_MSK;
+		FLEXCAN_CTRL_ERR_STATE /* | FLEXCAN_CTRL_ERR_MSK */;
+
+	/*
+	* enable the "error interrupt" (FLEXCAN_CTRL_ERR_MSK),
+	* on most Flexcan cores, too. Otherwise we don't get
+	* any error warning or passive interrupts.
+	*/
+	if (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
+		reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
+	else
+		reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;

  	/* save for later use */
  	priv->reg_ctrl_default = reg_ctrl;

---------------------------------------------------

Tom


[-- Attachment #2: flexcan.c.patch.buffer --]
[-- Type: text/plain, Size: 5085 bytes --]

diff --git a/arm-linux/drivers/net/can/flexcan.c b/arm-linux/drivers/net/can/flexcan.c
old mode 100644
new mode 100755
index 14c8a86..96e0b55
--- a/arm-linux/drivers/net/can/flexcan.c
+++ b/arm-linux/drivers/net/can/flexcan.c
@@ -43,6 +43,9 @@
 /* 8 for RX fifo and 2 error handling */
 #define FLEXCAN_NAPI_WEIGHT		(8 + 2)
 
+/* Buffers in the interrupt-read ring, 10ms@100us per. */
+#define FLEXCAN_READ_RING_SIZE		(100)
+
 /* FLEXCAN module configuration register (CANMCR) bits */
 #define FLEXCAN_MCR_MDIS		BIT(31)
 #define FLEXCAN_MCR_FRZ			BIT(30)
@@ -170,6 +173,14 @@ struct flexcan_regs {
 	struct flexcan_mb cantxfg[64];
 };
 
+struct flexcan_ring {
+	u32 input;		/* Ring input index */
+	u32 output;		/* Ring output index */
+	u32 full;		/* ring full/wrapped flag */
+	spinlock_t ring_lock;
+	struct can_frame frames[FLEXCAN_READ_RING_SIZE];
+};
+
 struct flexcan_priv {
 	struct can_priv can;
 	struct net_device *dev;
@@ -182,6 +193,7 @@ struct flexcan_priv {
 	struct clk *clk;
 	struct flexcan_platform_data *pdata;
 	struct gpio_sw *xcvr_switch;
+	struct flexcan_ring read_ring;
 };
 
 static struct can_bittiming_const flexcan_bittiming_const = {
@@ -532,12 +544,70 @@ static int flexcan_read_frame(struct net_device *dev)
 	return 1;
 }
 
+static int flexcan_read_frame_to_ring(struct net_device *dev)
+{
+	struct net_device_stats *stats = &dev->stats;
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct can_frame *cf;
+
+	cf = &priv->read_ring.frames[priv->read_ring.input];
+
+	/* Read even if ring full to clear the interrupt request */
+	flexcan_read_fifo(dev, cf);
+
+	if (priv->read_ring.full) {
+		stats->rx_dropped++;
+		return 0;
+	}
+	priv->read_ring.input += 1;
+	if (priv->read_ring.input >= FLEXCAN_READ_RING_SIZE) {
+		priv->read_ring.input  = 0;
+	}
+	if (priv->read_ring.input == priv->read_ring.output) {
+		priv->read_ring.full = 1;
+	}
+
+	stats->rx_packets++;
+	stats->rx_bytes += cf->can_dlc;
+
+	return 1;
+}
+
+static int flexcan_read_frame_from_ring(struct net_device *dev)
+{
+	struct net_device_stats *stats = &dev->stats;
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct can_frame *cf;
+	struct sk_buff *skb;
+	unsigned long flags;
+
+	skb = alloc_can_skb(dev, &cf);
+	if (unlikely(!skb)) {
+		stats->rx_dropped++;
+		return 1;	/* was 0, but that's an infinite loop bug */
+	}
+	/* Assumes that the ring-empty test already done */
+	memcpy(cf, &priv->read_ring.frames[priv->read_ring.output], sizeof(*cf));
+
+	spin_lock_irqsave(&priv->read_ring.ring_lock, flags);
+	priv->read_ring.full = 0;
+	priv->read_ring.output += 1;
+	if (priv->read_ring.output >= FLEXCAN_READ_RING_SIZE) {
+		priv->read_ring.output = 0;
+	}
+	spin_unlock_irqrestore(&priv->read_ring.ring_lock, flags);
+
+	netif_receive_skb(skb);
+
+	return 1;
+}
+
 static int flexcan_poll(struct napi_struct *napi, int quota)
 {
 	struct net_device *dev = napi->dev;
-	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_priv *priv = netdev_priv(dev);
 	struct flexcan_regs __iomem *regs = priv->base;
-	u32 reg_iflag1, reg_esr;
+	u32 reg_esr;
 	int work_done = 0;
 
 	/*
@@ -545,16 +615,16 @@ static int flexcan_poll(struct napi_struct *napi, int quota)
 	 * use saved value from irq handler.
 	 */
 	reg_esr = flexcan_read(&regs->esr) | priv->reg_esr;
+	priv->reg_esr = 0;
 
 	/* handle state changes */
 	work_done += flexcan_poll_state(dev, reg_esr);
 
-	/* handle RX-FIFO */
-	reg_iflag1 = flexcan_read(&regs->iflag1);
-	while (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE &&
+	/* handle RX-RING */
+	while (((priv->read_ring.full) ||
+		(priv->read_ring.input != priv->read_ring.output)) &&
 	       work_done < quota) {
-		work_done += flexcan_read_frame(dev);
-		reg_iflag1 = flexcan_read(&regs->iflag1);
+		work_done += flexcan_read_frame_from_ring(dev);
 	}
 
 	/* report bus errors */
@@ -563,9 +633,6 @@ static int flexcan_poll(struct napi_struct *napi, int quota)
 
 	if (work_done < quota) {
 		napi_complete(napi);
-		/* enable IRQs */
-		flexcan_write(FLEXCAN_IFLAG_DEFAULT, &regs->imask1);
-		flexcan_write(priv->reg_ctrl_default, &regs->ctrl);
 	}
 
 	return work_done;
@@ -598,11 +665,14 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 		 * The error bits are cleared on read,
 		 * save them for later use.
 		 */
-		priv->reg_esr = reg_esr & FLEXCAN_ESR_ERR_BUS;
-		flexcan_write(FLEXCAN_IFLAG_DEFAULT &
-			~FLEXCAN_IFLAG_RX_FIFO_AVAILABLE, &regs->imask1);
-		flexcan_write(priv->reg_ctrl_default & ~FLEXCAN_CTRL_ERR_ALL,
-		       &regs->ctrl);
+		priv->reg_esr |= reg_esr & FLEXCAN_ESR_ERR_BUS;
+		/*
+		 * Read the FIFO into the rings.
+		 */
+		while (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) {
+			flexcan_read_frame_to_ring(dev);
+			reg_iflag1 = flexcan_read(&regs->iflag1);
+		}
 		napi_schedule(&priv->napi);
 	}
 
@@ -1012,6 +1082,7 @@ static int __devinit flexcan_probe(struct platform_device *pdev)
 	priv->dev = dev;
 	priv->clk = clk;
 	priv->pdata = pdev->dev.platform_data;
+	spin_lock_init(&priv->read_ring.ring_lock);
 
 	netif_napi_add(dev, &priv->napi, flexcan_poll, FLEXCAN_NAPI_WEIGHT);
 

[-- Attachment #3: flexcan.c.patch.flood --]
[-- Type: text/plain, Size: 1244 bytes --]

diff --git a/arm-linux/drivers/net/can/flexcan.c b/arm-linux/drivers/net/can/flexcan.c
index 2b25548..f773cb6 100644
--- a/arm-linux/drivers/net/can/flexcan.c
+++ b/arm-linux/drivers/net/can/flexcan.c
@@ -857,11 +857,26 @@ static int flexcan_chip_start(struct net_device *dev)
 	 * _note_: we enable the "error interrupt"
 	 * (FLEXCAN_CTRL_ERR_MSK), too. Otherwise we don't get any
 	 * warning or bus passive interrupts.
+	 *
+	 * __note__: The above is rubbish and causes interrupt
+	 * flooding. Change it to the 3.18 way without obeying
+	 * CAN_CTRLMODE_BERR_REPORTING and without enabling the
+	 * Error interrupts.
 	 */
 	reg_ctrl = flexcan_read(&regs->ctrl);
 	reg_ctrl &= ~FLEXCAN_CTRL_TSYN;
 	reg_ctrl |= FLEXCAN_CTRL_BOFF_REC | FLEXCAN_CTRL_LBUF |
-		FLEXCAN_CTRL_ERR_STATE | FLEXCAN_CTRL_ERR_MSK;
+		FLEXCAN_CTRL_ERR_STATE /* | FLEXCAN_CTRL_ERR_MSK */;
+
+	/*
+	* enable the "error interrupt" (FLEXCAN_CTRL_ERR_MSK),
+	* on most Flexcan cores, too. Otherwise we don't get
+	* any error warning or passive interrupts.
+	*/
+	if (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)
+		reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
+	else
+		reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;
 
 	/* save for later use */
 	priv->reg_ctrl_default = reg_ctrl;

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: Flexcan
  2015-04-02 11:35                       ` Tom Evans
@ 2015-04-02 12:07                         ` Marc Kleine-Budde
  2015-04-04  3:32                         ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Tom Evans
  1 sibling, 0 replies; 29+ messages in thread
From: Marc Kleine-Budde @ 2015-04-02 12:07 UTC (permalink / raw)
  To: Tom Evans, dan.egnor, linux-can

[-- Attachment #1: Type: text/plain, Size: 2575 bytes --]

On 04/02/2015 01:35 PM, Tom Evans wrote:
> On 2/04/2015 5:28 PM, Marc Kleine-Budde wrote:
>> On 04/02/2015 04:20 AM, Tom Evans wrote:
>>> I had to seriously rewrite the FlexCAN drivers for our i.MX53 because the
>>> Ethernet driver blocked the CAN driver for so long the CAN hardware receive
>>> FIFO overflowed. Linux doesn't even support setting different interrupt
>>> priorities on ARM (but my problem was with NAPI).
>>
>> Can you share your changes? I'll have some days to hack on a better
>> generic RX-FIFO implementation for the at91 and flexcan.
> 
> Sure. "git diff -p" based on the 3.4 kernel sources.

Thanks.

> Working from home over Easter using git under cygwin under Windows via 
> Thunderbird, so the CRLFs and line breaks could be messed up, sorry. 
> Patches inline and attached. If this lot is useless, tell me what to 
> type to do it properly and I'll resend on Tuesday.

Uhhh sounds scary - what about switching to Linux over the holidays? :D
As I don't want to push that patch upstream, but rather get more insight
into the various problems and solutions, this is alright.

BTW: you can send patches directly via "git send-email".

> flexcan.c.patch.buffer
> 
> The problem with the FlexCan driver is that it takes the interrupt, then 
> throws to NAPI to actually read the hardware buffers. How messed up is 
> that? It isn't as if reading 16 bytes or so is going to take THAT long, 
> and as there are only six message buffers it overflows in 600us at 1MHz. 
> In the same kernel, the Ethernet driver DOESN'T throw to NAPI but 
> handles as many 1500-byte Ethernet frames as are coming in, all in the 
> one ISR. Other things can delay NAPI as well. So I fixed it with a 
> sledgehammer: it reads the messages during the ISR into an internal ring 
> buffer (sized to 100, later to 128).

Sledgehammers are not the best tool for mainline kernel development, but
it's good to demolish old walls and get a clear view to head in the
hopefully right direction.

> flexcan.c.patch.flood
> 
> A disconnected CAN bus generates an interrupt storm. This patch fixes it 
> the same way it is apparently done in some 3.18 drivers. This may 
> already be in there.

We're heading for a different solution, limiting the bus errors.

Thanks,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-02  2:20                   ` Tom Evans
  2015-04-02  6:28                     ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Marc Kleine-Budde
@ 2015-04-02 18:23                     ` Paarvai Naai
  1 sibling, 0 replies; 29+ messages in thread
From: Paarvai Naai @ 2015-04-02 18:23 UTC (permalink / raw)
  To: tom_usenet; +Cc: Dan Egnor, linux-can

Dan, thank you for sharing similar concerns on this thread.

Tom, I appreciate the detailed conversation about this topic -- even
if there is no facility to address the underlying priority concerns
fully in Linux (yet?), it's a nice public service that we are exposing
the relevant caveats regarding priority management to the general
SocketCAN user base.  Your initial points about system architecture
and design are certainly understood, although not foreign to those of
us who are reasonably experienced with CAN already.

> Generic and simple versus custom and complicated. Give me simple any day.

Indeed, the current scheme probably satisfies 95% of the users of
SocketCAN.  At the same time, if SocketCAN is supposed to be the one
CAN interface to rule all CAN interfaces on Linux, it's worth thinking
about "power-user" features that provide functionality that is fully
in keeping with the CAN specification.

> And everything else in Linux is "perfect"?

True, but to be fair, there has been a general quest to continually
improve the OS over the last 20 years that I've been using it.

Best regards,
Paarvai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-02  6:46                   ` Marc Kleine-Budde
@ 2015-04-02 18:28                     ` Paarvai Naai
  2015-04-03  1:35                       ` Tom Evans
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-04-02 18:28 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Dan Egnor, linux-can

Hi Marc,

Thanks for your response!  Yes, even with better driver-level TX
buffer management, there can be eventually a point where the CAN
hardware TX buffer is full with low-priority messages.  One mitigation
for this is to provide a facility for querying the current tx_queue
usage of a SocketCAN interface so that a process chooses to only
submit "bulk" low-priority messages if the tx_queue fill state is less
than some low watermark.

Ironically, that brings us full-circle back to my original reason for
this thread, but we went off on a tangent in the meantime (partly my
fault!).  Querying the tx_queue usage has multiple benefits.  Is there
some way we can address that easier issue first?

Thanks,
Paarvai

On Wed, Apr 1, 2015 at 11:46 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 04/02/2015 02:33 AM, Dan Egnor wrote:
>> That seems surprisingly awful for something that's an established generic
>> interface for all CAN hardware on this operating system! No matter how many
>> mailboxes the controller might support, the operating system simply does not
>> support prioritized transmission.
>
> You have access to all queueing disciplines in Linux to prioritize your CAN
> frames until they are sent via the CAN driver into the hardware. This is
> currently done in all drivers via a single TX queue. In the default
> configuration (without a special queueing discipline), you have multiple
> FIFOs from userspace (the sockets) that are combined into a single
> FIFO (queueing disc + TX routine in driver + hardware).
>
> However, if you need more control, first make use of a queueing disc.
> Then, if you still suffer from priority inversion, we can have a look at
> your hardware. For now, all drivers, even those with multiple TX
> buffers, configure them into one large TX-FIFO. We can define and
> implement a new per-interface option (or options, if needed) that
> switches the hardware between FIFO and priority-based sending. This
> is still a problem, however, if all your HW buffers are filled with
> low-prio CAN frames before a high-prio one has the chance to be transferred
> into the hardware. This leads us to the implementation of multiple TX
> queues from the Linux stack into the driver. This is mainly used on big
> server Ethernet cards, so that more than one processor can send packets.
> We have to check whether a) we can attach a queueing disc to a TX queue and
> b) these multiple TX queues are properly used on a single-processor
> system.
>
> regards,
> Marc
>
> --
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-02 18:28                     ` Paarvai Naai
@ 2015-04-03  1:35                       ` Tom Evans
  2015-04-03  6:45                         ` Paarvai Naai
  0 siblings, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-04-03  1:35 UTC (permalink / raw)
  To: Paarvai Naai, Marc Kleine-Budde; +Cc: Dan Egnor, linux-can

On 3/04/2015 5:28 AM, Paarvai Naai wrote:
> Ironically, that brings us full-circle back to my original reason for
> this thread, but we went off on a tangent in the meantime (partly my
> fault!).  Querying the tx_queue usage has multiple benefits.  Is there
> some way we can address that easier issue first?

Didn't I already answer that question a few days ago?

I found something in /proc or /sys (details in my mailbox at work) that 
looked like the interface to qdisc to allow inspection of the queue depths.

Thanks to Google, that's:

   cat /sys/class/net/can0/queues/tx-0/byte_queue_limits/inflight
   drivers/net/core/net-sysfs.c
   /lib/dynamic_queue_limits.c

Did you investigate that at all, or are you waiting for someone else to 
do this for you?

Can anyone else comment on that apparent interface? Is "dql" part of 
"qdisc" or something else? How is it enabled?
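For what it's worth, when the attribute exists, reading it programmatically
is trivial; a hedged Python sketch (the interface name is an assumption, and
on kernels or drivers without BQL/dql support the file is simply absent):

```python
from pathlib import Path

def read_inflight(ifname, txq=0):
    """Return the byte_queue_limits 'inflight' counter for one TX queue,
    or None if the attribute is absent (no dql/BQL support, or no such
    device)."""
    p = Path(f"/sys/class/net/{ifname}/queues/tx-{txq}/byte_queue_limits/inflight")
    try:
        return int(p.read_text())
    except (OSError, ValueError):
        return None

if __name__ == "__main__":
    print(read_inflight("can0"))
```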

---

Before using google I just spent 30 minutes trying to quote my previous 
reply from:

     http://dir.gmane.org/gmane.linux.can

Can anyone else get to the above URL today? It crashes Firefox every 
time I go there, locks up IE solidly, and I gave up on lynx after it 
started running GZIP and pulling down about 10 MEGS of data without 
displaying anything.

It is now spitting "fopen()", "feof()" and "fgets()" PHP errors at me, 
so it looks like gmane is broken for Easter.

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-03  1:35                       ` Tom Evans
@ 2015-04-03  6:45                         ` Paarvai Naai
  2015-04-03 11:08                           ` Marc Kleine-Budde
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-04-03  6:45 UTC (permalink / raw)
  To: Tom Evans; +Cc: Marc Kleine-Budde, Dan Egnor, linux-can

Hi Tom,

> Didn't I already answer that question a few days ago?

Yes, somewhat, but I was hoping for something more concrete from Marc
(or someone else), which is why I posed the question again.  (Also,
I'm not a huge fan of programmatically interfacing with the
system-level control/status via the /sys filesystem....)

> I found something in /proc or /sys (details in my mailbox at work) that
> looked like the interface to qdisc to allow inspection of the queue depths.
>
> Thanks to Google, that's:
>
>   cat /sys/class/net/can0/queues/tx-0/byte_queue_limits/inflight
>   drivers/net/core/net-sysfs.c
>   /lib/dynamic_queue_limits.c
>
> Did you investigate that at all, or are you waiting for someone else to do
> this for you?

While I am reasonably capable of spelunking through Linux source code,
and have done so successfully in the past, I have not gotten to that
yet (been unfortunately busy with my other tasks at work!).  I was
hoping that someone who is far more familiar with this infrastructure
could speak to the support for this feature.  Apologies if I'm being too
presumptuous.

> Can anyone else comment on that apparent interface? Is "dql" part of "qdisc"
> or something else? How is it enabled?

Yes please!  Would be much appreciated!

>     http://dir.gmane.org/gmane.linux.can
>
> It is now spitting "fopen()", "feof()" and "fgets()" PHP errors at me, so it
> looks like gmane is broken for Easter.

Oh, I've had this problem for the last several days since I got active
on the linux-can mailing list.  But you are able to find individual
messages via Google search.  For example, try searching for
    linux-can "Querying current tx_queue usage of a SocketCAN interface"
and you will find our thread.  The particular message you had in mind
will not be there directly because I believe you accidentally only
replied to me rather than to the mailing list as well.  However, my
response to that email includes your original message quoted below.
See:
http://permalink.gmane.org/gmane.linux.can/7918

Thanks for your continued interest in this thread, and Happy Easter,
Paarvai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-03  6:45                         ` Paarvai Naai
@ 2015-04-03 11:08                           ` Marc Kleine-Budde
  2015-04-03 15:24                             ` Paarvai Naai
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2015-04-03 11:08 UTC (permalink / raw)
  To: Paarvai Naai, Tom Evans; +Cc: Dan Egnor, linux-can

On April 3, 2015 7:45:46 AM GMT+01:00, Paarvai Naai <opensource3141@gmail.com> wrote:
>Hi Tom,
>
>> Didn't I already answer that question a few days ago?
>
>Yes, somewhat, but I was hoping for something more concrete from Marc
>(or someone else), which is why I posed the question again.  (Also,
>I'm not a huge fan of programmatically interfacing with the
>system-level control/status via the /sys filesystem....)

As this is not CAN specific, you might want to ask over on the netdev mailing list (please add linux-can on Cc). Maybe there's a socket option or ioctl for this; if not, maybe you can raise awareness of this need and get hints on how to implement and mainline it.
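One existing non-CAN-specific candidate is the SIOCOUTQ ioctl, which reports
the bytes still queued in a socket's send buffer; whether its value is useful
for AF_CAN sockets would need checking. A sketch, demonstrated on an ordinary
datagram socket rather than a CAN socket:

```python
import fcntl
import socket
import struct

SIOCOUTQ = 0x5411  # from <linux/sockios.h>; same value as TIOCOUTQ

def outq_bytes(sock):
    """Return the number of unsent bytes in the socket's send queue."""
    buf = fcntl.ioctl(sock, SIOCOUTQ, struct.pack("i", 0))
    return struct.unpack("i", buf)[0]

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print(outq_bytes(s))  # nothing queued on a fresh socket
```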

Marc

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-03 11:08                           ` Marc Kleine-Budde
@ 2015-04-03 15:24                             ` Paarvai Naai
  2015-04-03 20:28                               ` Marc Kleine-Budde
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-04-03 15:24 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Tom Evans, Dan Egnor, linux-can

(reposting to linux-can as it seems my email client somehow forgot
this thread should have remained plain text)

Hi Marc,

Thanks for the suggestions.  Happy to cross-post to netdev, but just
one more question before I do so to make sure I am most effective when
dialoguing with those folks.  I was under the impression that the way
one sets up the tx_queue_len is by doing:

ip link set can type can tx_queue_len 32

or equivalently through Netlink with a CAN-specific option (see
libsocketcan).  This suggests that the tx_queue_len option is
CAN-specific.  Or does setting tx_queue_len simply pass the
configuration on to the qdisc layer of the kernel?

Thanks!
Paarvai

On Fri, Apr 3, 2015 at 4:08 AM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
> On April 3, 2015 7:45:46 AM GMT+01:00, Paarvai Naai <opensource3141@gmail.com> wrote:
> >Hi Tom,
> >
> >> Didn't I already answer that question a few days ago?
> >
> >Yes, somewhat, but I was hoping for something more concrete from Marc
> >(or someone else), which is why I posed the question again.  (Also,
> >I'm not a huge fan of programmatically interfacing with the
> >system-level control/status via the /sys filesystem....)
>
> As this is not CAN specific, you might want to ask over on the netdev mailing list (please add linux-can on Cc). Maybe there's a socket option or ioctl for this; if not, maybe you can raise awareness of this need and get hints on how to implement and mainline it.
>
> Marc

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-03 15:24                             ` Paarvai Naai
@ 2015-04-03 20:28                               ` Marc Kleine-Budde
  2015-04-03 20:53                                 ` Paarvai Naai
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2015-04-03 20:28 UTC (permalink / raw)
  To: Paarvai Naai; +Cc: Tom Evans, Dan Egnor, linux-can



On April 3, 2015 4:24:57 PM GMT+01:00, Paarvai Naai <opensource3141@gmail.com> wrote:
>(reposting to linux-can as it seems my email client somehow forgot
>this thread should have remained plain text)
>
>Hi Marc,
>
>Thanks for the suggestions.  Happy to cross-post to netdev, but just
>one more question before I do so to make sure I am most effective when
>dialoguing with those folks.  I was under the impression that the way
>one sets up the tx_queue_len is by doing:
>
>ip link set can type can tx_queue_len 32
>
>or equivalently through Netlink with a CAN-specific option (see
>libsocketcan).  This suggests that the tx_queue_len option is
>CAN-specific.  Or does setting tx_queue_len simply pass the
>configuration on to the qdisc layer of the kernel?

Just glancing at libsocketcan, I don't see an option to set the tx queue len. The parameter is not CAN specific. 

Marc

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-03 20:28                               ` Marc Kleine-Budde
@ 2015-04-03 20:53                                 ` Paarvai Naai
  2015-04-04  8:49                                   ` Marc Kleine-Budde
  0 siblings, 1 reply; 29+ messages in thread
From: Paarvai Naai @ 2015-04-03 20:53 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Tom Evans, Dan Egnor, linux-can

Hi Marc,

> Just glancing at libsocketcan, I don't see an option to set the tx queue len. The parameter is not CAN specific.

You are right, of course.  My confusion stemmed from the fact that
someone else here pulled down libsocketcan and then modified it to add
this as an API call to libsocketcan.{h,cc}.  Thanks for setting me
straight.  I will look into this further.

Paarvai

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface)
  2015-04-02 11:35                       ` Tom Evans
  2015-04-02 12:07                         ` Flexcan Marc Kleine-Budde
@ 2015-04-04  3:32                         ` Tom Evans
  2015-04-09  8:06                           ` Flexcan Tom Evans
  1 sibling, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-04-04  3:32 UTC (permalink / raw)
  To: Marc Kleine-Budde, dan.egnor, linux-can

On 2/04/2015 10:35 PM, Tom Evans wrote:
> On 2/04/2015 5:28 PM, Marc Kleine-Budde wrote:
>> On 04/02/2015 04:20 AM, Tom Evans wrote:
>>> I had to seriously rewrite the FlexCAN drivers for our
>>> i.MX53 because the Ethernet driver blocked the CAN
>>> driver for so long the CAN hardware receive FIFO overflowed.
>>
>> Can you share your changes? I'll have some days to hack on a better
>> generic RX-FIFO implementation for the at91 and flexcan.
>
> Sure. "git diff -p" based of the 3.4 kernel sources.

> flexcan.c.patch.buffer
>
> So I fixed it with a sledgehammer. It reads the messages
> during the ISR to an internal ring buffer (sized to 100,
> later to 128)

And schedules NAPI to forward them from there rather than reading them 
from the hardware FIFO.
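The shape of that ISR-side buffer can be sketched roughly like this
(a userspace sketch, not the driver's actual code: the frame struct,
names and sizes are illustrative, and the real driver must also handle
the ISR/poll concurrency that a single-producer/single-consumer ring
gives you):

```c
#include <stdint.h>
#include <string.h>

/* Minimal single-producer/single-consumer ring: the interrupt handler
 * pushes raw CAN frames, the NAPI poll function pops them later. */
#define RING_SIZE 128           /* matches the later sizing mentioned above */

struct can_frame_raw {
    uint32_t id;
    uint8_t  dlc;
    uint8_t  data[8];
};

struct rx_ring {
    struct can_frame_raw buf[RING_SIZE];
    unsigned head;              /* next slot to write (ISR side) */
    unsigned tail;              /* next slot to read (poll side) */
};

/* ISR side: returns 0 on success, -1 if the ring is full
 * (the overflow case the rework is trying to avoid). */
static int ring_push(struct rx_ring *r, const struct can_frame_raw *f)
{
    unsigned next = (r->head + 1) % RING_SIZE;
    if (next == r->tail)
        return -1;
    r->buf[r->head] = *f;
    r->head = next;
    return 0;
}

/* NAPI poll side: returns 0 on success, -1 if the ring is empty. */
static int ring_pop(struct rx_ring *r, struct can_frame_raw *out)
{
    if (r->tail == r->head)
        return -1;
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1) % RING_SIZE;
    return 0;
}
```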

The purpose of NAPI is to make the interrupts as fast as possible, doing 
as little work as possible, but servicing time-critical hardware so it 
doesn't overflow/underflow. Operations like reading characters from a 
serial port.

But that assumes the "little work" is fast. In the case of the FlexCAN 
driver, it takes about 5 reads and a write to read a CAN message, and 
there may be six messages in the FIFO.

Not many accesses, but peripheral device registers can be notoriously 
slow on some CPUs [1]. I've worked on an ARM-based PXA CPU that took 
something like 200 CPU clocks to read a GPIO port once. I don't know how 
fast or slow the FlexCAN register reads and writes are on the i.MX53. If 
they're too slow then my patches may be increasing interrupt latency for 
a given value of "too much".

I'll try and measure this on Tuesday.

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-03 20:53                                 ` Paarvai Naai
@ 2015-04-04  8:49                                   ` Marc Kleine-Budde
  2015-04-06 17:54                                     ` Paarvai Naai
  0 siblings, 1 reply; 29+ messages in thread
From: Marc Kleine-Budde @ 2015-04-04  8:49 UTC (permalink / raw)
  To: Paarvai Naai; +Cc: Tom Evans, Dan Egnor, linux-can



On April 3, 2015 9:53:17 PM GMT+01:00, Paarvai Naai <opensource3141@gmail.com> wrote:
>Hi Marc,
>
>> Just glancing at libsocketcan, I don't see an option to set the tx
>queue len. The parameter is not CAN specific.
>
>You are right, of course.  My confusion stemmed from the fact that
>someone else here pulled down libsocketcan and then modified it to add
>this as an API call to libsocketcan.{h,cc}.  Thanks for setting me
>straight.  I will look into this further.

Btw, libsocketcan is not C++ as your .cc file extension might suggest.

Marc

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Fwd: Querying current tx_queue usage of a SocketCAN interface
  2015-04-04  8:49                                   ` Marc Kleine-Budde
@ 2015-04-06 17:54                                     ` Paarvai Naai
  0 siblings, 0 replies; 29+ messages in thread
From: Paarvai Naai @ 2015-04-06 17:54 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: Tom Evans, Dan Egnor, linux-can

Yup, that was a typo.  Sorry for any confusion!


On Sat, Apr 4, 2015 at 1:49 AM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
>
>
> On April 3, 2015 9:53:17 PM GMT+01:00, Paarvai Naai <opensource3141@gmail.com> wrote:
>>Hi Marc,
>>
>>> Just glancing at libsocketcan, I don't see an option to set the tx
>>queue len. The parameter is not CAN specific.
>>
>>You are right, of course.  My confusion stemmed from the fact that
>>someone else here pulled down libsocketcan and then modified it to add
>>this as an API call to libsocketcan.{h,cc}.  Thanks for setting me
>>straight.  I will look into this further.
>
> Btw, libsocketcan is not C++ as your .cc file extension might suggest.
>
> Marc

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Flexcan
  2015-04-04  3:32                         ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Tom Evans
@ 2015-04-09  8:06                           ` Tom Evans
  2015-04-10  6:35                             ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Tom Evans
  0 siblings, 1 reply; 29+ messages in thread
From: Tom Evans @ 2015-04-09  8:06 UTC (permalink / raw)
  To: Marc Kleine-Budde, dan.egnor, linux-can

On 04/04/15 14:32, Tom Evans wrote:
> On 2/04/2015 10:35 PM, Tom Evans wrote:
> ...
> And schedules NAPI to forward them from there rather than reading them from
> the hardware FIFO.
>
> The purpose of NAPI is to make the interrupts as fast as possible, doing as
> little work as possible, but servicing time-critical hardware so it doesn't
> overflow/underflow. Operations like reading characters from a serial port.
>
> But that assumes the "little work" is fast. In the case of the FlexCAN driver,
> it takes about 5 reads and a write to read a CAN message, and there may be six
> messages in the FIFO.
>
> Not many accesses, but peripheral device registers can be notoriously slow on
> some CPUs [1].
> ...
> I'll try and measure this on Tuesday.

Not quite tomorrow, but I have some results:

[    1.494142] flexcan flexcan.1: One do_gettimeofday took 0 us)
[    1.499903] flexcan flexcan.1: Ten do_gettimeofday took 4 us)
[    1.505677] flexcan flexcan.1: 100 flexcan_read() took 23 us)

I first measured the overhead of calling do_gettimeofday(), which is about 
0.4us. So I can pretty much ignore that in this test.

Then in a loop reading a FlexCAN control register, it took about 0.23us per 
read. That's 230ns or about 184 CPU clocks at 800MHz.

OK, so this IS a slow peripheral.

Given it takes about 5 reads to read one message, that's about 1.15us per 
message. With a queue depth of "6" that's a maximum extra delay of 6.9us.

I think that's OK for an interrupt. What's the "modern expectation" of the 
latency?

Tom


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface)
  2015-04-09  8:06                           ` Flexcan Tom Evans
@ 2015-04-10  6:35                             ` Tom Evans
  0 siblings, 0 replies; 29+ messages in thread
From: Tom Evans @ 2015-04-10  6:35 UTC (permalink / raw)
  To: Marc Kleine-Budde, dan.egnor, linux-can

On 09/04/15 18:06, Tom Evans wrote:
> On 04/04/15 14:32, Tom Evans wrote:
>> On 2/04/2015 10:35 PM, Tom Evans wrote:
>> ...
>> And schedules NAPI to forward them from there rather than reading them from
>> the hardware FIFO.
>>
>> The purpose of NAPI is to make the interrupts as fast as possible, doing as
>> little work as possible, but servicing time-critical hardware so it doesn't
>> overflow/underflow. Operations like reading characters from a serial port.
>>
>> But that assumes the "little work" is fast. In the case of the FlexCAN driver,
>> it takes about 5 reads and a write to read a CAN message, and there may be six
>> messages in the FIFO.
>>
>> Not many accesses, but peripheral device registers can be notoriously slow on
>> some CPUs [1].
>  > ...
>> I'll try and measure this on Tuesday.
>
> Not quite tomorrow, but I have some results:
>
> [    1.494142] flexcan flexcan.1: One do_gettimeofday took 0 us)
> [    1.499903] flexcan flexcan.1: Ten do_gettimeofday took 4 us)
> [    1.505677] flexcan flexcan.1: 100 flexcan_read() took 23 us)
>
> I first measured the overhead of calling do_gettimeofday(), which is about
> 0.4us. So I can pretty much ignore that in this test.
>
> Then in a loop reading a FlexCAN control register, it took about 0.23us per
> read. That's 230ns or about 184 CPU clocks at 800MHz.
>
> OK, so this IS a slow peripheral.
>
> Given it takes about 5 reads to read one message, that's about 1.15us per
> message. With a queue depth of "6" that's a maximum extra delay of 6.9us.

That would only happen if interrupts were delayed for 6 whole CAN message 
times, which is over 600us. This should be unlikely. In the more common case, 
one interrupt would read one message, meaning only about 1.15us more than 
throwing to NAPI.

Does anyone have any figures on how slow (how many CPU cycles to read and 
write) the other peripherals are on this CPU? This is something I've never 
seen in any Freescale manual for any of their CPUs.

I wonder if any of the other peripherals are faster? I can run that test myself:

[    1.588819] flexcan flexcan.1: 100 read(ssi)     @0x50014000  took 24 us
[    1.596449] flexcan flexcan.1: 100 read(esdhc1)  @0x50004000  took 25 us
[    1.604337] flexcan flexcan.1: 100 read(uart)    @0x5000c000  took 23 us
[    1.612051] flexcan flexcan.1: 100 read(flexcan) @0x53fc8000  took 23 us
[    1.620017] flexcan flexcan.1: 100 read(gpio)    @0x53f84000  took 26 us
[    1.627731] flexcan flexcan.1: 100 read(pwm)     @0x53fb8000  took 23 us
[    1.635358] flexcan flexcan.1: 100 read(i2c1)    @0x63fc0000  took 23 us
[    1.643076] flexcan flexcan.1: 100 read(fec)     @0x63fec000  took 27 us
[    1.650690] flexcan flexcan.1: 100 read(sdma)    @0x63fb0000  took 23 us
[    1.658406] flexcan flexcan.1: 100 read(sram)    @0xf8000000  took 17 us

The IRAM is a bit faster, but not by that much. I don't believe these tests. 
The IRAM is meant to be accessed in a few CLOCKS not a hundred! Maybe it is 
springing an MMU trap on every "I/O" access? That would account for the time.

I think I'm testing this the right way. The inner loop that reads the 
registers (after calling ioremap() to get an address) is the following C, 
which disassembles to the listing below it:

             tbase = ioremap(psDev->addr, 4096);
             do_gettimeofday(&now);
             reg = readl(tbase);
             for (i = 0; i < 100; i++)
             {
                 reg = readl(tbase);
             }
             do_gettimeofday(&now2);


  530:   ebfffffe    bl  0 <__arm_ioremap>
  534:   e2504000    subs    r4, r0, #0
...
  558:   e3a03064    mov r3, #100    ; 0x64
  55c:   e5942000    ldr r2, [r4]
  560:   f57ff04f    dsb sy
  564:   e2533001    subs    r3, r3, #1
  568:   e50b2030    str r2, [fp, #-48]  ; 0x30
  56c:   1afffffa    bne 55c <flexcan_probe+0x55c>
...

If I generate code that abuses "volatile" to read the registers but leaves 
the "dsb" out, the time for the loop drops to 18us (180ns/read) for registers 
and 13us for the IRAM (130ns/read, still about 100 CPU clocks at 800MHz).

I can believe the I/O registers are that slow, but why should the internal 
SRAM be that slow?

Tom



^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-04-10  6:35 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAE+ymru296P+LjkT7_ONVc2OGMP9mtXW46Nq5aSnm1etauj9Aw@mail.gmail.com>
2015-03-28 20:26 ` Fwd: Querying current tx_queue usage of a SocketCAN interface Paarvai Naai
2015-03-29 22:42   ` Tom Evans
2015-03-30 21:55     ` Paarvai Naai
     [not found]       ` <5519E5A9.7080104@optusnet.com.au>
2015-03-31  0:26         ` Paarvai Naai
2015-03-31  3:09           ` Tom Evans
2015-04-01 20:33             ` Paarvai Naai
2015-04-01 20:57               ` Dan Egnor
2015-04-02  2:20                 ` Tom Evans
2015-04-02  2:33                   ` Daniel Egnor
2015-04-01 23:21               ` Tom Evans
2015-04-02  0:33                 ` Dan Egnor
2015-04-02  2:20                   ` Tom Evans
2015-04-02  6:28                     ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Marc Kleine-Budde
2015-04-02 11:35                       ` Tom Evans
2015-04-02 12:07                         ` Flexcan Marc Kleine-Budde
2015-04-04  3:32                         ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Tom Evans
2015-04-09  8:06                           ` Flexcan Tom Evans
2015-04-10  6:35                             ` Flexcan (was: Re: Fwd: Querying current tx_queue usage of a SocketCAN interface) Tom Evans
2015-04-02 18:23                     ` Fwd: Querying current tx_queue usage of a SocketCAN interface Paarvai Naai
2015-04-02  6:46                   ` Marc Kleine-Budde
2015-04-02 18:28                     ` Paarvai Naai
2015-04-03  1:35                       ` Tom Evans
2015-04-03  6:45                         ` Paarvai Naai
2015-04-03 11:08                           ` Marc Kleine-Budde
2015-04-03 15:24                             ` Paarvai Naai
2015-04-03 20:28                               ` Marc Kleine-Budde
2015-04-03 20:53                                 ` Paarvai Naai
2015-04-04  8:49                                   ` Marc Kleine-Budde
2015-04-06 17:54                                     ` Paarvai Naai
