* socket can receive order
@ 2015-09-08  9:42 Daniel Squires
  2015-09-08 10:01 ` Marc Kleine-Budde
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Squires @ 2015-09-08  9:42 UTC (permalink / raw)
  To: linux-can

Hi all,

new to this list.

Just a quick question at present: when using recv() on a socket that is 
bound to a CAN interface, should the packets be received in the order 
they came off the wire, or is this not guaranteed?

For example, is this valid or an error in some part of the system?

candump can3 -tz
<snip>
  (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
  (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
  (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00

The packets were sent to the wire in order. The CAN ID is incremented 
with each send to ensure that mailbox/arbitration details don't mess up 
the order on the wire. The packets were seen coming off the wire over 
USB in wireshark in the correct order, but both candump and my test 
utility (which aborts when something unexpected happens) see this 
out-of-sequence result. Note that the timestamps confirm that something 
saw them in the correct order, but recv() returns them out of order.

Another identical receiver with another instance of candump sees the 
packets in the expected order, but sees other packets at other times out 
of order, i.e. it appears random.
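
A minimal sketch of the kind of order check described above, assuming the sender increments the CAN ID by exactly one per frame and the ID does not wrap within the checked window. The function name is made up for illustration; this is not the actual test utility.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the index of the first frame whose CAN ID is not exactly one
 * higher than its predecessor, or -1 if the whole sequence is in order.
 * Assumption: the sender increments the ID by one per frame.
 */
long first_out_of_order(const uint32_t *ids, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (ids[i] != ids[i - 1] + 1)
            return (long)i;
    }
    return -1;
}
```

Fed the three IDs from the candump excerpt (0x043, 0x045, 0x044), the check trips at the second frame, because 0x045 arrives where 0x044 was expected.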

Thanks

-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-08  9:42 socket can receive order Daniel Squires
@ 2015-09-08 10:01 ` Marc Kleine-Budde
  2015-09-08 10:41   ` Daniel Squires
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 10:01 UTC (permalink / raw)
  To: Daniel Squires, linux-can

On 09/08/2015 11:42 AM, Daniel Squires wrote:
> Hi all,
> 
> new to this list.
> 
> Just a quick question at present, when using recv on a socket that is 
> bound to a can interface, should the packets be received in the order 
> they came off the wire? or is this not guaranteed?

Should be guaranteed. Which CAN core are you using? What's your kernel
version?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* Re: socket can receive order
  2015-09-08 10:01 ` Marc Kleine-Budde
@ 2015-09-08 10:41   ` Daniel Squires
  2015-09-08 11:13     ` Marc Kleine-Budde
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 10:41 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can

Hi Marc,

I should have mentioned that this "issue" seems to show up only on our 
application PC (which is an Intel NUC).

On my laptop and desktop PC I have not seen it happen.

Both the application PC (NUC) and the laptop are running Ubuntu kernel 
3.19.0-26-generic.

The NUC has the kernel rebuilt without xhci due to problems it causes 
with another USB peripheral.

I am not entirely sure what you mean by which CAN core I am using, but 
if it helps, I am opening the socket as follows:

     sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);

in a small standalone test application which I wrote after having 
difficulty with our main application.

I am using custom hardware/firmware together with the kernel module 
found here: https://github.com/fabiobaltieri/open-usb-can
It has a small change to stop the net queue at the top of 
open_usb_can_start_xmit, as otherwise it's prone to losing TX packets 
when loaded.

I can see the packets coming in the correct order in wireshark and it is 
not immediately obvious to me how the kernel module could mix up the 
order, so it seems that it must be something that happens at the socket 
level?

At the top level I am using CANFestival as the CANopen implementation, 
so it has occurred to me that I could implement a CANFestival "driver" 
using libusb and completely bypass the kernel module and SocketCAN 
layers, but I hope not to have to do this.


-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-08 10:41   ` Daniel Squires
@ 2015-09-08 11:13     ` Marc Kleine-Budde
  2015-09-08 11:17       ` Daniel Squires
  2015-09-08 11:46       ` Wolfgang Grandegger
  0 siblings, 2 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 11:13 UTC (permalink / raw)
  To: Daniel Squires, linux-can, Oliver Hartkopp

On 09/08/2015 12:41 PM, Daniel Squires wrote:
> On my laptop and Desktop PC I have not seen it happen.

> Both the application PC (NUC) and the Laptop are running Ubuntu kernel 
> 3.19.0-26-generic
> 
> The NUC has the kernel rebuilt without xhci due to problems it causes 
> with another USB peripheral.
> 
> I am not entirely sure what you mean by which can core I am using but if 
> it helps i am opening the socket as follows :

I mean what kind of CAN adapter...

> 
>      sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);
> 
> in a small standalone test application which I wrote after having 
> difficulty with our main application.
> 
> I am using custom hardware/firmware and am using the kernel module found 
> here : https://github.com/fabiobaltieri/open-usb-can
> though it has a small change to stop the net queue at the top of 
> open_usb_can_start_xmit as otherwise its prone to loosing TX packets 
> when loaded.

Yes, this looks racy. You should ask them to get a working driver into mainline.

> I can see the packets coming in the correct order in wireshark and it is 
> not immediately obvious to me how the kernel module could mix up the 
> order, so it seems that it must be something that happens at the socket 
> level?

The kernel module "produces" the CAN frames, so if you see them in the
correct order in wireshark, they have left the module in the right order.

> candump can3 -tz
> <snip>
>   (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
>   (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
>   (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00

The timestamps are in the correct order. Maybe Oliver can help here,
he's an expert when it comes to strange reordering :)
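
To make the candump excerpt concrete: each timestamp shows when the frame was stamped, while the line order shows when it was delivered to the socket. A small sketch that counts such delivery inversions from a list of timestamps follows; the helper names are invented for illustration.

```c
#include <stddef.h>
#include <sys/time.h>

/* Returns non-zero if timestamp a is strictly earlier than b */
int tv_before(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_usec < b->tv_usec;
}

/* Counts the positions where a frame carries an earlier timestamp than
 * the frame that was delivered immediately before it. */
size_t count_delivery_inversions(const struct timeval *ts, size_t n)
{
    size_t inversions = 0;

    for (size_t i = 1; i < n; i++) {
        if (tv_before(&ts[i], &ts[i - 1]))
            inversions++;
    }
    return inversions;
}
```

For the three frames above (3.088648, 3.089149, 3.088897) this reports one inversion: the frame stamped 3.088897 was delivered after the one stamped 3.089149.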

> On the top level I am using CANFestival for CANOpen implementation, so 
> it has occurred to me I could implement a CANFestival "driver" using 
> libusb and completely bypass the kernel module and socket can layers, 
> but I hope not to have to do this.

Na, you don't want to do this.

Marc
-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* Re: socket can receive order
  2015-09-08 11:13     ` Marc Kleine-Budde
@ 2015-09-08 11:17       ` Daniel Squires
  2015-09-08 11:20         ` Marc Kleine-Budde
  2015-09-08 16:56         ` Oliver Hartkopp
  2015-09-08 11:46       ` Wolfgang Grandegger
  1 sibling, 2 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 11:17 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, Oliver Hartkopp



On 08/09/15 12:13, Marc Kleine-Budde wrote:
> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>> On my laptop and Desktop PC I have not seen it happen.
>> Both the application PC (NUC) and the Laptop are running Ubuntu kernel
>> 3.19.0-26-generic
>>
>> The NUC has the kernel rebuilt without xhci due to problems it causes
>> with another USB peripheral.
>>
>> I am not entirely sure what you mean by which can core I am using but if
>> it helps i am opening the socket as follows :
> I mean what kind of CAN adapter...
>
>>       sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);
>>
>> in a small standalone test application which I wrote after having
>> difficulty with our main application.
>>
>> I am using custom hardware/firmware and am using the kernel module found
>> here : https://github.com/fabiobaltieri/open-usb-can
>> though it has a small change to stop the net queue at the top of
>> open_usb_can_start_xmit as otherwise its prone to loosing TX packets
>> when loaded.
> Yes, this looks racy - You should ask then to mainline working the driver.
>
>> I can see the packets coming in the correct order in wireshark and it is
>> not immediately obvious to me how the kernel module could mix up the
>> order, so it seems that it must be something that happens at the socket
>> level?
> The kernel module "produces" the CAN frames, so if you see them in the
> correct order in wireshark, they have left the module in the right order.

Sorry, I should have been clearer here: in wireshark I was looking at 
the USB frames, not the CAN frames. However, I think what you say still 
stands, because the timestamps are in the correct order.
>
>> candump can3 -tz
>> <snip>
>>    (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
>>    (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
>>    (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00
> The timestamps are in the correct order. Maybe Oliver can help here,
> he's an expert when it comes to strange reordering :)
>
>> On the top level I am using CANFestival for CANOpen implementation, so
>> it has occurred to me I could implement a CANFestival "driver" using
>> libusb and completely bypass the kernel module and socket can layers,
>> but I hope not to have to do this.
> Na, you don't want to do this.
>
> Marc

-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-08 11:17       ` Daniel Squires
@ 2015-09-08 11:20         ` Marc Kleine-Budde
  2015-09-08 11:37           ` Daniel Squires
  2015-09-08 16:56         ` Oliver Hartkopp
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 11:20 UTC (permalink / raw)
  To: Daniel Squires, linux-can, Oliver Hartkopp

On 09/08/2015 01:17 PM, Daniel Squires wrote:
>> The kernel module "produces" the CAN frames, so if you see them in the
>> correct order in wireshark, they have left the module in the right order.
> 
> Sorry , I should have been clearer here, in wireshark was looking at the 
> USB frames not the CAN frames. however I think what you say still stands 
> due to the time stamps being in the correct order.

Thanks for the clarification. Can you have a look at the CAN interface
with wireshark, too?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* Re: socket can receive order
  2015-09-08 11:20         ` Marc Kleine-Budde
@ 2015-09-08 11:37           ` Daniel Squires
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 11:37 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, Oliver Hartkopp



On 08/09/15 12:20, Marc Kleine-Budde wrote:
> On 09/08/2015 01:17 PM, Daniel Squires wrote:
>>> The kernel module "produces" the CAN frames, so if you see them in the
>>> correct order in wireshark, they have left the module in the right order.
>> Sorry , I should have been clearer here, in wireshark was looking at the
>> USB frames not the CAN frames. however I think what you say still stands
>> due to the time stamps being in the correct order.
> Thanks for the clarification. Can you have a look at the CAN interface
> with wireshark, too?
Wireshark shows the packets in the same order as candump; however, it 
seems the timestamps reflect the order in which wireshark got the 
packets rather than when they were generated.

A couple of other observations: it seems to take longer for an 
out-of-order packet to happen whilst wireshark is capturing, and on one 
occasion my application saw an out-of-order packet which candump showed 
as being in the correct order! This is a first, and I wonder if it is 
also related to wireshark capturing at the same time.
> Marc
>

-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-08 11:13     ` Marc Kleine-Budde
  2015-09-08 11:17       ` Daniel Squires
@ 2015-09-08 11:46       ` Wolfgang Grandegger
  2015-09-08 11:49         ` Daniel Squires
                           ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Wolfgang Grandegger @ 2015-09-08 11:46 UTC (permalink / raw)
  To: Marc Kleine-Budde, Daniel Squires, linux-can, Oliver Hartkopp



Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde:
> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>> On my laptop and Desktop PC I have not seen it happen.
>
>> Both the application PC (NUC) and the Laptop are running Ubuntu kernel
>> 3.19.0-26-generic
>>
>> The NUC has the kernel rebuilt without xhci due to problems it causes
>> with another USB peripheral.
>>
>> I am not entirely sure what you mean by which can core I am using but if
>> it helps i am opening the socket as follows :
>
> I mean what kind of CAN adapter...

"https://github.com/fabiobaltieri/open-usb-can" seems to use an MCP2515 
controller.

Wolfgang.


* Re: socket can receive order
  2015-09-08 11:46       ` Wolfgang Grandegger
@ 2015-09-08 11:49         ` Daniel Squires
  2015-09-08 11:56         ` Marc Kleine-Budde
  2015-09-10  2:29         ` Tom Evans
  2 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 11:49 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can, Oliver Hartkopp

Whilst I am using the kernel module of that project at present, the 
firmware and hardware are not from that project, but instead based 
around an STM32 MCU.



-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-08 11:46       ` Wolfgang Grandegger
  2015-09-08 11:49         ` Daniel Squires
@ 2015-09-08 11:56         ` Marc Kleine-Budde
  2015-09-10  2:29         ` Tom Evans
  2 siblings, 0 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 11:56 UTC (permalink / raw)
  To: Wolfgang Grandegger, Daniel Squires, linux-can, Oliver Hartkopp

On 09/08/2015 01:46 PM, Wolfgang Grandegger wrote:

>> I mean what kind of CAN adapter...
> 
> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515 
> controller.

ACK. But from the Linux driver side it's a USB device and Daniel is
using the driver from github.

Thanks,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



* Re: socket can receive order
  2015-09-08 11:17       ` Daniel Squires
  2015-09-08 11:20         ` Marc Kleine-Budde
@ 2015-09-08 16:56         ` Oliver Hartkopp
  2015-09-09  2:30           ` Austin Schuh
  1 sibling, 1 reply; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-08 16:56 UTC (permalink / raw)
  To: Daniel Squires, Marc Kleine-Budde, linux-can

Hi all,

On 08.09.2015 13:17, Daniel Squires wrote:
> On 08/09/15 12:13, Marc Kleine-Budde wrote:

>>> I can see the packets coming in the correct order in wireshark and it is
>>> not immediately obvious to me how the kernel module could mix up the
>>> order, so it seems that it must be something that happens at the socket
>>> level?
>> The kernel module "produces" the CAN frames, so if you see them in the
>> correct order in wireshark, they have left the module in the right order.

Yes. This is trivial.

But Daniel is right to ask about frame reordering at the socket level or, 
better said, reordering outside the driver level.

>
> Sorry , I should have been clearer here, in wireshark was looking at the USB
> frames not the CAN frames. however I think what you say still stands due to
> the time stamps being in the correct order.
>>
>>> candump can3 -tz
>>> <snip>
>>>    (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
>>>    (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
>>>    (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00
>> The timestamps are in the correct order. Maybe Oliver can help here,
>> he's an expert when it comes to strange reordering :)

Will try - see below.

>>
>>> On the top level I am using CANFestival for CANOpen implementation, so
>>> it has occurred to me I could implement a CANFestival "driver" using
>>> libusb and completely bypass the kernel module and socket can layers,
>>> but I hope not to have to do this.
>> Na, you don't want to do this.

The point is that it would not help either: even if you use a 
PF_PACKET socket (which wireshark does), bypassing the CAN network layer 
modules (can, can_raw) doesn't fix the problem.

I discussed the problem on the netdev ML after I discovered an out-of-order 
issue when fixing the CAN_RAW join feature.

On a multicore SMP processor the interrupts can be processed by 
different CPUs, which can lead to packet reordering when using netif_rx() at 
driver level.

The discussion ended with the networking guys pointing me to use NAPI which 
does not really help, e.g. there's only one USB network adapter in 
linux/drivers/net which is a complete mess.

My suggestion was to set a hash value into the socket buffer (skb) at driver 
level, which is used for generating a 'flow' for IP traffic too. You can 
generate flows by hashes to put all traffic from a specific IP into the same 
per-cpu input queue to help TCP assembling the packets in the softirq for this 
IP address in correct order (aha!).

See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
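
To illustrate why a constant per-interface hash can keep frames in order, here is a simplified userspace model of hash-based steering. It is not the patch from the netdev thread: can_flow_hash() and pick_cpu_queue() are hypothetical names, the hash function is an arbitrary choice, and scale_hash() only mirrors the arithmetic of the kernel's reciprocal_scale() helper.

```c
#include <stdint.h>

/* Same arithmetic as the kernel's reciprocal_scale(): maps a 32-bit
 * hash onto the range [0, n) without a division. */
uint32_t scale_hash(uint32_t hash, uint32_t n)
{
    return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Hypothetical per-interface hash: every frame from one CAN interface
 * gets the same value, so the interface acts as a single 'flow'.
 * The multiplier is an arbitrary mixing constant. */
uint32_t can_flow_hash(int ifindex)
{
    return (uint32_t)ifindex * 2654435761u;
}

/* Picks the per-CPU input queue for a frame received on `ifindex` */
uint32_t pick_cpu_queue(int ifindex, uint32_t ncpus)
{
    return scale_hash(can_flow_hash(ifindex), ncpus);
}
```

Because the hash in this model depends only on the interface index, every frame from one interface maps to the same per-CPU queue, so a later frame cannot overtake an earlier one by being processed on a different CPU. This only applies in setups where the stack actually steers packets by hash.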

I assume the networking guys interpreted my suggestion as a hack, as they are 
not aware of how 'addressing' is done in CAN. They only know about IP ...

NAPI is not really a valid solution for CAN USB adapters and I think I'll have 
to restart the discussion, as out-of-order frames are a no-go for CAN: they 
kill ISO 15765-2 and (obviously) CANopen segmentation.

I assume Daniel uses a multicore system, right?

If so, please try the 'hack' I suggested on the netdev ML to see if it fixes 
your problem. It might help the discussion too.

Regards,
Oliver


* Re: socket can receive order
  2015-09-08 16:56         ` Oliver Hartkopp
@ 2015-09-09  2:30           ` Austin Schuh
  2015-09-09  3:10             ` Brian Silverman
                               ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Austin Schuh @ 2015-09-09  2:30 UTC (permalink / raw)
  To: Oliver Hartkopp, Daniel Squires, Marc Kleine-Budde, linux-can

On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
> Hi all,
>
> On 08.09.2015 13:17, Daniel Squires wrote:
> > On 08/09/15 12:13, Marc Kleine-Budde wrote:
>
> >>> I can see the packets coming in the correct order in wireshark and it is
> >>> not immediately obvious to me how the kernel module could mix up the
> >>> order, so it seems that it must be something that happens at the socket
> >>> level?
> >> The kernel module "produces" the CAN frames, so if you see them in the
> >> correct order in wireshark, they have left the module in the right order.
>
> Yes. This is trivial.
>
> But Daniel is right to ask about the frame reordering on socket level - better
> say - reordering outside the driver level.
>
> >
> > Sorry , I should have been clearer here, in wireshark was looking at the USB
> > frames not the CAN frames. however I think what you say still stands due to
> > the time stamps being in the correct order.
> >>
> >>> candump can3 -tz
> >>> <snip>
> >>>    (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
> >>>    (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
> >>>    (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00
> >> The timestamps are in the correct order. Maybe Oliver can help here,
> >> he's an expert when it comes to strange reordering :)
>
> Will try - see below.
>
> >>
> >>> On the top level I am using CANFestival for CANOpen implementation, so
> >>> it has occurred to me I could implement a CANFestival "driver" using
> >>> libusb and completely bypass the kernel module and socket can layers,
> >>> but I hope not to have to do this.
> >> Na, you don't want to do this.
>
> The point this that it would not help either - even if you are using the
> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
> modules (can, can_raw) doesn't fix the problem.
>
> I discussed the problem on netdev ML as I discovered a out-of-order issue when
> fixing the CAN_RAW join feature.
>
> When you have a multicore SMP processor the interrupt can be processed by
> different CPUs, which can lead to packet reordering when using netif_rx() on
> driver level.
>
> The discussion ended with the networking guys pointing me to use NAPI which
> does not really help, e.g. there's only one USB network adapter in
> linux/drivers/net which is a complete mess.
>
> My suggestion was to set a hash value into the socket buffer (skb) at driver
> level, which is used for generating a 'flow' for IP traffic too. You can
> generate flows by hashes to put all traffic from a specific IP into the same
> per-cpu input queue to help TCP assembling the packets in the softirq for this
> IP address in correct order (aha!).
>
> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>
> I assume the networking guys interpreted my suggestion as hack as they are not
> aware how 'addressing' is done in CAN. They only know about IP ...
>
> NAPI is not really a valid solution for CAN USB adapters and I think I'll have
> to restart the discussion as out-of-order frames are a no-go for CAN as it
> kills ISO15765-2 and (obviously) CANopen segmentation.
>
> I assume Daniel uses a multicore system, right?
>
> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
> problem. It might help for the discussion too.
>
> Regards,
> Oliver

On our boxes, I've been setting the affinity for both the IRQ thread
(we are running a RT kernel), and the interrupt to the same single
core.  Would that help here?

We've seen CAN packets get significantly delayed causing overruns due
to Ethernet load and both CAN and ethernet sharing the same softirq.
Our solution has been to set the affinity for each of those to
different cores to keep them isolated.
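
As a rough sketch of the interrupt half of this: /proc/irq/<n>/smp_affinity takes a hex CPU bitmask, so pinning an IRQ to one core means writing a mask with a single bit set. The helper names below are made up, the code handles cores 0..31 only, and it does not cover the affinity of the IRQ thread on an RT kernel.

```c
#include <stdio.h>

/* Formats the hex CPU bitmask that selects a single core, in the form
 * accepted by /proc/irq/<n>/smp_affinity. Returns 0 on success, -1 if
 * the core number is above 31 or the buffer is too small. */
int format_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
    int n;

    if (cpu > 31)
        return -1;
    n = snprintf(buf, len, "%x", 1u << cpu);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Pins IRQ `irq` to core `cpu` (needs root). Returns 0 on success. */
int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[16];
    FILE *f;
    int rc;

    if (format_cpu_mask(cpu, mask, sizeof(mask)) < 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);
    f = fopen(path, "w");
    if (!f)
        return -1;
    rc = (fputs(mask, f) < 0) ? -1 : 0;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

The IRQ number for the adapter would come from /proc/interrupts; it is not something this sketch can discover.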

Austin


* Re: socket can receive order
  2015-09-09  2:30           ` Austin Schuh
@ 2015-09-09  3:10             ` Brian Silverman
  2015-09-09 16:23               ` Oliver Hartkopp
  2015-09-09 12:05             ` Daniel Squires
  2015-09-09 16:14             ` Daniel Squires
  2 siblings, 1 reply; 20+ messages in thread
From: Brian Silverman @ 2015-09-09  3:10 UTC (permalink / raw)
  To: Austin Schuh
  Cc: Oliver Hartkopp, Daniel Squires, Marc Kleine-Budde, linux-can

Another thing Austin and I do is set up RPS (receive packet steering)
for the Ethernet interfaces so the hardware sends the physical
Ethernet interrupts to that same core which isn't involved in
servicing the CAN interfaces at all.
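
For reference, RPS is configured per receive queue through sysfs by writing a hex CPU mask to the queue's rps_cpus file. RPS itself steers the protocol processing in software; the hardware interrupt is pinned separately via smp_affinity. A small sketch that only builds the path (the function name is invented, and "eth0" in the usage note is just an example interface):

```c
#include <stdio.h>

/* Builds the sysfs path of the RPS CPU mask for receive queue `queue`
 * of interface `ifname`. Returns 0 on success, -1 if `buf` is too small. */
int rps_cpus_path(const char *ifname, unsigned int queue,
                  char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/queues/rx-%u/rps_cpus",
                     ifname, queue);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For eth0 and queue 0 this yields /sys/class/net/eth0/queues/rx-0/rps_cpus; writing, say, "2" there would steer receive processing for that queue to CPU 1.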

On Tue, Sep 8, 2015 at 10:30 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core.  Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin


* Re: socket can receive order
  2015-09-09  2:30           ` Austin Schuh
  2015-09-09  3:10             ` Brian Silverman
@ 2015-09-09 12:05             ` Daniel Squires
  2015-09-09 16:14             ` Daniel Squires
  2 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-09 12:05 UTC (permalink / raw)
  To: Austin Schuh, Oliver Hartkopp, Marc Kleine-Budde, linux-can



On 09/09/15 03:30, Austin Schuh wrote:
> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> Hi all,
>>
>> On 08.09.2015 13:17, Daniel Squires wrote:
>>> On 08/09/15 12:13, Marc Kleine-Budde wrote:
>>>>> I can see the packets coming in the correct order in wireshark and it is
>>>>> not immediately obvious to me how the kernel module could mix up the
>>>>> order, so it seems that it must be something that happens at the socket
>>>>> level?
>>>> The kernel module "produces" the CAN frames, so if you see them in the
>>>> correct order in wireshark, they have left the module in the right order.
>> Yes. This is trivial.
>>
>> But Daniel is right to ask about the frame reordering on socket level - better
>> say - reordering outside the driver level.
>>
>>> Sorry , I should have been clearer here, in wireshark was looking at the USB
>>> frames not the CAN frames. however I think what you say still stands due to
>>> the time stamps being in the correct order.
>>>>> candump can3 -tz
>>>>> <snip>
>>>>>     (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
>>>>>     (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
>>>>>     (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00
>>>> The timestamps are in the correct order. Maybe Oliver can help here,
>>>> he's an expert when it comes to strange reordering :)
>> Will try - see below.
>>
>>>>> On the top level I am using CANFestival for CANOpen implementation, so
>>>>> it has occurred to me I could implement a CANFestival "driver" using
>>>>> libusb and completely bypass the kernel module and socket can layers,
>>>>> but I hope not to have to do this.
>>>> Na, you don't want to do this.
>> The point this that it would not help either - even if you are using the
>> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
>> modules (can, can_raw) doesn't fix the problem.
I meant bypassing ALL the kernel CAN / socket layers and going direct from 
USB frames to the application, which I think would avoid the problem, though 
it would also render tools such as wireshark and can-utils useless, so I 
would rather avoid it.

The USB frames appear to arrive in order, as the timestamps (as shown by 
candump) are in order, though the packets come out of recv() out of order. 
Further testing reveals that some of them are significantly delayed at the 
application level, by tens of ms, during which time many newer packets 
are received promptly (in under a millisecond).
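
A sketch of how such a delay can be quantified, assuming the kernel receive timestamp of each frame is available (for example via the SIOCGSTAMP ioctl) and is compared with the time at which recv() returned (for example from gettimeofday()). The function name is made up for illustration.

```c
#include <stdint.h>
#include <sys/time.h>

/* Microseconds elapsed from `stamp` (kernel receive timestamp) to `now`
 * (time at which recv() returned in the application).
 * The result is negative if `now` is earlier than `stamp`. */
int64_t delivery_delay_us(const struct timeval *stamp,
                          const struct timeval *now)
{
    return ((int64_t)now->tv_sec - stamp->tv_sec) * 1000000
         + ((int64_t)now->tv_usec - stamp->tv_usec);
}
```

A frame stamped at 3.088897 s that only reaches the application at 3.108897 s would give 20000, i.e. a 20 ms delay.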


>> I discussed the problem on netdev ML as I discovered a out-of-order issue when
>> fixing the CAN_RAW join feature.
>>
>> When you have a multicore SMP processor the interrupt can be processed by
>> different CPUs, which can lead to packet reordering when using netif_rx() on
>> driver level.
>>
>> The discussion ended with the networking guys pointing me to use NAPI which
>> does not really help, e.g. there's only one USB network adapter in
>> linux/drivers/net which is a complete mess.
>>
>> My suggestion was to set a hash value into the socket buffer (skb) at driver
>> level, which is used for generating a 'flow' for IP traffic too. You can
>> generate flows by hashes to put all traffic from a specific IP into the same
>> per-cpu input queue to help TCP assembling the packets in the softirq for this
>> IP address in correct order (aha!).
>>
>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>
>> I assume the networking guys interpreted my suggestion as hack as they are not
>> aware how 'addressing' is done in CAN. They only know about IP ...
>>
>> NAPI is not really a valid solution for CAN USB adapters and I think I'll have
>> to restart the discussion as out-of-order frames are a no-go for CAN as it
>> kills ISO15765-2 and (obviously) CANopen segmentation.
>>
>> I assume Daniel uses a multicore system, right?

Correct, a Core i5 in this case.
>>
>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>> problem. It might help for the discussion too.
>>
>> Regards,
>> Oliver
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core.  Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin
>

-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-09  2:30           ` Austin Schuh
  2015-09-09  3:10             ` Brian Silverman
  2015-09-09 12:05             ` Daniel Squires
@ 2015-09-09 16:14             ` Daniel Squires
  2015-09-09 16:31               ` Oliver Hartkopp
  2015-09-17 19:18               ` Oliver Hartkopp
  2 siblings, 2 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-09 16:14 UTC (permalink / raw)
  To: Austin Schuh, Oliver Hartkopp, Marc Kleine-Budde, linux-can

The hack seems to work. It has only been a short test of half an hour so far, 
but before the change the problem showed up reliably within a few seconds.

On 09/09/15 03:30, Austin Schuh wrote:
> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> Hi all,
>>
>> On 08.09.2015 13:17, Daniel Squires wrote:
>>> On 08/09/15 12:13, Marc Kleine-Budde wrote:
>>>>> I can see the packets coming in the correct order in wireshark and it is
>>>>> not immediately obvious to me how the kernel module could mix up the
>>>>> order, so it seems that it must be something that happens at the socket
>>>>> level?
>>>> The kernel module "produces" the CAN frames, so if you see them in the
>>>> correct order in wireshark, they have left the module in the right order.
>> Yes. This is trivial.
>>
>> But Daniel is right to ask about the frame reordering on socket level - better
>> say - reordering outside the driver level.
>>
>>> Sorry , I should have been clearer here, in wireshark was looking at the USB
>>> frames not the CAN frames. however I think what you say still stands due to
>>> the time stamps being in the correct order.
>>>>> candump can3 -tz
>>>>> <snip>
>>>>>     (003.088648)  can3  043   [8]  F7 2D 00 00 00 00 00 00
>>>>>     (003.089149)  can3  045   [8]  F9 2D 00 00 00 00 00 00
>>>>>     (003.088897)  can3  044   [8]  F8 2D 00 00 00 00 00 00
>>>> The timestamps are in the correct order. Maybe Oliver can help here,
>>>> he's an expert when it comes to strange reordering :)
>> Will try - see below.
>>
>>>>> On the top level I am using CANFestival for CANOpen implementation, so
>>>>> it has occurred to me I could implement a CANFestival "driver" using
>>>>> libusb and completely bypass the kernel module and socket can layers,
>>>>> but I hope not to have to do this.
>>>> Na, you don't want to do this.
>> The point is that it would not help either - even if you are using the
>> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
>> modules (can, can_raw) doesn't fix the problem.
>>
>> I discussed the problem on netdev ML as I discovered an out-of-order issue when
>> fixing the CAN_RAW join feature.
>>
>> When you have a multicore SMP processor the interrupt can be processed by
>> different CPUs, which can lead to packet reordering when using netif_rx() on
>> driver level.
>>
>> The discussion ended with the networking guys pointing me to use NAPI which
>> does not really help, e.g. there's only one USB network adapter in
>> linux/drivers/net which is a complete mess.
>>
>> My suggestion was to set a hash value into the socket buffer (skb) at driver
>> level, which is used for generating a 'flow' for IP traffic too. You can
>> generate flows by hashes to put all traffic from a specific IP into the same
>> per-cpu input queue to help TCP assembling the packets in the softirq for this
>> IP address in correct order (aha!).
>>
>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>
>> I assume the networking guys interpreted my suggestion as hack as they are not
>> aware how 'addressing' is done in CAN. They only know about IP ...
>>
>> NAPI is not really a valid solution for CAN USB adapters and I think I'll have
>> to restart the discussion as out-of-order frames are a no-go for CAN as it
>> kills ISO15765-2 and (obviously) CANopen segmentation.
>>
>> I assume Daniel uses a multicore system, right?
>>
>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>> problem. It might help for the discussion too.
>>
>> Regards,
>> Oliver
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core.  Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin
>

-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-09  3:10             ` Brian Silverman
@ 2015-09-09 16:23               ` Oliver Hartkopp
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-09 16:23 UTC (permalink / raw)
  To: Brian Silverman, Austin Schuh, Daniel Squires
  Cc: Marc Kleine-Budde, linux-can

On 09.09.2015 05:10, Brian Silverman wrote:
> Another thing Austin and I do is set up RPS (receive packet steering)
> for the Ethernet interfaces so the hardware sends the physical
> Ethernet interrupts to that same core which isn't involved in
> servicing the CAN interfaces at all.
>
> On Tue, Sep 8, 2015 at 10:30 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>>
>> On our boxes, I've been setting the affinity for both the IRQ thread
>> (we are running a RT kernel), and the interrupt to the same single
>> core.  Would that help here?

Yes it does.

Nailing the interrupts from ethernet and CAN interfaces to different CPUs by 
setting irq_affinity is a valid but pretty hard solution.

Therefore I was trying to use hash based RPS to fix the out-of-order problem 
and let the kernel networking do the (hopefully optimal) rest.

>> We've seen CAN packets get significantly delayed causing overruns due
>> to Ethernet load and both CAN and ethernet sharing the same softirq.
>> Our solution has been to set the affinity for each of those to
>> different cores to keep them isolated.

Yes. That's a good point. Due to the fact that you split up ethernet and CAN 
on different CPUs the softirq is also running on different CPUs.

This could also be the solution for Daniel's problem!

My suggested solution (aka 'hack' 
http://marc.info/?l=linux-netdev&m=143689694125450&w=2 ) with the hash based 
RPS does not split the ethernet/CAN traffic among CPUs as - depending on the 
IP hashes - some of the ethernet traffic can be pushed into the same CPU we 
use for the CAN interface.

So it's a softer solution which at least fixes out-of-order for CAN interfaces.
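
To illustrate why a fixed per-interface hash keeps CAN frames in order, here is a simplified model of hash-based steering. It is a sketch, not the kernel code: steer_cpu() is a hypothetical name, and the multiply-and-shift scaling is an assumption modelled on the kernel's reciprocal_scale() helper. The point is only that a constant hash always selects the same queue, so one CPU processes all frames of that interface in arrival order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of hash-based receive steering: scale a 32-bit flow
 * hash onto one of ncpus per-CPU input queues.
 * ASSUMPTION: the multiply-and-shift mirrors the kernel's
 * reciprocal_scale() helper; steer_cpu() itself is a made-up name.
 */
static unsigned int steer_cpu(uint32_t flow_hash, unsigned int ncpus)
{
    assert(ncpus > 0);
    /* Maps [0, 2^32) proportionally onto [0, ncpus). */
    return (unsigned int)(((uint64_t)flow_hash * ncpus) >> 32);
}
```

Because the result depends only on the hash and the number of CPUs in the mask, every frame carrying the same hash lands in the same queue, while Ethernet flows with other hashes may or may not share that CPU.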

Regards,
Oliver

ps. There were some performance tests (vanilla and RT kernel) from the 
University of Prague where you can see the impact of additional ethernet load:

http://rtime.felk.cvut.cz/can/

http://rtime.felk.cvut.cz/can/benchmark/3.0/


* Re: socket can receive order
  2015-09-09 16:14             ` Daniel Squires
@ 2015-09-09 16:31               ` Oliver Hartkopp
  2015-09-17 19:18               ` Oliver Hartkopp
  1 sibling, 0 replies; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-09 16:31 UTC (permalink / raw)
  To: Daniel Squires, Austin Schuh, Marc Kleine-Budde, linux-can

On 09.09.2015 18:14, Daniel Squires wrote:
> The hack seems to work. It has only been a short test of half an hour so far,
> but before applying it the reordering showed up reliably within a few seconds.

Great! You should not see any out-of-order frames anymore.

I obviously have to start a new attempt to push that single line of source 
code into mainline :-)

If it doesn't help to fix your latency problem under ethernet load, you might 
check the fixed irq_affinity setting for separating ethernet/CAN CPUs that 
Brian and Austin were suggesting.

Thanks for the feedback,
Oliver



* Re: socket can receive order
  2015-09-08 11:46       ` Wolfgang Grandegger
  2015-09-08 11:49         ` Daniel Squires
  2015-09-08 11:56         ` Marc Kleine-Budde
@ 2015-09-10  2:29         ` Tom Evans
  2015-09-10  8:08           ` Daniel Squires
  2 siblings, 1 reply; 20+ messages in thread
From: Tom Evans @ 2015-09-10  2:29 UTC (permalink / raw)
  To: Daniel Squires, linux-can

On 08/09/15 21:46, Wolfgang Grandegger wrote:
>
> Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde:
>> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>>> On my laptop and Desktop PC I have not seen it happen.
>> I mean what kind of CAN adapter...
>
> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
> controller.

http://fabiobaltieri.com/2013/07/23/hacking-into-a-vehicle-can-bus-toyothack-and-socketcan/#more-1419

"my own open hardware USB AVR + MCP2515 interface", "the performances are not 
that good above 250kbps", "It’s tempting to use an SPI controller (the MCP2515 
is very common), but that has terrible performances on highly loaded fast 
busses, and you will end up with problem such as RX buffer underruns and 
out-of-order frames."

He means "overruns". The MCP2515 doesn't have a FIFO. Messages have to be read 
out over a slow SPI bus one bit at a time within one message time or it 
overruns. Or two if the BUKT bit is set, but that risks reading messages in 
the wrong order.

The design uses an ATMEGA32U2 and an MCP2515. I can't see why it shouldn't be 
able to buffer messages from the MCP2515 at relatively high data rates, if the 
code is well written. From my experience though, code for the MCP2515 is 
seldom "well written". It is too easy to fall into a trap and get the message 
arrival order wrong.

This is unlikely to be related to the OP's problem, but just something to be 
aware of.

For anybody still coding and debugging MCP2515 stuff:

http://www.microchip.com/forums/m620741.aspx

 > otherwise it's prone to losing TX packets when loaded.

Do you know about having to do something like the following to stop CAN 
Transmit Drops? The networking stack defaults to DROPPING CAN transmit frames 
before blocking the socket if you don't.

     /bin/echo 256 > /sys/class/net/can0/tx_queue_len
     ...
     int sndbuf = (250 + 8) * 256;
     socklen_t socklen = sizeof(sndbuf);
     /* Minimum socket buffer to try and get it blocking */
     rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
                     &sndbuf, socklen);


http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
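
A self-contained variant of the fragment above, as a minimal sketch. The helper names set_small_sndbuf() and get_sndbuf() are hypothetical; in practice fd would be an already-bound CAN_RAW socket. Whether a write then blocks instead of dropping depends on how the socket buffer compares with the interface's tx_queue_len, so treat the numbers as a starting point rather than a guarantee.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Request a small send buffer on an open socket so that the socket
 * fills up (and a blocking write stalls) before the device queue does.
 * Returns 0 on success, -1 on error with errno set by setsockopt().
 * ASSUMPTION: set_small_sndbuf() is an illustrative name, not an API.
 */
static int set_small_sndbuf(int fd, int bytes)
{
    assert(bytes > 0);
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes));
}

/*
 * Read back the value the kernel actually applied. It can differ from
 * the request because the kernel adjusts and clamps the size.
 * Returns the buffer size in bytes, or -1 on error.
 */
static int get_sndbuf(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &value, &len) < 0)
        return -1;
    return value;
}
```

Calling get_sndbuf() after set_small_sndbuf() is a cheap way to see what the kernel really configured before tuning tx_queue_len to match.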

Tom





* Re: socket can receive order
  2015-09-10  2:29         ` Tom Evans
@ 2015-09-10  8:08           ` Daniel Squires
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-10  8:08 UTC (permalink / raw)
  To: tom_usenet, linux-can



On 10/09/15 03:29, Tom Evans wrote:
> On 08/09/15 21:46, Wolfgang Grandegger wrote:
>>
>> Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde:
>>> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>>>> On my laptop and Desktop PC I have not seen it happen.
>>> I mean what kind of CAN adapter...
>>
>> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
>> controller.
>
> http://fabiobaltieri.com/2013/07/23/hacking-into-a-vehicle-can-bus-toyothack-and-socketcan/#more-1419 
>
>
> "my own open hardware USB AVR + MCP2515 interface", "the performances 
> are not that good above 250kbps", "It’s tempting to use an SPI 
> controller (the MCP2515 is very common), but that has terrible 
> performances on highly loaded fast busses, and you will end up with 
> problem such as RX buffer underruns and out-of-order frames."
>
> He means "overruns". The MCP2515 doesn't have a FIFO. Messages have to 
> be read out over a slow SPI bus one bit at a time within one message 
> time or it overruns. Or two if the BUKT bit is set, but that risks 
> reading messages in the wrong order.
>
> The design uses an ATMEGA32U2 and an MCP2515. I can't see why it 
> shouldn't be able to buffer messages from the MCP2515 at relatively 
> high data rates, if the code is well written. From my experience 
> though, code for the MCP2515 is seldom "well written". It is too easy 
> to fall into a trap and get the message arrival order wrong.
>
> This is unlikely to be related to the OP's problem, but just something 
> to be aware of.

Yes, I had read that whole article and didn't use the hardware / 
firmware from there for those reasons. I also needed 1 Mbit/s. I am using 
STM32F4 Discovery boards with a CAN PHY attached. I didn't know where to 
start with the kernel module, which is why I am using the one from there at 
present. It would be nice to get a "standardised" USB class kernel 
module, but I guess that would require input from the USB implementers 
group.
>
> For anybody still coding and debugging MCP2515 stuff:
>
> http://www.microchip.com/forums/m620741.aspx
>
> > otherwise its prone to loosing TX packets when loaded.
>
> Do you know about having to do something like the following to stop 
> CAN Transmit Drops? The networking stack defaults to DROPPING CAN 
> transmit frames before blocking the socket if you don't.
>
>     /bin/echo 256 > /sys/class/net/can0/tx_queue_len
>     ...
>     int sndbuf = (250 + 8) * 256;
>     socklen_t socklen = sizeof(sndbuf);
>     /* Minimum socket buffer to try and get it blocking */
>     rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
>                     &sndbuf, sizeof(sndbuf));
>
I hadn't noticed it could be done that way; I had been using the ip 
utility, but was aware of the default queue size of 10 frames and that it 
could be changed. In my very basic out-of-order (OOO) test app I'm actually 
sending packets with incrementing values until there is no space (send 
returns ENOBUFS), then doing the receives and checking the values until 
there is nothing left to receive, before continuing to send from the 
previously failed value.
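
The receive-side check of such a test can be sketched as below. This is not the actual test app, which is not shown in this thread; first_out_of_order() is a hypothetical helper, and the use of a 32-bit counter per frame is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Scan counters in the order recv() returned them and report the index
 * of the first one that is not exactly previous + 1.
 * Returns -1 when the whole batch is in order.
 * ASSUMPTION: each frame carries a 32-bit incrementing counter.
 */
static long first_out_of_order(const uint32_t *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        /* Unsigned arithmetic, so a counter wrap-around still counts as in order. */
        if (seq[i] != seq[i - 1] + 1u)
            return (long)i;
    }
    return -1;
}
```

Fed with the IDs from the candump excerpt earlier in the thread (0x43, 0x45, 0x44), the function flags index 1, the position where 0x45 arrived ahead of 0x44.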
>
> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html 
>
>
> Tom
>
>
>

-- 
Dan Squires

Engineered Arts Ltd.



* Re: socket can receive order
  2015-09-09 16:14             ` Daniel Squires
  2015-09-09 16:31               ` Oliver Hartkopp
@ 2015-09-17 19:18               ` Oliver Hartkopp
  1 sibling, 0 replies; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-17 19:18 UTC (permalink / raw)
  To: Daniel Squires, Austin Schuh, Marc Kleine-Budde, linux-can

Hello Daniel,

On 09.09.2015 18:14, Daniel Squires wrote:
> The hack seems to work. It has only been a short test of half an hour so far,
> but before applying it the reordering showed up reliably within a few seconds.
>
>> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>>>
>>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>>
>>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>>> problem. It might help for the discussion too.

In the referenced posting above I suggested setting

	skb_set_hash(skb, dev->ifindex, PKT_HASH_TYPE_L2);

to create an interface-specific hash for the socket buffer.

And then to enable the receive packet steering (RPS) with

	echo f > /sys/class/net/can0/queues/rx-0/rps_cpus

To create a proper patch and description I evaluated some more skb_set_hash() 
parameters and finally discovered that setting the skb hash seems to be 
obsolete ... %-)

Can you confirm that

	echo f > /sys/class/net/can0/queues/rx-0/rps_cpus

already fixes the out-of-order issue even without setting the skb hash?

If so we could give a general recommendation for multi-core CPU system users 
to enable RPS for CAN interfaces by setting the specific sysfs entry.
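
For setups that prefer to apply this from a program rather than a shell, the sysfs write can be sketched as follows. The helper names rps_path(), rps_mask() and rps_enable() are hypothetical; only the sysfs path and the hex CPU mask format are taken from the echo command above. Writing the file needs root and an existing interface.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/class/net/<ifname>/queues/rx-0/rps_cpus" into buf.
 * Returns 0 on success, -1 on empty name or truncation. */
static int rps_path(char *buf, size_t len, const char *ifname)
{
    int n;

    assert(buf != NULL && ifname != NULL);
    if (strlen(ifname) == 0)
        return -1;
    n = snprintf(buf, len, "/sys/class/net/%s/queues/rx-0/rps_cpus", ifname);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hex bitmask with the lowest ncpus bits set, e.g. 4 CPUs -> "f".
 * This sketch supports 1..32 CPUs only. */
static int rps_mask(char *buf, size_t len, unsigned int ncpus)
{
    int n;

    assert(buf != NULL);
    if (ncpus == 0 || ncpus > 32)
        return -1;
    n = snprintf(buf, len, "%llx", (1ULL << ncpus) - 1ULL);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the mask to the interface's rps_cpus file.
 * Returns 0 on success, -1 on any failure (no such interface, no permission). */
static int rps_enable(const char *ifname, unsigned int ncpus)
{
    char path[128];
    char mask[16];
    FILE *f;
    int wrote;

    if (rps_path(path, sizeof(path), ifname) < 0 ||
        rps_mask(mask, sizeof(mask), ncpus) < 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    wrote = fprintf(f, "%s\n", mask);
    if (fclose(f) != 0 || wrote < 0)
        return -1;
    return 0;
}
```

rps_enable("can0", 4) corresponds to the echo f command above on a four-core machine.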

Regards,
Oliver


