* socket can receive order
@ 2015-09-08 9:42 Daniel Squires
2015-09-08 10:01 ` Marc Kleine-Budde
0 siblings, 1 reply; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 9:42 UTC (permalink / raw)
To: linux-can
Hi all,
new to this list.
Just a quick question for now: when using recv() on a socket that is
bound to a CAN interface, should the packets be received in the order
they came off the wire, or is this not guaranteed?
For example, is this valid, or an error in some part of the system?
candump can3 -tz
<snip>
(003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
(003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
(003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
The packets were sent to the wire in order. The CAN ID is incremented
with each send to ensure mailbox/arbitration details don't mess up the
order on the wire. The packets were seen coming off the wire over USB
in wireshark in the correct order, but both candump and my test utility
(which aborts when something unexpected happens) see this out-of-sequence
result. Note that the timestamps confirm that something saw them in the
correct order, but recv() returns them out of order.
Another identical receiver with another instance of candump sees the
packets in the expected order, but sees other packets at other times out
of order, i.e. it appears random.
Thanks
--
Dan Squires
Engineered Arts Ltd.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: socket can receive order
2015-09-08 9:42 socket can receive order Daniel Squires
@ 2015-09-08 10:01 ` Marc Kleine-Budde
2015-09-08 10:41 ` Daniel Squires
0 siblings, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 10:01 UTC (permalink / raw)
To: Daniel Squires, linux-can
On 09/08/2015 11:42 AM, Daniel Squires wrote:
> Hi all,
>
> new to this list.
>
> Just a quick question at present, when using recv on a socket that is
> bound to a can interface, should the packets be received in the order
> they came off the wire? or is this not guaranteed?
Should be guaranteed. Which CAN core are you using? What's your kernel
version?
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: socket can receive order
2015-09-08 10:01 ` Marc Kleine-Budde
@ 2015-09-08 10:41 ` Daniel Squires
2015-09-08 11:13 ` Marc Kleine-Budde
0 siblings, 1 reply; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 10:41 UTC (permalink / raw)
To: Marc Kleine-Budde, linux-can
Hi Marc,
I should have mentioned that this "issue" seems to only show up on our
application PC, (which is an Intel NUC).
On my laptop and Desktop PC I have not seen it happen.
Both the application PC (NUC) and the Laptop are running Ubuntu kernel
3.19.0-26-generic
The NUC has the kernel rebuilt without xhci due to problems it causes
with another USB peripheral.
I am not entirely sure what you mean by which CAN core I am using, but if
it helps, I am opening the socket as follows:
sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);
in a small standalone test application which I wrote after having
difficulty with our main application.
I am using custom hardware/firmware together with the kernel module found
here: https://github.com/fabiobaltieri/open-usb-can
It has a small change to stop the net queue at the top of
open_usb_can_start_xmit, as otherwise it is prone to losing TX packets
under load.
I can see the packets coming in the correct order in wireshark and it is
not immediately obvious to me how the kernel module could mix up the
order, so it seems that it must be something that happens at the socket
level?
On the top level I am using CANFestival for CANOpen implementation, so
it has occurred to me I could implement a CANFestival "driver" using
libusb and completely bypass the kernel module and socket can layers,
but I hope not to have to do this.
On 08/09/15 11:01, Marc Kleine-Budde wrote:
> On 09/08/2015 11:42 AM, Daniel Squires wrote:
>> Hi all,
>>
>> new to this list.
>>
>> Just a quick question at present, when using recv on a socket that is
>> bound to a can interface, should the packets be received in the order
>> they came off the wire? or is this not guaranteed?
> Should be guaranteed. Which CAN core are you using? What's your kernel
> version?
>
> Marc
>
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-08 10:41 ` Daniel Squires
@ 2015-09-08 11:13 ` Marc Kleine-Budde
2015-09-08 11:17 ` Daniel Squires
2015-09-08 11:46 ` Wolfgang Grandegger
0 siblings, 2 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 11:13 UTC (permalink / raw)
To: Daniel Squires, linux-can, Oliver Hartkopp
On 09/08/2015 12:41 PM, Daniel Squires wrote:
> On my laptop and Desktop PC I have not seen it happen.
> Both the application PC (NUC) and the Laptop are running Ubuntu kernel
> 3.19.0-26-generic
>
> The NUC has the kernel rebuilt without xhci due to problems it causes
> with another USB peripheral.
>
> I am not entirely sure what you mean by which can core I am using but if
> it helps i am opening the socket as follows :
I mean what kind of CAN adapter...
>
> sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);
>
> in a small standalone test application which I wrote after having
> difficulty with our main application.
>
> I am using custom hardware/firmware and am using the kernel module found
> here : https://github.com/fabiobaltieri/open-usb-can
> though it has a small change to stop the net queue at the top of
> open_usb_can_start_xmit as otherwise its prone to loosing TX packets
> when loaded.
Yes, this looks racy. You should ask them to get a working driver into mainline.
> I can see the packets coming in the correct order in wireshark and it is
> not immediately obvious to me how the kernel module could mix up the
> order, so it seems that it must be something that happens at the socket
> level?
The kernel module "produces" the CAN frames, so if you see them in the
correct order in wireshark, they have left the module in the right order.
> candump can3 -tz
> <snip>
> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
The timestamps are in the correct order. Maybe Oliver can help here,
he's an expert when it comes to strange reordering :)
> On the top level I am using CANFestival for CANOpen implementation, so
> it has occurred to me I could implement a CANFestival "driver" using
> libusb and completely bypass the kernel module and socket can layers,
> but I hope not to have to do this.
Na, you don't want to do this.
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: socket can receive order
2015-09-08 11:13 ` Marc Kleine-Budde
@ 2015-09-08 11:17 ` Daniel Squires
2015-09-08 11:20 ` Marc Kleine-Budde
2015-09-08 16:56 ` Oliver Hartkopp
2015-09-08 11:46 ` Wolfgang Grandegger
1 sibling, 2 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 11:17 UTC (permalink / raw)
To: Marc Kleine-Budde, linux-can, Oliver Hartkopp
On 08/09/15 12:13, Marc Kleine-Budde wrote:
> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>> On my laptop and Desktop PC I have not seen it happen.
>> Both the application PC (NUC) and the Laptop are running Ubuntu kernel
>> 3.19.0-26-generic
>>
>> The NUC has the kernel rebuilt without xhci due to problems it causes
>> with another USB peripheral.
>>
>> I am not entirely sure what you mean by which can core I am using but if
>> it helps i am opening the socket as follows :
> I mean what kind of CAN adapter...
>
>> sock = socket(PF_CAN,SOCK_RAW,CAN_RAW);
>>
>> in a small standalone test application which I wrote after having
>> difficulty with our main application.
>>
>> I am using custom hardware/firmware and am using the kernel module found
>> here : https://github.com/fabiobaltieri/open-usb-can
>> though it has a small change to stop the net queue at the top of
>> open_usb_can_start_xmit as otherwise its prone to loosing TX packets
>> when loaded.
> Yes, this looks racy - You should ask then to mainline working the driver.
>
>> I can see the packets coming in the correct order in wireshark and it is
>> not immediately obvious to me how the kernel module could mix up the
>> order, so it seems that it must be something that happens at the socket
>> level?
> The kernel module "produces" the CAN frames, so if you see them in the
> correct order in wireshark, they have left the module in the right order.
Sorry, I should have been clearer here: in wireshark I was looking at the
USB frames, not the CAN frames. However, I think what you say still stands,
since the timestamps are in the correct order.
>
>> candump can3 -tz
>> <snip>
>> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
>> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
>> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
> The timestamps are in the correct order. Maybe Oliver can help here,
> he's an expert when it comes to strange reordering :)
>
>> On the top level I am using CANFestival for CANOpen implementation, so
>> it has occurred to me I could implement a CANFestival "driver" using
>> libusb and completely bypass the kernel module and socket can layers,
>> but I hope not to have to do this.
> Na, you don't want to do this.
>
> Marc
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-08 11:17 ` Daniel Squires
@ 2015-09-08 11:20 ` Marc Kleine-Budde
2015-09-08 11:37 ` Daniel Squires
2015-09-08 16:56 ` Oliver Hartkopp
1 sibling, 1 reply; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 11:20 UTC (permalink / raw)
To: Daniel Squires, linux-can, Oliver Hartkopp
On 09/08/2015 01:17 PM, Daniel Squires wrote:
>> The kernel module "produces" the CAN frames, so if you see them in the
>> correct order in wireshark, they have left the module in the right order.
>
> Sorry , I should have been clearer here, in wireshark was looking at the
> USB frames not the CAN frames. however I think what you say still stands
> due to the time stamps being in the correct order.
Thanks for the clarification. Can you have a look at the CAN interface
with wireshark, too?
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: socket can receive order
2015-09-08 11:20 ` Marc Kleine-Budde
@ 2015-09-08 11:37 ` Daniel Squires
0 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 11:37 UTC (permalink / raw)
To: Marc Kleine-Budde, linux-can, Oliver Hartkopp
On 08/09/15 12:20, Marc Kleine-Budde wrote:
> On 09/08/2015 01:17 PM, Daniel Squires wrote:
>>> The kernel module "produces" the CAN frames, so if you see them in the
>>> correct order in wireshark, they have left the module in the right order.
>> Sorry , I should have been clearer here, in wireshark was looking at the
>> USB frames not the CAN frames. however I think what you say still stands
>> due to the time stamps being in the correct order.
> Thanks for the clarification. Can you have a look at the CAN interface
> with wireshark, too?
Wireshark shows the packets in the same order as candump. However, it
seems its timestamps reflect the order in which wireshark got the packets,
rather than when they were generated.
A couple of other observations: it seems to take longer for an out-of-order
packet to happen whilst wireshark is capturing, and on one
occasion my application saw an out-of-order packet which candump showed
as being in the correct order! This is a first, and I wonder if it is
related to wireshark capturing at the same time.
> Marc
>
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-08 11:13 ` Marc Kleine-Budde
2015-09-08 11:17 ` Daniel Squires
@ 2015-09-08 11:46 ` Wolfgang Grandegger
2015-09-08 11:49 ` Daniel Squires
` (2 more replies)
1 sibling, 3 replies; 20+ messages in thread
From: Wolfgang Grandegger @ 2015-09-08 11:46 UTC (permalink / raw)
To: Marc Kleine-Budde, Daniel Squires, linux-can, Oliver Hartkopp
On 08.09.2015 at 13:13, Marc Kleine-Budde wrote:
> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>> On my laptop and Desktop PC I have not seen it happen.
>
>> Both the application PC (NUC) and the Laptop are running Ubuntu kernel
>> 3.19.0-26-generic
>>
>> The NUC has the kernel rebuilt without xhci due to problems it causes
>> with another USB peripheral.
>>
>> I am not entirely sure what you mean by which can core I am using but if
>> it helps i am opening the socket as follows :
>
> I mean what kind of CAN adapter...
"https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
controller.
Wolfgang.
* Re: socket can receive order
2015-09-08 11:46 ` Wolfgang Grandegger
@ 2015-09-08 11:49 ` Daniel Squires
2015-09-08 11:56 ` Marc Kleine-Budde
2015-09-10 2:29 ` Tom Evans
2 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-08 11:49 UTC (permalink / raw)
To: Wolfgang Grandegger, Marc Kleine-Budde, linux-can, Oliver Hartkopp
Whilst I am using the kernel module of that project at present, the
firmware and hardware are not from that project, but instead based
around an STM32 MCU.
On 08/09/15 12:46, Wolfgang Grandegger wrote:
>
>
> On 08.09.2015 at 13:13, Marc Kleine-Budde wrote:
>> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>>> On my laptop and Desktop PC I have not seen it happen.
>>
>>> Both the application PC (NUC) and the Laptop are running Ubuntu kernel
>>> 3.19.0-26-generic
>>>
>>> The NUC has the kernel rebuilt without xhci due to problems it causes
>>> with another USB peripheral.
>>>
>>> I am not entirely sure what you mean by which can core I am using
>>> but if
>>> it helps i am opening the socket as follows :
>>
>> I mean what kind of CAN adapter...
>
> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
> controller.
>
> Wolfgang.
>
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-08 11:46 ` Wolfgang Grandegger
2015-09-08 11:49 ` Daniel Squires
@ 2015-09-08 11:56 ` Marc Kleine-Budde
2015-09-10 2:29 ` Tom Evans
2 siblings, 0 replies; 20+ messages in thread
From: Marc Kleine-Budde @ 2015-09-08 11:56 UTC (permalink / raw)
To: Wolfgang Grandegger, Daniel Squires, linux-can, Oliver Hartkopp
On 09/08/2015 01:46 PM, Wolfgang Grandegger wrote:
>> I mean what kind of CAN adapter...
>
> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
> controller.
ACK. But from the Linux driver side it's a USB device and Daniel is
using the driver from github.
Thanks,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
* Re: socket can receive order
2015-09-08 11:17 ` Daniel Squires
2015-09-08 11:20 ` Marc Kleine-Budde
@ 2015-09-08 16:56 ` Oliver Hartkopp
2015-09-09 2:30 ` Austin Schuh
1 sibling, 1 reply; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-08 16:56 UTC (permalink / raw)
To: Daniel Squires, Marc Kleine-Budde, linux-can
Hi all,
On 08.09.2015 13:17, Daniel Squires wrote:
> On 08/09/15 12:13, Marc Kleine-Budde wrote:
>>> I can see the packets coming in the correct order in wireshark and it is
>>> not immediately obvious to me how the kernel module could mix up the
>>> order, so it seems that it must be something that happens at the socket
>>> level?
>> The kernel module "produces" the CAN frames, so if you see them in the
>> correct order in wireshark, they have left the module in the right order.
Yes. This is trivial.
But Daniel is right to ask about frame reordering at socket level, or
better said, reordering outside the driver level.
>
> Sorry , I should have been clearer here, in wireshark was looking at the USB
> frames not the CAN frames. however I think what you say still stands due to
> the time stamps being in the correct order.
>>
>>> candump can3 -tz
>>> <snip>
>>> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
>>> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
>>> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
>> The timestamps are in the correct order. Maybe Oliver can help here,
>> he's an expert when it comes to strange reordering :)
Will try - see below.
>>
>>> On the top level I am using CANFestival for CANOpen implementation, so
>>> it has occurred to me I could implement a CANFestival "driver" using
>>> libusb and completely bypass the kernel module and socket can layers,
>>> but I hope not to have to do this.
>> Na, you don't want to do this.
The point is that it would not help either. Even if you are using a
PF_PACKET socket (which wireshark does), bypassing the CAN network layer
modules (can, can_raw) doesn't fix the problem.
I discussed the problem on the netdev ML when I discovered an out-of-order
issue while fixing the CAN_RAW join feature.
When you have a multicore SMP processor, the interrupt can be processed by
different CPUs, which can lead to packet reordering when using netif_rx() at
driver level.
The discussion ended with the networking guys pointing me to use NAPI which
does not really help, e.g. there's only one USB network adapter in
linux/drivers/net which is a complete mess.
My suggestion was to set a hash value into the socket buffer (skb) at driver
level, which is used for generating a 'flow' for IP traffic too. You can
generate flows by hashes to put all traffic from a specific IP into the same
per-cpu input queue to help TCP assembling the packets in the softirq for this
IP address in correct order (aha!).
See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
I assume the networking guys interpreted my suggestion as a hack because they
are not aware of how 'addressing' is done in CAN. They only know about IP ...
NAPI is not really a valid solution for CAN USB adapters and I think I'll have
to restart the discussion as out-of-order frames are a no-go for CAN as it
kills ISO15765-2 and (obviously) CANopen segmentation.
I assume Daniel uses a multicore system, right?
If so, please try the 'hack' I suggested on the netdev ML if it fixes your
problem. It might help for the discussion too.
Regards,
Oliver
* Re: socket can receive order
2015-09-08 16:56 ` Oliver Hartkopp
@ 2015-09-09 2:30 ` Austin Schuh
2015-09-09 3:10 ` Brian Silverman
` (2 more replies)
0 siblings, 3 replies; 20+ messages in thread
From: Austin Schuh @ 2015-09-09 2:30 UTC (permalink / raw)
To: Oliver Hartkopp, Daniel Squires, Marc Kleine-Budde, linux-can
On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>
> Hi all,
>
> On 08.09.2015 13:17, Daniel Squires wrote:
> > On 08/09/15 12:13, Marc Kleine-Budde wrote:
>
> >>> I can see the packets coming in the correct order in wireshark and it is
> >>> not immediately obvious to me how the kernel module could mix up the
> >>> order, so it seems that it must be something that happens at the socket
> >>> level?
> >> The kernel module "produces" the CAN frames, so if you see them in the
> >> correct order in wireshark, they have left the module in the right order.
>
> Yes. This is trivial.
>
> But Daniel is right to ask about the frame reordering on socket level - better
> say - reordering outside the driver level.
>
> >
> > Sorry , I should have been clearer here, in wireshark was looking at the USB
> > frames not the CAN frames. however I think what you say still stands due to
> > the time stamps being in the correct order.
> >>
> >>> candump can3 -tz
> >>> <snip>
> >>> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
> >>> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
> >>> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
> >> The timestamps are in the correct order. Maybe Oliver can help here,
> >> he's an expert when it comes to strange reordering :)
>
> Will try - see below.
>
> >>
> >>> On the top level I am using CANFestival for CANOpen implementation, so
> >>> it has occurred to me I could implement a CANFestival "driver" using
> >>> libusb and completely bypass the kernel module and socket can layers,
> >>> but I hope not to have to do this.
> >> Na, you don't want to do this.
>
> The point this that it would not help either - even if you are using the
> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
> modules (can, can_raw) doesn't fix the problem.
>
> I discussed the problem on netdev ML as I discovered a out-of-order issue when
> fixing the CAN_RAW join feature.
>
> When you have a multicore SMP processor the interrupt can be processed by
> different CPUs, which can lead to packet reordering when using netif_rx() on
> driver level.
>
> The discussion ended with the networking guys pointing me to use NAPI which
> does not really help, e.g. there's only one USB network adapter in
> linux/drivers/net which is a complete mess.
>
> My suggestion was to set a hash value into the socket buffer (skb) at driver
> level, which is used for generating a 'flow' for IP traffic too. You can
> generate flows by hashes to put all traffic from a specific IP into the same
> per-cpu input queue to help TCP assembling the packets in the softirq for this
> IP address in correct order (aha!).
>
> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>
> I assume the networking guys interpreted my suggestion as hack as they are not
> aware how 'addressing' is done in CAN. They only know about IP ...
>
> NAPI is not really a valid solution for CAN USB adapters and I think I'll have
> to restart the discussion as out-of-order frames are a no-go for CAN as it
> kills ISO15765-2 and (obviously) CANopen segmentation.
>
> I assume Daniel uses a multicore system, right?
>
> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
> problem. It might help for the discussion too.
>
> Regards,
> Oliver
On our boxes, I've been setting the affinity for both the IRQ thread
(we are running a RT kernel), and the interrupt to the same single
core. Would that help here?
We've seen CAN packets get significantly delayed causing overruns due
to Ethernet load and both CAN and ethernet sharing the same softirq.
Our solution has been to set the affinity for each of those to
different cores to keep them isolated.
Austin
* Re: socket can receive order
2015-09-09 2:30 ` Austin Schuh
@ 2015-09-09 3:10 ` Brian Silverman
2015-09-09 16:23 ` Oliver Hartkopp
2015-09-09 12:05 ` Daniel Squires
2015-09-09 16:14 ` Daniel Squires
2 siblings, 1 reply; 20+ messages in thread
From: Brian Silverman @ 2015-09-09 3:10 UTC (permalink / raw)
To: Austin Schuh
Cc: Oliver Hartkopp, Daniel Squires, Marc Kleine-Budde, linux-can
Another thing Austin and I do is set up RPS (receive packet steering)
for the Ethernet interfaces so the hardware sends the physical
Ethernet interrupts to that same core which isn't involved in
servicing the CAN interfaces at all.
On Tue, Sep 8, 2015 at 10:30 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core. Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin
* Re: socket can receive order
2015-09-09 2:30 ` Austin Schuh
2015-09-09 3:10 ` Brian Silverman
@ 2015-09-09 12:05 ` Daniel Squires
2015-09-09 16:14 ` Daniel Squires
2 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-09 12:05 UTC (permalink / raw)
To: Austin Schuh, Oliver Hartkopp, Marc Kleine-Budde, linux-can
On 09/09/15 03:30, Austin Schuh wrote:
> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> Hi all,
>>
>> On 08.09.2015 13:17, Daniel Squires wrote:
>>> On 08/09/15 12:13, Marc Kleine-Budde wrote:
>>>>> I can see the packets coming in the correct order in wireshark and it is
>>>>> not immediately obvious to me how the kernel module could mix up the
>>>>> order, so it seems that it must be something that happens at the socket
>>>>> level?
>>>> The kernel module "produces" the CAN frames, so if you see them in the
>>>> correct order in wireshark, they have left the module in the right order.
>> Yes. This is trivial.
>>
>> But Daniel is right to ask about the frame reordering on socket level - better
>> say - reordering outside the driver level.
>>
>>> Sorry , I should have been clearer here, in wireshark was looking at the USB
>>> frames not the CAN frames. however I think what you say still stands due to
>>> the time stamps being in the correct order.
>>>>> candump can3 -tz
>>>>> <snip>
>>>>> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
>>>>> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
>>>>> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
>>>> The timestamps are in the correct order. Maybe Oliver can help here,
>>>> he's an expert when it comes to strange reordering :)
>> Will try - see below.
>>
>>>>> On the top level I am using CANFestival for CANOpen implementation, so
>>>>> it has occurred to me I could implement a CANFestival "driver" using
>>>>> libusb and completely bypass the kernel module and socket can layers,
>>>>> but I hope not to have to do this.
>>>> Na, you don't want to do this.
>> The point this that it would not help either - even if you are using the
>> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
>> modules (can, can_raw) doesn't fix the problem.
I meant bypassing ALL the kernel CAN/socket layers and going directly from
USB frames to the application, which I think would avoid the problem. It
would also render tools such as wireshark and can-utils useless, though, so
I would rather avoid it.
The USB frames appear to arrive in order, as the timestamps (as shown by
candump) are in order, though the packets come out of recv() out of order.
Further testing reveals that some of them are significantly delayed at the
application level, by tens of ms, and in that time many newer packets
are received promptly (<1 ms).
>> I discussed the problem on netdev ML as I discovered a out-of-order issue when
>> fixing the CAN_RAW join feature.
>>
>> When you have a multicore SMP processor the interrupt can be processed by
>> different CPUs, which can lead to packet reordering when using netif_rx() on
>> driver level.
>>
>> The discussion ended with the networking guys pointing me to use NAPI which
>> does not really help, e.g. there's only one USB network adapter in
>> linux/drivers/net which is a complete mess.
>>
>> My suggestion was to set a hash value into the socket buffer (skb) at driver
>> level, which is used for generating a 'flow' for IP traffic too. You can
>> generate flows by hashes to put all traffic from a specific IP into the same
>> per-cpu input queue to help TCP assembling the packets in the softirq for this
>> IP address in correct order (aha!).
>>
>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>
>> I assume the networking guys interpreted my suggestion as hack as they are not
>> aware how 'addressing' is done in CAN. They only know about IP ...
>>
>> NAPI is not really a valid solution for CAN USB adapters and I think I'll have
>> to restart the discussion as out-of-order frames are a no-go for CAN as it
>> kills ISO15765-2 and (obviously) CANopen segmentation.
>>
>> I assume Daniel uses a multicore system, right?
Correct, a Core i5 in this case.
>>
>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>> problem. It might help for the discussion too.
>>
>> Regards,
>> Oliver
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core. Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin
>
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-09 2:30 ` Austin Schuh
2015-09-09 3:10 ` Brian Silverman
2015-09-09 12:05 ` Daniel Squires
@ 2015-09-09 16:14 ` Daniel Squires
2015-09-09 16:31 ` Oliver Hartkopp
2015-09-17 19:18 ` Oliver Hartkopp
2 siblings, 2 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-09 16:14 UTC (permalink / raw)
To: Austin Schuh, Oliver Hartkopp, Marc Kleine-Budde, linux-can
The hack seems to work. It has only been a short test of half an hour so far,
but before the change the reordering happened reliably within a few seconds.
On 09/09/15 03:30, Austin Schuh wrote:
> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>> Hi all,
>>
>> On 08.09.2015 13:17, Daniel Squires wrote:
>>> On 08/09/15 12:13, Marc Kleine-Budde wrote:
>>>>> I can see the packets coming in the correct order in wireshark and it is
>>>>> not immediately obvious to me how the kernel module could mix up the
>>>>> order, so it seems that it must be something that happens at the socket
>>>>> level?
>>>> The kernel module "produces" the CAN frames, so if you see them in the
>>>> correct order in wireshark, they have left the module in the right order.
>> Yes. This is trivial.
>>
>> But Daniel is right to ask about the frame reordering on socket level - better
>> say - reordering outside the driver level.
>>
>>> Sorry , I should have been clearer here, in wireshark was looking at the USB
>>> frames not the CAN frames. however I think what you say still stands due to
>>> the time stamps being in the correct order.
>>>>> candump can3 -tz
>>>>> <snip>
>>>>> (003.088648) can3 043 [8] F7 2D 00 00 00 00 00 00
>>>>> (003.089149) can3 045 [8] F9 2D 00 00 00 00 00 00
>>>>> (003.088897) can3 044 [8] F8 2D 00 00 00 00 00 00
>>>> The timestamps are in the correct order. Maybe Oliver can help here,
>>>> he's an expert when it comes to strange reordering :)
>> Will try - see below.
>>
>>>>> On the top level I am using CANFestival for CANOpen implementation, so
>>>>> it has occurred to me I could implement a CANFestival "driver" using
>>>>> libusb and completely bypass the kernel module and socket can layers,
>>>>> but I hope not to have to do this.
>>>> Na, you don't want to do this.
>> The point this that it would not help either - even if you are using the
>> PF_PACKET socket (which wireshark does) - bypassing the CAN network layer
>> modules (can, can_raw) doesn't fix the problem.
>>
>> I discussed the problem on netdev ML as I discovered an out-of-order issue when
>> fixing the CAN_RAW join feature.
>>
>> When you have a multicore SMP processor the interrupt can be processed by
>> different CPUs, which can lead to packet reordering when using netif_rx() on
>> driver level.
>>
>> The discussion ended with the networking guys pointing me to use NAPI which
>> does not really help, e.g. there's only one USB network adapter in
>> linux/drivers/net which is a complete mess.
>>
>> My suggestion was to set a hash value into the socket buffer (skb) at driver
>> level, which is used for generating a 'flow' for IP traffic too. You can
>> generate flows by hashes to put all traffic from a specific IP into the same
>> per-cpu input queue to help TCP assembling the packets in the softirq for this
>> IP address in correct order (aha!).
>>
>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>
>> I assume the networking guys interpreted my suggestion as hack as they are not
>> aware how 'addressing' is done in CAN. They only know about IP ...
>>
>> NAPI is not really a valid solution for CAN USB adapters and I think I'll have
>> to restart the discussion as out-of-order frames are a no-go for CAN as it
>> kills ISO15765-2 and (obviously) CANopen segmentation.
>>
>> I assume Daniel uses a multicore system, right?
>>
>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>> problem. It might help for the discussion too.
>>
>> Regards,
>> Oliver
> On our boxes, I've been setting the affinity for both the IRQ thread
> (we are running a RT kernel), and the interrupt to the same single
> core. Would that help here?
>
> We've seen CAN packets get significantly delayed causing overruns due
> to Ethernet load and both CAN and ethernet sharing the same softirq.
> Our solution has been to set the affinity for each of those to
> different cores to keep them isolated.
>
> Austin
>
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-09 3:10 ` Brian Silverman
@ 2015-09-09 16:23 ` Oliver Hartkopp
0 siblings, 0 replies; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-09 16:23 UTC (permalink / raw)
To: Brian Silverman, Austin Schuh, Daniel Squires
Cc: Marc Kleine-Budde, linux-can
On 09.09.2015 05:10, Brian Silverman wrote:
> Another thing Austin and I do is set up RPS (receive packet steering)
> for the Ethernet interfaces so the hardware sends the physical
> Ethernet interrupts to that same core which isn't involved in
> servicing the CAN interfaces at all.
>
> On Tue, Sep 8, 2015 at 10:30 PM, Austin Schuh <austin@peloton-tech.com> wrote:
>>
>> On our boxes, I've been setting the affinity for both the IRQ thread
>> (we are running a RT kernel), and the interrupt to the same single
>> core. Would that help here?
Yes it does.
Nailing the interrupts from ethernet and CAN interfaces to different CPUs by
setting irq_affinity is a valid but pretty hard solution.
Therefore I was trying to use hash based RPS to fix the out-of-order problem
and let the kernel networking do the (hopefully optimal) rest.
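As a rough illustration of the irq_affinity approach mentioned above, the following sketch writes a single-CPU mask to /proc/irq/<irq>/smp_affinity, which takes a hexadecimal CPU bitmask. The helper names and the IRQ number are hypothetical; the real IRQ has to be looked up in /proc/interrupts (for a USB CAN adapter that would presumably be the USB host controller's interrupt), and the write needs root.

```c
#include <stdio.h>

/* Format the hexadecimal CPU bitmask that selects exactly one CPU,
 * in the form accepted by /proc/irq/<irq>/smp_affinity.
 * Returns 0 on success, -1 if the CPU index or buffer is unusable. */
int single_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
    if (cpu >= 32 || buf == NULL || len == 0)
        return -1;
    int n = snprintf(buf, len, "%x", 1u << cpu);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Pin one IRQ to one CPU. The IRQ number is a placeholder: look it up
 * in /proc/interrupts for your own CAN or Ethernet device. */
int pin_irq_to_cpu(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[16];

    if (single_cpu_mask(cpu, mask, sizeof(mask)) != 0)
        return -1;
    snprintf(path, sizeof(path), "/proc/irq/%u/smp_affinity", irq);

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(mask, f) >= 0;
    /* fclose flushes the buffered write, so a rejected mask shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling pin_irq_to_cpu() once for the CAN-related IRQ and once for the Ethernet IRQ with different CPU indexes would give the kind of separation described in the quoted text. With an RT kernel the IRQ thread's affinity has to be set as well, which this sketch does not cover.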
>> We've seen CAN packets get significantly delayed causing overruns due
>> to Ethernet load and both CAN and ethernet sharing the same softirq.
>> Our solution has been to set the affinity for each of those to
>> different cores to keep them isolated.
Yes. That's a good point. Due to the fact that you split up ethernet and CAN
on different CPUs the softirq is also running on different CPUs.
This could also be the solution for Daniel's problem!
My suggested solution (aka 'hack'
http://marc.info/?l=linux-netdev&m=143689694125450&w=2 ) with the hash based
RPS does not split the ethernet/CAN traffic among CPUs as - depending on the
IP hashes - some of the ethernet traffic can be pushed into the same CPU we
use for the CAN interface.
So it's a softer solution which at least fixes out-of-order for CAN interfaces.
Regards,
Oliver
ps. There were some performance tests (vanilla and RT kernel) from the
University of Prague where you can see the impact of additional ethernet load:
http://rtime.felk.cvut.cz/can/
http://rtime.felk.cvut.cz/can/benchmark/3.0/
* Re: socket can receive order
2015-09-09 16:14 ` Daniel Squires
@ 2015-09-09 16:31 ` Oliver Hartkopp
2015-09-17 19:18 ` Oliver Hartkopp
1 sibling, 0 replies; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-09 16:31 UTC (permalink / raw)
To: Daniel Squires, Austin Schuh, Marc Kleine-Budde, linux-can
On 09.09.2015 18:14, Daniel Squires wrote:
> The Hack seems to work, its been a short test of a half hour so far, but
> before it happened reliably after a few seconds.
Great! You should not see any out-of-order frames anymore.
I obviously have to start a new attempt to push that single line of source
code into mainline :-)
If it doesn't help to fix your latency problem under ethernet load, you might
check the fixed irq_affinity setting for separating ethernet/CAN CPUs that
Brian and Austin were suggesting.
Thanks for the feedback,
Oliver
* Re: socket can receive order
2015-09-08 11:46 ` Wolfgang Grandegger
2015-09-08 11:49 ` Daniel Squires
2015-09-08 11:56 ` Marc Kleine-Budde
@ 2015-09-10 2:29 ` Tom Evans
2015-09-10 8:08 ` Daniel Squires
2 siblings, 1 reply; 20+ messages in thread
From: Tom Evans @ 2015-09-10 2:29 UTC (permalink / raw)
To: Daniel Squires, linux-can
On 08/09/15 21:46, Wolfgang Grandegger wrote:
>
> Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde:
>> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>>> On my laptop and Desktop PC I have not seen it happen.
>> I mean what kind of CAN adapter...
>
> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
> controller.
http://fabiobaltieri.com/2013/07/23/hacking-into-a-vehicle-can-bus-toyothack-and-socketcan/#more-1419
"my own open hardware USB AVR + MCP2515 interface", "the performances are not
that good above 250kbps", "It’s tempting to use an SPI controller (the MCP2515
is very common), but that has terrible performances on highly loaded fast
busses, and you will end up with problem such as RX buffer underruns and
out-of-order frames."
He means "overruns". The MCP2515 doesn't have a FIFO. Messages have to be read
out over a slow SPI bus one bit at a time within one message time or it
overruns. Or two if the BUKT bit is set, but that risks reading messages in
the wrong order.
The design uses an ATMEGA32U2 and an MCP2515. I can't see why it shouldn't be
able to buffer messages from the MCP2515 at relatively high data rates, if the
code is well written. From my experience though, code for the MCP2515 is
seldom "well written". It is too easy to fall into a trap and get the message
arrival order wrong.
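To put rough numbers on "within one message time", here is a small back-of-the-envelope sketch. All figures are assumptions for illustration and do not come from the open-usb-can firmware: a classic CAN frame with an 11-bit ID and 8 data bytes is taken as 111 bit times including the interframe space (no stuff bits), one receive-buffer read is taken as 14 bytes over SPI (one command byte plus 13 buffer bytes), and the SPI clock is taken as 4 MHz.

```c
#include <stdint.h>

/* Time on the wire for one CAN frame, in nanoseconds.
 * frame_bits: number of bit times per frame (assumed, e.g. 111).
 * bitrate:    CAN bitrate in bit/s. */
uint64_t frame_time_ns(uint64_t frame_bits, uint64_t bitrate)
{
    return frame_bits * 1000000000ull / bitrate;
}

/* Raw SPI transfer time for a given number of bytes, in nanoseconds.
 * This ignores chip-select handling, interrupt latency and firmware
 * overhead, which in practice are likely to dominate. */
uint64_t spi_time_ns(uint64_t bytes, uint64_t spi_hz)
{
    return bytes * 8ull * 1000000000ull / spi_hz;
}
```

Under these assumptions frame_time_ns(111, 1000000) is 111000 ns while spi_time_ns(14, 4000000) is 28000 ns, so the raw transfer fits into one frame time even at 1 Mbit/s. The remaining budget is what the firmware has to spend on interrupt handling and buffering.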
This is unlikely to be related to the OP's problem, but just something to be
aware of.
For anybody still coding and debugging MCP2515 stuff:
http://www.microchip.com/forums/m620741.aspx
> otherwise its prone to loosing TX packets when loaded.
Do you know about having to do something like the following to stop CAN
Transmit Drops? The networking stack defaults to DROPPING CAN transmit frames
before blocking the socket if you don't.
/bin/echo 256 > /sys/class/net/can0/tx_queue_len
...
/* Minimum socket buffer to try and get it blocking */
int sndbuf = (250 + 8) * 256;
socklen_t socklen = sizeof(sndbuf);
rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
                &sndbuf, socklen);
http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
Tom
* Re: socket can receive order
2015-09-10 2:29 ` Tom Evans
@ 2015-09-10 8:08 ` Daniel Squires
0 siblings, 0 replies; 20+ messages in thread
From: Daniel Squires @ 2015-09-10 8:08 UTC (permalink / raw)
To: tom_usenet, linux-can
On 10/09/15 03:29, Tom Evans wrote:
> On 08/09/15 21:46, Wolfgang Grandegger wrote:
>>
>> Am 08.09.2015 um 13:13 schrieb Marc Kleine-Budde:
>>> On 09/08/2015 12:41 PM, Daniel Squires wrote:
>>>> On my laptop and Desktop PC I have not seen it happen.
>>> I mean what kind of CAN adapter...
>>
>> "https://github.com/fabiobaltieri/open-usb-can" seems to use a MCP2515
>> controller.
>
> http://fabiobaltieri.com/2013/07/23/hacking-into-a-vehicle-can-bus-toyothack-and-socketcan/#more-1419
>
>
> "my own open hardware USB AVR + MCP2515 interface", "the performances
> are not that good above 250kbps", "It’s tempting to use an SPI
> controller (the MCP2515 is very common), but that has terrible
> performances on highly loaded fast busses, and you will end up with
> problem such as RX buffer underruns and out-of-order frames."
>
> He means "overruns". The MCP2515 doesn't have a FIFO. Messages have to
> be read out over a slow SPI bus one bit at a time within one message
> time or it overruns. Or two if the BUKT bit is set, but that risks
> reading messages in the wrong order.
>
> The design uses an ATMEGA32U2 and an MCP2515. I can't see why it
> shouldn't be able to buffer messages from the MCP2515 at relatively
> high data rates, if the code is well written. From my experience
> though, code for the MCP2515 is seldom "well written". It is too easy
> to fall into a trap and get the message arrival order wrong.
>
> This is unlikely to be related to the OP's problem, but just something
> to be aware of.
Yes, I had read that whole article and didn't use the hardware /
firmware there for those reasons. I also needed 1 Mbit/s. I am using the
STM32F4 Discovery boards with a CAN PHY attached. I didn't know where to
start with the kernel module, which is why I am using the one from there at
present. It would be nice to get a "standardised" USB class kernel
module, but I guess that would require input from the USB implementers
group.
>
> For anybody still coding and debugging MCP2515 stuff:
>
> http://www.microchip.com/forums/m620741.aspx
>
> > otherwise its prone to loosing TX packets when loaded.
>
> Do you know about having to do something like the following to stop
> CAN Transmit Drops? The networking stack defaults to DROPPING CAN
> transmit frames before blocking the socket if you don't.
>
> /bin/echo 256 > /sys/class/net/can0/tx_queue_len
> ...
> int sndbuf = (250 + 8) * 256;
> socklen_t socklen = sizeof(sndbuf);
> /* Minimum socket buffer to try and get it blocking */
> rc = setsockopt(pSkt->skt, SOL_SOCKET, SO_SNDBUF,
> &sndbuf, sizeof(sndbuf));
>
I hadn't noticed it could be done in that way; I had been using the ip
utility, but was aware of the default queue size of 10 frames and that it
could be changed. In my very basic OOO test app I'm actually sending
packets with incrementing values until there is no space (send returns
ENOBUFS), then doing the receives and checking the values until there is
nothing to receive, before continuing to send from the previously failed
value.
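The checking half of such a test could look roughly like the sketch below. It is not the actual test utility; it assumes the incrementing value sits in the first two data bytes, least significant byte first, which matches the candump lines quoted earlier in the thread (F7 2D, F9 2D, F8 2D). Both function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the test counter occupies the first two data bytes of the
 * frame, least significant byte first (F7 2D -> 0x2DF7). */
uint16_t frame_counter(const uint8_t *data)
{
    return (uint16_t)(data[0] | (data[1] << 8));
}

/* Return the index of the first counter that is not exactly one greater
 * than its predecessor (with 16-bit wrap-around), or -1 if the whole
 * sequence is in order. */
long first_out_of_order(const uint16_t *counters, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (counters[i] != (uint16_t)(counters[i - 1] + 1))
            return (long)i;
    }
    return -1;
}
```

Fed with the three frames from the candump output, first_out_of_order() reports index 1, the frame with ID 045 that was delivered before 044.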
>
> http://socket-can.996257.n3.nabble.com/Solving-ENOBUFS-returned-by-write-td2886.html
>
>
> Tom
>
>
>
--
Dan Squires
Engineered Arts Ltd.
* Re: socket can receive order
2015-09-09 16:14 ` Daniel Squires
2015-09-09 16:31 ` Oliver Hartkopp
@ 2015-09-17 19:18 ` Oliver Hartkopp
1 sibling, 0 replies; 20+ messages in thread
From: Oliver Hartkopp @ 2015-09-17 19:18 UTC (permalink / raw)
To: Daniel Squires, Austin Schuh, Marc Kleine-Budde, linux-can
Hello Daniel,
On 09.09.2015 18:14, Daniel Squires wrote:
> The Hack seems to work, its been a short test of a half hour so far, but
> before it happened reliably after a few seconds.
>
>> On Tue, Sep 8, 2015 at 9:56 AM Oliver Hartkopp <socketcan@hartkopp.net> wrote:
>>>
>>> See http://marc.info/?l=linux-netdev&m=143689694125450&w=2
>>>
>>> If so, please try the 'hack' I suggested on the netdev ML if it fixes your
>>> problem. It might help for the discussion too.
In the referenced posting above I suggested to set
skb_set_hash(skb, dev->ifindex, PKT_HASH_TYPE_L2);
to create an interface-specific hash for the socket buffer.
And then to enable the receive packet steering (RPS) with
echo f > /sys/class/net/can0/queues/rx-0/rps_cpus
To create a proper patch and description I evaluated some more skb_set_hash()
parameters and finally discovered that setting the skb hash seems to be
obsolete ... %-)
Can you confirm that
echo f > /sys/class/net/can0/queues/rx-0/rps_cpus
already fixes the out-of-order issue even without setting the skb hash?
If so we could give a general recommendation for multi-core CPU system users
to enable RPS for CAN interfaces by setting the specific sysfs entry.
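For setups where the sysfs entry should be written from a program rather than with echo, a minimal sketch could look like this. The helper names are made up for this example; the path is the one used above, the value is a hexadecimal CPU bitmask (so "f" selects CPUs 0-3), and the write needs root.

```c
#include <stdio.h>

/* Build the sysfs path of the RPS CPU mask for receive queue 0. */
int rps_cpus_path(const char *ifname, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/queues/rx-0/rps_cpus",
                     ifname);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Hex mask with the lowest ncpus bits set, e.g. 4 CPUs -> "f".
 * Limited to 32 CPUs to keep the sketch simple. */
int all_cpus_mask(unsigned int ncpus, char *buf, size_t len)
{
    if (ncpus == 0 || ncpus > 32)
        return -1;
    unsigned int mask = (ncpus == 32) ? 0xffffffffu : ((1u << ncpus) - 1u);
    int n = snprintf(buf, len, "%x", mask);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Enable RPS on rx-0 of the given interface for the first ncpus CPUs. */
int enable_rps(const char *ifname, unsigned int ncpus)
{
    char path[128], mask[16];

    if (rps_cpus_path(ifname, path, sizeof(path)) != 0 ||
        all_cpus_mask(ncpus, mask, sizeof(mask)) != 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(mask, f) >= 0;
    if (fclose(f) != 0)   /* the write is flushed here */
        ok = 0;
    return ok ? 0 : -1;
}
```

enable_rps("can0", 4) corresponds to the echo command above on a four-core system.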
Regards,
Oliver
end of thread, other threads:[~2015-09-17 19:24 UTC | newest]
Thread overview: 20+ messages
2015-09-08 9:42 socket can receive order Daniel Squires
2015-09-08 10:01 ` Marc Kleine-Budde
2015-09-08 10:41 ` Daniel Squires
2015-09-08 11:13 ` Marc Kleine-Budde
2015-09-08 11:17 ` Daniel Squires
2015-09-08 11:20 ` Marc Kleine-Budde
2015-09-08 11:37 ` Daniel Squires
2015-09-08 16:56 ` Oliver Hartkopp
2015-09-09 2:30 ` Austin Schuh
2015-09-09 3:10 ` Brian Silverman
2015-09-09 16:23 ` Oliver Hartkopp
2015-09-09 12:05 ` Daniel Squires
2015-09-09 16:14 ` Daniel Squires
2015-09-09 16:31 ` Oliver Hartkopp
2015-09-17 19:18 ` Oliver Hartkopp
2015-09-08 11:46 ` Wolfgang Grandegger
2015-09-08 11:49 ` Daniel Squires
2015-09-08 11:56 ` Marc Kleine-Budde
2015-09-10 2:29 ` Tom Evans
2015-09-10 8:08 ` Daniel Squires