* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Harald Mommer @ 2021-06-29 19:39 UTC (permalink / raw)
To: Marc Kleine-Budde, Oliver Hartkopp; +Cc: linux-can
[Re-sent because some mechanism on the mailing list thought this was
SPAM and rejected it.
Looks like the list does not like it when Thunderbird composes an HTML
e-mail. Setting changed, retrying.]
Hello,
On 25.06.21 at 11:39, Marc Kleine-Budde wrote:
> It makes sense to have a TX done notification. You probably need this
> for proper queue handling and throttling.
Yes. But these acknowledgements must be 100% reliable under all possible
load conditions; otherwise testers will prove that the solution only
works when the sun is shining, not in bad weather.
>
>>> Can you sketch a quick block diagram showing guest, host, Virtio device,
>>> Virtio driver, etc...
>> I hope this arrives on the list as it has been sent and not garbled:
>>
>> Guest 2 | Guest3
>> ---------------- | ----------------
>> ! cangen, ! | ! cangen, !
>> ! candump, ! | ! candump, !
>> ! cansend ! | ! cansend !
>> ! using vcan0 ! | ! using can0 !
>> ---------------- | ----------------
>> ^ | ^
>> ! --------------------- | !
>> ! ! Service process ! | !
>> ! ! in user space ! | !
> Oliver has already commented on this :) Getting feedback from the
> community early could have saved you some work :)
I still don't get it. This service process is the virtio device itself.
All our virtio devices are user-land processes. There is no problem;
this works that way.
The problem may be that the virtio device should not have used vcan0 to
get CAN access and should have used something different instead.
CAN GW? Is that what you have been trying to tell me all along: "Do
not use vcan0 to exchange CAN messages, use CAN GW"? In that case, the
box "Device Linux / VCAN / vcan0" in the picture changes, but not the
user-land virtio CAN device service process box.
If that's it, I'll look into CAN GW to understand what all this means
and how to use it.
But in any case, this should not have any impact on the driver or the
spec; it would be an issue of the device implementation itself, which
is closed source and should not be that interesting here.
>> ! ! virtio-can device ! | !
>> ! ! forwarding vcan0 ! | !
>> ! --------------------- | !
>> ! ^ ^ | !
>> ! ! ! | !
>> --------------------------------------------------
>> ! ! Device side ! kernel | Driver side ! kernel
>> v v v | v
>> ---------------- -------------- | ----------------
>> ! Device Linux ! ! HV support ! | ! Driver Linux !
>> ! VCan ! ! module ! | ! Virtio CAN !
>> ! vcan0 ! ! on device ! | ! can0 !
>> ! ! ! side ! | ! !
>> ---------------- -------------- | ----------------
>> ^ ^ | ^
>> ! ! | !
>> --------------------------------------------------
>> ! ! ! Hypervisor
>> v v v
>> --------------------------------------------------
>> ! COQOS-HV !
>> --------------------------------------------------
>>
>>
> IC - as I'm not interested in closed source solution I'd focus on the
> qemu use case. Good thing is, the virtio-can must handle both use cases
> anyways.
For me, qemu is currently an unknown environment to develop for.
There are already some challenges in this project, and at some point
there are too many. We have to discuss if and how qemu is to be
addressed.
> Your user space bridge is the wrong solution here.....See Oliver's mail.
In our architecture the virtio devices are always user-land processes.
The only question is what exactly is to be bridged.
>> Nothing that should be done now; this is getting far too complicated for a
>> first shot at implementing a Virtio CAN device.
>>
>>> We don't have a feature flag to query if the Linux driver supports proper
>>> CAN echo on TX complete notification.
>> Not so nice. But the device integrator should know which backend is used, and
>> with a command line option for the device application the issue can be
>> handled. I need the command line switch anyway now to do experiments.
> If needed we can add flags to the CAN drivers so that they are
> introspectable, maybe via the ethtool interface.
I understand that nothing here is set in stone. I did not
expect that something like this would be possible.
> Marc
Harald
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Oliver Hartkopp @ 2021-06-30 7:27 UTC (permalink / raw)
To: Harald Mommer, Marc Kleine-Budde; +Cc: linux-can
On 29.06.21 21:39, Harald Mommer wrote:
> I still don't get it. This service process is the virtio device itself.
> All our virtio devices are user land processes. There is no problem,
> this works that way.
Works this way ... well, AFAIK virtio devices are usually not user space
implementations.
> The problem may be that the virtio device should better not have used
> vcan0 to get CAN access and that it should have used something different
> instead. CAN GW? Is it that what you want to tell me all the time? "Do
> not use vcan0 to exchange CAN messages but use CAN GW"?
You would still use vcan0 or whatever you name it. But the
"routing between CAN interfaces" can be done more efficiently inside the
kernel.
> In this case in
> the picture the box "Device Linux / VCAN / vcan0" changes but not the
> userland virtio CAN device service process box.
My suggestion is more like: create a virtual CAN device that exposes the
virtio net driver as a CAN device inside kernel space.
And then you can use can-gw to do filtering/firewalling/forwarding to
different application-specific vcans.
> If it's this I'll get into CAN GW to understand what all this means now
> and how to use it.
Just try this (as root):
modprobe can-gw
cangw -A -s vcan0 -d vcan1 -e
cangw -A -s vcan0 -d vcan2 -e -m OR:ID:400.8.8888888888888888
cangen vcan0
(and run candump -c -c any in a second terminal)
This should give an impression. No filtering shown.
> But anyway, if so this should not have any impact on the driver or the
> spec, this would be an issue of the device implementation itself which
> is closed source and should now not be this interesting.
IMO a CAN virtio driver can be of public interest, and it has no USP.
So why put such a simple thing under closed source?
Regards,
Oliver
ps. Some can-gw / CAN net namespace slideware:
https://wiki.automotivelinux.org/_media/agl-distro/agl2018-socketcan.pdf
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Harald Mommer @ 2021-06-28 13:47 UTC (permalink / raw)
To: Oliver Hartkopp, linux-can
Hello Oliver,
On 24.06.21 at 20:45, Oliver Hartkopp wrote:
>
> What is this 'device application' in the sketch below?
The device application implements the virtio CAN device, using an
existing CAN device (here vcan) as its backend.
>
>>> Can you sketch a quick block diagram showing guest, host, Virtio
>>> device,
>>> Virtio driver, etc...
>>
>> I hope this arrives on the list as it has been sent and not garbled:
>>
>> Guest 2 | Guest3
>> ---------------- | ----------------
>> ! cangen, ! | ! cangen, !
>> ! candump, ! | ! candump, !
>> ! cansend ! | ! cansend !
>> ! using vcan0 ! | ! using can0 !
>> ---------------- | ----------------
>> ^ | ^
>> ! --------------------- | !
>> ! ! Service process ! | !
>> ! ! in user space ! | !
>> ! ! virtio-can device ! | !
>> ! ! forwarding vcan0 ! | !
>> ! --------------------- | !
>
> Hopefully not this "Service process in user space" ???
The virtio CAN device is the "Service process in user space".
>
> If so, this is a very questionable approach!
>
> To route/forward/manipulate CAN frames between CAN network interfaces
> there is a CAN gateway module 'can-gw' which can be controlled over
> PF_NETLINK.
>
> The can-gw runs super efficient and fast inside kernel space in the
> SOFTIRQ context.
>
> E.g. 22,000 CAN frames/s with 6% sys load on a 2 core i7 from 2012,
> here: https://youtu.be/O3eOjfTl1yk?t=89
>
> Just type cangw from the can-utils to get an impression of the powerful
> options.
>
> You can even calculate E2E CRCs and XOR checksums after doing content
> mods on the fly.
>
>> ! ^ ^ | !
>> ! ! ! | !
>> --------------------------------------------------
>> ! ! Device side ! kernel | Driver side ! kernel
>> v v v | v
>> ---------------- -------------- | ----------------
>> ! Device Linux ! ! HV support ! | ! Driver Linux !
>> ! VCan ! ! module ! | ! Virtio CAN !
>> ! vcan0 ! ! on device ! | ! can0 !
>> ! ! ! side ! | ! !
>> ---------------- -------------- | ----------------
>> ^ ^ | ^
>> ! ! | !
>> --------------------------------------------------
>> ! ! ! Hypervisor
>> v v v
>> --------------------------------------------------
>> ! COQOS-HV !
>> --------------------------------------------------
>>
>
> (..)
>
>> can be handled. Need the command line switch anyway now to do
>> experiments.
>
> Now with cangw ?!? ;-)
No. We cannot do this here with something that already exists, like CAN
GW. We are not talking about user processes running on the same Linux
instance that want to communicate with each other. This might have been
the misunderstanding here.
We are talking about two different virtual machines, both running
different OS instances under a hypervisor! And one or both VMs may not
even run Linux as the OS. In a future setup the device VM could run
an RTOS, maybe using an AUTOSAR CAN driver as backend, which might
even come from a 3rd party.
In the current setup we have 2 VMs running different instances of Linux
on the same physical machine under hypervisor control. Only the left VM,
the device VM, has access to any hardware (like a CAN controller). The
right VM has no direct access to any hardware at all. To be able to send
and receive frames in the right (driver) VM, we have to do something to
get out to the external world. Currently nothing exists to do this for
CAN, so we have to create the new virtio CAN device, which allows
access to a (physical) CAN controller via Virtio means.
>
> Regards,
> Oliver
>
Regards
Harald
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Marc Kleine-Budde @ 2021-06-25 9:39 UTC (permalink / raw)
To: Harald Mommer; +Cc: linux-can
On 24.06.2021 17:21:15, Harald Mommer wrote:
> Hello,
>
> On 18.06.21 at 11:16, Marc Kleine-Budde wrote:
> > On 17.06.2021 14:22:03, Harald Mommer wrote:
> > > we are currently in the process of developing a draft specification for
> > > Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
> > > driver and a Virtio CAN Linux device
> > Oh that sounds interesting. Please keep the linux-can mailing list in
> > the loop. Do you have a first draft version for review, yet?
>
> First draft went to virtio-comment@lists.oasis-open.org and
> virtio-dev@lists.oasis-open.org.
>
> https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results
>
> The link should reveal the short conversation. I am currently working on the
> next draft, which incorporates the review comments I got so far and will
> also address the "TX ACK" problem we are discussing here.
>
> In the future I will put the Linux-CAN list in the loop.
>
> > > running on top of our hypervisor solution.
> > >
> > > The Virtio CAN Linux device forwards an existing SocketCAN CAN device
> > > (currently vcan) via Virtio to the Virtio driver guest so that the virtual
> > > driver guest can send and receive CAN frames via SocketCAN.
> > >
> > > What was originally planned (probably with too much AUTOSAR CAN driver
> > > semantics in my head and too little SocketCAN knowledge) is to mark a
> > > transmission request as used (done) when it has finally been sent on the
> > > CAN bus (vs. when it has been handed to SocketCAN, i.e. not really done but
> > > still pending somewhere in the protocol stack).
> > Makes sense.
>
> I read the "Makes sense". But reading the rest of the e-mail (and the
> thread) as well, I get the impression that making this timing requirement
> mandatory when using SocketCAN is asking for trouble.
It makes sense to have a TX done notification. You probably need this
for proper queue handling and throttling.
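To illustrate the queue-handling point, here is a minimal C sketch, not taken from the virtio-can draft or from any existing driver, of a device that bounds its in-flight TX requests and frees a slot only on a TX-done notification. The names `struct tx_tracker`, `tx_submit()`, `tx_done()` and `TX_SLOTS` are hypothetical, and the sketch assumes TX-done notifications arrive in submission order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-flight limit; a real device would derive it from its
 * virtqueue size or its CAN controller's TX resources. */
#define TX_SLOTS 4

/* FIFO of in-flight TX request ids: a request holds a slot from submission
 * until its TX-done notification arrives. */
struct tx_tracker {
	uint32_t id[TX_SLOTS];
	size_t head;  /* index of the oldest in-flight request */
	size_t count; /* number of requests currently in flight */
};

static void tx_tracker_init(struct tx_tracker *t)
{
	t->head = 0;
	t->count = 0;
}

/* Returns false when all slots are busy; the caller must then throttle
 * (e.g. stop its TX queue) until a TX-done notification frees a slot. */
static bool tx_submit(struct tx_tracker *t, uint32_t id)
{
	if (t->count == TX_SLOTS)
		return false;
	t->id[(t->head + t->count) % TX_SLOTS] = id;
	t->count++;
	return true;
}

/* Handles one TX-done notification, assuming notifications arrive in
 * submission order: stores the oldest request id in *id and frees its slot.
 * Returns false if nothing was in flight. */
static bool tx_done(struct tx_tracker *t, uint32_t *id)
{
	if (t->count == 0)
		return false;
	*id = t->id[t->head];
	t->head = (t->head + 1) % TX_SLOTS;
	t->count--;
	return true;
}
```

Every lost notification leaks one slot for good: after `TX_SLOTS` losses, `tx_submit()` keeps returning false and TX is stuck, which is the symptom reported in this thread under heavy load.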
> - Could remove the timing requirement. This is the easy solution. But there
> is the "Makes sense".
>
> - The original strict timing requirement becomes an option, so it's not a
> mandatory requirement.
>
> The 2nd is my favorite (but I tend to over-engineer in the first shot, so
> the first option may indeed be the better one).
>
> Not having this timing behavior implies that some other things in the next
> virtio draft spec have to change, which here means they get simplified.
>
> > > Thought this was doable with some implementation effort using
> > >
> > > setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating the
> > > MSG_CONFIRM bit on received messages.
> > Where does that code run? Would that be part of qemu running on the host
> > of an open source solution?
> The device application is closed source, runs under the COQOS hypervisor
> which is also closed source.
Ok
> A qemu device implementation is not planned as of now. The virtio CAN
> driver is a Linux device driver and will be open sourced at some point
> in time, in the hope of getting it upstreamed in the more distant future.
I suggest posting the code as early as possible, probably along with the
next round of the virtio-can spec RFC.
> Currently the driver is on an internal development branch, outsiders
> cannot see it (still better for everyone)
I doubt that :) I think the Linux community has seen a lot of code that
has been cooking for too long before trying to bring it mainline.
> and the colleagues are reviewing it and helping to bring it into an
> acceptable shape.
You have to pass the review here anyways :D
> > Can you sketch a quick block diagram showing guest, host, Virtio device,
> > Virtio driver, etc...
>
> I hope this arrives on the list as it has been sent and not garbled:
>
> Guest 2 | Guest3
> ---------------- | ----------------
> ! cangen, ! | ! cangen, !
> ! candump, ! | ! candump, !
> ! cansend ! | ! cansend !
> ! using vcan0 ! | ! using can0 !
> ---------------- | ----------------
> ^ | ^
> ! --------------------- | !
> ! ! Service process ! | !
> ! ! in user space ! | !
Oliver has already commented on this :) Getting feedback from the
community early could have saved you some work :)
> ! ! virtio-can device ! | !
> ! ! forwarding vcan0 ! | !
> ! --------------------- | !
> ! ^ ^ | !
> ! ! ! | !
> --------------------------------------------------
> ! ! Device side ! kernel | Driver side ! kernel
> v v v | v
> ---------------- -------------- | ----------------
> ! Device Linux ! ! HV support ! | ! Driver Linux !
> ! VCan ! ! module ! | ! Virtio CAN !
> ! vcan0 ! ! on device ! | ! can0 !
> ! ! ! side ! | ! !
> ---------------- -------------- | ----------------
> ^ ^ | ^
> ! ! | !
> --------------------------------------------------
> ! ! ! Hypervisor
> v v v
> --------------------------------------------------
> ! COQOS-HV !
> --------------------------------------------------
>
> > > This works fine with
> > >
> > > cangen -g 0 -i can0
> > >
> > > on the driver side sending CAN messages to the device guest. No confirmation
> > > is lost testing for several minutes.
>
> > Where's the driver side? On the host or the guest?
>
> Both sides are guests of the hypervisor in our architecture. There is no
> host in this sense, COQOS-HV is a type 1 hypervisor. The hypervisor does not
> provide devices directly on its own, the devices are provided with the
> support of a device (provider) guest which is also only a guest of the
> hypervisor.
IC - as I'm not interested in closed source solution I'd focus on the
qemu use case. Good thing is, the virtio-can must handle both use cases
anyways.
> > Have you activated SO_RXQ_OVFL?
> > With recvmsg() you get the number of dropped messages in the socket.
> > Have a look at:
> > https://github.com/linux-can/can-utils/blob/master/cansequence.c
>
> I had no idea about SO_RXQ_OVFL. This looks useful for implementing an
> emergency recovery mechanism to avoid getting stuck: if loss of received
> frames is detected, the controller is still active and TX messages have been
> pending for too long, then mark the pending TX messages as used (done) to
> cope with the situation and avoid getting stuck (for too long). This might
> be acceptable if it does not normally happen except in really exceptional
> situations.
Your user space bridge is the wrong solution here.....See Oliver's mail.
> Nothing that should be done now; this is getting far too complicated for a
> first shot at implementing a Virtio CAN device.
>
> > We don't have a feature flag to query if the Linux driver supports proper
> > CAN echo on TX complete notification.
>
> Not so nice. But the device integrator should know which backend is used, and
> with a command line option for the device application the issue can be
> handled. I need the command line switch anyway now to do experiments.
If needed we can add flags to the CAN drivers so that they are
introspectable, maybe via the ethtool interface.
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Oliver Hartkopp @ 2021-06-24 18:45 UTC (permalink / raw)
To: Harald Mommer, linux-can
Hello Harald,
On 24.06.21 17:21, Harald Mommer wrote:
> The device application is closed source, runs under the COQOS hypervisor
> which is also closed source.
What is this 'device application' in the sketch below?
>> Can you sketch a quick block diagram showing guest, host, Virtio device,
>> Virtio driver, etc...
>
> I hope this arrives on the list as it has been sent and not garbled:
>
> Guest 2 | Guest3
> ---------------- | ----------------
> ! cangen, ! | ! cangen, !
> ! candump, ! | ! candump, !
> ! cansend ! | ! cansend !
> ! using vcan0 ! | ! using can0 !
> ---------------- | ----------------
> ^ | ^
> ! --------------------- | !
> ! ! Service process ! | !
> ! ! in user space ! | !
> ! ! virtio-can device ! | !
> ! ! forwarding vcan0 ! | !
> ! --------------------- | !
Hopefully not this "Service process in user space" ???
If so, this is a very questionable approach!
To route/forward/manipulate CAN frames between CAN network interfaces
there is a CAN gateway module 'can-gw' which can be controlled over
PF_NETLINK.
The can-gw runs super efficient and fast inside kernel space in the
SOFTIRQ context.
E.g. 22,000 CAN frames/s with 6% sys load on a 2 core i7 from 2012,
here: https://youtu.be/O3eOjfTl1yk?t=89
Just type cangw from the can-utils to get an impression of the powerful
options.
You can even calculate E2E CRCs and XOR checksums after doing content
mods on the fly.
> ! ^ ^ | !
> ! ! ! | !
> --------------------------------------------------
> ! ! Device side ! kernel | Driver side ! kernel
> v v v | v
> ---------------- -------------- | ----------------
> ! Device Linux ! ! HV support ! | ! Driver Linux !
> ! VCan ! ! module ! | ! Virtio CAN !
> ! vcan0 ! ! on device ! | ! can0 !
> ! ! ! side ! | ! !
> ---------------- -------------- | ----------------
> ^ ^ | ^
> ! ! | !
> --------------------------------------------------
> ! ! ! Hypervisor
> v v v
> --------------------------------------------------
> ! COQOS-HV !
> --------------------------------------------------
>
(..)
> can be handled. Need the command line switch anyway now to do experiments.
Now with cangw ?!? ;-)
Regards,
Oliver
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Harald Mommer @ 2021-06-24 15:21 UTC (permalink / raw)
To: linux-can
Hello,
On 18.06.21 at 11:16, Marc Kleine-Budde wrote:
> On 17.06.2021 14:22:03, Harald Mommer wrote:
>> we are currently in the process of developing a draft specification for
>> Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
>> driver and a Virtio CAN Linux device
> Oh that sounds interesting. Please keep the linux-can mailing list in
> the loop. Do you have a first draft version for review, yet?
First draft went to virtio-comment@lists.oasis-open.org and
virtio-dev@lists.oasis-open.org.
https://markmail.org/search/?q=virtio-can&q=list%3Aorg.oasis-open.lists.virtio-comment#query:virtio-can%20list%3Aorg.oasis-open.lists.virtio-comment+page:1+mid:hdxj35fsthypllkt+state:results
The link should reveal the short conversation. I am currently working on
the next draft, which incorporates the review comments I got so far and
will also address the "TX ACK" problem we are discussing here.
In the future I will put the Linux-CAN list in the loop.
>> running on top of our hypervisor solution.
>>
>> The Virtio CAN Linux device forwards an existing SocketCAN CAN device
>> (currently vcan) via Virtio to the Virtio driver guest so that the virtual
>> driver guest can send and receive CAN frames via SocketCAN.
>>
>> What was originally planned (probably with too much AUTOSAR CAN driver
>> semantics in my head and too little SocketCAN knowledge) is to mark a
>> transmission request as used (done) when it has finally been sent on the
>> CAN bus (vs. when it has been handed to SocketCAN, i.e. not really done but
>> still pending somewhere in the protocol stack).
> Makes sense.
I read the "Makes sense". But reading the rest of the e-mail (and
the thread) as well, I get the impression that making this timing
requirement mandatory when using SocketCAN is asking for trouble.
- Could remove the timing requirement. This is the easy solution. But
there is the "Makes sense".
- The original strict timing requirement becomes an option, so it's not a
mandatory requirement.
The 2nd is my favorite (but I tend to over-engineer in the first shot,
so the first option may indeed be the better one).
Not having this timing behavior implies that some other things in the
next virtio draft spec have to change, which here means they get
simplified.
>> Thought this was doable with some implementation effort using
>>
>> setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating the
>> MSG_CONFIRM bit on received messages.
> Where does that code run? Would that be part of qemu running on the host
> of an open source solution?
The device application is closed source and runs under the COQOS hypervisor,
which is also closed source. A qemu device implementation is not planned
as of now. The virtio CAN driver is a Linux device driver and will be
open sourced at some point in time, in the hope of getting it upstreamed
in the more distant future. Currently the driver is on an internal
development branch; outsiders cannot see it (still better for everyone),
and the colleagues are reviewing it and helping to bring it into an
acceptable shape.
> Can you sketch a quick block diagram showing guest, host, Virtio device,
> Virtio driver, etc...
I hope this arrives on the list as it has been sent and not garbled:
Guest 2 | Guest3
---------------- | ----------------
! cangen, ! | ! cangen, !
! candump, ! | ! candump, !
! cansend ! | ! cansend !
! using vcan0 ! | ! using can0 !
---------------- | ----------------
^ | ^
! --------------------- | !
! ! Service process ! | !
! ! in user space ! | !
! ! virtio-can device ! | !
! ! forwarding vcan0 ! | !
! --------------------- | !
! ^ ^ | !
! ! ! | !
--------------------------------------------------
! ! Device side ! kernel | Driver side ! kernel
v v v | v
---------------- -------------- | ----------------
! Device Linux ! ! HV support ! | ! Driver Linux !
! VCan ! ! module ! | ! Virtio CAN !
! vcan0 ! ! on device ! | ! can0 !
! ! ! side ! | ! !
---------------- -------------- | ----------------
^ ^ | ^
! ! | !
--------------------------------------------------
! ! ! Hypervisor
v v v
--------------------------------------------------
! COQOS-HV !
--------------------------------------------------
>> This works fine with
>>
>> cangen -g 0 -i can0
>>
>> on the driver side sending CAN messages to the device guest. No confirmation
>> is lost testing for several minutes.
> Where's the driver side? On the host or the guest?
Both sides are guests of the hypervisor in our architecture. There is no
host in this sense, COQOS-HV is a type 1 hypervisor. The hypervisor does
not provide devices directly on its own, the devices are provided with
the support of a device (provider) guest which is also only a guest of
the hypervisor.
>
> Have you activated SO_RXQ_OVFL?
> With recvmsg() you get the number of dropped messages in the socket.
> Have a look at:
> https://github.com/linux-can/can-utils/blob/master/cansequence.c
I had no idea about SO_RXQ_OVFL. This looks useful for implementing an
emergency recovery mechanism to avoid getting stuck: if loss of received
frames is detected, the controller is still active and TX messages have
been pending for too long, then mark the pending TX messages as used
(done) to cope with the situation and avoid getting stuck (for too long).
This might be acceptable if it does not normally happen except in really
exceptional situations.
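A minimal C sketch of the detection side of this idea, assuming a CAN_RAW socket that is read with recvmsg() as in cansequence.c. SO_RXQ_OVFL and its SOL_SOCKET control message are real kernel interfaces; `enable_rxq_ovfl()`, `read_drop_counter()`, `dropped_since()` and `tx_should_force_complete()` are hypothetical helpers, and the last one merely restates the heuristic above rather than a tested recovery policy.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40 /* value from asm-generic/socket.h if libc lacks it */
#endif

/* Ask the kernel to report the socket's cumulative drop counter as a
 * SOL_SOCKET/SO_RXQ_OVFL control message. Returns 0 on success, -1 on error. */
static int enable_rxq_ovfl(int fd)
{
	int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &on, sizeof(on));
}

/* Scan the control messages returned by recvmsg() for the drop counter.
 * Returns true and stores it in *drops if present. The kernel only attaches
 * it once the counter is non-zero, so false means no drops had been counted
 * when this frame was queued. */
static bool read_drop_counter(struct msghdr *msg, uint32_t *drops)
{
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SO_RXQ_OVFL &&
		    cmsg->cmsg_len >= CMSG_LEN(sizeof(*drops))) {
			memcpy(drops, CMSG_DATA(cmsg), sizeof(*drops));
			return true;
		}
	}
	return false;
}

/* Frames dropped between two readings; unsigned subtraction also handles
 * wrap-around of the 32-bit counter. */
static uint32_t dropped_since(uint32_t previous, uint32_t current)
{
	return current - previous;
}

/* Hypothetical recovery rule restating the heuristic above: force-complete
 * a pending TX request only if RX drops were seen, the controller is still
 * active and the request has been pending longer than timeout_ms. */
static bool tx_should_force_complete(uint32_t new_drops, bool controller_active,
				     uint64_t pending_ms, uint64_t timeout_ms)
{
	return new_drops > 0 && controller_active && pending_ms > timeout_ms;
}
```

Usage would be to call `enable_rxq_ovfl()` once after creating the socket, point `msg_control` of each recvmsg() call at a suitably aligned buffer of at least `CMSG_SPACE(sizeof(uint32_t))` bytes, and pass the resulting `msghdr` to `read_drop_counter()`.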
Nothing that should be done now; this is getting far too complicated for a
first shot at implementing a Virtio CAN device.
> We don't have a feature flag to query if the Linux driver supports proper
> CAN echo on TX complete notification.
Not so nice. But the device integrator should know which backend is used,
and with a command line option for the device application the issue
can be handled. I need the command line switch anyway now to do experiments.
Regards
Harald
--
Dipl.-Ing. Harald Mommer
Senior Software Engineer
OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
Phone: +49 (30) 60 98 540-0 <== switchboard
Fax: +49 (30) 60 98 540-99
E-Mail:harald.mommer@opensynergy.com
www.opensynergy.com
Commercial register: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Regis Adjamah
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Marc Kleine-Budde @ 2021-06-19 21:42 UTC (permalink / raw)
To: Oliver Hartkopp; +Cc: Harald Mommer, linux-can
On 18.06.2021 20:23:39, Oliver Hartkopp wrote:
> > Even if the Linux Kernel doesn't drop any messages, not all CAN
> > controllers support that feature. On the Linux side we try our best, but
> > some USB attached devices don't report a TX complete event back, so the
> > driver triggers the CAN echo skb after the USB transfer has been
> > completed.
> >
> > We don't have a feature flag to query if the Linux driver supports proper
> > CAN echo on TX complete notification.
>
> We have. It is set in the flags member of struct net_device and called IFF_ECHO.
>
> https://elixir.bootlin.com/linux/v5.12.11/source/net/can/af_can.c#L257
The flag tells the rest of the stack that the driver takes care of
generating the CAN echo packets.
Several USB-based drivers set the IFF_ECHO flag, but the USB devices don't
signal TX complete to the host. These drivers generate the CAN echo
frame after the successful USB TX transmission. This is better than
letting the networking stack generate the CAN echo frame, but it's not
100% perfect.
> E.g. the slcan driver does not have this bit set.
Marc
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Oliver Hartkopp @ 2021-06-18 18:23 UTC (permalink / raw)
To: Marc Kleine-Budde, Harald Mommer; +Cc: linux-can
On 18.06.21 11:16, Marc Kleine-Budde wrote:
>
> Even if the Linux Kernel doesn't drop any messages, not all CAN
> controllers support that feature. On the Linux side we try our best, but
> some USB attached devices don't report a TX complete event back, so the
> driver triggers the CAN echo skb after the USB transfer has been
> completed.
>
> We don't have a feature flag to query if the Linux driver supports proper
> CAN echo on TX complete notification.
>
We have. It is set in the flags member of struct net_device and called IFF_ECHO.
https://elixir.bootlin.com/linux/v5.12.11/source/net/can/af_can.c#L257
E.g. the slcan driver does not have this bit set.
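Keeping Marc's caveat in mind (IFF_ECHO only says that the driver generates the CAN echo itself; some USB drivers echo after the USB transfer rather than on a real TX-complete), a minimal C sketch for checking the flag from user space. It assumes sysfs is mounted at /sys, where /sys/class/net/<ifname>/flags holds the device flags in hex. `IFF_ECHO_BIT`, `parse_iff_flags()` and `iface_has_iff_echo()` are hypothetical names; the bit value matches IFF_ECHO (1 << 18) in linux/if.h.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Bit value of IFF_ECHO in linux/if.h. It is defined locally because the
 * SIOCGIFFLAGS ioctl only returns the lower 16 bits of the device flags. */
#define IFF_ECHO_BIT (1UL << 18)

/* Parse the hexadecimal text found in /sys/class/net/<if>/flags.
 * Returns 0 on success, -1 if no number could be parsed. */
static int parse_iff_flags(const char *text, unsigned long *flags)
{
	char *end;

	*flags = strtoul(text, &end, 0);
	return end == text ? -1 : 0;
}

/* Returns 1 if the driver behind ifname sets IFF_ECHO, 0 if it does not,
 * and -1 if the sysfs attribute cannot be read or parsed. */
static int iface_has_iff_echo(const char *ifname)
{
	char path[128], line[64];
	unsigned long flags;
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/class/net/%s/flags", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	if (parse_iff_flags(line, &flags) != 0)
		return -1;
	return (flags & IFF_ECHO_BIT) ? 1 : 0;
}
```

A device application could use this to choose its TX ACK strategy at startup instead of relying only on a command line switch, but a result of 1 still does not prove that echoes are tied to real bus transmission.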
Regards,
Oliver
>> But before declaring it "not reliably implementable with
>> Linux SocketCAN" I would like to be sure that it's really that way and
>> absolutely nothing can be done about it. It could even be that I missed an
>> additional setting I'm not aware of. But the observed behavior may as well
>> be something which is known to everyone except me.
>>
>> Of course it can be that there is still a bug in my software, but I checked
>> this carefully and I'm now convinced that under heavy load situations
>> MSG_CONFIRM messages are lost somewhere in the Linux SocketCAN protocol
>> stack. If there's no way to recover from this situation, I have to weaken the
>> next Virtio CAN draft specification regarding the TX ACK timing. As
>> this has some additional impact on the specification, before doing so I would
>> like to be really sure that the TX ACK timing cannot be done reliably the
>> way it was originally planned.
>
> Do you have some code available yet?
>
> regards,
> Marc
>
* Re: MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
From: Marc Kleine-Budde @ 2021-06-18 9:16 UTC (permalink / raw)
To: Harald Mommer; +Cc: linux-can
On 17.06.2021 14:22:03, Harald Mommer wrote:
> we are currently in the process of developing a draft specification for
> Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
> driver and a Virtio CAN Linux device
Oh, that sounds interesting. Please keep the linux-can mailing list in
the loop. Do you have a first draft version for review yet?
> running on top of our hypervisor solution.
>
> The Virtio CAN Linux device forwards an existing SocketCAN CAN device
> (currently vcan) via Virtio to the Virtio driver guest so that the virtual
> driver guest can send and receive CAN frames via SocketCAN.
>
> What was originally planned (probably with too much AUTOSAR CAN driver
> semantics in my head and too little SocketCAN knowledge) is to mark a
> transmission request as used (done) when it has finally been sent on the CAN bus
> (as opposed to when it has merely been handed to SocketCAN and is still pending
> somewhere in the protocol stack).
Makes sense.
> I thought this was doable with some implementation effort using
>
> setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating the
> MSG_CONFIRM bit on received messages.
Where does that code run? Would that be part of QEMU running on the host,
or part of an open source solution?
Can you sketch a quick block diagram showing guest, host, Virtio device,
Virtio driver, etc...
> This works fine with
>
> cangen -g 0 -i can0
>
> on the driver side sending CAN messages to the device guest. No confirmation
> is lost when testing for several minutes.
Where's the driver side? On the host or the guest?
> Adding now on the device side a
>
> cangen -g 0 -i vcan0
>
> sending messages like crazy from the device-side guest to the driver-side
> guest in parallel, I'm losing TX confirmations in the Linux CAN stack. There
> also seems to be no other error indication (CAN_ERR_FLAG) that something like
CAN_ERR_FLAG frames are only generated for real CAN errors on the bus or for
controller problems. The vcan interface doesn't generate any.
> this happened. The Virtio CAN device runs out of resources and TX
> becomes stuck, which is not really acceptable even in such a heavy-load
> situation (-g0 on both sides).
>
> Is CAN_RAW_RECV_OWN_MSGS / MSG_CONFIRM known to be unreliable (meaning
> MSG_CONFIRM messages are dropped) under extreme load? If so, is
> there a way to detect reliably that this happened so that a recovery
> mechanism for the pending TX acknowledgements could be implemented?
Have you activated SO_RXQ_OVFL?
With it enabled, recvmsg() delivers the number of messages dropped by the
socket as ancillary data. Have a look at:
https://github.com/linux-can/can-utils/blob/master/cansequence.c
> I'm aware that "normal" RX messages from other nodes may be dropped due to
> overload. No problem with this.
>
> The timing requirement originally set (done when sent on the CAN bus) has to be
> weakened or put under a feature flag if it's not reliably implementable in
> all environments.
Even if the Linux kernel doesn't drop any messages, not all CAN
controllers support that feature. On the Linux side we try our best, but
some USB attached devices don't report a TX complete event back, so the
driver triggers the CAN echo skb after the USB transfer has been
completed.
We don't have a feature flag to query whether the Linux driver supports proper
CAN echo on TX complete notification.
> But before declaring it "not reliably implementable with
> Linux SocketCAN", I would like to be sure that it's really that way and
> that absolutely nothing can be done about it. It could even be that I missed an
> additional setting I'm not aware of. But the observed behavior may as well
> be something which is known to everyone except me.
>
> Of course there may still be a bug in my software, but I checked
> this carefully and I'm now convinced that under heavy load
> MSG_CONFIRM messages are lost somewhere in the Linux SocketCAN protocol
> stack. If there's no way to recover from this situation, I have to weaken the
> next Virtio CAN draft specification regarding the TX ACK timing. As
> this has some additional impact on the specification, before doing so I would
> like to be really sure that the TX ACK timing cannot be done reliably the
> way it was originally planned.
Do you have some code available yet?
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load?
@ 2021-06-17 12:22 Harald Mommer
2021-06-18 9:16 ` Marc Kleine-Budde
0 siblings, 1 reply; 10+ messages in thread
From: Harald Mommer @ 2021-06-17 12:22 UTC
To: linux-can
Hello,
we are currently in the process of developing a draft specification for
Virtio CAN. In the scope of this work I am developing a Virtio CAN Linux
driver and a Virtio CAN Linux device running on top of our hypervisor
solution.
The Virtio CAN Linux device forwards an existing SocketCAN CAN device
(currently vcan) via Virtio to the Virtio driver guest so that the
virtual driver guest can send and receive CAN frames via SocketCAN.
What was originally planned (probably with too much AUTOSAR CAN driver
semantics in my head and too little SocketCAN knowledge) is to mark a
transmission request as used (done) when it has finally been sent on the CAN
bus (as opposed to when it has merely been handed to SocketCAN and is still
pending somewhere in the protocol stack).
I thought this was doable with some implementation effort using
setsockopt(..., SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, ...) and evaluating
the MSG_CONFIRM bit on received messages.
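A minimal sketch of that setup, assuming a classic CAN interface whose name is passed in; the function and enum names are hypothetical. CAN_RAW_LOOPBACK is enabled by default, so only CAN_RAW_RECV_OWN_MSGS has to be set. The raw protocol marks every locally generated frame with MSG_DONTROUTE and additionally sets MSG_CONFIRM when the receiving socket is the one that sent it.

```c
#include <net/if.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <linux/can.h>
#include <linux/can/raw.h>

/* Origin of a frame received on a CAN_RAW socket. */
enum rx_kind { RX_REMOTE, RX_LOCAL_OTHER, RX_OWN_TX_CONFIRM };

/* Classifies a received frame from the msg_flags set by recvmsg(). */
static enum rx_kind classify_rx(int msg_flags)
{
    if (msg_flags & MSG_CONFIRM)
        return RX_OWN_TX_CONFIRM; /* sent by this very socket */
    if (msg_flags & MSG_DONTROUTE)
        return RX_LOCAL_OTHER;    /* sent by another socket on this host */
    return RX_REMOTE;             /* received from the bus */
}

/* Opens a CAN_RAW socket on ifname that also receives its own frames. */
static int open_raw_with_confirm(const char *ifname)
{
    int s = socket(PF_CAN, SOCK_RAW, CAN_RAW);
    int on = 1;
    struct sockaddr_can addr = { .can_family = AF_CAN };

    if (s < 0)
        return -1;
    addr.can_ifindex = if_nametoindex(ifname);
    if (addr.can_ifindex == 0 ||
        setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS, &on, sizeof(on)) < 0 ||
        bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Receives one frame and reports whether it is a TX confirmation. */
static ssize_t recv_frame(int s, struct can_frame *frame, enum rx_kind *kind)
{
    struct iovec iov = { .iov_base = frame, .iov_len = sizeof(*frame) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
    ssize_t n = recvmsg(s, &msg, 0);

    if (n >= 0)
        *kind = classify_rx(msg.msg_flags);
    return n;
}
```

Plain read() cannot be used here, because msg_flags is only reported through recvmsg().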
This works fine with
cangen -g 0 -i can0
on the driver side sending CAN messages to the device guest. No
confirmation is lost when testing for several minutes.
Adding now on the device side a
cangen -g 0 -i vcan0
sending messages like crazy from the device-side guest to the driver-side
guest in parallel, I'm losing TX confirmations in the Linux CAN stack.
There also seems to be no other error indication (CAN_ERR_FLAG) that
something like this happened. The Virtio CAN device runs out of resources
and TX becomes stuck, which is not really acceptable even in such a
heavy-load situation (-g0 on both sides).
Is CAN_RAW_RECV_OWN_MSGS / MSG_CONFIRM known to be unreliable (meaning
MSG_CONFIRM messages are dropped) under extreme load? If so, is there a
way to detect reliably that this happened so that a recovery mechanism
for the pending TX acknowledgements could be implemented?
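One possible recovery strategy, sketched under explicit assumptions: the device keeps its own FIFO of TX requests handed to SocketCAN, and the interface echoes frames in the order they were submitted. Under that assumption, a confirmation for one entry implies that the confirmations of all older entries were dropped, so they can be completed as well. All names (pending_tx, pending_confirm, the cookie) are hypothetical and part of neither SocketCAN nor the Virtio specification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/can.h>

#define PENDING_MAX 64 /* hypothetical depth of the TX request queue */

/* Hypothetical FIFO of TX requests written to SocketCAN, not yet confirmed. */
struct pending_tx {
    struct can_frame frame[PENDING_MAX];
    uint64_t cookie[PENDING_MAX]; /* e.g. an id of the Virtio TX request */
    unsigned int head, count;
};

/* Appends a request when it is written to the CAN_RAW socket. */
static bool pending_push(struct pending_tx *q, const struct can_frame *f,
                         uint64_t cookie)
{
    unsigned int tail;

    if (q->count == PENDING_MAX)
        return false;
    tail = (q->head + q->count) % PENDING_MAX;
    q->frame[tail] = *f;
    q->cookie[tail] = cookie;
    q->count++;
    return true;
}

/* Compares ID, length and payload (can_dlc is also named len since 5.11). */
static bool same_frame(const struct can_frame *a, const struct can_frame *b)
{
    return a->can_id == b->can_id && a->can_dlc == b->can_dlc &&
           a->can_dlc <= CAN_MAX_DLEN &&
           memcmp(a->data, b->data, a->can_dlc) == 0;
}

/* Handles one frame received with MSG_CONFIRM: completes the oldest matching
 * entry and, assuming in-order echo, every older entry whose confirmation
 * must have been dropped. done(cookie, confirmed) may be NULL. Returns the
 * number of completed entries; 0 means the echo matched nothing pending. */
static unsigned int pending_confirm(struct pending_tx *q,
                                    const struct can_frame *echo,
                                    void (*done)(uint64_t cookie, bool confirmed))
{
    for (unsigned int i = 0; i < q->count; i++) {
        unsigned int idx = (q->head + i) % PENDING_MAX;

        if (!same_frame(&q->frame[idx], echo))
            continue;
        for (unsigned int k = 0; done && k <= i; k++)
            done(q->cookie[(q->head + k) % PENDING_MAX], k == i);
        q->head = (q->head + i + 1) % PENDING_MAX;
        q->count -= i + 1;
        return i + 1;
    }
    return 0;
}
```

The in-order assumption does not hold for every controller (some transmit from several mailboxes in CAN-ID priority order), and entries at the tail of the queue whose echo was dropped still need a timeout. This narrows the problem rather than solving it.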
I'm aware that "normal" RX messages from other nodes may be dropped due
to overload. No problem with this.
The timing requirement originally set (done when sent on the CAN bus) has
to be weakened or put under a feature flag if it's not reliably
implementable in all environments. But before declaring it "not reliably
implementable with Linux SocketCAN", I would like to be sure that it's
really that way and that absolutely nothing can be done about it. It could
even be that I missed an additional setting I'm not aware of. But the
observed behavior may as well be something which is known to everyone
except me.
Of course there may still be a bug in my software, but I checked this
carefully and I'm now convinced that under heavy load MSG_CONFIRM messages
are lost somewhere in the Linux SocketCAN protocol stack. If there's no way
to recover from this situation, I have to weaken the next Virtio CAN draft
specification regarding the TX ACK timing. As this has some additional
impact on the specification, before doing so I would like to be really
sure that the TX ACK timing cannot be done reliably the way it was
originally planned.
Regards
Harald
--
Dipl.-Ing. Harald Mommer
Senior Software Engineer
OpenSynergy GmbH
Rotherstr. 20, 10245 Berlin
Phone: +49 (30) 60 98 540-0 <== Zentrale
Fax: +49 (30) 60 98 540-99
E-Mail: harald.mommer@opensynergy.com
www.opensynergy.com
Handelsregister: Amtsgericht Charlottenburg, HRB 108616B
Geschäftsführer/Managing Director: Regis Adjamah
Thread overview: 10+ messages
2021-06-29 19:39 MSG_CONFIRM RX messages with SocketCAN known as unreliable under heavy load? Harald Mommer
2021-06-30 7:27 ` Oliver Hartkopp
2021-06-17 12:22 Harald Mommer
2021-06-18 9:16 ` Marc Kleine-Budde
2021-06-18 18:23 ` Oliver Hartkopp
2021-06-19 21:42 ` Marc Kleine-Budde
2021-06-24 15:21 ` Harald Mommer
2021-06-24 18:45 ` Oliver Hartkopp
2021-06-28 13:47 ` Harald Mommer
2021-06-25 9:39 ` Marc Kleine-Budde