* [Xenomai-core] RT-Socket-CAN bus error rate and latencies
@ 2007-03-20 18:58 Wolfgang Grandegger
  2007-03-20 19:10 ` Jan Kiszka
  2007-03-21 17:14 ` [Xenomai-core] " Wolfgang Grandegger
  0 siblings, 2 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-20 18:58 UTC (permalink / raw)
  To: xenomai-core; +Cc: socketcan-core

Hello,

on the Xenomai mailing list the topic "bus error flooding" popped up 
again. Various users reported trouble due to high bus error rates and 
their bad impact on latencies. Some discussion is going on about how to 
avoid such flooding. I have already implemented "on-demand" bus error 
interrupts: bus error interrupts are then only enabled when at least one 
socket is listening for bus errors. But flooding can still occur, and we 
are thinking about a better way of downscaling or temporarily disabling 
them. Socket-CAN currently restarts the controller after 200 bus errors.
My currently preferred solution for RT-Socket-CAN is to stop the CAN 
controller after a kernel-configurable number of successive bus errors. 
More clever ideas and comments are welcome.

To have some input, I have measured the bus error rate with the PEAK 
PCAN PCI card on my Icecube MPC5200 eval-board running rtcansend without 
a cable connected. Here are the results for the various baud-rates:

    125 kbit/s  1926 BEI/s
    250 kbit/s  3925 BEI/s
    500 kbit/s  7856 BEI/s
   1000 kbit/s 15700 BEI/s

The "latency" test from the testsuite reported an increase of the 
latency under this load from 67 to 95 us, almost independently of the 
baud-rate. Sending messages with an 8-byte payload from the MSCAN to the 
SJA1000 on the same node as fast as possible increased the latency up to 
103 us. This measurement did not include delivery of messages to sockets 
(actually, no socket was listening).

Wolfgang.



* Re: [Xenomai-core] RT-Socket-CAN bus error rate and latencies
  2007-03-20 18:58 [Xenomai-core] RT-Socket-CAN bus error rate and latencies Wolfgang Grandegger
@ 2007-03-20 19:10 ` Jan Kiszka
  2007-03-20 19:29   ` Wolfgang Grandegger
  2007-03-21 17:14 ` [Xenomai-core] " Wolfgang Grandegger
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2007-03-20 19:10 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: socketcan-core, xenomai-core


Wolfgang Grandegger wrote:
> Hello,
> 
> on the Xenomai mailing list the topic "bus error flooding" popped up
> again. Various users reported trouble due to high bus error rates and
> bad impact on latencies. Some discussion is going on on how to avoid
> such flooding. I have already implemented "on-demand" bus error
> interrupts. Bus error interrupts are then only enabled when at least one
> socket is listening on bus errors. But flooding can still occur and we
> are thinking about a better way of downscaling or temporarily disabling
> them. Socket-CAN currently restarts the controller after 200 bus errors.
> My preferred solution for RT-Socket-CAN currently is to stop the CAN
> controller after a kernel configurable amount of successive bus errors.
> More clever ideas and comments are welcome?
> 
> To have some input, I have measured the bus error rate with the PEAK
> PCAN PCI card on my Icecube MPC5200 eval-board doing rtcansend without
> cable connected. Here are the results for the various baud-rates:
> 
>    125 KB/s  1926 BEI/s
>    250 KB/s  3925 BEI/s
>    500 KB/s  7856 BEI/s
>   1000 KB/s 15700 BEI/s

So bus errors come in at a rate comparable to that of the shortest 
possible frames at maximum speed, right? That's an IRQ load you can live 
with - if you plan to load your CAN bus at 100% anyway. But if you don't...
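
A rough cross-check (assuming a minimal 0-byte standard frame of about
47 bit times including intermission and ignoring stuff bits):

   1 Mbit/s / 47 bits  ~= 21000 frames/s maximum
   15700 BEI/s          = one interrupt every ~64 us

so yes, same order of magnitude.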

> 
> The latency measured with "latency" from the testsuite reported an
> increase of the latency with load from 67 to 95us almost independently
> of the baud-rate. Sending messages with 8 byte payload from MSCAN to
> SJA1000 on the same node as fast as possible increased the latency up to
> 103us. This measurement did not include delivery of messages to sockets
> (actually no socket was listening).
> 

Some dump of /proc/xenomai/stat would be interesting (system load in
both cases).

Jan




* Re: [Xenomai-core] RT-Socket-CAN bus error rate and latencies
  2007-03-20 19:10 ` Jan Kiszka
@ 2007-03-20 19:29   ` Wolfgang Grandegger
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-20 19:29 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: socketcan-core, xenomai-core

Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
>> Hello,
>>
>> on the Xenomai mailing list the topic "bus error flooding" popped up
>> again. Various users reported trouble due to high bus error rates and
>> bad impact on latencies. Some discussion is going on on how to avoid
>> such flooding. I have already implemented "on-demand" bus error
>> interrupts. Bus error interrupts are then only enabled when at least one
>> socket is listening on bus errors. But flooding can still occur and we
>> are thinking about a better way of downscaling or temporarily disabling
>> them. Socket-CAN currently restarts the controller after 200 bus errors.
>> My preferred solution for RT-Socket-CAN currently is to stop the CAN
>> controller after a kernel configurable amount of successive bus errors.
>> More clever ideas and comments are welcome?
>>
>> To have some input, I have measured the bus error rate with the PEAK
>> PCAN PCI card on my Icecube MPC5200 eval-board doing rtcansend without
>> cable connected. Here are the results for the various baud-rates:
>>
>>    125 KB/s  1926 BEI/s
>>    250 KB/s  3925 BEI/s
>>    500 KB/s  7856 BEI/s
>>   1000 KB/s 15700 BEI/s
> 
> So bus errors come in at a rate comparable to shortest possible frames
> at maximum speed, right? That's an IRQ load you can live with - if you
> plan to load your CAN bus at 100% anyway. But if you don't...

Yes, it looks like it.

>> The latency measured with "latency" from the testsuite reported an
>> increase of the latency with load from 67 to 95us almost independently
>> of the baud-rate. Sending messages with 8 byte payload from MSCAN to
>> SJA1000 on the same node as fast as possible increased the latency up to
>> 103us. This measurement did not include delivery of messages to sockets
>> (actually no socket was listening).
>>
> 
> Some dump of /proc/xenomai/stat would be interesting (system load in
> both cases).

OK, here it is for bus error flooding at 1 Mbit/s:

bash-2.05b# cat /proc/xenomai/stat
CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
   0  0      0          9781211    0     00500080   74.0  ROOT
   0  27199  130        264        0     00300182    0.0  display-27197
   0  27200  0          260166     0     00300184    8.1  sampling-27197
   0  0      0          29325916   0     00000000   14.5  IRQ0: SJA1000
   0  0      0          0          0     00000000    0.0  IRQ0: SJA1000
   0  0      0          1000002    0     00000000    0.0  IRQ55: MSCAN
   0  0      0          0          0     00000000    0.0  IRQ56: MSCAN
   0  0      0          9685429    0     00000000    3.0  IRQ256: [timer]

and here for "rtcansend -d 0 -l 1000000 rtcan0 1 2 3 4 5 6 7":

bash-2.05b# cat /proc/xenomai/stat
CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
   0  0      0          10377571   0     00500080   51.5  ROOT
   0  27199  336        816        0     00300182    0.0  display-27197
   0  27200  0          672738     0     00300184    5.8  sampling-27197
   0  29526  0          291678     0     00300182   23.6  rtcansend-29526
   0  0      0          30863047   0     00000000    9.7  IRQ0: SJA1000
   0  0      0          0          0     00000000    0.0  IRQ0: SJA1000
   0  0      0          1242608    0     00000000    6.5  IRQ55: MSCAN
   0  0      0          0          0     00000000    0.0  IRQ56: MSCAN
   0  0      0          10115159   0     00000000    2.1  IRQ256: [timer]

Wolfgang.




* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-20 18:58 [Xenomai-core] RT-Socket-CAN bus error rate and latencies Wolfgang Grandegger
  2007-03-20 19:10 ` Jan Kiszka
@ 2007-03-21 17:14 ` Wolfgang Grandegger
       [not found]   ` <46017CA7.2080801@domain.hid>
  1 sibling, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-21 17:14 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: socketcan-core, xenomai-core

Wolfgang Grandegger wrote:
> Hello,
> 
> on the Xenomai mailing list the topic "bus error flooding" popped up 
> again. Various users reported trouble due to high bus error rates and 
> bad impact on latencies. Some discussion is going on on how to avoid 
> such flooding. I have already implemented "on-demand" bus error 
> interrupts. Bus error interrupts are then only enabled when at least one 
> socket is listening on bus errors. But flooding can still occur and we 
> are thinking about a better way of downscaling or temporarily disabling 
> them. Socket-CAN currently restarts the controller after 200 bus errors.
> My preferred solution for RT-Socket-CAN currently is to stop the CAN 
> controller after a kernel configurable amount of successive bus errors. 
> More clever ideas and comments are welcome?

What do you think about the following method?

   config XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT
	depends on XENO_DRIVERS_CAN_SJA1000
	int "Maximum number of successive bus errors"
	range 0 255
	default 20
	help

	CAN bus errors are very useful for analyzing electrical problems
	but they can come at a very high rate, resulting in interrupt
	flooding with bad impact on system performance and real-time
	behavior. This option, if greater than 0, will limit the number
	of successive bus error interrupts. If the limit is reached, an
	error message with "can_id = CAN_ERR_BUSERR_FLOOD" is sent. The
	bus error counter gets reset on restart of the device and on any
	successful message transmission or reception. Be aware that bus
	error interrupts are only enabled if at least one socket is
	listening for bus errors.

I think it should make Sebastian and Jan happy, at least ;-).
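
Roughly, the check could look like this in the SJA1000 error interrupt 
path (just an untested sketch of the semantics above: SJA_IR_BEI and the 
read_reg/write_reg accessors exist in the driver, while the 
bus_err_count field and the SJA_IER/SJA_IER_BEIE names are only 
placeholders here):

    if (irq_source & SJA_IR_BEI) {
        if (CONFIG_XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT > 0 &&
            ++chip->bus_err_count >=
                CONFIG_XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT) {
            /* here one final error frame with can_id = CAN_ERR_BUSERR_FLOOD
             * would be queued; then mask the bus error interrupt source: */
            chip->write_reg(dev, SJA_IER,
                            chip->read_reg(dev, SJA_IER) & ~SJA_IER_BEIE);
        }
    }

    /* chip->bus_err_count would be reset to 0 on device restart and in
     * the RX/TX interrupt branches, i.e. on any successful transmission
     * or reception, as the help text above describes. */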

> 
> To have some input, I have measured the bus error rate with the PEAK 
> PCAN PCI card on my Icecube MPC5200 eval-board doing rtcansend without 
> cable connected. Here are the results for the various baud-rates:
> 
>     125 KB/s  1926 BEI/s
>     250 KB/s  3925 BEI/s
>     500 KB/s  7856 BEI/s
>    1000 KB/s 15700 BEI/s
> 
> The latency measured with "latency" from the testsuite reported an 
> increase of the latency with load from 67 to 95us almost independently 
> of the baud-rate. Sending messages with 8 byte payload from MSCAN to 
> SJA1000 on the same node as fast as possible increased the latency up to 
> 103us. This measurement did not include delivery of messages to sockets 
> (actually no socket was listening).
> 
> Wolfgang.

Wolfgang.



* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
       [not found]   ` <46017CA7.2080801@domain.hid>
@ 2007-03-21 20:29     ` Wolfgang Grandegger
  2007-03-21 21:43       ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-21 20:29 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: socketcan-core, xenomai-core

Oliver Hartkopp wrote:
> Wolfgang Grandegger wrote:
>> Wolfgang Grandegger wrote:
>>   
>>> But flooding can still occur and we 
>>> are thinking about a better way of downscaling or temporarily disabling 
>>> them. Socket-CAN currently restarts the controller after 200 bus errors.
>>> My preferred solution for RT-Socket-CAN currently is to stop the CAN 
>>> controller after a kernel configurable amount of successive bus errors. 
>>> More clever ideas and comments are welcome?
>>>     
>> What do you think about the following method?
>>
>>    config XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT
>> 	depends on XENO_DRIVERS_CAN_SJA1000
>> 	int "Maximum number of successive bus errors"
>> 	range 0 255
>> 	default 20
>> 	help
>>
>> 	CAN bus errors are very useful for analyzing electrical problems
>>          but they can come at a very high rate resulting in interrupt
>>          flooding with bad impact on system performance and real-time
>>          behavior. This option, if greater than 0, will limit the amount
>>          of successive bus error interrupts. If the limit is reached, an
>>          error message with "can_id = CAN_ERR_BUSERR_FLOOD" is sent. The
>>          bus error counter gets reset on restart of the device and on any
>>          successful message transmission or reception. Be aware that bus
>>          error interrupts are only enabled if at least one socket is
>>          listening on bus errors.
>>
>>   
> 
> Hi Wolfgang,
> 
> what would be the wanted behaviour, after the discussed problem of bus 
> error flooding occurred?

Well, I think the bus error rate should be downscaled without losing 
vital information concerning the cause of the problem, and it should 
require as little user intervention as possible. Treating it like a bus 
error, as currently done in Socket-CAN, is a bit too strong in my mind.

> Can the Controller be assumed to be 'slightly dead', or what? Is there 
> any chance that the bus heals by itself (=> no more bus errors) and can 
> be used in a normal way? Or is a user interaction recommended or _required_?

Yes, if you plug in the cable, the bus errors might go away and the TX 
done interrupt will arrive, or you get a bus-off (I have seen both).

> Indeed, slowing down the bus errors is a reasonable approach, but your 
> suggested method leaves too many questions open for the user :-/

What questions?

> I would tend to reduce the notifications to the user by creating a timer 
> at the first bus error interrupt. The first BE irq would lead to a 
> CAN_ERR_BUSERROR, and after a (configurable) time (e.g. 250 ms) the next 
> information about bus errors is allowed to be passed to the user. After 
> this time period is over, a new CAN_ERR_BUSERROR may be passed to the 
> user containing the count of occurred bus errors somewhere in the 
> data[]-section of the Error Frame. When a normal RX/TX-interrupt 
> indicates a 'working' CAN again, the timer would be terminated.
> 
> Instead of a fixed configurable time we could also think about a dynamic 
> behaviour (e.g. with increasing periods).
> 
> What do you think about this?

The question is whether one bus error provides enough information on the 
cause of the electrical problem or whether a sequence is better. 
Furthermore, I personally regard the use of timers as too heavy. But the 
solution is feasible, of course. Any other opinions?

Thanks.

Wolfgang.



* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-21 20:29     ` Wolfgang Grandegger
@ 2007-03-21 21:43       ` Jan Kiszka
  2007-03-22  8:08         ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2007-03-21 21:43 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: socketcan-core, Oliver Hartkopp, xenomai-core


Wolfgang Grandegger wrote:
> Oliver Hartkopp wrote:
>> Wolfgang Grandegger wrote:
>>> Wolfgang Grandegger wrote:
>>>   
>>>> But flooding can still occur and we 
>>>> are thinking about a better way of downscaling or temporarily disabling 
>>>> them. Socket-CAN currently restarts the controller after 200 bus errors.
>>>> My preferred solution for RT-Socket-CAN currently is to stop the CAN 
>>>> controller after a kernel configurable amount of successive bus errors. 
>>>> More clever ideas and comments are welcome?
>>>>     
>>> What do you think about the following method?
>>>
>>>    config XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT
>>> 	depends on XENO_DRIVERS_CAN_SJA1000
>>> 	int "Maximum number of successive bus errors"
>>> 	range 0 255
>>> 	default 20
>>> 	help
>>>
>>> 	CAN bus errors are very useful for analyzing electrical problems
>>>          but they can come at a very high rate resulting in interrupt
>>>          flooding with bad impact on system performance and real-time
>>>          behavior. This option, if greater than 0, will limit the amount
>>>          of successive bus error interrupts. If the limit is reached, an
>>>          error message with "can_id = CAN_ERR_BUSERR_FLOOD" is sent. The
>>>          bus error counter gets reset on restart of the device and on any
>>>          successful message transmission or reception. Be aware that bus
>>>          error interrupts are only enabled if at least one socket is
>>>          listening on bus errors.
>>>
>>>   
>> Hi Wolfgang,
>>
>> what would be the wanted behaviour, after the discussed problem of bus 
>> error flooding occurred?
> 
> Well, I think the bus error rate should be downscaled without loosing 
> vital information concerning the cause of the problem and it should 
> require as little user intervention as possible. Treating it like a bus 
> error as currently done in Socket-CAN is a bit to strong in my mind.
> 
>> Can the Controller be assumed to be 'slightly dead', or what? Is there 
>> any chance that the bus heals by itself (=> no more bus errors) and can 
>> be used in a normal way? Or is a user interaction recommended or _required_?
> 
> Yes, if you plug the cable, the bus errors might go away and the TX done 
> interrupt will arrive or you get a bus-off (I have seen both).
> 
>> Indeed the slow down of bus errors is a reasonable approach, but your 
>> suggested method leaves too many questions open for the user :-/
> 
> What questions?
> 
>> I would tend to reduce the notifications to the user by creating a timer 
>> at the first bus error interrupt. The first BE irq would lead to a 
>> CAN_ERR_BUSERROR and after a (configurable) time (e.g.250ms) the next 
>> information about bus errors is allowed to be passed to the user. After 
>> this time period is over a new CAN_ERR_BUSERROR may be passed to the 
>> user containing the count of occurred bus errors somewhere in the 
>> data[]-section of the Error Frame. When a normal RX/TX-interrupt 
>> indicates a 'working' CAN again, the timer would be terminated.
>>
>> Instead of a fix configurable time we could also think about a dynamic 
>> behaviour (e.g. with increasing periods).
>>
>> What do you think about this?
> 
> The question is if one bus-error does provide enough information on the 
> cause of the electrical problem or if a sequence is better. Furthermore, 
> I personally regard the use of timers as to heavy. But the solution is 
> feasible, of course. Any other opinions?
> 

I think Oliver's suggestion points in the right direction. But instead
of only coding a timer into the stack, I still vote for closing the loop
over the application:

After the first error in a potential series, the related error frame is
queued, listeners are woken up, and BEI is disabled for now. Once some
listener has read the error frame *and* decided to call into the stack
for further bus errors, BEI is enabled again.

That way the application decides about the error-related IRQ rate and
can easily throttle it by delaying the next receive call. Moreover,
threads of higher priority will be delayed at worst by one error IRQ.
This mechanism just needs some words in the documentation ("Be warned:
error frames may overwhelm you. Throttle your reception!"), but no
further user-visible config options.
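
In pseudo-C the loop would look roughly like this (untested sketch; 
SJA_IR_BEI, rtcan_sja_err_interrupt(), rtcan_rcv() and the per-socket 
err_mask exist in the current code, the two enable/disable helpers are 
made up for illustration):

    /* IRQ handler: deliver one error frame, then keep BEI masked */
    if (irq_source & SJA_IR_BEI) {
        rtcan_sja_err_interrupt(dev, chip, &skb, irq_source);
        rtcan_rcv(dev, &skb);          /* wake up the listeners */
        sja1000_disable_bei(dev);      /* assumed helper: mask BEI */
    }

    /* recvmsg(): a listener that still wants bus errors re-arms the IRQ */
    if (sock->err_mask & CAN_ERR_BUSERROR)
        sja1000_enable_bei(dev);       /* assumed helper: unmask BEI */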

Well, and if there is no thread listening on bus errors, but we want
stats to be updated once in a while, a slow low-prio timer to re-enable
BEI might still be created in the stack like Oliver suggested. For
Xenomai, you could consider pending an rtdm_nrtsig to keep the impact on
the RT domain low. But that's a minor implementation detail. The
important point is to avoid uncontrolled error bursts, even over a short
period (20 bus errors at 1 MBit/s already last for > 1 ms).

Jan




* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-21 21:43       ` Jan Kiszka
@ 2007-03-22  8:08         ` Wolfgang Grandegger
       [not found]           ` <46036D32.7000603@domain.hid>
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-22  8:08 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: socketcan-core, Oliver Hartkopp, xenomai-core

Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
>> Oliver Hartkopp wrote:
>>> Wolfgang Grandegger wrote:
>>>> Wolfgang Grandegger wrote:
>>>>   
>>>>> But flooding can still occur and we 
>>>>> are thinking about a better way of downscaling or temporarily disabling 
>>>>> them. Socket-CAN currently restarts the controller after 200 bus errors.
>>>>> My preferred solution for RT-Socket-CAN currently is to stop the CAN 
>>>>> controller after a kernel configurable amount of successive bus errors. 
>>>>> More clever ideas and comments are welcome?
>>>>>     
>>>> What do you think about the following method?
>>>>
>>>>    config XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT
>>>> 	depends on XENO_DRIVERS_CAN_SJA1000
>>>> 	int "Maximum number of successive bus errors"
>>>> 	range 0 255
>>>> 	default 20
>>>> 	help
>>>>
>>>> 	CAN bus errors are very useful for analyzing electrical problems
>>>>          but they can come at a very high rate resulting in interrupt
>>>>          flooding with bad impact on system performance and real-time
>>>>          behavior. This option, if greater than 0, will limit the amount
>>>>          of successive bus error interrupts. If the limit is reached, an
>>>>          error message with "can_id = CAN_ERR_BUSERR_FLOOD" is sent. The
>>>>          bus error counter gets reset on restart of the device and on any
>>>>          successful message transmission or reception. Be aware that bus
>>>>          error interrupts are only enabled if at least one socket is
>>>>          listening on bus errors.
>>>>
>>>>   
>>> Hi Wolfgang,
>>>
>>> what would be the wanted behaviour, after the discussed problem of bus 
>>> error flooding occurred?
>> Well, I think the bus error rate should be downscaled without loosing 
>> vital information concerning the cause of the problem and it should 
>> require as little user intervention as possible. Treating it like a bus 
>> error as currently done in Socket-CAN is a bit to strong in my mind.
>>
>>> Can the Controller be assumed to be 'slightly dead', or what? Is there 
>>> any chance that the bus heals by itself (=> no more bus errors) and can 
>>> be used in a normal way? Or is a user interaction recommended or _required_?
>> Yes, if you plug the cable, the bus errors might go away and the TX done 
>> interrupt will arrive or you get a bus-off (I have seen both).
>>
>>> Indeed the slow down of bus errors is a reasonable approach, but your 
>>> suggested method leaves too many questions open for the user :-/
>> What questions?
>>
>>> I would tend to reduce the notifications to the user by creating a timer 
>>> at the first bus error interrupt. The first BE irq would lead to a 
>>> CAN_ERR_BUSERROR and after a (configurable) time (e.g.250ms) the next 
>>> information about bus errors is allowed to be passed to the user. After 
>>> this time period is over a new CAN_ERR_BUSERROR may be passed to the 
>>> user containing the count of occurred bus errors somewhere in the 
>>> data[]-section of the Error Frame. When a normal RX/TX-interrupt 
>>> indicates a 'working' CAN again, the timer would be terminated.
>>>
>>> Instead of a fix configurable time we could also think about a dynamic 
>>> behaviour (e.g. with increasing periods).
>>>
>>> What do you think about this?
>> The question is if one bus-error does provide enough information on the 
>> cause of the electrical problem or if a sequence is better. Furthermore, 
>> I personally regard the use of timers as to heavy. But the solution is 
>> feasible, of course. Any other opinions?
>>
> 
> I think Oliver's suggestions points in the right direction. But instead
> of only coding a timer into the stack, I still vote for closing the loop
> over the application:
> 
> After the first error in a potential series, the related error frame is
> queued, listeners are woken up, and BEI is disabled for now. Once some
> listener read the error frame *and* decided to call into the stack for
> further bus errors, BEI is enabled again.
> 
> That way the application decides about the error-related IRQ rate and
> can easily throttle it by delaying the next receive call. Moreover,
> threads of higher priority will be delayed at worst by one error IRQ.
> This mechanism just needs some words in the documentation ("Be warned:
> error frames may overwhelm you. Throttle your reception!"), but no
> further user-visible config options.

I understand: BEI interrupts get (re-)enabled in recvmsg() if the socket 
wants to receive bus errors. There can be multiple readers, but that's 
not a problem, just some overhead in this function. This would also 
simplify the implementation, as my previous "on-demand" bus error 
approach would become obsolete. I'm starting to like this solution.
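
From the application side, the throttling could then look like this 
(untested sketch; it assumes the rtcan RTDM user API with 
socketcan-style names - CAN_RAW_ERR_FILTER, CAN_ERR_FLAG, 
struct can_frame - and a native Xenomai task so that rt_task_sleep() 
is available):

    #include <sys/socket.h>
    #include <rtdm/rtcan.h>
    #include <native/task.h>

    void err_listener(void *arg)
    {
        struct sockaddr_can addr = { .can_family = AF_CAN,
                                     .can_ifindex = 0 /* any device */ };
        can_err_mask_t err_mask = CAN_ERR_BUSERROR;
        struct can_frame frame;
        int s;

        s = rt_dev_socket(PF_CAN, SOCK_RAW, CAN_RAW);
        /* subscribe to bus error frames (option name assumed) */
        rt_dev_setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER,
                          &err_mask, sizeof(err_mask));
        rt_dev_bind(s, (struct sockaddr *)&addr, sizeof(addr));

        while (rt_dev_recv(s, &frame, sizeof(frame), 0) >= 0) {
            if (frame.can_id & CAN_ERR_FLAG) {
                /* log/evaluate the bus error here */
            }
            /* each receive call re-arms at most one BEI, so this delay
             * bounds the bus error interrupt rate to ~4 per second */
            rt_task_sleep(250000000ULL);    /* 250 ms */
        }
        rt_dev_close(s);
    }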

> Well, and if there is no thread listening on bus errors, but we want
> stats to be updated once in a while, a slow low-prio timer to re-enable
> BEI might still be created in the stack like Oliver suggested. For
> Xenomai, you could consider pending an rtdm_nrtsig to keep the impact on
> the RT domain low. But that's a minor implementation detail. The
> important point is to avoid uncontrolled error bursts, even over a short
> period (20 bus errors at 1 MBit/s already last for > 1 ms).

I think the above solution is enough. Shall we go for it?

Wolfgang.



* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
       [not found]             ` <46036F22.60709@domain.hid>
@ 2007-03-23  8:34               ` Jan Kiszka
  2007-03-23  8:51                 ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2007-03-23  8:34 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: socketcan-core, xenomai-core


Oliver Hartkopp wrote:
> In addition to the written stuff below (please read that first), I want
> to remark:
> 
> - Remember that we are talking about a case that is not a standard
> operation mode but a (temporary) error condition that normally leads to
> a bus-off state and appears only in the development and hardware setup phase!
> - I would suggest to use some low-resolution timestamp (like jiffies)
> for this, which is very cheap in CPU usage
> - the throttling should be configured as a driver module parameter (e.g.
> bei_thr=0 or bei_thr=200) due to the need of the global use-case. If you
> are writing a CAN analysis tool you might want to set bei_thr=0; in other
> cases a default of 200 ms might be the right thing.

We are falling back to #1, i.e. where we are now already. Your
suggestion doesn't help us to provide a generic RT-stack for Xenomai.

> 
> Regards,
> Oliver
> 
> 
> 
> Oliver Hartkopp wrote:
>> Wolfgang Grandegger wrote:
>>> Jan Kiszka wrote:
>>>> Wolfgang Grandegger wrote:
>>>>> Oliver Hartkopp wrote:
>>>>>
>>>>>> I would tend to reduce the notifications to the user by creating a
>>>>>> timer at the first bus error interrupt. The first BE irq would
>>>>>> lead to a CAN_ERR_BUSERROR and after a (configurable) time
>>>>>> (e.g.250ms) the next information about bus errors is allowed to be
>>>>>> passed to the user. After this time period is over a new
>>>>>> CAN_ERR_BUSERROR may be passed to the user containing the count of
>>>>>> occurred bus errors somewhere in the data[]-section of the Error
>>>>>> Frame. When a normal RX/TX-interrupt indicates a 'working' CAN
>>>>>> again, the timer would be terminated.
>>>>>>
>>>>>> Instead of a fix configurable time we could also think about a
>>>>>> dynamic behaviour (e.g. with increasing periods).
>>>>>>
>>>>>> What do you think about this?
>>>>> The question is if one bus-error does provide enough information on
>>>>> the cause of the electrical problem or if a sequence is better.
>>>>> Furthermore, I personally regard the use of timers as to heavy. But
>>>>> the solution is feasible, of course. Any other opinions?
>>>>>
>>>>
>>>> I think Oliver's suggestions points in the right direction. But instead
>>>> of only coding a timer into the stack, I still vote for closing the
>>>> loop
>>>> over the application:
>>>>
>>>> After the first error in a potential series, the related error frame is
>>>> queued, listeners are woken up, and BEI is disabled for now. Once some
>>>> listener read the error frame *and* decided to call into the stack for
>>>> further bus errors, BEI is enabled again.
>>>>
>>>> That way the application decides about the error-related IRQ rate and
>>>> can easily throttle it by delaying the next receive call. Moreover,
>>>> threads of higher priority will be delayed at worst by one error IRQ.
>>>> This mechanism just needs some words in the documentation ("Be warned:
>>>> error frames may overwhelm you. Throttle your reception!"), but no
>>>> further user-visible config options.
>>>
>>> I understand, BEI interrupts get (re-)enabled in recvmsg() if the
>>> socket wants to receive bus errors. There can me multiple readers,
>>> but that's not a problem. Just some overhead in this function. This
>>> would also simplify the implementation as my previous one with
>>> "on-demand" bus error would be obsolete. I start to like this solution.
>>
>> Hm - to reenable the BEI on user interaction would be a nice thing, BUT I
>> can see several problems:
>>
>> 1. In socketcan you have receive queues into the userspace with a
>> length >1

Can you explain to me what the problem behind this is? I don't see it yet.

>>
>> 2. How can we handle multiple subscribers (A reads three error frames
>> and therefore reenables the BEI, while B reads nothing in this time)? Please
>> remember: having multiple applications is a vital idea of socketcan.

Same here, I don't see the issue. A and B will both find the first error
frame in their queues/ring buffers/whatever. If A has higher priority
(or gets an earlier timeslice), it may already re-enable BEI before B
was able to run as well. But that's an application-specific scheduling
issue and not a problem of the CAN stack (often it is precisely what you
want when assigning priorities...).

>>
>> 3. The count of occurred BEIs gets lost (maybe this is unimportant)

Agreed, but I also don't consider this problematic.

>>
>> ----
>>
>> Regarding (2) the solution could be not to reenable the BEI for a device
>> until every subscriber has read his error frame. But this collides with
>> a raw-socket that's bound to 'any' device (ifindex = 0).

That can cause prio-inversion: a low-prio BEI-reader decides about when
a high-prio one gets the next message. No-go for RT.

>>
>> Regarding (3) we could count the BEIs (which would not reduce the
>> interrupt load) or we just stop the BEI after the first occurrence, which
>> might possibly not be enough for some people to implement the CAN
>> academically correctly.
>>
>> As you may see here a tight coupling of the problems on the CAN bus with
>> the application(s!) is very tricky or even impossible in socketcan.
>> Regarding other network devices (like ethernet devices) the notification
>> about Layer 1/2 problems is unusual. The concept of creating error
>> frames was a good compromise for this reason.
>>
>> As I would also like to avoid creating a timer for "bus error
>> throttling", I got a new idea:
>>
>> - on the first BEI: create an error frame, set a counter to zero and
>> save the current timestamp
>> - on the next BEI:
>>  - increment the counter
>>  - check if the time is up for the next error frame (e.g. after 200ms -
>> configurable?)
>>  - if so: Send the next error frame (including the number of occurred
>> error frames in this 200ms)
>>
>> BEI means ONLY to have a BEI (and no other error).
>>
>> Of course this does NOT reduce the interrupt load but all this
>> throttling is performed inside the interrupt context. This should not be
>> that problem, or is it? And we do not need a timer ...
>>
>> Any comments to this idea?
>>
>> Regards,
>> Oliver
>>

Well, I may be overlooking some pitfalls in my suggestion, so please help
me to understand your concerns.

Jan




* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
       [not found]           ` <46036D32.7000603@domain.hid>
       [not found]             ` <46036F22.60709@domain.hid>
@ 2007-03-23  8:37             ` Wolfgang Grandegger
  1 sibling, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-23  8:37 UTC (permalink / raw)
  To: Oliver Hartkopp; +Cc: socketcan-core, Jan Kiszka, xenomai-core

Oliver Hartkopp wrote:
> Wolfgang Grandegger wrote:
>> Jan Kiszka wrote:
>>> Wolfgang Grandegger wrote:
>>>> Oliver Hartkopp wrote:
>>>>
>>>>> I would tend to reduce the notifications to the user by creating a 
>>>>> timer at the first bus error interrupt. The first BE irq would lead 
>>>>> to a CAN_ERR_BUSERROR and after a (configurable) time (e.g.250ms) 
>>>>> the next information about bus errors is allowed to be passed to 
>>>>> the user. After this time period is over a new CAN_ERR_BUSERROR may 
>>>>> be passed to the user containing the count of occurred bus errors 
>>>>> somewhere in the data[]-section of the Error Frame. When a normal 
>>>>> RX/TX-interrupt indicates a 'working' CAN again, the timer would be 
>>>>> terminated.
>>>>>
>>>>> Instead of a fix configurable time we could also think about a 
>>>>> dynamic behaviour (e.g. with increasing periods).
>>>>>
>>>>> What do you think about this?
>>>> The question is if one bus-error does provide enough information on 
>>>> the cause of the electrical problem or if a sequence is better. 
>>>> Furthermore, I personally regard the use of timers as to heavy. But 
>>>> the solution is feasible, of course. Any other opinions?
>>>>
>>>
>>> I think Oliver's suggestions points in the right direction. But instead
>>> of only coding a timer into the stack, I still vote for closing the loop
>>> over the application:
>>>
>>> After the first error in a potential series, the related error frame is
>>> queued, listeners are woken up, and BEI is disabled for now. Once some
>>> listener read the error frame *and* decided to call into the stack for
>>> further bus errors, BEI is enabled again.
>>>
>>> That way the application decides about the error-related IRQ rate and
>>> can easily throttle it by delaying the next receive call. Moreover,
>>> threads of higher priority will be delayed at worst by one error IRQ.
>>> This mechanism just needs some words in the documentation ("Be warned:
>>> error frames may overwhelm you. Throttle your reception!"), but no
>>> further user-visible config options.
>>
>> I understand, BEI interrupts get (re-)enabled in recvmsg() if the 
>> socket wants to receive bus errors. There can me multiple readers, but 
>> that's not a problem. Just some overhead in this function. This would 
>> also simplify the implementation as my previous one with "on-demand" 
>> bus error would be obsolete. I start to like this solution.
> 
> Hm - to reenable the BEI on user interaction would be a nice thing BUT i
> can see several problems:
> 
> 1. In socketcan you have receive queues into the userspace with a length >1

Yes, and they are still used to receive normal and other error messages.

> 2. How can we handle multiple subscribers (A reads three error frames
> and reenables therefore the BEI, B reads nothing in this time). Please
> remember: To have multiple applications it a vital idea from socketcan.

My idea was to re-enable (or trigger) one BEI at the beginning of 
recvmsg(), if selected. Then all listening sockets will receive all bus 
error messages, e.g. A and B will receive up to the sum of the recvmsg() 
calls from A and B. Well, this mixture is not really nice, but it will 
downscale the bus error rate to a "digestible" level.

> 3. The count of occured BEIs gets lost (maybe this is unimportant)

I do not regard this as a problem. The purpose of this option is to 
downscale the BEI rate.

> ----
> 
> Regarding (2) the solution could be not to reenable the BEI for a device
> until every subscriber has read his error frame. But this collides with
> a raw-socket that's bound to 'any' device (ifindex = 0).

We cannot do that.

> Regarding (3) we could count the BEIs (which would not reduce the
> interrupt load) or we just stop the BEI after the first occurance which
> might possibly not enough for some people to implement the CAN
> academical correct.
> 
> As you may see here a tight coupling of the problems on the CAN bus with
> the application(s!) is very tricky or even impossible in socketcan.
> Regarding other network devices (like ethernet devices) the notification
> about Layer 1/2 problems is unusual. The concept of creating error
> frames was a good compromise for this reason.
> 
> As i also would like to avoid to create a timer for "bus error
> throttling", i got a new idea:
> 
> - on the first BEI: create an error frame, set a counter to zero and
> save the current timestamp
> - on the next BEI:
>  - increment the counter
>  - check if the time is up for the next error frame (e.g. after 200ms -
> configurable?)
>  - if so: Send the next error frame (including the number of occured
> error frames in this 200ms)
> 
> BEI means ONLY to have a BEI (and no other error).
> 
> Of course this does NOT reduce the interrupt load but all this
> throttling is performed inside the interrupt context. This should not be
> that problem, or is it? And we do not need a timer ...

It's about reducing the very high BEI interrupt rate. A rate of 15 kHz is 
_really_ heavy on mid- and low-end systems, and especially for real-time.
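
Just to put rough numbers on it (from the /proc/xenomai/stat dump earlier 
in this thread): 14.5% CPU for the SJA1000 IRQ at ~15700 BEI/s is roughly 
9 us per bus error interrupt, on top of the increased latencies.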

> Any comments to this idea?

Well, until now I do not see a 100% satisfactory solution, and I do not 
want to make the implementation too sophisticated. Currently I'm in 
favor of my proposed solution with a low default value (10 or even 1):

   config XENO_DRIVERS_CAN_SJA1000_BUS_ERR_LIMIT
         depends on XENO_DRIVERS_CAN_SJA1000
         int "Maximum number of successive bus errors"
         range 0 255
         default 10
         help

         CAN bus errors are very useful for analyzing electrical problems
         but they can come at a very high rate, resulting in interrupt
         flooding with bad impact on system performance and real-time
         behavior. This option, if greater than 0, will limit the number
         of successive bus error interrupts. If the limit is reached, an
         error message with "can_id = CAN_ERR_BUSERR_FLOOD" is sent and
         the bus error interrupts get disabled. They get re-enabled on
         restart of the device and on any successful message transmission
         or reception. Be aware that bus error interrupts are only
         enabled if at least one socket is listening for bus errors.

The user still has all the choices. What is your preferred solution?

Wolfgang




* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-23  8:34               ` Jan Kiszka
@ 2007-03-23  8:51                 ` Wolfgang Grandegger
  2007-03-24 11:51                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-23  8:51 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: socketcan-core, Oliver Hartkopp, xenomai-core

Jan Kiszka wrote:
> Oliver Hartkopp wrote:
>> Additionally to the written stuff below (please read that first), i want
>> to remark:
>>
>> - Remember that we are talking about a case that is not a standard
>> operation mode but a (temporary) error condition that normally leads to
>> a bus-off state and appears only in development and hardware setup phase!
>> - i would suggest to use some low resolution timestamp (like jiffies)
>> for this, which is very cheap in CPU usage
>> - the throttling should be configured as a driver module parameter (e.g.
>> bei_thr=0 or bei_thr=200 )due to the need of the global use-case. If you
>> are writing a CAN analysis tool you might want to set bei_thr=0 in other
>> cases a default of 200ms might be the right thing.
> 
> We are falling back to #1, i.e. where we are now already. Your
> suggestion doesn't help us to provide a generic RT-stack for Xenomai.
> 
>> Regards,
>> Oliver
>>
>>
>>
>> Oliver Hartkopp wrote:
>>> Wolfgang Grandegger wrote:
>>>> Jan Kiszka wrote:
>>>>> Wolfgang Grandegger wrote:
>>>>>> Oliver Hartkopp wrote:
>>>>>>
>>>>>>> I would tend to reduce the notifications to the user by creating a
>>>>>>> timer at the first bus error interrupt. The first BE irq would
>>>>>>> lead to a CAN_ERR_BUSERROR and after a (configurable) time
>>>>>>> (e.g.250ms) the next information about bus errors is allowed to be
>>>>>>> passed to the user. After this time period is over a new
>>>>>>> CAN_ERR_BUSERROR may be passed to the user containing the count of
>>>>>>> occurred bus errors somewhere in the data[]-section of the Error
>>>>>>> Frame. When a normal RX/TX-interrupt indicates a 'working' CAN
>>>>>>> again, the timer would be terminated.
>>>>>>>
>>>>>>> Instead of a fix configurable time we could also think about a
>>>>>>> dynamic behaviour (e.g. with increasing periods).
>>>>>>>
>>>>>>> What do you think about this?
>>>>>> The question is if one bus-error does provide enough information on
>>>>>> the cause of the electrical problem or if a sequence is better.
>>>>>> Furthermore, I personally regard the use of timers as to heavy. But
>>>>>> the solution is feasible, of course. Any other opinions?
>>>>>>
>>>>> I think Oliver's suggestions points in the right direction. But instead
>>>>> of only coding a timer into the stack, I still vote for closing the
>>>>> loop
>>>>> over the application:
>>>>>
>>>>> After the first error in a potential series, the related error frame is
>>>>> queued, listeners are woken up, and BEI is disabled for now. Once some
>>>>> listener read the error frame *and* decided to call into the stack for
>>>>> further bus errors, BEI is enabled again.
>>>>>
>>>>> That way the application decides about the error-related IRQ rate and
>>>>> can easily throttle it by delaying the next receive call. Moreover,
>>>>> threads of higher priority will be delayed at worst by one error IRQ.
>>>>> This mechanism just needs some words in the documentation ("Be warned:
>>>>> error frames may overwhelm you. Throttle your reception!"), but no
>>>>> further user-visible config options.
>>>> I understand, BEI interrupts get (re-)enabled in recvmsg() if the
>>>> socket wants to receive bus errors. There can me multiple readers,
>>>> but that's not a problem. Just some overhead in this function. This
>>>> would also simplify the implementation as my previous one with
>>>> "on-demand" bus error would be obsolete. I start to like this solution.
>>> Hm - to reenable the BEI on user interaction would be a nice thing BUT i
>>> can see several problems:
>>>
>>> 1. In socketcan you have receive queues into the userspace with a
>>> length >1
> 
> Can you explain to me what the problem behind this is? I don't see it yet.
> 
>>> 2. How can we handle multiple subscribers (A reads three error frames
>>> and reenables therefore the BEI, B reads nothing in this time). Please
>>> remember: To have multiple applications it a vital idea from socketcan.
> 
> Same here, I don't see the issue. A and B will both find the first error
> frame in their queues/ring buffers/whatever. If A has higher priority
> (or gets an earlier timeslice), it may already re-enable BEI before B
> was able to run as well. But that's an application-specific scheduling
> issue and not a problem of the CAN stack (often it is precisely what you
> want when assigning priorities...).
> 
>>> 3. The count of occured BEIs gets lost (maybe this is unimportant)
> 
> Agreed, but I also don't consider this problematic.
> 
>>> ----
>>>
>>> Regarding (2) the solution could be not to reenable the BEI for a device
>>> until every subscriber has read his error frame. But this collides with
>>> a raw-socket that's bound to 'any' device (ifindex = 0).
> 
> That can cause prio-inversion: a low-prio BEI-reader decides about when
> a high-prio one gets the next message. No-go for RT.
> 
>>> Regarding (3) we could count the BEIs (which would not reduce the
>>> interrupt load) or we just stop the BEI after the first occurance which
>>> might possibly not enough for some people to implement the CAN
>>> academical correct.
>>>
>>> As you may see here a tight coupling of the problems on the CAN bus with
>>> the application(s!) is very tricky or even impossible in socketcan.
>>> Regarding other network devices (like ethernet devices) the notification
>>> about Layer 1/2 problems is unusual. The concept of creating error
>>> frames was a good compromise for this reason.
>>>
>>> As i also would like to avoid to create a timer for "bus error
>>> throttling", i got a new idea:
>>>
>>> - on the first BEI: create an error frame, set a counter to zero and
>>> save the current timestamp
>>> - on the next BEI:
>>>  - increment the counter
>>>  - check if the time is up for the next error frame (e.g. after 200ms -
>>> configurable?)
>>>  - if so: Send the next error frame (including the number of occured
>>> error frames in this 200ms)
>>>
>>> BEI means ONLY to have a BEI (and no other error).
>>>
>>> Of course this does NOT reduce the interrupt load but all this
>>> throttling is performed inside the interrupt context. This should not be
>>> that problem, or is it? And we do not need a timer ...
>>>
>>> Any comments to this idea?
>>>
>>> Regards,
>>> Oliver
>>>
> 
> Well, I may oversee some pitfalls of my suggestion, so please help me to
> understand your concerns.

There might be a problem with re-enabling BEI interrupts because we need 
to read the ECC register. OK, I'm going to implement the method as well 
to check for pitfalls.

Wolfgang.




* Re: [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-23  8:51                 ` Wolfgang Grandegger
@ 2007-03-24 11:51                   ` Wolfgang Grandegger
  2007-03-24 13:38                     ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-24 11:51 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: socketcan-core, Oliver Hartkopp, Jan Kiszka, xenomai-core


Wolfgang Grandegger wrote:
> Jan Kiszka wrote:
>> Oliver Hartkopp wrote:
>>> Additionally to the written stuff below (please read that first), i want
>>> to remark:
>>>
>>> - Remember that we are talking about a case that is not a standard
>>> operation mode but a (temporary) error condition that normally leads to
>>> a bus-off state and appears only in development and hardware setup 
>>> phase!
>>> - i would suggest to use some low resolution timestamp (like jiffies)
>>> for this, which is very cheap in CPU usage
>>> - the throttling should be configured as a driver module parameter (e.g.
>>> bei_thr=0 or bei_thr=200 )due to the need of the global use-case. If you
>>> are writing a CAN analysis tool you might want to set bei_thr=0 in other
>>> cases a default of 200ms might be the right thing.
>>
>> We are falling back to #1, i.e. where we are now already. Your
>> suggestion doesn't help us to provide a generic RT-stack for Xenomai.
>>
>>> Regards,
>>> Oliver
>>>
>>>
>>>
>>> Oliver Hartkopp wrote:
>>>> Wolfgang Grandegger wrote:
>>>>> Jan Kiszka wrote:
>>>>>> Wolfgang Grandegger wrote:
>>>>>>> Oliver Hartkopp wrote:
>>>>>>>
>>>>>>>> I would tend to reduce the notifications to the user by creating a
>>>>>>>> timer at the first bus error interrupt. The first BE irq would
>>>>>>>> lead to a CAN_ERR_BUSERROR and after a (configurable) time
>>>>>>>> (e.g.250ms) the next information about bus errors is allowed to be
>>>>>>>> passed to the user. After this time period is over a new
>>>>>>>> CAN_ERR_BUSERROR may be passed to the user containing the count of
>>>>>>>> occurred bus errors somewhere in the data[]-section of the Error
>>>>>>>> Frame. When a normal RX/TX-interrupt indicates a 'working' CAN
>>>>>>>> again, the timer would be terminated.
>>>>>>>>
>>>>>>>> Instead of a fix configurable time we could also think about a
>>>>>>>> dynamic behaviour (e.g. with increasing periods).
>>>>>>>>
>>>>>>>> What do you think about this?
>>>>>>> The question is if one bus-error does provide enough information on
>>>>>>> the cause of the electrical problem or if a sequence is better.
>>>>>>> Furthermore, I personally regard the use of timers as to heavy. But
>>>>>>> the solution is feasible, of course. Any other opinions?
>>>>>>>
>>>>>> I think Oliver's suggestions points in the right direction. But 
>>>>>> instead
>>>>>> of only coding a timer into the stack, I still vote for closing the
>>>>>> loop
>>>>>> over the application:
>>>>>>
>>>>>> After the first error in a potential series, the related error 
>>>>>> frame is
>>>>>> queued, listeners are woken up, and BEI is disabled for now. Once 
>>>>>> some
>>>>>> listener read the error frame *and* decided to call into the stack 
>>>>>> for
>>>>>> further bus errors, BEI is enabled again.
>>>>>>
>>>>>> That way the application decides about the error-related IRQ rate and
>>>>>> can easily throttle it by delaying the next receive call. Moreover,
>>>>>> threads of higher priority will be delayed at worst by one error IRQ.
>>>>>> This mechanism just needs some words in the documentation ("Be 
>>>>>> warned:
>>>>>> error frames may overwhelm you. Throttle your reception!"), but no
>>>>>> further user-visible config options.
>>>>> I understand, BEI interrupts get (re-)enabled in recvmsg() if the
>>>>> socket wants to receive bus errors. There can me multiple readers,
>>>>> but that's not a problem. Just some overhead in this function. This
>>>>> would also simplify the implementation as my previous one with
>>>>> "on-demand" bus error would be obsolete. I start to like this 
>>>>> solution.
>>>> Hm - to reenable the BEI on user interaction would be a nice thing 
>>>> BUT i
>>>> can see several problems:
>>>>
>>>> 1. In socketcan you have receive queues into the userspace with a
>>>> length >1
>>
>> Can you explain to me what the problem behind this is? I don't see it 
>> yet.
>>
>>>> 2. How can we handle multiple subscribers (A reads three error frames
>>>> and reenables therefore the BEI, B reads nothing in this time). Please
>>>> remember: To have multiple applications it a vital idea from socketcan.
>>
>> Same here, I don't see the issue. A and B will both find the first error
>> frame in their queues/ring buffers/whatever. If A has higher priority
>> (or gets an earlier timeslice), it may already re-enable BEI before B
>> was able to run as well. But that's an application-specific scheduling
>> issue and not a problem of the CAN stack (often it is precisely what you
>> want when assigning priorities...).
>>
>>>> 3. The count of occured BEIs gets lost (maybe this is unimportant)
>>
>> Agreed, but I also don't consider this problematic.
>>
>>>> ----
>>>>
>>>> Regarding (2) the solution could be not to reenable the BEI for a 
>>>> device
>>>> until every subscriber has read his error frame. But this collides with
>>>> a raw-socket that's bound to 'any' device (ifindex = 0).
>>
>> That can cause prio-inversion: a low-prio BEI-reader decides about when
>> a high-prio one gets the next message. No-go for RT.
>>
>>>> Regarding (3) we could count the BEIs (which would not reduce the
>>>> interrupt load) or we just stop the BEI after the first occurance which
>>>> might possibly not enough for some people to implement the CAN
>>>> academical correct.
>>>>
>>>> As you may see here a tight coupling of the problems on the CAN bus 
>>>> with
>>>> the application(s!) is very tricky or even impossible in socketcan.
>>>> Regarding other network devices (like ethernet devices) the 
>>>> notification
>>>> about Layer 1/2 problems is unusual. The concept of creating error
>>>> frames was a good compromise for this reason.
>>>>
>>>> As i also would like to avoid to create a timer for "bus error
>>>> throttling", i got a new idea:
>>>>
>>>> - on the first BEI: create an error frame, set a counter to zero and
>>>> save the current timestamp
>>>> - on the next BEI:
>>>>  - increment the counter
>>>>  - check if the time is up for the next error frame (e.g. after 200ms -
>>>> configurable?)
>>>>  - if so: Send the next error frame (including the number of occured
>>>> error frames in this 200ms)
>>>>
>>>> BEI means ONLY to have a BEI (and no other error).
>>>>
>>>> Of course this does NOT reduce the interrupt load but all this
>>>> throttling is performed inside the interrupt context. This should 
>>>> not be
>>>> that problem, or is it? And we do not need a timer ...
>>>>
>>>> Any comments to this idea?
>>>>
>>>> Regards,
>>>> Oliver
>>>>
>>
>> Well, I may oversee some pitfalls of my suggestion, so please help me to
>> understand your concerns.
> 
> There might be a problem with re-enabling BEI interrupts because we need 
>  to read the ECC. OK, I'm going to implement the method as well to check 
> for pitfalls.

Attached is the patch and it works fine. The key function 
rtcan_sja_enable_bus_err() is called from sendmsg():

void rtcan_sja_enable_bus_err(struct rtcan_device *dev)
{
     struct rtcan_sja1000 *chip = (struct rtcan_sja1000 *)dev->priv;

     if (chip->bus_err_on < 2) {
	if (chip->bus_err_on < 1)
	    chip->read_reg(dev, SJA_ECC);
	chip->bus_err_on = 2;
     }
}

And I also do not see a real problem with multiple readers. I would 
commit this solution. I'm just unsure whether we should select it 
silently or whether the user should have the choice.

Wolfgang.

[-- Attachment #2: xenomai-rtcan-bus-err-on-send.patch --]
[-- Type: text/x-patch, Size: 7490 bytes --]

Index: ksrc/drivers/can/Kconfig
===================================================================
--- ksrc/drivers/can/Kconfig	(revision 2335)
+++ ksrc/drivers/can/Kconfig	(working copy)
@@ -49,6 +49,19 @@ config XENO_DRIVERS_CAN_MAX_RECEIVERS
 
 	The driver maintains a receive filter list per device for fast access.
 
+config XENO_DRIVERS_CAN_BUS_ERR
+	depends on XENO_DRIVERS_CAN
+	bool
+	default n
+	help
+
+	To avoid unnecessary bus error interrupt flooding, this option enables
+	bus error interrupts when an application is calling a receive function
+	on a socket listening on bus errors. After one bus error has occured,
+	on a socket listening on bus errors. After one bus error has occurred,
+	processing. This option is automatically selected for CAN controllers
+	supporting bus error interrupts like the SJA1000.
+
 config XENO_DRIVERS_CAN_VIRT
 	depends on XENO_DRIVERS_CAN
 	tristate "Virtual CAN bus driver"
Index: ksrc/drivers/can/rtcan_dev.h
===================================================================
--- ksrc/drivers/can/rtcan_dev.h	(revision 2335)
+++ ksrc/drivers/can/rtcan_dev.h	(working copy)
@@ -118,6 +118,7 @@ struct rtcan_device {
     int                 (*do_set_bit_time)(struct rtcan_device *dev,
 					   struct can_bittime *bit_time,
 					   rtdm_lockctx_t *lock_ctx);
+    void                (*do_enable_bus_err)(struct rtcan_device *dev);
 
     /* Reception list head. This list contains all filters which have been
      * registered via a bind call. */
Index: ksrc/drivers/can/rtcan_raw_dev.c
===================================================================
--- ksrc/drivers/can/rtcan_raw_dev.c	(revision 2335)
+++ ksrc/drivers/can/rtcan_raw_dev.c	(working copy)
@@ -312,3 +312,33 @@ int rtcan_raw_ioctl_dev(struct rtdm_dev_
 
     return ret;
 }
+
+#ifdef CONFIG_XENO_DRIVERS_CAN_BUS_ERR
+void __rtcan_raw_enable_bus_err(struct rtcan_socket *sock)
+{
+    int i, begin, end;
+    struct rtcan_device *dev;
+    rtdm_lockctx_t lock_ctx;
+    int ifindex = atomic_read(&sock->ifindex);
+
+    if (ifindex) {
+	begin = ifindex;
+	end   = ifindex;
+    } else {
+	begin = 1;
+	end = RTCAN_MAX_DEVICES;
+    }
+
+    for (i = begin; i <= end; i++) {
+	if ((dev = rtcan_dev_get_by_index(i)) == NULL)
+	    continue;
+
+	if (dev->do_enable_bus_err) {
+	    rtdm_lock_get_irqsave(&dev->device_lock, lock_ctx);
+	    dev->do_enable_bus_err(dev);
+	    rtdm_lock_put_irqrestore(&dev->device_lock, lock_ctx);
+	}
+	rtcan_dev_dereference(dev);
+    }
+}
+#endif /* CONFIG_XENO_DRIVERS_CAN_BUS_ERR*/
Index: ksrc/drivers/can/rtcan_raw.c
===================================================================
--- ksrc/drivers/can/rtcan_raw.c	(revision 2335)
+++ ksrc/drivers/can/rtcan_raw.c	(working copy)
@@ -624,6 +624,7 @@ ssize_t rtcan_raw_recvmsg(struct rtdm_de
             return -EINVAL;
     }
 
+    rtcan_raw_enable_bus_err(sock);
 
     /* Set RX timeout */
     timeout = (flags & MSG_DONTWAIT) ? RTDM_TIMEOUT_NONE : sock->rx_timeout;
Index: ksrc/drivers/can/rtcan_raw.h
===================================================================
--- ksrc/drivers/can/rtcan_raw.h	(revision 2335)
+++ ksrc/drivers/can/rtcan_raw.h	(working copy)
@@ -41,6 +41,17 @@ void rtcan_loopback(struct rtcan_device 
 #define rtcan_loopback_pending(dev) (0)
 #endif /* CONFIG_XENO_DRIVERS_CAN_LOOPBACK */
 
+#ifdef CONFIG_XENO_DRIVERS_CAN_BUS_ERR
+void __rtcan_raw_enable_bus_err(struct rtcan_socket *sock);
+static inline void rtcan_raw_enable_bus_err(struct rtcan_socket *sock)
+{
+    if ((sock->err_mask & CAN_ERR_BUSERROR))
+	__rtcan_raw_enable_bus_err(sock);
+}
+#else
+#define rtcan_raw_enable_bus_err(sock)
+#endif
+
 int __init rtcan_raw_proto_register(void);
 void __exit rtcan_raw_proto_unregister(void);
 
Index: ksrc/drivers/can/sja1000/Kconfig
===================================================================
--- ksrc/drivers/can/sja1000/Kconfig	(revision 2335)
+++ ksrc/drivers/can/sja1000/Kconfig	(working copy)
@@ -1,6 +1,7 @@
 config XENO_DRIVERS_CAN_SJA1000
 	depends on XENO_DRIVERS_CAN
 	tristate "Philips SJA1000 CAN controller"
+	select XENO_DRIVERS_CAN_BUS_ERR
 
 config XENO_DRIVERS_CAN_SJA1000_ISA
 	depends on XENO_DRIVERS_CAN_SJA1000
Index: ksrc/drivers/can/sja1000/rtcan_sja1000.c
===================================================================
--- ksrc/drivers/can/sja1000/rtcan_sja1000.c	(revision 2335)
+++ ksrc/drivers/can/sja1000/rtcan_sja1000.c	(working copy)
@@ -293,18 +293,21 @@ static int rtcan_sja_interrupt(rtdm_irq_
             dev->state = dev->state_before_sleep;
 
 	/* Error Interrupt? */
-        if (irq_source & (SJA_IR_EI | SJA_IR_DOI | SJA_IR_EPI | 
+        if (irq_source & (SJA_IR_EI | SJA_IR_DOI | SJA_IR_EPI |
 			  SJA_IR_ALI | SJA_IR_BEI)) {
+
 	    /* Check error condition and fill error frame */
-	    rtcan_sja_err_interrupt(dev, chip, &skb, irq_source);
+	    if (!((irq_source & SJA_IR_BEI) && (chip->bus_err_on-- < 2))) {
+		rtcan_sja_err_interrupt(dev, chip, &skb, irq_source);
 
-	    if (recv_lock_free) {
-		recv_lock_free = 0;
-		rtdm_lock_get(&rtcan_recv_list_lock);
-		rtdm_lock_get(&rtcan_socket_lock);
+		if (recv_lock_free) {
+		    recv_lock_free = 0;
+		    rtdm_lock_get(&rtcan_recv_list_lock);
+		    rtdm_lock_get(&rtcan_socket_lock);
+		}
+		/* Pass error frame out to the sockets */
+		rtcan_rcv(dev, &skb);
 	    }
-	    /* Pass error frame out to the sockets */
-	    rtcan_rcv(dev, &skb);
 	}
 
         /* Transmit Interrupt? */
@@ -625,8 +628,16 @@ int rtcan_sja_set_bit_time(struct rtcan_
     return 0;
 }
 
+void rtcan_sja_enable_bus_err(struct rtcan_device *dev)
+{
+    struct rtcan_sja1000 *chip = (struct rtcan_sja1000 *)dev->priv;
 
-
+    if (chip->bus_err_on < 2) {
+	if (chip->bus_err_on < 1)
+	    chip->read_reg(dev, SJA_ECC);
+	chip->bus_err_on = 2;
+    }
+}
 
 /*
  *  Start a transmission to a SJA1000 device
@@ -745,8 +756,10 @@ int rtcan_sja1000_register(struct rtcan_
     dev->do_set_mode = rtcan_sja_set_mode;
     dev->do_get_state = rtcan_sja_get_state;
     dev->do_set_bit_time = rtcan_sja_set_bit_time;
+    dev->do_enable_bus_err = rtcan_sja_enable_bus_err;
+    chip->bus_err_on = 1;
 
-    ret = rtdm_irq_request(&dev->irq_handle, 
+    ret = rtdm_irq_request(&dev->irq_handle,
 			   chip->irq_num, rtcan_sja_interrupt,
 			   chip->irq_flags, sja_ctrl_name, dev);
     if (ret) {
Index: ksrc/drivers/can/sja1000/Config.in
===================================================================
--- ksrc/drivers/can/sja1000/Config.in	(revision 2335)
+++ ksrc/drivers/can/sja1000/Config.in	(working copy)
@@ -4,6 +4,10 @@
 
 dep_tristate 'Philips SJA1000 CAN controller' CONFIG_XENO_DRIVERS_CAN_SJA1000 $CONFIG_XENO_DRIVERS_CAN
 
+if [ "$CONFIG_XENO_DRIVERS_CAN_SJA1000" != "n" ]; then
+	define_bool CONFIG_XENO_DRIVERS_CAN_BUS_ERR y
+fi
+
 dep_tristate '  Standard ISA controllers' CONFIG_XENO_DRIVERS_CAN_SJA1000_ISA $CONFIG_XENO_DRIVERS_CAN_SJA1000
 if [ "$CONFIG_XENO_DRIVERS_CAN_SJA1000_ISA" != "n" ]; then
 	int '    Maximum number of controllers' CONFIG_XENO_DRIVERS_CAN_SJA1000_ISA_MAX_DEV 4
Index: ksrc/drivers/can/sja1000/rtcan_sja1000.h
===================================================================
--- ksrc/drivers/can/sja1000/rtcan_sja1000.h	(revision 2335)
+++ ksrc/drivers/can/sja1000/rtcan_sja1000.h	(working copy)
@@ -30,6 +30,7 @@ struct rtcan_sja1000 {
     unsigned short irq_flags;
     unsigned char ocr;
     unsigned char cdr;
+    char bus_err_on;
 };
 
 int rtcan_sja_create_proc(struct rtcan_device* dev);


* Re: [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-24 11:51                   ` Wolfgang Grandegger
@ 2007-03-24 13:38                     ` Jan Kiszka
  2007-04-02 16:22                       ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2007-03-24 13:38 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: socketcan-core, Oliver Hartkopp, xenomai-core


Wolfgang Grandegger wrote:
> ...
> Attached is the patch, and it works fine. The key function
> rtcan_sja_enable_bus_err() is called (via rtcan_raw_enable_bus_err()) from
> recvmsg():
> 
> void rtcan_sja_enable_bus_err(struct rtcan_device *dev)
> {
>     struct rtcan_sja1000 *chip = (struct rtcan_sja1000 *)dev->priv;
> 
>     if (chip->bus_err_on < 2) {
>         if (chip->bus_err_on < 1)
>             chip->read_reg(dev, SJA_ECC);
>         chip->bus_err_on = 2;
>     }
> }
> 
> And I also do not see a real problem with multiple readers. I would
> commit this solution. I'm just unsure whether we should select it silently
> or whether the user should have the choice.

I would say no to user-selectability unless someone comes up with a
serious downside of this approach.

Are there any other error interrupt sources we should treat the same
way? Or all of them? Just to keep the behaviour as regular as reasonably
possible. I have no opinion yet; I just want to make sure we have
considered this as well before we set the API change in stone.

Nice work!

Jan




* Re: [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-03-24 13:38                     ` Jan Kiszka
@ 2007-04-02 16:22                       ` Wolfgang Grandegger
  2007-04-07 21:03                         ` Jan Kiszka
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-04-02 16:22 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: socketcan-core, Oliver Hartkopp, xenomai-core

Hello,

Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
>> ...
>> Attached is the patch, and it works fine. The key function
>> rtcan_sja_enable_bus_err() is called (via rtcan_raw_enable_bus_err()) from
>> recvmsg():
>>
>> void rtcan_sja_enable_bus_err(struct rtcan_device *dev)
>> {
>>     struct rtcan_sja1000 *chip = (struct rtcan_sja1000 *)dev->priv;
>>
>>     if (chip->bus_err_on < 2) {
>>         if (chip->bus_err_on < 1)
>>             chip->read_reg(dev, SJA_ECC);
>>         chip->bus_err_on = 2;
>>     }
>> }
>>
>> And I also do not see a real problem with multiple readers. I would
>> commit this solution. I'm just unsure whether we should select it silently
>> or whether the user should have the choice.
> 
> I would say no to user-selectability unless someone comes up with a
> serious downside of this approach.
> 
> Are there any other error interrupt sources we should treat the same
> way? Or all of them? Just to keep the behaviour as regular as reasonably
> possible. I have no opinion yet; I just want to make sure we have
> considered this as well before we set the API change in stone.
> 
> Nice work!

OK, I have just commited the following changes:

2007-04-02  Wolfgang Grandegger  <wg@domain.hid>

  * ksrc/drivers/can/*: The option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now
    enables bus error interrupts when an application is calling a receive
    function on a socket listening on bus errors. After one bus error has
    occurred, the interrupt will be disabled to allow the application time
    for error processing and to efficiently avoid bus error interrupt
    flooding. This option is automatically selected for CAN controllers
    supporting bus error interrupts like the SJA1000.

  * include/rtdm/rtcan.h: Add some doc on bus-off and bus-error error
    conditions and the restart policy.

  * src/utils/can/rtcanconfig.c: Controller mode settings and documentation
    have been corrected.

Wolfgang.
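
To make the first entry above concrete, here is a minimal application-side
sketch. The rt_dev_* calls are the standard RTDM wrappers; the
CAN_RAW_ERR_FILTER option name and the exact sockaddr_can/can_frame layout
should be checked against include/rtdm/rtcan.h, and error handling is
omitted:

#include <rtdm/rtcan.h>

void listen_for_bus_errors(void)
{
    struct sockaddr_can addr = { .can_family = AF_CAN, .can_ifindex = 0 };
    can_err_mask_t err_mask = CAN_ERR_BUSERROR;
    struct can_frame frame;
    int s;

    s = rt_dev_socket(PF_CAN, SOCK_RAW, CAN_RAW);

    /* subscribe to bus error frames */
    rt_dev_setsockopt(s, SOL_CAN_RAW, CAN_RAW_ERR_FILTER,
                      &err_mask, sizeof(err_mask));

    /* can_ifindex == 0: listen on all registered CAN devices */
    rt_dev_bind(s, (struct sockaddr *)&addr, sizeof(addr));

    for (;;) {
        /* each receive call re-enables the bus error interrupt for
         * exactly one more error frame */
        rt_dev_recv(s, &frame, sizeof(frame), 0);

        if (frame.can_id & CAN_ERR_BUSERROR)
            ;  /* inspect frame.data[] and log or count the bus error */
    }
}

This way the application paces the interrupt rate itself: at most one bus
error interrupt is taken per receive call on such a socket.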



* Re: [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-04-02 16:22                       ` Wolfgang Grandegger
@ 2007-04-07 21:03                         ` Jan Kiszka
  2007-04-07 21:12                           ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2007-04-07 21:03 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: xenomai-core


Wolfgang Grandegger wrote:
> OK, I have just commited the following changes:
> 
> 2007-04-02  Wolfgang Grandegger  <wg@domain.hid>
> 
>   * ksrc/drivers/can/*: The option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now
>     enables bus error interrupts when an application is calling a receive
>     function on a socket listening on bus errors. After one bus error has
>     occurred, the interrupt will be disabled to allow the application time
>     for error processing and to efficiently avoid bus error interrupt
>     flooding. This option is automatically selected for CAN controllers
>     supporting bus error interrupts like the SJA1000.

Just one final nitpicking question: who is going to read the help text of
CONFIG_XENO_DRIVERS_CAN_BUS_ERR, except people reading the Kconfig file
directly? (My point is that this explanation might better become a code
comment.)

Jan




* Re: [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
  2007-04-07 21:03                         ` Jan Kiszka
@ 2007-04-07 21:12                           ` Wolfgang Grandegger
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-04-07 21:12 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
>> OK, I have just commited the following changes:
>>
>> 2007-04-02  Wolfgang Grandegger  <wg@domain.hid>
>>
>>   * ksrc/drivers/can/*: The option CONFIG_XENO_DRIVERS_CAN_BUS_ERR now
>>     enables bus error interrupts when an application is calling a receive
>>     function on a socket listening on bus errors. After one bus error has
>>     occurred, the interrupt will be disabled to allow the application time
>>     for error processing and to efficiently avoid bus error interrupt
>>     flooding. This option is automatically selected for CAN controllers
>>     supporting bus error interrupts like the SJA1000.
> 
> Just one final nitpicking question: who is going to read the help text of
> CONFIG_XENO_DRIVERS_CAN_BUS_ERR, except people reading the Kconfig file
> directly? (My point is that this explanation might better become a code
> comment.)

Good point, a patch will come soon. At least it's already documented in the
API doc.
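
Something along these lines, perhaps, placed next to
rtcan_raw_enable_bus_err(); just a sketch, the exact wording will come
with the patch:

/*
 * Bus error handling (CONFIG_XENO_DRIVERS_CAN_BUS_ERR): to avoid bus
 * error interrupt flooding, the bus error interrupt is only enabled when
 * an application calls a receive function on a socket listening on bus
 * errors. After one bus error has occurred, the interrupt is disabled
 * again so that the application gets time for error processing. The
 * option is selected automatically by controllers which support bus
 * error interrupts, like the SJA1000.
 */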

Wolfgang.




* [Xenomai-core] Re: RT-Socket-CAN bus error rate and latencies
       [not found] <460237BD.1020205@domain.hid>
@ 2007-03-22  8:12 ` Wolfgang Grandegger
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2007-03-22  8:12 UTC (permalink / raw)
  To: Hartkopp, Oliver (K-GEFE/E)
  Cc: socketcan-core, Oliver Hartkopp, Jan Kiszka, xenomai-core

Hartkopp, Oliver (K-GEFE/E) wrote:
> Even if I know that my reply via our f*cking Exchange mail server will
> break the thread ...
> 
> Jan Kiszka wrote:
>> I think Oliver's suggestions points in the right direction. But instead
>> of only coding a timer into the stack, I still vote for closing the loop
>> over the application:
>>
>> After the first error in a potential series, the related error frame is
>> queued, listeners are woken up, and BEI is disabled for now. Once some
>> listener read the error frame *and* decided to call into the stack for
>> further bus errors, BEI is enabled again.
>>
>>   
> IMO this is really a good idea! Even if it looks just a bit different on
> (non-RT) socketcan, the procedure could look like this:
> 
> 1. BEI occurs => disable BEI & launch reception of one error frame for
>    the user
> 2. Wait for user interaction
> 
> I would suggest creating an IOCTL where you can define whether you only
> want to receive one BE frame (== default!) or every BE frame (which may
> flood your system). After the occurrence of a BEI, BE frames are disabled
> (in the default case) until the user decides to re-enable them via the
> suggested IOCTL.
> 
> Does this look acceptable to you?
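
Just to make the proposal concrete, such an ioctl could look roughly like
the sketch below; the names and the ioctl number are invented and nothing
like this has been implemented:

#include <sys/ioctl.h>
#include <linux/sockios.h>      /* SIOCDEVPRIVATE */

#define SIOCSCANBEIMODE  (SIOCDEVPRIVATE + 10)  /* invented ioctl number */

enum can_bei_mode {
    CAN_BEI_ONE_SHOT = 0,  /* deliver one BE frame, then disable BEI (default) */
    CAN_BEI_ALL,           /* deliver every BE frame, may flood the system */
};

/* userspace: after handling one BE frame, re-arm bus error reception */
static void reenable_bus_errors(int s)
{
    int mode = CAN_BEI_ONE_SHOT;

    ioctl(s, SIOCSCANBEIMODE, &mode);
}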

No extra IOCTL please. I would place re-enabling of BEI in sendmsg.

> Btw, the enabling of error frames (in socketcan) currently does not
> depend on subscribed listeners (which is definitely a good idea!).

Why?

Wolfgang.




Thread overview: 16+ messages
2007-03-20 18:58 [Xenomai-core] RT-Socket-CAN bus error rate and latencies Wolfgang Grandegger
2007-03-20 19:10 ` Jan Kiszka
2007-03-20 19:29   ` Wolfgang Grandegger
2007-03-21 17:14 ` [Xenomai-core] " Wolfgang Grandegger
     [not found]   ` <46017CA7.2080801@domain.hid>
2007-03-21 20:29     ` Wolfgang Grandegger
2007-03-21 21:43       ` Jan Kiszka
2007-03-22  8:08         ` Wolfgang Grandegger
     [not found]           ` <46036D32.7000603@domain.hid>
     [not found]             ` <46036F22.60709@domain.hid>
2007-03-23  8:34               ` Jan Kiszka
2007-03-23  8:51                 ` Wolfgang Grandegger
2007-03-24 11:51                   ` Wolfgang Grandegger
2007-03-24 13:38                     ` Jan Kiszka
2007-04-02 16:22                       ` Wolfgang Grandegger
2007-04-07 21:03                         ` Jan Kiszka
2007-04-07 21:12                           ` Wolfgang Grandegger
2007-03-23  8:37             ` Wolfgang Grandegger
     [not found] <460237BD.1020205@domain.hid>
2007-03-22  8:12 ` Wolfgang Grandegger
