All of lore.kernel.org
 help / color / mirror / Atom feed
* mcp25xxfd: missing cerrif
@ 2020-02-11 12:53 Marc Kleine-Budde
  2020-02-11 13:31 ` Thomas.Kopp
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Kleine-Budde @ 2020-02-11 12:53 UTC (permalink / raw)
  To: Thomas Kopp, linux-can

Hello Thomas,

according to the mcp2518fd datasheet:

> 10.5.3 CAN BUS ERROR INTERRUPT - CERRIF
>
> The CiTREC register will count the errors during transmit and receive
> according to the ISO 11898-1:2015. The CERRIF flag will be set based
> on the error counter values. The flag must be cleared by the
> application.
> 
> CERRIF will be set each time a threshold in the TEC/REC counter is
> crossed by the following conditions:
>
> - TEC or REC exceeds the Error Warning state threshold
> - The transmitter or receiver transitions to Error Passive state
> - The transmitter transitions to Bus Off state
> - The transmitter or receiver transitions from Error Passive to Error
>   Active state

I don't see this interrupt, neither on the mcp2517fd nor on the
mcp2518fd rev0.0.

> - The module transitions from Bus Off to Error Active state, after
>   the bus off recovery sequence
>
> When the user clears CERRIF, it will remain clear until a new counter
> crossing occurs.

The workaround in the driver is next to trivial and doesn't cause much
overhead, but can you try to confirm the error and try to get it fixed
in the next chip revision?

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: mcp25xxfd: missing cerrif
  2020-02-11 12:53 mcp25xxfd: missing cerrif Marc Kleine-Budde
@ 2020-02-11 13:31 ` Thomas.Kopp
  2020-02-11 13:51   ` Marc Kleine-Budde
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas.Kopp @ 2020-02-11 13:31 UTC (permalink / raw)
  To: mkl, linux-can

Hi Marc,

> I don't see this interrupt, neither on the mcp2517fd nor on the
> mcp2518fd rev0.0.

What do you mean you don't see the interrupt? I just did a quick test (non linux environment but with our test board and I do see CERRIF being set and it's possible to clear via SPI). What exactly are you checking?

Best Regards,
Thomas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mcp25xxfd: missing cerrif
  2020-02-11 13:31 ` Thomas.Kopp
@ 2020-02-11 13:51   ` Marc Kleine-Budde
  2020-02-12  9:59     ` Thomas.Kopp
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Kleine-Budde @ 2020-02-11 13:51 UTC (permalink / raw)
  To: Thomas.Kopp, linux-can

On 2/11/20 2:31 PM, Thomas.Kopp@microchip.com wrote:
>>> - The transmitter or receiver transitions from Error Passive to Error
>>>   Active state

>> I don't see this interrupt, neither on the mcp2517fd nor on the
>> mcp2518fd rev0.0.
> 
> What do you mean you don't see the interrupt?

If the chip transitions from Error Passive to Error Active state the
CERRIF is not set.

> I just did a quick test (non linux environment but with our test
> board and I do see CERRIF being set and it's possible to clear via
> SPI). What exactly are you checking?
Below code is from the driver and illustrates the problem:

> 		/* On the MCP2527FD and MCP2518FD, we don't get a
> 		 * CERRIF IRQ on the transition TX ERROR_WARNING -> TX
> 		 * ERROR_ACTIVE.
> 		 */
> 		if (intf_pending & MCP25XXFD_CAN_INT_CERRIF ||
> 		    priv->can.state > CAN_STATE_ERROR_ACTIVE) {
> 			err = mcp25xxfd_handle(priv, cerrif);
> 			if (err)
> 				goto out_fail;
> 		}

On the transition from "TX ERROR_WARNING -> TX ERROR_ACTIVE" the CERRIF
is not set.

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: mcp25xxfd: missing cerrif
  2020-02-11 13:51   ` Marc Kleine-Budde
@ 2020-02-12  9:59     ` Thomas.Kopp
  2020-02-14 13:59       ` Marc Kleine-Budde
  0 siblings, 1 reply; 6+ messages in thread
From: Thomas.Kopp @ 2020-02-12  9:59 UTC (permalink / raw)
  To: mkl, linux-can

> On the transition from "TX ERROR_WARNING -> TX ERROR_ACTIVE" the
> CERRIF
> is not set.

Mhm, do I misunderstand your point here? The Transition TX ERROR_WARNING -> TX ERROR_ACTIVE isn't covered by the CERRIF interrupt (which matches the documentation). The interrupt bit gets set on ERROR_Passive to ERROR_ACTIVE (TEC/REC goes from 128 to 127). At that transition I do see the CERRIF bit. It is correct and intended (though maybe not 100% intuitive) that going from ERROR_WARNING to ERROR_ACTIVE TEC/REC 96 to 95 CERRIF is NOT being set.

Best Regards,
Thomas

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: mcp25xxfd: missing cerrif
  2020-02-12  9:59     ` Thomas.Kopp
@ 2020-02-14 13:59       ` Marc Kleine-Budde
  2020-02-18 11:44         ` Thomas.Kopp
  0 siblings, 1 reply; 6+ messages in thread
From: Marc Kleine-Budde @ 2020-02-14 13:59 UTC (permalink / raw)
  To: Thomas.Kopp, linux-can

On 2/12/20 10:59 AM, Thomas.Kopp@microchip.com wrote:
>> On the transition from "TX ERROR_WARNING -> TX ERROR_ACTIVE" the 
>> CERRIF is not set.
> Mhm, do I misunderstand your point here?

I misread the documentation somehow.

Hmmm. I think the question is, is there a dedicated warning state and
if, how are the transitions from and to it?

From my understanding:

           -- E > 95 --     -- E > 127 --     -- E > 255 --
          /            \   /             \   /             \
         /              V /               V /               V
--> error active  error warning      error passive       bus off
         ^              / ^               /
          \            /   \             /
           -- E < 96 --     -- E < 128 --

> The Transition TX ERROR_WARNING -> TX ERROR_ACTIVE isn't covered by
> the CERRIF interrupt (which matches the documentation).

ACK. The documentation talks about error passive to error active, which
means the diagram looks like this:

           -- E > 95 --     -- E > 127 --     -- E > 255 --
          /            \   /             \   /             \
         /              V /               V /               V
--> error active  error warning      error passive       bus off
         ^                                /
          \                              /
           ------------------- E < 128 --

> The interrupt bit gets set on ERROR_Passive to ERROR_ACTIVE (TEC/REC
> goes from 128 to 127).

I think we have to discuss how the state diagram on socketcan should
look like.

> At that transition I do see the CERRIF bit. It is correct and
> intended (though maybe not 100% intuitive) that going from
> ERROR_WARNING to ERROR_ACTIVE TEC/REC 96 to 95 CERRIF is NOT being
> set.

Ok.

regards
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: mcp25xxfd: missing cerrif
  2020-02-14 13:59       ` Marc Kleine-Budde
@ 2020-02-18 11:44         ` Thomas.Kopp
  0 siblings, 0 replies; 6+ messages in thread
From: Thomas.Kopp @ 2020-02-18 11:44 UTC (permalink / raw)
  To: mkl, linux-can

> Hmmm. I think the question is, is there a dedicated warning state and
> if, how are the transitions from and to it?
I think the main confusion comes from the fact that error warning is not a state as defined in the ISO. The ISO defines three states: error-active, error-passive and bus-off.
The error warning is an interrupt that (most?) CAN IPs issue when the error counters exceeds 96. AFAIK this is only a convention though, not a spec (I know that Kvaser and the Bosch MCAN handle it like this).

> ACK. The documentation talks about error passive to error active, which
> means the diagram looks like this:
Not quite, that's how it should look like:
                     -- E > 95 --         -- E > 127 --               -- E > 255 --
                      /            \          /             \                    /             \
                    /              V         /               V                /               V
 --> error active  (error warning)      error passive    bus off
           ^              /                      ^         /
            \            /                         \       /
             -- E < 96 --                   -- E < 128 --

The enum in netlink.h mixes the error states plus some other states. To fulfill the ISO an interrupt from Error warning to Error active isn't really useful. I suppose that's why it's not implemented in the MCP2518FD.

enum can_state {
	CAN_STATE_ERROR_ACTIVE = 0,	/* RX/TX error count < 96 */
	CAN_STATE_ERROR_WARNING,	/* RX/TX error count < 128 */
	CAN_STATE_ERROR_PASSIVE,	/* RX/TX error count < 256 */
	CAN_STATE_BUS_OFF,		/* RX/TX error count >= 256 */
	CAN_STATE_STOPPED,		/* Device is stopped */
	CAN_STATE_SLEEPING,		/* Device is sleeping */
	CAN_STATE_MAX
};

Quickly looking at different CAN drivers in the kernel I don't really see a common pattern how this counter is used.
Taking the M_CAN driver as an example (drivers/net/can/m_can/m_can.c#L671) I see that a state change from error passive to error active (with the error warning bit being set) leads to the error warning counter being incremented and the state being updated to CAN_STATE_ERROR_WARNING.
I did not find a documentation on what this error counter is supposed to show - I suppose I didn't look in the right place. But while the number of times the device was in error warning state has technically gone up by one I would have expected that this counts the switches. Again, this is not based on documentation (I'm not really familiar with the SocketCAN so it's likely I missed something) but rather on what I would have expected intuitively.

One more thing I noticed in the mcan driver is that there doesn't seem to be a way to go from error warning to error active without ifdown/ifup. So the driver remains in in error warning state for a significant time.

Another example in the kvaser_usb_hydra.c:
 * Known issues:
 *  - Transition from CAN_STATE_ERROR_WARNING to CAN_STATE_ERROR_ACTIVE is only
 *    reported after a call to do_get_berr_counter(), since firmware does not
 *    distinguish between ERROR_WARNING and ERROR_ACTIVE.

> I think we have to discuss how the state diagram on socketcan should
>look like.

I agree. How would that discussion happen?

Best Regards,
Thomas 


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-02-18 11:44 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-11 12:53 mcp25xxfd: missing cerrif Marc Kleine-Budde
2020-02-11 13:31 ` Thomas.Kopp
2020-02-11 13:51   ` Marc Kleine-Budde
2020-02-12  9:59     ` Thomas.Kopp
2020-02-14 13:59       ` Marc Kleine-Budde
2020-02-18 11:44         ` Thomas.Kopp

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.