All of lore.kernel.org
 help / color / mirror / Atom feed
* flexcan missing error state transitions
@ 2017-08-18  7:47 ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-21 16:18 ` Andri Yngvason
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-18  7:47 UTC (permalink / raw)
  To: linux-can, mkl, wg
  Cc: hs, Jonas Mark (ST-FIR/ENG1), RUAN Tingquan (ST-FIR/ENG1-Zhu)

[-- Attachment #1: Type: text/plain, Size: 2890 bytes --]

Hello,

In our i.MX6 based platform, we encountered Rx overflow in flexcan (kernel v4.9-rc2),
where commit b3cf53e9 (can: flexcan: add support for timestamp based rx-offload)
fixed this issue.

However, 
commit 30164759 (can: flexcan: make use of rx-offload's irq_offload_fifo) changed 
the way of can state transition from poll state upon receive + state + error interrupt to
only update state upon state interrupt.

Due to a chip flaw in flexcan core regarding state interrupt generation, there are some
state transitions which do not generate interrupts:
(copied from TomE's post on https://community.nxp.com/thread/333839)

Interrupts     MCF53xx i.MX53 i.MX6  MSCAN
------------------------------------------
Active->Warn   No      No     Yes    Yes
Warn->Active   No      No     No     Yes
Warn->Passive  No      No     No     Yes
Passive->Warn  No      No     No     Yes
Passive->BOFF  Yes     Yes    Yes    Yes
BOFF->Active   No      No     No     Yes
Errors         Yes     Yes    Yes    No

As our application requires to monitor the can bus state change, so this function
will not work properly after we applied all the commits to flexcan from v4.11 to v4.9-rc2.
We also tested v4.11 which contains all the latest patches to flexcan, and it behaves the same.

Sadly, the flexcan core does not generate error state interrupts for all state transitions.
This is required for the current flexcan driver to operate smoothly. We propose the following
workaround to mitigate this problem:
1. update state upon _any_ interrupt
2. update state when user explicitly inquires (via do_get_state callback)

Item 1 will keep the state correct as long as the bus is attached and can frames are send,
so the core can	generates Tx/Rx/State/Error interrupts accordingly.

However, if there is single node on the bus, then this node cannot enter bus off state
(only becomes error passive), and the core only generates a single interrupt (warn + ack error),
and no more. It looks like i.MX6 disables flood of error interrupts in auto-retransmission,
otherwise, the state would be updated. Nonetheless, the old driver (in v4.9-rc2) also cannot report
error passive in this case, even it postponed (a little) to read FLT_CONF in napi_poll.

This means the user space cannot simply rely on receiving a state change notification via a 
SocketCAN error frame. It needs to do active polling to get the correct state. But without proper
error interrupts or CAN traffic this is impossible. But essentially it is still an improvement over
v4.9-rc2 because there you could not get the proper error passive state even if polling was done
from user space. Hence item 2 allows user space to get the correct state manually.

The attached patch implements this for i.MX6 and has been tested on i.MX6.
It applies cleanly to v4.13-rc5.

Best regards,
Yi ZHU


[-- Attachment #2: 0001-can-flexcan-mitigate-incomplete-state-interrupt.patch --]
[-- Type: application/octet-stream, Size: 5938 bytes --]

From 6c0c111f0d8c318a459606cf5871dcc56512fb84 Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu5@cn.bosch.com>
Date: Thu, 17 Aug 2017 14:45:19 +0800
Subject: [PATCH] can: flexcan: mitigate incomplete state interrupt

This patch adds a workaround to mitigate missing state transitions
on i.MX6 flexcan core which has incomplete state interrupt by:
1. update state upon _any_ interrupt
2. update state when user explicitly inquires (do_get_state())

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
Tested-by: Ruan Tingquan <tingquan.ruan@cn.bosch.com>
---
 drivers/net/can/flexcan.c | 91 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 72 insertions(+), 19 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 13f0f21..ac37fe4 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -199,6 +199,20 @@
 #define FLEXCAN_QUIRK_DISABLE_MECR	BIT(4) /* Disable Memory error detection */
 #define FLEXCAN_QUIRK_USE_OFF_TIMESTAMP	BIT(5) /* Use timestamp based offloading */
 
+/* FLEXCAN does not create a complete set of error state IRQs.
+ *
+ * Error State IRQ  i.MX53  i.MX6
+ * ------------------------------
+ * Active->Warn       no     yes
+ * Warn->Active       no      no
+ * Warn->Passive      no      no
+ * Passive->Warn      no      no
+ * Passive->BOFF     yes     yes
+ * BOFF->Active       no      no
+ * Errors            yes     yes
+ */
+#define FLEXCAN_QUIRK_BROKEN_STATE_IRQ	BIT(6) /* Incomplete state IRQ */
+
 /* Structure of the message buffer */
 struct flexcan_mb {
 	u32 can_ctrl;
@@ -288,7 +302,7 @@ static const struct flexcan_devtype_data fsl_imx28_devtype_data;
 
 static const struct flexcan_devtype_data fsl_imx6q_devtype_data = {
 	.quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS |
-		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP,
+		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP | FLEXCAN_QUIRK_BROKEN_STATE_IRQ,
 };
 
 static const struct flexcan_devtype_data fsl_vf610_devtype_data = {
@@ -478,6 +492,56 @@ static int flexcan_get_berr_counter(const struct net_device *dev,
 	return err;
 }
 
+static enum can_state __flexcan_get_state(const struct net_device *dev,
+                                          u32 reg_esr,
+                                          enum can_state *rx_state,
+                                          enum can_state *tx_state)
+{
+	enum can_state new_state;
+	int flt;
+	struct can_berr_counter bec;
+
+	flt = reg_esr & FLEXCAN_ESR_FLT_CONF_MASK;
+	if (likely(flt == FLEXCAN_ESR_FLT_CONF_ACTIVE)) {
+		*tx_state = unlikely(reg_esr & FLEXCAN_ESR_TX_WRN) ?
+			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
+		*rx_state = unlikely(reg_esr & FLEXCAN_ESR_RX_WRN) ?
+			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
+		new_state = max(*tx_state, *rx_state);
+	} else {
+		__flexcan_get_berr_counter(dev, &bec);
+		new_state = flt == FLEXCAN_ESR_FLT_CONF_PASSIVE ?
+			CAN_STATE_ERROR_PASSIVE : CAN_STATE_BUS_OFF;
+		*rx_state = bec.rxerr >= bec.txerr ? new_state : 0;
+		*tx_state = bec.rxerr <= bec.txerr ? new_state : 0;
+	}
+
+	return new_state;
+}
+
+static int flexcan_get_state(const struct net_device *dev,
+			     enum can_state *state)
+{
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->regs;
+	u32 reg_esr;
+	enum can_state rx_state, tx_state;
+
+	*state = priv->can.state;
+
+	/* only update the states which are supposed to be managed by hardware
+	 * in case the core has incomplete state interrupt
+	 */
+	if ((priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ) &&
+	    (priv->can.state != CAN_STATE_STOPPED) &&
+	    (priv->can.state != CAN_STATE_SLEEPING)) {
+		reg_esr = flexcan_read(&regs->esr);
+		*state = __flexcan_get_state(dev, reg_esr, &rx_state, &tx_state);
+	}
+
+	return 0;
+}
+
 static int flexcan_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	const struct flexcan_priv *priv = netdev_priv(dev);
@@ -587,23 +651,8 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
 	struct sk_buff *skb;
 	struct can_frame *cf;
 	enum can_state new_state, rx_state, tx_state;
-	int flt;
-	struct can_berr_counter bec;
 
-	flt = reg_esr & FLEXCAN_ESR_FLT_CONF_MASK;
-	if (likely(flt == FLEXCAN_ESR_FLT_CONF_ACTIVE)) {
-		tx_state = unlikely(reg_esr & FLEXCAN_ESR_TX_WRN) ?
-			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
-		rx_state = unlikely(reg_esr & FLEXCAN_ESR_RX_WRN) ?
-			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
-		new_state = max(tx_state, rx_state);
-	} else {
-		__flexcan_get_berr_counter(dev, &bec);
-		new_state = flt == FLEXCAN_ESR_FLT_CONF_PASSIVE ?
-			CAN_STATE_ERROR_PASSIVE : CAN_STATE_BUS_OFF;
-		rx_state = bec.rxerr >= bec.txerr ? new_state : 0;
-		tx_state = bec.rxerr <= bec.txerr ? new_state : 0;
-	}
+	new_state = __flexcan_get_state(dev, reg_esr, &rx_state, &tx_state);
 
 	/* state hasn't changed */
 	if (likely(new_state == priv->can.state))
@@ -765,8 +814,9 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 		flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, &regs->esr);
 	}
 
-	/* state change interrupt */
-	if (reg_esr & FLEXCAN_ESR_ERR_STATE)
+	/* state change interrupt or on incomplete state interrupt core */
+	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
+	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ))
 		flexcan_irq_state(dev, reg_esr);
 
 	/* bus error IRQ - handle if bus error reporting is activated */
@@ -1265,6 +1315,9 @@ static int flexcan_probe(struct platform_device *pdev)
 	priv->devtype_data = devtype_data;
 	priv->reg_xceiver = reg_xceiver;
 
+	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ)
+		priv->can.do_get_state = flexcan_get_state;
+
 	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_USE_OFF_TIMESTAMP) {
 		priv->tx_mb_idx = FLEXCAN_TX_MB_OFF_TIMESTAMP;
 		priv->tx_mb_reserved = &regs->mb[FLEXCAN_TX_MB_RESERVED_OFF_TIMESTAMP];
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-18  7:47 flexcan missing error state transitions ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-21 16:18 ` Andri Yngvason
  2017-08-21 17:13   ` AW: " Jonas Mark (ST-FIR/ENG1)
  0 siblings, 1 reply; 28+ messages in thread
From: Andri Yngvason @ 2017-08-21 16:18 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), linux-can, mkl, wg
  Cc: hs, Jonas Mark (ST-FIR/ENG1), RUAN Tingquan (ST-FIR/ENG1-Zhu)

Hello Yi ZHU,

Quoting ZHU Yi (ST-FIR/ENG1-Zhu) (2017-08-18 07:47:41)
> Sadly, the flexcan core does not generate error state interrupts for all state transitions.
> This is required for the current flexcan driver to operate smoothly. We propose the following
> workaround to mitigate this problem:
> 1. update state upon _any_ interrupt
> 2. update state when user explicitly inquires (via do_get_state callback)

For now, I think that it would be good enough to implement option 1 as this is
how things were before the regression. Unless there's a performance penalty?

I am not sure that I like the userspace polling idea. I think it would be better
to have the polling implemented within the driver because it keeps the interface
"simple" and there will be fewer differences between different platforms. I.e.
the same userpspace code should work for flexcan, sja1000, mscan, etc..

Regards,
Andri

^ permalink raw reply	[flat|nested] 28+ messages in thread

* AW: flexcan missing error state transitions
  2017-08-21 16:18 ` Andri Yngvason
@ 2017-08-21 17:13   ` Jonas Mark (ST-FIR/ENG1)
  2017-08-21 18:21     ` Andri Yngvason
  0 siblings, 1 reply; 28+ messages in thread
From: Jonas Mark (ST-FIR/ENG1) @ 2017-08-21 17:13 UTC (permalink / raw)
  To: Andri Yngvason, ZHU Yi (ST-FIR/ENG1-Zhu), linux-can, mkl, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu)

Hello Andri,

> I am not sure that I like the userspace polling idea. I think it would be better to have
> the polling implemented within the driver because it keeps the interface "simple"
> and there will be fewer differences between different platforms. I.e.
> the same userpspace code should work for flexcan, sja1000, mscan, etc..

Which polling interval do you think would be appropriate? It somehow has to be a reasonable trade-off between latency and CPU usage.

Would it make sense to make the polling interval configurable via device tree?

Regards,
Mark

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: AW: flexcan missing error state transitions
  2017-08-21 17:13   ` AW: " Jonas Mark (ST-FIR/ENG1)
@ 2017-08-21 18:21     ` Andri Yngvason
  2017-08-22 13:50       ` AW: " Jonas Mark (ST-FIR/ENG1)
  0 siblings, 1 reply; 28+ messages in thread
From: Andri Yngvason @ 2017-08-21 18:21 UTC (permalink / raw)
  To: Jonas Mark (ST-FIR/ENG1), ZHU Yi (ST-FIR/ENG1-Zhu), linux-can, mkl, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu)

Hi Mark,

Quoting Jonas Mark (ST-FIR/ENG1) (2017-08-21 17:13:34)
> > I am not sure that I like the userspace polling idea. I think it would be better to have
> > the polling implemented within the driver because it keeps the interface "simple"
> > and there will be fewer differences between different platforms. I.e.
> > the same userpspace code should work for flexcan, sja1000, mscan, etc..
> 
> Which polling interval do you think would be appropriate? It somehow has to be a reasonable trade-off between latency and CPU usage.
That's a good question. I don't think that there is a really good answer to it.
It would be nice to see intermittent state-transitions happening, so I would say
as fast as possible without affecting CPU usage to a significant degree. The
polling timer should not be a high resolution timer or anything that would
preempt an rt task.

This value could also depend on the bitrate as that will affect the rate at
which the error-counter can change. E.g. on 125 k rate, you will receive at most
one frame per 1 ms. This means that the error state will change at at least
64 ms intervals, so we would have to sample at 32 ms intervals to catch all the
transitions. For 1 M this would be around 2 ms.

Note that this is all somewhat speculative as I do not know exactly how fast the
error counters are incremented/decremented.

> 
> Would it make sense to make the polling interval configurable via device tree?
Also a good questions. I tried compiling a C program that runs usleep(2000) in a
loop for the i.MX6. It runs at around 1% CPU usage in top. I would call that
significant. Perhaps it would run better in kernel space?

If the polling interval can be configured, users who care about it can set it to
a value that matches their bitrate. Others could set it to zero to turn it off.

As the useful sampling frequency depends on the bitrate, perhaps it should even
be a part of the netlink interface? This would also make it easier for the user
to find a good sampling frequency based on trial and error.

Regards,
Andri

^ permalink raw reply	[flat|nested] 28+ messages in thread

* AW: AW: flexcan missing error state transitions
  2017-08-21 18:21     ` Andri Yngvason
@ 2017-08-22 13:50       ` Jonas Mark (ST-FIR/ENG1)
  2017-08-22 14:06         ` Marc Kleine-Budde
  2017-08-22 14:14         ` AW: AW: " Andri Yngvason
  0 siblings, 2 replies; 28+ messages in thread
From: Jonas Mark (ST-FIR/ENG1) @ 2017-08-22 13:50 UTC (permalink / raw)
  To: Andri Yngvason, linux-can, mkl, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1),
	ZHU Yi (ST-FIR/ENG1-Zhu)

Hello Andri,

> > Which polling interval do you think would be appropriate? It somehow has to be
> a reasonable trade-off between latency and CPU usage.
> That's a good question. I don't think that there is a really good answer to it.
> It would be nice to see intermittent state-transitions happening, so I would say as
> fast as possible without affecting CPU usage to a significant degree. The polling
> timer should not be a high resolution timer or anything that would preempt an rt
> task.
> 
> This value could also depend on the bitrate as that will affect the rate at which the
> error-counter can change. E.g. on 125 k rate, you will receive at most one frame
> per 1 ms. This means that the error state will change at at least
> 64 ms intervals, so we would have to sample at 32 ms intervals to catch all the
> transitions. For 1 M this would be around 2 ms.
> 
> Note that this is all somewhat speculative as I do not know exactly how fast the
> error counters are incremented/decremented.

We agree that there is a sensible lower boundary for the polling interval which depends on the bitrate. And there is a upper boundary which depends on the application.

We are currently thinking about whether it would be possible to do the polling only in case the controller is actually experiencing trouble.

For the i.MX6 there is an interrupt when the controller changes from Error Active to Error Warning. This shall enable the polling.

For the i.MX53 there is no Active -> Warning interrupt. But for all kinds of error events on the bus an Error interrupt will be created. So the polling shall be enabled then. This is a little bit less efficient compared to i.MX6 but the polling will be disabled during the next interrupt if the state is still Error Active (see below).

For both, the polling can be disabled when the status is read and it is Error Active. The polling shall be disabled as well when a Bus-Off interrupt occurs.

What we still need to resolve is the concurrency between the timer and the interrupt when accessing internal data structures and sending SocketCAN error frames. Once we have that sorted out we will send a patch. None the less, early feedback is welcome.

> > Would it make sense to make the polling interval configurable via device tree?
> Also a good questions. I tried compiling a C program that runs usleep(2000) in a
> loop for the i.MX6. It runs at around 1% CPU usage in top. I would call that
> significant. Perhaps it would run better in kernel space?
> 
> If the polling interval can be configured, users who care about it can set it to a
> value that matches their bitrate. Others could set it to zero to turn it off.

We also think that the polling interval should be configurable. It strongly depends on the application which latency between state change and notification is acceptable and how much computing resources may be invested to achieve that.

> As the useful sampling frequency depends on the bitrate, perhaps it should even
> be a part of the netlink interface? This would also make it easier for the user to find
> a good sampling frequency based on trial and error.

Right now we are not convinced that adding to the netlink interface is justified to work around deficiencies of a single CAN peripheral.

Regards,
Mark

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: AW: AW: flexcan missing error state transitions
  2017-08-22 13:50       ` AW: " Jonas Mark (ST-FIR/ENG1)
@ 2017-08-22 14:06         ` Marc Kleine-Budde
  2017-08-22 19:03           ` AW: " Jonas Mark (ST-FIR/ENG1)
  2017-08-27 10:57           ` Wolfgang Grandegger
  2017-08-22 14:14         ` AW: AW: " Andri Yngvason
  1 sibling, 2 replies; 28+ messages in thread
From: Marc Kleine-Budde @ 2017-08-22 14:06 UTC (permalink / raw)
  To: Jonas Mark (ST-FIR/ENG1), Andri Yngvason, linux-can, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), ZHU Yi (ST-FIR/ENG1-Zhu)


[-- Attachment #1.1: Type: text/plain, Size: 1983 bytes --]

On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
> Hello Andri,
> 
>>> Which polling interval do you think would be appropriate? It
>>> somehow has to be
>> a reasonable trade-off between latency and CPU usage. That's a good
>> question. I don't think that there is a really good answer to it. 
>> It would be nice to see intermittent state-transitions happening,
>> so I would say as fast as possible without affecting CPU usage to a
>> significant degree. The polling timer should not be a high
>> resolution timer or anything that would preempt an rt task.
>> 
>> This value could also depend on the bitrate as that will affect the
>> rate at which the error-counter can change. E.g. on 125 k rate, you
>> will receive at most one frame per 1 ms. This means that the error
>> state will change at at least 64 ms intervals, so we would have to
>> sample at 32 ms intervals to catch all the transitions. For 1 M
>> this would be around 2 ms.
>> 
>> Note that this is all somewhat speculative as I do not know exactly
>> how fast the error counters are incremented/decremented.
> 
> We agree that there is a sensible lower boundary for the polling
> interval which depends on the bitrate. And there is a upper boundary
> which depends on the application.
> 
> We are currently thinking about whether it would be possible to do
> the polling only in case the controller is actually experiencing
> trouble.

Why not enable bus errors on broken controllers. Disable once a bus
error hits and re-enable them after a configurable timeout with a proper
default. I have a patch somewhere, although it applies only against the
pre rx-offload version of the driver.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: AW: AW: flexcan missing error state transitions
  2017-08-22 13:50       ` AW: " Jonas Mark (ST-FIR/ENG1)
  2017-08-22 14:06         ` Marc Kleine-Budde
@ 2017-08-22 14:14         ` Andri Yngvason
  2017-08-27 12:55           ` Wolfgang Grandegger
  1 sibling, 1 reply; 28+ messages in thread
From: Andri Yngvason @ 2017-08-22 14:14 UTC (permalink / raw)
  To: linux-can, mkl, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1),
	ZHU Yi (ST-FIR/ENG1-Zhu)

Hello Mark,

Quoting Jonas Mark (ST-FIR/ENG1) (2017-08-22 13:50:35)
> > > Which polling interval do you think would be appropriate? It somehow has to be
> > a reasonable trade-off between latency and CPU usage.
> > That's a good question. I don't think that there is a really good answer to it.
> > It would be nice to see intermittent state-transitions happening, so I would say as
> > fast as possible without affecting CPU usage to a significant degree. The polling
> > timer should not be a high resolution timer or anything that would preempt an rt
> > task.
> > 
> > This value could also depend on the bitrate as that will affect the rate at which the
> > error-counter can change. E.g. on 125 k rate, you will receive at most one frame
> > per 1 ms. This means that the error state will change at at least
> > 64 ms intervals, so we would have to sample at 32 ms intervals to catch all the
> > transitions. For 1 M this would be around 2 ms.
> > 
> > Note that this is all somewhat speculative as I do not know exactly how fast the
> > error counters are incremented/decremented.
> 
> We agree that there is a sensible lower boundary for the polling interval which depends on the bitrate. And there is a upper boundary which depends on the application.
> 
> We are currently thinking about whether it would be possible to do the polling only in case the controller is actually experiencing trouble.
> 
> For the i.MX6 there is an interrupt when the controller changes from Error Active to Error Warning. This shall enable the polling.
> 
> For the i.MX53 there is no Active -> Warning interrupt. But for all kinds of error events on the bus an Error interrupt will be created. So the polling shall be enabled then. This is a little bit less efficient compared to i.MX6 but the polling will be disabled during the next interrupt if the state is still Error Active (see below).
> 
> For both, the polling can be disabled when the status is read and it is Error Active. The polling shall be disabled as well when a Bus-Off interrupt occurs.
> 
> What we still need to resolve is the concurrency between the timer and the interrupt when accessing internal data structures and sending SocketCAN error frames. Once we have that sorted out we will send a patch. None the less, early feedback is welcome.
The last time we discussed this issue, I suggested something similar and
Wolfgang's response was "Puh, that's far too sophisticated."

What you suggest does not sound overly complicated, so I would say it's a good
idea. It seems like it's the best that we can do with what we have.

Wolfgang, Marc, what do you think?
> 
> > > Would it make sense to make the polling interval configurable via device tree?
> > Also a good questions. I tried compiling a C program that runs usleep(2000) in a
> > loop for the i.MX6. It runs at around 1% CPU usage in top. I would call that
> > significant. Perhaps it would run better in kernel space?
> > 
> > If the polling interval can be configured, users who care about it can set it to a
> > value that matches their bitrate. Others could set it to zero to turn it off.
> 
> We also think that the polling interval should be configurable. It strongly depends on the application which latency between state change and notification is acceptable and how much computing resources may be invested to achieve that.
Good.
> 
> > As the useful sampling frequency depends on the bitrate, perhaps it should even
> > be a part of the netlink interface? This would also make it easier for the user to find
> > a good sampling frequency based on trial and error.
> 
> Right now we are not convinced that adding to the netlink interface is justified to work around deficiencies of a single CAN peripheral.
Fair enough.

Regards,
Andri

^ permalink raw reply	[flat|nested] 28+ messages in thread

* AW: AW: AW: flexcan missing error state transitions
  2017-08-22 14:06         ` Marc Kleine-Budde
@ 2017-08-22 19:03           ` Jonas Mark (ST-FIR/ENG1)
  2017-08-24  9:40             ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-27 10:57           ` Wolfgang Grandegger
  1 sibling, 1 reply; 28+ messages in thread
From: Jonas Mark (ST-FIR/ENG1) @ 2017-08-22 19:03 UTC (permalink / raw)
  To: Marc Kleine-Budde, Andri Yngvason, linux-can, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), ZHU Yi (ST-FIR/ENG1-Zhu),
	Jonas Mark (ST-FIR/ENG1)

Hello Marc,

> > We are currently thinking about whether it would be possible to do the
> > polling only in case the controller is actually experiencing trouble.
> 
> Why not enable bus errors on broken controllers. Disable once a bus error hits
> and re-enable them after a configurable timeout with a proper default. I have a
> patch somewhere, although it applies only against the pre rx-offload version of the
> driver.

I am not sure whether I understand your proposal. Do you propose to poll all the
time but only stop the polling when the controller enters Bus-Off?

Or do you mean to enable automatic restart of the CAN controller after a Bus-Off?
We could not do this. Our application needs to handle Bus-Off by its own. And
there will be still situations where FlexCAN will be silently stuck in Error Passive.

Could you explain your idea a little more? 

In case you find the patch you mentioned, we would also like look at it to see if we
can get an idea from it.

We are a little bit trapped: On the one hand we really like the recent patches
because it solves the Rx overflow problem we also experienced. On the other hand
the error state handling is not as good any more as it was. So we highly value your
support in finding a mainline-ready solution.

Regards,
Mark

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: AW: AW: flexcan missing error state transitions
  2017-08-22 19:03           ` AW: " Jonas Mark (ST-FIR/ENG1)
@ 2017-08-24  9:40             ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-25 17:16               ` Andri Yngvason
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-24  9:40 UTC (permalink / raw)
  To: Marc Kleine-Budde, Andri Yngvason, linux-can, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]

Hello Andri, Hello Marc,

Sorry for late response, a typhoon hit our city yesterday, so we were
disconnected from the world.

Based on the comments we received for now, we attached 3 patches for
further discussion.

Patch 1 (0001-can-flexcan-update-state-upon-any-interrupt.patch)
This patch keeps the state correct as long as there is traffic ongoing
on the bus with at least two nodes. The cost is we have to check the
state upon every interrupt. This is a very low overhead.

However, this patch cannot solve the case of a singled out node on the
bus. Here, it should report error passive, but it depends on the
FlexCAN core version, whether the driver reports error active or error
warning (never error passive).

Hence, we also prepared the following 2 patches (both need to be
applied after Patch 1):

Patch 2 (0001-can-flexcan-return-up-to-date-state-via-do_get_state.patch)
This patch allows the user space to get the correct state when it
explicitly inquiries. However, as Andri mentioned that the same user
space code should work for different can controllers, thus, this patch
would potentially require changes to existing user space code (due to
it cannot notify some transitions via the SocketCAN error frame) when
using FlexCAN.

Patch 3 (0001-can-flexcan-compensate-state-interrupt-on-demand.patch)
This patch adds a timer to poll the state to compensate the incomplete
state interrupt. The purpose is to avoid polling the state as much as
possible and still notify upon any state transition via the SocketCAN
error frame. It depends on auto-retransmission being enabled. But to
our understanding that is always the case - there is no code to disable
it. The user can configure the polling interval via DT based on the
bitrate used (reasonable lower boundary) or acceptable notification
latency in the application (upper boundary). The polling can be
disabled, too. Please note that the polling interval *must not* be
faster than <maximum transmission time for a frame> * 16, or else the
error counter will not reach 128 and thus the error state might not
change within the polling interval. So too fast is not good here.

I am looking forward to your response! :)

Best regards
Yi

[-- Attachment #2: 0001-can-flexcan-update-state-upon-any-interrupt.patch --]
[-- Type: application/octet-stream, Size: 2441 bytes --]

From 4fc6f47c89c4ef92b7e169ff9dbf03c37c3a8d02 Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu5@cn.bosch.com>
Date: Thu, 24 Aug 2017 10:02:20 +0800
Subject: [PATCH] can: flexcan: update state upon any interrupt

Due to i.MX6 flexcan core will not generate state interrupt for some
state transitions, so this patch changes to update state upon _any_
interrupt to mitigate this issue.

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
Tested-by: Ruan Tingquan <tingquan.ruan@cn.bosch.com>
---
 drivers/net/can/flexcan.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 13f0f21..c5c189f 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -199,6 +199,20 @@
 #define FLEXCAN_QUIRK_DISABLE_MECR	BIT(4) /* Disable Memory error detection */
 #define FLEXCAN_QUIRK_USE_OFF_TIMESTAMP	BIT(5) /* Use timestamp based offloading */
 
+/* FLEXCAN does not create a complete set of error state IRQs.
+ *
+ * Error State IRQ  i.MX53  i.MX6
+ * ------------------------------
+ * Active->Warn       no     yes
+ * Warn->Active       no      no
+ * Warn->Passive      no      no
+ * Passive->Warn      no      no
+ * Passive->BOFF     yes     yes
+ * BOFF->Active       no      no
+ * Errors            yes     yes
+ */
+#define FLEXCAN_QUIRK_BROKEN_STATE_IRQ	BIT(6) /* Incomplete state IRQ */
+
 /* Structure of the message buffer */
 struct flexcan_mb {
 	u32 can_ctrl;
@@ -288,7 +302,7 @@ static const struct flexcan_devtype_data fsl_imx28_devtype_data;
 
 static const struct flexcan_devtype_data fsl_imx6q_devtype_data = {
 	.quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS |
-		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP,
+		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP | FLEXCAN_QUIRK_BROKEN_STATE_IRQ,
 };
 
 static const struct flexcan_devtype_data fsl_vf610_devtype_data = {
@@ -765,8 +779,9 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 		flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, &regs->esr);
 	}
 
-	/* state change interrupt */
-	if (reg_esr & FLEXCAN_ESR_ERR_STATE)
+	/* state change interrupt or on incomplete state interrupt core */
+	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
+	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ))
 		flexcan_irq_state(dev, reg_esr);
 
 	/* bus error IRQ - handle if bus error reporting is activated */
-- 
2.7.4


[-- Attachment #3: 0001-can-flexcan-return-up-to-date-state-via-do_get_state.patch --]
[-- Type: application/octet-stream, Size: 4280 bytes --]

From 892555d3b930406f3c6433fa429b2f548b7322f2 Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu5@cn.bosch.com>
Date: Thu, 24 Aug 2017 10:57:12 +0800
Subject: [PATCH] can: flexcan: return up-to-date state via do_get_state()

Due to i.MX6 flexcan core will not generate state interrupt for some
state transitions, this patch implements do_get_state() callback to
allow userspace manually retrieve the up-to-date state.

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
Tested-by: Ruan Tingquan <tingquan.ruan@cn.bosch.com>
---
 drivers/net/can/flexcan.c | 70 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 54 insertions(+), 16 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index c5c189f..ac37fe4 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -492,6 +492,56 @@ static int flexcan_get_berr_counter(const struct net_device *dev,
 	return err;
 }
 
+static enum can_state __flexcan_get_state(const struct net_device *dev,
+                                          u32 reg_esr,
+                                          enum can_state *rx_state,
+                                          enum can_state *tx_state)
+{
+	enum can_state new_state;
+	int flt;
+	struct can_berr_counter bec;
+
+	flt = reg_esr & FLEXCAN_ESR_FLT_CONF_MASK;
+	if (likely(flt == FLEXCAN_ESR_FLT_CONF_ACTIVE)) {
+		*tx_state = unlikely(reg_esr & FLEXCAN_ESR_TX_WRN) ?
+			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
+		*rx_state = unlikely(reg_esr & FLEXCAN_ESR_RX_WRN) ?
+			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
+		new_state = max(*tx_state, *rx_state);
+	} else {
+		__flexcan_get_berr_counter(dev, &bec);
+		new_state = flt == FLEXCAN_ESR_FLT_CONF_PASSIVE ?
+			CAN_STATE_ERROR_PASSIVE : CAN_STATE_BUS_OFF;
+		*rx_state = bec.rxerr >= bec.txerr ? new_state : 0;
+		*tx_state = bec.rxerr <= bec.txerr ? new_state : 0;
+	}
+
+	return new_state;
+}
+
+static int flexcan_get_state(const struct net_device *dev,
+			     enum can_state *state)
+{
+	const struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->regs;
+	u32 reg_esr;
+	enum can_state rx_state, tx_state;
+
+	*state = priv->can.state;
+
+	/* only update the states which are supposed to be managed by hardware
+	 * in case the core has incomplete state interrupt
+	 */
+	if ((priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ) &&
+	    (priv->can.state != CAN_STATE_STOPPED) &&
+	    (priv->can.state != CAN_STATE_SLEEPING)) {
+		reg_esr = flexcan_read(&regs->esr);
+		*state = __flexcan_get_state(dev, reg_esr, &rx_state, &tx_state);
+	}
+
+	return 0;
+}
+
 static int flexcan_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	const struct flexcan_priv *priv = netdev_priv(dev);
@@ -601,23 +651,8 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
 	struct sk_buff *skb;
 	struct can_frame *cf;
 	enum can_state new_state, rx_state, tx_state;
-	int flt;
-	struct can_berr_counter bec;
 
-	flt = reg_esr & FLEXCAN_ESR_FLT_CONF_MASK;
-	if (likely(flt == FLEXCAN_ESR_FLT_CONF_ACTIVE)) {
-		tx_state = unlikely(reg_esr & FLEXCAN_ESR_TX_WRN) ?
-			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
-		rx_state = unlikely(reg_esr & FLEXCAN_ESR_RX_WRN) ?
-			CAN_STATE_ERROR_WARNING : CAN_STATE_ERROR_ACTIVE;
-		new_state = max(tx_state, rx_state);
-	} else {
-		__flexcan_get_berr_counter(dev, &bec);
-		new_state = flt == FLEXCAN_ESR_FLT_CONF_PASSIVE ?
-			CAN_STATE_ERROR_PASSIVE : CAN_STATE_BUS_OFF;
-		rx_state = bec.rxerr >= bec.txerr ? new_state : 0;
-		tx_state = bec.rxerr <= bec.txerr ? new_state : 0;
-	}
+	new_state = __flexcan_get_state(dev, reg_esr, &rx_state, &tx_state);
 
 	/* state hasn't changed */
 	if (likely(new_state == priv->can.state))
@@ -1280,6 +1315,9 @@ static int flexcan_probe(struct platform_device *pdev)
 	priv->devtype_data = devtype_data;
 	priv->reg_xceiver = reg_xceiver;
 
+	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ)
+		priv->can.do_get_state = flexcan_get_state;
+
 	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_USE_OFF_TIMESTAMP) {
 		priv->tx_mb_idx = FLEXCAN_TX_MB_OFF_TIMESTAMP;
 		priv->tx_mb_reserved = &regs->mb[FLEXCAN_TX_MB_RESERVED_OFF_TIMESTAMP];
-- 
2.7.4


[-- Attachment #4: 0001-can-flexcan-compensate-state-interrupt-on-demand.patch --]
[-- Type: application/octet-stream, Size: 5260 bytes --]

From a038915dea536336d630d8c2fd17ff42e473f224 Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu5@cn.bosch.com>
Date: Thu, 24 Aug 2017 13:10:15 +0800
Subject: [PATCH] can: flexcan: compensate state interrupt on demand

Due to i.MX6 flexcan core will not generate state interrupt for some
state transitions, this patch adds a timer with configurable timeout
to update state in case the core cannot fire interrupt.

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
---
 .../devicetree/bindings/net/can/fsl-flexcan.txt    |  4 +++
 drivers/net/can/flexcan.c                          | 42 ++++++++++++++++++++--
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
index 56d6cc3..f7a7327 100644
--- a/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
+++ b/Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
@@ -18,6 +18,9 @@ Optional properties:
 
 - xceiver-supply: Regulator that powers the CAN transceiver
 
+- update-state-ms: The time needed for FLT_CONF indicate error passive when
+                   there is single node on the bus.
+
 Example:
 
 	can@1c000 {
@@ -26,4 +29,5 @@ Example:
 		interrupts = <48 0x2>;
 		interrupt-parent = <&mpic>;
 		clock-frequency = <200000000>; // filled in by bootloader
+		update-state-ms = <1000>; // depends on bitrate
 	};
diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index c5c189f..92d967a 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -35,6 +35,7 @@
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
 #include <linux/regulator/consumer.h>
+#include <linux/timer.h>
 
 #define DRV_NAME			"flexcan"
 
@@ -292,6 +293,9 @@ struct flexcan_priv {
 	struct clk *clk_per;
 	const struct flexcan_devtype_data *devtype_data;
 	struct regulator *reg_xceiver;
+
+	u32 update_state_ms;
+	struct timer_list update_state_timer;
 };
 
 static const struct flexcan_devtype_data fsl_p1010_devtype_data = {
@@ -635,6 +639,20 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
 	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
 }
 
+static void flexcan_update_state_timeout(unsigned long data)
+{
+	struct net_device *dev = (struct net_device*)data;
+	struct flexcan_priv *priv = netdev_priv(dev);
+	struct flexcan_regs __iomem *regs = priv->regs;
+	u32 reg_esr;
+
+	/* update state atomically */
+	disable_irq(dev->irq);
+	reg_esr = flexcan_read(&regs->esr);
+	flexcan_irq_state(dev, reg_esr);
+	enable_irq(dev->irq);
+}
+
 static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload *offload)
 {
 	return container_of(offload, struct flexcan_priv, offload);
@@ -781,9 +799,16 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 
 	/* state change interrupt or on incomplete state interrupt core */
 	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
-	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ))
+	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ)) {
 		flexcan_irq_state(dev, reg_esr);
 
+		/* (re)start timer to update state later on demand */
+		if ((priv->update_state_ms > 0) &&
+		    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ))
+			mod_timer(&priv->update_state_timer,
+			          jiffies + msecs_to_jiffies(priv->update_state_ms));
+	}
+
 	/* bus error IRQ - handle if bus error reporting is activated */
 	if ((reg_esr & FLEXCAN_ESR_ERR_BUS) &&
 	    (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING))
@@ -1210,6 +1235,7 @@ static int flexcan_probe(struct platform_device *pdev)
 	struct flexcan_regs __iomem *regs;
 	int err, irq;
 	u32 clock_freq = 0;
+	u32 update_state_ms = 0;
 
 	reg_xceiver = devm_regulator_get(&pdev->dev, "xceiver");
 	if (PTR_ERR(reg_xceiver) == -EPROBE_DEFER)
@@ -1217,9 +1243,12 @@ static int flexcan_probe(struct platform_device *pdev)
 	else if (IS_ERR(reg_xceiver))
 		reg_xceiver = NULL;
 
-	if (pdev->dev.of_node)
+	if (pdev->dev.of_node) {
 		of_property_read_u32(pdev->dev.of_node,
 				     "clock-frequency", &clock_freq);
+		of_property_read_u32(pdev->dev.of_node,
+		                     "update-state-ms", &update_state_ms);
+	}
 
 	if (!clock_freq) {
 		clk_ipg = devm_clk_get(&pdev->dev, "ipg");
@@ -1279,6 +1308,12 @@ static int flexcan_probe(struct platform_device *pdev)
 	priv->clk_per = clk_per;
 	priv->devtype_data = devtype_data;
 	priv->reg_xceiver = reg_xceiver;
+	priv->update_state_ms = update_state_ms;
+
+	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ)
+		setup_timer(&priv->update_state_timer,
+		            flexcan_update_state_timeout,
+		            (unsigned long)dev);
 
 	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_USE_OFF_TIMESTAMP) {
 		priv->tx_mb_idx = FLEXCAN_TX_MB_OFF_TIMESTAMP;
@@ -1337,6 +1372,9 @@ static int flexcan_remove(struct platform_device *pdev)
 	struct net_device *dev = platform_get_drvdata(pdev);
 	struct flexcan_priv *priv = netdev_priv(dev);
 
+	if (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_STATE_IRQ)
+		del_timer_sync(&priv->update_state_timer);
+
 	unregister_flexcandev(dev);
 	can_rx_offload_del(&priv->offload);
 	free_candev(dev);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: AW: AW: flexcan missing error state transitions
  2017-08-24  9:40             ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-25 17:16               ` Andri Yngvason
  0 siblings, 0 replies; 28+ messages in thread
From: Andri Yngvason @ 2017-08-25 17:16 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, linux-can, wg
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hi Yi

I've only had time to review patch 3.

Quoting ZHU Yi (ST-FIR/ENG1-Zhu) (2017-08-24 09:40:38)
> Patch 3 (0001-can-flexcan-compensate-state-interrupt-on-demand.patch)
> This patch adds a timer to poll the state to compensate the incomplete
> state interrupt. The purpose is to avoid polling the state as much as
> possible and still notify upon any state transition via the SocketCAN
> error frame. It depends on auto-retransmission being enabled. But to
> our understanding that is always the case - there is no code to disable
> it. The user can configure the polling interval via DT based on the
> bitrate used (reasonable lower boundary) or acceptable notification
> latency in the application (upper boundary). The polling can be
> disabled, too. Please note that the polling interval *must not* be
> faster than <maximum transmission time for a frame> * 16, or else the
> error counter will not reach 128 and thus the error state might not
> change within the polling interval. So too fast is not good here.

That sounds rather shaky. Wouldn't it be more robust to have the timer be
periodic? Besides, over-sampling is preferable in order to avoid missed
transitions, right? See diff below.

> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
...
> +static void flexcan_update_state_timeout(unsigned long data)
> +{
> +	struct net_device *dev = (struct net_device*)data;
> +	struct flexcan_priv *priv = netdev_priv(dev);
> +	struct flexcan_regs __iomem *regs = priv->regs;
> +	u32 reg_esr;
> +
> +	/* update state atomically */
> +	disable_irq(dev->irq);
> +	reg_esr = flexcan_read(&regs->esr);
> +	flexcan_irq_state(dev, reg_esr);
You could restart the timer here if the error state is not error-active. Then
you would not have the problem of having to choose an interval so carefully.
> +	enable_irq(dev->irq);
> +}
...

Best regards,
Andri

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-22 14:06         ` Marc Kleine-Budde
  2017-08-22 19:03           ` AW: " Jonas Mark (ST-FIR/ENG1)
@ 2017-08-27 10:57           ` Wolfgang Grandegger
  2017-08-28  4:21             ` ZHU Yi (ST-FIR/ENG1-Zhu)
  1 sibling, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-27 10:57 UTC (permalink / raw)
  To: Marc Kleine-Budde, Jonas Mark (ST-FIR/ENG1), Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), ZHU Yi (ST-FIR/ENG1-Zhu)

Am 22.08.2017 um 16:06 schrieb Marc Kleine-Budde:
> On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
>> Hello Andri,
>>
>>>> Which polling interval do you think would be appropriate? It
>>>> somehow has to be
>>> a reasonable trade-off between latency and CPU usage. That's a good
>>> question. I don't think that there is a really good answer to it.
>>> It would be nice to see intermittent state-transitions happening,
>>> so I would say as fast as possible without affecting CPU usage to a
>>> significant degree. The polling timer should not be a high
>>> resolution timer or anything that would preempt an rt task.
>>>
>>> This value could also depend on the bitrate as that will affect the
>>> rate at which the error-counter can change. E.g. on 125 k rate, you
>>> will receive at most one frame per 1 ms. This means that the error
>>> state will change at at least 64 ms intervals, so we would have to
>>> sample at 32 ms intervals to catch all the transitions. For 1 M
>>> this would be around 2 ms.
>>>
>>> Note that this is all somewhat speculative as I do not know exactly
>>> how fast the error counters are incremented/decremented.
>>
>> We agree that there is a sensible lower boundary for the polling
>> interval which depends on the bitrate. And there is a upper boundary
>> which depends on the application.
>>
>> We are currently thinking about whether it would be possible to do
>> the polling only in case the controller is actually experiencing
>> trouble.
> 
> Why not enable bus errors on broken controllers. Disable once a bus

Yes, that's what we have done in the past for Flexcan controller cores 
that do not have dedicated interrupt lines for state changes. We have 
introduced FLEXCAN_QUIRK_BROKEN_ERR_STATE for that purpose. It should 
work already with the current driver. Yi ZHU, have you tried that?

> error hits and re-enable them after a configurable timeout with a proper
> default. I have a patch somewhere, although it applies only against the
> pre rx-offload version of the driver.

And that would help to avoid the interrupt flooding problem.

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-22 14:14         ` AW: AW: " Andri Yngvason
@ 2017-08-27 12:55           ` Wolfgang Grandegger
  0 siblings, 0 replies; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-27 12:55 UTC (permalink / raw)
  To: Andri Yngvason, Jonas Mark (ST-FIR/ENG1), linux-can, mkl
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), ZHU Yi (ST-FIR/ENG1-Zhu)

Hello Andri,

Am 22.08.2017 um 16:14 schrieb Andri Yngvason:
> Hello Mark,
> 
> Quoting Jonas Mark (ST-FIR/ENG1) (2017-08-22 13:50:35)
>>>> Which polling interval do you think would be appropriate? It somehow has to be
>>> a reasonable trade-off between latency and CPU usage.
>>> That's a good question. I don't think that there is a really good answer to it.
>>> It would be nice to see intermittent state-transitions happening, so I would say as
>>> fast as possible without affecting CPU usage to a significant degree. The polling
>>> timer should not be a high resolution timer or anything that would preempt an rt
>>> task.
>>>
>>> This value could also depend on the bitrate as that will affect the rate at which the
>>> error-counter can change. E.g. on 125 k rate, you will receive at most one frame
>>> per 1 ms. This means that the error state will change at at least
>>> 64 ms intervals, so we would have to sample at 32 ms intervals to catch all the
>>> transitions. For 1 M this would be around 2 ms.
>>>
>>> Note that this is all somewhat speculative as I do not know exactly how fast the
>>> error counters are incremented/decremented.
>>
>> We agree that there is a sensible lower boundary for the polling interval which depends on the bitrate. And there is a upper boundary which depends on the application.
>>
>> We are currently thinking about whether it would be possible to do the polling only in case the controller is actually experiencing trouble.
>>
>> For the i.MX6 there is an interrupt when the controller changes from Error Active to Error Warning. This shall enable the polling.
>>
>> For the i.MX53 there is no Active -> Warning interrupt. But for all kinds of error events on the bus an Error interrupt will be created. So the polling shall be enabled then. This is a little bit less efficient compared to i.MX6 but the polling will be disabled during the next interrupt if the state is still Error Active (see below).
>>
>> For both, the polling can be disabled when the status is read and it is Error Active. The polling shall be disabled as well when a Bus-Off interrupt occurs.
>>
>> What we still need to resolve is the concurrency between the timer and the interrupt when accessing internal data structures and sending SocketCAN error frames. Once we have that sorted out we will send a patch. None the less, early feedback is welcome.

I do not see a benefit doing polling vs. using bus error interrupts and 
disabling them for a certain amount of time (to avoid flooding). In both 
cases we may not get all state changes at the right moment but somehow 
delayed. The delay period could be configured via module parameter. On 
low end systems it may need to be rather high, on high end systems it 
could even be disabled (delay=0).

> The last time we discussed this issue, I suggested something similar and
> Wolfgang's response was "Puh, that's far too sophisticated."

Well, most users don't use state change monitoring but on the other hand 
it's a mandatory feature. I still prefer a simple solution, no bit-rate 
dependent delay, netlink interface, etc.

> What you suggest does not sound overly complicated, so I would say it's a good
> idea. It seems like it's the best that we can do with what we have.
> 
> Wolfgang, Marc, what do you think?

See above.

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-27 10:57           ` Wolfgang Grandegger
@ 2017-08-28  4:21             ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-28  8:33               ` Wolfgang Grandegger
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-28  4:21 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hi Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Sunday, August 27, 2017 6:57 PM
> 
> Am 22.08.2017 um 16:06 schrieb Marc Kleine-Budde:
> > On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
> >> Hello Andri,
> >>
> >>>> Which polling interval do you think would be appropriate? It
> >>>> somehow has to be
> >>> a reasonable trade-off between latency and CPU usage. That's a good
> >>> question. I don't think that there is a really good answer to it.
> >>> It would be nice to see intermittent state-transitions happening,
> >>> so I would say as fast as possible without affecting CPU usage to a
> >>> significant degree. The polling timer should not be a high
> >>> resolution timer or anything that would preempt an rt task.
> >>>
> >>> This value could also depend on the bitrate as that will affect the
> >>> rate at which the error-counter can change. E.g. on 125 k rate, you
> >>> will receive at most one frame per 1 ms. This means that the error
> >>> state will change at at least 64 ms intervals, so we would have to
> >>> sample at 32 ms intervals to catch all the transitions. For 1 M
> >>> this would be around 2 ms.
> >>>
> >>> Note that this is all somewhat speculative as I do not know exactly
> >>> how fast the error counters are incremented/decremented.
> >>
> >> We agree that there is a sensible lower boundary for the polling
> >> interval which depends on the bitrate. And there is a upper boundary
> >> which depends on the application.
> >>
> >> We are currently thinking about whether it would be possible to do
> >> the polling only in case the controller is actually experiencing
> >> trouble.
> >
> > Why not enable bus errors on broken controllers. Disable once a bus
> 
> Yes, that's what we have done in the past for Flexcan controller cores
> that do not have dedicated interrupt lines for state changes. We have
> introduced FLEXCAN_QUIRK_BROKEN_ERR_STATE for that purpose. It should
> work already with the current driver. Yi ZHU, have you tried that?
Yes, I've tried that. After add FLEXCAN_QUIRK_BROKEN_ERR_STATE to i.MX6,
the state transitions still not correct. The reason is flexcan_irq_state() will only
be called upon state interrupt, but the flexcan core will not generate interrupts for
all state transitions.
(http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L769)

After enable bus error interrupt and change to call flexcan_irq_state() upon _any_ 
interrupt, then we can get correct state transitions as expected. Of course, interrupt
flooding also came as expected.

> 
> > error hits and re-enable them after a configurable timeout with a proper
> > default. I have a patch somewhere, although it applies only against the
> > pre rx-offload version of the driver.
> 
> And that would help to avoid the interrupt flooding problem.
Yes, that would help. But I think if we change to above approach, then we do not
need to rely on a configurable timeout to avoid interrupt flooding. We can disable/
enable error interrupt depends on the state transitions:
1. disable error interrupt when state change to error passive
Why error passive not bus off?

First, flexcan is able to interrupt when bus off, and the node will not generate any
interrupt in this state. Second, interrupt flooding happens when node is doing 
auto-retransmission until bus off, and in case of only singled out node on the bus,
then this node cannot become bus off, hence, the flooding won't stop.

However at most 16 times consecutive bus error interrupts will change state to error
passive, so after the node in this state, we can safely disable the bus error interrupt.

2. enable error interrupt when state change to error warning
Why error warning not error active?

Minimal 32 times consecutive Tx/Rx interrupts will change state from error passive to
error active. So if the error condition occur again while the state is not recovered to
error active (e.g., at error warning), then until the error condition is removed, the state
will stay in error warning.

Hence, re-enable the error interrupt upon error warning transition allow us to report
error passive again. And if the error condition occur again while the state is still
error passive, we do not need to touch the bus error interrupt.

I've tested this, the state transitions works fine, and the flood of interrupts are limited
to the number of required frame transfer according to the rule of error counting and
state transitions defined in specification.

What do you think? If you agree with this, I can prepare a patch for review.

Best regards
Yi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-28  4:21             ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-28  8:33               ` Wolfgang Grandegger
  2017-08-29  8:49                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-28  8:33 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello ZHU Yi,

Am 28.08.2017 um 06:21 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hi Wolfgang,
> 
>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>> Sent: Sunday, August 27, 2017 6:57 PM
>>
>> Am 22.08.2017 um 16:06 schrieb Marc Kleine-Budde:
>>> On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
>>>> Hello Andri,
>>>>
>>>>>> Which polling interval do you think would be appropriate? It
>>>>>> somehow has to be
>>>>> a reasonable trade-off between latency and CPU usage. That's a good
>>>>> question. I don't think that there is a really good answer to it.
>>>>> It would be nice to see intermittent state-transitions happening,
>>>>> so I would say as fast as possible without affecting CPU usage to a
>>>>> significant degree. The polling timer should not be a high
>>>>> resolution timer or anything that would preempt an rt task.
>>>>>
>>>>> This value could also depend on the bitrate as that will affect the
>>>>> rate at which the error-counter can change. E.g. on 125 k rate, you
>>>>> will receive at most one frame per 1 ms. This means that the error
>>>>> state will change at at least 64 ms intervals, so we would have to
>>>>> sample at 32 ms intervals to catch all the transitions. For 1 M
>>>>> this would be around 2 ms.
>>>>>
>>>>> Note that this is all somewhat speculative as I do not know exactly
>>>>> how fast the error counters are incremented/decremented.
>>>>
>>>> We agree that there is a sensible lower boundary for the polling
>>>> interval which depends on the bitrate. And there is a upper boundary
>>>> which depends on the application.
>>>>
>>>> We are currently thinking about whether it would be possible to do
>>>> the polling only in case the controller is actually experiencing
>>>> trouble.
>>>
>>> Why not enable bus errors on broken controllers. Disable once a bus
>>
>> Yes, that's what we have done in the past for Flexcan controller cores
>> that do not have dedicated interrupt lines for state changes. We have
>> introduced FLEXCAN_QUIRK_BROKEN_ERR_STATE for that purpose. It should
>> work already with the current driver. Yi ZHU, have you tried that?
> Yes, I've tried that. After add FLEXCAN_QUIRK_BROKEN_ERR_STATE to i.MX6,
> the state transitions still not correct. The reason is flexcan_irq_state() will only
> be called upon state interrupt, but the flexcan core will not generate interrupts for
> all state transitions.
> (http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L769)

I see! That's a *bug*, which should be fixed first! The bus error 
interrupt reports bus errors, no state changes. It's triggered when the 
bus error counter changes, Therefore it should be possible to derive 
proper state changes from it. FLEXCAN_QUIRK_BROKEN_ERR_STATE relys on that.

> After enable bus error interrupt and change to call flexcan_irq_state() upon _any_
> interrupt, then we can get correct state transitions as expected. Of course, interrupt
> flooding also came as expected.

OK.

>>> error hits and re-enable them after a configurable timeout with a proper
>>> default. I have a patch somewhere, although it applies only against the
>>> pre rx-offload version of the driver.
>>
>> And that would help to avoid the interrupt flooding problem.
> Yes, that would help. But I think if we change to above approach, then we do not
> need to rely on a configurable timeout to avoid interrupt flooding. We can disable/
> enable error interrupt depends on the state transitions:
> 1. disable error interrupt when state change to error passive
> Why error passive not bus off?

That's defined somewhere in the standard, but I couldn't find a good 
ref/link quickly. When the node is bus-off, it can not receive messages 
any longer.

> First, flexcan is able to interrupt when bus off, and the node will not generate any
> interrupt in this state. Second, interrupt flooding happens when node is doing
> auto-retransmission until bus off, and in case of only singled out node on the bus,
> then this node cannot become bus off, hence, the flooding won't stop.

When no cable is connected, the node will never go to bus-off. Yes, 
that's normal behaviour.

> However at most 16 times consecutive bus error interrupts will change state to error
> passive, so after the node in this state, we can safely disable the bus error interrupt.

This depends on the bus off recovery method defined by "restart-ms". If 
it's "0", the controller should not be restarted automatically, 
otherwise after the specified time. But bus-off hanling is a different 
toppic.

> 2. enable error interrupt when state change to error warning

Guessing what's the idea behind! If the error warning interrupt works, 
there is no need to deal with bus error interrupts till the error 
warning state is reached. But to realize the error passive interrupt, we 
may need the bus errors.

> Why error warning not error active?

How the error counter decrease is described, e.g., in [1] and [2].

> Minimal 32 times consecutive Tx/Rx interrupts will change state from error passive to
> error active. So if the error condition occur again while the state is not recovered to
> error active (e.g., at error warning), then until the error condition is removed, the state
> will stay in error warning.
> 
> Hence, re-enable the error interrupt upon error warning transition allow us to report
> error passive again. And if the error condition occur again while the state is still
> error passive, we do not need to touch the bus error interrupt.
> 
> I've tested this, the state transitions works fine, and the flood of interrupts are limited
> to the number of required frame transfer according to the rule of error counting and
> state transitions defined in specification.
> 
> What do you think? If you agree with this, I can prepare a patch for review.
>

Bus error interrupts will only be enabled if the error-counters are 
between 96 and 127 to realize the state change to error passive. State 
changes to error warning and bus-off are reported via dedicated 
interrupt lines. Well, that may work even if it's tricky. But you need 
to take care of all other quirks and bus-error reporting as well. This 
may result is a rather complex patch.

[1] https://www.kvaser.com/about-can/the-can-protocol/can-error-handling/
[2] http://www.port.de/cgi-bin/CAN/CanFaqErrors

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-28  8:33               ` Wolfgang Grandegger
@ 2017-08-29  8:49                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-29  9:38                   ` Wolfgang Grandegger
                                     ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-29  8:49 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

[-- Attachment #1: Type: text/plain, Size: 9324 bytes --]

Hello Wolfgang, Hello Andri,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Monday, August 28, 2017 4:34 PM
> 
> Hello ZHU Yi,
> 
> Am 28.08.2017 um 06:21 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> > Hi Wolfgang,
> >
> >> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> >> Sent: Sunday, August 27, 2017 6:57 PM
> >>
> >> Am 22.08.2017 um 16:06 schrieb Marc Kleine-Budde:
> >>> On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
> >>>> Hello Andri,
> >>>>
> >>>>>> Which polling interval do you think would be appropriate? It
> >>>>>> somehow has to be
> >>>>> a reasonable trade-off between latency and CPU usage. That's a good
> >>>>> question. I don't think that there is a really good answer to it.
> >>>>> It would be nice to see intermittent state-transitions happening,
> >>>>> so I would say as fast as possible without affecting CPU usage to a
> >>>>> significant degree. The polling timer should not be a high
> >>>>> resolution timer or anything that would preempt an rt task.
> >>>>>
> >>>>> This value could also depend on the bitrate as that will affect the
> >>>>> rate at which the error-counter can change. E.g. on 125 k rate, you
> >>>>> will receive at most one frame per 1 ms. This means that the error
> >>>>> state will change at at least 64 ms intervals, so we would have to
> >>>>> sample at 32 ms intervals to catch all the transitions. For 1 M
> >>>>> this would be around 2 ms.
> >>>>>
> >>>>> Note that this is all somewhat speculative as I do not know exactly
> >>>>> how fast the error counters are incremented/decremented.
> >>>>
> >>>> We agree that there is a sensible lower boundary for the polling
> >>>> interval which depends on the bitrate. And there is a upper boundary
> >>>> which depends on the application.
> >>>>
> >>>> We are currently thinking about whether it would be possible to do
> >>>> the polling only in case the controller is actually experiencing
> >>>> trouble.
> >>>
> >>> Why not enable bus errors on broken controllers. Disable once a bus
> >>
> >> Yes, that's what we have done in the past for Flexcan controller cores
> >> that do not have dedicated interrupt lines for state changes. We have
> >> introduced FLEXCAN_QUIRK_BROKEN_ERR_STATE for that purpose. It
> should
> >> work already with the current driver. Yi ZHU, have you tried that?
> > Yes, I've tried that. After add FLEXCAN_QUIRK_BROKEN_ERR_STATE to i.MX6,
> > the state transitions still not correct. The reason is flexcan_irq_state() will only
> > be called upon state interrupt, but the flexcan core will not generate interrupts for
> > all state transitions.
> > (http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L769)
> 
> I see! That's a *bug*, which should be fixed first! The bus error
> interrupt reports bus errors, no state changes. It's triggered when the
> bus error counter changes, Therefore it should be possible to derive
> proper state changes from it. FLEXCAN_QUIRK_BROKEN_ERR_STATE relys on
> that.
> 
> > After enable bus error interrupt and change to call flexcan_irq_state() upon _any_
> > interrupt, then we can get correct state transitions as expected. Of course,
> interrupt
> > flooding also came as expected.
> 
> OK.
> 
> >>> error hits and re-enable them after a configurable timeout with a proper
> >>> default. I have a patch somewhere, although it applies only against the
> >>> pre rx-offload version of the driver.
> >>
> >> And that would help to avoid the interrupt flooding problem.
> > Yes, that would help. But I think if we change to above approach, then we do not
> > need to rely on a configurable timeout to avoid interrupt flooding. We can
> disable/
> > enable error interrupt depends on the state transitions:
> > 1. disable error interrupt when state change to error passive
> > Why error passive not bus off?
> 
> That's defined somewhere in the standard, but I couldn't find a good
> ref/link quickly. When the node is bus-off, it can not receive messages
> any longer.
Yes, here is the ref:

In ISO 11898-1:2015(E):
6.15 Bus-off

A node is in the bus-off state when it is switched off from the bus due to a request of FCE. In the bus-
off state, a node neither sends nor receives frames. In the bus-off state, a node does not send any
dominant bits.

> 
> > First, flexcan is able to interrupt when bus off, and the node will not generate any
> > interrupt in this state. Second, interrupt flooding happens when node is doing
> > auto-retransmission until bus off, and in case of only singled out node on the
> bus,
> > then this node cannot become bus off, hence, the flooding won't stop.
> 
> When no cable is connected, the node will never go to bus-off. Yes,
> that's normal behaviour.
> 
> > However at most 16 times consecutive bus error interrupts will change state to
> error
> > passive, so after the node in this state, we can safely disable the bus error
> interrupt.
> 
> This depends on the bus off recovery method defined by "restart-ms". If
> it's "0", the controller should not be restarted automatically,
> otherwise after the specified time. But bus-off hanling is a different
> toppic.
Agree.

> 
> > 2. enable error interrupt when state change to error warning
> 
> Guessing what's the idea behind! If the error warning interrupt works,
> there is no need to deal with bus error interrupts till the error
> warning state is reached. But to realize the error passive interrupt, we
> may need the bus errors.
> 
> > Why error warning not error active?
> 
> How the error counter decrease is described, e.g., in [1] and [2].
> 
> > Minimal 32 times consecutive Tx/Rx interrupts will change state from error
> passive to
> > error active. So if the error condition occur again while the state is not recovered
> to
> > error active (e.g., at error warning), then until the error condition is removed, the
> state
> > will stay in error warning.
> >
> > Hence, re-enable the error interrupt upon error warning transition allow us to
> report
> > error passive again. And if the error condition occur again while the state is still
> > error passive, we do not need to touch the bus error interrupt.
> >
> > I've tested this, the state transitions works fine, and the flood of interrupts are
> limited
> > to the number of required frame transfer according to the rule of error counting
> and
> > state transitions defined in specification.
> >
> > What do you think? If you agree with this, I can prepare a patch for review.
> >
> 
> Bus error interrupts will only be enabled if the error-counters are
> between 96 and 127 to realize the state change to error passive. State
> changes to error warning and bus-off are reported via dedicated
> interrupt lines. Well, that may work even if it's tricky. But you need
> to take care of all other quirks and bus-error reporting as well. This
> may result is a rather complex patch.
> 
> [1] https://www.kvaser.com/about-can/the-can-protocol/can-error-handling/
> [2] http://www.port.de/cgi-bin/CAN/CanFaqErrors
> 
Thanks for your interpretation which makes the original text understandable. :)
Attached please find the patch for review.

Patch 1 (0001-can-flexcan-mitigate-incomplete-state-interrupt.patch)
Fixes the state transition issue, but interrupt flooding may happen.

Patch 2 (0002-can-flexcan-mitigate-error-interrupt-flooding.patch)
Addresses the interrupt flooding issue as above mentioned, and takes care
of the bus-error reporting. For the other quirks, they looks like won't
influence the state transition and interrupt flooding. Please comment if
something missed.

One more thing regarding patch 2 is, it doesn't rely on error warning
can be reported by [TR]WRN_INT, because this state can also be reported
upon the _any_ interrupt. Only bus-off state needs to have dedicated
interrupt line (is this true for all flexcan core?)


These patches are only tested on i.MX6, but I guess/hope it can work
with the old core which lacks [TR]WRN_INT too.

---

> From: Andri Yngvason [mailto:andri.yngvason@marel.com]
> Sent: Monday, August 28, 2017 7:23 PM
> 
> Hi Yi,
> 
> I have users who can change the baudrate but they can not change the device
> tree. The baudrate can even be changed by service technicians. I.e. it's field
> configurable. How would I configure the timeout to work well for rates between
> 125k and 1M?
Yes, the poll once approach assumes user have access to device-tree, otherwise
a catch-all delay (based on the slowest bitrate) is required, which is less
flexible than the periodic timer.

> 
> Do you think that a periodic timer would load the system more than an IRQ-flood
> due to a bad error situation?
> 
If not the hrtimer than most likely the periodic timer will not load the system
more than IRQ flood, and we can configure the interval so it can even lower the
system load than IRQ flood. However, the error irq + suppress flooding approach
has the following benefits compare to the timer approach:
1. report state transitions in time
2. no need to handle the contention between irq and timer
3. no need to deal with how to specify the delay

What do you think?

Best regards
Yi


[-- Attachment #2: 0001-can-flexcan-mitigate-i.MX6-incomplete-state-irq.patch --]
[-- Type: application/octet-stream, Size: 1749 bytes --]

From 3025a3f6f27f6df99d1db60d3ef820efe629fd20 Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu5@cn.bosch.com>
Date: Mon, 28 Aug 2017 17:29:23 +0800
Subject: [PATCH 1/2] can: flexcan: mitigate i.MX6 incomplete state irq

The i.MX6 flexcan core will not generate state interrupt for some state
transitions. This patch enables the bus error interrupt on i.MX6 by
enabling FLEXCAN_QUIRK_BROKEN_ERR_STATE and changing the call of
flexcan_irq_state() upon _any_ interrupt to mitigate this issue.

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
---
 drivers/net/can/flexcan.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 13f0f21..75bb9f6 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -288,7 +288,7 @@ static const struct flexcan_devtype_data fsl_imx28_devtype_data;
 
 static const struct flexcan_devtype_data fsl_imx6q_devtype_data = {
 	.quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS |
-		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP,
+		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP | FLEXCAN_QUIRK_BROKEN_ERR_STATE,
 };
 
 static const struct flexcan_devtype_data fsl_vf610_devtype_data = {
@@ -765,8 +765,9 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 		flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, &regs->esr);
 	}
 
-	/* state change interrupt */
-	if (reg_esr & FLEXCAN_ESR_ERR_STATE)
+	/* state change interrupt or on broken error state core */
+	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
+	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_ERR_STATE))
 		flexcan_irq_state(dev, reg_esr);
 
 	/* bus error IRQ - handle if bus error reporting is activated */
-- 
2.7.4


[-- Attachment #3: 0002-can-flexcan-mitigate-error-interrupt-flooding.patch --]
[-- Type: application/octet-stream, Size: 4557 bytes --]

From 5ca061fcb0cc8576caafdd9a0f398adf81c18ecc Mon Sep 17 00:00:00 2001
From: Zhu Yi <yi.zhu5@cn.bosch.com>
Date: Tue, 29 Aug 2017 12:19:04 +0800
Subject: [PATCH 2/2] can: flexcan: mitigate error interrupt flooding

Error interrupt flooding may happen if berr-reporting is enabled, or
the broken error state quirk fix is enabled. For example, in case
there is no cable connected, error interrupt flooding is very likely
because the node cannot go to bus off.

This patch mitigates this issue by:
1. disable error interrupt upon error passive state transition
2. re-enable error interrupt upon error warning state transition

In this way, the driver is still able to report correct state
transitions without additional latency. And flooding of error interrupts
is limited to the number of frames required to change the state to error
passive when there are bus problems.

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
---
 drivers/net/can/flexcan.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 75bb9f6..9d0e97e 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -581,7 +581,7 @@ static void flexcan_irq_bus_err(struct net_device *dev, u32 reg_esr)
 	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
 }
 
-static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
+static bool flexcan_irq_state(struct net_device *dev, u32 reg_esr)
 {
 	struct flexcan_priv *priv = netdev_priv(dev);
 	struct sk_buff *skb;
@@ -607,11 +607,11 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
 
 	/* state hasn't changed */
 	if (likely(new_state == priv->can.state))
-		return;
+		return false;
 
 	skb = alloc_can_err_skb(dev, &cf);
 	if (unlikely(!skb))
-		return;
+		return false;
 
 	can_change_state(dev, cf, tx_state, rx_state);
 
@@ -619,6 +619,7 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
 		can_bus_off(dev);
 
 	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
+	return true;
 }
 
 static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload *offload)
@@ -712,7 +713,8 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 	struct flexcan_priv *priv = netdev_priv(dev);
 	struct flexcan_regs __iomem *regs = priv->regs;
 	irqreturn_t handled = IRQ_NONE;
-	u32 reg_iflag1, reg_esr;
+	u32 reg_iflag1, reg_esr, reg_ctrl;
+	bool state_changed = false;
 
 	reg_iflag1 = flexcan_read(&regs->iflag1);
 
@@ -768,13 +770,46 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
 	/* state change interrupt or on broken error state core */
 	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
 	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_ERR_STATE))
-		flexcan_irq_state(dev, reg_esr);
+		state_changed = flexcan_irq_state(dev, reg_esr);
 
 	/* bus error IRQ - handle if bus error reporting is activated */
 	if ((reg_esr & FLEXCAN_ESR_ERR_BUS) &&
 	    (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING))
 		flexcan_irq_bus_err(dev, reg_esr);
 
+	/* disable error interrupt upon error passive state transition
+	 * to suspend bus error reporting (to avoid flooding).
+	 *
+	 * re-enable error interrupt upon error warning state transition
+	 * to resume bus error reporting and allow broken error state core
+	 * to report state correctly (again).
+	 *
+	 * availability of error interrupt among state transitions:
+	 *  +--------------------------------------------------------------+
+	 *  | +----------------------------------------------+ [stopped /  |
+	 *  | |                                              |  sleeping] -+
+	 *  +-+-> active <-> warning <-> passive -> bus off -+
+	 *        ^^^^^^^^^^^^^^^^^^^^^^^_______________________________
+	 *                enabled                      disabled
+	 */
+	if (state_changed &&
+	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_ERR_STATE ||
+	     priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)) {
+		if ((priv->can.state == CAN_STATE_ERROR_PASSIVE) ||
+		    (priv->can.state == CAN_STATE_BUS_OFF)) {
+			reg_ctrl = flexcan_read(&regs->ctrl);
+			reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;
+			flexcan_write(reg_ctrl, &regs->ctrl);
+		}
+
+		if ((priv->can.state == CAN_STATE_ERROR_WARNING) ||
+		    (priv->can.state == CAN_STATE_ERROR_ACTIVE)) {
+			reg_ctrl = flexcan_read(&regs->ctrl);
+			reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
+			flexcan_write(reg_ctrl, &regs->ctrl);
+		}
+	}
+
 	return handled;
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-29  8:49                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-29  9:38                   ` Wolfgang Grandegger
  2017-08-30  1:39                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-29 11:17                   ` Wolfgang Grandegger
  2017-08-29 13:41                   ` Andri Yngvason
  2 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-29  9:38 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello ZHU Yi,

Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hello Wolfgang, Hello Andri,

Please answer to the original mail and not combine replys.

> 
>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>> Sent: Monday, August 28, 2017 4:34 PM
>>
>> Hello ZHU Yi,
>>
>> Am 28.08.2017 um 06:21 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>> Hi Wolfgang,
>>>
>>>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>>>> Sent: Sunday, August 27, 2017 6:57 PM
>>>>
>>>> Am 22.08.2017 um 16:06 schrieb Marc Kleine-Budde:
>>>>> On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
>>>>>> Hello Andri,
>>>>>>
>>>>>>>> Which polling interval do you think would be appropriate? It
>>>>>>>> somehow has to be
>>>>>>> a reasonable trade-off between latency and CPU usage. That's a good
>>>>>>> question. I don't think that there is a really good answer to it.
>>>>>>> It would be nice to see intermittent state-transitions happening,
>>>>>>> so I would say as fast as possible without affecting CPU usage to a
>>>>>>> significant degree. The polling timer should not be a high
>>>>>>> resolution timer or anything that would preempt an rt task.
>>>>>>>
>>>>>>> This value could also depend on the bitrate as that will affect the
>>>>>>> rate at which the error-counter can change. E.g. on 125 k rate, you
>>>>>>> will receive at most one frame per 1 ms. This means that the error
>>>>>>> state will change at at least 64 ms intervals, so we would have to
>>>>>>> sample at 32 ms intervals to catch all the transitions. For 1 M
>>>>>>> this would be around 2 ms.
>>>>>>>
>>>>>>> Note that this is all somewhat speculative as I do not know exactly
>>>>>>> how fast the error counters are incremented/decremented.
>>>>>>
>>>>>> We agree that there is a sensible lower boundary for the polling
>>>>>> interval which depends on the bitrate. And there is a upper boundary
>>>>>> which depends on the application.
>>>>>>
>>>>>> We are currently thinking about whether it would be possible to do
>>>>>> the polling only in case the controller is actually experiencing
>>>>>> trouble.
>>>>>
>>>>> Why not enable bus errors on broken controllers. Disable once a bus
>>>>
>>>> Yes, that's what we have done in the past for Flexcan controller cores
>>>> that do not have dedicated interrupt lines for state changes. We have
>>>> introduced FLEXCAN_QUIRK_BROKEN_ERR_STATE for that purpose. It
>> should
>>>> work already with the current driver. Yi ZHU, have you tried that?
>>> Yes, I've tried that. After add FLEXCAN_QUIRK_BROKEN_ERR_STATE to i.MX6,
>>> the state transitions still not correct. The reason is flexcan_irq_state() will only
>>> be called upon state interrupt, but the flexcan core will not generate interrupts for
>>> all state transitions.
>>> (http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L769)
>>
>> I see! That's a *bug*, which should be fixed first! The bus error
>> interrupt reports bus errors, no state changes. It's triggered when the
>> bus error counter changes, Therefore it should be possible to derive
>> proper state changes from it. FLEXCAN_QUIRK_BROKEN_ERR_STATE relys on
>> that.
>>
>>> After enable bus error interrupt and change to call flexcan_irq_state() upon _any_
>>> interrupt, then we can get correct state transitions as expected. Of course,
>> interrupt
>>> flooding also came as expected.
>>
>> OK.
>>
>>>>> error hits and re-enable them after a configurable timeout with a proper
>>>>> default. I have a patch somewhere, although it applies only against the
>>>>> pre rx-offload version of the driver.
>>>>
>>>> And that would help to avoid the interrupt flooding problem.
>>> Yes, that would help. But I think if we change to above approach, then we do not
>>> need to rely on a configurable timeout to avoid interrupt flooding. We can
>> disable/
>>> enable error interrupt depends on the state transitions:
>>> 1. disable error interrupt when state change to error passive
>>> Why error passive not bus off?
>>
>> That's defined somewhere in the standard, but I couldn't find a good
>> ref/link quickly. When the node is bus-off, it can not receive messages
>> any longer.
> Yes, here is the ref:
> 
> In ISO 11898-1:2015(E):
> 6.15 Bus-off
> 
> A node is in the bus-off state when it is switched off from the bus due to a request of FCE. In the bus-
> off state, a node neither sends nor receives frames. In the bus-off state, a node does not send any
> dominant bits.

That's not what I ment. When no cable is connected, the error count is 
not increased beyond 128 (error passive). Why? I was looking to an 
answer for that question.

> 
>>
>>> First, flexcan is able to interrupt when bus off, and the node will not generate any
>>> interrupt in this state. Second, interrupt flooding happens when node is doing
>>> auto-retransmission until bus off, and in case of only singled out node on the
>> bus,
>>> then this node cannot become bus off, hence, the flooding won't stop.
>>
>> When no cable is connected, the node will never go to bus-off. Yes,
>> that's normal behaviour.
>>
>>> However at most 16 times consecutive bus error interrupts will change state to
>> error
>>> passive, so after the node in this state, we can safely disable the bus error
>> interrupt.
>>
>> This depends on the bus off recovery method defined by "restart-ms". If
>> it's "0", the controller should not be restarted automatically,
>> otherwise after the specified time. But bus-off hanling is a different
>> toppic.
> Agree.
> 
>>
>>> 2. enable error interrupt when state change to error warning
>>
>> Guessing what's the idea behind! If the error warning interrupt works,
>> there is no need to deal with bus error interrupts till the error
>> warning state is reached. But to realize the error passive interrupt, we
>> may need the bus errors.
>>
>>> Why error warning not error active?
>>
>> How the error counter decrease is described, e.g., in [1] and [2].
>>
>>> Minimal 32 times consecutive Tx/Rx interrupts will change state from error
>> passive to
>>> error active. So if the error condition occur again while the state is not recovered
>> to
>>> error active (e.g., at error warning), then until the error condition is removed, the
>> state
>>> will stay in error warning.
>>>
>>> Hence, re-enable the error interrupt upon error warning transition allow us to
>> report
>>> error passive again. And if the error condition occur again while the state is still
>>> error passive, we do not need to touch the bus error interrupt.
>>>
>>> I've tested this, the state transitions works fine, and the flood of interrupts are
>> limited
>>> to the number of required frame transfer according to the rule of error counting
>> and
>>> state transitions defined in specification.
>>>
>>> What do you think? If you agree with this, I can prepare a patch for review.
>>>
>>
>> Bus error interrupts will only be enabled if the error-counters are
>> between 96 and 127 to realize the state change to error passive. State
>> changes to error warning and bus-off are reported via dedicated
>> interrupt lines. Well, that may work even if it's tricky. But you need
>> to take care of all other quirks and bus-error reporting as well. This
>> may result is a rather complex patch.
>>
>> [1] https://www.kvaser.com/about-can/the-can-protocol/can-error-handling/
>> [2] http://www.port.de/cgi-bin/CAN/CanFaqErrors
>>
> Thanks for your interpretation which makes the original text understandable. :)
> Attached please find the patch for review.

Please send the patches "inline" and as separate emails. This makes it 
much simpler for use to review them.

Thanks,

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-29  8:49                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-29  9:38                   ` Wolfgang Grandegger
@ 2017-08-29 11:17                   ` Wolfgang Grandegger
  2017-08-30  4:22                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-29 13:41                   ` Andri Yngvason
  2 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-29 11:17 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello,

now looking to your patches...

Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hello Wolfgang, Hello Andri,
[...]

> Attached please find the patch for review.
> 
> Patch 1 (0001-can-flexcan-mitigate-incomplete-state-interrupt.patch)
> Fixes the state transition issue, but interrupt flooding may happen.

> From 3025a3f6f27f6df99d1db60d3ef820efe629fd20 Mon Sep 17 00:00:00 2001
> From: Zhu Yi <yi.zhu5@cn.bosch.com>
> Date: Mon, 28 Aug 2017 17:29:23 +0800
> Subject: [PATCH 1/2] can: flexcan: mitigate i.MX6 incomplete state irq
> 
> The i.MX6 flexcan core will not generate state interrupt for some state
> transitions. This patch enables the bus error interrupt on i.MX6 by
> enabling FLEXCAN_QUIRK_BROKEN_ERR_STATE and changing the call of
> flexcan_irq_state() upon _any_ interrupt to mitigate this issue.
> 
> Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
> ---
>  drivers/net/can/flexcan.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index 13f0f21..75bb9f6 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -288,7 +288,7 @@ static const struct flexcan_devtype_data fsl_imx28_devtype_data;
>  
>  static const struct flexcan_devtype_data fsl_imx6q_devtype_data = {
>  	.quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS |
> -		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP,
> +		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP | FLEXCAN_QUIRK_BROKEN_ERR_STATE,
>  };

This patch should just fix the regression bug... without using 
FLEXCAN_QUIRK_BROKEN_ERR_STATE for the imx6q.

>  static const struct flexcan_devtype_data fsl_vf610_devtype_data = {
> @@ -765,8 +765,9 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>  		flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, &regs->esr);
>  	}
>  
> -	/* state change interrupt */
> -	if (reg_esr & FLEXCAN_ESR_ERR_STATE)
> +	/* state change interrupt or on broken error state core */
> +	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
> +	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_ERR_STATE))
>  		flexcan_irq_state(dev, reg_esr);
>  
>  	/* bus error IRQ - handle if bus error reporting is activated */
> -- 
> 2.7.4

Looks good!

> Patch 2 (0002-can-flexcan-mitigate-error-interrupt-flooding.patch)

> From 5ca061fcb0cc8576caafdd9a0f398adf81c18ecc Mon Sep 17 00:00:00 2001
> From: Zhu Yi <yi.zhu5@cn.bosch.com>
> Date: Tue, 29 Aug 2017 12:19:04 +0800
> Subject: [PATCH 2/2] can: flexcan: mitigate error interrupt flooding
> 
> Error interrupt flooding may happen if berr-reporting is enabled, or
> the broken error state quirk fix is enabled. For example, in case
> there is no cable connected, error interrupt flooding is very likely
> because the node cannot go to bus off.
> 
> This patch mitigates this issue by:
> 1. disable error interrupt upon error passive state transition
> 2. re-enable error interrupt upon error warning state transition
> 
> In this way, the driver is still able to report correct state
> transitions without additional latency. And flooding of error interrupts
> is limited to the number of frames required to change the state to error
> passive when there are bus problems.
> 
> Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
> ---
>  drivers/net/can/flexcan.c | 45 ++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 40 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index 75bb9f6..9d0e97e 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -581,7 +581,7 @@ static void flexcan_irq_bus_err(struct net_device *dev, u32 reg_esr)
>  	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
>  }
>  
> -static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> +static bool flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>  {
>  	struct flexcan_priv *priv = netdev_priv(dev);
>  	struct sk_buff *skb;
> @@ -607,11 +607,11 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>  
>  	/* state hasn't changed */
>  	if (likely(new_state == priv->can.state))
> -		return;
> +		return false;
>  
>  	skb = alloc_can_err_skb(dev, &cf);
>  	if (unlikely(!skb))
> -		return;
> +		return false;
>  
>  	can_change_state(dev, cf, tx_state, rx_state);
>  
> @@ -619,6 +619,7 @@ static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>  		can_bus_off(dev);
>  
>  	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
> +	return true;
>  }
>  
>  static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload *offload)
> @@ -712,7 +713,8 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>  	struct flexcan_priv *priv = netdev_priv(dev);
>  	struct flexcan_regs __iomem *regs = priv->regs;
>  	irqreturn_t handled = IRQ_NONE;
> -	u32 reg_iflag1, reg_esr;
> +	u32 reg_iflag1, reg_esr, reg_ctrl;
> +	bool state_changed = false;
>  
>  	reg_iflag1 = flexcan_read(&regs->iflag1);
>  
> @@ -768,13 +770,46 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>  	/* state change interrupt or on broken error state core */
>  	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
>  	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_ERR_STATE))
> -		flexcan_irq_state(dev, reg_esr);
> +		state_changed = flexcan_irq_state(dev, reg_esr);
>  
>  	/* bus error IRQ - handle if bus error reporting is activated */
>  	if ((reg_esr & FLEXCAN_ESR_ERR_BUS) &&
>  	    (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING))
>  		flexcan_irq_bus_err(dev, reg_esr);
>  
> +	/* disable error interrupt upon error passive state transition
> +	 * to suspend bus error reporting (to avoid flooding).
> +	 *
> +	 * re-enable error interrupt upon error warning state transition
> +	 * to resume bus error reporting and allow broken error state core
> +	 * to report state correctly (again).
> +	 *
> +	 * availability of error interrupt among state transitions:
> +	 *  +--------------------------------------------------------------+
> +	 *  | +----------------------------------------------+ [stopped /  |
> +	 *  | |                                              |  sleeping] -+
> +	 *  +-+-> active <-> warning <-> passive -> bus off -+
> +	 *        ^^^^^^^^^^^^^^^^^^^^^^^_______________________________
> +	 *                enabled                      disabled
> +	 */
> +	if (state_changed &&
> +	    (priv->devtype_data->quirks & FLEXCAN_QUIRK_BROKEN_ERR_STATE ||
> +	     priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)) {
> +		if ((priv->can.state == CAN_STATE_ERROR_PASSIVE) ||
> +		    (priv->can.state == CAN_STATE_BUS_OFF)) {
> +			reg_ctrl = flexcan_read(&regs->ctrl);
> +			reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;
> +			flexcan_write(reg_ctrl, &regs->ctrl);
> +		}

We do not want to disable bus error interrupts for bus error reporting!

> +		if ((priv->can.state == CAN_STATE_ERROR_WARNING) ||
> +		    (priv->can.state == CAN_STATE_ERROR_ACTIVE)) {
> +			reg_ctrl = flexcan_read(&regs->ctrl);
> +			reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
> +			flexcan_write(reg_ctrl, &regs->ctrl);
> +		}
> +	}
> +

For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to disable 
bus error interrupts as well. To handle that properly, I think we need 
an additional FLEXCAN_QUIRK_BROKEN_PERR_STATE and maybe rename 
FLEXCAN_QUIRK_BROKEN_ERR_STATE to FLEXCAN_QUIRK_BROKEN_WERR_STATE.

>  	return handled;
>  }
>  
> -- 
> 2.7.4


> Addresses the interrupt flooding issue as above mentioned, and takes care
> of the bus-error reporting. For the other quirks, they looks like won't
> influence the state transition and interrupt flooding. Please comment if
> something missed.
> 
> One more thing regarding patch 2 is, it doesn't rely on error warning
> can be reported by [TR]WRN_INT, because this state can also be reported
> upon the _any_ interrupt. Only bus-off state needs to have dedicated
> interrupt line (is this true for all flexcan core?)

I don't think so. How should it work if no cable is connected? Anyway, 
the patch seems not to be to lenghty.

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-29  8:49                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-29  9:38                   ` Wolfgang Grandegger
  2017-08-29 11:17                   ` Wolfgang Grandegger
@ 2017-08-29 13:41                   ` Andri Yngvason
  2 siblings, 0 replies; 28+ messages in thread
From: Andri Yngvason @ 2017-08-29 13:41 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu),
	Wolfgang Grandegger, Marc Kleine-Budde, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hi Yi,

Quoting ZHU Yi (ST-FIR/ENG1-Zhu) (2017-08-29 08:49:46)
> > From: Andri Yngvason [mailto:andri.yngvason@marel.com]
> > Sent: Monday, August 28, 2017 7:23 PM
...
> > I have users who can change the baudrate but they can not change the device
> > tree. The baudrate can even be changed by service technicians. I.e. it's field
> > configurable. How would I configure the timeout to work well for rates between
> > 125k and 1M?
> Yes, the poll once approach assumes user have access to device-tree, otherwise
> a catch-all delay (based on the slowest bitrate) is required, which is less
> flexible than the periodic timer.
Actually, if it depends on the bitrate, then it would be easiest for everyone to
just set it accordingly in the code based on the bitrate without involving dt.
Anyway, that doesn't really matter, since it seems that we're going with the
"irq + suppress flooding" approach.

> 
> > 
> > Do you think that a periodic timer would load the system more than an IRQ-flood
> > due to a bad error situation?
> > 
> If not the hrtimer than most likely the periodic timer will not load the system
> more than IRQ flood, and we can configure the interval so it can even lower the
> system load than IRQ flood. However, the error irq + suppress flooding approach
> has the following benefits compare to the timer approach:
> 1. report state transitions in time
> 2. no need to handle the contention between irq and timer
> 3. no need to deal with how to specify the delay
OK, good enough.

Regards,
Andri

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-29  9:38                   ` Wolfgang Grandegger
@ 2017-08-30  1:39                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 0 replies; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-30  1:39 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Tuesday, August 29, 2017 5:38 PM
> 
> Hello ZHU Yi,
> 
> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> > Hello Wolfgang, Hello Andri,
> 
> Please answer to the original mail and not combine replys.
OK, sorry for the squash!

> 
> >
> >> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> >> Sent: Monday, August 28, 2017 4:34 PM
> >>
> >> Hello ZHU Yi,
> >>
> >> Am 28.08.2017 um 06:21 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> >>> Hi Wolfgang,
> >>>
> >>>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> >>>> Sent: Sunday, August 27, 2017 6:57 PM
> >>>>
> >>>> Am 22.08.2017 um 16:06 schrieb Marc Kleine-Budde:
> >>>>> On 08/22/2017 03:50 PM, Jonas Mark (ST-FIR/ENG1) wrote:
> >>>>>> Hello Andri,
> >>>>>>
> >>>>>>>> Which polling interval do you think would be appropriate? It
> >>>>>>>> somehow has to be
> >>>>>>> a reasonable trade-off between latency and CPU usage. That's a good
> >>>>>>> question. I don't think that there is a really good answer to it.
> >>>>>>> It would be nice to see intermittent state-transitions happening,
> >>>>>>> so I would say as fast as possible without affecting CPU usage to a
> >>>>>>> significant degree. The polling timer should not be a high
> >>>>>>> resolution timer or anything that would preempt an rt task.
> >>>>>>>
> >>>>>>> This value could also depend on the bitrate as that will affect the
> >>>>>>> rate at which the error-counter can change. E.g. on 125 k rate, you
> >>>>>>> will receive at most one frame per 1 ms. This means that the error
> >>>>>>> state will change at at least 64 ms intervals, so we would have to
> >>>>>>> sample at 32 ms intervals to catch all the transitions. For 1 M
> >>>>>>> this would be around 2 ms.
> >>>>>>>
> >>>>>>> Note that this is all somewhat speculative as I do not know exactly
> >>>>>>> how fast the error counters are incremented/decremented.
> >>>>>>
> >>>>>> We agree that there is a sensible lower boundary for the polling
> >>>>>> interval which depends on the bitrate. And there is a upper boundary
> >>>>>> which depends on the application.
> >>>>>>
> >>>>>> We are currently thinking about whether it would be possible to do
> >>>>>> the polling only in case the controller is actually experiencing
> >>>>>> trouble.
> >>>>>
> >>>>> Why not enable bus errors on broken controllers. Disable once a bus
> >>>>
> >>>> Yes, that's what we have done in the past for Flexcan controller cores
> >>>> that do not have dedicated interrupt lines for state changes. We have
> >>>> introduced FLEXCAN_QUIRK_BROKEN_ERR_STATE for that purpose. It
> >> should
> >>>> work already with the current driver. Yi ZHU, have you tried that?
> >>> Yes, I've tried that. After add FLEXCAN_QUIRK_BROKEN_ERR_STATE to
> i.MX6,
> >>> the state transitions still not correct. The reason is flexcan_irq_state() will only
> >>> be called upon state interrupt, but the flexcan core will not generate interrupts
> for
> >>> all state transitions.
> >>> (http://elixir.free-
> electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L769)
> >>
> >> I see! That's a *bug*, which should be fixed first! The bus error
> >> interrupt reports bus errors, no state changes. It's triggered when the
> >> bus error counter changes, Therefore it should be possible to derive
> >> proper state changes from it. FLEXCAN_QUIRK_BROKEN_ERR_STATE relys
> on
> >> that.
> >>
> >>> After enable bus error interrupt and change to call flexcan_irq_state() upon
> _any_
> >>> interrupt, then we can get correct state transitions as expected. Of course,
> >> interrupt
> >>> flooding also came as expected.
> >>
> >> OK.
> >>
> >>>>> error hits and re-enable them after a configurable timeout with a proper
> >>>>> default. I have a patch somewhere, although it applies only against the
> >>>>> pre rx-offload version of the driver.
> >>>>
> >>>> And that would help to avoid the interrupt flooding problem.
> >>> Yes, that would help. But I think if we change to above approach, then we do
> not
> >>> need to rely on a configurable timeout to avoid interrupt flooding. We can
> >> disable/
> >>> enable error interrupt depends on the state transitions:
> >>> 1. disable error interrupt when state change to error passive
> >>> Why error passive not bus off?
> >>
> >> That's defined somewhere in the standard, but I couldn't find a good
> >> ref/link quickly. When the node is bus-off, it can not receive messages
> >> any longer.
> > Yes, here is the ref:
> >
> > In ISO 11898-1:2015(E):
> > 6.15 Bus-off
> >
> > A node is in the bus-off state when it is switched off from the bus due to a
> request of FCE. In the bus-
> > off state, a node neither sends nor receives frames. In the bus-off state, a node
> does not send any
> > dominant bits.
> 
> That's not what I ment. When no cable is connected, the error count is
> not increased beyond 128 (error passive). Why? I was looking to an
> answer for that question.
Ah, here is the ref:

In ISO 11898-1:2015(E):
```
12.1.5 Network start-up

If during network start-up only one node is online and if this node transmits some frame, it will not
get an ACK. In this case, it shall detect an error and repeat the frame. According to rule c) in 12.1.4.2,
exception 1, it may become error-passive but shall not go bus-off.
```

And,

```
12.1.4.2 Error counting
...
c) When a transmitter sends an error flag, the transmit error counter shall be incremented by 8.
Exception 1:
If the transmitter is error-passive and detects an ACK error because of not detecting a dominant
ACK and does not detect a dominant bit while sending its passive error flag.
...
In exception 1 and in exception 2, the transmit error counter remains unchanged.
```

> 
> >
> >>
> >>> First, flexcan is able to interrupt when bus off, and the node will not generate
> any
> >>> interrupt in this state. Second, interrupt flooding happens when node is doing
> >>> auto-retransmission until bus off, and in case of only singled out node on the
> >> bus,
> >>> then this node cannot become bus off, hence, the flooding won't stop.
> >>
> >> When no cable is connected, the node will never go to bus-off. Yes,
> >> that's normal behaviour.
> >>
> >>> However at most 16 times consecutive bus error interrupts will change state to
> >> error
> >>> passive, so after the node in this state, we can safely disable the bus error
> >> interrupt.
> >>
> >> This depends on the bus off recovery method defined by "restart-ms". If
> >> it's "0", the controller should not be restarted automatically,
> >> otherwise after the specified time. But bus-off hanling is a different
> >> toppic.
> > Agree.
> >
> >>
> >>> 2. enable error interrupt when state change to error warning
> >>
> >> Guessing what's the idea behind! If the error warning interrupt works,
> >> there is no need to deal with bus error interrupts till the error
> >> warning state is reached. But to realize the error passive interrupt, we
> >> may need the bus errors.
> >>
> >>> Why error warning not error active?
> >>
> >> How the error counter decrease is described, e.g., in [1] and [2].
> >>
> >>> Minimal 32 times consecutive Tx/Rx interrupts will change state from error
> >> passive to
> >>> error active. So if the error condition occur again while the state is not
> recovered
> >> to
> >>> error active (e.g., at error warning), then until the error condition is removed,
> the
> >> state
> >>> will stay in error warning.
> >>>
> >>> Hence, re-enable the error interrupt upon error warning transition allow us to
> >> report
> >>> error passive again. And if the error condition occur again while the state is
> still
> >>> error passive, we do not need to touch the bus error interrupt.
> >>>
> >>> I've tested this, the state transitions works fine, and the flood of interrupts are
> >> limited
> >>> to the number of required frame transfer according to the rule of error
> counting
> >> and
> >>> state transitions defined in specification.
> >>>
> >>> What do you think? If you agree with this, I can prepare a patch for review.
> >>>
> >>
> >> Bus error interrupts will only be enabled if the error-counters are
> >> between 96 and 127 to realize the state change to error passive. State
> >> changes to error warning and bus-off are reported via dedicated
> >> interrupt lines. Well, that may work even if it's tricky. But you need
> >> to take care of all other quirks and bus-error reporting as well. This
> >> may result is a rather complex patch.
> >>
> >> [1] https://www.kvaser.com/about-can/the-can-protocol/can-error-handling/
> >> [2] http://www.port.de/cgi-bin/CAN/CanFaqErrors
> >>
> > Thanks for your interpretation which makes the original text understandable. :)
> > Attached please find the patch for review.
> 
> Please send the patches "inline" and as separate emails. This makes it
> much simpler for use to review them.
OK. I saw your comments in this thread after I sent out separate emails yesterday,
sorry for that I did not check the newest response before sending, please ignore them.

Best regards
Yi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-29 11:17                   ` Wolfgang Grandegger
@ 2017-08-30  4:22                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-30  6:25                       ` Wolfgang Grandegger
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-30  4:22 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Tuesday, August 29, 2017 7:17 PM
> 
> Hello,
> 
> now looking to your patches...
> 
> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> > Hello Wolfgang, Hello Andri,
> [...]
> 
> > Attached please find the patch for review.
> >
> > Patch 1 (0001-can-flexcan-mitigate-incomplete-state-interrupt.patch)
> > Fixes the state transition issue, but interrupt flooding may happen.
> 
> > From 3025a3f6f27f6df99d1db60d3ef820efe629fd20 Mon Sep 17 00:00:00 2001
> > From: Zhu Yi <yi.zhu5@cn.bosch.com>
> > Date: Mon, 28 Aug 2017 17:29:23 +0800
> > Subject: [PATCH 1/2] can: flexcan: mitigate i.MX6 incomplete state irq
> >
> > The i.MX6 flexcan core will not generate state interrupt for some state
> > transitions. This patch enables the bus error interrupt on i.MX6 by
> > enabling FLEXCAN_QUIRK_BROKEN_ERR_STATE and changing the call of
> > flexcan_irq_state() upon _any_ interrupt to mitigate this issue.
> >
> > Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
> > Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
> > ---
> >  drivers/net/can/flexcan.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> > index 13f0f21..75bb9f6 100644
> > --- a/drivers/net/can/flexcan.c
> > +++ b/drivers/net/can/flexcan.c
> > @@ -288,7 +288,7 @@ static const struct flexcan_devtype_data
> fsl_imx28_devtype_data;
> >
> >  static const struct flexcan_devtype_data fsl_imx6q_devtype_data = {
> >  	.quirks = FLEXCAN_QUIRK_DISABLE_RXFG |
> FLEXCAN_QUIRK_ENABLE_EACEN_RRS |
> > -		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP,
> > +		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP |
> FLEXCAN_QUIRK_BROKEN_ERR_STATE,
> >  };
> 
> This patch should just fix the regression bug... without using
> FLEXCAN_QUIRK_BROKEN_ERR_STATE for the imx6q.
OK. I'll create a separate patch to enable the quirk for imx6q.

> 
> >  static const struct flexcan_devtype_data fsl_vf610_devtype_data = {
> > @@ -765,8 +765,9 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
> >  		flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, &regs->esr);
> >  	}
> >
> > -	/* state change interrupt */
> > -	if (reg_esr & FLEXCAN_ESR_ERR_STATE)
> > +	/* state change interrupt or on broken error state core */
> > +	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
> > +	    (priv->devtype_data->quirks &
> FLEXCAN_QUIRK_BROKEN_ERR_STATE))
> >  		flexcan_irq_state(dev, reg_esr);
> >
> >  	/* bus error IRQ - handle if bus error reporting is activated */
> > --
> > 2.7.4
> 
> Looks good!
> 
> > Patch 2 (0002-can-flexcan-mitigate-error-interrupt-flooding.patch)
> 
> > From 5ca061fcb0cc8576caafdd9a0f398adf81c18ecc Mon Sep 17 00:00:00 2001
> > From: Zhu Yi <yi.zhu5@cn.bosch.com>
> > Date: Tue, 29 Aug 2017 12:19:04 +0800
> > Subject: [PATCH 2/2] can: flexcan: mitigate error interrupt flooding
> >
> > Error interrupt flooding may happen if berr-reporting is enabled, or
> > the broken error state quirk fix is enabled. For example, in case
> > there is no cable connected, error interrupt flooding is very likely
> > because the node cannot go to bus off.
> >
> > This patch mitigates this issue by:
> > 1. disable error interrupt upon error passive state transition
> > 2. re-enable error interrupt upon error warning state transition
> >
> > In this way, the driver is still able to report correct state
> > transitions without additional latency. And flooding of error interrupts
> > is limited to the number of frames required to change the state to error
> > passive when there are bus problems.
> >
> > Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
> > Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
> > ---
> >  drivers/net/can/flexcan.c | 45
> ++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 40 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> > index 75bb9f6..9d0e97e 100644
> > --- a/drivers/net/can/flexcan.c
> > +++ b/drivers/net/can/flexcan.c
> > @@ -581,7 +581,7 @@ static void flexcan_irq_bus_err(struct net_device *dev,
> u32 reg_esr)
> >  	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
> >  }
> >
> > -static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> > +static bool flexcan_irq_state(struct net_device *dev, u32 reg_esr)
> >  {
> >  	struct flexcan_priv *priv = netdev_priv(dev);
> >  	struct sk_buff *skb;
> > @@ -607,11 +607,11 @@ static void flexcan_irq_state(struct net_device *dev,
> u32 reg_esr)
> >
> >  	/* state hasn't changed */
> >  	if (likely(new_state == priv->can.state))
> > -		return;
> > +		return false;
> >
> >  	skb = alloc_can_err_skb(dev, &cf);
> >  	if (unlikely(!skb))
> > -		return;
> > +		return false;
> >
> >  	can_change_state(dev, cf, tx_state, rx_state);
> >
> > @@ -619,6 +619,7 @@ static void flexcan_irq_state(struct net_device *dev, u32
> reg_esr)
> >  		can_bus_off(dev);
> >
> >  	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
> > +	return true;
> >  }
> >
> >  static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload
> *offload)
> > @@ -712,7 +713,8 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
> >  	struct flexcan_priv *priv = netdev_priv(dev);
> >  	struct flexcan_regs __iomem *regs = priv->regs;
> >  	irqreturn_t handled = IRQ_NONE;
> > -	u32 reg_iflag1, reg_esr;
> > +	u32 reg_iflag1, reg_esr, reg_ctrl;
> > +	bool state_changed = false;
> >
> >  	reg_iflag1 = flexcan_read(&regs->iflag1);
> >
> > @@ -768,13 +770,46 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
> >  	/* state change interrupt or on broken error state core */
> >  	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
> >  	    (priv->devtype_data->quirks &
> FLEXCAN_QUIRK_BROKEN_ERR_STATE))
> > -		flexcan_irq_state(dev, reg_esr);
> > +		state_changed = flexcan_irq_state(dev, reg_esr);
> >
> >  	/* bus error IRQ - handle if bus error reporting is activated */
> >  	if ((reg_esr & FLEXCAN_ESR_ERR_BUS) &&
> >  	    (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING))
> >  		flexcan_irq_bus_err(dev, reg_esr);
> >
> > +	/* disable error interrupt upon error passive state transition
> > +	 * to suspend bus error reporting (to avoid flooding).
> > +	 *
> > +	 * re-enable error interrupt upon error warning state transition
> > +	 * to resume bus error reporting and allow broken error state core
> > +	 * to report state correctly (again).
> > +	 *
> > +	 * availability of error interrupt among state transitions:
> > +	 *  +--------------------------------------------------------------+
> > +	 *  | +----------------------------------------------+ [stopped /  |
> > +	 *  | |                                              |  sleeping] -+
> > +	 *  +-+-> active <-> warning <-> passive -> bus off -+
> > +	 *        ^^^^^^^^^^^^^^^^^^^^^^^_______________________________
> > +	 *                enabled                      disabled
> > +	 */
> > +	if (state_changed &&
> > +	    (priv->devtype_data->quirks &
> FLEXCAN_QUIRK_BROKEN_ERR_STATE ||
> > +	     priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)) {
> > +		if ((priv->can.state == CAN_STATE_ERROR_PASSIVE) ||
> > +		    (priv->can.state == CAN_STATE_BUS_OFF)) {
> > +			reg_ctrl = flexcan_read(&regs->ctrl);
> > +			reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;
> > +			flexcan_write(reg_ctrl, &regs->ctrl);
> > +		}
> 
> We do not want to disable bus error interrupts for bus error reporting!
I see! If user enabled bus error reporting, that means he want all bus errors,
include flooding. I will fix this.

> 
> > +		if ((priv->can.state == CAN_STATE_ERROR_WARNING) ||
> > +		    (priv->can.state == CAN_STATE_ERROR_ACTIVE)) {
> > +			reg_ctrl = flexcan_read(&regs->ctrl);
> > +			reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
> > +			flexcan_write(reg_ctrl, &regs->ctrl);
> > +		}
> > +	}
> > +
> 
> For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to
> disable
> bus error interrupts as well. To handle that properly, I think we need
I'm not sure why we do not want to disable error interrupt for legacy broken core,
I think if this approach works for both (with/without warning irq) core, then disable
the interrupts helps avoid flooding.

> an additional FLEXCAN_QUIRK_BROKEN_PERR_STATE and maybe rename
> FLEXCAN_QUIRK_BROKEN_ERR_STATE to
> FLEXCAN_QUIRK_BROKEN_WERR_STATE.
That's a good point. With FLEXCAN_QUIRK_BROKEN_WERR_STATE and
FLEXCAN_QUIRK_BROKEN_PERR_STATE, we can precisely tell which core
lacks of which capability.

However, I'm wondering is there a flexcan core which lacking warning irq,
but having passive irq? If this is not true, then we will never specify the
WERR_STATE quirk alone, right?

> 
> >  	return handled;
> >  }
> >
> > --
> > 2.7.4
> 
> 
> > Addresses the interrupt flooding issue as above mentioned, and takes care
> > of the bus-error reporting. For the other quirks, they looks like won't
> > influence the state transition and interrupt flooding. Please comment if
> > something missed.
> >
> > One more thing regarding patch 2 is, it doesn't rely on error warning
> > can be reported by [TR]WRN_INT, because this state can also be reported
> > upon the _any_ interrupt. Only bus-off state needs to have dedicated
> > interrupt line (is this true for all flexcan core?)
> 
> I don't think so. How should it work if no cable is connected? Anyway,
> the patch seems not to be to lenghty.
I disabled the [TR]WRN_INT on i.MX6 by change
(http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L88)
to:
#define FLEXCAN_CTRL_ERR_STATE FLEXCAN_CTRL_BOFF_MSK
and tested with the patch, seems the state transitions still works fine.

I think the reason is the same as how the error interrupt helps report error passive:
At most 12 times consecutive bus error interrupts will change to error warning, and
minimal 1 time Tx/Rx interrupt will change state from error passive to error warning.

If we change to:
#define FLEXCAN_CTRL_ERR_STATE 0
then we cannot enter bus off state any more as expected (other transitions still ok).

As we only have i.MX6 at hand, so we cannot verify this on other flexcan core.
If we want to limit the patch to only take effect for i.MX6, then we should introduce
the FLEXCAN_QUIRK_BROKEN_P(W)ERR_STATE, thus the legacy core will not be
affected.

Otherwise, the legacy FLEXCAN_QUIRK_BORKEN_ERR_STATE can be extended
to the core with broken warning and/or passive irq.

What do you think?

I'll prepare patches for review later:
1. fix regression bug
2. suppress interrupt flooding for broken core
3. enable quirk for i.mx6

Shall I send separate e-mail for each patch? Due to our mail client is outlook, I'm
not sure whether the inline patch can work smoothly, if not, shall I attach the patch?

Thanks!

Best regards
Yi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-30  4:22                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-30  6:25                       ` Wolfgang Grandegger
  2017-08-30  6:50                         ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-30  6:25 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)



Am 30.08.2017 um 06:22 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hello Wolfgang,
> 
>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>> Sent: Tuesday, August 29, 2017 7:17 PM
>>
>> Hello,
>>
>> now looking to your patches...
>>
>> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>> Hello Wolfgang, Hello Andri,
>> [...]
>>
>>> Attached please find the patch for review.
>>>
>>> Patch 1 (0001-can-flexcan-mitigate-incomplete-state-interrupt.patch)
>>> Fixes the state transition issue, but interrupt flooding may happen.
>>
>>>  From 3025a3f6f27f6df99d1db60d3ef820efe629fd20 Mon Sep 17 00:00:00 2001
>>> From: Zhu Yi <yi.zhu5@cn.bosch.com>
>>> Date: Mon, 28 Aug 2017 17:29:23 +0800
>>> Subject: [PATCH 1/2] can: flexcan: mitigate i.MX6 incomplete state irq
>>>
>>> The i.MX6 flexcan core will not generate state interrupt for some state
>>> transitions. This patch enables the bus error interrupt on i.MX6 by
>>> enabling FLEXCAN_QUIRK_BROKEN_ERR_STATE and changing the call of
>>> flexcan_irq_state() upon _any_ interrupt to mitigate this issue.
>>>
>>> Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
>>> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
>>> ---
>>>   drivers/net/can/flexcan.c | 7 ++++---
>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
>>> index 13f0f21..75bb9f6 100644
>>> --- a/drivers/net/can/flexcan.c
>>> +++ b/drivers/net/can/flexcan.c
>>> @@ -288,7 +288,7 @@ static const struct flexcan_devtype_data
>> fsl_imx28_devtype_data;
>>>
>>>   static const struct flexcan_devtype_data fsl_imx6q_devtype_data = {
>>>   	.quirks = FLEXCAN_QUIRK_DISABLE_RXFG |
>> FLEXCAN_QUIRK_ENABLE_EACEN_RRS |
>>> -		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP,
>>> +		FLEXCAN_QUIRK_USE_OFF_TIMESTAMP |
>> FLEXCAN_QUIRK_BROKEN_ERR_STATE,
>>>   };
>>
>> This patch should just fix the regression bug... without using
>> FLEXCAN_QUIRK_BROKEN_ERR_STATE for the imx6q.
> OK. I'll create a separate patch to enable the quirk for imx6q.
> 
>>
>>>   static const struct flexcan_devtype_data fsl_vf610_devtype_data = {
>>> @@ -765,8 +765,9 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>>>   		flexcan_write(reg_esr & FLEXCAN_ESR_ALL_INT, &regs->esr);
>>>   	}
>>>
>>> -	/* state change interrupt */
>>> -	if (reg_esr & FLEXCAN_ESR_ERR_STATE)
>>> +	/* state change interrupt or on broken error state core */
>>> +	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
>>> +	    (priv->devtype_data->quirks &
>> FLEXCAN_QUIRK_BROKEN_ERR_STATE))
>>>   		flexcan_irq_state(dev, reg_esr);
>>>
>>>   	/* bus error IRQ - handle if bus error reporting is activated */
>>> --
>>> 2.7.4
>>
>> Looks good!
>>
>>> Patch 2 (0002-can-flexcan-mitigate-error-interrupt-flooding.patch)
>>
>>>  From 5ca061fcb0cc8576caafdd9a0f398adf81c18ecc Mon Sep 17 00:00:00 2001
>>> From: Zhu Yi <yi.zhu5@cn.bosch.com>
>>> Date: Tue, 29 Aug 2017 12:19:04 +0800
>>> Subject: [PATCH 2/2] can: flexcan: mitigate error interrupt flooding
>>>
>>> Error interrupt flooding may happen if berr-reporting is enabled, or
>>> the broken error state quirk fix is enabled. For example, in case
>>> there is no cable connected, error interrupt flooding is very likely
>>> because the node cannot go to bus off.
>>>
>>> This patch mitigates this issue by:
>>> 1. disable error interrupt upon error passive state transition
>>> 2. re-enable error interrupt upon error warning state transition
>>>
>>> In this way, the driver is still able to report correct state
>>> transitions without additional latency. And flooding of error interrupts
>>> is limited to the number of frames required to change the state to error
>>> passive when there are bus problems.
>>>
>>> Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
>>> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
>>> ---
>>>   drivers/net/can/flexcan.c | 45
>> ++++++++++++++++++++++++++++++++++++++++-----
>>>   1 file changed, 40 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
>>> index 75bb9f6..9d0e97e 100644
>>> --- a/drivers/net/can/flexcan.c
>>> +++ b/drivers/net/can/flexcan.c
>>> @@ -581,7 +581,7 @@ static void flexcan_irq_bus_err(struct net_device *dev,
>> u32 reg_esr)
>>>   	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
>>>   }
>>>
>>> -static void flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>>> +static bool flexcan_irq_state(struct net_device *dev, u32 reg_esr)
>>>   {
>>>   	struct flexcan_priv *priv = netdev_priv(dev);
>>>   	struct sk_buff *skb;
>>> @@ -607,11 +607,11 @@ static void flexcan_irq_state(struct net_device *dev,
>> u32 reg_esr)
>>>
>>>   	/* state hasn't changed */
>>>   	if (likely(new_state == priv->can.state))
>>> -		return;
>>> +		return false;
>>>
>>>   	skb = alloc_can_err_skb(dev, &cf);
>>>   	if (unlikely(!skb))
>>> -		return;
>>> +		return false;
>>>
>>>   	can_change_state(dev, cf, tx_state, rx_state);
>>>
>>> @@ -619,6 +619,7 @@ static void flexcan_irq_state(struct net_device *dev, u32
>> reg_esr)
>>>   		can_bus_off(dev);
>>>
>>>   	can_rx_offload_irq_queue_err_skb(&priv->offload, skb);
>>> +	return true;
>>>   }
>>>
>>>   static inline struct flexcan_priv *rx_offload_to_priv(struct can_rx_offload
>> *offload)
>>> @@ -712,7 +713,8 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>>>   	struct flexcan_priv *priv = netdev_priv(dev);
>>>   	struct flexcan_regs __iomem *regs = priv->regs;
>>>   	irqreturn_t handled = IRQ_NONE;
>>> -	u32 reg_iflag1, reg_esr;
>>> +	u32 reg_iflag1, reg_esr, reg_ctrl;
>>> +	bool state_changed = false;
>>>
>>>   	reg_iflag1 = flexcan_read(&regs->iflag1);
>>>
>>> @@ -768,13 +770,46 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>>>   	/* state change interrupt or on broken error state core */
>>>   	if ((reg_esr & FLEXCAN_ESR_ERR_STATE) ||
>>>   	    (priv->devtype_data->quirks &
>> FLEXCAN_QUIRK_BROKEN_ERR_STATE))
>>> -		flexcan_irq_state(dev, reg_esr);
>>> +		state_changed = flexcan_irq_state(dev, reg_esr);
>>>
>>>   	/* bus error IRQ - handle if bus error reporting is activated */
>>>   	if ((reg_esr & FLEXCAN_ESR_ERR_BUS) &&
>>>   	    (priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING))
>>>   		flexcan_irq_bus_err(dev, reg_esr);
>>>
>>> +	/* disable error interrupt upon error passive state transition
>>> +	 * to suspend bus error reporting (to avoid flooding).
>>> +	 *
>>> +	 * re-enable error interrupt upon error warning state transition
>>> +	 * to resume bus error reporting and allow broken error state core
>>> +	 * to report state correctly (again).
>>> +	 *
>>> +	 * availability of error interrupt among state transitions:
>>> +	 *  +--------------------------------------------------------------+
>>> +	 *  | +----------------------------------------------+ [stopped /  |
>>> +	 *  | |                                              |  sleeping] -+
>>> +	 *  +-+-> active <-> warning <-> passive -> bus off -+
>>> +	 *        ^^^^^^^^^^^^^^^^^^^^^^^_______________________________
>>> +	 *                enabled                      disabled
>>> +	 */
>>> +	if (state_changed &&
>>> +	    (priv->devtype_data->quirks &
>> FLEXCAN_QUIRK_BROKEN_ERR_STATE ||
>>> +	     priv->can.ctrlmode & CAN_CTRLMODE_BERR_REPORTING)) {
>>> +		if ((priv->can.state == CAN_STATE_ERROR_PASSIVE) ||
>>> +		    (priv->can.state == CAN_STATE_BUS_OFF)) {
>>> +			reg_ctrl = flexcan_read(&regs->ctrl);
>>> +			reg_ctrl &= ~FLEXCAN_CTRL_ERR_MSK;
>>> +			flexcan_write(reg_ctrl, &regs->ctrl);
>>> +		}
>>
>> We do not want to disable bus error interrupts for bus error reporting!
> I see! If user enabled bus error reporting, that means he want all bus errors,
> include flooding. I will fix this.

Yep, serious flooding usually happens only on low end systems and at 
high bitrates.

>>
>>> +		if ((priv->can.state == CAN_STATE_ERROR_WARNING) ||
>>> +		    (priv->can.state == CAN_STATE_ERROR_ACTIVE)) {
>>> +			reg_ctrl = flexcan_read(&regs->ctrl);
>>> +			reg_ctrl |= FLEXCAN_CTRL_ERR_MSK;
>>> +			flexcan_write(reg_ctrl, &regs->ctrl);
>>> +		}
>>> +	}
>>> +
>>
>> For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to
>> disable
>> bus error interrupts as well. To handle that properly, I think we need
> I'm not sure why we do not want to disable error interrupt for legacy broken core,
> I think if this approach works for both (with/without warning irq) core, then disable
> the interrupts helps avoid flooding.

How does it work if no cable is connected?  On a legacy device (with 
broken [TR]WRN_INT), no interrupt will arrive other than the bus error 
interrupt. Maybe I have missed something. Unfortunately, I do not have 
any Flexcan device at hand to check myself.

>> an additional FLEXCAN_QUIRK_BROKEN_PERR_STATE and maybe rename
>> FLEXCAN_QUIRK_BROKEN_ERR_STATE to
>> FLEXCAN_QUIRK_BROKEN_WERR_STATE.
> That's a good point. With FLEXCAN_QUIRK_BROKEN_WERR_STATE and
> FLEXCAN_QUIRK_BROKEN_PERR_STATE, we can precisely tell which core
> lacks of which capability.

That's the idea.

> However, I'm wondering is there a flexcan core which lacking warning irq,

Good question? Don't know if the VF610 core does need that quirk as well.

> but having passive irq? If this is not true, then we will never specify the
> WERR_STATE quirk alone, right?

Yep!
> 
>>
>>>   	return handled;
>>>   }
>>>
>>> --
>>> 2.7.4
>>
>>
>>> Addresses the interrupt flooding issue as above mentioned, and takes care
>>> of the bus-error reporting. For the other quirks, they looks like won't
>>> influence the state transition and interrupt flooding. Please comment if
>>> something missed.
>>>
>>> One more thing regarding patch 2 is, it doesn't rely on error warning
>>> can be reported by [TR]WRN_INT, because this state can also be reported
>>> upon the _any_ interrupt. Only bus-off state needs to have dedicated
>>> interrupt line (is this true for all flexcan core?)
>>
>> I don't think so. How should it work if no cable is connected? Anyway,
>> the patch seems not to be to lenghty.
> I disabled the [TR]WRN_INT on i.MX6 by change
> (http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L88)
> to:
> #define FLEXCAN_CTRL_ERR_STATE FLEXCAN_CTRL_BOFF_MSK
> and tested with the patch, seems the state transitions still works fine.

With no cable connected?

> I think the reason is the same as how the error interrupt helps report error passive:
> At most 12 times consecutive bus error interrupts will change to error warning, and
> minimal 1 time Tx/Rx interrupt will change state from error passive to error warning.
> 
> If we change to:
> #define FLEXCAN_CTRL_ERR_STATE 0
> then we cannot enter bus off state any more as expected (other transitions still ok).
> 
> As we only have i.MX6 at hand, so we cannot verify this on other flexcan core.
> If we want to limit the patch to only take effect for i.MX6, then we should introduce
> the FLEXCAN_QUIRK_BROKEN_P(W)ERR_STATE, thus the legacy core will not be
> affected.

As you/I cannot test on legacy and other cores, I would go for a 
separate quirk for the mx6q.

> Otherwise, the legacy FLEXCAN_QUIRK_BORKEN_ERR_STATE can be extended
> to the core with broken warning and/or passive irq.
> 
> What do you think?
> 
> I'll prepare patches for review later:
> 1. fix regression bug
> 2. suppress interrupt flooding for broken core
> 3. enable quirk for i.mx6
> 
> Shall I send separate e-mail for each patch? Due to our mail client is outlook, I'm
> not sure whether the inline patch can work smoothly, if not, shall I attach the patch?

Separate patches please, especially for the first one.

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-30  6:25                       ` Wolfgang Grandegger
@ 2017-08-30  6:50                         ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-30  7:15                           ` Wolfgang Grandegger
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-30  6:50 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Wednesday, August 30, 2017 2:26 PM
> 
> Am 30.08.2017 um 06:22 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> >
> >> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> >> Sent: Tuesday, August 29, 2017 7:17 PM
> >>
> >> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> >> [...]
> >>
> >> For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to
> >> disable
> >> bus error interrupts as well. To handle that properly, I think we need
> > I'm not sure why we do not want to disable error interrupt for legacy broken core,
> > I think if this approach works for both (with/without warning irq) core, then
> > disable the interrupts helps avoid flooding.
> 
> How does it work if no cable is connected?  On a legacy device (with
> broken [TR]WRN_INT), no interrupt will arrive other than the bus error
> interrupt. Maybe I have missed something. Unfortunately, I do not have
> any Flexcan device at hand to check myself.
The bus error interrupt means error counter changed, and we can derive correct
state from that. Ah, the node needs to try to send something, otherwise,
the state will not change, but this is valid for all can controllers.

>
> >>> One more thing regarding patch 2 is, it doesn't rely on error warning
> >>> can be reported by [TR]WRN_INT, because this state can also be reported
> >>> upon the _any_ interrupt. Only bus-off state needs to have dedicated
> >>> interrupt line (is this true for all flexcan core?)
> >>
> >> I don't think so. How should it work if no cable is connected? Anyway,
> >> the patch seems not to be to lenghty.
> > I disabled the [TR]WRN_INT on i.MX6 by change
> > (http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L88)
> > to:
> > #define FLEXCAN_CTRL_ERR_STATE FLEXCAN_CTRL_BOFF_MSK
> > and tested with the patch, seems the state transitions still works fine.
> 
> With no cable connected?
Yes, 
1. Unplug the cable will change to error passive
2. plug back the cable will eventually change to error active
3. unplug the cable while the bus is recovering to error warning / error active,
it will report error passive again
4. repeat plug/unplug the cable and watched the state change as expected

>
> > As we only have i.MX6 at hand, so we cannot verify this on other flexcan core.
> > If we want to limit the patch to only take effect for i.MX6, then we should
> introduce
> > the FLEXCAN_QUIRK_BROKEN_P(W)ERR_STATE, thus the legacy core will not
> be
> > affected.
> 
> As you/I cannot test on legacy and other cores, I would go for a
> separate quirk for the mx6q.
OK.

> 
> > Shall I send separate e-mail for each patch? Due to our mail client is outlook, I'm
> > not sure whether the inline patch can work smoothly, if not, shall I attach the
> patch?
> 
> Separate patches please, especially for the first one.
OK.

Best regards
Yi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-30  6:50                         ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-30  7:15                           ` Wolfgang Grandegger
  2017-08-30  9:05                             ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-30  7:15 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello ZHU Yi,

Am 30.08.2017 um 08:50 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hello Wolfgang,
> 
>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>> Sent: Wednesday, August 30, 2017 2:26 PM
>>
>> Am 30.08.2017 um 06:22 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>>
>>>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>>>> Sent: Tuesday, August 29, 2017 7:17 PM
>>>>
>>>> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>>> [...]
>>>>
>>>> For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to
>>>> disable
>>>> bus error interrupts as well. To handle that properly, I think we need
>>> I'm not sure why we do not want to disable error interrupt for legacy broken core,
>>> I think if this approach works for both (with/without warning irq) core, then
>>> disable the interrupts helps avoid flooding.
>>
>> How does it work if no cable is connected?  On a legacy device (with
>> broken [TR]WRN_INT), no interrupt will arrive other than the bus error
>> interrupt. Maybe I have missed something. Unfortunately, I do not have
>> any Flexcan device at hand to check myself.
> The bus error interrupt means error counter changed, and we can derive correct
> state from that. Ah, the node needs to try to send something, otherwise,
> the state will not change, but this is valid for all can controllers.

Of course, we can always read the error counter on demand. But no 
interrupt will be triggered after the send. How should the state change 
then be realized?

>>>>> One more thing regarding patch 2 is, it doesn't rely on error warning
>>>>> can be reported by [TR]WRN_INT, because this state can also be reported
>>>>> upon the _any_ interrupt. Only bus-off state needs to have dedicated
>>>>> interrupt line (is this true for all flexcan core?)
>>>>
>>>> I don't think so. How should it work if no cable is connected? Anyway,
>>>> the patch seems not to be to lenghty.
>>> I disabled the [TR]WRN_INT on i.MX6 by change
>>> (http://elixir.free-electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L88)
>>> to:
>>> #define FLEXCAN_CTRL_ERR_STATE FLEXCAN_CTRL_BOFF_MSK
>>> and tested with the patch, seems the state transitions still works fine.
>>
>> With no cable connected?
> Yes,
> 1. Unplug the cable will change to error passive

No cable is connected. Then do an "ifconfig canX up". Then send  a 
message. The state should change from "active->warning->passive", but 
that's only realized by the driver if an interrupt is received. On 
legacy cores, the bus error interrupt does the trigger. Can you show the 
output of "dmesg| grep flexcan" with "CONFIG_CAN_DEBUG_DEVICES=y" for 
that scenario, or even better the output of:

   # candump -e -td any,0:0,#FFFFFFFF

> 2. plug back the cable will eventually change to error active
> 3. unplug the cable while the bus is recovering to error warning / error active,
> it will report error passive again
> 4. repeat plug/unplug the cable and watched the state change as expected

It may report something, but only if the TX done interrupt occurs.

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-30  7:15                           ` Wolfgang Grandegger
@ 2017-08-30  9:05                             ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-30 10:59                               ` Wolfgang Grandegger
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-30  9:05 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Wednesday, August 30, 2017 3:16 PM
> 
> Hello ZHU Yi,
> 
> Am 30.08.2017 um 08:50 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> > Hello Wolfgang,
> >
> >> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> >> Sent: Wednesday, August 30, 2017 2:26 PM
> >>
> >> Am 30.08.2017 um 06:22 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> >>>
> >>>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> >>>> Sent: Tuesday, August 29, 2017 7:17 PM
> >>>>
> >>>> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> >>>> [...]
> >>>>
> >>>> For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to
> >>>> disable
> >>>> bus error interrupts as well. To handle that properly, I think we need
> >>> I'm not sure why we do not want to disable error interrupt for legacy broken
> core,
> >>> I think if this approach works for both (with/without warning irq) core, then
> >>> disable the interrupts helps avoid flooding.
> >>
> >> How does it work if no cable is connected?  On a legacy device (with
> >> broken [TR]WRN_INT), no interrupt will arrive other than the bus error
> >> interrupt. Maybe I have missed something. Unfortunately, I do not have
> >> any Flexcan device at hand to check myself.
> > The bus error interrupt means error counter changed, and we can derive correct
> > state from that. Ah, the node needs to try to send something, otherwise,
> > the state will not change, but this is valid for all can controllers.
> 
> Of course, we can always read the error counter on demand. But no
> interrupt will be triggered after the send. How should the state change
> then be realized?
The error interrupt will always be triggered after the send, because the
auto-retransmission will keep getting Ack error if the cable is not connected,
that's where the interrupt flooding come from (and the flooding won't stop
until the cable is connected again).

After the cable is connected, and there're error conditions, e.g., bus short,
the node would eventually go to bus off, in this case, the flooding will stop
when it enter bus off state.

So if auto-retransmission is enabled, then the error interrupt will be generated,
and we can use this to realize the state change. 

As we understood so far, flexcan core does not have register to disable
auto-retransmission. Is this understanding correct?

> 
> >>>>> One more thing regarding patch 2 is, it doesn't rely on error warning
> >>>>> can be reported by [TR]WRN_INT, because this state can also be reported
> >>>>> upon the _any_ interrupt. Only bus-off state needs to have dedicated
> >>>>> interrupt line (is this true for all flexcan core?)
> >>>>
> >>>> I don't think so. How should it work if no cable is connected? Anyway,
> >>>> the patch seems not to be to lenghty.
> >>> I disabled the [TR]WRN_INT on i.MX6 by change
> >>> (http://elixir.free-
> electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L88)
> >>> to:
> >>> #define FLEXCAN_CTRL_ERR_STATE FLEXCAN_CTRL_BOFF_MSK
> >>> and tested with the patch, seems the state transitions still works fine.
> >>
> >> With no cable connected?
> > Yes,
> > 1. Unplug the cable will change to error passive
> 
> No cable is connected. Then do an "ifconfig canX up". Then send  a
> message. The state should change from "active->warning->passive", but
> that's only realized by the driver if an interrupt is received. On
> legacy cores, the bus error interrupt does the trigger. Can you show the
> output of "dmesg| grep flexcan" with "CONFIG_CAN_DEBUG_DEVICES=y" for
> that scenario, or even better the output of:
> 
>    # candump -e -td any,0:0,#FFFFFFFF
> 
> > 2. plug back the cable will eventually change to error active
> > 3. unplug the cable while the bus is recovering to error warning / error active,
> > it will report error passive again
> > 4. repeat plug/unplug the cable and watched the state change as expected
> 
> It may report something, but only if the TX done interrupt occurs.

**Test with [TR]WRN_INT enabled:**
# dmesg | grep flexcan
[    1.560695] flexcan 2090000.flexcan: 2090000.flexcan supply xceiver not found, using dummy regulator
[    1.570075] flexcan 2090000.flexcan: can_rx_offload_init_queue: skb_queue_len_max=512
[    1.570660] flexcan 2090000.flexcan: device registered (reg_base=e0a78000, irq=34)
[    6.126129] flexcan 2090000.flexcan mob0: renamed from can0
[  103.057132] flexcan 2090000.flexcan mob0: writing ctrl=0x013a2007
[  103.057161] flexcan 2090000.flexcan mob0: flexcan_set_bittiming: mcr=0x5980000f ctrl=0x013a2007
[  103.057178] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing mcr=0x59a3023f
[  103.057193] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing ctrl=0x013a2057
[  103.057291] flexcan 2090000.flexcan mob0: flexcan_chip_start: reading mcr=0x40a3023f ctrl=0x013aec57
[  163.925178] flexcan 2090000.flexcan mob0: New error state: 1
[  163.926220] flexcan 2090000.flexcan mob0: New error state: 2

# candump -e -td any,0:0,#FFFFFFFF
 (000.000000)  mob0  20000004   [8]  00 08 00 00 00 00 00 00   ERRORFRAME
        controller-problem{tx-error-warning}
 (000.001048)  mob0  20000004   [8]  00 20 00 00 00 00 00 00   ERRORFRAME
        controller-problem{tx-error-passive}

**Test with [TR]WRN_INT disabled:**
# dmesg | grep flexcan
[    1.561154] flexcan 2090000.flexcan: 2090000.flexcan supply xceiver not found, using dummy regulator
[    1.570532] flexcan 2090000.flexcan: can_rx_offload_init_queue: skb_queue_len_max=512
[    1.571125] flexcan 2090000.flexcan: device registered (reg_base=e0a78000, irq=34)
[    6.115142] flexcan 2090000.flexcan mob0: renamed from can0
[   35.931121] flexcan 2090000.flexcan mob0: writing ctrl=0x013a2007
[   35.931149] flexcan 2090000.flexcan mob0: flexcan_set_bittiming: mcr=0x5980000f ctrl=0x013a2007
[   35.931166] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing mcr=0x59a3023f
[   35.931182] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing ctrl=0x013a2057
[   35.931279] flexcan 2090000.flexcan mob0: flexcan_chip_start: reading mcr=0x40a3023f ctrl=0x013ae057
[   64.660324] flexcan 2090000.flexcan mob0: New error state: 1
[   64.661362] flexcan 2090000.flexcan mob0: New error state: 2

# candump -e -td any,0:0,#FFFFFFFF
 (000.000000)  mob0  20000004   [8]  00 08 00 00 00 00 00 00   ERRORFRAME
        controller-problem{tx-error-warning}
 (000.001050)  mob0  20000004   [8]  00 20 00 00 00 00 00 00   ERRORFRAME
        controller-problem{tx-error-passive}

Best regards
Yi

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-30  9:05                             ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-30 10:59                               ` Wolfgang Grandegger
  2017-08-31  8:33                                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-30 10:59 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello ZHU Yi,

Am 30.08.2017 um 11:05 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hello Wolfgang,
> 
>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>> Sent: Wednesday, August 30, 2017 3:16 PM
>>
>> Hello ZHU Yi,
>>
>> Am 30.08.2017 um 08:50 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>> Hello Wolfgang,
>>>
>>>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>>>> Sent: Wednesday, August 30, 2017 2:26 PM
>>>>
>>>> Am 30.08.2017 um 06:22 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>>>>
>>>>>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>>>>>> Sent: Tuesday, August 29, 2017 7:17 PM
>>>>>>
>>>>>> Am 29.08.2017 um 10:49 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
>>>>>> [...]
>>>>>>
>>>>>> For the legacy FLEXCAN_QUIRK_BROKEN_ERR_STATE we do not want to
>>>>>> disable
>>>>>> bus error interrupts as well. To handle that properly, I think we need
>>>>> I'm not sure why we do not want to disable error interrupt for legacy broken
>> core,
>>>>> I think if this approach works for both (with/without warning irq) core, then
>>>>> disable the interrupts helps avoid flooding.
>>>>
>>>> How does it work if no cable is connected?  On a legacy device (with
>>>> broken [TR]WRN_INT), no interrupt will arrive other than the bus error
>>>> interrupt. Maybe I have missed something. Unfortunately, I do not have
>>>> any Flexcan device at hand to check myself.
>>> The bus error interrupt means error counter changed, and we can derive correct
>>> state from that. Ah, the node needs to try to send something, otherwise,
>>> the state will not change, but this is valid for all can controllers.
>>
>> Of course, we can always read the error counter on demand. But no
>> interrupt will be triggered after the send. How should the state change
>> then be realized?
> The error interrupt will always be triggered after the send, because the
> auto-retransmission will keep getting Ack error if the cable is not connected,
> that's where the interrupt flooding come from (and the flooding won't stop
> until the cable is connected again).

You mean the error interrupt signalling the ACK error occurs even with
FLEXCAN_CTRL_ERR_MSK *not* set in "ctrl"?
Or is it set from the beginning?

> 
> After the cable is connected, and there're error conditions, e.g., bus short,
> the node would eventually go to bus off, in this case, the flooding will stop
> when it enter bus off state.
> 
> So if auto-retransmission is enabled, then the error interrupt will be generated,
> and we can use this to realize the state change.
> 
> As we understood so far, flexcan core does not have register to disable
> auto-retransmission. Is this understanding correct?

That's what I try to understand as well.

> 
>>
>>>>>>> One more thing regarding patch 2 is, it doesn't rely on error warning
>>>>>>> can be reported by [TR]WRN_INT, because this state can also be reported
>>>>>>> upon the _any_ interrupt. Only bus-off state needs to have dedicated
>>>>>>> interrupt line (is this true for all flexcan core?)
>>>>>>
>>>>>> I don't think so. How should it work if no cable is connected? Anyway,
>>>>>> the patch seems not to be to lenghty.
>>>>> I disabled the [TR]WRN_INT on i.MX6 by change
>>>>> (http://elixir.free-
>> electrons.com/linux/latest/source/drivers/net/can/flexcan.c#L88)
>>>>> to:
>>>>> #define FLEXCAN_CTRL_ERR_STATE FLEXCAN_CTRL_BOFF_MSK
>>>>> and tested with the patch, seems the state transitions still works fine.
>>>>
>>>> With no cable connected?
>>> Yes,
>>> 1. Unplug the cable will change to error passive
>>
>> No cable is connected. Then do an "ifconfig canX up". Then send  a
>> message. The state should change from "active->warning->passive", but
>> that's only realized by the driver if an interrupt is received. On
>> legacy cores, the bus error interrupt does the trigger. Can you show the
>> output of "dmesg| grep flexcan" with "CONFIG_CAN_DEBUG_DEVICES=y" for
>> that scenario, or even better the output of:
>>
>>     # candump -e -td any,0:0,#FFFFFFFF
>>
>>> 2. plug back the cable will eventually change to error active
>>> 3. unplug the cable while the bus is recovering to error warning / error active,
>>> it will report error passive again
>>> 4. repeat plug/unplug the cable and watched the state change as expected
>>
>> It may report something, but only if the TX done interrupt occurs.
> 
> **Test with [TR]WRN_INT enabled:**
> # dmesg | grep flexcan
> [    1.560695] flexcan 2090000.flexcan: 2090000.flexcan supply xceiver not found, using dummy regulator
> [    1.570075] flexcan 2090000.flexcan: can_rx_offload_init_queue: skb_queue_len_max=512
> [    1.570660] flexcan 2090000.flexcan: device registered (reg_base=e0a78000, irq=34)
> [    6.126129] flexcan 2090000.flexcan mob0: renamed from can0
> [  103.057132] flexcan 2090000.flexcan mob0: writing ctrl=0x013a2007
> [  103.057161] flexcan 2090000.flexcan mob0: flexcan_set_bittiming: mcr=0x5980000f ctrl=0x013a2007
> [  103.057178] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing mcr=0x59a3023f
> [  103.057193] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing ctrl=0x013a2057
> [  103.057291] flexcan 2090000.flexcan mob0: flexcan_chip_start: reading mcr=0x40a3023f ctrl=0x013aec57

Here FLEXCAN_CTRL_ERR_MSK (0x4000) is set in "ctrl"!

Could you please add to flexcan_irq():

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 13f0f21..0df0c34 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -758,6 +758,7 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
        }
 
        reg_esr = flexcan_read(&regs->esr);
+       netdev_dbg(dev, "reg_iflag1=%#x reg_esr=%#x\n", reg_iflag1, regs->esr);
 
        /* ACK all bus error and state change IRQ sources */
        if (reg_esr & FLEXCAN_ESR_ALL_INT) {
	reg_esr = flexcan_read(&regs->esr);
	
This would make cleaner what is going on.

Thanks,

Wolfgang.

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-30 10:59                               ` Wolfgang Grandegger
@ 2017-08-31  8:33                                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
  2017-08-31  9:53                                   ` Wolfgang Grandegger
  0 siblings, 1 reply; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-08-31  8:33 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Wednesday, August 30, 2017 6:59 PM
> [...]
>
> > The error interrupt will always be triggered after the send, because the
> > auto-retransmission will keep getting Ack error if the cable is not connected,
> > that's where the interrupt flooding come from (and the flooding won't stop
> > until the cable is connected again).
> 
> You mean the error interrupt signalling the ACK error occurs even with
> FLEXCAN_CTRL_ERR_MSK *not* set in "ctrl"?
> Or is it set from the beginning?
Sorry for the incomplete description, I forgot to mention it is assumed
that the broken error state quirk fix is enabled for the core, hence
the error interrupt is enabled from the beginning.

Although only p1010 enabled the quirk so far, but to our understanding,
the rest core seems also need the quirk enabled to report correct state
transitions, e.g., i.MX6.

However, as neither of us can test the legacy and other cores, so we
cannot touch them without testing. Perhaps someone who has the access
to these cores can give a hand to verify that in future.

>
> [...]
> Here FLEXCAN_CTRL_ERR_MSK (0x4000) is set in "ctrl"!
> 
> Could you please add to flexcan_irq():
> 
> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
> index 13f0f21..0df0c34 100644
> --- a/drivers/net/can/flexcan.c
> +++ b/drivers/net/can/flexcan.c
> @@ -758,6 +758,7 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>         }
> 
>         reg_esr = flexcan_read(&regs->esr);
> +       netdev_dbg(dev, "reg_iflag1=%#x reg_esr=%#x\n", reg_iflag1, regs->esr);
> 
>         /* ACK all bus error and state change IRQ sources */
>         if (reg_esr & FLEXCAN_ESR_ALL_INT) {
> 	reg_esr = flexcan_read(&regs->esr);
> 
> This would make cleaner what is going on.
Sure. Below test is based on kernel v4.11 with the proposed patches.
To demonstrate the theory of the legacy cores should also work, the
"[TR]WRN_INT" are disabled.
(clear WRN_EN in MCR and [TR]WRN_MSK in CTRL1)

Test case:
1. Disconnect the cable
2. "ip link set xxx type can bitrate 500000" and "ip link set xxx up"
3. "cansend xxx 123#ffffffffffffffff"

# dmesg | grep flexcan
[    1.561243] flexcan 2090000.flexcan: 2090000.flexcan supply xceiver not found, using dummy regulator
[    1.570621] flexcan 2090000.flexcan: can_rx_offload_init_queue: skb_queue_len_max=512
[    1.571207] flexcan 2090000.flexcan: device registered (reg_base=e0a78000, irq=34)
[    6.227746] flexcan 2090000.flexcan mob0: renamed from can0
[   34.400669] flexcan 2090000.flexcan mob0: writing ctrl=0x013a2007
[   34.400698] flexcan 2090000.flexcan mob0: flexcan_set_bittiming: mcr=0x5980000f ctrl=0x013a2007
[   34.400715] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing mcr=0x5983023f
[   34.400731] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing ctrl=0x013a2057
[   34.400829] flexcan 2090000.flexcan mob0: flexcan_chip_start: reading mcr=0x4083023f ctrl=0x013ae057
[   44.112307] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.112547] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.112810] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.113074] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.113331] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.113603] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.113851] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.114124] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.114367] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.114631] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.114884] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
[   44.115142] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
[   44.115162] flexcan 2090000.flexcan mob0: New error state: 1
[   44.115404] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
[   44.115661] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
[   44.115921] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
[   44.116180] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42252
[   44.116199] flexcan 2090000.flexcan mob0: New error state: 2

Best regards
Yi


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: flexcan missing error state transitions
  2017-08-31  8:33                                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
@ 2017-08-31  9:53                                   ` Wolfgang Grandegger
  2017-09-01  8:24                                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 1 reply; 28+ messages in thread
From: Wolfgang Grandegger @ 2017-08-31  9:53 UTC (permalink / raw)
  To: ZHU Yi (ST-FIR/ENG1-Zhu), Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello ZHU Yi,

Am 31.08.2017 um 10:33 schrieb ZHU Yi (ST-FIR/ENG1-Zhu):
> Hello Wolfgang,
> 
>> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
>> Sent: Wednesday, August 30, 2017 6:59 PM
>> [...]
>>
>>> The error interrupt will always be triggered after the send, because the
>>> auto-retransmission will keep getting Ack error if the cable is not connected,
>>> that's where the interrupt flooding come from (and the flooding won't stop
>>> until the cable is connected again).
>>
>> You mean the error interrupt signalling the ACK error occurs even with
>> FLEXCAN_CTRL_ERR_MSK *not* set in "ctrl"?
>> Or is it set from the beginning?
> Sorry for the incomplete description, I forgot to mention it is assumed
> that the broken error state quirk fix is enabled for the core, hence
> the error interrupt is enabled from the beginning.

OK.

> Although only p1010 enabled the quirk so far, but to our understanding,
> the rest core seems also need the quirk enabled to report correct state
> transitions, e.g., i.MX6.

I will have a i.MX53 and i.MX28 board soon on my table to check error 
passive handling. Would be nice to reduce interrupt flooding on these as 
well.

> However, as neither of us can test the legacy and other cores, so we
> cannot touch them without testing. Perhaps someone who has the access
> to these cores can give a hand to verify that in future.

Other tester are welcome!

>>
>> [...]
>> Here FLEXCAN_CTRL_ERR_MSK (0x4000) is set in "ctrl"!
>>
>> Could you please add to flexcan_irq():
>>
>> diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
>> index 13f0f21..0df0c34 100644
>> --- a/drivers/net/can/flexcan.c
>> +++ b/drivers/net/can/flexcan.c
>> @@ -758,6 +758,7 @@ static irqreturn_t flexcan_irq(int irq, void *dev_id)
>>          }
>>
>>          reg_esr = flexcan_read(&regs->esr);
>> +       netdev_dbg(dev, "reg_iflag1=%#x reg_esr=%#x\n", reg_iflag1, regs->esr);
>>
>>          /* ACK all bus error and state change IRQ sources */
>>          if (reg_esr & FLEXCAN_ESR_ALL_INT) {
>> 	reg_esr = flexcan_read(&regs->esr);
>>
>> This would make cleaner what is going on.
> Sure. Below test is based on kernel v4.11 with the proposed patches.
> To demonstrate the theory of the legacy cores should also work, the
> "[TR]WRN_INT" are disabled.
> (clear WRN_EN in MCR and [TR]WRN_MSK in CTRL1)

OK, then I understand how the following works:
> Test case:
> 1. Disconnect the cable
> 2. "ip link set xxx type can bitrate 500000" and "ip link set xxx up"
> 3. "cansend xxx 123#ffffffffffffffff"
> 
> # dmesg | grep flexcan
> [    1.561243] flexcan 2090000.flexcan: 2090000.flexcan supply xceiver not found, using dummy regulator
> [    1.570621] flexcan 2090000.flexcan: can_rx_offload_init_queue: skb_queue_len_max=512
> [    1.571207] flexcan 2090000.flexcan: device registered (reg_base=e0a78000, irq=34)
> [    6.227746] flexcan 2090000.flexcan mob0: renamed from can0
> [   34.400669] flexcan 2090000.flexcan mob0: writing ctrl=0x013a2007
> [   34.400698] flexcan 2090000.flexcan mob0: flexcan_set_bittiming: mcr=0x5980000f ctrl=0x013a2007
> [   34.400715] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing mcr=0x5983023f
> [   34.400731] flexcan 2090000.flexcan mob0: flexcan_chip_start: writing ctrl=0x013a2057
> [   34.400829] flexcan 2090000.flexcan mob0: flexcan_chip_start: reading mcr=0x4083023f ctrl=0x013ae057
> [   44.112307] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.112547] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.112810] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.113074] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.113331] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.113603] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.113851] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.114124] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.114367] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.114631] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.114884] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42042
> [   44.115142] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
> [   44.115162] flexcan 2090000.flexcan mob0: New error state: 1
> [   44.115404] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
> [   44.115661] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
> [   44.115921] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42242
> [   44.116180] flexcan 2090000.flexcan mob0: reg_iflag1=0x0 reg_esr=0x42252
> [   44.116199] flexcan 2090000.flexcan mob0: New error state: 2

We disable the bus errors when error passive is reached. But what 
happens if the error state decreases. It will require successful 
transmissions. When the TX done interrupt comes, changes to error 
warning are realized. Then we re-enable the error interrupt. But below 
error warning, we would like to disable the error interrupt (for the 
mx6q). That might be a problem but could be fixed by special treatment 
(for legacy cores).

Wolfgang.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: flexcan missing error state transitions
  2017-08-31  9:53                                   ` Wolfgang Grandegger
@ 2017-09-01  8:24                                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
  0 siblings, 0 replies; 28+ messages in thread
From: ZHU Yi (ST-FIR/ENG1-Zhu) @ 2017-09-01  8:24 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde, Andri Yngvason, linux-can
  Cc: hs, RUAN Tingquan (ST-FIR/ENG1-Zhu), Jonas Mark (ST-FIR/ENG1)

Hello Wolfgang,

> From: Wolfgang Grandegger [mailto:wg@grandegger.com]
> Sent: Thursday, August 31, 2017 5:54 PM
> 
> [...]
> > Although only p1010 enabled the quirk so far, but to our understanding,
> > the rest core seems also need the quirk enabled to report correct state
> > transitions, e.g., i.MX6.
> 
> I will have a i.MX53 and i.MX28 board soon on my table to check error
> passive handling. Would be nice to reduce interrupt flooding on these as
> well.
That's really great. :)
I'm looking forward to the result and hope the patch can work for them!

> 
> > However, as neither of us can test the legacy and other cores, so we
> > cannot touch them without testing. Perhaps someone who has the access
> > to these cores can give a hand to verify that in future.
> 
> Other tester are welcome!
> 
> [...]
> We disable the bus errors when error passive is reached. But what
> happens if the error state decreases. It will require successful
> transmissions. When the TX done interrupt comes, changes to error
> warning are realized. Then we re-enable the error interrupt. But below
> error warning, we would like to disable the error interrupt (for the
> mx6q). That might be a problem but could be fixed by special treatment
> (for legacy cores).
Yes. I think this is also related to the discussion of patch 3 thread
(https://marc.info/?l=linux-can&m=150424626206586&w=2). I'll fix this
after we come to a conclusion there.

Thanks!
Yi

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2017-09-01  8:25 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-18  7:47 flexcan missing error state transitions ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-21 16:18 ` Andri Yngvason
2017-08-21 17:13   ` AW: " Jonas Mark (ST-FIR/ENG1)
2017-08-21 18:21     ` Andri Yngvason
2017-08-22 13:50       ` AW: " Jonas Mark (ST-FIR/ENG1)
2017-08-22 14:06         ` Marc Kleine-Budde
2017-08-22 19:03           ` AW: " Jonas Mark (ST-FIR/ENG1)
2017-08-24  9:40             ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-25 17:16               ` Andri Yngvason
2017-08-27 10:57           ` Wolfgang Grandegger
2017-08-28  4:21             ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-28  8:33               ` Wolfgang Grandegger
2017-08-29  8:49                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-29  9:38                   ` Wolfgang Grandegger
2017-08-30  1:39                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-29 11:17                   ` Wolfgang Grandegger
2017-08-30  4:22                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-30  6:25                       ` Wolfgang Grandegger
2017-08-30  6:50                         ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-30  7:15                           ` Wolfgang Grandegger
2017-08-30  9:05                             ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-30 10:59                               ` Wolfgang Grandegger
2017-08-31  8:33                                 ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-31  9:53                                   ` Wolfgang Grandegger
2017-09-01  8:24                                     ` ZHU Yi (ST-FIR/ENG1-Zhu)
2017-08-29 13:41                   ` Andri Yngvason
2017-08-22 14:14         ` AW: AW: " Andri Yngvason
2017-08-27 12:55           ` Wolfgang Grandegger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.