All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack
@ 2018-01-30 16:56 Andri Yngvason
  2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andri Yngvason @ 2018-01-30 16:56 UTC (permalink / raw)
  To: linux-can
  Cc: mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell, Andri Yngvason

This has been reported to cause stalls on rt-linux.

Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
---
 drivers/net/can/cc770/cc770.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
index 1e37313..9fed163 100644
--- a/drivers/net/can/cc770/cc770.c
+++ b/drivers/net/can/cc770/cc770.c
@@ -447,15 +447,6 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	stats->tx_bytes += dlc;
 
-
-	/*
-	 * HM: We had some cases of repeated IRQs so make sure the
-	 * INT is acknowledged I know it's already further up, but
-	 * doing again fixed the issue
-	 */
-	cc770_write_reg(priv, msgobj[mo].ctrl0,
-			MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES);
-
 	return NETDEV_TX_OK;
 }
 
@@ -684,12 +675,6 @@ static void cc770_tx_interrupt(struct net_device *dev, unsigned int o)
 	/* Nothing more to send, switch off interrupts */
 	cc770_write_reg(priv, msgobj[mo].ctrl0,
 			MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
-	/*
-	 * We had some cases of repeated IRQ so make sure the
-	 * INT is acknowledged
-	 */
-	cc770_write_reg(priv, msgobj[mo].ctrl0,
-			MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES);
 
 	stats->tx_packets++;
 	can_get_echo_skb(dev, 0);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason
@ 2018-01-30 16:56 ` Andri Yngvason
  2018-02-12 19:40   ` Richard Weinberger
  2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason
  2018-02-12 19:41 ` [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Richard Weinberger
  2 siblings, 1 reply; 16+ messages in thread
From: Andri Yngvason @ 2018-01-30 16:56 UTC (permalink / raw)
  To: linux-can
  Cc: mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell, Andri Yngvason

If the queue is not stopped, the start_xmit function will continue to be
called until the driver or the system is reset.

Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
---
 drivers/net/can/cc770/cc770.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
index 9fed163..12d3b89 100644
--- a/drivers/net/can/cc770/cc770.c
+++ b/drivers/net/can/cc770/cc770.c
@@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if ((cc770_read_reg(priv,
 			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
+		netif_stop_queue(dev);
 		netdev_err(dev, "TX register is still occupied!\n");
 		return NETDEV_TX_BUSY;
 	}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply
  2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason
  2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason
@ 2018-01-30 16:56 ` Andri Yngvason
  2018-02-12 19:45   ` Richard Weinberger
  2018-02-12 19:41 ` [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Richard Weinberger
  2 siblings, 1 reply; 16+ messages in thread
From: Andri Yngvason @ 2018-01-30 16:56 UTC (permalink / raw)
  To: linux-can
  Cc: mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell, Andri Yngvason

While waiting for the TX object to send an RTR, an external message with a
matching id can overwrite the TX data. In this case we must call the rx
routine and then try transmitting the message that was overwritten again.

The queue was being stalled because the RX event did not generate an
interrupt to wake up the queue again and the TX event did not happen
because the TXRQST flag is reset by the chip when new data is received.

According to the CC770 datasheet the id of a message object should not be
changed while the MSGVAL bit is set. This has been fixed by resetting the
MSGVAL bit before modifying the object in the transmit function and setting
it after. It is not enough to set & reset CPUUPD.

It is important to keep the MSGVAL bit reset while the message object is
being modified. Otherwise, during RTR transmission, a frame with matching
id could trigger an rx-interrupt, which would cause a race condition
between the interrupt routine and the transmit function.

Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
---
Changes from v1:
 - Squashed 4 into 3
 - Modified comment
 - Now using NEWDAT flag instead of MSGLST to check if a message was
   received into TX object. This allows us to check if the message object
   was overwritten more than once, and report it.

 drivers/net/can/cc770/cc770.c | 96 ++++++++++++++++++++++++++++++-------------
 drivers/net/can/cc770/cc770.h |  2 +
 2 files changed, 69 insertions(+), 29 deletions(-)

diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
index 12d3b89..e6458cb 100644
--- a/drivers/net/can/cc770/cc770.c
+++ b/drivers/net/can/cc770/cc770.c
@@ -390,38 +390,23 @@ static int cc770_get_berr_counter(const struct net_device *dev,
 	return 0;
 }
 
-static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev)
+static void cc770_tx(struct net_device *dev, int mo)
 {
 	struct cc770_priv *priv = netdev_priv(dev);
-	struct net_device_stats *stats = &dev->stats;
-	struct can_frame *cf = (struct can_frame *)skb->data;
-	unsigned int mo = obj2msgobj(CC770_OBJ_TX);
+	struct can_frame *cf = (struct can_frame *)priv->tx_skb->data;
 	u8 dlc, rtr;
 	u32 id;
 	int i;
 
-	if (can_dropped_invalid_skb(dev, skb))
-		return NETDEV_TX_OK;
-
-	if ((cc770_read_reg(priv,
-			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
-		netif_stop_queue(dev);
-		netdev_err(dev, "TX register is still occupied!\n");
-		return NETDEV_TX_BUSY;
-	}
-
-	netif_stop_queue(dev);
-
 	dlc = cf->can_dlc;
 	id = cf->can_id;
-	if (cf->can_id & CAN_RTR_FLAG)
-		rtr = 0;
-	else
-		rtr = MSGCFG_DIR;
+	rtr = cf->can_id & CAN_RTR_FLAG ? 0 : MSGCFG_DIR;
+
+	cc770_write_reg(priv, msgobj[mo].ctrl0,
+			MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
 	cc770_write_reg(priv, msgobj[mo].ctrl1,
 			RMTPND_RES | TXRQST_RES | CPUUPD_SET | NEWDAT_RES);
-	cc770_write_reg(priv, msgobj[mo].ctrl0,
-			MSGVAL_SET | TXIE_SET | RXIE_RES | INTPND_RES);
+
 	if (id & CAN_EFF_FLAG) {
 		id &= CAN_EFF_MASK;
 		cc770_write_reg(priv, msgobj[mo].config,
@@ -440,13 +425,31 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	for (i = 0; i < dlc; i++)
 		cc770_write_reg(priv, msgobj[mo].data[i], cf->data[i]);
 
-	/* Store echo skb before starting the transfer */
-	can_put_echo_skb(skb, dev, 0);
-
 	cc770_write_reg(priv, msgobj[mo].ctrl1,
-			RMTPND_RES | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC);
+			RMTPND_UNC | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC);
+	cc770_write_reg(priv, msgobj[mo].ctrl0,
+			MSGVAL_SET | TXIE_SET | RXIE_SET | INTPND_UNC);
+}
+
+static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct cc770_priv *priv = netdev_priv(dev);
+	unsigned int mo = obj2msgobj(CC770_OBJ_TX);
+
+	if (can_dropped_invalid_skb(dev, skb))
+		return NETDEV_TX_OK;
 
-	stats->tx_bytes += dlc;
+	if ((cc770_read_reg(priv,
+			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
+		netif_stop_queue(dev);
+		netdev_err(dev, "TX register is still occupied!\n");
+		return NETDEV_TX_BUSY;
+	}
+
+	netif_stop_queue(dev);
+
+	priv->tx_skb = skb;
+	cc770_tx(dev, mo);
 
 	return NETDEV_TX_OK;
 }
@@ -672,13 +675,47 @@ static void cc770_tx_interrupt(struct net_device *dev, unsigned int o)
 	struct cc770_priv *priv = netdev_priv(dev);
 	struct net_device_stats *stats = &dev->stats;
 	unsigned int mo = obj2msgobj(o);
+	struct can_frame *cf;
+	u8 ctrl1;
+
+	ctrl1 = cc770_read_reg(priv, msgobj[mo].ctrl1);
 
-	/* Nothing more to send, switch off interrupts */
 	cc770_write_reg(priv, msgobj[mo].ctrl0,
 			MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
+	cc770_write_reg(priv, msgobj[mo].ctrl1,
+			RMTPND_RES | TXRQST_RES | MSGLST_RES | NEWDAT_RES);
 
-	stats->tx_packets++;
+	if (unlikely(!priv->tx_skb)) {
+		netdev_err(dev, "missing tx skb in tx interrupt\n");
+		return;
+	}
+
+	if (unlikely(ctrl1 & MSGLST_SET)) {
+		stats->rx_over_errors++;
+		stats->rx_errors++;
+	}
+
+	/* When the CC770 is sending an RTR message and it receives a regular
+	 * message that matches the id of the RTR message, it will overwrite the
+	 * outgoing message in the TX register. When this happens we must
+	 * process the received message and try to transmit the outgoing skb
+	 * again.
+	 */
+	if (unlikely(ctrl1 & NEWDAT_SET)) {
+		cc770_rx(dev, mo, ctrl1);
+		cc770_tx(dev, mo);
+		return;
+	}
+
+	can_put_echo_skb(priv->tx_skb, dev, 0);
 	can_get_echo_skb(dev, 0);
+
+	cf = (struct can_frame *)priv->tx_skb->data;
+	stats->tx_bytes += cf->can_dlc;
+	stats->tx_packets++;
+
+	priv->tx_skb = NULL;
+
 	netif_wake_queue(dev);
 }
 
@@ -790,6 +827,7 @@ struct net_device *alloc_cc770dev(int sizeof_priv)
 	priv->can.do_set_bittiming = cc770_set_bittiming;
 	priv->can.do_set_mode = cc770_set_mode;
 	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES;
+	priv->tx_skb = NULL;
 
 	memcpy(priv->obj_flags, cc770_obj_flags, sizeof(cc770_obj_flags));
 
diff --git a/drivers/net/can/cc770/cc770.h b/drivers/net/can/cc770/cc770.h
index a1739db..95752e1 100644
--- a/drivers/net/can/cc770/cc770.h
+++ b/drivers/net/can/cc770/cc770.h
@@ -193,6 +193,8 @@ struct cc770_priv {
 	u8 cpu_interface;	/* CPU interface register */
 	u8 clkout;		/* Clock out register */
 	u8 bus_config;		/* Bus conffiguration register */
+
+	struct sk_buff *tx_skb;
 };
 
 struct net_device *alloc_cc770dev(int sizeof_priv);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason
@ 2018-02-12 19:40   ` Richard Weinberger
  2018-02-12 20:37     ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2018-02-12 19:40 UTC (permalink / raw)
  To: Andri Yngvason
  Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell

Andri,

Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
> If the queue is not stopped, the start_xmit function will continue to be
> called until the driver or the system is reset.
> 
> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
> ---
>  drivers/net/can/cc770/cc770.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
> index 9fed163..12d3b89 100644
> --- a/drivers/net/can/cc770/cc770.c
> +++ b/drivers/net/can/cc770/cc770.c
> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb,
> struct net_device *dev)
> 
>  	if ((cc770_read_reg(priv,
>  			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
> +		netif_stop_queue(dev);
>  		netdev_err(dev, "TX register is still occupied!\n");
>  		return NETDEV_TX_BUSY;
>  	}

I see that some can drivers stop the queue, others not.
What exactly is the problem in this case?

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack
  2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason
  2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason
  2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason
@ 2018-02-12 19:41 ` Richard Weinberger
       [not found]   ` <151851733257.10946.11726494714017260046@maxwell>
  2 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2018-02-12 19:41 UTC (permalink / raw)
  To: Andri Yngvason
  Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell

Andri,

Am Dienstag, 30. Januar 2018, 17:56:47 CET schrieb Andri Yngvason:
> This has been reported to cause stalls on rt-linux.
> 
> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
> ---
>  drivers/net/can/cc770/cc770.c | 15 ---------------
>  1 file changed, 15 deletions(-)
> 
> diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
> index 1e37313..9fed163 100644
> --- a/drivers/net/can/cc770/cc770.c
> +++ b/drivers/net/can/cc770/cc770.c
> @@ -447,15 +447,6 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff
> *skb, struct net_device *dev)
> 
>  	stats->tx_bytes += dlc;
> 
> -
> -	/*
> -	 * HM: We had some cases of repeated IRQs so make sure the
> -	 * INT is acknowledged I know it's already further up, but
> -	 * doing again fixed the issue
> -	 */
> -	cc770_write_reg(priv, msgobj[mo].ctrl0,
> -			MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES);
> -
>  	return NETDEV_TX_OK;
>  }
> 
> @@ -684,12 +675,6 @@ static void cc770_tx_interrupt(struct net_device *dev,
> unsigned int o) /* Nothing more to send, switch off interrupts */
>  	cc770_write_reg(priv, msgobj[mo].ctrl0,
>  			MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
> -	/*
> -	 * We had some cases of repeated IRQ so make sure the
> -	 * INT is acknowledged
> -	 */
> -	cc770_write_reg(priv, msgobj[mo].ctrl0,
> -			MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES);
> 
>  	stats->tx_packets++;
>  	can_get_echo_skb(dev, 0);

I saw that stalls too.
But instead of just removing these writes I suggest adding a quirk for the can 
card where they cause trouble.
Maybe Wolfgang can give more details why they are here.

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply
  2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason
@ 2018-02-12 19:45   ` Richard Weinberger
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Weinberger @ 2018-02-12 19:45 UTC (permalink / raw)
  To: Andri Yngvason
  Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell

Andri,

Am Dienstag, 30. Januar 2018, 17:56:49 CET schrieb Andri Yngvason:
> While waiting for the TX object to send an RTR, an external message with a
> matching id can overwrite the TX data. In this case we must call the rx
> routine and then try transmitting the message that was overwritten again.
> 
> The queue was being stalled because the RX event did not generate an
> interrupt to wake up the queue again and the TX event did not happen
> because the TXRQST flag is reset by the chip when new data is received.
> 
> According to the CC770 datasheet the id of a message object should not be
> changed while the MSGVAL bit is set. This has been fixed by resetting the
> MSGVAL bit before modifying the object in the transmit function and setting
> it after. It is not enough to set & reset CPUUPD.
> 
> It is important to keep the MSGVAL bit reset while the message object is
> being modified. Otherwise, during RTR transmission, a frame with matching
> id could trigger an rx-interrupt, which would cause a race condition
> between the interrupt routine and the transmit function.
> 
> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
> ---
> Changes from v1:
>  - Squashed 4 into 3
>  - Modified comment
>  - Now using NEWDAT flag instead of MSGLST to check if a message was
>    received into TX object. This allows us to check if the message object
>    was overwritten more than once, and report it.
> 
>  drivers/net/can/cc770/cc770.c | 96
> ++++++++++++++++++++++++++++++------------- drivers/net/can/cc770/cc770.h |
>  2 +
>  2 files changed, 69 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
> index 12d3b89..e6458cb 100644
> --- a/drivers/net/can/cc770/cc770.c
> +++ b/drivers/net/can/cc770/cc770.c
> @@ -390,38 +390,23 @@ static int cc770_get_berr_counter(const struct
> net_device *dev, return 0;
>  }
> 
> -static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device
> *dev) +static void cc770_tx(struct net_device *dev, int mo)
>  {
>  	struct cc770_priv *priv = netdev_priv(dev);
> -	struct net_device_stats *stats = &dev->stats;
> -	struct can_frame *cf = (struct can_frame *)skb->data;
> -	unsigned int mo = obj2msgobj(CC770_OBJ_TX);
> +	struct can_frame *cf = (struct can_frame *)priv->tx_skb->data;
>  	u8 dlc, rtr;
>  	u32 id;
>  	int i;
> 
> -	if (can_dropped_invalid_skb(dev, skb))
> -		return NETDEV_TX_OK;
> -
> -	if ((cc770_read_reg(priv,
> -			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
> -		netif_stop_queue(dev);
> -		netdev_err(dev, "TX register is still occupied!\n");
> -		return NETDEV_TX_BUSY;
> -	}
> -
> -	netif_stop_queue(dev);
> -
>  	dlc = cf->can_dlc;
>  	id = cf->can_id;
> -	if (cf->can_id & CAN_RTR_FLAG)
> -		rtr = 0;
> -	else
> -		rtr = MSGCFG_DIR;
> +	rtr = cf->can_id & CAN_RTR_FLAG ? 0 : MSGCFG_DIR;
> +
> +	cc770_write_reg(priv, msgobj[mo].ctrl0,
> +			MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
>  	cc770_write_reg(priv, msgobj[mo].ctrl1,
>  			RMTPND_RES | TXRQST_RES | CPUUPD_SET | NEWDAT_RES);
> -	cc770_write_reg(priv, msgobj[mo].ctrl0,
> -			MSGVAL_SET | TXIE_SET | RXIE_RES | INTPND_RES);
> +
>  	if (id & CAN_EFF_FLAG) {
>  		id &= CAN_EFF_MASK;
>  		cc770_write_reg(priv, msgobj[mo].config,
> @@ -440,13 +425,31 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff
> *skb, struct net_device *dev) for (i = 0; i < dlc; i++)
>  		cc770_write_reg(priv, msgobj[mo].data[i], cf->data[i]);
> 
> -	/* Store echo skb before starting the transfer */
> -	can_put_echo_skb(skb, dev, 0);
> -
>  	cc770_write_reg(priv, msgobj[mo].ctrl1,
> -			RMTPND_RES | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC);
> +			RMTPND_UNC | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC);
> +	cc770_write_reg(priv, msgobj[mo].ctrl0,
> +			MSGVAL_SET | TXIE_SET | RXIE_SET | INTPND_UNC);
> +}
> +
> +static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device
> *dev) +{
> +	struct cc770_priv *priv = netdev_priv(dev);
> +	unsigned int mo = obj2msgobj(CC770_OBJ_TX);
> +
> +	if (can_dropped_invalid_skb(dev, skb))
> +		return NETDEV_TX_OK;
> 
> -	stats->tx_bytes += dlc;
> +	if ((cc770_read_reg(priv,
> +			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
> +		netif_stop_queue(dev);
> +		netdev_err(dev, "TX register is still occupied!\n");
> +		return NETDEV_TX_BUSY;
> +	}
> +
> +	netif_stop_queue(dev);
> +
> +	priv->tx_skb = skb;
> +	cc770_tx(dev, mo);
> 
>  	return NETDEV_TX_OK;
>  }
> @@ -672,13 +675,47 @@ static void cc770_tx_interrupt(struct net_device *dev,
> unsigned int o) struct cc770_priv *priv = netdev_priv(dev);
>  	struct net_device_stats *stats = &dev->stats;
>  	unsigned int mo = obj2msgobj(o);
> +	struct can_frame *cf;
> +	u8 ctrl1;
> +
> +	ctrl1 = cc770_read_reg(priv, msgobj[mo].ctrl1);
> 
> -	/* Nothing more to send, switch off interrupts */
>  	cc770_write_reg(priv, msgobj[mo].ctrl0,
>  			MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
> +	cc770_write_reg(priv, msgobj[mo].ctrl1,
> +			RMTPND_RES | TXRQST_RES | MSGLST_RES | NEWDAT_RES);
> 
> -	stats->tx_packets++;
> +	if (unlikely(!priv->tx_skb)) {
> +		netdev_err(dev, "missing tx skb in tx interrupt\n");
> +		return;
> +	}
> +
> +	if (unlikely(ctrl1 & MSGLST_SET)) {
> +		stats->rx_over_errors++;
> +		stats->rx_errors++;
> +	}
> +
> +	/* When the CC770 is sending an RTR message and it receives a regular
> +	 * message that matches the id of the RTR message, it will overwrite the
> +	 * outgoing message in the TX register. When this happens we must
> +	 * process the received message and try to transmit the outgoing skb
> +	 * again.
> +	 */
> +	if (unlikely(ctrl1 & NEWDAT_SET)) {
> +		cc770_rx(dev, mo, ctrl1);
> +		cc770_tx(dev, mo);
> +		return;
> +	}
> +
> +	can_put_echo_skb(priv->tx_skb, dev, 0);
>  	can_get_echo_skb(dev, 0);
> +
> +	cf = (struct can_frame *)priv->tx_skb->data;
> +	stats->tx_bytes += cf->can_dlc;
> +	stats->tx_packets++;
> +
> +	priv->tx_skb = NULL;
> +
>  	netif_wake_queue(dev);
>  }
> 
> @@ -790,6 +827,7 @@ struct net_device *alloc_cc770dev(int sizeof_priv)
>  	priv->can.do_set_bittiming = cc770_set_bittiming;
>  	priv->can.do_set_mode = cc770_set_mode;
>  	priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES;
> +	priv->tx_skb = NULL;
> 
>  	memcpy(priv->obj_flags, cc770_obj_flags, sizeof(cc770_obj_flags));
> 
> diff --git a/drivers/net/can/cc770/cc770.h b/drivers/net/can/cc770/cc770.h
> index a1739db..95752e1 100644
> --- a/drivers/net/can/cc770/cc770.h
> +++ b/drivers/net/can/cc770/cc770.h
> @@ -193,6 +193,8 @@ struct cc770_priv {
>  	u8 cpu_interface;	/* CPU interface register */
>  	u8 clkout;		/* Clock out register */
>  	u8 bus_config;		/* Bus conffiguration register */
> +
> +	struct sk_buff *tx_skb;
>  };
> 
>  struct net_device *alloc_cc770dev(int sizeof_priv);

Looks good to me.
If this patch survives 1-2 days on my test bed, I'll replay with a Reviewed/Tested-by. 

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 19:40   ` Richard Weinberger
@ 2018-02-12 20:37     ` Wolfgang Grandegger
  2018-02-12 20:44       ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2018-02-12 20:37 UTC (permalink / raw)
  To: Richard Weinberger, Andri Yngvason
  Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell

Hello,

Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
> Andri,
> 
> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>> If the queue is not stopped, the start_xmit function will continue to be
>> called until the driver or the system is reset.
>>
>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>> ---
>>   drivers/net/can/cc770/cc770.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c
>> index 9fed163..12d3b89 100644
>> --- a/drivers/net/can/cc770/cc770.c
>> +++ b/drivers/net/can/cc770/cc770.c
>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb,
>> struct net_device *dev)
>>
>>   	if ((cc770_read_reg(priv,
>>   			    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>> +		netif_stop_queue(dev);
>>   		netdev_err(dev, "TX register is still occupied!\n");
>>   		return NETDEV_TX_BUSY;
>>   	}
> 
> I see that some can drivers stop the queue, others not.
> What exactly is the problem in this case?

Stopping the queue and returning NETDEV_TX_BUSY does make little sense. 
NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that 
case should be avoided by stopping the queue earlier. Actually, 
netif_stop_queue(dev) is called directly after the if block above,
which means that "TX register is still occupied" indicates an error/bug. 
I think there is a race somewhere.

Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 20:37     ` Wolfgang Grandegger
@ 2018-02-12 20:44       ` Wolfgang Grandegger
  2018-02-12 20:55         ` Richard Weinberger
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2018-02-12 20:44 UTC (permalink / raw)
  To: Richard Weinberger, Andri Yngvason
  Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell

Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
> Hello,
> 
> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
>> Andri,
>>
>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>>> If the queue is not stopped, the start_xmit function will continue to be
>>> called until the driver or the system is reset.
>>>
>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>>> ---
>>>   drivers/net/can/cc770/cc770.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/net/can/cc770/cc770.c 
>>> b/drivers/net/can/cc770/cc770.c
>>> index 9fed163..12d3b89 100644
>>> --- a/drivers/net/can/cc770/cc770.c
>>> +++ b/drivers/net/can/cc770/cc770.c
>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct 
>>> sk_buff *skb,
>>> struct net_device *dev)
>>>
>>>       if ((cc770_read_reg(priv,
>>>                   msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>>> +        netif_stop_queue(dev);
>>>           netdev_err(dev, "TX register is still occupied!\n");
>>>           return NETDEV_TX_BUSY;
>>>       }
>>
>> I see that some can drivers stop the queue, others not.
>> What exactly is the problem in this case?
> 
> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. 
> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that 
> case should be avoided by stopping the queue earlier. Actually, 
> netif_stop_queue(dev) is called directly after the if block above,
> which means that "TX register is still occupied" indicates an error/bug. 
> I think there is a race somewhere.

If you run on "-rt", the IRQ is threaded. This would explain the issue.

Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 20:44       ` Wolfgang Grandegger
@ 2018-02-12 20:55         ` Richard Weinberger
  2018-02-12 21:28           ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Weinberger @ 2018-02-12 20:55 UTC (permalink / raw)
  To: Wolfgang Grandegger
  Cc: Andri Yngvason, linux-can, mkl, sigurbjorn.narfason,
	hrafnkell.eiriksson, patric.thysell

Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
> > Hello,
> > 
> > Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
> >> Andri,
> >> 
> >> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
> >>> If the queue is not stopped, the start_xmit function will continue to be
> >>> called until the driver or the system is reset.
> >>> 
> >>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
> >>> ---
> >>>   drivers/net/can/cc770/cc770.c | 1 +
> >>>   1 file changed, 1 insertion(+)
> >>> 
> >>> diff --git a/drivers/net/can/cc770/cc770.c
> >>> b/drivers/net/can/cc770/cc770.c
> >>> index 9fed163..12d3b89 100644
> >>> --- a/drivers/net/can/cc770/cc770.c
> >>> +++ b/drivers/net/can/cc770/cc770.c
> >>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
> >>> sk_buff *skb,
> >>> struct net_device *dev)
> >>> 
> >>>       if ((cc770_read_reg(priv,
> >>>                   msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
> >>> +        netif_stop_queue(dev);
> >>>           netdev_err(dev, "TX register is still occupied!\n");
> >>>           return NETDEV_TX_BUSY;
> >>>       }
> >> 
> >> I see that some can drivers stop the queue, others not.
> >> What exactly is the problem in this case?
> > 
> > Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
> > NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
> > case should be avoided by stopping the queue earlier. Actually,
> > netif_stop_queue(dev) is called directly after the if block above,
> > which means that "TX register is still occupied" indicates an error/bug.
> > I think there is a race somewhere.
> 
> If you run on "-rt", the IRQ is threaded. This would explain the issue.
> 
> Wolfgang.

Andri, do you see the "TX register is still occupied!" error message?
In my case, I never saw it while working on the rt-related stalls during my 
tests.

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 20:55         ` Richard Weinberger
@ 2018-02-12 21:28           ` Wolfgang Grandegger
  2018-02-12 21:36             ` Wolfgang Grandegger
  0 siblings, 1 reply; 16+ messages in thread
From: Wolfgang Grandegger @ 2018-02-12 21:28 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Andri Yngvason, linux-can, mkl, sigurbjorn.narfason,
	hrafnkell.eiriksson, patric.thysell

Am 12.02.2018 um 21:55 schrieb Richard Weinberger:
> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
>>> Hello,
>>>
>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
>>>> Andri,
>>>>
>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>>>>> If the queue is not stopped, the start_xmit function will continue to be
>>>>> called until the driver or the system is reset.
>>>>>
>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>>>>> ---
>>>>>    drivers/net/can/cc770/cc770.c | 1 +
>>>>>    1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/net/can/cc770/cc770.c
>>>>> b/drivers/net/can/cc770/cc770.c
>>>>> index 9fed163..12d3b89 100644
>>>>> --- a/drivers/net/can/cc770/cc770.c
>>>>> +++ b/drivers/net/can/cc770/cc770.c
>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
>>>>> sk_buff *skb,
>>>>> struct net_device *dev)
>>>>>
>>>>>        if ((cc770_read_reg(priv,
>>>>>                    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>>>>> +        netif_stop_queue(dev);
>>>>>            netdev_err(dev, "TX register is still occupied!\n");
>>>>>            return NETDEV_TX_BUSY;
>>>>>        }
>>>>
>>>> I see that some can drivers stop the queue, others not.
>>>> What exactly is the problem in this case?
>>>
>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
>>> case should be avoided by stopping the queue earlier. Actually,
>>> netif_stop_queue(dev) is called directly after the if block above,
>>> which means that "TX register is still occupied" indicates an error/bug.
>>> I think there is a race somewhere.
>>
>> If you run on "-rt", the IRQ is threaded. This would explain the issue.
>>
>> Wolfgang.
> 
> Andri, do you see the "TX register is still occupied!" error message?
> In my case, I never saw it while working on the rt-related stalls during my
> tests.

At a closer look I cannot identify a race. Anyway, the message should 
never show up.

Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 21:28           ` Wolfgang Grandegger
@ 2018-02-12 21:36             ` Wolfgang Grandegger
  2018-02-13 10:32               ` Andri Yngvason
  2018-02-16 14:22               ` Marc Kleine-Budde
  0 siblings, 2 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2018-02-12 21:36 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Andri Yngvason, linux-can, mkl, sigurbjorn.narfason,
	hrafnkell.eiriksson, patric.thysell

Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger:
> Am 12.02.2018 um 21:55 schrieb Richard Weinberger:
>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
>>>> Hello,
>>>>
>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
>>>>> Andri,
>>>>>
>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>>>>>> If the queue is not stopped, the start_xmit function will continue 
>>>>>> to be
>>>>>> called until the driver or the system is reset.
>>>>>>
>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>>>>>> ---
>>>>>>    drivers/net/can/cc770/cc770.c | 1 +
>>>>>>    1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/net/can/cc770/cc770.c
>>>>>> b/drivers/net/can/cc770/cc770.c
>>>>>> index 9fed163..12d3b89 100644
>>>>>> --- a/drivers/net/can/cc770/cc770.c
>>>>>> +++ b/drivers/net/can/cc770/cc770.c
>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
>>>>>> sk_buff *skb,
>>>>>> struct net_device *dev)
>>>>>>
>>>>>>        if ((cc770_read_reg(priv,
>>>>>>                    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>>>>>> +        netif_stop_queue(dev);
>>>>>>            netdev_err(dev, "TX register is still occupied!\n");
>>>>>>            return NETDEV_TX_BUSY;
>>>>>>        }
>>>>>
>>>>> I see that some can drivers stop the queue, others not.
>>>>> What exactly is the problem in this case?
>>>>
>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
>>>> case should be avoided by stopping the queue earlier. Actually,
>>>> netif_stop_queue(dev) is called directly after the if block above,
>>>> which means that "TX register is still occupied" indicates an 
>>>> error/bug.
>>>> I think there is a race somewhere.
>>>
>>> If you run on "-rt", the IRQ is threaded. This would explain the issue.
>>>
>>> Wolfgang.
>>
>> Andri, do you see the "TX register is still occupied!" error message?
>> In my case, I never saw it while working on the rt-related stalls 
>> during my
>> tests.
> 
> At a closer look I cannot identify a race. Anyway, the message should 
> never show up.

You could try to get a function trace by enabling functions tracing and 
stopping it with "tracing_off" when the if block above is entered. A 
filter for "cc770_*" and "netif_*" might be useful as well.

Wolfgang.




> 
> Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 21:36             ` Wolfgang Grandegger
@ 2018-02-13 10:32               ` Andri Yngvason
  2018-02-13 16:30                 ` Wolfgang Grandegger
  2018-02-16 14:22               ` Marc Kleine-Budde
  1 sibling, 1 reply; 16+ messages in thread
From: Andri Yngvason @ 2018-02-13 10:32 UTC (permalink / raw)
  To: Wolfgang Grandegger, Richard Weinberger
  Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell

Hello Wolfgang, Richard,

Quoting Wolfgang Grandegger (2018-02-12 21:36:37)
> Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger:
> > Am 12.02.2018 um 21:55 schrieb Richard Weinberger:
> >> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
> >>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
> >>>> Hello,
> >>>>
> >>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
> >>>>> Andri,
> >>>>>
> >>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
> >>>>>> If the queue is not stopped, the start_xmit function will continue 
> >>>>>> to be
> >>>>>> called until the driver or the system is reset.
> >>>>>>
> >>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
> >>>>>> ---
> >>>>>>    drivers/net/can/cc770/cc770.c | 1 +
> >>>>>>    1 file changed, 1 insertion(+)
> >>>>>>
> >>>>>> diff --git a/drivers/net/can/cc770/cc770.c
> >>>>>> b/drivers/net/can/cc770/cc770.c
> >>>>>> index 9fed163..12d3b89 100644
> >>>>>> --- a/drivers/net/can/cc770/cc770.c
> >>>>>> +++ b/drivers/net/can/cc770/cc770.c
> >>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
> >>>>>> sk_buff *skb,
> >>>>>> struct net_device *dev)
> >>>>>>
> >>>>>>        if ((cc770_read_reg(priv,
> >>>>>>                    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
> >>>>>> +        netif_stop_queue(dev);
> >>>>>>            netdev_err(dev, "TX register is still occupied!\n");
> >>>>>>            return NETDEV_TX_BUSY;
> >>>>>>        }
> >>>>>
> >>>>> I see that some can drivers stop the queue, others not.
> >>>>> What exactly is the problem in this case?
> >>>>
> >>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
> >>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
> >>>> case should be avoided by stopping the queue earlier. Actually,
> >>>> netif_stop_queue(dev) is called directly after the if block above,
> >>>> which means that "TX register is still occupied" indicates an 
> >>>> error/bug.
> >>>> I think there is a race somewhere.
> >>>
> >>> If you run on "-rt", the IRQ is threaded. This would explain the issue.
> >>>
> >>> Wolfgang.
> >>
> >> Andri, do you see the "TX register is still occupied!" error message?
> >> In my case, I never saw it while working on the rt-related stalls 
> >> during my
> >> tests.
> > 
> > At a closer look I cannot identify a race. Anyway, the message should 
> > never show up.
> 
> You could try to get a function trace by enabling functions tracing and 
> stopping it with "tracing_off" when the if block above is entered. A 
> filter for "cc770_*" and "netif_*" might be useful as well.
> 
> Wolfgang.

There is no issue currently that will cause NETDEV_TX_BUSY to be returned. I
included this patch because while I was working on PATCH 3, this return would
stall the system when I ran into the race condition which I subsequently fixed.

Also, according to include/linux/netdevice.h:
 * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
 *                               struct net_device *dev);
 *	Called when a packet needs to be transmitted.
 *	Returns NETDEV_TX_OK.  Can return NETDEV_TX_BUSY, but you should stop
 *	the queue before that can happen; it's for obsolete devices and weird
 *	corner cases, but the stack really does a non-trivial amount
 *	of useless work if you return NETDEV_TX_BUSY.
 *	Required; cannot be NULL.

All netdevs should stop the queue before returning NETDEV_TX_BUSY.

Best regards,
Andri

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack
       [not found]   ` <151851733257.10946.11726494714017260046@maxwell>
@ 2018-02-13 10:35     ` Richard Weinberger
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Weinberger @ 2018-02-13 10:35 UTC (permalink / raw)
  To: Andri Yngvason
  Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson,
	patric.thysell

Andri,

Am Dienstag, 13. Februar 2018, 11:22:12 CET schrieb Andri Yngvason:
> Hi Richard,
> 
> Quoting Richard Weinberger (2018-02-12 19:41:48)
> 
> > Andri,
> > 
> > Am Dienstag, 30. Januar 2018, 17:56:47 CET schrieb Andri Yngvason:
> > > This has been reported to cause stalls on rt-linux.
> > > 
> > > Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
> > > ---
> > > 
> > >  drivers/net/can/cc770/cc770.c | 15 ---------------
> > >  1 file changed, 15 deletions(-)
> > > 
> > > diff --git a/drivers/net/can/cc770/cc770.c
> > > b/drivers/net/can/cc770/cc770.c
> > > index 1e37313..9fed163 100644
> > > --- a/drivers/net/can/cc770/cc770.c
> > > +++ b/drivers/net/can/cc770/cc770.c
> > > @@ -447,15 +447,6 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff
> > > *skb, struct net_device *dev)
> > > 
> > >       stats->tx_bytes += dlc;
> > > 
> > > -
> > > -     /*
> > > -      * HM: We had some cases of repeated IRQs so make sure the
> > > -      * INT is acknowledged I know it's already further up, but
> > > -      * doing again fixed the issue
> > > -      */
> > > -     cc770_write_reg(priv, msgobj[mo].ctrl0,
> > > -                     MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES);
> > > -
> > > 
> > >       return NETDEV_TX_OK;
> > >  
> > >  }
> > > 
> > > @@ -684,12 +675,6 @@ static void cc770_tx_interrupt(struct net_device
> > > *dev,
> > > unsigned int o) /* Nothing more to send, switch off interrupts */
> > > 
> > >       cc770_write_reg(priv, msgobj[mo].ctrl0,
> > >       
> > >                       MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES);
> > > 
> > > -     /*
> > > -      * We had some cases of repeated IRQ so make sure the
> > > -      * INT is acknowledged
> > > -      */
> > > -     cc770_write_reg(priv, msgobj[mo].ctrl0,
> > > -                     MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES);
> > > 
> > >       stats->tx_packets++;
> > >       can_get_echo_skb(dev, 0);
> > 
> > I saw that stalls too.
> > But instead of just removing these writes I suggest adding a quirk for the
> > can card where they cause trouble.
> > Maybe Wolfgang can give more details why they are here.
> 
> I did not run into this issue myself, I just included this because Patric
> sent me this patch for an issue that you were having. I did not know the
> original author, otherwise I would have added a "Suggested-by".

I'm the author of that debug patch. :-)
While reviewing the driver these acks alerted me and after removing them the 
can card no longer stopped working after some time.

Thanks,
//richard

-- 
sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria
ATU66964118 - FN 374287y

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-13 10:32               ` Andri Yngvason
@ 2018-02-13 16:30                 ` Wolfgang Grandegger
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2018-02-13 16:30 UTC (permalink / raw)
  To: Andri Yngvason, Richard Weinberger
  Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell



Am 13.02.2018 um 11:32 schrieb Andri Yngvason:
> Hello Wolfgang, Richard,
> 
> Quoting Wolfgang Grandegger (2018-02-12 21:36:37)
>> Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger:
>>> Am 12.02.2018 um 21:55 schrieb Richard Weinberger:
>>>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
>>>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
>>>>>> Hello,
>>>>>>
>>>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
>>>>>>> Andri,
>>>>>>>
>>>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>>>>>>>> If the queue is not stopped, the start_xmit function will continue
>>>>>>>> to be
>>>>>>>> called until the driver or the system is reset.
>>>>>>>>
>>>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>>>>>>>> ---
>>>>>>>>     drivers/net/can/cc770/cc770.c | 1 +
>>>>>>>>     1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/can/cc770/cc770.c
>>>>>>>> b/drivers/net/can/cc770/cc770.c
>>>>>>>> index 9fed163..12d3b89 100644
>>>>>>>> --- a/drivers/net/can/cc770/cc770.c
>>>>>>>> +++ b/drivers/net/can/cc770/cc770.c
>>>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
>>>>>>>> sk_buff *skb,
>>>>>>>> struct net_device *dev)
>>>>>>>>
>>>>>>>>         if ((cc770_read_reg(priv,
>>>>>>>>                     msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>>>>>>>> +        netif_stop_queue(dev);
>>>>>>>>             netdev_err(dev, "TX register is still occupied!\n");
>>>>>>>>             return NETDEV_TX_BUSY;
>>>>>>>>         }
>>>>>>>
>>>>>>> I see that some can drivers stop the queue, others not.
>>>>>>> What exactly is the problem in this case?
>>>>>>
>>>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
>>>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
>>>>>> case should be avoided by stopping the queue earlier. Actually,
>>>>>> netif_stop_queue(dev) is called directly after the if block above,
>>>>>> which means that "TX register is still occupied" indicates an
>>>>>> error/bug.
>>>>>> I think there is a race somewhere.
>>>>>
>>>>> If you run on "-rt", the IRQ is threaded. This would explain the issue.
>>>>>
>>>>> Wolfgang.
>>>>
>>>> Andri, do you see the "TX register is still occupied!" error message?
>>>> In my case, I never saw it while working on the rt-related stalls
>>>> during my
>>>> tests.
>>>
>>> At a closer look I cannot identify a race. Anyway, the message should
>>> never show up.
>>
>> You could try to get a function trace by enabling functions tracing and
>> stopping it with "tracing_off" when the if block above is entered. A
>> filter for "cc770_*" and "netif_*" might be useful as well.
>>
>> Wolfgang.
> 
> There is no issue currently that will cause NETDEV_TX_BUSY to be returned. I
> included this patch because while I was working on PATCH 3, this return would
> stall the system when I ran into the race condition which I subsequently fixed.

OK.

> Also, according to include/linux/netdevice.h:
>   * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb,
>   *                               struct net_device *dev);
>   *	Called when a packet needs to be transmitted.
>   *	Returns NETDEV_TX_OK.  Can return NETDEV_TX_BUSY, but you should stop
>   *	the queue before that can happen; it's for obsolete devices and weird
>   *	corner cases, but the stack really does a non-trivial amount
>   *	of useless work if you return NETDEV_TX_BUSY.
>   *	Required; cannot be NULL.
> 
> All netdevs should stop the queue before returning NETDEV_TX_BUSY.

Thanks for the info, I was not aware of that. Without stopping the 
queue, the upper layer will retry asap... which may cause heavy CPU 
load... which is bad!

Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-12 21:36             ` Wolfgang Grandegger
  2018-02-13 10:32               ` Andri Yngvason
@ 2018-02-16 14:22               ` Marc Kleine-Budde
  2018-02-16 15:20                 ` Wolfgang Grandegger
  1 sibling, 1 reply; 16+ messages in thread
From: Marc Kleine-Budde @ 2018-02-16 14:22 UTC (permalink / raw)
  To: Wolfgang Grandegger, Richard Weinberger
  Cc: Andri Yngvason, linux-can, sigurbjorn.narfason,
	hrafnkell.eiriksson, patric.thysell


[-- Attachment #1.1: Type: text/plain, Size: 3068 bytes --]

On 02/12/2018 10:36 PM, Wolfgang Grandegger wrote:
> Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger:
>> Am 12.02.2018 um 21:55 schrieb Richard Weinberger:
>>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
>>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
>>>>> Hello,
>>>>>
>>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
>>>>>> Andri,
>>>>>>
>>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>>>>>>> If the queue is not stopped, the start_xmit function will continue 
>>>>>>> to be
>>>>>>> called until the driver or the system is reset.
>>>>>>>
>>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>>>>>>> ---
>>>>>>>    drivers/net/can/cc770/cc770.c | 1 +
>>>>>>>    1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/net/can/cc770/cc770.c
>>>>>>> b/drivers/net/can/cc770/cc770.c
>>>>>>> index 9fed163..12d3b89 100644
>>>>>>> --- a/drivers/net/can/cc770/cc770.c
>>>>>>> +++ b/drivers/net/can/cc770/cc770.c
>>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
>>>>>>> sk_buff *skb,
>>>>>>> struct net_device *dev)
>>>>>>>
>>>>>>>        if ((cc770_read_reg(priv,
>>>>>>>                    msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>>>>>>> +        netif_stop_queue(dev);
>>>>>>>            netdev_err(dev, "TX register is still occupied!\n");
>>>>>>>            return NETDEV_TX_BUSY;
>>>>>>>        }
>>>>>>
>>>>>> I see that some can drivers stop the queue, others not.
>>>>>> What exactly is the problem in this case?
>>>>>
>>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
>>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
>>>>> case should be avoided by stopping the queue earlier. Actually,
>>>>> netif_stop_queue(dev) is called directly after the if block above,
>>>>> which means that "TX register is still occupied" indicates an 
>>>>> error/bug.
>>>>> I think there is a race somewhere.
>>>>
>>>> If you run on "-rt", the IRQ is threaded. This would explain the issue.
>>>>
>>>> Wolfgang.
>>>
>>> Andri, do you see the "TX register is still occupied!" error message?
>>> In my case, I never saw it while working on the rt-related stalls 
>>> during my
>>> tests.
>>
>> At a closer look I cannot identify a race. Anyway, the message should 
>> never show up.
> 
> You could try to get a function trace by enabling functions tracing and 
> stopping it with "tracing_off" when the if block above is entered. A 
> filter for "cc770_*" and "netif_*" might be useful as well.

It seems to me these patches are still WIP, right? Or should I take some
to into my git-tree?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY
  2018-02-16 14:22               ` Marc Kleine-Budde
@ 2018-02-16 15:20                 ` Wolfgang Grandegger
  0 siblings, 0 replies; 16+ messages in thread
From: Wolfgang Grandegger @ 2018-02-16 15:20 UTC (permalink / raw)
  To: Marc Kleine-Budde, Richard Weinberger
  Cc: Andri Yngvason, linux-can, sigurbjorn.narfason,
	hrafnkell.eiriksson, patric.thysell

Hello Marc,

Am 16.02.2018 um 15:22 schrieb Marc Kleine-Budde:
> On 02/12/2018 10:36 PM, Wolfgang Grandegger wrote:
>> Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger:
>>> Am 12.02.2018 um 21:55 schrieb Richard Weinberger:
>>>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger:
>>>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger:
>>>>>> Hello,
>>>>>>
>>>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger:
>>>>>>> Andri,
>>>>>>>
>>>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason:
>>>>>>>> If the queue is not stopped, the start_xmit function will continue
>>>>>>>> to be
>>>>>>>> called until the driver or the system is reset.
>>>>>>>>
>>>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
>>>>>>>> ---
>>>>>>>>     drivers/net/can/cc770/cc770.c | 1 +
>>>>>>>>     1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/can/cc770/cc770.c
>>>>>>>> b/drivers/net/can/cc770/cc770.c
>>>>>>>> index 9fed163..12d3b89 100644
>>>>>>>> --- a/drivers/net/can/cc770/cc770.c
>>>>>>>> +++ b/drivers/net/can/cc770/cc770.c
>>>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct
>>>>>>>> sk_buff *skb,
>>>>>>>> struct net_device *dev)
>>>>>>>>
>>>>>>>>         if ((cc770_read_reg(priv,
>>>>>>>>                     msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) {
>>>>>>>> +        netif_stop_queue(dev);
>>>>>>>>             netdev_err(dev, "TX register is still occupied!\n");
>>>>>>>>             return NETDEV_TX_BUSY;
>>>>>>>>         }
>>>>>>>
>>>>>>> I see that some can drivers stop the queue, others not.
>>>>>>> What exactly is the problem in this case?
>>>>>>
>>>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense.
>>>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that
>>>>>> case should be avoided by stopping the queue earlier. Actually,
>>>>>> netif_stop_queue(dev) is called directly after the if block above,
>>>>>> which means that "TX register is still occupied" indicates an
>>>>>> error/bug.
>>>>>> I think there is a race somewhere.
>>>>>
>>>>> If you run on "-rt", the IRQ is threaded. This would explain the issue.
>>>>>
>>>>> Wolfgang.
>>>>
>>>> Andri, do you see the "TX register is still occupied!" error message?
>>>> In my case, I never saw it while working on the rt-related stalls
>>>> during my
>>>> tests.
>>>
>>> At a closer look I cannot identify a race. Anyway, the message should
>>> never show up.
>>
>> You could try to get a function trace by enabling functions tracing and
>> stopping it with "tracing_off" when the if block above is entered. A
>> filter for "cc770_*" and "netif_*" might be useful as well.
> 
> It seems to me these patches are still WIP, right? Or should I take some
> to into my git-tree?

Richard? My comment was about "TX register is still occupied!, which in 
fact seems not to happen.

Wolfgang.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-02-16 15:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason
2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason
2018-02-12 19:40   ` Richard Weinberger
2018-02-12 20:37     ` Wolfgang Grandegger
2018-02-12 20:44       ` Wolfgang Grandegger
2018-02-12 20:55         ` Richard Weinberger
2018-02-12 21:28           ` Wolfgang Grandegger
2018-02-12 21:36             ` Wolfgang Grandegger
2018-02-13 10:32               ` Andri Yngvason
2018-02-13 16:30                 ` Wolfgang Grandegger
2018-02-16 14:22               ` Marc Kleine-Budde
2018-02-16 15:20                 ` Wolfgang Grandegger
2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason
2018-02-12 19:45   ` Richard Weinberger
2018-02-12 19:41 ` [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Richard Weinberger
     [not found]   ` <151851733257.10946.11726494714017260046@maxwell>
2018-02-13 10:35     ` Richard Weinberger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.