All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Net driver bugs fix
@ 2016-05-09 17:13 Elad Kanfi
  2016-05-09 17:13 ` [PATCH v3 1/2] net: nps_enet: Tx handler synchronization Elad Kanfi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Elad Kanfi @ 2016-05-09 17:13 UTC (permalink / raw)
  To: davem; +Cc: noamca, linux-kernel, abrodkin, eladkan, netdev

From: Elad Kanfi <eladkan@mellanox.com>

v3:
tx_packet_sent flag is not necessary, use socket buffer pointer
instead.
Use wmb() instead of smp_wmb().

v2:
Remove code style commit for now.
Code style commit will be added after the bugs fix will be approved.

Summary:
 1. Bug description: TX done interrupts that arrives while interrupts
    are masked, during NAPI poll, will not trigger an interrupt handling.
    Since TX interrupt is of level edge we will lose the TX done interrupt.
    As a result all pending tx frames will get no service.

    Solution: Check if there is a pending tx request after unmasking the
    interrupt and if answer is yes then re-add ourselves to
    the NAPI poll list.

 2. Bug description: CPU-A before sending a frame will set a variable
    to true. CPU-B that executes the tx done interrupt service routine
    might read a non valid value of that variable.

    Solution: Use the socket buffer pointer instead of the variable,
    and add a write memory barrier at the tx sending function after
    the pointer is set.

Elad Kanfi (2):
  net: nps_enet: Tx handler synchronization
  net: nps_enet: bug fix - handle lost tx interrupts

 drivers/net/ethernet/ezchip/nps_enet.c |   30 ++++++++++++++++++++++++------
 drivers/net/ethernet/ezchip/nps_enet.h |    2 --
 2 files changed, 24 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v3 1/2] net: nps_enet: Tx handler synchronization
  2016-05-09 17:13 [PATCH v3 0/2] Net driver bugs fix Elad Kanfi
@ 2016-05-09 17:13 ` Elad Kanfi
  2016-05-09 17:13 ` [PATCH v3 2/2] net: nps_enet: bug fix - handle lost tx interrupts Elad Kanfi
  2016-05-10 18:40 ` [PATCH v3 0/2] Net driver bugs fix David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Elad Kanfi @ 2016-05-09 17:13 UTC (permalink / raw)
  To: davem; +Cc: noamca, linux-kernel, abrodkin, eladkan, netdev

From: Elad Kanfi <eladkan@mellanox.com>

Below is a description of a possible problematic
sequence. CPU-A is sending a frame and CPU-B handles
the interrupt that indicates the frame was sent. CPU-B
reads an invalid value of tx_packet_sent.

	CPU-A				CPU-B
	-----				-----
	nps_enet_send_frame
	.
	.
	tx_skb = skb
	tx_packet_sent = true
	order HW to start tx
	.
	.
	HW complete tx
			    ------> 	get tx complete interrupt
					.
					.
					if(tx_packet_sent == true)
						handle tx_skb

	end memory transaction
	(tx_packet_sent actually
	 written)

Furthermore there is a dependency between tx_skb and tx_packet_sent.
There is no assurance that tx_skb contains a valid pointer at CPU B
when it sees tx_packet_sent == true.

Solution:

Initialize tx_skb to NULL and use it to indicate that packet was sent,
in this way tx_packet_sent can be removed.
Add a write memory barrier after setting tx_skb in order to make sure
that it is valid before HW is informed and IRQ is fired.

Fixed sequence will be:

       CPU-A                           CPU-B
       -----                           -----

	tx_skb = skb
	wmb()
	.
	.
	order HW to start tx
	.
	.
	HW complete tx
			------>		get tx complete interrupt
					.
					.
					if(tx_skb != NULL)
						handle tx_skb

					tx_skb = NULL

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Acked-by: Noam Camus <noamca@mellanox.com>
Acked-by: Gilad Ben-Yossef <giladby@mellanox.com>
---
 drivers/net/ethernet/ezchip/nps_enet.c |   15 +++++++++------
 drivers/net/ethernet/ezchip/nps_enet.h |    2 --
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
index 1f23845..25ac2de 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -145,7 +145,7 @@ static void nps_enet_tx_handler(struct net_device *ndev)
 	u32 tx_ctrl_nt = (tx_ctrl_value & TX_CTL_NT_MASK) >> TX_CTL_NT_SHIFT;
 
 	/* Check if we got TX */
-	if (!priv->tx_packet_sent || tx_ctrl_ct)
+	if (!priv->tx_skb || tx_ctrl_ct)
 		return;
 
 	/* Ack Tx ctrl register */
@@ -160,7 +160,7 @@ static void nps_enet_tx_handler(struct net_device *ndev)
 	}
 
 	dev_kfree_skb(priv->tx_skb);
-	priv->tx_packet_sent = false;
+	priv->tx_skb = NULL;
 
 	if (netif_queue_stopped(ndev))
 		netif_wake_queue(ndev);
@@ -217,7 +217,7 @@ static irqreturn_t nps_enet_irq_handler(s32 irq, void *dev_instance)
 	u32 tx_ctrl_ct = (tx_ctrl_value & TX_CTL_CT_MASK) >> TX_CTL_CT_SHIFT;
 	u32 rx_ctrl_cr = (rx_ctrl_value & RX_CTL_CR_MASK) >> RX_CTL_CR_SHIFT;
 
-	if ((!tx_ctrl_ct && priv->tx_packet_sent) || rx_ctrl_cr)
+	if ((!tx_ctrl_ct && priv->tx_skb) || rx_ctrl_cr)
 		if (likely(napi_schedule_prep(&priv->napi))) {
 			nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
 			__napi_schedule(&priv->napi);
@@ -387,8 +387,6 @@ static void nps_enet_send_frame(struct net_device *ndev,
 	/* Write the length of the Frame */
 	tx_ctrl_value |= length << TX_CTL_NT_SHIFT;
 
-	/* Indicate SW is done */
-	priv->tx_packet_sent = true;
 	tx_ctrl_value |= NPS_ENET_ENABLE << TX_CTL_CT_SHIFT;
 	/* Send Frame */
 	nps_enet_reg_set(priv, NPS_ENET_REG_TX_CTL, tx_ctrl_value);
@@ -465,7 +463,7 @@ static s32 nps_enet_open(struct net_device *ndev)
 	s32 err;
 
 	/* Reset private variables */
-	priv->tx_packet_sent = false;
+	priv->tx_skb = NULL;
 	priv->ge_mac_cfg_2_value = 0;
 	priv->ge_mac_cfg_3_value = 0;
 
@@ -534,6 +532,11 @@ static netdev_tx_t nps_enet_start_xmit(struct sk_buff *skb,
 
 	priv->tx_skb = skb;
 
+	/* make sure tx_skb is actually written to the memory
+	 * before the HW is informed and the IRQ is fired.
+	 */
+	wmb();
+
 	nps_enet_send_frame(ndev, skb);
 
 	return NETDEV_TX_OK;
diff --git a/drivers/net/ethernet/ezchip/nps_enet.h b/drivers/net/ethernet/ezchip/nps_enet.h
index d0cab60..3939ca2 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.h
+++ b/drivers/net/ethernet/ezchip/nps_enet.h
@@ -165,14 +165,12 @@
  * struct nps_enet_priv - Storage of ENET's private information.
  * @regs_base:      Base address of ENET memory-mapped control registers.
  * @irq:            For RX/TX IRQ number.
- * @tx_packet_sent: SW indication if frame is being sent.
  * @tx_skb:         socket buffer of sent frame.
  * @napi:           Structure for NAPI.
  */
 struct nps_enet_priv {
 	void __iomem *regs_base;
 	s32 irq;
-	bool tx_packet_sent;
 	struct sk_buff *tx_skb;
 	struct napi_struct napi;
 	u32 ge_mac_cfg_2_value;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v3 2/2] net: nps_enet: bug fix - handle lost tx interrupts
  2016-05-09 17:13 [PATCH v3 0/2] Net driver bugs fix Elad Kanfi
  2016-05-09 17:13 ` [PATCH v3 1/2] net: nps_enet: Tx handler synchronization Elad Kanfi
@ 2016-05-09 17:13 ` Elad Kanfi
  2016-05-10 18:40 ` [PATCH v3 0/2] Net driver bugs fix David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: Elad Kanfi @ 2016-05-09 17:13 UTC (permalink / raw)
  To: davem; +Cc: noamca, linux-kernel, abrodkin, eladkan, netdev

From: Elad Kanfi <eladkan@mellanox.com>

The tx interrupt is of edge type, and in case such interrupt is triggered
while it is masked it will not be handled even after tx interrupts are
re-enabled in the end of NAPI poll.
This will cause tx network to stop in the following scenario:
 * Rx is being handled, hence interrupts are masked.
 * Tx interrupt is triggered after checking if there is some tx to handle
   and before re-enabling the interrupts.
In this situation only rx transaction will release tx requests.

In order to handle the tx that was missed( if there was one ),
a NAPI reschdule was added after enabling the interrupts.

Signed-off-by: Elad Kanfi <eladkan@mellanox.com>
Acked-by: Noam Camus <noamca@mellanox.com>
Acked-by: Gilad Ben-Yossef <giladby@mellanox.com>
---
 drivers/net/ethernet/ezchip/nps_enet.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
index 25ac2de..085f912 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -183,6 +183,9 @@ static int nps_enet_poll(struct napi_struct *napi, int budget)
 	work_done = nps_enet_rx_handler(ndev);
 	if (work_done < budget) {
 		u32 buf_int_enable_value = 0;
+		u32 tx_ctrl_value = nps_enet_reg_get(priv, NPS_ENET_REG_TX_CTL);
+		u32 tx_ctrl_ct =
+			(tx_ctrl_value & TX_CTL_CT_MASK) >> TX_CTL_CT_SHIFT;
 
 		napi_complete(napi);
 
@@ -192,6 +195,18 @@ static int nps_enet_poll(struct napi_struct *napi, int budget)
 
 		nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE,
 				 buf_int_enable_value);
+
+		/* in case we will get a tx interrupt while interrupts
+		 * are masked, we will lose it since the tx is edge interrupt.
+		 * specifically, while executing the code section above,
+		 * between nps_enet_tx_handler and the interrupts enable, all
+		 * tx requests will be stuck until we will get an rx interrupt.
+		 * the two code lines below will solve this situation by
+		 * re-adding ourselves to the poll list.
+		 */
+
+		if (priv->tx_skb && !tx_ctrl_ct)
+			napi_reschedule(napi);
 	}
 
 	return work_done;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3 0/2] Net driver bugs fix
  2016-05-09 17:13 [PATCH v3 0/2] Net driver bugs fix Elad Kanfi
  2016-05-09 17:13 ` [PATCH v3 1/2] net: nps_enet: Tx handler synchronization Elad Kanfi
  2016-05-09 17:13 ` [PATCH v3 2/2] net: nps_enet: bug fix - handle lost tx interrupts Elad Kanfi
@ 2016-05-10 18:40 ` David Miller
  2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2016-05-10 18:40 UTC (permalink / raw)
  To: eladkan; +Cc: noamca, linux-kernel, abrodkin, netdev

From: Elad Kanfi <eladkan@mellanox.com>
Date: Mon, 9 May 2016 20:13:18 +0300

> Summary:
>  1. Bug description: TX done interrupts that arrives while interrupts
>     are masked, during NAPI poll, will not trigger an interrupt handling.
>     Since TX interrupt is of level edge we will lose the TX done interrupt.
>     As a result all pending tx frames will get no service.
> 
>     Solution: Check if there is a pending tx request after unmasking the
>     interrupt and if answer is yes then re-add ourselves to
>     the NAPI poll list.
> 
>  2. Bug description: CPU-A before sending a frame will set a variable
>     to true. CPU-B that executes the tx done interrupt service routine
>     might read a non valid value of that variable.
> 
>     Solution: Use the socket buffer pointer instead of the variable,
>     and add a write memory barrier at the tx sending function after
>     the pointer is set.

Series applied, thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-05-10 18:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-09 17:13 [PATCH v3 0/2] Net driver bugs fix Elad Kanfi
2016-05-09 17:13 ` [PATCH v3 1/2] net: nps_enet: Tx handler synchronization Elad Kanfi
2016-05-09 17:13 ` [PATCH v3 2/2] net: nps_enet: bug fix - handle lost tx interrupts Elad Kanfi
2016-05-10 18:40 ` [PATCH v3 0/2] Net driver bugs fix David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.