linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
@ 2020-05-27 13:47 Leon Romanovsky
  2020-05-27 20:32 ` Doug Ledford
  0 siblings, 1 reply; 4+ messages in thread
From: Leon Romanovsky @ 2020-05-27 13:47 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Valentine Fatiev, Alaa Hleihel, Alex Vesker, Erez Shitrit, linux-rdma

From: Valentine Fatiev <valentinef@mellanox.com>

While connected mode set and we have connected and datagram traffic
in parallel it might end with double free of datagram skb.

Current mechanism assumes that the order in the completion queue is the
same as the order of sent packets for all QPs. Order is kept only for
specific QP, in case of mixed UD and CM traffic we have few QPs(one UD
and few CM's) in parallel.

The problem:
----------------------------------------------------------

Transmit queue:
-----------------
UD skb pointer kept in queue itself, CM skb kept in spearate queue and
uses transmit queue as a placeholder to count the number of total
transmitted packets.

0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
------------------------------------------------------------
NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
    ^                                  ^
   tail                               head

Completion queue (problematic scenario) - the order not the same as in
the transmit queue:

  1  2  3  4  5  6  7  8  9
------------------------------------
 ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
------------------------------------

1. CM1 'wc' processing
   - skb freed in cm separate ring.
   - tx_tail of transmit queue increased although UD2 is not freed.
     Now driver assumes UD2 index is already freed and it could be used for
     new transmitted skb.

0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
------------------------------------------------------------
NL NL  UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
------------------------------------------------------------
        ^   ^                       ^
      (Bad)tail                    head
(Bad - Could be used for new SKB)

In this case (due to heavy load) UD2 skb pointer could be replaced by
new transmitted packet UD_NEW, as the driver assumes its free.
At this point we will have to process two 'wc' with same index but we
have only one pointer to free.
During second attempt to free the same skb we will have NULL pointer
exception.

2. UD2 'wc' processing
   - skb freed according the index we got from 'wc', but it was already
     overwritten by mistake. So actually the skb that was released is the
     skb of the new transmitted packet and not the original one.

3. UD_NEW 'wc' processing
   - attempt to free already freed skb. NUll pointer exception.

The fix:
-----------------------------------------------------------------------
The fix is to stop using the UD ring as a placeholder for CM packets,
the cyclic ring variables tx_head and tx_tail will manage the UD tx_ring,
a new cyclic variables global_tx_head and global_tx_tail are introduced
for managing and counting the overall outstanding sent packets, then the
send queue will be stopped and waken based on these variables only.

Note that no locking is needed since global_tx_head is updated in the xmit
flow and global_tx_tail is updated in the NAPI flow only.
A previous attempt tried to use one variable to count the outstanding sent
packets, but it did not work since xmit and NAPI flows can run at the same
time and the counter will be updated wrongly. Thus, we use the same simple
cyclic head and tail scheme that we have today for the UD tx_ring.

Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
Changelog:
v1:
 * Alaa rewrote the patch without atomic variables
v0: https://lore.kernel.org/linux-rdma/20200212072635.682689-5-leon@kernel.org/
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  4 ++++
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 15 +++++++++------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |  9 +++++++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 10 ++++++----
 4 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index e188a95984b5..9a3379c49541 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -377,8 +377,12 @@ struct ipoib_dev_priv {
 	struct ipoib_rx_buf *rx_ring;

 	struct ipoib_tx_buf *tx_ring;
+	/* cyclic ring variables for managing tx_ring, for UD only */
 	unsigned int	     tx_head;
 	unsigned int	     tx_tail;
+	/* cyclic ring variables for counting overall outstanding send WRs */
+	unsigned int	     global_tx_head;
+	unsigned int	     global_tx_tail;
 	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
 	struct ib_ud_wr      tx_wr;
 	struct ib_wc	     send_wc[MAX_SEND_CQE];
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index c59e00a0881f..9bf0fa30df28 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -756,7 +756,8 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		return;
 	}

-	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
+	if ((priv->global_tx_head - priv->global_tx_tail) ==
+	    ipoib_sendq_size - 1) {
 		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 			  tx->qp->qp_num);
 		netif_stop_queue(dev);
@@ -786,7 +787,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 	} else {
 		netif_trans_update(dev);
 		++tx->tx_head;
-		++priv->tx_head;
+		++priv->global_tx_head;
 	}
 }

@@ -820,10 +821,11 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 	netif_tx_lock(dev);

 	++tx->tx_tail;
-	++priv->tx_tail;
+	++priv->global_tx_tail;

 	if (unlikely(netif_queue_stopped(dev) &&
-		     (priv->tx_head - priv->tx_tail) <= ipoib_sendq_size >> 1 &&
+		     ((priv->global_tx_head - priv->global_tx_tail) <=
+		      ipoib_sendq_size >> 1) &&
 		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
 		netif_wake_queue(dev);

@@ -1232,8 +1234,9 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 		dev_kfree_skb_any(tx_req->skb);
 		netif_tx_lock_bh(p->dev);
 		++p->tx_tail;
-		++priv->tx_tail;
-		if (unlikely(priv->tx_head - priv->tx_tail == ipoib_sendq_size >> 1) &&
+		++priv->global_tx_tail;
+		if (unlikely((priv->global_tx_head - priv->global_tx_tail) <=
+			     ipoib_sendq_size >> 1) &&
 		    netif_queue_stopped(p->dev) &&
 		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 			netif_wake_queue(p->dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index c332b4761816..da3c5315bbb5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -407,9 +407,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 	dev_kfree_skb_any(tx_req->skb);

 	++priv->tx_tail;
+	++priv->global_tx_tail;

 	if (unlikely(netif_queue_stopped(dev) &&
-		     ((priv->tx_head - priv->tx_tail) <= ipoib_sendq_size >> 1) &&
+		     ((priv->global_tx_head - priv->global_tx_tail) <=
+		      ipoib_sendq_size >> 1) &&
 		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
 		netif_wake_queue(dev);

@@ -634,7 +636,8 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb,
 	else
 		priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
 	/* increase the tx_head after send success, but use it for queue state */
-	if (priv->tx_head - priv->tx_tail == ipoib_sendq_size - 1) {
+	if ((priv->global_tx_head - priv->global_tx_tail) ==
+	    ipoib_sendq_size - 1) {
 		ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 		netif_stop_queue(dev);
 	}
@@ -662,6 +665,7 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb,

 		rc = priv->tx_head;
 		++priv->tx_head;
+		++priv->global_tx_head;
 	}
 	return rc;
 }
@@ -807,6 +811,7 @@ int ipoib_ib_dev_stop_default(struct net_device *dev)
 				ipoib_dma_unmap_tx(priv, tx_req);
 				dev_kfree_skb_any(tx_req->skb);
 				++priv->tx_tail;
+				++priv->global_tx_tail;
 			}

 			for (i = 0; i < ipoib_recvq_size; ++i) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index d12e5c9c38af..3cfb682b91b0 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1183,9 +1183,11 @@ static void ipoib_timeout(struct net_device *dev, unsigned int txqueue)

 	ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
 		   jiffies_to_msecs(jiffies - dev_trans_start(dev)));
-	ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n",
-		   netif_queue_stopped(dev),
-		   priv->tx_head, priv->tx_tail);
+	ipoib_warn(priv,
+		   "queue stopped %d, tx_head %u, tx_tail %u, global_tx_head %u, global_tx_tail %u\n",
+		   netif_queue_stopped(dev), priv->tx_head, priv->tx_tail,
+		   priv->global_tx_head, priv->global_tx_tail);
+
 	/* XXX reset QP, etc. */
 }

@@ -1700,7 +1702,7 @@ static int ipoib_dev_init_default(struct net_device *dev)
 		goto out_rx_ring_cleanup;
 	}

-	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
+	/* priv->tx_head, tx_tail and global_tx_tail/head are already 0 */

 	if (ipoib_transport_dev_init(dev, priv->ca)) {
 		pr_warn("%s: ipoib_transport_dev_init failed\n",
--
2.26.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-05-27 13:47 [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
@ 2020-05-27 20:32 ` Doug Ledford
  2020-05-28  0:16   ` Jason Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Doug Ledford @ 2020-05-27 20:32 UTC (permalink / raw)
  To: Leon Romanovsky, Jason Gunthorpe
  Cc: Valentine Fatiev, Alaa Hleihel, Alex Vesker, Erez Shitrit, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 10480 bytes --]

On Wed, 2020-05-27 at 16:47 +0300, Leon Romanovsky wrote:
> From: Valentine Fatiev <valentinef@mellanox.com>
> 
> While connected mode set and we have connected and datagram traffic
> in parallel it might end with double free of datagram skb.
> 
> Current mechanism assumes that the order in the completion queue is
> the
> same as the order of sent packets for all QPs. Order is kept only for
> specific QP, in case of mixed UD and CM traffic we have few QPs(one UD
> and few CM's) in parallel.
> 
> The problem:
> ----------------------------------------------------------
> 
> Transmit queue:
> -----------------
> UD skb pointer kept in queue itself, CM skb kept in spearate queue and
> uses transmit queue as a placeholder to count the number of total
> transmitted packets.
> 
> 0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
> ------------------------------------------------------------
> NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
> ------------------------------------------------------------
>     ^                                  ^
>    tail                               head
> 
> Completion queue (problematic scenario) - the order not the same as in
> the transmit queue:
> 
>   1  2  3  4  5  6  7  8  9
> ------------------------------------
>  ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
> ------------------------------------
> 
> 1. CM1 'wc' processing
>    - skb freed in cm separate ring.
>    - tx_tail of transmit queue increased although UD2 is not freed.
>      Now driver assumes UD2 index is already freed and it could be
> used for
>      new transmitted skb.
> 
> 0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
> ------------------------------------------------------------
> NL NL  UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
> ------------------------------------------------------------
>         ^   ^                       ^
>       (Bad)tail                    head
> (Bad - Could be used for new SKB)
> 
> In this case (due to heavy load) UD2 skb pointer could be replaced by
> new transmitted packet UD_NEW, as the driver assumes its free.
> At this point we will have to process two 'wc' with same index but we
> have only one pointer to free.
> During second attempt to free the same skb we will have NULL pointer
> exception.
> 
> 2. UD2 'wc' processing
>    - skb freed according the index we got from 'wc', but it was
> already
>      overwritten by mistake. So actually the skb that was released is
> the
>      skb of the new transmitted packet and not the original one.
> 
> 3. UD_NEW 'wc' processing
>    - attempt to free already freed skb. NUll pointer exception.
> 
> The fix:
> ----------------------------------------------------------------------
> -
> The fix is to stop using the UD ring as a placeholder for CM packets,
> the cyclic ring variables tx_head and tx_tail will manage the UD
> tx_ring,
> a new cyclic variables global_tx_head and global_tx_tail are
> introduced
> for managing and counting the overall outstanding sent packets, then
> the
> send queue will be stopped and waken based on these variables only.
> 
> Note that no locking is needed since global_tx_head is updated in the
> xmit
> flow and global_tx_tail is updated in the NAPI flow only.
> A previous attempt tried to use one variable to count the outstanding
> sent
> packets, but it did not work since xmit and NAPI flows can run at the
> same
> time and the counter will be updated wrongly. Thus, we use the same
> simple
> cyclic head and tail scheme that we have today for the UD tx_ring.
> 
> Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable
> in all modes")
> Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
> Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

This seems like a pretty important fix that should go to for-rc, not
for-next.

Regardless, looks good to me.
Acked-by: Doug Ledford <dledford@redhat.com>

> ---
> Changelog:
> v1:
>  * Alaa rewrote the patch without atomic variables
> v0: 
> https://lore.kernel.org/linux-rdma/20200212072635.682689-5-leon@kernel.org/
> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h      |  4 ++++
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 15 +++++++++------
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c   |  9 +++++++--
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 10 ++++++----
>  4 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index e188a95984b5..9a3379c49541 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -377,8 +377,12 @@ struct ipoib_dev_priv {
>  	struct ipoib_rx_buf *rx_ring;
> 
>  	struct ipoib_tx_buf *tx_ring;
> +	/* cyclic ring variables for managing tx_ring, for UD only */
>  	unsigned int	     tx_head;
>  	unsigned int	     tx_tail;
> +	/* cyclic ring variables for counting overall outstanding send
> WRs */
> +	unsigned int	     global_tx_head;
> +	unsigned int	     global_tx_tail;
>  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
>  	struct ib_ud_wr      tx_wr;
>  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index c59e00a0881f..9bf0fa30df28 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -756,7 +756,8 @@ void ipoib_cm_send(struct net_device *dev, struct
> sk_buff *skb, struct ipoib_cm_
>  		return;
>  	}
> 
> -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> +	if ((priv->global_tx_head - priv->global_tx_tail) ==
> +	    ipoib_sendq_size - 1) {
>  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net
> queue\n",
>  			  tx->qp->qp_num);
>  		netif_stop_queue(dev);
> @@ -786,7 +787,7 @@ void ipoib_cm_send(struct net_device *dev, struct
> sk_buff *skb, struct ipoib_cm_
>  	} else {
>  		netif_trans_update(dev);
>  		++tx->tx_head;
> -		++priv->tx_head;
> +		++priv->global_tx_head;
>  	}
>  }
> 
> @@ -820,10 +821,11 @@ void ipoib_cm_handle_tx_wc(struct net_device
> *dev, struct ib_wc *wc)
>  	netif_tx_lock(dev);
> 
>  	++tx->tx_tail;
> -	++priv->tx_tail;
> +	++priv->global_tx_tail;
> 
>  	if (unlikely(netif_queue_stopped(dev) &&
> -		     (priv->tx_head - priv->tx_tail) <= ipoib_sendq_size
> >> 1 &&
> +		     ((priv->global_tx_head - priv->global_tx_tail) <=
> +		      ipoib_sendq_size >> 1) &&
>  		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
>  		netif_wake_queue(dev);
> 
> @@ -1232,8 +1234,9 @@ static void ipoib_cm_tx_destroy(struct
> ipoib_cm_tx *p)
>  		dev_kfree_skb_any(tx_req->skb);
>  		netif_tx_lock_bh(p->dev);
>  		++p->tx_tail;
> -		++priv->tx_tail;
> -		if (unlikely(priv->tx_head - priv->tx_tail ==
> ipoib_sendq_size >> 1) &&
> +		++priv->global_tx_tail;
> +		if (unlikely((priv->global_tx_head - priv-
> >global_tx_tail) <=
> +			     ipoib_sendq_size >> 1) &&
>  		    netif_queue_stopped(p->dev) &&
>  		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
>  			netif_wake_queue(p->dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index c332b4761816..da3c5315bbb5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -407,9 +407,11 @@ static void ipoib_ib_handle_tx_wc(struct
> net_device *dev, struct ib_wc *wc)
>  	dev_kfree_skb_any(tx_req->skb);
> 
>  	++priv->tx_tail;
> +	++priv->global_tx_tail;
> 
>  	if (unlikely(netif_queue_stopped(dev) &&
> -		     ((priv->tx_head - priv->tx_tail) <=
> ipoib_sendq_size >> 1) &&
> +		     ((priv->global_tx_head - priv->global_tx_tail) <=
> +		      ipoib_sendq_size >> 1) &&
>  		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
>  		netif_wake_queue(dev);
> 
> @@ -634,7 +636,8 @@ int ipoib_send(struct net_device *dev, struct
> sk_buff *skb,
>  	else
>  		priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
>  	/* increase the tx_head after send success, but use it for queue
> state */
> -	if (priv->tx_head - priv->tx_tail == ipoib_sendq_size - 1) {
> +	if ((priv->global_tx_head - priv->global_tx_tail) ==
> +	    ipoib_sendq_size - 1) {
>  		ipoib_dbg(priv, "TX ring full, stopping kernel net
> queue\n");
>  		netif_stop_queue(dev);
>  	}
> @@ -662,6 +665,7 @@ int ipoib_send(struct net_device *dev, struct
> sk_buff *skb,
> 
>  		rc = priv->tx_head;
>  		++priv->tx_head;
> +		++priv->global_tx_head;
>  	}
>  	return rc;
>  }
> @@ -807,6 +811,7 @@ int ipoib_ib_dev_stop_default(struct net_device
> *dev)
>  				ipoib_dma_unmap_tx(priv, tx_req);
>  				dev_kfree_skb_any(tx_req->skb);
>  				++priv->tx_tail;
> +				++priv->global_tx_tail;
>  			}
> 
>  			for (i = 0; i < ipoib_recvq_size; ++i) {
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index d12e5c9c38af..3cfb682b91b0 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1183,9 +1183,11 @@ static void ipoib_timeout(struct net_device
> *dev, unsigned int txqueue)
> 
>  	ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
>  		   jiffies_to_msecs(jiffies - dev_trans_start(dev)));
> -	ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n",
> -		   netif_queue_stopped(dev),
> -		   priv->tx_head, priv->tx_tail);
> +	ipoib_warn(priv,
> +		   "queue stopped %d, tx_head %u, tx_tail %u,
> global_tx_head %u, global_tx_tail %u\n",
> +		   netif_queue_stopped(dev), priv->tx_head, priv-
> >tx_tail,
> +		   priv->global_tx_head, priv->global_tx_tail);
> +
>  	/* XXX reset QP, etc. */
>  }
> 
> @@ -1700,7 +1702,7 @@ static int ipoib_dev_init_default(struct
> net_device *dev)
>  		goto out_rx_ring_cleanup;
>  	}
> 
> -	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
> +	/* priv->tx_head, tx_tail and global_tx_tail/head are already 0
> */
> 
>  	if (ipoib_transport_dev_init(dev, priv->ca)) {
>  		pr_warn("%s: ipoib_transport_dev_init failed\n",
> --
> 2.26.2
> 

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-05-27 20:32 ` Doug Ledford
@ 2020-05-28  0:16   ` Jason Gunthorpe
  2020-05-28  0:30     ` Doug Ledford
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Gunthorpe @ 2020-05-28  0:16 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Leon Romanovsky, Valentine Fatiev, Alaa Hleihel, Alex Vesker,
	Erez Shitrit, linux-rdma

On Wed, May 27, 2020 at 04:32:45PM -0400, Doug Ledford wrote:
> On Wed, 2020-05-27 at 16:47 +0300, Leon Romanovsky wrote:
> > From: Valentine Fatiev <valentinef@mellanox.com>
> > 
> > While connected mode set and we have connected and datagram traffic
> > in parallel it might end with double free of datagram skb.
> > 
> > Current mechanism assumes that the order in the completion queue is
> > the
> > same as the order of sent packets for all QPs. Order is kept only for
> > specific QP, in case of mixed UD and CM traffic we have few QPs(one UD
> > and few CM's) in parallel.
> > 
> > The problem:
> > 
> > Transmit queue:
> > UD skb pointer kept in queue itself, CM skb kept in spearate queue and
> > uses transmit queue as a placeholder to count the number of total
> > transmitted packets.
> > 
> > 0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
> > NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
> >     ^                                  ^
> >    tail                               head
> > 
> > Completion queue (problematic scenario) - the order not the same as in
> > the transmit queue:
> > 
> >   1  2  3  4  5  6  7  8  9
> >  ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
> > 
> > 1. CM1 'wc' processing
> >    - skb freed in cm separate ring.
> >    - tx_tail of transmit queue increased although UD2 is not freed.
> >      Now driver assumes UD2 index is already freed and it could be
> > used for
> >      new transmitted skb.
> > 
> > 0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
> > NL NL  UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
> >         ^   ^                       ^
> >       (Bad)tail                    head
> > (Bad - Could be used for new SKB)
> > 
> > In this case (due to heavy load) UD2 skb pointer could be replaced by
> > new transmitted packet UD_NEW, as the driver assumes its free.
> > At this point we will have to process two 'wc' with same index but we
> > have only one pointer to free.
> > During second attempt to free the same skb we will have NULL pointer
> > exception.
> > 
> > 2. UD2 'wc' processing
> >    - skb freed according the index we got from 'wc', but it was
> > already
> >      overwritten by mistake. So actually the skb that was released is
> > the
> >      skb of the new transmitted packet and not the original one.
> > 
> > 3. UD_NEW 'wc' processing
> >    - attempt to free already freed skb. NUll pointer exception.
> > 
> > The fix:
> > -
> > The fix is to stop using the UD ring as a placeholder for CM packets,
> > the cyclic ring variables tx_head and tx_tail will manage the UD
> > tx_ring,
> > a new cyclic variables global_tx_head and global_tx_tail are
> > introduced
> > for managing and counting the overall outstanding sent packets, then
> > the
> > send queue will be stopped and waken based on these variables only.
> > 
> > Note that no locking is needed since global_tx_head is updated in the
> > xmit
> > flow and global_tx_tail is updated in the NAPI flow only.
> > A previous attempt tried to use one variable to count the outstanding
> > sent
> > packets, but it did not work since xmit and NAPI flows can run at the
> > same
> > time and the counter will be updated wrongly. Thus, we use the same
> > simple
> > cyclic head and tail scheme that we have today for the UD tx_ring.
> > 
> > Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable
> > in all modes")
> > Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
> > Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> 
> This seems like a pretty important fix that should go to for-rc, not
> for-next.
> 
> Regardless, looks good to me.
> Acked-by: Doug Ledford <dledford@redhat.com>

Sure, it looks reasonable for -rc, but the crash is not so common

Applied to for-rc, thanks

Jason

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-05-28  0:16   ` Jason Gunthorpe
@ 2020-05-28  0:30     ` Doug Ledford
  0 siblings, 0 replies; 4+ messages in thread
From: Doug Ledford @ 2020-05-28  0:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Valentine Fatiev, Alaa Hleihel, Alex Vesker,
	Erez Shitrit, linux-rdma

[-- Attachment #1: Type: text/plain, Size: 562 bytes --]

On Wed, 2020-05-27 at 21:16 -0300, Jason Gunthorpe wrote:
> > This seems like a pretty important fix that should go to for-rc, not
> > for-next.
> > 
> > Regardless, looks good to me.
> > Acked-by: Doug Ledford <dledford@redhat.com>
> 
> Sure, it looks reasonable for -rc, but the crash is not so common

Yeah, it's not that easy to trigger, but it's an oops when you do ;-)

> Applied to for-rc, thanks

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-05-28  0:30 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 13:47 [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
2020-05-27 20:32 ` Doug Ledford
2020-05-28  0:16   ` Jason Gunthorpe
2020-05-28  0:30     ` Doug Ledford

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).