All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net 0/4] net/mlx4: Bug fixes for the mlx4_en driver
@ 2012-06-21  9:19 Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 1/4] net/mlx4_en: Set correct port parameters during device initialization Yevgeny Petrilin
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-06-21  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev

This is a set of 4 bug fixes generated agains net tree:

Yevgeny Petrilin (4):
	net/mlx4_en: Set correct port parameters during device initialization
	net/mlx4: Use single completion vector after NOP failure
	net/mlx4_en: Release QP range in free_resources
	net/mlx4_en: Use atomic counter to decide when queue is full

 en_netdev.c |   18 ++++++++++++------
 en_tx.c     |   16 ++++++++--------
 main.c      |    2 ++
 mlx4_en.h   |    2 ++
 4 files changed, 24 insertions(+), 14 deletions(-)

Thanks,
Yevgeny

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] net/mlx4_en: Set correct port parameters during device initialization
  2012-06-21  9:19 [PATCH net 0/4] net/mlx4: Bug fixes for the mlx4_en driver Yevgeny Petrilin
@ 2012-06-21  9:19 ` Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 2/4] net/mlx4: Use single completion vector after NOP failure Yevgeny Petrilin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-06-21  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yevgeny Petrilin

Set valid port parameters: MTU and flow control configuration when
configuring the port during HW device initialization,
prior to the net device open() being called.
Using  invalid parameters (such as all zeros)
could lead to bad firmware behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 926d8aa..a80280e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1204,9 +1204,11 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	en_warn(priv, "Using %d RX rings\n", prof->rx_ring_num);
 
 	/* Configure port */
+	mlx4_en_calc_rx_buf(dev);
 	err = mlx4_SET_PORT_general(mdev->dev, priv->port,
-				    MLX4_EN_MIN_MTU,
-				    0, 0, 0, 0);
+				    priv->rx_skb_size + ETH_FCS_LEN,
+				    prof->tx_pause, prof->tx_ppp,
+				    prof->rx_pause, prof->rx_ppp);
 	if (err) {
 		en_err(priv, "Failed setting port general configurations "
 		       "for port %d, with error %d\n", priv->port, err);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] net/mlx4: Use single completion vector after NOP failure
  2012-06-21  9:19 [PATCH net 0/4] net/mlx4: Bug fixes for the mlx4_en driver Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 1/4] net/mlx4_en: Set correct port parameters during device initialization Yevgeny Petrilin
@ 2012-06-21  9:19 ` Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 3/4] net/mlx4_en: Release QP range in free_resources Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full Yevgeny Petrilin
  3 siblings, 0 replies; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-06-21  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yevgeny Petrilin

Fix a crash at the error flow of NOP command which caused the driver to try and use
a completion vector which wasn't allocated.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/main.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 18c8deb..c91a2b8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -1978,6 +1978,8 @@ slave_start:
 	if (err == -EBUSY && (dev->flags & MLX4_FLAG_MSI_X) &&
 	    !mlx4_is_mfunc(dev)) {
 		dev->flags &= ~MLX4_FLAG_MSI_X;
+		dev->caps.num_comp_vectors = 1;
+		dev->caps.comp_pool	   = 0;
 		pci_disable_msix(pdev);
 		err = mlx4_setup_hca(dev);
 	}
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] net/mlx4_en: Release QP range in free_resources
  2012-06-21  9:19 [PATCH net 0/4] net/mlx4: Bug fixes for the mlx4_en driver Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 1/4] net/mlx4_en: Set correct port parameters during device initialization Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 2/4] net/mlx4: Use single completion vector after NOP failure Yevgeny Petrilin
@ 2012-06-21  9:19 ` Yevgeny Petrilin
  2012-06-21  9:19 ` [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full Yevgeny Petrilin
  3 siblings, 0 replies; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-06-21  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yevgeny Petrilin

Add a missing resource release in ring cleanup.
Not doing this leaves a range of QPs that are being reserved,
and no one can use them.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |   12 ++++++++----
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |    1 +
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index a80280e..073b85b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -929,15 +929,20 @@ void mlx4_en_free_resources(struct mlx4_en_priv *priv)
 		if (priv->rx_cq[i].buf)
 			mlx4_en_destroy_cq(priv, &priv->rx_cq[i]);
 	}
+
+	if (priv->base_tx_qpn) {
+		mlx4_qp_release_range(priv->mdev->dev, priv->base_tx_qpn, priv->tx_ring_num);
+		priv->base_tx_qpn = 0;
+	}
 }
 
 int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 {
 	struct mlx4_en_port_profile *prof = priv->prof;
 	int i;
-	int base_tx_qpn, err;
+	int err;
 
-	err = mlx4_qp_reserve_range(priv->mdev->dev, priv->tx_ring_num, 256, &base_tx_qpn);
+	err = mlx4_qp_reserve_range(priv->mdev->dev, priv->tx_ring_num, 256, &priv->base_tx_qpn);
 	if (err) {
 		en_err(priv, "failed reserving range for TX rings\n");
 		return err;
@@ -949,7 +954,7 @@ int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 				      prof->tx_ring_size, i, TX))
 			goto err;
 
-		if (mlx4_en_create_tx_ring(priv, &priv->tx_ring[i], base_tx_qpn + i,
+		if (mlx4_en_create_tx_ring(priv, &priv->tx_ring[i], priv->base_tx_qpn + i,
 					   prof->tx_ring_size, TXBB_SIZE))
 			goto err;
 	}
@@ -969,7 +974,6 @@ int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 
 err:
 	en_err(priv, "Failed to allocate NIC resources\n");
-	mlx4_qp_release_range(priv->mdev->dev, base_tx_qpn, priv->tx_ring_num);
 	return -ENOMEM;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 6ae3509..225c20d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -495,6 +495,7 @@ struct mlx4_en_priv {
 	int vids[128];
 	bool wol;
 	struct device *ddev;
+	int base_tx_qpn;
 
 #ifdef CONFIG_MLX4_EN_DCB
 	struct ieee_ets ets;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full
  2012-06-21  9:19 [PATCH net 0/4] net/mlx4: Bug fixes for the mlx4_en driver Yevgeny Petrilin
                   ` (2 preceding siblings ...)
  2012-06-21  9:19 ` [PATCH 3/4] net/mlx4_en: Release QP range in free_resources Yevgeny Petrilin
@ 2012-06-21  9:19 ` Yevgeny Petrilin
  2012-06-23  0:23   ` David Miller
  3 siblings, 1 reply; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-06-21  9:19 UTC (permalink / raw)
  To: davem; +Cc: netdev, Yevgeny Petrilin

The Transmit and transmit completion flows execute from different contexts,
which are not synchronized. Hence naive reading the of consumer index might
give wrong value by the time it is being used, That could lead to a state of transmit timeout.
Fix that by using atomic variable to maintain that index.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   16 ++++++++--------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |    1 +
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 019d856..f4b4703 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -165,6 +165,7 @@ int mlx4_en_activate_tx_ring(struct mlx4_en_priv *priv,
 	ring->last_nr_txbb = 1;
 	ring->poll_cnt = 0;
 	ring->blocked = 0;
+	atomic_set(&ring->inflight, 0);
 	memset(ring->tx_info, 0, ring->size * sizeof(struct mlx4_en_tx_info));
 	memset(ring->buf, 0, ring->buf_size);
 
@@ -364,15 +365,13 @@ static void mlx4_en_process_tx_cq(struct net_device *dev, struct mlx4_en_cq *cq)
 	wmb();
 	ring->cons += txbbs_skipped;
 	netdev_tx_completed_queue(ring->tx_queue, packets, bytes);
+	atomic_sub(txbbs_skipped, &ring->inflight);
 
 	/* Wakeup Tx queue if this ring stopped it */
-	if (unlikely(ring->blocked)) {
-		if ((u32) (ring->prod - ring->cons) <=
-		     ring->size - HEADROOM - MAX_DESC_TXBBS) {
-			ring->blocked = 0;
-			netif_tx_wake_queue(ring->tx_queue);
-			priv->port_stats.wake_queue++;
-		}
+	if (unlikely(ring->blocked && txbbs_skipped > 0)) {
+		ring->blocked = 0;
+		netif_tx_wake_queue(ring->tx_queue);
+		priv->port_stats.wake_queue++;
 	}
 }
 
@@ -588,7 +587,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		vlan_tag = vlan_tx_tag_get(skb);
 
 	/* Check available TXBBs And 2K spare for prefetch */
-	if (unlikely(((int)(ring->prod - ring->cons)) >
+	if (unlikely(atomic_read(&ring->inflight) >
 		     ring->size - HEADROOM - MAX_DESC_TXBBS)) {
 		/* every full Tx ring stops queue */
 		netif_tx_stop_queue(ring->tx_queue);
@@ -710,6 +709,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	ring->prod += nr_txbb;
+	atomic_add(nr_txbb, &ring->inflight);
 
 	/* If we used a bounce buffer then copy descriptor back into place */
 	if (bounce)
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 225c20d..6a8a69d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -257,6 +257,7 @@ struct mlx4_en_tx_ring {
 	struct mlx4_bf bf;
 	bool bf_enabled;
 	struct netdev_queue *tx_queue;
+	atomic_t inflight;
 };
 
 struct mlx4_en_rx_desc {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full
  2012-06-21  9:19 ` [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full Yevgeny Petrilin
@ 2012-06-23  0:23   ` David Miller
  2012-06-25  9:00     ` David Laight
  2012-06-25 13:06     ` Yevgeny Petrilin
  0 siblings, 2 replies; 9+ messages in thread
From: David Miller @ 2012-06-23  0:23 UTC (permalink / raw)
  To: yevgenyp; +Cc: netdev

From: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Date: Thu, 21 Jun 2012 12:19:17 +0300

> The Transmit and transmit completion flows execute from different contexts,
> which are not synchronized. Hence naive reading the of consumer index might
> give wrong value by the time it is being used, That could lead to a state of transmit timeout.
> Fix that by using atomic variable to maintain that index.
> 
> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>

I'm not convinced.  There is only one place that actually changes
the counter.

So it seems more like you have a missing memory barrier somewhere.

Other drivers do not need to use something as expansive as an atomic
variable for this and neither should you.

I'm not applying this patch series, you'll need to resubmit it in
it's entirety once you fix this patch.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full
  2012-06-23  0:23   ` David Miller
@ 2012-06-25  9:00     ` David Laight
  2012-06-25 10:12       ` Eric Dumazet
  2012-06-25 13:06     ` Yevgeny Petrilin
  1 sibling, 1 reply; 9+ messages in thread
From: David Laight @ 2012-06-25  9:00 UTC (permalink / raw)
  To: David Miller, yevgenyp; +Cc: netdev

 
> > The Transmit and transmit completion flows execute from different
contexts,
> > which are not synchronized. Hence naive reading the of consumer
index might
> > give wrong value by the time it is being used, That could lead to a
state of transmit timeout.
> > Fix that by using atomic variable to maintain that index.
> > 
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> I'm not convinced.  There is only one place that actually changes
> the counter.
> 
> So it seems more like you have a missing memory barrier somewhere.

Or just keep the two ring indexes - instead of keeping the
number of 'active' entries as well.
Then you don't have a variable which the tx setup and
tx completion routines both update.

	David

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full
  2012-06-25  9:00     ` David Laight
@ 2012-06-25 10:12       ` Eric Dumazet
  0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2012-06-25 10:12 UTC (permalink / raw)
  To: David Laight; +Cc: David Miller, yevgenyp, netdev

On Mon, 2012-06-25 at 10:00 +0100, David Laight wrote:
> > > The Transmit and transmit completion flows execute from different
> contexts,
> > > which are not synchronized. Hence naive reading the of consumer
> index might
> > > give wrong value by the time it is being used, That could lead to a
> state of transmit timeout.
> > > Fix that by using atomic variable to maintain that index.
> > > 
> > > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > 
> > I'm not convinced.  There is only one place that actually changes
> > the counter.
> > 
> > So it seems more like you have a missing memory barrier somewhere.
> 
> Or just keep the two ring indexes - instead of keeping the
> number of 'active' entries as well.
> Then you don't have a variable which the tx setup and
> tx completion routines both update.

This is what was implied by David.

Using a producer/consumer index and appropriate memory barriers.

start_xmit() and tx completion can be truly lockless and atomicless in
their fast path.

There are many drivers doing that correctly.

tg3 driver is a good example.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full
  2012-06-23  0:23   ` David Miller
  2012-06-25  9:00     ` David Laight
@ 2012-06-25 13:06     ` Yevgeny Petrilin
  1 sibling, 0 replies; 9+ messages in thread
From: Yevgeny Petrilin @ 2012-06-25 13:06 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

> > The Transmit and transmit completion flows execute from different
> > contexts, which are not synchronized. Hence naive reading the of
> > consumer index might give wrong value by the time it is being used, That could lead to a state of transmit timeout.
> > Fix that by using atomic variable to maintain that index.
> >
> > Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> 
> I'm not convinced.  There is only one place that actually changes the counter.
> 
> So it seems more like you have a missing memory barrier somewhere.
> 
> Other drivers do not need to use something as expansive as an atomic
> variable for this and neither should you.
> 
> I'm not applying this patch series, you'll need to resubmit it in it's
> entirety once you fix this patch.

Thanks,
I'll resubmit the other 3 and continue to work on this one.

Thanks,
Yevgeny

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-06-25 13:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-21  9:19 [PATCH net 0/4] net/mlx4: Bug fixes for the mlx4_en driver Yevgeny Petrilin
2012-06-21  9:19 ` [PATCH 1/4] net/mlx4_en: Set correct port parameters during device initialization Yevgeny Petrilin
2012-06-21  9:19 ` [PATCH 2/4] net/mlx4: Use single completion vector after NOP failure Yevgeny Petrilin
2012-06-21  9:19 ` [PATCH 3/4] net/mlx4_en: Release QP range in free_resources Yevgeny Petrilin
2012-06-21  9:19 ` [PATCH 4/4] net/mlx4_en: Use atomic counter to decide when queue is full Yevgeny Petrilin
2012-06-23  0:23   ` David Miller
2012-06-25  9:00     ` David Laight
2012-06-25 10:12       ` Eric Dumazet
2012-06-25 13:06     ` Yevgeny Petrilin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.