* [net-next 00/10][pull request] Intel Wired LAN Driver Update
@ 2011-08-21  7:29 Jeff Kirsher
  2011-08-21  7:29 ` [net-next 01/10] ixgbe: Simplify transmit cleanup path Jeff Kirsher
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo

The following series contains updates to ixgbe.

The following are changes since commit ca1ba7caa68520864e4b9227e67f3bbc6fed373b:
  Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next
and are available in the git repository at:
  master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (10):
  ixgbe: Simplify transmit cleanup path
  ixgbe: convert rings from q_vector bit indexed array to linked list
  ixgbe: Drop the TX work limit and instead just leave it to budget
  ixgbe: consolidate all MSI-X ring interrupts and poll routines into
    one
  ixgbe: cleanup allocation and freeing of IRQ affinity hint
  ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA
  ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt
    types
  ixgbe: Drop unnecessary adapter->hw dereference in loopback test
    setup
  ixgbe: combine PCI_VDEVICE and board declaration to same line
  ixgbe: Update TXDCTL configuration to correctly handle WTHRESH

 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   11 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   25 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  738 +++++++---------------
 3 files changed, 239 insertions(+), 535 deletions(-)

-- 
1.7.6



* [net-next 01/10] ixgbe: Simplify transmit cleanup path
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 02/10] ixgbe: convert rings from q_vector bit indexed array to linked list Jeff Kirsher
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch simplifies the work done in the transmit cleanup path by
removing the unnecessary comparisons between count and the work limit.
Instead we just add a budget value that counts down from the work limit
value.
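
To make the control-flow change concrete, here is a minimal standalone
sketch of the two loop styles; this is illustrative stand-in code, not
the driver routine, and WORK_LIMIT and the clean_* names are invented:

#include <stdbool.h>
#include <stdio.h>

#define WORK_LIMIT 4	/* stands in for q_vector->tx.work_limit */

/* before: count up toward the limit, then compare at the end */
static bool clean_count(int pending)
{
	int count;

	for (count = 0; count < WORK_LIMIT; count++) {
		if (!pending)
			break;
		pending--;	/* clean one descriptor */
	}
	return count < WORK_LIMIT;	/* true == ring fully cleaned */
}

/* after: budget counts down; a nonzero remainder means "done" */
static bool clean_budget(int pending)
{
	int budget = WORK_LIMIT;

	for (; budget; budget--) {
		if (!pending)
			break;
		pending--;	/* clean one descriptor */
	}
	return budget;
}

int main(void)
{
	printf("%d %d\n", clean_count(2), clean_budget(2));
	printf("%d %d\n", clean_count(9), clean_budget(9));
	return 0;
}

Returning the leftover budget collapses the separate count-versus-limit
compare into the loop variable itself.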

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e8aad76..e5a4eb6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -804,13 +804,13 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
+	u16 budget = q_vector->tx.work_limit;
 	u16 i = tx_ring->next_to_clean;
-	u16 count;
 
 	tx_buffer = &tx_ring->tx_buffer_info[i];
 	tx_desc = IXGBE_TX_DESC_ADV(tx_ring, i);
 
-	for (count = 0; count < q_vector->tx.work_limit; count++) {
+	for (; budget; budget--) {
 		union ixgbe_adv_tx_desc *eop_desc = tx_buffer->next_to_watch;
 
 		/* if next_to_watch is not set then there is no work pending */
@@ -891,11 +891,11 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		ixgbe_tx_timeout_reset(adapter);
 
 		/* the adapter is about to reset, no point in enabling stuff */
-		return true;
+		return budget;
 	}
 
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
-	if (unlikely(count && netif_carrier_ok(tx_ring->netdev) &&
+	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
 		     (ixgbe_desc_unused(tx_ring) >= TX_WAKE_THRESHOLD))) {
 		/* Make sure that anybody stopping the queue after this
 		 * sees the new next_to_clean.
@@ -908,7 +908,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		}
 	}
 
-	return count < q_vector->tx.work_limit;
+	return budget;
 }
 
 #ifdef CONFIG_IXGBE_DCA
-- 
1.7.6



* [net-next 02/10] ixgbe: convert rings from q_vector bit indexed array to linked list
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
  2011-08-21  7:29 ` [net-next 01/10] ixgbe: Simplify transmit cleanup path Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget Jeff Kirsher
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change converts the current bit-indexed array into a linked list so
that each q_vector can simply walk the rings one by one and locate each
ring that needs to be cleaned.
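
A minimal sketch of the new data-structure shape, assuming a reduced
ring/container pair; the field names follow the patch, everything else
is invented for illustration:

#include <stdio.h>

struct ring {
	struct ring *next;	/* next ring on this q_vector */
	int reg_idx;
};

struct ring_container {
	struct ring *ring;	/* head of the linked list of rings */
	int count;
};

/* Mirrors map_vector_to_rxq(): push a ring onto the vector's list. */
static void map_ring(struct ring_container *rc, struct ring *r)
{
	r->next = rc->ring;
	rc->ring = r;
	rc->count++;
}

int main(void)
{
	struct ring_container rx = { NULL, 0 };
	struct ring a = { NULL, 0 }, b = { NULL, 1 };
	struct ring *r;

	map_ring(&rx, &a);
	map_ring(&rx, &b);

	/* replaces the find_first_bit()/find_next_bit() walks */
	for (r = rx.ring; r != NULL; r = r->next)
		printf("cleaning ring %d\n", r->reg_idx);
	return 0;
}

Note that the patch prepends each ring, so the list walks in reverse
mapping order; that is fine, since every ring still gets visited.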

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    7 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  192 ++++++++-----------------
 2 files changed, 60 insertions(+), 139 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 378ce46..dc3b12e 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -209,6 +209,7 @@ enum ixbge_ring_state_t {
 #define clear_ring_rsc_enabled(ring) \
 	clear_bit(__IXGBE_RX_RSC_ENABLED, &(ring)->state)
 struct ixgbe_ring {
+	struct ixgbe_ring *next;	/* pointer to next ring in q_vector */
 	void *desc;			/* descriptor ring memory */
 	struct device *dev;             /* device for DMA mapping */
 	struct net_device *netdev;      /* netdev ring belongs to */
@@ -277,11 +278,7 @@ struct ixgbe_ring_feature {
 } ____cacheline_internodealigned_in_smp;
 
 struct ixgbe_ring_container {
-#if MAX_RX_QUEUES > MAX_TX_QUEUES
-	DECLARE_BITMAP(idx, MAX_RX_QUEUES);
-#else
-	DECLARE_BITMAP(idx, MAX_TX_QUEUES);
-#endif
+	struct ixgbe_ring *ring;	/* pointer to linked list of rings */
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
 	u16 work_limit;			/* total work allowed per interrupt */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e5a4eb6..bb54d3d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -974,26 +974,17 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter,
 static void ixgbe_update_dca(struct ixgbe_q_vector *q_vector)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
+	struct ixgbe_ring *ring;
 	int cpu = get_cpu();
-	long r_idx;
-	int i;
 
 	if (q_vector->cpu == cpu)
 		goto out_no_update;
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		ixgbe_update_tx_dca(adapter, adapter->tx_ring[r_idx], cpu);
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
+	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+		ixgbe_update_tx_dca(adapter, ring, cpu);
 
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		ixgbe_update_rx_dca(adapter, adapter->rx_ring[r_idx], cpu);
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
+	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
+		ixgbe_update_rx_dca(adapter, ring, cpu);
 
 	q_vector->cpu = cpu;
 out_no_update:
@@ -1546,7 +1537,7 @@ static int ixgbe_clean_rxonly(struct napi_struct *, int);
 static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 {
 	struct ixgbe_q_vector *q_vector;
-	int i, q_vectors, v_idx, r_idx;
+	int q_vectors, v_idx;
 	u32 mask;
 
 	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
@@ -1556,33 +1547,19 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 	 * corresponding register.
 	 */
 	for (v_idx = 0; v_idx < q_vectors; v_idx++) {
+		struct ixgbe_ring *ring;
 		q_vector = adapter->q_vector[v_idx];
-		/* XXX for_each_set_bit(...) */
-		r_idx = find_first_bit(q_vector->rx.idx,
-				       adapter->num_rx_queues);
-
-		for (i = 0; i < q_vector->rx.count; i++) {
-			u8 reg_idx = adapter->rx_ring[r_idx]->reg_idx;
-			ixgbe_set_ivar(adapter, 0, reg_idx, v_idx);
-			r_idx = find_next_bit(q_vector->rx.idx,
-					      adapter->num_rx_queues,
-					      r_idx + 1);
-		}
-		r_idx = find_first_bit(q_vector->tx.idx,
-				       adapter->num_tx_queues);
-
-		for (i = 0; i < q_vector->tx.count; i++) {
-			u8 reg_idx = adapter->tx_ring[r_idx]->reg_idx;
-			ixgbe_set_ivar(adapter, 1, reg_idx, v_idx);
-			r_idx = find_next_bit(q_vector->tx.idx,
-					      adapter->num_tx_queues,
-					      r_idx + 1);
-		}
 
-		if (q_vector->tx.count && !q_vector->rx.count)
+		for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
+			ixgbe_set_ivar(adapter, 0, ring->reg_idx, v_idx);
+
+		for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+			ixgbe_set_ivar(adapter, 1, ring->reg_idx, v_idx);
+
+		if (q_vector->tx.ring && !q_vector->rx.ring)
 			/* tx only */
 			q_vector->eitr = adapter->tx_eitr_param;
-		else if (q_vector->rx.count)
+		else if (q_vector->rx.ring)
 			/* rx or mixed */
 			q_vector->eitr = adapter->rx_eitr_param;
 
@@ -2006,20 +1983,10 @@ static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
 static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
-	struct ixgbe_adapter  *adapter = q_vector->adapter;
-	struct ixgbe_ring     *tx_ring;
-	int i, r_idx;
 
 	if (!q_vector->tx.count)
 		return IRQ_HANDLED;
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		tx_ring = adapter->tx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
-
 	/* EIAM disabled interrupts (on this vector) for us */
 	napi_schedule(&q_vector->napi);
 
@@ -2034,22 +2001,6 @@ static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
 static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
-	struct ixgbe_adapter  *adapter = q_vector->adapter;
-	struct ixgbe_ring  *rx_ring;
-	int r_idx;
-	int i;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		rx_ring = adapter->rx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
 
 	if (!q_vector->rx.count)
 		return IRQ_HANDLED;
@@ -2063,28 +2014,10 @@ static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
 static irqreturn_t ixgbe_msix_clean_many(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
-	struct ixgbe_adapter  *adapter = q_vector->adapter;
-	struct ixgbe_ring  *ring;
-	int r_idx;
-	int i;
 
 	if (!q_vector->tx.count && !q_vector->rx.count)
 		return IRQ_HANDLED;
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		ring = adapter->tx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
-
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		ring = adapter->rx_ring[r_idx];
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
-
 	/* EIAM disabled interrupts (on this vector) for us */
 	napi_schedule(&q_vector->napi);
 
@@ -2104,19 +2037,14 @@ static int ixgbe_clean_rxonly(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 			       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *rx_ring = NULL;
 	int work_done = 0;
-	long r_idx;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	rx_ring = adapter->rx_ring[r_idx];
-
-	ixgbe_clean_rx_irq(q_vector, rx_ring, &work_done, budget);
+	ixgbe_clean_rx_irq(q_vector, q_vector->rx.ring, &work_done, budget);
 
 	/* If all Rx work done, exit the polling mode */
 	if (work_done < budget) {
@@ -2144,38 +2072,29 @@ static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 			       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *ring = NULL;
-	int work_done = 0, i;
-	long r_idx;
-	bool tx_clean_complete = true;
+	struct ixgbe_ring *ring;
+	int work_done = 0;
+	bool clean_complete = true;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	for (i = 0; i < q_vector->tx.count; i++) {
-		ring = adapter->tx_ring[r_idx];
-		tx_clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
-		r_idx = find_next_bit(q_vector->tx.idx, adapter->num_tx_queues,
-				      r_idx + 1);
-	}
+	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
 
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
 	budget /= (q_vector->rx.count ?: 1);
 	budget = max(budget, 1);
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	for (i = 0; i < q_vector->rx.count; i++) {
-		ring = adapter->rx_ring[r_idx];
+
+	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
 		ixgbe_clean_rx_irq(q_vector, ring, &work_done, budget);
-		r_idx = find_next_bit(q_vector->rx.idx, adapter->num_rx_queues,
-				      r_idx + 1);
-	}
 
-	r_idx = find_first_bit(q_vector->rx.idx, adapter->num_rx_queues);
-	ring = adapter->rx_ring[r_idx];
+	if (!clean_complete)
+		work_done = budget;
+
 	/* If all Rx work done, exit the polling mode */
 	if (work_done < budget) {
 		napi_complete(napi);
@@ -2203,32 +2122,23 @@ static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 			       container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *tx_ring = NULL;
-	int work_done = 0;
-	long r_idx;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	r_idx = find_first_bit(q_vector->tx.idx, adapter->num_tx_queues);
-	tx_ring = adapter->tx_ring[r_idx];
-
-	if (!ixgbe_clean_tx_irq(q_vector, tx_ring))
-		work_done = budget;
+	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring))
+		return budget;
 
 	/* If all Tx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->tx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter,
-						((u64)1 << q_vector->v_idx));
-	}
+	napi_complete(napi);
+	if (adapter->tx_itr_setting & 1)
+		ixgbe_set_itr(q_vector);
+	if (!test_bit(__IXGBE_DOWN, &adapter->state))
+		ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));
 
-	return work_done;
+	return 0;
 }
 
 static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx,
@@ -2237,9 +2147,10 @@ static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx,
 	struct ixgbe_q_vector *q_vector = a->q_vector[v_idx];
 	struct ixgbe_ring *rx_ring = a->rx_ring[r_idx];
 
-	set_bit(r_idx, q_vector->rx.idx);
-	q_vector->rx.count++;
 	rx_ring->q_vector = q_vector;
+	rx_ring->next = q_vector->rx.ring;
+	q_vector->rx.ring = rx_ring;
+	q_vector->rx.count++;
 }
 
 static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
@@ -2248,9 +2159,10 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
 	struct ixgbe_q_vector *q_vector = a->q_vector[v_idx];
 	struct ixgbe_ring *tx_ring = a->tx_ring[t_idx];
 
-	set_bit(t_idx, q_vector->tx.idx);
-	q_vector->tx.count++;
 	tx_ring->q_vector = q_vector;
+	tx_ring->next = q_vector->tx.ring;
+	q_vector->tx.ring = tx_ring;
+	q_vector->tx.count++;
 	q_vector->tx.work_limit = a->tx_work_limit;
 }
 
@@ -2508,14 +2420,26 @@ static irqreturn_t ixgbe_intr(int irq, void *data)
 
 static inline void ixgbe_reset_q_vectors(struct ixgbe_adapter *adapter)
 {
-	int i, q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int i;
+
+	/* legacy and MSI only use one vector */
+	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
+		q_vectors = 1;
+
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		adapter->rx_ring[i]->q_vector = NULL;
+		adapter->rx_ring[i]->next = NULL;
+	}
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		adapter->tx_ring[i]->q_vector = NULL;
+		adapter->tx_ring[i]->next = NULL;
+	}
 
 	for (i = 0; i < q_vectors; i++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
-		bitmap_zero(q_vector->rx.idx, MAX_RX_QUEUES);
-		bitmap_zero(q_vector->tx.idx, MAX_TX_QUEUES);
-		q_vector->rx.count = 0;
-		q_vector->tx.count = 0;
+		memset(&q_vector->rx, 0, sizeof(struct ixgbe_ring_container));
+		memset(&q_vector->tx, 0, sizeof(struct ixgbe_ring_container));
 	}
 }
 
@@ -5923,7 +5847,7 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
 		/* get one bit for every active tx/rx interrupt vector */
 		for (i = 0; i < adapter->num_msix_vectors - NON_Q_VECTORS; i++) {
 			struct ixgbe_q_vector *qv = adapter->q_vector[i];
-			if (qv->rx.count || qv->tx.count)
+			if (qv->rx.ring || qv->tx.ring)
 				eics |= ((u64)1 << i);
 		}
 	}
-- 
1.7.6



* [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
  2011-08-21  7:29 ` [net-next 01/10] ixgbe: Simplify transmit cleanup path Jeff Kirsher
  2011-08-21  7:29 ` [net-next 02/10] ixgbe: convert rings from q_vector bit indexed array to linked list Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21 14:01   ` Ben Hutchings
  2011-08-21  7:29 ` [net-next 04/10] ixgbe: consolidate all MSI-X ring interrupts and poll routines into one Jeff Kirsher
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes the TX work limit obsolete.  Instead of using it we
can rely on the NAPI budget for the number of packets to clean per
interrupt.  The advantage to this approach is that it results in a much
more balanced work flow, since the same number of RX and TX packets
should be cleaned per interrupt.
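
A rough standalone sketch of the resulting flow; clean_tx() and poll()
are stubs standing in for ixgbe_clean_tx_irq() and the NAPI poll
routine, not the driver code:

#include <stdbool.h>
#include <stdio.h>

/* Cleans up to `budget` packets; the NAPI budget, not a separate
 * tx.work_limit, now bounds how much Tx work gets done. */
static bool clean_tx(int pending, int budget)
{
	while (budget && pending) {
		budget--;
		pending--;
	}
	return budget;	/* nonzero == ring fully cleaned in budget */
}

static int poll(int pending, int budget)
{
	if (!clean_tx(pending, budget))
		return budget;	/* Tx not finished: keep polling */
	return 0;		/* done: NAPI can complete */
}

int main(void)
{
	printf("%d\n", poll(10, 64));	/* 0: done */
	printf("%d\n", poll(90, 64));	/* 64: poll again */
	return 0;
}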

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |    4 ----
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |    7 -------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   16 +++++++---------
 3 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index dc3b12e..b85312f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -281,7 +281,6 @@ struct ixgbe_ring_container {
 	struct ixgbe_ring *ring;	/* pointer to linked list of rings */
 	unsigned int total_bytes;	/* total bytes processed this int */
 	unsigned int total_packets;	/* total packets processed this int */
-	u16 work_limit;			/* total work allowed per interrupt */
 	u8 count;			/* total number of rings in vector */
 	u8 itr;				/* current ITR setting for ring */
 };
@@ -416,9 +415,6 @@ struct ixgbe_adapter {
 	u16 eitr_low;
 	u16 eitr_high;
 
-	/* Work limits */
-	u16 tx_work_limit;
-
 	/* TX */
 	struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
 	int num_tx_queues;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 82d4244..16835cc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -2005,8 +2005,6 @@ static int ixgbe_get_coalesce(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 
-	ec->tx_max_coalesced_frames_irq = adapter->tx_work_limit;
-
 	/* only valid if in constant ITR mode */
 	switch (adapter->rx_itr_setting) {
 	case 0:
@@ -2093,9 +2091,6 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
 	   && ec->tx_coalesce_usecs)
 		return -EINVAL;
 
-	if (ec->tx_max_coalesced_frames_irq)
-		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
-
 	if (ec->rx_coalesce_usecs > 1) {
 		/* check the limits */
 		if ((1000000/ec->rx_coalesce_usecs > IXGBE_MAX_INT_RATE) ||
@@ -2169,14 +2164,12 @@ static int ixgbe_set_coalesce(struct net_device *netdev,
 			else
 				/* rx only or mixed */
 				q_vector->eitr = adapter->rx_eitr_param;
-			q_vector->tx.work_limit = adapter->tx_work_limit;
 			ixgbe_write_eitr(q_vector);
 		}
 	/* Legacy Interrupt Mode */
 	} else {
 		q_vector = adapter->q_vector[0];
 		q_vector->eitr = adapter->rx_eitr_param;
-		q_vector->tx.work_limit = adapter->tx_work_limit;
 		ixgbe_write_eitr(q_vector);
 	}
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bb54d3d..2450279 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -796,15 +796,16 @@ static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
  * @q_vector: structure containing interrupt and ring information
  * @tx_ring: tx ring to clean
+ * @budget: amount of work driver is allowed to do this pass, in packets
  **/
 static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *tx_ring)
+			       struct ixgbe_ring *tx_ring,
+			       int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
-	u16 budget = q_vector->tx.work_limit;
 	u16 i = tx_ring->next_to_clean;
 
 	tx_buffer = &tx_ring->tx_buffer_info[i];
@@ -2082,7 +2083,7 @@ static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
 #endif
 
 	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
-		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring);
+		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring, budget);
 
 	/* attempt to distribute budget to each queue fairly, but don't allow
 	 * the budget to go below 1 because we'll exit polling */
@@ -2128,7 +2129,7 @@ static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring))
+	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring, budget))
 		return budget;
 
 	/* If all Tx work done, exit the polling mode */
@@ -2163,7 +2164,6 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
 	tx_ring->next = q_vector->tx.ring;
 	q_vector->tx.ring = tx_ring;
 	q_vector->tx.count++;
-	q_vector->tx.work_limit = a->tx_work_limit;
 }
 
 /**
@@ -4155,7 +4155,8 @@ static int ixgbe_poll(struct napi_struct *napi, int budget)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0]);
+	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0],
+					       budget);
 	ixgbe_clean_rx_irq(q_vector, adapter->rx_ring[0], &work_done, budget);
 
 	if (!tx_clean_complete)
@@ -5090,9 +5091,6 @@ static int __devinit ixgbe_sw_init(struct ixgbe_adapter *adapter)
 	adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
 	adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
 
-	/* set default work limits */
-	adapter->tx_work_limit = adapter->tx_ring_count;
-
 	/* initialize eeprom parameters */
 	if (ixgbe_init_eeprom_params_generic(hw)) {
 		e_dev_err("EEPROM initialization failed\n");
-- 
1.7.6



* [net-next 04/10] ixgbe: consolidate all MSI-X ring interrupts and poll routines into one
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (2 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 05/10] ixgbe: cleanup allocation and freeing of IRQ affinity hint Jeff Kirsher
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change consolidates all of the MSI-X interrupt and polling routines
into two functions: one for the interrupt and one for the poll.  The
main advantage to doing this is that the compiler can optimize each into
a single monolithic function, which allows all of the work to occupy a
single block of memory and as such avoid jumping around.
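
A simplified sketch of the consolidated handler; the types are reduced,
napi_schedule_stub() stands in for napi_schedule(), and the return value
models IRQ_HANDLED:

#include <stdio.h>

struct ring { int reg_idx; };

struct q_vector {
	struct ring *tx_ring;	/* head of the Tx list (see patch 02) */
	struct ring *rx_ring;	/* head of the Rx list */
};

static void napi_schedule_stub(struct q_vector *qv)
{
	(void)qv;
	printf("napi scheduled\n");
}

/* One handler for every MSI-X ring vector, replacing the separate
 * clean_rx/clean_tx/clean_many variants: schedule NAPI whenever the
 * vector owns at least one ring of either kind. */
static int msix_clean_rings(int irq, void *data)
{
	struct q_vector *qv = data;

	(void)irq;
	if (qv->rx_ring || qv->tx_ring)
		napi_schedule_stub(qv);
	return 1;	/* models IRQ_HANDLED */
}

int main(void)
{
	struct ring rx = { 0 };
	struct q_vector qv = { NULL, &rx };	/* rx-only vector */

	msix_clean_rings(0, &qv);
	return 0;
}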

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  288 ++++++-------------------
 1 files changed, 67 insertions(+), 221 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2450279..03432dc 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -798,9 +798,9 @@ static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * @tx_ring: tx ring to clean
  * @budget: amount of work driver is allowed to do this pass, in packets
  **/
-static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *tx_ring,
-			       int budget)
+static int ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
+			      struct ixgbe_ring *tx_ring,
+			      int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct ixgbe_tx_buffer *tx_buffer;
@@ -1298,9 +1298,9 @@ static inline bool ixgbe_get_rsc_state(union ixgbe_adv_rx_desc *rx_desc)
 		IXGBE_RXDADV_RSCCNT_MASK);
 }
 
-static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
-			       struct ixgbe_ring *rx_ring,
-			       int *work_done, int work_to_do)
+static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
+			      struct ixgbe_ring *rx_ring,
+			      int budget)
 {
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
@@ -1480,11 +1480,11 @@ static void ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 #endif /* IXGBE_FCOE */
 		ixgbe_receive_skb(q_vector, skb, staterr, rx_ring, rx_desc);
 
+		budget--;
 next_desc:
 		rx_desc->wb.upper.status_error = 0;
 
-		(*work_done)++;
-		if (*work_done >= work_to_do)
+		if (!budget)
 			break;
 
 		/* return some buffers to hardware, one at a time is too slow */
@@ -1525,9 +1525,10 @@ next_desc:
 	u64_stats_update_end(&rx_ring->syncp);
 	q_vector->rx.total_packets += total_rx_packets;
 	q_vector->rx.total_bytes += total_rx_bytes;
+
+	return budget;
 }
 
-static int ixgbe_clean_rxonly(struct napi_struct *, int);
 /**
  * ixgbe_configure_msix - Configure MSI-X hardware
  * @adapter: board private structure
@@ -1981,167 +1982,18 @@ static inline void ixgbe_irq_disable_queues(struct ixgbe_adapter *adapter,
 	/* skip the flush */
 }
 
-static irqreturn_t ixgbe_msix_clean_tx(int irq, void *data)
+static irqreturn_t ixgbe_msix_clean_rings(int irq, void *data)
 {
 	struct ixgbe_q_vector *q_vector = data;
 
-	if (!q_vector->tx.count)
-		return IRQ_HANDLED;
-
 	/* EIAM disabled interrupts (on this vector) for us */
-	napi_schedule(&q_vector->napi);
 
-	return IRQ_HANDLED;
-}
-
-/**
- * ixgbe_msix_clean_rx - single unshared vector rx clean (all queues)
- * @irq: unused
- * @data: pointer to our q_vector struct for this interrupt vector
- **/
-static irqreturn_t ixgbe_msix_clean_rx(int irq, void *data)
-{
-	struct ixgbe_q_vector *q_vector = data;
-
-	if (!q_vector->rx.count)
-		return IRQ_HANDLED;
-
-	/* EIAM disabled interrupts (on this vector) for us */
-	napi_schedule(&q_vector->napi);
+	if (q_vector->rx.ring || q_vector->tx.ring)
+		napi_schedule(&q_vector->napi);
 
 	return IRQ_HANDLED;
 }
 
-static irqreturn_t ixgbe_msix_clean_many(int irq, void *data)
-{
-	struct ixgbe_q_vector *q_vector = data;
-
-	if (!q_vector->tx.count && !q_vector->rx.count)
-		return IRQ_HANDLED;
-
-	/* EIAM disabled interrupts (on this vector) for us */
-	napi_schedule(&q_vector->napi);
-
-	return IRQ_HANDLED;
-}
-
-/**
- * ixgbe_clean_rxonly - msix (aka one shot) rx clean routine
- * @napi: napi struct with our devices info in it
- * @budget: amount of work driver is allowed to do this pass, in packets
- *
- * This function is optimized for cleaning one queue only on a single
- * q_vector!!!
- **/
-static int ixgbe_clean_rxonly(struct napi_struct *napi, int budget)
-{
-	struct ixgbe_q_vector *q_vector =
-			       container_of(napi, struct ixgbe_q_vector, napi);
-	struct ixgbe_adapter *adapter = q_vector->adapter;
-	int work_done = 0;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	ixgbe_clean_rx_irq(q_vector, q_vector->rx.ring, &work_done, budget);
-
-	/* If all Rx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->rx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter,
-						((u64)1 << q_vector->v_idx));
-	}
-
-	return work_done;
-}
-
-/**
- * ixgbe_clean_rxtx_many - msix (aka one shot) rx clean routine
- * @napi: napi struct with our devices info in it
- * @budget: amount of work driver is allowed to do this pass, in packets
- *
- * This function will clean more than one rx queue associated with a
- * q_vector.
- **/
-static int ixgbe_clean_rxtx_many(struct napi_struct *napi, int budget)
-{
-	struct ixgbe_q_vector *q_vector =
-			       container_of(napi, struct ixgbe_q_vector, napi);
-	struct ixgbe_adapter *adapter = q_vector->adapter;
-	struct ixgbe_ring *ring;
-	int work_done = 0;
-	bool clean_complete = true;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
-		clean_complete &= ixgbe_clean_tx_irq(q_vector, ring, budget);
-
-	/* attempt to distribute budget to each queue fairly, but don't allow
-	 * the budget to go below 1 because we'll exit polling */
-	budget /= (q_vector->rx.count ?: 1);
-	budget = max(budget, 1);
-
-	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
-		ixgbe_clean_rx_irq(q_vector, ring, &work_done, budget);
-
-	if (!clean_complete)
-		work_done = budget;
-
-	/* If all Rx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->rx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter,
-						((u64)1 << q_vector->v_idx));
-		return 0;
-	}
-
-	return work_done;
-}
-
-/**
- * ixgbe_clean_txonly - msix (aka one shot) tx clean routine
- * @napi: napi struct with our devices info in it
- * @budget: amount of work driver is allowed to do this pass, in packets
- *
- * This function is optimized for cleaning one queue only on a single
- * q_vector!!!
- **/
-static int ixgbe_clean_txonly(struct napi_struct *napi, int budget)
-{
-	struct ixgbe_q_vector *q_vector =
-			       container_of(napi, struct ixgbe_q_vector, napi);
-	struct ixgbe_adapter *adapter = q_vector->adapter;
-
-#ifdef CONFIG_IXGBE_DCA
-	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
-		ixgbe_update_dca(q_vector);
-#endif
-
-	if (!ixgbe_clean_tx_irq(q_vector, q_vector->tx.ring, budget))
-		return budget;
-
-	/* If all Tx work done, exit the polling mode */
-	napi_complete(napi);
-	if (adapter->tx_itr_setting & 1)
-		ixgbe_set_itr(q_vector);
-	if (!test_bit(__IXGBE_DOWN, &adapter->state))
-		ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));
-
-	return 0;
-}
-
 static inline void map_vector_to_rxq(struct ixgbe_adapter *a, int v_idx,
 				     int r_idx)
 {
@@ -2241,7 +2093,6 @@ out:
 static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	irqreturn_t (*handler)(int, void *);
 	int i, vector, q_vectors, err;
 	int ri = 0, ti = 0;
 
@@ -2252,31 +2103,25 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	if (err)
 		return err;
 
-#define SET_HANDLER(_v) (((_v)->rx.count && (_v)->tx.count)        \
-					  ? &ixgbe_msix_clean_many : \
-			  (_v)->rx.count ? &ixgbe_msix_clean_rx   : \
-			  (_v)->tx.count ? &ixgbe_msix_clean_tx   : \
-			  NULL)
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[vector];
-		handler = SET_HANDLER(q_vector);
 
-		if (handler == &ixgbe_msix_clean_rx) {
+		if (q_vector->tx.ring && q_vector->rx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-			         "%s-%s-%d", netdev->name, "rx", ri++);
-		} else if (handler == &ixgbe_msix_clean_tx) {
+				 "%s-%s-%d", netdev->name, "TxRx", ri++);
+			ti++;
+		} else if (q_vector->rx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-			         "%s-%s-%d", netdev->name, "tx", ti++);
-		} else if (handler == &ixgbe_msix_clean_many) {
+				 "%s-%s-%d", netdev->name, "rx", ri++);
+		} else if (q_vector->tx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
-			         "%s-%s-%d", netdev->name, "TxRx", ri++);
-			ti++;
+				 "%s-%s-%d", netdev->name, "tx", ti++);
 		} else {
 			/* skip this unused q_vector */
 			continue;
 		}
 		err = request_irq(adapter->msix_entries[vector].vector,
-				  handler, 0, q_vector->name,
+				  &ixgbe_msix_clean_rings, 0, q_vector->name,
 				  q_vector);
 		if (err) {
 			e_err(probe, "request_irq failed for MSIX interrupt "
@@ -2484,8 +2329,8 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
 		i--;
 		for (; i >= 0; i--) {
 			/* free only the irqs that were actually requested */
-			if (!adapter->q_vector[i]->rx.count &&
-			    !adapter->q_vector[i]->tx.count)
+			if (!adapter->q_vector[i]->rx.ring &&
+			    !adapter->q_vector[i]->tx.ring)
 				continue;
 
 			free_irq(adapter->msix_entries[i].vector,
@@ -3478,19 +3323,8 @@ static void ixgbe_napi_enable_all(struct ixgbe_adapter *adapter)
 		q_vectors = 1;
 
 	for (q_idx = 0; q_idx < q_vectors; q_idx++) {
-		struct napi_struct *napi;
 		q_vector = adapter->q_vector[q_idx];
-		napi = &q_vector->napi;
-		if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
-			if (!q_vector->rx.count || !q_vector->tx.count) {
-				if (q_vector->tx.count == 1)
-					napi->poll = &ixgbe_clean_txonly;
-				else if (q_vector->rx.count == 1)
-					napi->poll = &ixgbe_clean_rxonly;
-			}
-		}
-
-		napi_enable(napi);
+		napi_enable(&q_vector->napi);
 	}
 }
 
@@ -4148,29 +3982,41 @@ static int ixgbe_poll(struct napi_struct *napi, int budget)
 	struct ixgbe_q_vector *q_vector =
 				container_of(napi, struct ixgbe_q_vector, napi);
 	struct ixgbe_adapter *adapter = q_vector->adapter;
-	int tx_clean_complete, work_done = 0;
+	struct ixgbe_ring *ring;
+	int per_ring_budget;
+	bool clean_complete = true;
 
 #ifdef CONFIG_IXGBE_DCA
 	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED)
 		ixgbe_update_dca(q_vector);
 #endif
 
-	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0],
-					       budget);
-	ixgbe_clean_rx_irq(q_vector, adapter->rx_ring[0], &work_done, budget);
+	for (ring = q_vector->tx.ring; ring != NULL; ring = ring->next)
+		clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring, budget);
 
-	if (!tx_clean_complete)
-		work_done = budget;
+	/* attempt to distribute budget to each queue fairly, but don't allow
+	 * the budget to go below 1 because we'll exit polling */
+	if (q_vector->rx.count > 1)
+		per_ring_budget = max(budget/q_vector->rx.count, 1);
+	else
+		per_ring_budget = budget;
 
-	/* If budget not fully consumed, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		if (adapter->rx_itr_setting & 1)
-			ixgbe_set_itr(q_vector);
-		if (!test_bit(__IXGBE_DOWN, &adapter->state))
-			ixgbe_irq_enable_queues(adapter, IXGBE_EIMS_RTX_QUEUE);
-	}
-	return work_done;
+	for (ring = q_vector->rx.ring; ring != NULL; ring = ring->next)
+		clean_complete &= !!ixgbe_clean_rx_irq(q_vector, ring,
+						       per_ring_budget);
+
+	/* If all work not completed, return budget and keep polling */
+	if (!clean_complete)
+		return budget;
+
+	/* all work done, exit the polling mode */
+	napi_complete(napi);
+	if (adapter->rx_itr_setting & 1)
+		ixgbe_set_itr(q_vector);
+	if (!test_bit(__IXGBE_DOWN, &adapter->state))
+		ixgbe_irq_enable_queues(adapter, ((u64)1 << q_vector->v_idx));
+
+	return 0;
 }
 
 /**
@@ -4811,19 +4657,15 @@ out:
  **/
 static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 {
-	int q_idx, num_q_vectors;
+	int v_idx, num_q_vectors;
 	struct ixgbe_q_vector *q_vector;
-	int (*poll)(struct napi_struct *, int);
 
-	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
+	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
 		num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
-		poll = &ixgbe_clean_rxtx_many;
-	} else {
+	else
 		num_q_vectors = 1;
-		poll = &ixgbe_poll;
-	}
 
-	for (q_idx = 0; q_idx < num_q_vectors; q_idx++) {
+	for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
 		q_vector = kzalloc_node(sizeof(struct ixgbe_q_vector),
 					GFP_KERNEL, adapter->node);
 		if (!q_vector)
@@ -4831,25 +4673,29 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 					   GFP_KERNEL);
 		if (!q_vector)
 			goto err_out;
+
 		q_vector->adapter = adapter;
+		q_vector->v_idx = v_idx;
+
 		if (q_vector->tx.count && !q_vector->rx.count)
 			q_vector->eitr = adapter->tx_eitr_param;
 		else
 			q_vector->eitr = adapter->rx_eitr_param;
-		q_vector->v_idx = q_idx;
-		netif_napi_add(adapter->netdev, &q_vector->napi, (*poll), 64);
-		adapter->q_vector[q_idx] = q_vector;
+
+		netif_napi_add(adapter->netdev, &q_vector->napi,
+			       ixgbe_poll, 64);
+		adapter->q_vector[v_idx] = q_vector;
 	}
 
 	return 0;
 
 err_out:
-	while (q_idx) {
-		q_idx--;
-		q_vector = adapter->q_vector[q_idx];
+	while (v_idx) {
+		v_idx--;
+		q_vector = adapter->q_vector[v_idx];
 		netif_napi_del(&q_vector->napi);
 		kfree(q_vector);
-		adapter->q_vector[q_idx] = NULL;
+		adapter->q_vector[v_idx] = NULL;
 	}
 	return -ENOMEM;
 }
@@ -6944,7 +6790,7 @@ static void ixgbe_netpoll(struct net_device *netdev)
 		int num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 		for (i = 0; i < num_q_vectors; i++) {
 			struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
-			ixgbe_msix_clean_many(0, q_vector);
+			ixgbe_msix_clean_rings(0, q_vector);
 		}
 	} else {
 		ixgbe_intr(adapter->pdev->irq, netdev);
-- 
1.7.6



* [net-next 05/10] ixgbe: cleanup allocation and freeing of IRQ affinity hint
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (3 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 04/10] ixgbe: consolidate all MSI-X ring interrupts and poll routines into one Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 06/10] ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA Jeff Kirsher
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

The allocation and freeing of the IRQ affinity hint need some updates,
since there are a number of spots where the hint can be left stale or
not correctly updated.
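
A sketch of the intended lifecycle, assuming kernel context; the qv_*
helpers and the qv_sketch struct are hypothetical, while the cpumask and
affinity-hint calls are the ones the patch uses:

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/interrupt.h>

/* Reduced q_vector for the sketch; the real one lives in ixgbe.h. */
struct qv_sketch {
	cpumask_var_t affinity_mask;
};

/* Allocation time: the mask is created once, with the q_vector,
 * instead of conditionally inside ixgbe_configure_msix(). */
static int qv_alloc(struct qv_sketch *qv, int v_idx)
{
	if (!alloc_cpumask_var(&qv->affinity_mask, GFP_KERNEL))
		return -ENOMEM;
	cpumask_set_cpu(v_idx, qv->affinity_mask);
	return 0;
}

/* request_irq() time: publish the hint for this vector. */
static void qv_hint_set(struct qv_sketch *qv, unsigned int irq)
{
	irq_set_affinity_hint(irq, qv->affinity_mask);
}

/* free_irq()/free_q_vectors() time: clear the hint first, so the IRQ
 * core is never left pointing at a mask that is about to be freed. */
static void qv_hint_clear(struct qv_sketch *qv, unsigned int irq)
{
	irq_set_affinity_hint(irq, NULL);
	free_cpumask_var(qv->affinity_mask);
}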

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   76 ++++++++++++-------------
 1 files changed, 36 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 03432dc..7d745e2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1566,20 +1566,6 @@ static void ixgbe_configure_msix(struct ixgbe_adapter *adapter)
 			q_vector->eitr = adapter->rx_eitr_param;
 
 		ixgbe_write_eitr(q_vector);
-		/* If ATR is enabled, set interrupt affinity */
-		if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) {
-			/*
-			 * Allocate the affinity_hint cpumask, assign the mask
-			 * for this vector, and set our affinity_hint for
-			 * this irq.
-			 */
-			if (!alloc_cpumask_var(&q_vector->affinity_mask,
-			                       GFP_KERNEL))
-				return;
-			cpumask_set_cpu(v_idx, q_vector->affinity_mask);
-			irq_set_affinity_hint(adapter->msix_entries[v_idx].vector,
-			                      q_vector->affinity_mask);
-		}
 	}
 
 	switch (adapter->hw.mac.type) {
@@ -2093,18 +2079,17 @@ out:
 static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 {
 	struct net_device *netdev = adapter->netdev;
-	int i, vector, q_vectors, err;
+	int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int vector, err;
 	int ri = 0, ti = 0;
 
-	/* Decrement for Other and TCP Timer vectors */
-	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
-
 	err = ixgbe_map_rings_to_vectors(adapter);
 	if (err)
 		return err;
 
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[vector];
+		struct msix_entry *entry = &adapter->msix_entries[vector];
 
 		if (q_vector->tx.ring && q_vector->rx.ring) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
@@ -2120,14 +2105,19 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 			/* skip this unused q_vector */
 			continue;
 		}
-		err = request_irq(adapter->msix_entries[vector].vector,
-				  &ixgbe_msix_clean_rings, 0, q_vector->name,
-				  q_vector);
+		err = request_irq(entry->vector, &ixgbe_msix_clean_rings, 0,
+				  q_vector->name, q_vector);
 		if (err) {
 			e_err(probe, "request_irq failed for MSIX interrupt "
 			      "Error: %d\n", err);
 			goto free_queue_irqs;
 		}
+		/* If Flow Director is enabled, set interrupt affinity */
+		if (adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) {
+			/* assign the mask for this irq */
+			irq_set_affinity_hint(entry->vector,
+					      q_vector->affinity_mask);
+		}
 	}
 
 	sprintf(adapter->lsc_int_name, "%s:lsc", netdev->name);
@@ -2141,9 +2131,13 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	return 0;
 
 free_queue_irqs:
-	for (i = vector - 1; i >= 0; i--)
-		free_irq(adapter->msix_entries[--vector].vector,
-			 adapter->q_vector[i]);
+	while (vector) {
+		vector--;
+		irq_set_affinity_hint(adapter->msix_entries[vector].vector,
+				      NULL);
+		free_irq(adapter->msix_entries[vector].vector,
+			 adapter->q_vector[vector]);
+	}
 	adapter->flags &= ~IXGBE_FLAG_MSIX_ENABLED;
 	pci_disable_msix(adapter->pdev);
 	kfree(adapter->msix_entries);
@@ -2333,14 +2327,19 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
 			    !adapter->q_vector[i]->tx.ring)
 				continue;
 
+			/* clear the affinity_mask in the IRQ descriptor */
+			irq_set_affinity_hint(adapter->msix_entries[i].vector,
+					      NULL);
+
 			free_irq(adapter->msix_entries[i].vector,
 				 adapter->q_vector[i]);
 		}
-
-		ixgbe_reset_q_vectors(adapter);
 	} else {
 		free_irq(adapter->pdev->irq, adapter);
 	}
+
+	/* clear q_vector state information */
+	ixgbe_reset_q_vectors(adapter);
 }
 
 /**
@@ -3879,7 +3878,6 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	u32 rxctrl;
 	int i;
-	int num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 
 	/* signal that we are down to the interrupt handler */
 	set_bit(__IXGBE_DOWN, &adapter->state);
@@ -3924,15 +3922,6 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 			adapter->vfinfo[i].clear_to_send = 0;
 	}
 
-	/* Cleanup the affinity_hint CPU mask memory and callback */
-	for (i = 0; i < num_q_vectors; i++) {
-		struct ixgbe_q_vector *q_vector = adapter->q_vector[i];
-		/* clear the affinity_mask in the IRQ descriptor */
-		irq_set_affinity_hint(adapter->msix_entries[i]. vector, NULL);
-		/* release the CPU mask memory */
-		free_cpumask_var(q_vector->affinity_mask);
-	}
-
 	/* disable transmits in the hardware now that interrupts are off */
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		u8 reg_idx = adapter->tx_ring[i]->reg_idx;
@@ -4677,6 +4666,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 		q_vector->adapter = adapter;
 		q_vector->v_idx = v_idx;
 
+		/* Allocate the affinity_hint cpumask, configure the mask */
+		if (!alloc_cpumask_var(&q_vector->affinity_mask, GFP_KERNEL))
+			goto err_out;
+		cpumask_set_cpu(v_idx, q_vector->affinity_mask);
+
 		if (q_vector->tx.count && !q_vector->rx.count)
 			q_vector->eitr = adapter->tx_eitr_param;
 		else
@@ -4694,6 +4688,7 @@ err_out:
 		v_idx--;
 		q_vector = adapter->q_vector[v_idx];
 		netif_napi_del(&q_vector->napi);
+		free_cpumask_var(q_vector->affinity_mask);
 		kfree(q_vector);
 		adapter->q_vector[v_idx] = NULL;
 	}
@@ -4710,17 +4705,18 @@ err_out:
  **/
 static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
 {
-	int q_idx, num_q_vectors;
+	int v_idx, num_q_vectors;
 
 	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
 		num_q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
 	else
 		num_q_vectors = 1;
 
-	for (q_idx = 0; q_idx < num_q_vectors; q_idx++) {
-		struct ixgbe_q_vector *q_vector = adapter->q_vector[q_idx];
-		adapter->q_vector[q_idx] = NULL;
+	for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
+		struct ixgbe_q_vector *q_vector = adapter->q_vector[v_idx];
+		adapter->q_vector[v_idx] = NULL;
 		netif_napi_del(&q_vector->napi);
+		free_cpumask_var(q_vector->affinity_mask);
 		kfree(q_vector);
 	}
 }
-- 
1.7.6



* [net-next 06/10] ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (4 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 05/10] ixgbe: cleanup allocation and freeing of IRQ affinity hint Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 07/10] ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt types Jeff Kirsher
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change switches us over to using the ring->dev pointer instead of
having to use the adapter->pdev->dev reference.  The advantage is that
it is a much shorter route to get to the final needed value.
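
The pointer-chain difference, reduced to a standalone illustration with
toy struct definitions (not the kernel's own):

#include <stdio.h>

struct device { const char *name; };
struct pci_dev { struct device dev; };
struct ring { struct device *dev; };	/* cached when the ring is set up */
struct adapter { struct pci_dev *pdev; struct ring *rx_ring; };

int main(void)
{
	struct pci_dev pdev = { { "0000:01:00.0" } };
	struct ring rx = { &pdev.dev };
	struct adapter adapter = { &pdev, &rx };

	/* old route: two hops through the adapter */
	printf("%s\n", adapter.pdev->dev.name);
	/* new route: the ring already holds the device pointer */
	printf("%s\n", adapter.rx_ring->dev->name);
	return 0;
}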

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7d745e2..1929d23 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -925,12 +925,12 @@ static void ixgbe_update_rx_dca(struct ixgbe_adapter *adapter,
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
 		rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK;
-		rxctrl |= dca3_get_tag(&adapter->pdev->dev, cpu);
+		rxctrl |= dca3_get_tag(rx_ring->dev, cpu);
 		break;
 	case ixgbe_mac_82599EB:
 	case ixgbe_mac_X540:
 		rxctrl &= ~IXGBE_DCA_RXCTRL_CPUID_MASK_82599;
-		rxctrl |= (dca3_get_tag(&adapter->pdev->dev, cpu) <<
+		rxctrl |= (dca3_get_tag(rx_ring->dev, cpu) <<
 			   IXGBE_DCA_RXCTRL_CPUID_SHIFT_82599);
 		break;
 	default:
@@ -954,7 +954,7 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter,
 	case ixgbe_mac_82598EB:
 		txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL(reg_idx));
 		txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK;
-		txctrl |= dca3_get_tag(&adapter->pdev->dev, cpu);
+		txctrl |= dca3_get_tag(tx_ring->dev, cpu);
 		txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN;
 		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL(reg_idx), txctrl);
 		break;
@@ -962,7 +962,7 @@ static void ixgbe_update_tx_dca(struct ixgbe_adapter *adapter,
 	case ixgbe_mac_X540:
 		txctrl = IXGBE_READ_REG(hw, IXGBE_DCA_TXCTRL_82599(reg_idx));
 		txctrl &= ~IXGBE_DCA_TXCTRL_CPUID_MASK_82599;
-		txctrl |= (dca3_get_tag(&adapter->pdev->dev, cpu) <<
+		txctrl |= (dca3_get_tag(tx_ring->dev, cpu) <<
 			   IXGBE_DCA_TXCTRL_CPUID_SHIFT_82599);
 		txctrl |= IXGBE_DCA_TXCTRL_DESC_DCA_EN;
 		IXGBE_WRITE_REG(hw, IXGBE_DCA_TXCTRL_82599(reg_idx), txctrl);
-- 
1.7.6



* [net-next 07/10] ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt types
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (5 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 06/10] ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 08/10] ixgbe: Drop unnecessary adapter->hw dereference in loopback test setup Jeff Kirsher
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch makes the map_rings_to_vectors call work with all interrupt
types.  The advantage is that there is now a predictable ring-to-vector
mapping for every interrupt type.
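
The distribution logic, extracted into a standalone sketch; the
DIV_ROUND_UP re-adjustment mirrors the patch, while the queue and
vector counts are arbitrary:

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Spread `remaining` rings across `vectors`, recomputing the
 * per-vector quota from what is left at each step. */
int main(void)
{
	int vectors = 4, remaining = 10, ring_idx = 0, v;

	for (v = 0; v < vectors && remaining; v++) {
		int qpv = DIV_ROUND_UP(remaining, vectors - v);
		for (; qpv; qpv--, ring_idx++, remaining--)
			printf("ring %d -> vector %d\n", ring_idx, v);
	}
	return 0;
}

With 10 rings over 4 vectors this prints a 3/3/2/2 split; recomputing
the quota from the remainder keeps the distribution even without any
explicit remainder bookkeeping.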

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   90 ++++++++++---------------
 1 files changed, 35 insertions(+), 55 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1929d23..2879a74 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2014,59 +2014,41 @@ static inline void map_vector_to_txq(struct ixgbe_adapter *a, int v_idx,
  * group the rings as "efficiently" as possible.  You would add new
  * mapping configurations in here.
  **/
-static int ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter)
+static void ixgbe_map_rings_to_vectors(struct ixgbe_adapter *adapter)
 {
-	int q_vectors;
+	int q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+	int rxr_remaining = adapter->num_rx_queues, rxr_idx = 0;
+	int txr_remaining = adapter->num_tx_queues, txr_idx = 0;
 	int v_start = 0;
-	int rxr_idx = 0, txr_idx = 0;
-	int rxr_remaining = adapter->num_rx_queues;
-	int txr_remaining = adapter->num_tx_queues;
-	int i, j;
-	int rqpv, tqpv;
-	int err = 0;
 
-	/* No mapping required if MSI-X is disabled. */
+	/* only one q_vector if MSI-X is disabled. */
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
-		goto out;
-
-	q_vectors = adapter->num_msix_vectors - NON_Q_VECTORS;
+		q_vectors = 1;
 
 	/*
-	 * The ideal configuration...
-	 * We have enough vectors to map one per queue.
+	 * If we don't have enough vectors for a 1-to-1 mapping, we'll have to
+	 * group them so there are multiple queues per vector.
+	 *
+	 * Re-adjusting *qpv takes care of the remainder.
 	 */
-	if (q_vectors == adapter->num_rx_queues + adapter->num_tx_queues) {
-		for (; rxr_idx < rxr_remaining; v_start++, rxr_idx++)
+	for (; v_start < q_vectors && rxr_remaining; v_start++) {
+		int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_start);
+		for (; rqpv; rqpv--, rxr_idx++, rxr_remaining--)
 			map_vector_to_rxq(adapter, v_start, rxr_idx);
-
-		for (; txr_idx < txr_remaining; v_start++, txr_idx++)
-			map_vector_to_txq(adapter, v_start, txr_idx);
-
-		goto out;
 	}
 
 	/*
-	 * If we don't have enough vectors for a 1-to-1
-	 * mapping, we'll have to group them so there are
-	 * multiple queues per vector.
+	 * If there are not enough q_vectors for each ring to have its own
+	 * vector then we must pair up Rx/Tx on each vector
 	 */
-	/* Re-adjusting *qpv takes care of the remainder. */
-	for (i = v_start; i < q_vectors; i++) {
-		rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - i);
-		for (j = 0; j < rqpv; j++) {
-			map_vector_to_rxq(adapter, i, rxr_idx);
-			rxr_idx++;
-			rxr_remaining--;
-		}
-		tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - i);
-		for (j = 0; j < tqpv; j++) {
-			map_vector_to_txq(adapter, i, txr_idx);
-			txr_idx++;
-			txr_remaining--;
-		}
+	if ((v_start + txr_remaining) > q_vectors)
+		v_start = 0;
+
+	for (; v_start < q_vectors && txr_remaining; v_start++) {
+		int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_start);
+		for (; tqpv; tqpv--, txr_idx++, txr_remaining--)
+			map_vector_to_txq(adapter, v_start, txr_idx);
 	}
-out:
-	return err;
 }
 
 /**
@@ -2083,10 +2065,6 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
 	int vector, err;
 	int ri = 0, ti = 0;
 
-	err = ixgbe_map_rings_to_vectors(adapter);
-	if (err)
-		return err;
-
 	for (vector = 0; vector < q_vectors; vector++) {
 		struct ixgbe_q_vector *q_vector = adapter->q_vector[vector];
 		struct msix_entry *entry = &adapter->msix_entries[vector];
@@ -2294,19 +2272,25 @@ static int ixgbe_request_irq(struct ixgbe_adapter *adapter)
 	struct net_device *netdev = adapter->netdev;
 	int err;
 
-	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED) {
+	/* map all of the rings to the q_vectors */
+	ixgbe_map_rings_to_vectors(adapter);
+
+	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
 		err = ixgbe_request_msix_irqs(adapter);
-	} else if (adapter->flags & IXGBE_FLAG_MSI_ENABLED) {
+	else if (adapter->flags & IXGBE_FLAG_MSI_ENABLED)
 		err = request_irq(adapter->pdev->irq, ixgbe_intr, 0,
 				  netdev->name, adapter);
-	} else {
+	else
 		err = request_irq(adapter->pdev->irq, ixgbe_intr, IRQF_SHARED,
 				  netdev->name, adapter);
-	}
 
-	if (err)
+	if (err) {
 		e_err(probe, "request_irq failed, Error %d\n", err);
 
+		/* place q_vectors and rings back into a known good state */
+		ixgbe_reset_q_vectors(adapter);
+	}
+
 	return err;
 }
 
@@ -2316,11 +2300,10 @@ static void ixgbe_free_irq(struct ixgbe_adapter *adapter)
 		int i, q_vectors;
 
 		q_vectors = adapter->num_msix_vectors;
-
 		i = q_vectors - 1;
 		free_irq(adapter->msix_entries[i].vector, adapter);
-
 		i--;
+
 		for (; i >= 0; i--) {
 			/* free only the irqs that were actually requested */
 			if (!adapter->q_vector[i]->rx.ring &&
@@ -2387,9 +2370,6 @@ static void ixgbe_configure_msi_and_legacy(struct ixgbe_adapter *adapter)
 	ixgbe_set_ivar(adapter, 0, 0, 0);
 	ixgbe_set_ivar(adapter, 1, 0, 0);
 
-	map_vector_to_rxq(adapter, 0, 0);
-	map_vector_to_txq(adapter, 0, 0);
-
 	e_info(hw, "Legacy interrupt IVAR setup done\n");
 }
 
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 22+ messages in thread
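
As a quick sanity check on the ring-to-vector mapping above: the DIV_ROUND_UP
pass distributes rings as evenly as possible, with earlier vectors taking the
larger shares.  A standalone userspace sketch (ordinary C with illustrative
values, not driver code):

#include <stdio.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
	int rings = 10, vectors = 4;	/* illustrative values */
	int v;

	for (v = 0; v < vectors && rings; v++) {
		int per_vec = DIV_ROUND_UP(rings, vectors - v);

		printf("vector %d gets %d ring(s)\n", v, per_vec);
		rings -= per_vec;	/* later vectors take a smaller share */
	}
	return 0;	/* prints 3, 3, 2, 2 for 10 rings on 4 vectors */
}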

* [net-next 08/10] ixgbe: Drop unnecessary adapter->hw dereference in loopback test setup
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (6 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 07/10] ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt types Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 09/10] ixgbe: combine PCI_VDEVICE and board declaration to same line Jeff Kirsher
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch drops a set of unnecessary dereferences to the hardware structure
since we already have a local copy of the hardware pointer.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index 16835cc..6241e9a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1566,26 +1566,26 @@ static int ixgbe_setup_loopback_test(struct ixgbe_adapter *adapter)
 
 	/* X540 needs to set the MACC.FLU bit to force link up */
 	if (adapter->hw.mac.type == ixgbe_mac_X540) {
-		reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_MACC);
+		reg_data = IXGBE_READ_REG(hw, IXGBE_MACC);
 		reg_data |= IXGBE_MACC_FLU;
-		IXGBE_WRITE_REG(&adapter->hw, IXGBE_MACC, reg_data);
+		IXGBE_WRITE_REG(hw, IXGBE_MACC, reg_data);
 	}
 
 	/* right now we only support MAC loopback in the driver */
-	reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_HLREG0);
+	reg_data = IXGBE_READ_REG(hw, IXGBE_HLREG0);
 	/* Setup MAC loopback */
 	reg_data |= IXGBE_HLREG0_LPBK;
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_HLREG0, reg_data);
+	IXGBE_WRITE_REG(hw, IXGBE_HLREG0, reg_data);
 
-	reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_FCTRL);
+	reg_data = IXGBE_READ_REG(hw, IXGBE_FCTRL);
 	reg_data |= IXGBE_FCTRL_BAM | IXGBE_FCTRL_SBP | IXGBE_FCTRL_MPE;
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_FCTRL, reg_data);
+	IXGBE_WRITE_REG(hw, IXGBE_FCTRL, reg_data);
 
-	reg_data = IXGBE_READ_REG(&adapter->hw, IXGBE_AUTOC);
+	reg_data = IXGBE_READ_REG(hw, IXGBE_AUTOC);
 	reg_data &= ~IXGBE_AUTOC_LMS_MASK;
 	reg_data |= IXGBE_AUTOC_LMS_10G_LINK_NO_AN | IXGBE_AUTOC_FLU;
-	IXGBE_WRITE_REG(&adapter->hw, IXGBE_AUTOC, reg_data);
-	IXGBE_WRITE_FLUSH(&adapter->hw);
+	IXGBE_WRITE_REG(hw, IXGBE_AUTOC, reg_data);
+	IXGBE_WRITE_FLUSH(hw);
 	usleep_range(10000, 20000);
 
 	/* Disable Atlas Tx lanes; re-enabled in reset path */
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 22+ messages in thread
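
For context, the hunk above relies on a local pointer declared earlier in
ixgbe_setup_loopback_test(), per the commit message's "local copy of the
hardware pointer"; the declaration (not visible in the hunk) is simply:

	struct ixgbe_hw *hw = &adapter->hw;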

* [net-next 09/10] ixgbe: combine PCI_VDEVICE and board declaration to same line
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (7 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 08/10] ixgbe: Drop unnecessary adapter->hw dereference in loopback test setup Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-21  7:29 ` [net-next 10/10] ixgbe: Update TXDCTL configuration to correctly handle WTHRESH Jeff Kirsher
  2011-08-23 22:45 ` [net-next 00/10][pull request] Intel Wired LAN Driver Update David Miller
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is a minor whitespace cleanup to compress the device ID
declaration and board type declaration onto the same line.  It seems to
make sense since all of the combinations of the two are less than 80
characters and it makes the overall layout a bit more readable.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   79 ++++++++-----------------
 1 files changed, 26 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 2879a74..35f9912 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -79,59 +79,32 @@ static const struct ixgbe_info *ixgbe_info_tbl[] = {
  *   Class, Class Mask, private data (not used) }
  */
 static DEFINE_PCI_DEVICE_TABLE(ixgbe_pci_tbl) = {
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_DUAL_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_SINGLE_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT2),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_CX4),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_CX4_DUAL_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_DA_DUAL_PORT),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_XF_LR),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_SFP_LOM),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_BX),
-	 board_82598 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_XAUI_LOM),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KR),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_EM),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4_MEZZ),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_CX4),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_BACKPLANE_FCOE),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_FCOE),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_T3_LOM),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_COMBO_BACKPLANE),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X540T),
-	 board_X540 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_SF2),
-	 board_82599 },
-	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_LS),
-	 board_82599 },
-
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_DUAL_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AF_SINGLE_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598AT2), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_CX4), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_CX4_DUAL_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_DA_DUAL_PORT), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_SR_DUAL_PORT_EM), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_XF_LR), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598EB_SFP_LOM), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82598_BX), board_82598 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_XAUI_LOM), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KR), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_EM), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_KX4_MEZZ), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_CX4), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_BACKPLANE_FCOE), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_FCOE), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_T3_LOM), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_COMBO_BACKPLANE), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_X540T), board_X540 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_SFP_SF2), board_82599 },
+	{PCI_VDEVICE(INTEL, IXGBE_DEV_ID_82599_LS), board_82599 },
 	/* required last entry */
 	{0, }
 };
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [net-next 10/10] ixgbe: Update TXDCTL configuration to correctly handle WTHRESH
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (8 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 09/10] ixgbe: combine PCI_VDEVICE and board declaration to same line Jeff Kirsher
@ 2011-08-21  7:29 ` Jeff Kirsher
  2011-08-23 22:45 ` [net-next 00/10][pull request] Intel Wired LAN Driver Update David Miller
  10 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-21  7:29 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change updates the TXDCTL configuration.  The main goal is to be much
more explicit about the configuration and to avoid a possible false TX hang
when the interrupt throttle rate is set to 0.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   35 +++++++++++++------------
 1 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 35f9912..8cf8670 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2359,13 +2359,11 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
-	u32 txdctl;
+	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;
 
 	/* disable queue to avoid issues while updating state */
-	txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(reg_idx));
-	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx),
-			txdctl & ~IXGBE_TXDCTL_ENABLE);
+	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
 	IXGBE_WRITE_FLUSH(hw);
 
 	IXGBE_WRITE_REG(hw, IXGBE_TDBAL(reg_idx),
@@ -2377,18 +2375,22 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_TDT(reg_idx), 0);
 	ring->tail = hw->hw_addr + IXGBE_TDT(reg_idx);
 
-	/* configure fetching thresholds */
-	if (adapter->rx_itr_setting == 0) {
-		/* cannot set wthresh when itr==0 */
-		txdctl &= ~0x007F0000;
-	} else {
-		/* enable WTHRESH=8 descriptors, to encourage burst writeback */
-		txdctl |= (8 << 16);
-	}
-	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
-		/* PThresh workaround for Tx hang with DFP enabled. */
-		txdctl |= 32;
-	}
+	/*
+	 * set WTHRESH to encourage burst writeback, it should not be set
+	 * higher than 1 when ITR is 0 as it could cause false TX hangs
+	 *
+	 * In order to avoid issues WTHRESH + PTHRESH should always be equal
+	 * to or less than the number of on chip descriptors, which is
+	 * currently 40.
+	 */
+	if (!adapter->tx_itr_setting || !adapter->rx_itr_setting)
+		txdctl |= (1 << 16);	/* WTHRESH = 1 */
+	else
+		txdctl |= (8 << 16);	/* WTHRESH = 8 */
+
+	/* PTHRESH=32 is needed to avoid a Tx hang with DFP enabled. */
+	txdctl |= (1 << 8) |	/* HTHRESH = 1 */
+		   32;		/* PTHRESH = 32 */
 
 	/* reinitialize flowdirector state */
 	if ((adapter->flags & IXGBE_FLAG_FDIR_HASH_CAPABLE) &&
@@ -2403,7 +2405,6 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	clear_bit(__IXGBE_HANG_CHECK_ARMED, &ring->state);
 
 	/* enable queue */
-	txdctl |= IXGBE_TXDCTL_ENABLE;
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), txdctl);
 
 	/* TXDCTL.EN will return 0 on 82598 if link is down, so skip it */
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 22+ messages in thread
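
Composed together, the TXDCTL value written by the patch above has a fixed
layout.  A sketch of the final value (field positions follow the shifts and
comments in the patch itself; the IXGBE_TXDCTL_ENABLE bit value is an
assumption from the datasheet, not something shown above):

	u32 txdctl = IXGBE_TXDCTL_ENABLE	/* assumed 0x02000000, bit 25 */
		   | (8 << 16)		/* WTHRESH = 8 (1 << 16 when ITR is 0) */
		   | (1 << 8)		/* HTHRESH = 1 */
		   | 32;		/* PTHRESH = 32; WTHRESH + PTHRESH <= 40 */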

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-21  7:29 ` [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget Jeff Kirsher
@ 2011-08-21 14:01   ` Ben Hutchings
  2011-08-22 16:30     ` Alexander Duyck
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2011-08-21 14:01 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Alexander Duyck, netdev, gospo

On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This change makes it so that the TX work limit is now obsolete.  Instead of
> using it we can instead rely on the NAPI budget for the number of packets
> we should clean per interrupt.  The advantage to this approach is that it
> results in a much more balanced work flow since the same number of RX and
> TX packets should be cleaned per interrupt.
[...]

This seems kind of sensible, but it's not how Dave has been recommending
people to account for TX work in NAPI.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-21 14:01   ` Ben Hutchings
@ 2011-08-22 16:30     ` Alexander Duyck
  2011-08-22 16:46       ` Ben Hutchings
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2011-08-22 16:30 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, netdev, gospo

On 08/21/2011 07:01 AM, Ben Hutchings wrote:
> On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>
>> This change makes it so that the TX work limit is now obsolete.  Instead of
>> using it we can instead rely on the NAPI budget for the number of packets
>> we should clean per interrupt.  The advantage to this approach is that it
>> results in a much more balanced work flow since the same number of RX and
>> TX packets should be cleaned per interrupt.
> [...]
>
> This seems kind of sensible, but it's not how Dave has been recommending
> people to account for TX work in NAPI.
>
> Ben.
>
I wasn't aware there was a recommended approach.  Could you tell me more 
about it?

As I stated in the patch description this approach works very well for 
me, especially in routing workloads since it typically keeps the TX 
clean-up in polling as long as the RX is in polling.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-22 16:30     ` Alexander Duyck
@ 2011-08-22 16:46       ` Ben Hutchings
  2011-08-22 17:29         ` Alexander Duyck
  0 siblings, 1 reply; 22+ messages in thread
From: Ben Hutchings @ 2011-08-22 16:46 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Jeff Kirsher, davem, netdev, gospo

On Mon, 2011-08-22 at 09:30 -0700, Alexander Duyck wrote:
> On 08/21/2011 07:01 AM, Ben Hutchings wrote:
> > On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
> >> From: Alexander Duyck<alexander.h.duyck@intel.com>
> >>
> >> This change makes it so that the TX work limit is now obsolete.  Instead of
> >> using it we can instead rely on the NAPI budget for the number of packets
> >> we should clean per interrupt.  The advantage to this approach is that it
> >> results in a much more balanced work flow since the same number of RX and
> >> TX packets should be cleaned per interrupt.
> > [...]
> >
> > This seems kind of sensible, but it's not how Dave has been recommending
> > people to account for TX work in NAPI.
> >
> > Ben.
> >
> I wasn't aware there was a recommended approach.  Could you tell me more 
> about it?

If a whole TX ring is cleaned then consider the budget spent; otherwise
don't count it.

Ben.

> As I stated in the patch description this approach works very well for 
> me, especially in routing workloads since it typically keeps the TX 
> clean-up in polling as long as the RX is in polling.
> 
> Thanks,
> 
> Alex

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply	[flat|nested] 22+ messages in thread
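
The accounting Hutchings describes reduces to a poll routine along these
lines (a minimal sketch; example_q_vector, rx_clean() and
tx_reclaimed_whole_ring() are hypothetical helpers, not code from any
in-tree driver):

static int example_poll(struct napi_struct *napi, int budget)
{
	struct example_q_vector *q =
		container_of(napi, struct example_q_vector, napi);
	int work = rx_clean(q, budget);

	/* TX is charged only when an entire ring's worth was reclaimed */
	if (tx_reclaimed_whole_ring(q))
		work = budget;		/* report the budget as spent */

	if (work < budget) {
		napi_complete(napi);
		/* re-enable this vector's interrupt here */
	}
	return work;
}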

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-22 16:46       ` Ben Hutchings
@ 2011-08-22 17:29         ` Alexander Duyck
  2011-08-22 20:56           ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2011-08-22 17:29 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, netdev, gospo

On 08/22/2011 09:46 AM, Ben Hutchings wrote:
> On Mon, 2011-08-22 at 09:30 -0700, Alexander Duyck wrote:
>> On 08/21/2011 07:01 AM, Ben Hutchings wrote:
>>> On Sun, 2011-08-21 at 00:29 -0700, Jeff Kirsher wrote:
>>>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>>>
>>>> This change makes it so that the TX work limit is now obsolete.  Instead of
>>>> using it we can instead rely on the NAPI budget for the number of packets
>>>> we should clean per interrupt.  The advantage to this approach is that it
>>>> results in a much more balanced work flow since the same number of RX and
>>>> TX packets should be cleaned per interrupt.
>>> [...]
>>>
>>> This seems kind of sensible, but it's not how Dave has been recommending
>>> people to account for TX work in NAPI.
>>>
>>> Ben.
>>>
>> I wasn't aware there was a recommended approach.  Could you tell me more
>> about it?
> If a whole TX ring is cleaned then consider the budget spent; otherwise
> don't count it.
>
> Ben.

The only problem I was seeing with that was that in certain cases it 
seemed like the TX cleanup could consume enough CPU time to cause pretty 
significant delays in processing the RX cleanup.  This in turn was 
causing single queue bi-directional routing tests to come out pretty 
unbalanced since what seemed to happen is that one CPU's RX work would 
overwhelm the other CPU with the TX processing resulting in an 
unbalanced flow that was something like a 60/40 split between the 
upstream and downstream throughput.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-22 17:29         ` Alexander Duyck
@ 2011-08-22 20:56           ` David Miller
  2011-08-22 22:57             ` Alexander Duyck
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2011-08-22 20:56 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: bhutchings, jeffrey.t.kirsher, netdev, gospo

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 22 Aug 2011 10:29:51 -0700

> The only problem I was seeing with that was that in certain cases it
> seemed like the TX cleanup could consume enough CPU time to cause
> pretty significant delays in processing the RX cleanup.  This in turn
> was causing single queue bi-directional routing tests to come out
> pretty unbalanced since what seemed to happen is that one CPU's RX work
> would overwhelm the other CPU with the TX processing resulting in an
> unbalanced flow that was something like a 60/40 split between the
> upstream and downstream throughput.

But the problem is that now you're applying the budget to two operations
that have much differing costs.  Freeing up a TX ring packet is probably
on the order of 1/10th the cost of processing an incoming RX ring frame.

I've advocated to not apply the budget at all to TX ring processing.

I can see your dilemma with respect to RX ring processing being delayed,
but if that's really happening you can consider whether the TX ring is
simply too large.

In any event can you try something like dampening the cost applied to
budget for TX work (1/2, 1/4, etc.)?  Because as far as I can tell, if
you are really hitting the budget limit on TX then you won't be doing
any RX work on that device until a future NAPI round that depletes the
TX ring work without going over the budget.

^ permalink raw reply	[flat|nested] 22+ messages in thread
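
The dampening Miller suggests could be sketched as follows, inside a poll
routine (the 1/4 weight and the clean_tx_ring()/clean_rx_ring() helpers are
illustrative assumptions, not from the thread):

	/* charge TX reclaims at a reduced weight so cheap TX work
	 * cannot eat the whole budget and starve RX processing */
	int tx_cleaned = clean_tx_ring(q_vector);
	int tx_cost = tx_cleaned / 4;	/* one budget unit per 4 reclaims */
	int rx_budget = budget - min(tx_cost, budget);
	int rx_cleaned = clean_rx_ring(q_vector, rx_budget);
	int work_done = rx_cleaned + min(tx_cost, budget);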

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-22 20:56           ` David Miller
@ 2011-08-22 22:57             ` Alexander Duyck
  2011-08-22 23:40               ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2011-08-22 22:57 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, jeffrey.t.kirsher, netdev, gospo

On 08/22/2011 01:56 PM, David Miller wrote:
> From: Alexander Duyck<alexander.h.duyck@intel.com>
> Date: Mon, 22 Aug 2011 10:29:51 -0700
>
>> The only problem I was seeing with that was that in certain cases it
>> seemed like the TX cleanup could consume enough CPU time to cause
>> pretty significant delays in processing the RX cleanup.  This in turn
>> was causing single queue bi-directional routing tests to come out
>> pretty unbalanced since what seemed to happen is that one CPU's RX work
>> would overwhelm the other CPU with the TX processing resulting in an
>> unbalanced flow that was something like a 60/40 split between the
>> upstream and downstream throughput.
> But the problem is that now you're applying the budget to two operations
> that have much differing costs.  Freeing up a TX ring packet is probably
> on the order of 1/10th the cost of processing an incoming RX ring frame.
>
> I've advocated to not apply the budget at all to TX ring processing.
I fully understand that the TX path is much cheaper than the RX path.
One step I have taken in all of this code is that the TX path only
counts SKBs cleaned, not descriptors.  So a single-descriptor 60-byte
transmit will cost the same as a 64K, 18-descriptor TSO.  All I am
really counting is the number of times I have called
dev_kfree_skb_any();
> I can see your dilemma with respect to RX ring processing being delayed,
> but if that's really happening you can consider whether the TX ring is
> simply too large.
The problem was occurring even without large rings.  I was seeing issues 
with rings just 256 descriptors in size.  The problem seemed to be that 
the TX cleanup being a multiple of budget was allowing one CPU to 
overwhelm the other and the fact that the TX was essentially unbounded 
was just allowing the issue to feed back on itself.

In the routing test case I was actually seeing significant advantages to 
this approach as we were essentially cleaning just the right number of 
buffers to make room for the next set of transmits when the RX cleanup 
came through.  In addition since the RX and TX workload was balanced it 
kept both locked into polling while the CPU was saturated instead of 
allowing the TX to become interrupt driven.  In addition since the TX 
was working on the same budget as the RX the number of SKBs freed up in 
the TX path would match the number consumed when being reallocated on 
the RX path.

> In any event can you try something like dampening the cost applied to
> budget for TX work (1/2, 1/4, etc.)?  Because as far as I can tell, if
> you are really hitting the budget limit on TX then you won't be doing
> any RX work on that device until a future NAPI round that depletes the
> TX ring work without going over the budget.
The problem seemed to be present as long as I allowed the TX budget to 
be a multiple of the RX budget.  The easiest way to keep things balanced 
and avoid allowing the TX from one CPU to overwhelm the RX on another 
was just to keep the budgets equal.

I'm a bit confused by this last comment.  The full budget is used for TX
and RX; it isn't divided.  I do a budget's worth of TX cleanup and a
budget's worth of RX cleanup within the ixgbe_poll routine, and if either
of them consumes its full budget then I return the budget value as the
work done.

If you are referring to the case where two devices are sharing the CPU 
then I would suspect this might lead to faster consumption of the 
netdev_budget, but other than that I don't see any starvation issues for 
RX or TX.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread
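
For reference, the flow Duyck describes, where TX and RX each get the full
budget and exhausting either keeps the vector in polling, looks roughly
like this (a paraphrase with hypothetical clean_tx_sketch()/clean_rx_sketch()
helpers, not the literal ixgbe_poll from the series):

static int poll_sketch(struct napi_struct *napi, int budget)
{
	struct ixgbe_q_vector *q_vector =
		container_of(napi, struct ixgbe_q_vector, napi);
	bool clean_complete = true;

	/* each helper returns false if it consumed its full budget */
	clean_complete &= clean_tx_sketch(q_vector, budget);
	clean_complete &= clean_rx_sketch(q_vector, budget);

	if (!clean_complete)
		return budget;	/* either side hit the budget: keep polling */

	napi_complete(napi);
	/* re-enable the vector's interrupt here */
	return 0;
}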

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-22 22:57             ` Alexander Duyck
@ 2011-08-22 23:40               ` David Miller
  2011-08-23  4:04                 ` Alexander Duyck
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2011-08-22 23:40 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: bhutchings, jeffrey.t.kirsher, netdev, gospo

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 22 Aug 2011 15:57:51 -0700

> The problem was occurring even without large rings.
> I was seeing issues with rings just 256 descriptors in size.

And the default in the ixgbe driver is 512 entries which I think
itself is quite excessive.  Something like 128 is more in line with
what I'd call a sane default.

So the only side effect of your change is to decrease the TX quota to
64 (the default NAPI quota) from its current value of 512
(IXGBE_DEFAULT_TXD).

Talking about the existing code, I can't even see how the current
driver private TX quota can trigger except in the most extreme cases.
This is because the quota is set the same as the size you're setting
the TX ring to.

> The problem seemed to be that the TX cleanup being a multiple of
> budget was allowing one CPU to overwhelm the other and the fact that
> the TX was essentially unbounded was just allowing the issue to
> feed back on itself.

I still don't understand what issue you could even be running into.

On each CPU we round robin against all NAPI requestors for that CPU.

In your routing test setup, we should have one cpu doing the RX and
another different cpu doing TX.

Therefore if the TX cpu simply spins in a loop doing nothing but TX
reclaim work it should not really matter.

And if you hit the TX budget on the TX cpu, it's just going to come
right back into the ixgbe NAPI handler and thus the TX reclaim
processing not even a dozen cycles later.

The only effect is to have us go through the whole function call
sequence and data structure setup into local variables more than you
would be doing so before.

> In addition since the RX and TX workload was balanced it kept both
> locked into polling while the CPU was saturated instead of allowing
> the TX to become interrupt driven.  In addition since the TX was
> working on the same budget as the RX the number of SKBs freed up in
> the TX path would match the number consumed when being reallocated
> on the RX path.

So the only conclusion I can come to is that what happens is we're now
executing what are essentially wasted cpu cycles and this takes us
over the threshold such that we poll more and take interrupts less.
And this improves performance.

That's pretty unwise if you ask me, we should do something useful with
cpu cycles instead of wasting them merely to make us poll more.

> The problem seemed to be present as long as I allowed the TX budget to
> be a multiple of the RX budget.  The easiest way to keep things
> balanced and avoid allowing the TX from one CPU to overwhelm the RX on
> another was just to keep the budgets equal.

You're executing 10 or 20 cpu cycles after every 64 TX reclaims;
that's the only effect of these changes.  That's not even long enough
for a cache line to transfer between two cpus.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-22 23:40               ` David Miller
@ 2011-08-23  4:04                 ` Alexander Duyck
  2011-08-23 20:52                   ` Alexander Duyck
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2011-08-23  4:04 UTC (permalink / raw)
  To: David Miller
  Cc: alexander.h.duyck, bhutchings, jeffrey.t.kirsher, netdev, gospo

On Mon, Aug 22, 2011 at 4:40 PM, David Miller <davem@davemloft.net> wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> Date: Mon, 22 Aug 2011 15:57:51 -0700
>
>> The problem was occurring even without large rings.
>> I was seeing issues with rings just 256 descriptors in size.
>
> And the default in the ixgbe driver is 512 entries which I think
> itself is quite excessive.  Something like 128 is more in line with
> what I'd call a sane default.

Are you suggesting I change the ring size, the TX quota, or both?

> So the only side effect of your change is to decrease the TX quota to
> 64 (the default NAPI quota) from its current value of 512
> (IXGBE_DEFAULT_TXD).

Yeah, that pretty much sums it up.  However as I said in the earlier
email I am counting SKBs freed instead of descriptors.  As such we
will probably end up cleaning more like 128 descriptors per TX clean
just due to our context descriptors that are also occupying space in
the ring.

> Talking about the existing code, I can't even see how the current
> driver private TX quota can trigger except in the most extreme cases.
> This is because the quota is set the same as the size you're setting
> the TX ring to.

I'm not sure if it ever met the quota either.  I suspect not since
under the routing workloads I would see the RX interrupts get disabled
but the TX interrupts keep going.

>> The problem seemed to be that the TX cleanup being a multiple of
>> budget was allowing one CPU to overwhelm the other and the fact that
>> the TX was essentially unbounded was just allowing the issue to
>> feed back on itself.
>
> I still don't understand what issue you could even be running into.
>
> On each CPU we round robin against all NAPI requestors for that CPU.
>
> In your routing test setup, we should have one cpu doing the RX and
> another different cpu doing TX.
>
> Therefore if the TX cpu simply spins in a loop doing nothing but TX
> reclaim work it should not really matter.

Doing a unidirectional test was fine and everything worked as you
describe.  It was doing a bidirectional test with two ports and a
single queue for each port that was the issue.  Specifically what
would happen is that one direction would tend to dominate over the
other so I would end up with a 60/40 split with either upstream or
downstream dominating.  By using the budget as the quota I found the
results were generally much closer to 50/50 and the overall result of
the two flows combined was higher.  I suspect it had to do with TX
work getting backlogged on the ring and acting as a feedback mechanism
to prevent the RX work on that CPU from getting squashed.

> And if you hit the TX budget on the TX cpu, it's just going to come
> right back into the ixgbe NAPI handler and thus the TX reclaim
> processing not even a dozen cycles later.
>
> The only effect is to have us go through the whole function call
> sequence and data structure setup into local variables more than you
> would be doing so before.

I suppose that is possible.  It all depends on the type of packets
being sent.  For single sends without any offloads we would only be
cleaning 64 descriptors per call to ixgbe_poll; with an offloaded
checksum or VLAN it would be 128 descriptors per call; and with TSO we
probably still wouldn't consume the budget since the sk_buff
consumption rate would be too low.

I can try testing the throughput with pktgen tomorrow to see if it
improves by increasing the TX budget.  I suppose there could be a few
factors affecting this since the budget value also determines the
number of buffers we clean before we call netif_wake_queue to
re-enable the transmit path.

>> In addition since the RX and TX workload was balanced it kept both
>> locked into polling while the CPU was saturated instead of allowing
>> the TX to become interrupt driven.  In addition since the TX was
>> working on the same budget as the RX the number of SKBs freed up in
>> the TX path would match the number consumed when being reallocated
>> on the RX path.
>
> So the only conclusion I can come to is that what happens is we're now
> executing what are essentially wasted cpu cycles and this takes us
> over the threshold such that we poll more and take interrupts less.
> And this improves performance.
>
> That's pretty unwise if you ask me, we should do something useful with
> cpu cycles instead of wasting them merely to make us poll more.

The thing is we are probably going to be wasting those cycles anyway.
In the case of bidirectional routing I was always locked into RX
polling with this change in place or not.  The only difference is that
the TX will likely clean itself completely with each poll versus the
possibility of leaving a few buffers behind when it hits the 64 quota.

>> The problem seemed to be present as long as I allowed the TX budget to
>> be a multiple of the RX budget.  The easiest way to keep things
>> balanced and avoid allowing the TX from one CPU to overwhelm the RX on
>> another was just to keep the budgets equal.
>
> You're executing 10 or 20 cpu cycles after every 64 TX reclaims;
> that's the only effect of these changes.  That's not even long enough
> for a cache line to transfer between two cpus.

It sounds like I may not have been seeing this due to the type of
workload I was focusing on.  I'll try generating some data with pktgen
and netperf tomorrow to see how this holds up under small packet
transmit only traffic since those are the cases most likely to get
into the state you mention.

Also I would appreciate it if you had any suggestions on other
workloads I might need to focus on in order to determine the impact of
this change.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget
  2011-08-23  4:04                 ` Alexander Duyck
@ 2011-08-23 20:52                   ` Alexander Duyck
  0 siblings, 0 replies; 22+ messages in thread
From: Alexander Duyck @ 2011-08-23 20:52 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, bhutchings, jeffrey.t.kirsher, netdev, gospo

On 08/22/2011 09:04 PM, Alexander Duyck wrote:
> On Mon, Aug 22, 2011 at 4:40 PM, David Miller<davem@davemloft.net>  wrote:
>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>> Date: Mon, 22 Aug 2011 15:57:51 -0700
>>> The problem seemed to be present as long as I allowed the TX budget to
>>> be a multiple of the RX budget.  The easiest way to keep things
>>> balanced and avoid allowing the TX from one CPU to overwhelm the RX on
>>> another was just to keep the budgets equal.
>> You're executing 10 or 20 cpu cycles after every 64 TX reclaims;
>> that's the only effect of these changes.  That's not even long enough
>> for a cache line to transfer between two cpus.
> It sounds like I may not have been seeing this due to the type of
> workload I was focusing on.  I'll try generating some data with pktgen
> and netperf tomorrow to see how this holds up under small packet
> transmit only traffic since those are the cases most likely to get
> into the state you mention.
>
> Also I would appreciate it if you had any suggestions on other
> workloads I might need to focus on in order to determine the impact of
> this change.
>
> Thanks,
>
> Alex

I found a reason to rewrite this.  Basically this modification has a 
negative impact in the case of multiple ports on a single CPU all 
routing to the same port on the same CPU.  It ends up making it so that 
the transmit throughput is only (total CPU packets per second)/(number 
of ports receiving on the CPU).  So on a system that can receive at 1.4Mpps 
on a single core we end up seeing only a little over 350Kpps of transmit 
when 4 ports are all receiving packets on the system.

I'll look at rewriting this.  I'll probably leave the work limit 
controlling things but lower it to a more reasonable value such as 1/2 
to 1/4 of the ring size.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread
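
For scale: with the driver's default 512-descriptor TX ring mentioned
earlier in the thread, a work limit of 1/2 to 1/4 of the ring size works
out to 128 to 256 cleans per poll, noticeably more headroom than the 64
imposed by reusing the default NAPI budget.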

* Re: [net-next 00/10][pull request] Intel Wired LAN Driver Update
  2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
                   ` (9 preceding siblings ...)
  2011-08-21  7:29 ` [net-next 10/10] ixgbe: Update TXDCTL configuration to correctly handle WTHRESH Jeff Kirsher
@ 2011-08-23 22:45 ` David Miller
  2011-08-24  2:34   ` Jeff Kirsher
  10 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2011-08-23 22:45 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Sun, 21 Aug 2011 00:29:11 -0700

> The following series contains updates to ixgbe.
> 
> The following are changes since commit ca1ba7caa68520864e4b9227e67f3bbc6fed373b:
>   Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next
> and are available in the git repository at:
>   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander says he is going to respin patch #3, so I'm dropping this and
waiting for the next submission.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next 00/10][pull request] Intel Wired LAN Driver Update
  2011-08-23 22:45 ` [net-next 00/10][pull request] Intel Wired LAN Driver Update David Miller
@ 2011-08-24  2:34   ` Jeff Kirsher
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff Kirsher @ 2011-08-24  2:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, gospo


On Tue, 2011-08-23 at 15:45 -0700, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Sun, 21 Aug 2011 00:29:11 -0700
> 
> > The following series contains updates to ixgbe.
> > 
> > The following are changes since commit ca1ba7caa68520864e4b9227e67f3bbc6fed373b:
> >   Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next
> > and are available in the git repository at:
> >   master.kernel.org:/pub/scm/linux/kernel/git/jkirsher/net-next master
> 
> Alexander says he is going to respin patch #3, so I'm dropping this and
> waiting for the next submission.

I agree; I have been following the thread.  I know Alex will work on
re-working the patch.  In the meantime I will review the ixgbe patches
that are ready to be submitted and are not affected by the changes in
patch 3, and push those while I wait.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2011-08-24  2:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-21  7:29 [net-next 00/10][pull request] Intel Wired LAN Driver Update Jeff Kirsher
2011-08-21  7:29 ` [net-next 01/10] ixgbe: Simplify transmit cleanup path Jeff Kirsher
2011-08-21  7:29 ` [net-next 02/10] ixgbe: convert rings from q_vector bit indexed array to linked list Jeff Kirsher
2011-08-21  7:29 ` [net-next 03/10] ixgbe: Drop the TX work limit and instead just leave it to budget Jeff Kirsher
2011-08-21 14:01   ` Ben Hutchings
2011-08-22 16:30     ` Alexander Duyck
2011-08-22 16:46       ` Ben Hutchings
2011-08-22 17:29         ` Alexander Duyck
2011-08-22 20:56           ` David Miller
2011-08-22 22:57             ` Alexander Duyck
2011-08-22 23:40               ` David Miller
2011-08-23  4:04                 ` Alexander Duyck
2011-08-23 20:52                   ` Alexander Duyck
2011-08-21  7:29 ` [net-next 04/10] ixgbe: consolidate all MSI-X ring interrupts and poll routines into one Jeff Kirsher
2011-08-21  7:29 ` [net-next 05/10] ixgbe: cleanup allocation and freeing of IRQ affinity hint Jeff Kirsher
2011-08-21  7:29 ` [net-next 06/10] ixgbe: Use ring->dev instead of adapter->pdev->dev when updating DCA Jeff Kirsher
2011-08-21  7:29 ` [net-next 07/10] ixgbe: commonize ixgbe_map_rings_to_vectors to work for all interrupt types Jeff Kirsher
2011-08-21  7:29 ` [net-next 08/10] ixgbe: Drop unnecessary adapter->hw dereference in loopback test setup Jeff Kirsher
2011-08-21  7:29 ` [net-next 09/10] ixgbe: combine PCI_VDEVICE and board declaration to same line Jeff Kirsher
2011-08-21  7:29 ` [net-next 10/10] ixgbe: Update TXDCTL configuration to correctly handle WTHRESH Jeff Kirsher
2011-08-23 22:45 ` [net-next 00/10][pull request] Intel Wired LAN Driver Update David Miller
2011-08-24  2:34   ` Jeff Kirsher
