netdev.vger.kernel.org archive mirror
* [net-next 00/13][pull request] Intel Wired LAN Driver Updates
@ 2011-09-17  8:04 Jeff Kirsher
  2011-09-17  8:04 ` [net-next 01/13] ixgb: eliminate checkstack warnings Jeff Kirsher
                   ` (12 more replies)
  0 siblings, 13 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo

The following series contains updates to ixgb and igb. The ixgb patch
is a trivial fix.  The remaining patches are for igb and do the following:

  - clean up/consolidate Tx descriptor handling
  - trivial fix to remove _adv/_ADV, since igb uses only advanced
    descriptors
  - performance enhancements to help with general use and routing

This series of igb patches is the first of three from Alex to update and
clean up the igb driver, so additional patches/changes are coming to
complete this work.

The following are changes since commit 765cf9976e937f1cfe9159bf4534967c8bf8eb6d:
  tcp: md5: remove one indirection level in tcp_md5sig_pool
and are available in the git repository at:
  git://github.com/Jkirsher/net-next.git
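
To review locally, one minimal way to fetch the tree (the remote name
here is illustrative, and the branch to merge is whichever one the
series was pushed to):

  $ git remote add jkirsher git://github.com/Jkirsher/net-next.git
  $ git fetch jkirsher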

Alexander Duyck (12):
  igb: Update RXDCTL/TXDCTL configurations
  igb: Update max_frame_size to account for an optional VLAN tag if
    present
  igb: drop support for single buffer mode
  igb: streamline Rx buffer allocation and cleanup
  igb: update ring and adapter structure to improve performance
  igb: Refactor clean_rx_irq to reduce overhead and improve performance
  igb: drop the "adv" off function names relating to descriptors
  igb: Replace E1000_XX_DESC_ADV with IGB_XX_DESC
  igb: Remove multi_tx_table and simplify igb_xmit_frame
  igb: Make Tx budget for NAPI user adjustable
  igb: split buffer_info into tx_buffer_info and rx_buffer_info
  igb: Consolidate creation of Tx context descriptors into a single
    function

Jesse Brandeburg (1):
  ixgb: eliminate checkstack warnings

 drivers/net/ethernet/intel/igb/igb.h         |  160 +++--
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   36 +-
 drivers/net/ethernet/intel/igb/igb_main.c    |  972 +++++++++++++-------------
 drivers/net/ethernet/intel/ixgb/ixgb_main.c  |   10 +-
 4 files changed, 586 insertions(+), 592 deletions(-)

-- 
1.7.6


* [net-next 01/13] ixgb: eliminate checkstack warnings
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:43   ` Joe Perches
  2011-09-17  8:04 ` [net-next 02/13] igb: Update RXDCTL/TXDCTL configurations Jeff Kirsher
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Jesse Brandeburg, netdev, gospo, Jeff Kirsher

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Really trivial fix: use kzalloc/kfree instead of stack space.

before:
[jbrandeb@jbrandeb-mobl2 linux-2.6]$ make checkstack|grep ixgb_
0x0210 ixgb_set_multi [ixgb]:				768
0x04f8 ixgb_check_options [ixgb]:			220
0x04334 ixgb_set_ringparam [ixgb]:			124
0x04516 ixgb_set_ringparam [ixgb]:			124

after:
0x04f8 ixgb_check_options [ixgb]:			220
0x04354 ixgb_set_ringparam [ixgb]:			124
0x04536 ixgb_set_ringparam [ixgb]:			124
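
The shape of the fix, as a standalone sketch (mirroring the hunk below,
with the error path simplified): replace the large on-stack array with
a heap allocation that is freed on every path.

	u8 *mta = kzalloc(IXGB_MAX_NUM_MULTICAST_ADDRESSES *
			  IXGB_ETH_LENGTH_OF_ADDRESS, GFP_KERNEL);
	if (!mta)
		return;	/* the real patch jumps past the list update */
	/* ... copy the multicast list into mta ... */
	ixgb_mc_addr_list_update(hw, mta, netdev_mc_count(netdev), 0);
	kfree(mta);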

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgb/ixgb_main.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
index b8fb163..500823b 100644
--- a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
+++ b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
@@ -1120,8 +1120,12 @@ ixgb_set_multi(struct net_device *netdev)
 		rctl |= IXGB_RCTL_MPE;
 		IXGB_WRITE_REG(hw, RCTL, rctl);
 	} else {
-		u8 mta[IXGB_MAX_NUM_MULTICAST_ADDRESSES *
-			    IXGB_ETH_LENGTH_OF_ADDRESS];
+		u8 *mta = kzalloc(IXGB_MAX_NUM_MULTICAST_ADDRESSES *
+			      IXGB_ETH_LENGTH_OF_ADDRESS, GFP_KERNEL);
+		if (!mta) {
+			pr_err("allocation of multicast memory failed\n");
+			goto alloc_failed;
+		}
 
 		IXGB_WRITE_REG(hw, RCTL, rctl);
 
@@ -1131,8 +1135,10 @@ ixgb_set_multi(struct net_device *netdev)
 			       ha->addr, IXGB_ETH_LENGTH_OF_ADDRESS);
 
 		ixgb_mc_addr_list_update(hw, mta, netdev_mc_count(netdev), 0);
+		kfree(mta);
 	}
 
+alloc_failed:
 	if (netdev->features & NETIF_F_HW_VLAN_RX)
 		ixgb_vlan_strip_enable(adapter);
 	else
-- 
1.7.6


* [net-next 02/13] igb: Update RXDCTL/TXDCTL configurations
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
  2011-09-17  8:04 ` [net-next 01/13] ixgb: eliminate checkstack warnings Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 03/13] igb: Update max_frame_size to account for an optional VLAN tag if present Jeff Kirsher
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change cleans up the RXDCTL and TXDCTL configurations and optimizes Rx
performance by relaxing the descriptor write-back threshold on all hardware
other than 82576.
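
As a sketch, the new code composes the control registers from zero
rather than read-modify-writing them; the field offsets mirror the
hunks that follow:

	u32 rxdctl = 0;

	rxdctl |= IGB_RX_PTHRESH;		/* prefetch threshold */
	rxdctl |= IGB_RX_HTHRESH << 8;		/* host threshold */
	rxdctl |= IGB_RX_WTHRESH << 16;		/* write-back threshold */
	rxdctl |= E1000_RXDCTL_QUEUE_ENABLE;	/* enable the queue last */
	wr32(E1000_RXDCTL(reg_idx), rxdctl);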

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |    5 +++--
 drivers/net/ethernet/intel/igb/igb_main.c |   23 +++++++++--------------
 2 files changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 265e151..577fd3e 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -100,11 +100,12 @@ struct vf_data_storage {
  */
 #define IGB_RX_PTHRESH                     8
 #define IGB_RX_HTHRESH                     8
-#define IGB_RX_WTHRESH                     1
 #define IGB_TX_PTHRESH                     8
 #define IGB_TX_HTHRESH                     1
+#define IGB_RX_WTHRESH                     ((hw->mac.type == e1000_82576 && \
+					     adapter->msix_entries) ? 1 : 4)
 #define IGB_TX_WTHRESH                     ((hw->mac.type == e1000_82576 && \
-                                             adapter->msix_entries) ? 1 : 16)
+					     adapter->msix_entries) ? 1 : 16)
 
 /* this is the size past which hardware will drop packets when setting LPE=0 */
 #define MAXIMUM_ETHERNET_VLAN_SIZE 1522
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 3cb1bc9..aa78c10 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2666,14 +2666,12 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
                            struct igb_ring *ring)
 {
 	struct e1000_hw *hw = &adapter->hw;
-	u32 txdctl;
+	u32 txdctl = 0;
 	u64 tdba = ring->dma;
 	int reg_idx = ring->reg_idx;
 
 	/* disable the queue */
-	txdctl = rd32(E1000_TXDCTL(reg_idx));
-	wr32(E1000_TXDCTL(reg_idx),
-	                txdctl & ~E1000_TXDCTL_QUEUE_ENABLE);
+	wr32(E1000_TXDCTL(reg_idx), 0);
 	wrfl();
 	mdelay(10);
 
@@ -2685,7 +2683,7 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
 
 	ring->head = hw->hw_addr + E1000_TDH(reg_idx);
 	ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
-	writel(0, ring->head);
+	wr32(E1000_TDH(reg_idx), 0);
 	writel(0, ring->tail);
 
 	txdctl |= IGB_TX_PTHRESH;
@@ -3028,12 +3026,10 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 	struct e1000_hw *hw = &adapter->hw;
 	u64 rdba = ring->dma;
 	int reg_idx = ring->reg_idx;
-	u32 srrctl, rxdctl;
+	u32 srrctl = 0, rxdctl = 0;
 
 	/* disable the queue */
-	rxdctl = rd32(E1000_RXDCTL(reg_idx));
-	wr32(E1000_RXDCTL(reg_idx),
-	                rxdctl & ~E1000_RXDCTL_QUEUE_ENABLE);
+	wr32(E1000_RXDCTL(reg_idx), 0);
 
 	/* Set DMA base address registers */
 	wr32(E1000_RDBAL(reg_idx),
@@ -3045,7 +3041,7 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 	/* initialize head and tail */
 	ring->head = hw->hw_addr + E1000_RDH(reg_idx);
 	ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
-	writel(0, ring->head);
+	wr32(E1000_RDH(reg_idx), 0);
 	writel(0, ring->tail);
 
 	/* set descriptor configuration */
@@ -3076,13 +3072,12 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 	/* set filtering for VMDQ pools */
 	igb_set_vmolr(adapter, reg_idx & 0x7, true);
 
-	/* enable receive descriptor fetching */
-	rxdctl = rd32(E1000_RXDCTL(reg_idx));
-	rxdctl |= E1000_RXDCTL_QUEUE_ENABLE;
-	rxdctl &= 0xFFF00000;
 	rxdctl |= IGB_RX_PTHRESH;
 	rxdctl |= IGB_RX_HTHRESH << 8;
 	rxdctl |= IGB_RX_WTHRESH << 16;
+
+	/* enable receive descriptor fetching */
+	rxdctl |= E1000_RXDCTL_QUEUE_ENABLE;
 	wr32(E1000_RXDCTL(reg_idx), rxdctl);
 }
 
-- 
1.7.6


* [net-next 03/13] igb: Update max_frame_size to account for an optional VLAN tag if present
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
  2011-09-17  8:04 ` [net-next 01/13] ixgb: eliminate checkstack warnings Jeff Kirsher
  2011-09-17  8:04 ` [net-next 02/13] igb: Update RXDCTL/TXDCTL configurations Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 04/13] igb: drop support for single buffer mode Jeff Kirsher
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch modifies the max_frame_size in order to account for an optional
VLAN tag.  To support this we must also increase
MAX_STD_JUMBO_FRAME_SIZE to account for the 4 extra bytes.
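
Worked out with the standard header sizes (ETH_HLEN = 14, ETH_FCS_LEN = 4,
VLAN_HLEN = 4), the new sizing gives:

	max_frame = 1500 + 14 + 4 + 4 = 1522	/* MAXIMUM_ETHERNET_VLAN_SIZE */

and the standard jumbo limit grows by the same 4 bytes (9234 -> 9238),
so a 9216-byte MTU still fits: 9216 + 14 + 4 + 4 = 9238.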

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |    2 --
 drivers/net/ethernet/intel/igb/igb_main.c |   19 ++++++++++++-------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 577fd3e..8e90b85 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -117,8 +117,6 @@ struct vf_data_storage {
 #define IGB_RXBUFFER_2048  2048
 #define IGB_RXBUFFER_16384 16384
 
-#define MAX_STD_JUMBO_FRAME_SIZE 9234
-
 /* How many Tx Descriptors do we need to call netif_wake_queue ? */
 #define IGB_TX_QUEUE_WAKE	16
 /* How many Rx Buffers do we bundle into one write to the hardware ? */
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index aa78c10..6156275 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2396,7 +2396,8 @@ static int __devinit igb_sw_init(struct igb_adapter *adapter)
 	adapter->rx_itr_setting = IGB_DEFAULT_ITR;
 	adapter->tx_itr_setting = IGB_DEFAULT_ITR;
 
-	adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
+	adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN +
+				  VLAN_HLEN;
 	adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
 
 	spin_lock_init(&adapter->stats64_lock);
@@ -2962,16 +2963,19 @@ static inline int igb_set_vf_rlpml(struct igb_adapter *adapter, int size,
  **/
 static void igb_rlpml_set(struct igb_adapter *adapter)
 {
-	u32 max_frame_size;
+	u32 max_frame_size = adapter->max_frame_size;
 	struct e1000_hw *hw = &adapter->hw;
 	u16 pf_id = adapter->vfs_allocated_count;
 
-	max_frame_size = adapter->max_frame_size + VLAN_TAG_SIZE;
-
-	/* if vfs are enabled we set RLPML to the largest possible request
-	 * size and set the VMOLR RLPML to the size we need */
 	if (pf_id) {
 		igb_set_vf_rlpml(adapter, max_frame_size, pf_id);
+		/*
+		 * If we're in VMDQ or SR-IOV mode, then set global RLPML
+		 * to our max jumbo frame size, in case we need to enable
+		 * jumbo frames on one of the rings later.
+		 * This will not pass over-length frames into the default
+		 * queue because it's gated by the VMOLR.RLPML.
+		 */
 		max_frame_size = MAX_JUMBO_FRAME_SIZE;
 	}
 
@@ -4461,7 +4465,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct pci_dev *pdev = adapter->pdev;
-	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN;
+	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
 	u32 rx_buffer_len, i;
 
 	if ((new_mtu < 68) || (max_frame > MAX_JUMBO_FRAME_SIZE)) {
@@ -4469,6 +4473,7 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 		return -EINVAL;
 	}
 
+#define MAX_STD_JUMBO_FRAME_SIZE 9238
 	if (max_frame > MAX_STD_JUMBO_FRAME_SIZE) {
 		dev_err(&pdev->dev, "MTU > 9216 not supported.\n");
 		return -EINVAL;
-- 
1.7.6


* [net-next 04/13] igb: drop support for single buffer mode
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (2 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 03/13] igb: Update max_frame_size to account for an optional VLAN tag if present Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 05/13] igb: streamline Rx buffer allocation and cleanup Jeff Kirsher
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change removes support for single buffer mode from igb and makes the
driver always operate in packet split mode.  The advantage is a significant
reduction in total memory allocation overhead: we only need to allocate
one 1K slab per packet and can then make use of a reusable half page,
instead of allocating a 2K slab per packet.
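
The descriptor setup this enables is sketched below, mirroring the
igb_configure_rx_ring() hunk further down: every descriptor gets a
small header buffer plus half of a reusable page for the payload.

	srrctl  = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
	srrctl |= (PAGE_SIZE / 2) >> E1000_SRRCTL_BSIZEPKT_SHIFT;
	srrctl |= E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;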

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |    7 +-
 drivers/net/ethernet/intel/igb/igb_ethtool.c |    5 +-
 drivers/net/ethernet/intel/igb/igb_main.c    |  102 ++++++--------------------
 3 files changed, 28 insertions(+), 86 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 8e90b85..50632b1 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -111,11 +111,9 @@ struct vf_data_storage {
 #define MAXIMUM_ETHERNET_VLAN_SIZE 1522
 
 /* Supported Rx Buffer Sizes */
-#define IGB_RXBUFFER_64    64     /* Used for packet split */
-#define IGB_RXBUFFER_128   128    /* Used for packet split */
-#define IGB_RXBUFFER_1024  1024
-#define IGB_RXBUFFER_2048  2048
+#define IGB_RXBUFFER_512   512
 #define IGB_RXBUFFER_16384 16384
+#define IGB_RX_HDR_LEN     IGB_RXBUFFER_512
 
 /* How many Tx Descriptors do we need to call netif_wake_queue ? */
 #define IGB_TX_QUEUE_WAKE	16
@@ -221,7 +219,6 @@ struct igb_ring {
 		struct {
 			struct igb_rx_queue_stats rx_stats;
 			struct u64_stats_sync rx_syncp;
-			u32 rx_buffer_len;
 		};
 	};
 };
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 414b022..04bc7a5 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1368,7 +1368,6 @@ static int igb_setup_desc_rings(struct igb_adapter *adapter)
 	rx_ring->count = IGB_DEFAULT_RXD;
 	rx_ring->dev = &adapter->pdev->dev;
 	rx_ring->netdev = adapter->netdev;
-	rx_ring->rx_buffer_len = IGB_RXBUFFER_2048;
 	rx_ring->reg_idx = adapter->vfs_allocated_count;
 
 	if (igb_setup_rx_resources(rx_ring)) {
@@ -1597,7 +1596,7 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
 		/* unmap rx buffer, will be remapped by alloc_rx_buffers */
 		dma_unmap_single(rx_ring->dev,
 		                 buffer_info->dma,
-				 rx_ring->rx_buffer_len,
+				 IGB_RX_HDR_LEN,
 				 DMA_FROM_DEVICE);
 		buffer_info->dma = 0;
 
@@ -1635,7 +1634,7 @@ static int igb_run_loopback_test(struct igb_adapter *adapter)
 	struct igb_ring *tx_ring = &adapter->test_tx_ring;
 	struct igb_ring *rx_ring = &adapter->test_rx_ring;
 	int i, j, lc, good_cnt, ret_val = 0;
-	unsigned int size = 1024;
+	unsigned int size = IGB_RX_HDR_LEN;
 	netdev_tx_t tx_ret_val;
 	struct sk_buff *skb;
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 6156275..022c442 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -517,16 +517,14 @@ rx_ring_summary:
 						DUMP_PREFIX_ADDRESS,
 						16, 1,
 						phys_to_virt(buffer_info->dma),
-						rx_ring->rx_buffer_len, true);
-					if (rx_ring->rx_buffer_len
-						< IGB_RXBUFFER_1024)
-						print_hex_dump(KERN_INFO, "",
-						  DUMP_PREFIX_ADDRESS,
-						  16, 1,
-						  phys_to_virt(
-						    buffer_info->page_dma +
-						    buffer_info->page_offset),
-						  PAGE_SIZE/2, true);
+						IGB_RX_HDR_LEN, true);
+					print_hex_dump(KERN_INFO, "",
+					  DUMP_PREFIX_ADDRESS,
+					  16, 1,
+					  phys_to_virt(
+					    buffer_info->page_dma +
+					    buffer_info->page_offset),
+					  PAGE_SIZE/2, true);
 				}
 			}
 
@@ -707,7 +705,6 @@ static int igb_alloc_queues(struct igb_adapter *adapter)
 		ring->queue_index = i;
 		ring->dev = &adapter->pdev->dev;
 		ring->netdev = adapter->netdev;
-		ring->rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
 		ring->flags = IGB_RING_FLAG_RX_CSUM; /* enable rx checksum */
 		/* set flag indicating ring supports SCTP checksum offload */
 		if (adapter->hw.mac.type >= e1000_82576)
@@ -3049,22 +3046,13 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 	writel(0, ring->tail);
 
 	/* set descriptor configuration */
-	if (ring->rx_buffer_len < IGB_RXBUFFER_1024) {
-		srrctl = ALIGN(ring->rx_buffer_len, 64) <<
-		         E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
+	srrctl = IGB_RX_HDR_LEN << E1000_SRRCTL_BSIZEHDRSIZE_SHIFT;
 #if (PAGE_SIZE / 2) > IGB_RXBUFFER_16384
-		srrctl |= IGB_RXBUFFER_16384 >>
-		          E1000_SRRCTL_BSIZEPKT_SHIFT;
+	srrctl |= IGB_RXBUFFER_16384 >> E1000_SRRCTL_BSIZEPKT_SHIFT;
 #else
-		srrctl |= (PAGE_SIZE / 2) >>
-		          E1000_SRRCTL_BSIZEPKT_SHIFT;
+	srrctl |= (PAGE_SIZE / 2) >> E1000_SRRCTL_BSIZEPKT_SHIFT;
 #endif
-		srrctl |= E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
-	} else {
-		srrctl = ALIGN(ring->rx_buffer_len, 1024) >>
-		         E1000_SRRCTL_BSIZEPKT_SHIFT;
-		srrctl |= E1000_SRRCTL_DESCTYPE_ADV_ONEBUF;
-	}
+	srrctl |= E1000_SRRCTL_DESCTYPE_HDR_SPLIT_ALWAYS;
 	if (hw->mac.type == e1000_82580)
 		srrctl |= E1000_SRRCTL_TIMESTAMP;
 	/* Only set Drop Enable if we are supporting multiple queues */
@@ -3268,7 +3256,7 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
 		if (buffer_info->dma) {
 			dma_unmap_single(rx_ring->dev,
 			                 buffer_info->dma,
-					 rx_ring->rx_buffer_len,
+					 IGB_RX_HDR_LEN,
 					 DMA_FROM_DEVICE);
 			buffer_info->dma = 0;
 		}
@@ -4466,7 +4454,6 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct pci_dev *pdev = adapter->pdev;
 	int max_frame = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
-	u32 rx_buffer_len, i;
 
 	if ((new_mtu < 68) || (max_frame > MAX_JUMBO_FRAME_SIZE)) {
 		dev_err(&pdev->dev, "Invalid MTU setting\n");
@@ -4485,30 +4472,6 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 	/* igb_down has a dependency on max_frame_size */
 	adapter->max_frame_size = max_frame;
 
-	/* NOTE: netdev_alloc_skb reserves 16 bytes, and typically NET_IP_ALIGN
-	 * means we reserve 2 more, this pushes us to allocate from the next
-	 * larger slab size.
-	 * i.e. RXBUFFER_2048 --> size-4096 slab
-	 */
-
-	if (adapter->hw.mac.type == e1000_82580)
-		max_frame += IGB_TS_HDR_LEN;
-
-	if (max_frame <= IGB_RXBUFFER_1024)
-		rx_buffer_len = IGB_RXBUFFER_1024;
-	else if (max_frame <= MAXIMUM_ETHERNET_VLAN_SIZE)
-		rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE;
-	else
-		rx_buffer_len = IGB_RXBUFFER_128;
-
-	if ((max_frame == ETH_FRAME_LEN + ETH_FCS_LEN + IGB_TS_HDR_LEN) ||
-	     (max_frame == MAXIMUM_ETHERNET_VLAN_SIZE + IGB_TS_HDR_LEN))
-		rx_buffer_len = MAXIMUM_ETHERNET_VLAN_SIZE + IGB_TS_HDR_LEN;
-
-	if ((adapter->hw.mac.type == e1000_82580) &&
-	    (rx_buffer_len == IGB_RXBUFFER_128))
-		rx_buffer_len += IGB_RXBUFFER_64;
-
 	if (netif_running(netdev))
 		igb_down(adapter);
 
@@ -4516,9 +4479,6 @@ static int igb_change_mtu(struct net_device *netdev, int new_mtu)
 		 netdev->mtu, new_mtu);
 	netdev->mtu = new_mtu;
 
-	for (i = 0; i < adapter->num_rx_queues; i++)
-		adapter->rx_ring[i]->rx_buffer_len = rx_buffer_len;
-
 	if (netif_running(netdev))
 		igb_up(adapter);
 	else
@@ -5781,8 +5741,7 @@ static void igb_rx_hwtstamp(struct igb_q_vector *q_vector, u32 staterr,
 
 	igb_systim_to_hwtstamp(adapter, skb_hwtstamps(skb), regval);
 }
-static inline u16 igb_get_hlen(struct igb_ring *rx_ring,
-                               union e1000_adv_rx_desc *rx_desc)
+static inline u16 igb_get_hlen(union e1000_adv_rx_desc *rx_desc)
 {
 	/* HW will not DMA in data larger than the given buffer, even if it
 	 * parses the (NFS, of course) header to be larger.  In that case, it
@@ -5790,8 +5749,8 @@ static inline u16 igb_get_hlen(struct igb_ring *rx_ring,
 	 */
 	u16 hlen = (le16_to_cpu(rx_desc->wb.lower.lo_dword.hdr_info) &
 	           E1000_RXDADV_HDRBUFLEN_MASK) >> E1000_RXDADV_HDRBUFLEN_SHIFT;
-	if (hlen > rx_ring->rx_buffer_len)
-		hlen = rx_ring->rx_buffer_len;
+	if (hlen > IGB_RX_HDR_LEN)
+		hlen = IGB_RX_HDR_LEN;
 	return hlen;
 }
 
@@ -5841,14 +5800,10 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 
 		if (buffer_info->dma) {
 			dma_unmap_single(dev, buffer_info->dma,
-					 rx_ring->rx_buffer_len,
+					 IGB_RX_HDR_LEN,
 					 DMA_FROM_DEVICE);
 			buffer_info->dma = 0;
-			if (rx_ring->rx_buffer_len >= IGB_RXBUFFER_1024) {
-				skb_put(skb, length);
-				goto send_up;
-			}
-			skb_put(skb, igb_get_hlen(rx_ring, rx_desc));
+			skb_put(skb, igb_get_hlen(rx_desc));
 		}
 
 		if (length) {
@@ -5879,7 +5834,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 			next_buffer->dma = 0;
 			goto next_desc;
 		}
-send_up:
+
 		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
 			dev_kfree_skb_irq(skb);
 			goto next_desc;
@@ -5943,17 +5898,14 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 	struct igb_buffer *buffer_info;
 	struct sk_buff *skb;
 	unsigned int i;
-	int bufsz;
 
 	i = rx_ring->next_to_use;
 	buffer_info = &rx_ring->buffer_info[i];
 
-	bufsz = rx_ring->rx_buffer_len;
-
 	while (cleaned_count--) {
 		rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
 
-		if ((bufsz < IGB_RXBUFFER_1024) && !buffer_info->page_dma) {
+		if (!buffer_info->page_dma) {
 			if (!buffer_info->page) {
 				buffer_info->page = netdev_alloc_page(netdev);
 				if (unlikely(!buffer_info->page)) {
@@ -5983,7 +5935,7 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 
 		skb = buffer_info->skb;
 		if (!skb) {
-			skb = netdev_alloc_skb_ip_align(netdev, bufsz);
+			skb = netdev_alloc_skb_ip_align(netdev, IGB_RX_HDR_LEN);
 			if (unlikely(!skb)) {
 				u64_stats_update_begin(&rx_ring->rx_syncp);
 				rx_ring->rx_stats.alloc_failed++;
@@ -5996,7 +5948,7 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 		if (!buffer_info->dma) {
 			buffer_info->dma = dma_map_single(rx_ring->dev,
 			                                  skb->data,
-							  bufsz,
+							  IGB_RX_HDR_LEN,
 							  DMA_FROM_DEVICE);
 			if (dma_mapping_error(rx_ring->dev,
 					      buffer_info->dma)) {
@@ -6009,14 +5961,8 @@ void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
 		}
 		/* Refresh the desc even if buffer_addrs didn't change because
 		 * each write-back erases this info. */
-		if (bufsz < IGB_RXBUFFER_1024) {
-			rx_desc->read.pkt_addr =
-			     cpu_to_le64(buffer_info->page_dma);
-			rx_desc->read.hdr_addr = cpu_to_le64(buffer_info->dma);
-		} else {
-			rx_desc->read.pkt_addr = cpu_to_le64(buffer_info->dma);
-			rx_desc->read.hdr_addr = 0;
-		}
+		rx_desc->read.pkt_addr = cpu_to_le64(buffer_info->page_dma);
+		rx_desc->read.hdr_addr = cpu_to_le64(buffer_info->dma);
 
 		i++;
 		if (i == rx_ring->count)
-- 
1.7.6


* [net-next 05/13] igb: streamline Rx buffer allocation and cleanup
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (3 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 04/13] igb: drop support for single buffer mode Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 06/13] igb: update ring and adapter structure to improve performance Jeff Kirsher
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is meant to streamline Rx buffer allocation and cleanup.  This
is accomplished by reducing the number of writes: the Rx descriptor ring
is written by software only during allocation, and is only read during
cleanup.
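
One non-obvious piece of the new refill loop (igb_alloc_rx_buffers_adv
below) is its index handling, sketched here: the index is biased
negative by the ring size so that the wrap check is a simple test
against zero.

	u16 i = rx_ring->next_to_use;

	i -= rx_ring->count;		/* bias: u16 arithmetic wraps */
	while (cleaned_count--) {
		/* ... write one descriptor ... */
		i++;
		if (unlikely(!i)) {	/* crossed the end of the ring */
			rx_desc = E1000_RX_DESC_ADV(*rx_ring, 0);
			bi = rx_ring->buffer_info;
			i -= rx_ring->count;
		}
	}
	i += rx_ring->count;		/* un-bias back to a real index */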

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |    2 +-
 drivers/net/ethernet/intel/igb/igb_main.c |  190 ++++++++++++++++-------------
 2 files changed, 104 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 50632b1..b2f2a8c 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -370,7 +370,7 @@ extern void igb_setup_rctl(struct igb_adapter *);
 extern netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *, struct igb_ring *);
 extern void igb_unmap_and_free_tx_resource(struct igb_ring *,
 					   struct igb_buffer *);
-extern void igb_alloc_rx_buffers_adv(struct igb_ring *, int);
+extern void igb_alloc_rx_buffers_adv(struct igb_ring *, u16);
 extern void igb_update_stats(struct igb_adapter *, struct rtnl_link_stats64 *);
 extern bool igb_has_link(struct igb_adapter *adapter);
 extern void igb_set_ethtool_ops(struct net_device *);
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 022c442..af8c2f7 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -3243,16 +3243,15 @@ static void igb_free_all_rx_resources(struct igb_adapter *adapter)
  **/
 static void igb_clean_rx_ring(struct igb_ring *rx_ring)
 {
-	struct igb_buffer *buffer_info;
 	unsigned long size;
-	unsigned int i;
+	u16 i;
 
 	if (!rx_ring->buffer_info)
 		return;
 
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
-		buffer_info = &rx_ring->buffer_info[i];
+		struct igb_buffer *buffer_info = &rx_ring->buffer_info[i];
 		if (buffer_info->dma) {
 			dma_unmap_single(rx_ring->dev,
 			                 buffer_info->dma,
@@ -5764,7 +5763,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 	struct igb_buffer *buffer_info , *next_buffer;
 	struct sk_buff *skb;
 	bool cleaned = false;
-	int cleaned_count = 0;
+	u16 cleaned_count = igb_desc_unused(rx_ring);
 	int current_node = numa_node_id();
 	unsigned int total_bytes = 0, total_packets = 0;
 	unsigned int i;
@@ -5848,7 +5847,6 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 		igb_rx_checksum_adv(rx_ring, staterr, skb);
 
 		skb->protocol = eth_type_trans(skb, netdev);
-		skb_record_rx_queue(skb, rx_ring->queue_index);
 
 		if (staterr & E1000_RXD_STAT_VP) {
 			u16 vid = le16_to_cpu(rx_desc->wb.upper.vlan);
@@ -5858,8 +5856,6 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 		napi_gro_receive(&q_vector->napi, skb);
 
 next_desc:
-		rx_desc->wb.upper.status_error = 0;
-
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IGB_RX_BUFFER_WRITE) {
 			igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
@@ -5873,110 +5869,130 @@ next_desc:
 	}
 
 	rx_ring->next_to_clean = i;
-	cleaned_count = igb_desc_unused(rx_ring);
-
-	if (cleaned_count)
-		igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
-
-	rx_ring->total_packets += total_packets;
-	rx_ring->total_bytes += total_bytes;
 	u64_stats_update_begin(&rx_ring->rx_syncp);
 	rx_ring->rx_stats.packets += total_packets;
 	rx_ring->rx_stats.bytes += total_bytes;
 	u64_stats_update_end(&rx_ring->rx_syncp);
+	rx_ring->total_packets += total_packets;
+	rx_ring->total_bytes += total_bytes;
+
+	if (cleaned_count)
+		igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
+
 	return cleaned;
 }
 
+static bool igb_alloc_mapped_skb(struct igb_ring *rx_ring,
+				 struct igb_buffer *bi)
+{
+	struct sk_buff *skb = bi->skb;
+	dma_addr_t dma = bi->dma;
+
+	if (dma)
+		return true;
+
+	if (likely(!skb)) {
+		skb = netdev_alloc_skb_ip_align(rx_ring->netdev,
+						IGB_RX_HDR_LEN);
+		bi->skb = skb;
+		if (!skb) {
+			rx_ring->rx_stats.alloc_failed++;
+			return false;
+		}
+
+		/* initialize skb for ring */
+		skb_record_rx_queue(skb, rx_ring->queue_index);
+	}
+
+	dma = dma_map_single(rx_ring->dev, skb->data,
+			     IGB_RX_HDR_LEN, DMA_FROM_DEVICE);
+
+	if (dma_mapping_error(rx_ring->dev, dma)) {
+		rx_ring->rx_stats.alloc_failed++;
+		return false;
+	}
+
+	bi->dma = dma;
+	return true;
+}
+
+static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
+				  struct igb_buffer *bi)
+{
+	struct page *page = bi->page;
+	dma_addr_t page_dma = bi->page_dma;
+	unsigned int page_offset = bi->page_offset ^ (PAGE_SIZE / 2);
+
+	if (page_dma)
+		return true;
+
+	if (!page) {
+		page = netdev_alloc_page(rx_ring->netdev);
+		bi->page = page;
+		if (unlikely(!page)) {
+			rx_ring->rx_stats.alloc_failed++;
+			return false;
+		}
+	}
+
+	page_dma = dma_map_page(rx_ring->dev, page,
+				page_offset, PAGE_SIZE / 2,
+				DMA_FROM_DEVICE);
+
+	if (dma_mapping_error(rx_ring->dev, page_dma)) {
+		rx_ring->rx_stats.alloc_failed++;
+		return false;
+	}
+
+	bi->page_dma = page_dma;
+	bi->page_offset = page_offset;
+	return true;
+}
+
 /**
  * igb_alloc_rx_buffers_adv - Replace used receive buffers; packet split
  * @adapter: address of board private structure
  **/
-void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, int cleaned_count)
+void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, u16 cleaned_count)
 {
-	struct net_device *netdev = rx_ring->netdev;
 	union e1000_adv_rx_desc *rx_desc;
-	struct igb_buffer *buffer_info;
-	struct sk_buff *skb;
-	unsigned int i;
+	struct igb_buffer *bi;
+	u16 i = rx_ring->next_to_use;
 
-	i = rx_ring->next_to_use;
-	buffer_info = &rx_ring->buffer_info[i];
+	rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
+	bi = &rx_ring->buffer_info[i];
+	i -= rx_ring->count;
 
 	while (cleaned_count--) {
-		rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
-
-		if (!buffer_info->page_dma) {
-			if (!buffer_info->page) {
-				buffer_info->page = netdev_alloc_page(netdev);
-				if (unlikely(!buffer_info->page)) {
-					u64_stats_update_begin(&rx_ring->rx_syncp);
-					rx_ring->rx_stats.alloc_failed++;
-					u64_stats_update_end(&rx_ring->rx_syncp);
-					goto no_buffers;
-				}
-				buffer_info->page_offset = 0;
-			} else {
-				buffer_info->page_offset ^= PAGE_SIZE / 2;
-			}
-			buffer_info->page_dma =
-				dma_map_page(rx_ring->dev, buffer_info->page,
-					     buffer_info->page_offset,
-					     PAGE_SIZE / 2,
-					     DMA_FROM_DEVICE);
-			if (dma_mapping_error(rx_ring->dev,
-					      buffer_info->page_dma)) {
-				buffer_info->page_dma = 0;
-				u64_stats_update_begin(&rx_ring->rx_syncp);
-				rx_ring->rx_stats.alloc_failed++;
-				u64_stats_update_end(&rx_ring->rx_syncp);
-				goto no_buffers;
-			}
-		}
+		if (!igb_alloc_mapped_skb(rx_ring, bi))
+			break;
 
-		skb = buffer_info->skb;
-		if (!skb) {
-			skb = netdev_alloc_skb_ip_align(netdev, IGB_RX_HDR_LEN);
-			if (unlikely(!skb)) {
-				u64_stats_update_begin(&rx_ring->rx_syncp);
-				rx_ring->rx_stats.alloc_failed++;
-				u64_stats_update_end(&rx_ring->rx_syncp);
-				goto no_buffers;
-			}
+		/* Refresh the desc even if buffer_addrs didn't change
+		 * because each write-back erases this info. */
+		rx_desc->read.hdr_addr = cpu_to_le64(bi->dma);
 
-			buffer_info->skb = skb;
-		}
-		if (!buffer_info->dma) {
-			buffer_info->dma = dma_map_single(rx_ring->dev,
-			                                  skb->data,
-							  IGB_RX_HDR_LEN,
-							  DMA_FROM_DEVICE);
-			if (dma_mapping_error(rx_ring->dev,
-					      buffer_info->dma)) {
-				buffer_info->dma = 0;
-				u64_stats_update_begin(&rx_ring->rx_syncp);
-				rx_ring->rx_stats.alloc_failed++;
-				u64_stats_update_end(&rx_ring->rx_syncp);
-				goto no_buffers;
-			}
-		}
-		/* Refresh the desc even if buffer_addrs didn't change because
-		 * each write-back erases this info. */
-		rx_desc->read.pkt_addr = cpu_to_le64(buffer_info->page_dma);
-		rx_desc->read.hdr_addr = cpu_to_le64(buffer_info->dma);
+		if (!igb_alloc_mapped_page(rx_ring, bi))
+			break;
+
+		rx_desc->read.pkt_addr = cpu_to_le64(bi->page_dma);
 
+		rx_desc++;
+		bi++;
 		i++;
-		if (i == rx_ring->count)
-			i = 0;
-		buffer_info = &rx_ring->buffer_info[i];
+		if (unlikely(!i)) {
+			rx_desc = E1000_RX_DESC_ADV(*rx_ring, 0);
+			bi = rx_ring->buffer_info;
+			i -= rx_ring->count;
+		}
+
+		/* clear the hdr_addr for the next_to_use descriptor */
+		rx_desc->read.hdr_addr = 0;
 	}
 
-no_buffers:
+	i += rx_ring->count;
+
 	if (rx_ring->next_to_use != i) {
 		rx_ring->next_to_use = i;
-		if (i == 0)
-			i = (rx_ring->count - 1);
-		else
-			i--;
 
 		/* Force memory writes to complete before letting h/w
 		 * know there are new descriptors to fetch.  (Only
-- 
1.7.6


* [net-next 06/13] igb: update ring and adapter structure to improve performance
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (4 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 05/13] igb: streamline Rx buffer allocation and cleanup Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 07/13] igb: Refactor clean_rx_irq to reduce overhead and " Jeff Kirsher
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is meant to improve performance by splitting the Tx and Rx
rings into 3 sections.  The first is a primarily read-only section
containing basic things like the indexes, pointers to the dev and netdev
structures, and other basic information.  The second section contains the
stats and the next_to_use and next_to_clean values.  The third section is
primarily made up of values not used in the hot path, which can simply be
placed at the end of the ring.

The adapter structure has several sections that are read in the hot path.
To improve performance there, the frequently read hot-path items are
combined into a single cache line.
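
A minimal sketch of the layout idea (the field names are illustrative):

	struct example_ring {
		/* read-mostly fields used in the hot path */
		void *desc;
		u16 count;

		/* written constantly: start on their own cache line */
		u16 next_to_clean ____cacheline_aligned_in_smp;
		u16 next_to_use;

		/* touched only at ring alloc/free time */
		dma_addr_t dma;
	};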

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |   83 ++++++++++++++++-------------
 drivers/net/ethernet/intel/igb/igb_main.c |    4 +-
 2 files changed, 46 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index b2f2a8c..7036fd5 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -187,26 +187,26 @@ struct igb_q_vector {
 };
 
 struct igb_ring {
-	struct igb_q_vector *q_vector; /* backlink to q_vector */
-	struct net_device *netdev;     /* back pointer to net_device */
-	struct device *dev;            /* device pointer for dma mapping */
-	dma_addr_t dma;                /* phys address of the ring */
-	void *desc;                    /* descriptor ring memory */
-	unsigned int size;             /* length of desc. ring in bytes */
-	u16 count;                     /* number of desc. in the ring */
+	struct igb_q_vector *q_vector;	/* backlink to q_vector */
+	struct net_device *netdev;	/* back pointer to net_device */
+	struct device *dev;		/* device pointer for dma mapping */
+	struct igb_buffer *buffer_info;	/* array of buffer info structs */
+	void *desc;			/* descriptor ring memory */
+	unsigned long flags;		/* ring specific flags */
+	void __iomem *tail;		/* pointer to ring tail register */
+
+	u16 count;			/* number of desc. in the ring */
+	u8 queue_index;			/* logical index of the ring*/
+	u8 reg_idx;			/* physical index of the ring */
+	u32 size;			/* length of desc. ring in bytes */
+
+	/* everything past this point are written often */
+	u16 next_to_clean ____cacheline_aligned_in_smp;
 	u16 next_to_use;
-	u16 next_to_clean;
-	u8 queue_index;
-	u8 reg_idx;
-	void __iomem *head;
-	void __iomem *tail;
-	struct igb_buffer *buffer_info; /* array of buffer info structs */
 
 	unsigned int total_bytes;
 	unsigned int total_packets;
 
-	u32 flags;
-
 	union {
 		/* TX */
 		struct {
@@ -221,6 +221,8 @@ struct igb_ring {
 			struct u64_stats_sync rx_syncp;
 		};
 	};
+	/* Items past this point are only used during ring alloc / free */
+	dma_addr_t dma;                /* phys address of the ring */
 };
 
 #define IGB_RING_FLAG_RX_CSUM        0x00000001 /* RX CSUM enabled */
@@ -248,15 +250,15 @@ static inline int igb_desc_unused(struct igb_ring *ring)
 
 /* board specific private data structure */
 struct igb_adapter {
-	struct timer_list watchdog_timer;
-	struct timer_list phy_info_timer;
 	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
-	u16 mng_vlan_id;
-	u32 bd_number;
-	u32 wol;
-	u32 en_mng_pt;
-	u16 link_speed;
-	u16 link_duplex;
+
+	struct net_device *netdev;
+
+	unsigned long state;
+	unsigned int flags;
+
+	unsigned int num_q_vectors;
+	struct msix_entry *msix_entries;
 
 	/* Interrupt Throttle Rate */
 	u32 rx_itr_setting;
@@ -264,27 +266,36 @@ struct igb_adapter {
 	u16 tx_itr;
 	u16 rx_itr;
 
-	struct work_struct reset_task;
-	struct work_struct watchdog_task;
-	bool fc_autoneg;
-	u8  tx_timeout_factor;
-	struct timer_list blink_timer;
-	unsigned long led_status;
-
 	/* TX */
-	struct igb_ring *tx_ring[16];
 	u32 tx_timeout_count;
+	int num_tx_queues;
+	struct igb_ring *tx_ring[16];
 
 	/* RX */
-	struct igb_ring *rx_ring[16];
-	int num_tx_queues;
 	int num_rx_queues;
+	struct igb_ring *rx_ring[16];
 
 	u32 max_frame_size;
 	u32 min_frame_size;
 
+	struct timer_list watchdog_timer;
+	struct timer_list phy_info_timer;
+
+	u16 mng_vlan_id;
+	u32 bd_number;
+	u32 wol;
+	u32 en_mng_pt;
+	u16 link_speed;
+	u16 link_duplex;
+
+	struct work_struct reset_task;
+	struct work_struct watchdog_task;
+	bool fc_autoneg;
+	u8  tx_timeout_factor;
+	struct timer_list blink_timer;
+	unsigned long led_status;
+
 	/* OS defined structs */
-	struct net_device *netdev;
 	struct pci_dev *pdev;
 	struct cyclecounter cycles;
 	struct timecounter clock;
@@ -306,15 +317,11 @@ struct igb_adapter {
 
 	int msg_enable;
 
-	unsigned int num_q_vectors;
 	struct igb_q_vector *q_vector[MAX_Q_VECTORS];
-	struct msix_entry *msix_entries;
 	u32 eims_enable_mask;
 	u32 eims_other;
 
 	/* to not mess up cache alignment, always add to the bottom */
-	unsigned long state;
-	unsigned int flags;
 	u32 eeprom_wol;
 
 	struct igb_ring *multi_tx_table[IGB_ABS_MAX_TX_QUEUES];
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index af8c2f7..9fa2ad0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2679,7 +2679,6 @@ void igb_configure_tx_ring(struct igb_adapter *adapter,
 	                tdba & 0x00000000ffffffffULL);
 	wr32(E1000_TDBAH(reg_idx), tdba >> 32);
 
-	ring->head = hw->hw_addr + E1000_TDH(reg_idx);
 	ring->tail = hw->hw_addr + E1000_TDT(reg_idx);
 	wr32(E1000_TDH(reg_idx), 0);
 	writel(0, ring->tail);
@@ -3040,7 +3039,6 @@ void igb_configure_rx_ring(struct igb_adapter *adapter,
 	               ring->count * sizeof(union e1000_adv_rx_desc));
 
 	/* initialize head and tail */
-	ring->head = hw->hw_addr + E1000_RDH(reg_idx);
 	ring->tail = hw->hw_addr + E1000_RDT(reg_idx);
 	wr32(E1000_RDH(reg_idx), 0);
 	writel(0, ring->tail);
@@ -5653,7 +5651,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 				"  jiffies              <%lx>\n"
 				"  desc.status          <%x>\n",
 				tx_ring->queue_index,
-				readl(tx_ring->head),
+				rd32(E1000_TDH(tx_ring->reg_idx)),
 				readl(tx_ring->tail),
 				tx_ring->next_to_use,
 				tx_ring->next_to_clean,
-- 
1.7.6


* [net-next 07/13] igb: Refactor clean_rx_irq to reduce overhead and improve performance
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (5 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 06/13] igb: update ring and adapter structure to improve performance Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 08/13] igb: drop the "adv" off function names relating to descriptors Jeff Kirsher
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change is meant to be a general cleanup and performance improvement
for clean_rx_irq.  The previous patch updated the allocation path so that
the rings can be treated as read-only within the clean_rx_irq function.
In addition, the operations are re-ordered to accomplish several goals:
reducing the overhead of packet accounting, reducing the number of items
on the stack, and improving overall performance.
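
The reworked igb_poll() follows the usual NAPI contract, sketched here
with illustrative helper names: return the full budget while work
remains so the core keeps polling, and re-enable interrupts only when a
poll finishes early.

	static int example_poll(struct napi_struct *napi, int budget)
	{
		bool clean_complete = example_clean_tx(napi) &
				      example_clean_rx(napi, budget);

		if (!clean_complete)
			return budget;	/* keep polling */

		napi_complete(napi);
		/* re-enable queue interrupts here */
		return 0;
	}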

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |   96 ++++++++++++++---------------
 1 files changed, 47 insertions(+), 49 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9fa2ad0..dd85df0 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -138,7 +138,7 @@ static void igb_setup_dca(struct igb_adapter *);
 #endif /* CONFIG_IGB_DCA */
 static bool igb_clean_tx_irq(struct igb_q_vector *);
 static int igb_poll(struct napi_struct *, int);
-static bool igb_clean_rx_irq_adv(struct igb_q_vector *, int *, int);
+static bool igb_clean_rx_irq_adv(struct igb_q_vector *, int);
 static int igb_ioctl(struct net_device *, struct ifreq *, int cmd);
 static void igb_tx_timeout(struct net_device *);
 static void igb_reset_task(struct work_struct *);
@@ -5481,28 +5481,27 @@ static int igb_poll(struct napi_struct *napi, int budget)
 	struct igb_q_vector *q_vector = container_of(napi,
 	                                             struct igb_q_vector,
 	                                             napi);
-	int tx_clean_complete = 1, work_done = 0;
+	bool clean_complete = true;
 
 #ifdef CONFIG_IGB_DCA
 	if (q_vector->adapter->flags & IGB_FLAG_DCA_ENABLED)
 		igb_update_dca(q_vector);
 #endif
 	if (q_vector->tx_ring)
-		tx_clean_complete = igb_clean_tx_irq(q_vector);
+		clean_complete = !!igb_clean_tx_irq(q_vector);
 
 	if (q_vector->rx_ring)
-		igb_clean_rx_irq_adv(q_vector, &work_done, budget);
+		clean_complete &= igb_clean_rx_irq_adv(q_vector, budget);
 
-	if (!tx_clean_complete)
-		work_done = budget;
+	/* If all work not completed, return budget and keep polling */
+	if (!clean_complete)
+		return budget;
 
 	/* If not enough Rx work done, exit the polling mode */
-	if (work_done < budget) {
-		napi_complete(napi);
-		igb_ring_irq_enable(q_vector);
-	}
+	napi_complete(napi);
+	igb_ring_irq_enable(q_vector);
 
-	return work_done;
+	return 0;
 }
 
 /**
@@ -5751,37 +5750,26 @@ static inline u16 igb_get_hlen(union e1000_adv_rx_desc *rx_desc)
 	return hlen;
 }
 
-static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
-                                 int *work_done, int budget)
+static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector, int budget)
 {
 	struct igb_ring *rx_ring = q_vector->rx_ring;
-	struct net_device *netdev = rx_ring->netdev;
-	struct device *dev = rx_ring->dev;
-	union e1000_adv_rx_desc *rx_desc , *next_rxd;
-	struct igb_buffer *buffer_info , *next_buffer;
-	struct sk_buff *skb;
-	bool cleaned = false;
-	u16 cleaned_count = igb_desc_unused(rx_ring);
-	int current_node = numa_node_id();
+	union e1000_adv_rx_desc *rx_desc;
+	const int current_node = numa_node_id();
 	unsigned int total_bytes = 0, total_packets = 0;
-	unsigned int i;
 	u32 staterr;
-	u16 length;
+	u16 cleaned_count = igb_desc_unused(rx_ring);
+	u16 i = rx_ring->next_to_clean;
 
-	i = rx_ring->next_to_clean;
-	buffer_info = &rx_ring->buffer_info[i];
 	rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
 	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 
 	while (staterr & E1000_RXD_STAT_DD) {
-		if (*work_done >= budget)
-			break;
-		(*work_done)++;
-		rmb(); /* read descriptor and rx_buffer_info after status DD */
+		struct igb_buffer *buffer_info = &rx_ring->buffer_info[i];
+		struct sk_buff *skb = buffer_info->skb;
+		union e1000_adv_rx_desc *next_rxd;
 
-		skb = buffer_info->skb;
-		prefetch(skb->data - NET_IP_ALIGN);
 		buffer_info->skb = NULL;
+		prefetch(skb->data);
 
 		i++;
 		if (i == rx_ring->count)
@@ -5789,42 +5777,48 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 
 		next_rxd = E1000_RX_DESC_ADV(*rx_ring, i);
 		prefetch(next_rxd);
-		next_buffer = &rx_ring->buffer_info[i];
 
-		length = le16_to_cpu(rx_desc->wb.upper.length);
-		cleaned = true;
-		cleaned_count++;
+		/*
+		 * This memory barrier is needed to keep us from reading
+		 * any other fields out of the rx_desc until we know the
+		 * RXD_STAT_DD bit is set
+		 */
+		rmb();
 
-		if (buffer_info->dma) {
-			dma_unmap_single(dev, buffer_info->dma,
+		if (!skb_is_nonlinear(skb)) {
+			__skb_put(skb, igb_get_hlen(rx_desc));
+			dma_unmap_single(rx_ring->dev, buffer_info->dma,
 					 IGB_RX_HDR_LEN,
 					 DMA_FROM_DEVICE);
 			buffer_info->dma = 0;
-			skb_put(skb, igb_get_hlen(rx_desc));
 		}
 
-		if (length) {
-			dma_unmap_page(dev, buffer_info->page_dma,
-				       PAGE_SIZE / 2, DMA_FROM_DEVICE);
-			buffer_info->page_dma = 0;
+		if (rx_desc->wb.upper.length) {
+			u16 length = le16_to_cpu(rx_desc->wb.upper.length);
 
 			skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
 						buffer_info->page,
 						buffer_info->page_offset,
 						length);
 
+			skb->len += length;
+			skb->data_len += length;
+			skb->truesize += length;
+
 			if ((page_count(buffer_info->page) != 1) ||
 			    (page_to_nid(buffer_info->page) != current_node))
 				buffer_info->page = NULL;
 			else
 				get_page(buffer_info->page);
 
-			skb->len += length;
-			skb->data_len += length;
-			skb->truesize += length;
+			dma_unmap_page(rx_ring->dev, buffer_info->page_dma,
+				       PAGE_SIZE / 2, DMA_FROM_DEVICE);
+			buffer_info->page_dma = 0;
 		}
 
 		if (!(staterr & E1000_RXD_STAT_EOP)) {
+			struct igb_buffer *next_buffer;
+			next_buffer = &rx_ring->buffer_info[i];
 			buffer_info->skb = next_buffer->skb;
 			buffer_info->dma = next_buffer->dma;
 			next_buffer->skb = skb;
@@ -5833,7 +5827,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 		}
 
 		if (staterr & E1000_RXDEXT_ERR_FRAME_ERR_MASK) {
-			dev_kfree_skb_irq(skb);
+			dev_kfree_skb_any(skb);
 			goto next_desc;
 		}
 
@@ -5844,7 +5838,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 
 		igb_rx_checksum_adv(rx_ring, staterr, skb);
 
-		skb->protocol = eth_type_trans(skb, netdev);
+		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
 		if (staterr & E1000_RXD_STAT_VP) {
 			u16 vid = le16_to_cpu(rx_desc->wb.upper.vlan);
@@ -5853,7 +5847,12 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector,
 		}
 		napi_gro_receive(&q_vector->napi, skb);
 
+		budget--;
 next_desc:
+		if (!budget)
+			break;
+
+		cleaned_count++;
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IGB_RX_BUFFER_WRITE) {
 			igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
@@ -5862,7 +5861,6 @@ next_desc:
 
 		/* use prefetched values */
 		rx_desc = next_rxd;
-		buffer_info = next_buffer;
 		staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	}
 
@@ -5877,7 +5875,7 @@ next_desc:
 	if (cleaned_count)
 		igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
 
-	return cleaned;
+	return !!budget;
 }
 
 static bool igb_alloc_mapped_skb(struct igb_ring *rx_ring,
-- 
1.7.6


* [net-next 08/13] igb: drop the "adv" off function names relating to descriptors
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (6 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 07/13] igb: Refactor clean_rx_irq to reduce overhead and " Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 09/13] igb: Replace E1000_XX_DESC_ADV with IGB_XX_DESC Jeff Kirsher
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

Many of the function names in the hot path carry an extra "_adv" suffix
to indicate that they use advanced descriptors instead of legacy
descriptors.  However, since igb uses only advanced descriptors, the
extra suffix doesn't add any information.  Given that, it is easiest to
just drop the suffix and save ourselves the extra characters.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |    4 +-
 drivers/net/ethernet/intel/igb/igb_ethtool.c |    6 +-
 drivers/net/ethernet/intel/igb/igb_main.c    |   62 +++++++++++++-------------
 3 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 7036fd5..b1ca8ea 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -374,10 +374,10 @@ extern void igb_configure_tx_ring(struct igb_adapter *, struct igb_ring *);
 extern void igb_configure_rx_ring(struct igb_adapter *, struct igb_ring *);
 extern void igb_setup_tctl(struct igb_adapter *);
 extern void igb_setup_rctl(struct igb_adapter *);
-extern netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *, struct igb_ring *);
+extern netdev_tx_t igb_xmit_frame_ring(struct sk_buff *, struct igb_ring *);
 extern void igb_unmap_and_free_tx_resource(struct igb_ring *,
 					   struct igb_buffer *);
-extern void igb_alloc_rx_buffers_adv(struct igb_ring *, u16);
+extern void igb_alloc_rx_buffers(struct igb_ring *, u16);
 extern void igb_update_stats(struct igb_adapter *, struct rtnl_link_stats64 *);
 extern bool igb_has_link(struct igb_adapter *adapter);
 extern void igb_set_ethtool_ops(struct net_device *);
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 04bc7a5..67eee0a 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1382,7 +1382,7 @@ static int igb_setup_desc_rings(struct igb_adapter *adapter)
 	igb_setup_rctl(adapter);
 	igb_configure_rx_ring(adapter, rx_ring);
 
-	igb_alloc_rx_buffers_adv(rx_ring, igb_desc_unused(rx_ring));
+	igb_alloc_rx_buffers(rx_ring, igb_desc_unused(rx_ring));
 
 	return 0;
 
@@ -1622,7 +1622,7 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
 	}
 
 	/* re-map buffers to ring, store next to clean values */
-	igb_alloc_rx_buffers_adv(rx_ring, count);
+	igb_alloc_rx_buffers(rx_ring, count);
 	rx_ring->next_to_clean = rx_ntc;
 	tx_ring->next_to_clean = tx_ntc;
 
@@ -1665,7 +1665,7 @@ static int igb_run_loopback_test(struct igb_adapter *adapter)
 		/* place 64 packets on the transmit queue*/
 		for (i = 0; i < 64; i++) {
 			skb_get(skb);
-			tx_ret_val = igb_xmit_frame_ring_adv(skb, tx_ring);
+			tx_ret_val = igb_xmit_frame_ring(skb, tx_ring);
 			if (tx_ret_val == NETDEV_TX_OK)
 				good_cnt++;
 		}
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index dd85df0..9a0cfd6 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -122,7 +122,7 @@ static void igb_set_rx_mode(struct net_device *);
 static void igb_update_phy_info(unsigned long);
 static void igb_watchdog(unsigned long);
 static void igb_watchdog_task(struct work_struct *);
-static netdev_tx_t igb_xmit_frame_adv(struct sk_buff *skb, struct net_device *);
+static netdev_tx_t igb_xmit_frame(struct sk_buff *skb, struct net_device *);
 static struct rtnl_link_stats64 *igb_get_stats64(struct net_device *dev,
 						 struct rtnl_link_stats64 *stats);
 static int igb_change_mtu(struct net_device *, int);
@@ -138,7 +138,7 @@ static void igb_setup_dca(struct igb_adapter *);
 #endif /* CONFIG_IGB_DCA */
 static bool igb_clean_tx_irq(struct igb_q_vector *);
 static int igb_poll(struct napi_struct *, int);
-static bool igb_clean_rx_irq_adv(struct igb_q_vector *, int);
+static bool igb_clean_rx_irq(struct igb_q_vector *, int);
 static int igb_ioctl(struct net_device *, struct ifreq *, int cmd);
 static void igb_tx_timeout(struct net_device *);
 static void igb_reset_task(struct work_struct *);
@@ -1436,7 +1436,7 @@ static void igb_configure(struct igb_adapter *adapter)
 	 * next_to_use != next_to_clean */
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		struct igb_ring *ring = adapter->rx_ring[i];
-		igb_alloc_rx_buffers_adv(ring, igb_desc_unused(ring));
+		igb_alloc_rx_buffers(ring, igb_desc_unused(ring));
 	}
 }
 
@@ -1784,7 +1784,7 @@ static int igb_set_features(struct net_device *netdev, u32 features)
 static const struct net_device_ops igb_netdev_ops = {
 	.ndo_open		= igb_open,
 	.ndo_stop		= igb_close,
-	.ndo_start_xmit		= igb_xmit_frame_adv,
+	.ndo_start_xmit		= igb_xmit_frame,
 	.ndo_get_stats64	= igb_get_stats64,
 	.ndo_set_rx_mode	= igb_set_rx_mode,
 	.ndo_set_mac_address	= igb_set_mac,
@@ -3955,8 +3955,8 @@ set_itr_now:
 #define IGB_TX_FLAGS_VLAN_MASK		0xffff0000
 #define IGB_TX_FLAGS_VLAN_SHIFT		        16
 
-static inline int igb_tso_adv(struct igb_ring *tx_ring,
-			      struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
+static inline int igb_tso(struct igb_ring *tx_ring,
+			  struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
 {
 	struct e1000_adv_tx_context_desc *context_desc;
 	unsigned int i;
@@ -4035,8 +4035,8 @@ static inline int igb_tso_adv(struct igb_ring *tx_ring,
 	return true;
 }
 
-static inline bool igb_tx_csum_adv(struct igb_ring *tx_ring,
-				   struct sk_buff *skb, u32 tx_flags)
+static inline bool igb_tx_csum(struct igb_ring *tx_ring,
+			       struct sk_buff *skb, u32 tx_flags)
 {
 	struct e1000_adv_tx_context_desc *context_desc;
 	struct device *dev = tx_ring->dev;
@@ -4120,8 +4120,8 @@ static inline bool igb_tx_csum_adv(struct igb_ring *tx_ring,
 #define IGB_MAX_TXD_PWR	16
 #define IGB_MAX_DATA_PER_TXD	(1<<IGB_MAX_TXD_PWR)
 
-static inline int igb_tx_map_adv(struct igb_ring *tx_ring, struct sk_buff *skb,
-				 unsigned int first)
+static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
+			     unsigned int first)
 {
 	struct igb_buffer *buffer_info;
 	struct device *dev = tx_ring->dev;
@@ -4196,9 +4196,9 @@ dma_error:
 	return 0;
 }
 
-static inline void igb_tx_queue_adv(struct igb_ring *tx_ring,
-				    u32 tx_flags, int count, u32 paylen,
-				    u8 hdr_len)
+static inline void igb_tx_queue(struct igb_ring *tx_ring,
+				u32 tx_flags, int count, u32 paylen,
+				u8 hdr_len)
 {
 	union e1000_adv_tx_desc *tx_desc;
 	struct igb_buffer *buffer_info;
@@ -4296,8 +4296,8 @@ static inline int igb_maybe_stop_tx(struct igb_ring *tx_ring, int size)
 	return __igb_maybe_stop_tx(tx_ring, size);
 }
 
-netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
-				    struct igb_ring *tx_ring)
+netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
+				struct igb_ring *tx_ring)
 {
 	int tso = 0, count;
 	u32 tx_flags = 0;
@@ -4329,7 +4329,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 
 	first = tx_ring->next_to_use;
 	if (skb_is_gso(skb)) {
-		tso = igb_tso_adv(tx_ring, skb, tx_flags, &hdr_len);
+		tso = igb_tso(tx_ring, skb, tx_flags, &hdr_len);
 
 		if (tso < 0) {
 			dev_kfree_skb_any(skb);
@@ -4339,7 +4339,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 
 	if (tso)
 		tx_flags |= IGB_TX_FLAGS_TSO;
-	else if (igb_tx_csum_adv(tx_ring, skb, tx_flags) &&
+	else if (igb_tx_csum(tx_ring, skb, tx_flags) &&
 	         (skb->ip_summed == CHECKSUM_PARTIAL))
 		tx_flags |= IGB_TX_FLAGS_CSUM;
 
@@ -4347,7 +4347,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 	 * count reflects descriptors mapped, if 0 or less then mapping error
 	 * has occurred and we need to rewind the descriptor queue
 	 */
-	count = igb_tx_map_adv(tx_ring, skb, first);
+	count = igb_tx_map(tx_ring, skb, first);
 	if (!count) {
 		dev_kfree_skb_any(skb);
 		tx_ring->buffer_info[first].time_stamp = 0;
@@ -4355,7 +4355,7 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 		return NETDEV_TX_OK;
 	}
 
-	igb_tx_queue_adv(tx_ring, tx_flags, count, skb->len, hdr_len);
+	igb_tx_queue(tx_ring, tx_flags, count, skb->len, hdr_len);
 
 	/* Make sure there is space in the ring for the next send. */
 	igb_maybe_stop_tx(tx_ring, MAX_SKB_FRAGS + 4);
@@ -4363,8 +4363,8 @@ netdev_tx_t igb_xmit_frame_ring_adv(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
-static netdev_tx_t igb_xmit_frame_adv(struct sk_buff *skb,
-				      struct net_device *netdev)
+static netdev_tx_t igb_xmit_frame(struct sk_buff *skb,
+				  struct net_device *netdev)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
 	struct igb_ring *tx_ring;
@@ -4387,7 +4387,7 @@ static netdev_tx_t igb_xmit_frame_adv(struct sk_buff *skb,
 	 * to a flow.  Right now, performance is impacted slightly negatively
 	 * if using multiple tx queues.  If the stack breaks away from a
 	 * single qdisc implementation, we can look at this again. */
-	return igb_xmit_frame_ring_adv(skb, tx_ring);
+	return igb_xmit_frame_ring(skb, tx_ring);
 }
 
 /**
@@ -5491,7 +5491,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
 		clean_complete = !!igb_clean_tx_irq(q_vector);
 
 	if (q_vector->rx_ring)
-		clean_complete &= igb_clean_rx_irq_adv(q_vector, budget);
+		clean_complete &= igb_clean_rx_irq(q_vector, budget);
 
 	/* If all work not completed, return budget and keep polling */
 	if (!clean_complete)
@@ -5670,8 +5670,8 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 	return count < tx_ring->count;
 }
 
-static inline void igb_rx_checksum_adv(struct igb_ring *ring,
-				       u32 status_err, struct sk_buff *skb)
+static inline void igb_rx_checksum(struct igb_ring *ring,
+				   u32 status_err, struct sk_buff *skb)
 {
 	skb_checksum_none_assert(skb);
 
@@ -5750,7 +5750,7 @@ static inline u16 igb_get_hlen(union e1000_adv_rx_desc *rx_desc)
 	return hlen;
 }
 
-static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector, int budget)
+static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, int budget)
 {
 	struct igb_ring *rx_ring = q_vector->rx_ring;
 	union e1000_adv_rx_desc *rx_desc;
@@ -5836,7 +5836,7 @@ static bool igb_clean_rx_irq_adv(struct igb_q_vector *q_vector, int budget)
 		total_bytes += skb->len;
 		total_packets++;
 
-		igb_rx_checksum_adv(rx_ring, staterr, skb);
+		igb_rx_checksum(rx_ring, staterr, skb);
 
 		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
@@ -5855,7 +5855,7 @@ next_desc:
 		cleaned_count++;
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IGB_RX_BUFFER_WRITE) {
-			igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
+			igb_alloc_rx_buffers(rx_ring, cleaned_count);
 			cleaned_count = 0;
 		}
 
@@ -5873,7 +5873,7 @@ next_desc:
 	rx_ring->total_bytes += total_bytes;
 
 	if (cleaned_count)
-		igb_alloc_rx_buffers_adv(rx_ring, cleaned_count);
+		igb_alloc_rx_buffers(rx_ring, cleaned_count);
 
 	return !!budget;
 }
@@ -5946,10 +5946,10 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
 }
 
 /**
- * igb_alloc_rx_buffers_adv - Replace used receive buffers; packet split
+ * igb_alloc_rx_buffers - Replace used receive buffers; packet split
  * @adapter: address of board private structure
  **/
-void igb_alloc_rx_buffers_adv(struct igb_ring *rx_ring, u16 cleaned_count)
+void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 {
 	union e1000_adv_rx_desc *rx_desc;
 	struct igb_buffer *bi;
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next 09/13] igb: Replace E1000_XX_DESC_ADV with IGB_XX_DESC
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (7 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 08/13] igb: drop the "adv" off function names relating to descriptors Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 10/13] igb: Remove multi_tx_table and simplify igb_xmit_frame Jeff Kirsher
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

Since igb only uses advanced descriptors, we might as well use an IGB-specific
define and drop the _ADV suffix from the descriptor declarations.  In addition,
the macros can be reduced further by having them operate on ring pointers,
since that is normally how the Tx descriptors are handled.
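As a rough illustration (the names below are stand-ins, not the driver's
real types), the pointer-based macro pattern just casts the ring's
descriptor memory and indexes into it:

	/* illustrative sketch, not part of the patch */
	struct ring {
		void *desc;			/* descriptor ring memory */
	};

	union rx_desc {
		unsigned long long read[2];	/* descriptor to hardware */
		unsigned long long wb[2];	/* write-back from hardware */
	};

	#define RX_DESC(R, i) (&(((union rx_desc *)((R)->desc))[i]))

Taking a ring pointer also spares call sites the "*ring" dereference that
the old E1000_*_DESC_ADV macros required.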

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |   12 ++++++------
 drivers/net/ethernet/intel/igb/igb_ethtool.c |    4 ++--
 drivers/net/ethernet/intel/igb/igb_main.c    |   24 ++++++++++++------------
 3 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index b1ca8ea..8607a1d 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -232,12 +232,12 @@ struct igb_ring {
 
 #define IGB_ADVTXD_DCMD (E1000_TXD_CMD_EOP | E1000_TXD_CMD_RS)
 
-#define E1000_RX_DESC_ADV(R, i)	    \
-	(&(((union e1000_adv_rx_desc *)((R).desc))[i]))
-#define E1000_TX_DESC_ADV(R, i)	    \
-	(&(((union e1000_adv_tx_desc *)((R).desc))[i]))
-#define E1000_TX_CTXTDESC_ADV(R, i)	    \
-	(&(((struct e1000_adv_tx_context_desc *)((R).desc))[i]))
+#define IGB_RX_DESC(R, i)	    \
+	(&(((union e1000_adv_rx_desc *)((R)->desc))[i]))
+#define IGB_TX_DESC(R, i)	    \
+	(&(((union e1000_adv_tx_desc *)((R)->desc))[i]))
+#define IGB_TX_CTXTDESC(R, i)	    \
+	(&(((struct e1000_adv_tx_context_desc *)((R)->desc))[i]))
 
 /* igb_desc_unused - calculate if we have unused descriptors */
 static inline int igb_desc_unused(struct igb_ring *ring)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 67eee0a..f231d82 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1586,7 +1586,7 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
 	/* initialize next to clean and descriptor values */
 	rx_ntc = rx_ring->next_to_clean;
 	tx_ntc = tx_ring->next_to_clean;
-	rx_desc = E1000_RX_DESC_ADV(*rx_ring, rx_ntc);
+	rx_desc = IGB_RX_DESC(rx_ring, rx_ntc);
 	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 
 	while (staterr & E1000_RXD_STAT_DD) {
@@ -1617,7 +1617,7 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
 			tx_ntc = 0;
 
 		/* fetch next descriptor */
-		rx_desc = E1000_RX_DESC_ADV(*rx_ring, rx_ntc);
+		rx_desc = IGB_RX_DESC(rx_ring, rx_ntc);
 		staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 	}
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9a0cfd6..55d6431 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -413,7 +413,7 @@ static void igb_dump(struct igb_adapter *adapter)
 			"leng  ntw timestamp        bi->skb\n");
 
 		for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
-			tx_desc = E1000_TX_DESC_ADV(*tx_ring, i);
+			tx_desc = IGB_TX_DESC(tx_ring, i);
 			buffer_info = &tx_ring->buffer_info[i];
 			u0 = (struct my_u0 *)tx_desc;
 			printk(KERN_INFO "T [0x%03X]    %016llX %016llX %016llX"
@@ -494,7 +494,7 @@ rx_ring_summary:
 
 		for (i = 0; i < rx_ring->count; i++) {
 			buffer_info = &rx_ring->buffer_info[i];
-			rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
+			rx_desc = IGB_RX_DESC(rx_ring, i);
 			u0 = (struct my_u0 *)rx_desc;
 			staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 			if (staterr & E1000_RXD_STAT_DD) {
@@ -3993,7 +3993,7 @@ static inline int igb_tso(struct igb_ring *tx_ring,
 	i = tx_ring->next_to_use;
 
 	buffer_info = &tx_ring->buffer_info[i];
-	context_desc = E1000_TX_CTXTDESC_ADV(*tx_ring, i);
+	context_desc = IGB_TX_CTXTDESC(tx_ring, i);
 	/* VLAN MACLEN IPLEN */
 	if (tx_flags & IGB_TX_FLAGS_VLAN)
 		info |= (tx_flags & IGB_TX_FLAGS_VLAN_MASK);
@@ -4048,7 +4048,7 @@ static inline bool igb_tx_csum(struct igb_ring *tx_ring,
 	    (tx_flags & IGB_TX_FLAGS_VLAN)) {
 		i = tx_ring->next_to_use;
 		buffer_info = &tx_ring->buffer_info[i];
-		context_desc = E1000_TX_CTXTDESC_ADV(*tx_ring, i);
+		context_desc = IGB_TX_CTXTDESC(tx_ring, i);
 
 		if (tx_flags & IGB_TX_FLAGS_VLAN)
 			info |= (tx_flags & IGB_TX_FLAGS_VLAN_MASK);
@@ -4238,7 +4238,7 @@ static inline void igb_tx_queue(struct igb_ring *tx_ring,
 
 	do {
 		buffer_info = &tx_ring->buffer_info[i];
-		tx_desc = E1000_TX_DESC_ADV(*tx_ring, i);
+		tx_desc = IGB_TX_DESC(tx_ring, i);
 		tx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);
 		tx_desc->read.cmd_type_len =
 			cpu_to_le32(cmd_type_len | buffer_info->length);
@@ -5580,13 +5580,13 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 
 	i = tx_ring->next_to_clean;
 	eop = tx_ring->buffer_info[i].next_to_watch;
-	eop_desc = E1000_TX_DESC_ADV(*tx_ring, eop);
+	eop_desc = IGB_TX_DESC(tx_ring, eop);
 
 	while ((eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)) &&
 	       (count < tx_ring->count)) {
 		rmb();	/* read buffer_info after eop_desc status */
 		for (cleaned = false; !cleaned; count++) {
-			tx_desc = E1000_TX_DESC_ADV(*tx_ring, i);
+			tx_desc = IGB_TX_DESC(tx_ring, i);
 			buffer_info = &tx_ring->buffer_info[i];
 			cleaned = (i == eop);
 
@@ -5605,7 +5605,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 				i = 0;
 		}
 		eop = tx_ring->buffer_info[i].next_to_watch;
-		eop_desc = E1000_TX_DESC_ADV(*tx_ring, eop);
+		eop_desc = IGB_TX_DESC(tx_ring, eop);
 	}
 
 	tx_ring->next_to_clean = i;
@@ -5760,7 +5760,7 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, int budget)
 	u16 cleaned_count = igb_desc_unused(rx_ring);
 	u16 i = rx_ring->next_to_clean;
 
-	rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
+	rx_desc = IGB_RX_DESC(rx_ring, i);
 	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 
 	while (staterr & E1000_RXD_STAT_DD) {
@@ -5775,7 +5775,7 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, int budget)
 		if (i == rx_ring->count)
 			i = 0;
 
-		next_rxd = E1000_RX_DESC_ADV(*rx_ring, i);
+		next_rxd = IGB_RX_DESC(rx_ring, i);
 		prefetch(next_rxd);
 
 		/*
@@ -5955,7 +5955,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 	struct igb_buffer *bi;
 	u16 i = rx_ring->next_to_use;
 
-	rx_desc = E1000_RX_DESC_ADV(*rx_ring, i);
+	rx_desc = IGB_RX_DESC(rx_ring, i);
 	bi = &rx_ring->buffer_info[i];
 	i -= rx_ring->count;
 
@@ -5976,7 +5976,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		bi++;
 		i++;
 		if (unlikely(!i)) {
-			rx_desc = E1000_RX_DESC_ADV(*rx_ring, 0);
+			rx_desc = IGB_RX_DESC(rx_ring, 0);
 			bi = rx_ring->buffer_info;
 			i -= rx_ring->count;
 		}
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next 10/13] igb: Remove multi_tx_table and simplify igb_xmit_frame
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (8 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 09/13] igb: Replace E1000_XX_DESC_ADV with IGB_XX_DESC Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 11/13] igb: Make Tx budget for NAPI user adjustable Jeff Kirsher
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

Instead of using the multi_tx_table to map possible Tx queues to Tx rings,
we can just apply a simple modulo for the unlikely event that the Tx queue
provided exceeds the number of Tx rings.
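As a minimal sketch of the resulting mapping (illustrative names, not the
driver's), the fallback is a modulo kept off the common path:

	/* illustrative sketch, not part of the patch */
	static unsigned int tx_ring_index(unsigned int queue_mapping,
					  unsigned int num_tx_queues)
	{
		unsigned int r_idx = queue_mapping;

		/* common case: the stack hands us an in-range queue index */
		if (r_idx >= num_tx_queues)
			r_idx %= num_tx_queues;

		return r_idx;
	}

The branch avoids the divide entirely whenever queue_mapping already names
a valid ring, which is the expected case.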

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h      |    4 +--
 drivers/net/ethernet/intel/igb/igb_main.c |   36 +++++++++++++++++-----------
 2 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index 8607a1d..b725937 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -63,8 +63,7 @@ struct igb_adapter;
 /* Transmit and receive queues */
 #define IGB_MAX_RX_QUEUES                  (adapter->vfs_allocated_count ? 2 : \
                                            (hw->mac.type > e1000_82575 ? 8 : 4))
-#define IGB_ABS_MAX_TX_QUEUES              8
-#define IGB_MAX_TX_QUEUES                  IGB_MAX_RX_QUEUES
+#define IGB_MAX_TX_QUEUES                  16
 
 #define IGB_MAX_VF_MC_ENTRIES              30
 #define IGB_MAX_VF_FUNCTIONS               8
@@ -324,7 +323,6 @@ struct igb_adapter {
 	/* to not mess up cache alignment, always add to the bottom */
 	u32 eeprom_wol;
 
-	struct igb_ring *multi_tx_table[IGB_ABS_MAX_TX_QUEUES];
 	u16 tx_ring_count;
 	u16 rx_ring_count;
 	unsigned int vfs_allocated_count;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 55d6431..7ad25e8 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -1875,7 +1875,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 
 	err = -ENOMEM;
 	netdev = alloc_etherdev_mq(sizeof(struct igb_adapter),
-	                           IGB_ABS_MAX_TX_QUEUES);
+				   IGB_MAX_TX_QUEUES);
 	if (!netdev)
 		goto err_alloc_etherdev;
 
@@ -2620,10 +2620,6 @@ static int igb_setup_all_tx_resources(struct igb_adapter *adapter)
 		}
 	}
 
-	for (i = 0; i < IGB_ABS_MAX_TX_QUEUES; i++) {
-		int r_idx = i % adapter->num_tx_queues;
-		adapter->multi_tx_table[i] = adapter->tx_ring[r_idx];
-	}
 	return err;
 }
 
@@ -4363,12 +4359,21 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	return NETDEV_TX_OK;
 }
 
+static inline struct igb_ring *igb_tx_queue_mapping(struct igb_adapter *adapter,
+						    struct sk_buff *skb)
+{
+	unsigned int r_idx = skb->queue_mapping;
+
+	if (r_idx >= adapter->num_tx_queues)
+		r_idx = r_idx % adapter->num_tx_queues;
+
+	return adapter->tx_ring[r_idx];
+}
+
 static netdev_tx_t igb_xmit_frame(struct sk_buff *skb,
 				  struct net_device *netdev)
 {
 	struct igb_adapter *adapter = netdev_priv(netdev);
-	struct igb_ring *tx_ring;
-	int r_idx = 0;
 
 	if (test_bit(__IGB_DOWN, &adapter->state)) {
 		dev_kfree_skb_any(skb);
@@ -4380,14 +4385,17 @@ static netdev_tx_t igb_xmit_frame(struct sk_buff *skb,
 		return NETDEV_TX_OK;
 	}
 
-	r_idx = skb->queue_mapping & (IGB_ABS_MAX_TX_QUEUES - 1);
-	tx_ring = adapter->multi_tx_table[r_idx];
+	/*
+	 * The minimum packet size with TCTL.PSP set is 17 so pad the skb
+	 * in order to meet this minimum size requirement.
+	 */
+	if (skb->len < 17) {
+		if (skb_padto(skb, 17))
+			return NETDEV_TX_OK;
+		skb->len = 17;
+	}
 
-	/* This goes back to the question of how to logically map a tx queue
-	 * to a flow.  Right now, performance is impacted slightly negatively
-	 * if using multiple tx queues.  If the stack breaks away from a
-	 * single qdisc implementation, we can look at this again. */
-	return igb_xmit_frame_ring(skb, tx_ring);
+	return igb_xmit_frame_ring(skb, igb_tx_queue_mapping(adapter, skb));
 }
 
 /**
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (9 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 10/13] igb: Remove multi_tx_table and simplify igb_xmit_frame Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17 17:04   ` Ben Hutchings
  2011-09-19 20:57   ` David Miller
  2011-09-17  8:04 ` [net-next 12/13] igb: split buffer_info into tx_buffer_info and rx_buffer_info Jeff Kirsher
  2011-09-17  8:04 ` [net-next 13/13] igb: Consolidate creation of Tx context descriptors into a single function Jeff Kirsher
  12 siblings, 2 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This change makes the NAPI budget limit for transmit cleanup adjustable.
By doing this it is possible to tune the value for optimal performance
with workloads such as routing.
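The knob rides on the existing ethtool coalesce interface
(tx_max_coalesced_frames_irq), so something along the lines of
"ethtool -C ethX tx-frames-irq 128" should adjust it.  As a sketch of the
bounded cleanup this enables (helper names below are stand-ins for the
driver's logic, not real functions):

	/* illustrative sketch, not the driver code */
	#include <stdbool.h>

	struct tx_ring;				/* stand-in for struct igb_ring */
	bool tx_desc_done(struct tx_ring *r);	/* stand-in: EOP DD bit set? */
	void reclaim_packet(struct tx_ring *r);	/* stand-in: unmap + free skb */

	static bool clean_tx_bounded(struct tx_ring *ring,
				     unsigned int work_limit)
	{
		unsigned int budget = work_limit;

		for (; budget; budget--) {
			if (!tx_desc_done(ring))
				break;
			reclaim_packet(ring);
		}

		/* budget exhausted => not done, so NAPI keeps polling */
		return budget != 0;
	}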

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |    3 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c |    6 +
 drivers/net/ethernet/intel/igb/igb_main.c    |  136 ++++++++++++++++----------
 3 files changed, 92 insertions(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index b725937..cfa590c8 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -47,6 +47,7 @@ struct igb_adapter;
 
 /* TX/RX descriptor defines */
 #define IGB_DEFAULT_TXD                  256
+#define IGB_DEFAULT_TX_WORK		 128
 #define IGB_MIN_TXD                       80
 #define IGB_MAX_TXD                     4096
 
@@ -177,6 +178,7 @@ struct igb_q_vector {
 
 	u32 eims_value;
 	u16 cpu;
+	u16 tx_work_limit;
 
 	u16 itr_val;
 	u8 set_itr;
@@ -266,6 +268,7 @@ struct igb_adapter {
 	u16 rx_itr;
 
 	/* TX */
+	u16 tx_work_limit;
 	u32 tx_timeout_count;
 	int num_tx_queues;
 	struct igb_ring *tx_ring[16];
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index f231d82..f6da820 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1989,6 +1989,9 @@ static int igb_set_coalesce(struct net_device *netdev,
 	if ((adapter->flags & IGB_FLAG_QUEUE_PAIRS) && ec->tx_coalesce_usecs)
 		return -EINVAL;
 
+	if (ec->tx_max_coalesced_frames_irq)
+		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
+
 	/* If ITR is disabled, disable DMAC */
 	if (ec->rx_coalesce_usecs == 0) {
 		if (adapter->flags & IGB_FLAG_DMAC)
@@ -2011,6 +2014,7 @@ static int igb_set_coalesce(struct net_device *netdev,
 
 	for (i = 0; i < adapter->num_q_vectors; i++) {
 		struct igb_q_vector *q_vector = adapter->q_vector[i];
+		q_vector->tx_work_limit = adapter->tx_work_limit;
 		if (q_vector->rx_ring)
 			q_vector->itr_val = adapter->rx_itr_setting;
 		else
@@ -2033,6 +2037,8 @@ static int igb_get_coalesce(struct net_device *netdev,
 	else
 		ec->rx_coalesce_usecs = adapter->rx_itr_setting >> 2;
 
+	ec->tx_max_coalesced_frames_irq = adapter->tx_work_limit;
+
 	if (!(adapter->flags & IGB_FLAG_QUEUE_PAIRS)) {
 		if (adapter->tx_itr_setting <= 3)
 			ec->tx_coalesce_usecs = adapter->tx_itr_setting;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7ad25e8..989e774 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -136,8 +136,8 @@ static irqreturn_t igb_msix_ring(int irq, void *);
 static void igb_update_dca(struct igb_q_vector *);
 static void igb_setup_dca(struct igb_adapter *);
 #endif /* CONFIG_IGB_DCA */
-static bool igb_clean_tx_irq(struct igb_q_vector *);
 static int igb_poll(struct napi_struct *, int);
+static bool igb_clean_tx_irq(struct igb_q_vector *);
 static bool igb_clean_rx_irq(struct igb_q_vector *, int);
 static int igb_ioctl(struct net_device *, struct ifreq *, int cmd);
 static void igb_tx_timeout(struct net_device *);
@@ -1120,6 +1120,7 @@ static void igb_map_tx_ring_to_vector(struct igb_adapter *adapter,
 	q_vector->tx_ring = adapter->tx_ring[ring_idx];
 	q_vector->tx_ring->q_vector = q_vector;
 	q_vector->itr_val = adapter->tx_itr_setting;
+	q_vector->tx_work_limit = adapter->tx_work_limit;
 	if (q_vector->itr_val && q_vector->itr_val <= 3)
 		q_vector->itr_val = IGB_START_ITR;
 }
@@ -2388,11 +2389,17 @@ static int __devinit igb_sw_init(struct igb_adapter *adapter)
 
 	pci_read_config_word(pdev, PCI_COMMAND, &hw->bus.pci_cmd_word);
 
+	/* set default ring sizes */
 	adapter->tx_ring_count = IGB_DEFAULT_TXD;
 	adapter->rx_ring_count = IGB_DEFAULT_RXD;
+
+	/* set default ITR values */
 	adapter->rx_itr_setting = IGB_DEFAULT_ITR;
 	adapter->tx_itr_setting = IGB_DEFAULT_ITR;
 
+	/* set default work limits */
+	adapter->tx_work_limit = IGB_DEFAULT_TX_WORK;
+
 	adapter->max_frame_size = netdev->mtu + ETH_HLEN + ETH_FCS_LEN +
 				  VLAN_HLEN;
 	adapter->min_frame_size = ETH_ZLEN + ETH_FCS_LEN;
@@ -5496,7 +5503,7 @@ static int igb_poll(struct napi_struct *napi, int budget)
 		igb_update_dca(q_vector);
 #endif
 	if (q_vector->tx_ring)
-		clean_complete = !!igb_clean_tx_irq(q_vector);
+		clean_complete = igb_clean_tx_irq(q_vector);
 
 	if (q_vector->rx_ring)
 		clean_complete &= igb_clean_rx_irq(q_vector, budget);
@@ -5578,64 +5585,69 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 {
 	struct igb_adapter *adapter = q_vector->adapter;
 	struct igb_ring *tx_ring = q_vector->tx_ring;
-	struct net_device *netdev = tx_ring->netdev;
-	struct e1000_hw *hw = &adapter->hw;
-	struct igb_buffer *buffer_info;
-	union e1000_adv_tx_desc *tx_desc, *eop_desc;
+	struct igb_buffer *tx_buffer;
+	union e1000_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
-	unsigned int i, eop, count = 0;
-	bool cleaned = false;
+	unsigned int budget = q_vector->tx_work_limit;
+	u16 i = tx_ring->next_to_clean;
 
-	i = tx_ring->next_to_clean;
-	eop = tx_ring->buffer_info[i].next_to_watch;
-	eop_desc = IGB_TX_DESC(tx_ring, eop);
+	if (test_bit(__IGB_DOWN, &adapter->state))
+		return true;
 
-	while ((eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)) &&
-	       (count < tx_ring->count)) {
-		rmb();	/* read buffer_info after eop_desc status */
-		for (cleaned = false; !cleaned; count++) {
-			tx_desc = IGB_TX_DESC(tx_ring, i);
-			buffer_info = &tx_ring->buffer_info[i];
-			cleaned = (i == eop);
+	tx_buffer = &tx_ring->buffer_info[i];
+	tx_desc = IGB_TX_DESC(tx_ring, i);
 
-			if (buffer_info->skb) {
-				total_bytes += buffer_info->bytecount;
-				/* gso_segs is currently only valid for tcp */
-				total_packets += buffer_info->gso_segs;
-				igb_tx_hwtstamp(q_vector, buffer_info);
-			}
+	for (; budget; budget--) {
+		u16 eop = tx_buffer->next_to_watch;
+		union e1000_adv_tx_desc *eop_desc;
+
+		eop_desc = IGB_TX_DESC(tx_ring, eop);
+
+		/* if DD is not set pending work has not been completed */
+		if (!(eop_desc->wb.status & cpu_to_le32(E1000_TXD_STAT_DD)))
+			break;
+
+		/* prevent any other reads prior to eop_desc being verified */
+		rmb();
 
-			igb_unmap_and_free_tx_resource(tx_ring, buffer_info);
+		do {
 			tx_desc->wb.status = 0;
+			if (likely(tx_desc == eop_desc)) {
+				eop_desc = NULL;
+
+				total_bytes += tx_buffer->bytecount;
+				total_packets += tx_buffer->gso_segs;
+				igb_tx_hwtstamp(q_vector, tx_buffer);
+			}
+
+			igb_unmap_and_free_tx_resource(tx_ring, tx_buffer);
 
+			tx_buffer++;
+			tx_desc++;
 			i++;
-			if (i == tx_ring->count)
+			if (unlikely(i == tx_ring->count)) {
 				i = 0;
-		}
-		eop = tx_ring->buffer_info[i].next_to_watch;
-		eop_desc = IGB_TX_DESC(tx_ring, eop);
+				tx_buffer = tx_ring->buffer_info;
+				tx_desc = IGB_TX_DESC(tx_ring, 0);
+			}
+		} while (eop_desc);
 	}
 
 	tx_ring->next_to_clean = i;
+	u64_stats_update_begin(&tx_ring->tx_syncp);
+	tx_ring->tx_stats.bytes += total_bytes;
+	tx_ring->tx_stats.packets += total_packets;
+	u64_stats_update_end(&tx_ring->tx_syncp);
+	tx_ring->total_bytes += total_bytes;
+	tx_ring->total_packets += total_packets;
 
-	if (unlikely(count &&
-		     netif_carrier_ok(netdev) &&
-		     igb_desc_unused(tx_ring) >= IGB_TX_QUEUE_WAKE)) {
-		/* Make sure that anybody stopping the queue after this
-		 * sees the new next_to_clean.
-		 */
-		smp_mb();
-		if (__netif_subqueue_stopped(netdev, tx_ring->queue_index) &&
-		    !(test_bit(__IGB_DOWN, &adapter->state))) {
-			netif_wake_subqueue(netdev, tx_ring->queue_index);
+	if (tx_ring->detect_tx_hung) {
+		struct e1000_hw *hw = &adapter->hw;
+		u16 eop = tx_ring->buffer_info[i].next_to_watch;
+		union e1000_adv_tx_desc *eop_desc;
 
-			u64_stats_update_begin(&tx_ring->tx_syncp);
-			tx_ring->tx_stats.restart_queue++;
-			u64_stats_update_end(&tx_ring->tx_syncp);
-		}
-	}
+		eop_desc = IGB_TX_DESC(tx_ring, eop);
 
-	if (tx_ring->detect_tx_hung) {
 		/* Detect a transmit hang in hardware, this serializes the
 		 * check with the clearing of time_stamp and movement of i */
 		tx_ring->detect_tx_hung = false;
@@ -5666,16 +5678,34 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 				eop,
 				jiffies,
 				eop_desc->wb.status);
-			netif_stop_subqueue(netdev, tx_ring->queue_index);
+			netif_stop_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
+
+			/* we are about to reset, no point in enabling stuff */
+			return true;
 		}
 	}
-	tx_ring->total_bytes += total_bytes;
-	tx_ring->total_packets += total_packets;
-	u64_stats_update_begin(&tx_ring->tx_syncp);
-	tx_ring->tx_stats.bytes += total_bytes;
-	tx_ring->tx_stats.packets += total_packets;
-	u64_stats_update_end(&tx_ring->tx_syncp);
-	return count < tx_ring->count;
+
+	if (unlikely(total_packets &&
+		     netif_carrier_ok(tx_ring->netdev) &&
+		     igb_desc_unused(tx_ring) >= IGB_TX_QUEUE_WAKE)) {
+		/* Make sure that anybody stopping the queue after this
+		 * sees the new next_to_clean.
+		 */
+		smp_mb();
+		if (__netif_subqueue_stopped(tx_ring->netdev,
+					     tx_ring->queue_index) &&
+		    !(test_bit(__IGB_DOWN, &adapter->state))) {
+			netif_wake_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
+
+			u64_stats_update_begin(&tx_ring->tx_syncp);
+			tx_ring->tx_stats.restart_queue++;
+			u64_stats_update_end(&tx_ring->tx_syncp);
+		}
+	}
+
+	return !!budget;
 }
 
 static inline void igb_rx_checksum(struct igb_ring *ring,
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next 12/13] igb: split buffer_info into tx_buffer_info and rx_buffer_info
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (10 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 11/13] igb: Make Tx budget for NAPI user adjustable Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  2011-09-17  8:04 ` [net-next 13/13] igb: Consolidate creation of Tx context descriptors into a single function Jeff Kirsher
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

In order to improve the performance of the Tx path it has been necessary to
add additional info to the tx_buffer_info structure.  A side effect, however,
is that the structure has grown, which in turn has also increased the size
of the Rx buffer info structure.  To avoid this in the future I am splitting
the single buffer_info structure into two separate ones, joined by making
the buffer_info pointer in the ring a union of the two.
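A minimal sketch of the union approach (element types abbreviated, not the
real layouts):

	/* illustrative sketch, not part of the patch */
	struct tx_buf {				/* Tx-only metadata */
		void *skb;
		unsigned long long dma;
		unsigned int bytecount;
	};

	struct rx_buf {				/* Rx-only metadata */
		void *skb;
		unsigned long long dma;
		void *page;
	};

	struct ring {
		union {				/* one field, two element types */
			struct tx_buf *tx_buffer_info;
			struct rx_buf *rx_buffer_info;
		};
	};

Each ring then allocates sizeof() its own element type, so growing the Tx
metadata no longer inflates every Rx ring's array as well.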

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |   42 +++++-----
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   15 ++--
 drivers/net/ethernet/intel/igb/igb_main.c    |  123 +++++++++++++-------------
 3 files changed, 92 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index cfa590c8..fab6e92 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -132,27 +132,24 @@ struct vf_data_storage {
 
 /* wrapper around a pointer to a socket buffer,
  * so a DMA handle can be stored along with the buffer */
-struct igb_buffer {
+struct igb_tx_buffer {
+	u16 next_to_watch;
+	unsigned long time_stamp;
+	dma_addr_t dma;
+	u32 length;
+	u32 tx_flags;
+	struct sk_buff *skb;
+	unsigned int bytecount;
+	u16 gso_segs;
+	u8 mapped_as_page;
+};
+
+struct igb_rx_buffer {
 	struct sk_buff *skb;
 	dma_addr_t dma;
-	union {
-		/* TX */
-		struct {
-			unsigned long time_stamp;
-			u16 length;
-			u16 next_to_watch;
-			unsigned int bytecount;
-			u16 gso_segs;
-			u8 tx_flags;
-			u8 mapped_as_page;
-		};
-		/* RX */
-		struct {
-			struct page *page;
-			dma_addr_t page_dma;
-			u16 page_offset;
-		};
-	};
+	struct page *page;
+	dma_addr_t page_dma;
+	u32 page_offset;
 };
 
 struct igb_tx_queue_stats {
@@ -191,7 +188,10 @@ struct igb_ring {
 	struct igb_q_vector *q_vector;	/* backlink to q_vector */
 	struct net_device *netdev;	/* back pointer to net_device */
 	struct device *dev;		/* device pointer for dma mapping */
-	struct igb_buffer *buffer_info;	/* array of buffer info structs */
+	union {				/* array of buffer info structs */
+		struct igb_tx_buffer *tx_buffer_info;
+		struct igb_rx_buffer *rx_buffer_info;
+	};
 	void *desc;			/* descriptor ring memory */
 	unsigned long flags;		/* ring specific flags */
 	void __iomem *tail;		/* pointer to ring tail register */
@@ -377,7 +377,7 @@ extern void igb_setup_tctl(struct igb_adapter *);
 extern void igb_setup_rctl(struct igb_adapter *);
 extern netdev_tx_t igb_xmit_frame_ring(struct sk_buff *, struct igb_ring *);
 extern void igb_unmap_and_free_tx_resource(struct igb_ring *,
-					   struct igb_buffer *);
+					   struct igb_tx_buffer *);
 extern void igb_alloc_rx_buffers(struct igb_ring *, u16);
 extern void igb_update_stats(struct igb_adapter *, struct rtnl_link_stats64 *);
 extern bool igb_has_link(struct igb_adapter *adapter);
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index f6da820..6bcfad1 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -1579,7 +1579,8 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
                                 unsigned int size)
 {
 	union e1000_adv_rx_desc *rx_desc;
-	struct igb_buffer *buffer_info;
+	struct igb_rx_buffer *rx_buffer_info;
+	struct igb_tx_buffer *tx_buffer_info;
 	int rx_ntc, tx_ntc, count = 0;
 	u32 staterr;
 
@@ -1591,22 +1592,22 @@ static int igb_clean_test_rings(struct igb_ring *rx_ring,
 
 	while (staterr & E1000_RXD_STAT_DD) {
 		/* check rx buffer */
-		buffer_info = &rx_ring->buffer_info[rx_ntc];
+		rx_buffer_info = &rx_ring->rx_buffer_info[rx_ntc];
 
 		/* unmap rx buffer, will be remapped by alloc_rx_buffers */
 		dma_unmap_single(rx_ring->dev,
-		                 buffer_info->dma,
+				 rx_buffer_info->dma,
 				 IGB_RX_HDR_LEN,
 				 DMA_FROM_DEVICE);
-		buffer_info->dma = 0;
+		rx_buffer_info->dma = 0;
 
 		/* verify contents of skb */
-		if (!igb_check_lbtest_frame(buffer_info->skb, size))
+		if (!igb_check_lbtest_frame(rx_buffer_info->skb, size))
 			count++;
 
 		/* unmap buffer on tx side */
-		buffer_info = &tx_ring->buffer_info[tx_ntc];
-		igb_unmap_and_free_tx_resource(tx_ring, buffer_info);
+		tx_buffer_info = &tx_ring->tx_buffer_info[tx_ntc];
+		igb_unmap_and_free_tx_resource(tx_ring, tx_buffer_info);
 
 		/* increment rx/tx next to clean counters */
 		rx_ntc++;
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 989e774..afa0ca9 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -339,7 +339,6 @@ static void igb_dump(struct igb_adapter *adapter)
 	struct igb_ring *tx_ring;
 	union e1000_adv_tx_desc *tx_desc;
 	struct my_u0 { u64 a; u64 b; } *u0;
-	struct igb_buffer *buffer_info;
 	struct igb_ring *rx_ring;
 	union e1000_adv_rx_desc *rx_desc;
 	u32 staterr;
@@ -376,8 +375,9 @@ static void igb_dump(struct igb_adapter *adapter)
 	printk(KERN_INFO "Queue [NTU] [NTC] [bi(ntc)->dma  ]"
 		" leng ntw timestamp\n");
 	for (n = 0; n < adapter->num_tx_queues; n++) {
+		struct igb_tx_buffer *buffer_info;
 		tx_ring = adapter->tx_ring[n];
-		buffer_info = &tx_ring->buffer_info[tx_ring->next_to_clean];
+		buffer_info = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
 		printk(KERN_INFO " %5d %5X %5X %016llX %04X %3X %016llX\n",
 			   n, tx_ring->next_to_use, tx_ring->next_to_clean,
 			   (u64)buffer_info->dma,
@@ -413,8 +413,9 @@ static void igb_dump(struct igb_adapter *adapter)
 			"leng  ntw timestamp        bi->skb\n");
 
 		for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
+			struct igb_tx_buffer *buffer_info;
 			tx_desc = IGB_TX_DESC(tx_ring, i);
-			buffer_info = &tx_ring->buffer_info[i];
+			buffer_info = &tx_ring->tx_buffer_info[i];
 			u0 = (struct my_u0 *)tx_desc;
 			printk(KERN_INFO "T [0x%03X]    %016llX %016llX %016llX"
 				" %04X  %3X %016llX %p", i,
@@ -493,7 +494,8 @@ rx_ring_summary:
 			"<-- Adv Rx Write-Back format\n");
 
 		for (i = 0; i < rx_ring->count; i++) {
-			buffer_info = &rx_ring->buffer_info[i];
+			struct igb_rx_buffer *buffer_info;
+			buffer_info = &rx_ring->rx_buffer_info[i];
 			rx_desc = IGB_RX_DESC(rx_ring, i);
 			u0 = (struct my_u0 *)rx_desc;
 			staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
@@ -2576,9 +2578,9 @@ int igb_setup_tx_resources(struct igb_ring *tx_ring)
 	struct device *dev = tx_ring->dev;
 	int size;
 
-	size = sizeof(struct igb_buffer) * tx_ring->count;
-	tx_ring->buffer_info = vzalloc(size);
-	if (!tx_ring->buffer_info)
+	size = sizeof(struct igb_tx_buffer) * tx_ring->count;
+	tx_ring->tx_buffer_info = vzalloc(size);
+	if (!tx_ring->tx_buffer_info)
 		goto err;
 
 	/* round up to nearest 4K */
@@ -2598,7 +2600,7 @@ int igb_setup_tx_resources(struct igb_ring *tx_ring)
 	return 0;
 
 err:
-	vfree(tx_ring->buffer_info);
+	vfree(tx_ring->tx_buffer_info);
 	dev_err(dev,
 		"Unable to allocate memory for the transmit descriptor ring\n");
 	return -ENOMEM;
@@ -2719,9 +2721,9 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
 	struct device *dev = rx_ring->dev;
 	int size, desc_len;
 
-	size = sizeof(struct igb_buffer) * rx_ring->count;
-	rx_ring->buffer_info = vzalloc(size);
-	if (!rx_ring->buffer_info)
+	size = sizeof(struct igb_rx_buffer) * rx_ring->count;
+	rx_ring->rx_buffer_info = vzalloc(size);
+	if (!rx_ring->rx_buffer_info)
 		goto err;
 
 	desc_len = sizeof(union e1000_adv_rx_desc);
@@ -2744,8 +2746,8 @@ int igb_setup_rx_resources(struct igb_ring *rx_ring)
 	return 0;
 
 err:
-	vfree(rx_ring->buffer_info);
-	rx_ring->buffer_info = NULL;
+	vfree(rx_ring->rx_buffer_info);
+	rx_ring->rx_buffer_info = NULL;
 	dev_err(dev, "Unable to allocate memory for the receive descriptor"
 		" ring\n");
 	return -ENOMEM;
@@ -3107,8 +3109,8 @@ void igb_free_tx_resources(struct igb_ring *tx_ring)
 {
 	igb_clean_tx_ring(tx_ring);
 
-	vfree(tx_ring->buffer_info);
-	tx_ring->buffer_info = NULL;
+	vfree(tx_ring->tx_buffer_info);
+	tx_ring->tx_buffer_info = NULL;
 
 	/* if not set, then don't free */
 	if (!tx_ring->desc)
@@ -3135,7 +3137,7 @@ static void igb_free_all_tx_resources(struct igb_adapter *adapter)
 }
 
 void igb_unmap_and_free_tx_resource(struct igb_ring *tx_ring,
-				    struct igb_buffer *buffer_info)
+				    struct igb_tx_buffer *buffer_info)
 {
 	if (buffer_info->dma) {
 		if (buffer_info->mapped_as_page)
@@ -3166,21 +3168,21 @@ void igb_unmap_and_free_tx_resource(struct igb_ring *tx_ring,
  **/
 static void igb_clean_tx_ring(struct igb_ring *tx_ring)
 {
-	struct igb_buffer *buffer_info;
+	struct igb_tx_buffer *buffer_info;
 	unsigned long size;
 	unsigned int i;
 
-	if (!tx_ring->buffer_info)
+	if (!tx_ring->tx_buffer_info)
 		return;
 	/* Free all the Tx ring sk_buffs */
 
 	for (i = 0; i < tx_ring->count; i++) {
-		buffer_info = &tx_ring->buffer_info[i];
+		buffer_info = &tx_ring->tx_buffer_info[i];
 		igb_unmap_and_free_tx_resource(tx_ring, buffer_info);
 	}
 
-	size = sizeof(struct igb_buffer) * tx_ring->count;
-	memset(tx_ring->buffer_info, 0, size);
+	size = sizeof(struct igb_tx_buffer) * tx_ring->count;
+	memset(tx_ring->tx_buffer_info, 0, size);
 
 	/* Zero out the descriptor ring */
 	memset(tx_ring->desc, 0, tx_ring->size);
@@ -3211,8 +3213,8 @@ void igb_free_rx_resources(struct igb_ring *rx_ring)
 {
 	igb_clean_rx_ring(rx_ring);
 
-	vfree(rx_ring->buffer_info);
-	rx_ring->buffer_info = NULL;
+	vfree(rx_ring->rx_buffer_info);
+	rx_ring->rx_buffer_info = NULL;
 
 	/* if not set, then don't free */
 	if (!rx_ring->desc)
@@ -3247,12 +3249,12 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
 	unsigned long size;
 	u16 i;
 
-	if (!rx_ring->buffer_info)
+	if (!rx_ring->rx_buffer_info)
 		return;
 
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
-		struct igb_buffer *buffer_info = &rx_ring->buffer_info[i];
+		struct igb_rx_buffer *buffer_info = &rx_ring->rx_buffer_info[i];
 		if (buffer_info->dma) {
 			dma_unmap_single(rx_ring->dev,
 			                 buffer_info->dma,
@@ -3279,8 +3281,8 @@ static void igb_clean_rx_ring(struct igb_ring *rx_ring)
 		}
 	}
 
-	size = sizeof(struct igb_buffer) * rx_ring->count;
-	memset(rx_ring->buffer_info, 0, size);
+	size = sizeof(struct igb_rx_buffer) * rx_ring->count;
+	memset(rx_ring->rx_buffer_info, 0, size);
 
 	/* Zero out the descriptor ring */
 	memset(rx_ring->desc, 0, rx_ring->size);
@@ -3964,7 +3966,7 @@ static inline int igb_tso(struct igb_ring *tx_ring,
 	struct e1000_adv_tx_context_desc *context_desc;
 	unsigned int i;
 	int err;
-	struct igb_buffer *buffer_info;
+	struct igb_tx_buffer *buffer_info;
 	u32 info = 0, tu_cmd = 0;
 	u32 mss_l4len_idx;
 	u8 l4len;
@@ -3995,7 +3997,7 @@ static inline int igb_tso(struct igb_ring *tx_ring,
 
 	i = tx_ring->next_to_use;
 
-	buffer_info = &tx_ring->buffer_info[i];
+	buffer_info = &tx_ring->tx_buffer_info[i];
 	context_desc = IGB_TX_CTXTDESC(tx_ring, i);
 	/* VLAN MACLEN IPLEN */
 	if (tx_flags & IGB_TX_FLAGS_VLAN)
@@ -4043,14 +4045,14 @@ static inline bool igb_tx_csum(struct igb_ring *tx_ring,
 {
 	struct e1000_adv_tx_context_desc *context_desc;
 	struct device *dev = tx_ring->dev;
-	struct igb_buffer *buffer_info;
+	struct igb_tx_buffer *buffer_info;
 	u32 info = 0, tu_cmd = 0;
 	unsigned int i;
 
 	if ((skb->ip_summed == CHECKSUM_PARTIAL) ||
 	    (tx_flags & IGB_TX_FLAGS_VLAN)) {
 		i = tx_ring->next_to_use;
-		buffer_info = &tx_ring->buffer_info[i];
+		buffer_info = &tx_ring->tx_buffer_info[i];
 		context_desc = IGB_TX_CTXTDESC(tx_ring, i);
 
 		if (tx_flags & IGB_TX_FLAGS_VLAN)
@@ -4126,7 +4128,7 @@ static inline bool igb_tx_csum(struct igb_ring *tx_ring,
 static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 			     unsigned int first)
 {
-	struct igb_buffer *buffer_info;
+	struct igb_tx_buffer *buffer_info;
 	struct device *dev = tx_ring->dev;
 	unsigned int hlen = skb_headlen(skb);
 	unsigned int count = 0, i;
@@ -4135,7 +4137,7 @@ static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 
 	i = tx_ring->next_to_use;
 
-	buffer_info = &tx_ring->buffer_info[i];
+	buffer_info = &tx_ring->tx_buffer_info[i];
 	BUG_ON(hlen >= IGB_MAX_DATA_PER_TXD);
 	buffer_info->length = hlen;
 	/* set time_stamp *before* dma to help avoid a possible race */
@@ -4155,7 +4157,7 @@ static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 		if (i == tx_ring->count)
 			i = 0;
 
-		buffer_info = &tx_ring->buffer_info[i];
+		buffer_info = &tx_ring->tx_buffer_info[i];
 		BUG_ON(len >= IGB_MAX_DATA_PER_TXD);
 		buffer_info->length = len;
 		buffer_info->time_stamp = jiffies;
@@ -4168,12 +4170,12 @@ static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 
 	}
 
-	tx_ring->buffer_info[i].skb = skb;
-	tx_ring->buffer_info[i].tx_flags = skb_shinfo(skb)->tx_flags;
+	buffer_info->skb = skb;
+	buffer_info->tx_flags = skb_shinfo(skb)->tx_flags;
 	/* multiply data chunks by size of headers */
-	tx_ring->buffer_info[i].bytecount = ((gso_segs - 1) * hlen) + skb->len;
-	tx_ring->buffer_info[i].gso_segs = gso_segs;
-	tx_ring->buffer_info[first].next_to_watch = i;
+	buffer_info->bytecount = ((gso_segs - 1) * hlen) + skb->len;
+	buffer_info->gso_segs = gso_segs;
+	tx_ring->tx_buffer_info[first].next_to_watch = i;
 
 	return ++count;
 
@@ -4192,7 +4194,7 @@ dma_error:
 		if (i == 0)
 			i = tx_ring->count;
 		i--;
-		buffer_info = &tx_ring->buffer_info[i];
+		buffer_info = &tx_ring->tx_buffer_info[i];
 		igb_unmap_and_free_tx_resource(tx_ring, buffer_info);
 	}
 
@@ -4204,7 +4206,7 @@ static inline void igb_tx_queue(struct igb_ring *tx_ring,
 				u8 hdr_len)
 {
 	union e1000_adv_tx_desc *tx_desc;
-	struct igb_buffer *buffer_info;
+	struct igb_tx_buffer *buffer_info;
 	u32 olinfo_status = 0, cmd_type_len;
 	unsigned int i = tx_ring->next_to_use;
 
@@ -4240,7 +4242,7 @@ static inline void igb_tx_queue(struct igb_ring *tx_ring,
 	olinfo_status |= ((paylen - hdr_len) << E1000_ADVTXD_PAYLEN_SHIFT);
 
 	do {
-		buffer_info = &tx_ring->buffer_info[i];
+		buffer_info = &tx_ring->tx_buffer_info[i];
 		tx_desc = IGB_TX_DESC(tx_ring, i);
 		tx_desc->read.buffer_addr = cpu_to_le64(buffer_info->dma);
 		tx_desc->read.cmd_type_len =
@@ -4353,7 +4355,7 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	count = igb_tx_map(tx_ring, skb, first);
 	if (!count) {
 		dev_kfree_skb_any(skb);
-		tx_ring->buffer_info[first].time_stamp = 0;
+		tx_ring->tx_buffer_info[first].time_stamp = 0;
 		tx_ring->next_to_use = first;
 		return NETDEV_TX_OK;
 	}
@@ -5551,13 +5553,14 @@ static void igb_systim_to_hwtstamp(struct igb_adapter *adapter,
 /**
  * igb_tx_hwtstamp - utility function which checks for TX time stamp
  * @q_vector: pointer to q_vector containing needed info
- * @buffer: pointer to igb_buffer structure
+ * @buffer: pointer to igb_tx_buffer structure
  *
  * If we were asked to do hardware stamping and such a time stamp is
  * available, then it must have been for this skb here because we only
  * allow only one such packet into the queue.
  */
-static void igb_tx_hwtstamp(struct igb_q_vector *q_vector, struct igb_buffer *buffer_info)
+static void igb_tx_hwtstamp(struct igb_q_vector *q_vector,
+			    struct igb_tx_buffer *buffer_info)
 {
 	struct igb_adapter *adapter = q_vector->adapter;
 	struct e1000_hw *hw = &adapter->hw;
@@ -5585,7 +5588,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 {
 	struct igb_adapter *adapter = q_vector->adapter;
 	struct igb_ring *tx_ring = q_vector->tx_ring;
-	struct igb_buffer *tx_buffer;
+	struct igb_tx_buffer *tx_buffer;
 	union e1000_adv_tx_desc *tx_desc;
 	unsigned int total_bytes = 0, total_packets = 0;
 	unsigned int budget = q_vector->tx_work_limit;
@@ -5594,7 +5597,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 	if (test_bit(__IGB_DOWN, &adapter->state))
 		return true;
 
-	tx_buffer = &tx_ring->buffer_info[i];
+	tx_buffer = &tx_ring->tx_buffer_info[i];
 	tx_desc = IGB_TX_DESC(tx_ring, i);
 
 	for (; budget; budget--) {
@@ -5627,7 +5630,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 			i++;
 			if (unlikely(i == tx_ring->count)) {
 				i = 0;
-				tx_buffer = tx_ring->buffer_info;
+				tx_buffer = tx_ring->tx_buffer_info;
 				tx_desc = IGB_TX_DESC(tx_ring, 0);
 			}
 		} while (eop_desc);
@@ -5643,7 +5646,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 
 	if (tx_ring->detect_tx_hung) {
 		struct e1000_hw *hw = &adapter->hw;
-		u16 eop = tx_ring->buffer_info[i].next_to_watch;
+		u16 eop = tx_ring->tx_buffer_info[i].next_to_watch;
 		union e1000_adv_tx_desc *eop_desc;
 
 		eop_desc = IGB_TX_DESC(tx_ring, eop);
@@ -5651,8 +5654,8 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 		/* Detect a transmit hang in hardware, this serializes the
 		 * check with the clearing of time_stamp and movement of i */
 		tx_ring->detect_tx_hung = false;
-		if (tx_ring->buffer_info[i].time_stamp &&
-		    time_after(jiffies, tx_ring->buffer_info[i].time_stamp +
+		if (tx_ring->tx_buffer_info[i].time_stamp &&
+		    time_after(jiffies, tx_ring->tx_buffer_info[i].time_stamp +
 			       (adapter->tx_timeout_factor * HZ)) &&
 		    !(rd32(E1000_STATUS) & E1000_STATUS_TXOFF)) {
 
@@ -5674,7 +5677,7 @@ static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
 				readl(tx_ring->tail),
 				tx_ring->next_to_use,
 				tx_ring->next_to_clean,
-				tx_ring->buffer_info[eop].time_stamp,
+				tx_ring->tx_buffer_info[eop].time_stamp,
 				eop,
 				jiffies,
 				eop_desc->wb.status);
@@ -5802,7 +5805,7 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, int budget)
 	staterr = le32_to_cpu(rx_desc->wb.upper.status_error);
 
 	while (staterr & E1000_RXD_STAT_DD) {
-		struct igb_buffer *buffer_info = &rx_ring->buffer_info[i];
+		struct igb_rx_buffer *buffer_info = &rx_ring->rx_buffer_info[i];
 		struct sk_buff *skb = buffer_info->skb;
 		union e1000_adv_rx_desc *next_rxd;
 
@@ -5855,8 +5858,8 @@ static bool igb_clean_rx_irq(struct igb_q_vector *q_vector, int budget)
 		}
 
 		if (!(staterr & E1000_RXD_STAT_EOP)) {
-			struct igb_buffer *next_buffer;
-			next_buffer = &rx_ring->buffer_info[i];
+			struct igb_rx_buffer *next_buffer;
+			next_buffer = &rx_ring->rx_buffer_info[i];
 			buffer_info->skb = next_buffer->skb;
 			buffer_info->dma = next_buffer->dma;
 			next_buffer->skb = skb;
@@ -5917,7 +5920,7 @@ next_desc:
 }
 
 static bool igb_alloc_mapped_skb(struct igb_ring *rx_ring,
-				 struct igb_buffer *bi)
+				 struct igb_rx_buffer *bi)
 {
 	struct sk_buff *skb = bi->skb;
 	dma_addr_t dma = bi->dma;
@@ -5951,7 +5954,7 @@ static bool igb_alloc_mapped_skb(struct igb_ring *rx_ring,
 }
 
 static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
-				  struct igb_buffer *bi)
+				  struct igb_rx_buffer *bi)
 {
 	struct page *page = bi->page;
 	dma_addr_t page_dma = bi->page_dma;
@@ -5990,11 +5993,11 @@ static bool igb_alloc_mapped_page(struct igb_ring *rx_ring,
 void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 {
 	union e1000_adv_rx_desc *rx_desc;
-	struct igb_buffer *bi;
+	struct igb_rx_buffer *bi;
 	u16 i = rx_ring->next_to_use;
 
 	rx_desc = IGB_RX_DESC(rx_ring, i);
-	bi = &rx_ring->buffer_info[i];
+	bi = &rx_ring->rx_buffer_info[i];
 	i -= rx_ring->count;
 
 	while (cleaned_count--) {
@@ -6015,7 +6018,7 @@ void igb_alloc_rx_buffers(struct igb_ring *rx_ring, u16 cleaned_count)
 		i++;
 		if (unlikely(!i)) {
 			rx_desc = IGB_RX_DESC(rx_ring, 0);
-			bi = rx_ring->buffer_info;
+			bi = rx_ring->rx_buffer_info;
 			i -= rx_ring->count;
 		}
 
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [net-next 13/13] igb: Consolidate creation of Tx context descriptors into a single function
  2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
                   ` (11 preceding siblings ...)
  2011-09-17  8:04 ` [net-next 12/13] igb: split buffer_info into tx_buffer_info and rx_buffer_info Jeff Kirsher
@ 2011-09-17  8:04 ` Jeff Kirsher
  12 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-17  8:04 UTC (permalink / raw)
  To: davem; +Cc: Alexander Duyck, netdev, gospo, Jeff Kirsher

From: Alexander Duyck <alexander.h.duyck@intel.com>

This patch is meant to simplify the transmit path by reducing the overhead
of creating a transmit context descriptor.  The current implementation is
split between igb_tso and igb_tx_csum, each with its own way of setting up
the tx_buffer_info structure and the tx_desc.  Combining them reduces code
and simplifies things, since only one function now creates context
descriptors.
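The shape of the consolidation, as a rough sketch (illustrative names and
types, not the driver's):

	/* illustrative sketch, not part of the patch */
	struct ctx_desc {
		unsigned int vlan_macip_lens;
		unsigned int seqnum_seed;
		unsigned int type_tucmd_mlhl;
		unsigned int mss_l4len_idx;
	};

	static void write_ctxtdesc(struct ctx_desc *d,
				   unsigned int vlan_macip_lens,
				   unsigned int type_tucmd,
				   unsigned int mss_l4len_idx)
	{
		/* both the TSO and checksum paths funnel through here */
		d->vlan_macip_lens = vlan_macip_lens;
		d->seqnum_seed     = 0;
		d->type_tucmd_mlhl = type_tucmd;
		d->mss_l4len_idx   = mss_l4len_idx;
	}

Callers compute the three words for their case (TSO vs. checksum offload)
and hand them to the single writer, instead of each path poking the
descriptor fields directly.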

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |  233 +++++++++++++----------------
 1 files changed, 106 insertions(+), 127 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index afa0ca9..563224a 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -45,6 +45,9 @@
 #include <linux/pci-aspm.h>
 #include <linux/delay.h>
 #include <linux/interrupt.h>
+#include <linux/ip.h>
+#include <linux/tcp.h>
+#include <linux/sctp.h>
 #include <linux/if_ether.h>
 #include <linux/aer.h>
 #include <linux/prefetch.h>
@@ -3960,16 +3963,39 @@ set_itr_now:
 #define IGB_TX_FLAGS_VLAN_MASK		0xffff0000
 #define IGB_TX_FLAGS_VLAN_SHIFT		        16
 
+void igb_tx_ctxtdesc(struct igb_ring *tx_ring, u32 vlan_macip_lens,
+		     u32 type_tucmd, u32 mss_l4len_idx)
+{
+	struct e1000_adv_tx_context_desc *context_desc;
+	u16 i = tx_ring->next_to_use;
+
+	context_desc = IGB_TX_CTXTDESC(tx_ring, i);
+
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
+
+	/* set bits to identify this as an advanced context descriptor */
+	type_tucmd |= E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT;
+
+	/* For 82575, context index must be unique per ring. */
+	if (tx_ring->flags & IGB_RING_FLAG_TX_CTX_IDX)
+		mss_l4len_idx |= tx_ring->reg_idx << 4;
+
+	context_desc->vlan_macip_lens	= cpu_to_le32(vlan_macip_lens);
+	context_desc->seqnum_seed	= 0;
+	context_desc->type_tucmd_mlhl	= cpu_to_le32(type_tucmd);
+	context_desc->mss_l4len_idx	= cpu_to_le32(mss_l4len_idx);
+}
+
 static inline int igb_tso(struct igb_ring *tx_ring,
 			  struct sk_buff *skb, u32 tx_flags, u8 *hdr_len)
 {
-	struct e1000_adv_tx_context_desc *context_desc;
-	unsigned int i;
 	int err;
-	struct igb_tx_buffer *buffer_info;
-	u32 info = 0, tu_cmd = 0;
-	u32 mss_l4len_idx;
-	u8 l4len;
+	u32 vlan_macip_lens, type_tucmd;
+	u32 mss_l4len_idx, l4len;
+
+	if (!skb_is_gso(skb))
+		return 0;
 
 	if (skb_header_cloned(skb)) {
 		err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC);
@@ -3977,8 +4003,8 @@ static inline int igb_tso(struct igb_ring *tx_ring,
 			return err;
 	}
 
-	l4len = tcp_hdrlen(skb);
-	*hdr_len += l4len;
+	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
+	type_tucmd = E1000_ADVTXD_TUCMD_L4T_TCP;
 
 	if (skb->protocol == htons(ETH_P_IP)) {
 		struct iphdr *iph = ip_hdr(skb);
@@ -3988,6 +4014,7 @@ static inline int igb_tso(struct igb_ring *tx_ring,
 							 iph->daddr, 0,
 							 IPPROTO_TCP,
 							 0);
+		type_tucmd |= E1000_ADVTXD_TUCMD_IPV4;
 	} else if (skb_is_gso_v6(skb)) {
 		ipv6_hdr(skb)->payload_len = 0;
 		tcp_hdr(skb)->check = ~csum_ipv6_magic(&ipv6_hdr(skb)->saddr,
@@ -3995,131 +4022,85 @@ static inline int igb_tso(struct igb_ring *tx_ring,
 						       0, IPPROTO_TCP, 0);
 	}
 
-	i = tx_ring->next_to_use;
-
-	buffer_info = &tx_ring->tx_buffer_info[i];
-	context_desc = IGB_TX_CTXTDESC(tx_ring, i);
-	/* VLAN MACLEN IPLEN */
-	if (tx_flags & IGB_TX_FLAGS_VLAN)
-		info |= (tx_flags & IGB_TX_FLAGS_VLAN_MASK);
-	info |= (skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT);
-	*hdr_len += skb_network_offset(skb);
-	info |= skb_network_header_len(skb);
-	*hdr_len += skb_network_header_len(skb);
-	context_desc->vlan_macip_lens = cpu_to_le32(info);
-
-	/* ADV DTYP TUCMD MKRLOC/ISCSIHEDLEN */
-	tu_cmd |= (E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT);
-
-	if (skb->protocol == htons(ETH_P_IP))
-		tu_cmd |= E1000_ADVTXD_TUCMD_IPV4;
-	tu_cmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
-
-	context_desc->type_tucmd_mlhl = cpu_to_le32(tu_cmd);
+	l4len = tcp_hdrlen(skb);
+	*hdr_len = skb_transport_offset(skb) + l4len;
 
 	/* MSS L4LEN IDX */
-	mss_l4len_idx = (skb_shinfo(skb)->gso_size << E1000_ADVTXD_MSS_SHIFT);
-	mss_l4len_idx |= (l4len << E1000_ADVTXD_L4LEN_SHIFT);
+	mss_l4len_idx = l4len << E1000_ADVTXD_L4LEN_SHIFT;
+	mss_l4len_idx |= skb_shinfo(skb)->gso_size << E1000_ADVTXD_MSS_SHIFT;
 
-	/* For 82575, context index must be unique per ring. */
-	if (tx_ring->flags & IGB_RING_FLAG_TX_CTX_IDX)
-		mss_l4len_idx |= tx_ring->reg_idx << 4;
-
-	context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
-	context_desc->seqnum_seed = 0;
-
-	buffer_info->time_stamp = jiffies;
-	buffer_info->next_to_watch = i;
-	buffer_info->dma = 0;
-	i++;
-	if (i == tx_ring->count)
-		i = 0;
+	/* VLAN MACLEN IPLEN */
+	vlan_macip_lens = skb_network_header_len(skb);
+	vlan_macip_lens |= skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= tx_flags & IGB_TX_FLAGS_VLAN_MASK;
 
-	tx_ring->next_to_use = i;
+	igb_tx_ctxtdesc(tx_ring, vlan_macip_lens, type_tucmd, mss_l4len_idx);
 
-	return true;
+	return 1;
 }
 
 static inline bool igb_tx_csum(struct igb_ring *tx_ring,
 			       struct sk_buff *skb, u32 tx_flags)
 {
-	struct e1000_adv_tx_context_desc *context_desc;
-	struct device *dev = tx_ring->dev;
-	struct igb_tx_buffer *buffer_info;
-	u32 info = 0, tu_cmd = 0;
-	unsigned int i;
-
-	if ((skb->ip_summed == CHECKSUM_PARTIAL) ||
-	    (tx_flags & IGB_TX_FLAGS_VLAN)) {
-		i = tx_ring->next_to_use;
-		buffer_info = &tx_ring->tx_buffer_info[i];
-		context_desc = IGB_TX_CTXTDESC(tx_ring, i);
-
-		if (tx_flags & IGB_TX_FLAGS_VLAN)
-			info |= (tx_flags & IGB_TX_FLAGS_VLAN_MASK);
-
-		info |= (skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT);
-		if (skb->ip_summed == CHECKSUM_PARTIAL)
-			info |= skb_network_header_len(skb);
-
-		context_desc->vlan_macip_lens = cpu_to_le32(info);
-
-		tu_cmd |= (E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT);
-
-		if (skb->ip_summed == CHECKSUM_PARTIAL) {
-			__be16 protocol;
+	u32 vlan_macip_lens = 0;
+	u32 mss_l4len_idx = 0;
+	u32 type_tucmd = 0;
 
-			if (skb->protocol == cpu_to_be16(ETH_P_8021Q)) {
-				const struct vlan_ethhdr *vhdr =
-				          (const struct vlan_ethhdr*)skb->data;
-
-				protocol = vhdr->h_vlan_encapsulated_proto;
-			} else {
-				protocol = skb->protocol;
+	if (skb->ip_summed != CHECKSUM_PARTIAL) {
+		if (!(tx_flags & IGB_TX_FLAGS_VLAN))
+			return false;
+	} else {
+		u8 l4_hdr = 0;
+		switch (skb->protocol) {
+		case __constant_htons(ETH_P_IP):
+			vlan_macip_lens |= skb_network_header_len(skb);
+			type_tucmd |= E1000_ADVTXD_TUCMD_IPV4;
+			l4_hdr = ip_hdr(skb)->protocol;
+			break;
+		case __constant_htons(ETH_P_IPV6):
+			vlan_macip_lens |= skb_network_header_len(skb);
+			l4_hdr = ipv6_hdr(skb)->nexthdr;
+			break;
+		default:
+			if (unlikely(net_ratelimit())) {
+				dev_warn(tx_ring->dev,
+				 "partial checksum but proto=%x!\n",
+				 skb->protocol);
 			}
+			break;
+		}
 
-			switch (protocol) {
-			case cpu_to_be16(ETH_P_IP):
-				tu_cmd |= E1000_ADVTXD_TUCMD_IPV4;
-				if (ip_hdr(skb)->protocol == IPPROTO_TCP)
-					tu_cmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
-				else if (ip_hdr(skb)->protocol == IPPROTO_SCTP)
-					tu_cmd |= E1000_ADVTXD_TUCMD_L4T_SCTP;
-				break;
-			case cpu_to_be16(ETH_P_IPV6):
-				/* XXX what about other V6 headers?? */
-				if (ipv6_hdr(skb)->nexthdr == IPPROTO_TCP)
-					tu_cmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
-				else if (ipv6_hdr(skb)->nexthdr == IPPROTO_SCTP)
-					tu_cmd |= E1000_ADVTXD_TUCMD_L4T_SCTP;
-				break;
-			default:
-				if (unlikely(net_ratelimit()))
-					dev_warn(dev,
-					    "partial checksum but proto=%x!\n",
-					    skb->protocol);
-				break;
+		switch (l4_hdr) {
+		case IPPROTO_TCP:
+			type_tucmd |= E1000_ADVTXD_TUCMD_L4T_TCP;
+			mss_l4len_idx = tcp_hdrlen(skb) <<
+					E1000_ADVTXD_L4LEN_SHIFT;
+			break;
+		case IPPROTO_SCTP:
+			type_tucmd |= E1000_ADVTXD_TUCMD_L4T_SCTP;
+			mss_l4len_idx = sizeof(struct sctphdr) <<
+					E1000_ADVTXD_L4LEN_SHIFT;
+			break;
+		case IPPROTO_UDP:
+			mss_l4len_idx = sizeof(struct udphdr) <<
+					E1000_ADVTXD_L4LEN_SHIFT;
+			break;
+		default:
+			if (unlikely(net_ratelimit())) {
+				dev_warn(tx_ring->dev,
+				 "partial checksum but l4 proto=%x!\n",
+				 l4_hdr);
 			}
+			break;
 		}
+	}
 
-		context_desc->type_tucmd_mlhl = cpu_to_le32(tu_cmd);
-		context_desc->seqnum_seed = 0;
-		if (tx_ring->flags & IGB_RING_FLAG_TX_CTX_IDX)
-			context_desc->mss_l4len_idx =
-				cpu_to_le32(tx_ring->reg_idx << 4);
+	vlan_macip_lens |= skb_network_offset(skb) << E1000_ADVTXD_MACLEN_SHIFT;
+	vlan_macip_lens |= tx_flags & IGB_TX_FLAGS_VLAN_MASK;
 
-		buffer_info->time_stamp = jiffies;
-		buffer_info->next_to_watch = i;
-		buffer_info->dma = 0;
+	igb_tx_ctxtdesc(tx_ring, vlan_macip_lens, type_tucmd, mss_l4len_idx);
 
-		i++;
-		if (i == tx_ring->count)
-			i = 0;
-		tx_ring->next_to_use = i;
-
-		return true;
-	}
-	return false;
+	return (skb->ip_summed == CHECKSUM_PARTIAL);
 }
 
 #define IGB_MAX_TXD_PWR	16
@@ -4140,8 +4121,6 @@ static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 	buffer_info = &tx_ring->tx_buffer_info[i];
 	BUG_ON(hlen >= IGB_MAX_DATA_PER_TXD);
 	buffer_info->length = hlen;
-	/* set time_stamp *before* dma to help avoid a possible race */
-	buffer_info->time_stamp = jiffies;
 	buffer_info->next_to_watch = i;
 	buffer_info->dma = dma_map_single(dev, skb->data, hlen,
 					  DMA_TO_DEVICE);
@@ -4160,7 +4139,6 @@ static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 		buffer_info = &tx_ring->tx_buffer_info[i];
 		BUG_ON(len >= IGB_MAX_DATA_PER_TXD);
 		buffer_info->length = len;
-		buffer_info->time_stamp = jiffies;
 		buffer_info->next_to_watch = i;
 		buffer_info->mapped_as_page = true;
 		buffer_info->dma = skb_frag_dma_map(dev, frag, 0, len,
@@ -4176,6 +4154,7 @@ static inline int igb_tx_map(struct igb_ring *tx_ring, struct sk_buff *skb,
 	buffer_info->bytecount = ((gso_segs - 1) * hlen) + skb->len;
 	buffer_info->gso_segs = gso_segs;
 	tx_ring->tx_buffer_info[first].next_to_watch = i;
+	tx_ring->tx_buffer_info[first].time_stamp = jiffies;
 
 	return ++count;
 
@@ -4304,7 +4283,7 @@ static inline int igb_maybe_stop_tx(struct igb_ring *tx_ring, int size)
 netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 				struct igb_ring *tx_ring)
 {
-	int tso = 0, count;
+	int tso, count;
 	u32 tx_flags = 0;
 	u16 first;
 	u8 hdr_len = 0;
@@ -4333,16 +4312,12 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 		tx_flags |= IGB_TX_FLAGS_IPV4;
 
 	first = tx_ring->next_to_use;
-	if (skb_is_gso(skb)) {
-		tso = igb_tso(tx_ring, skb, tx_flags, &hdr_len);
 
-		if (tso < 0) {
-			dev_kfree_skb_any(skb);
-			return NETDEV_TX_OK;
-		}
-	}
+	tso = igb_tso(tx_ring, skb, tx_flags, &hdr_len);
 
-	if (tso)
+	if (tso < 0)
+		goto out_drop;
+	else if (tso)
 		tx_flags |= IGB_TX_FLAGS_TSO;
 	else if (igb_tx_csum(tx_ring, skb, tx_flags) &&
 	         (skb->ip_summed == CHECKSUM_PARTIAL))
@@ -4366,6 +4341,10 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
 	igb_maybe_stop_tx(tx_ring, MAX_SKB_FRAGS + 4);
 
 	return NETDEV_TX_OK;
+
+out_drop:
+	dev_kfree_skb_any(skb);
+	return NETDEV_TX_OK;
 }
 
 static inline struct igb_ring *igb_tx_queue_mapping(struct igb_adapter *adapter,
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 36+ messages in thread
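
Both rewritten functions above now funnel their fields into
igb_tx_ctxtdesc(), which this patch introduces in an earlier hunk not
shown here.  A sketch of what that helper has to do, reconstructed from
the descriptor writes deleted above (an illustration, not the patch
text itself):

	static void igb_tx_ctxtdesc(struct igb_ring *tx_ring,
				    u32 vlan_macip_lens, u32 type_tucmd,
				    u32 mss_l4len_idx)
	{
		struct e1000_adv_tx_context_desc *context_desc;
		u16 i = tx_ring->next_to_use;

		context_desc = IGB_TX_CTXTDESC(tx_ring, i);

		tx_ring->next_to_use = (i + 1 < tx_ring->count) ? i + 1 : 0;

		/* mark this as an advanced context descriptor */
		type_tucmd |= E1000_TXD_CMD_DEXT | E1000_ADVTXD_DTYP_CTXT;

		/* For 82575, context index must be unique per ring. */
		if (tx_ring->flags & IGB_RING_FLAG_TX_CTX_IDX)
			mss_l4len_idx |= tx_ring->reg_idx << 4;

		context_desc->vlan_macip_lens = cpu_to_le32(vlan_macip_lens);
		context_desc->seqnum_seed = 0;
		context_desc->type_tucmd_mlhl = cpu_to_le32(type_tucmd);
		context_desc->mss_l4len_idx = cpu_to_le32(mss_l4len_idx);
	}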

* Re: [net-next 01/13] ixgb: eliminate checkstack warnings
  2011-09-17  8:04 ` [net-next 01/13] ixgb: eliminate checkstack warnings Jeff Kirsher
@ 2011-09-17  8:43   ` Joe Perches
  2011-09-19 21:36     ` Jesse Brandeburg
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2011-09-17  8:43 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Jesse Brandeburg, netdev, gospo

On Sat, 2011-09-17 at 01:04 -0700, Jeff Kirsher wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Really trivial fix, use kzalloc/kfree instead of stack space.

Some more trivialities...

> diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
[]
> @@ -1120,8 +1120,12 @@ ixgb_set_multi(struct net_device *netdev)
>  		rctl |= IXGB_RCTL_MPE;
>  		IXGB_WRITE_REG(hw, RCTL, rctl);
>  	} else {
> -		u8 mta[IXGB_MAX_NUM_MULTICAST_ADDRESSES *
> -			    IXGB_ETH_LENGTH_OF_ADDRESS];
> +		u8 *mta = kzalloc(IXGB_MAX_NUM_MULTICAST_ADDRESSES *
> +			      IXGB_ETH_LENGTH_OF_ADDRESS, GFP_KERNEL);

This doesn't need to be kzalloc as every byte is overwritten.
It should be kmalloc.

Maybe delete the #define IXGB_ETH_LENGTH_OF_ADDRESS and
sed 's/\bIXGB_ETH_LENGTH_OF_ADDRESS\b/ETH_ALEN/g' ?

Perhaps this loop could be clearer without the multiply:

		i = 0;
		netdev_for_each_mc_addr(ha, netdev)
			memcpy(&mta[i++ * IXGB_ETH_LENGTH_OF_ADDRESS],
			       ha->addr, IXGB_ETH_LENGTH_OF_ADDRESS);

Perhaps:

		u8 *addr = mta;
		netdev_for_each_mc_addr(ha, netdev) {
			memcpy(addr, ha->addr, ETH_ALEN);
			addr += ETH_ALEN;
		}

^ permalink raw reply	[flat|nested] 36+ messages in thread
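
Putting those suggestions together, the allocation and fill loop would
end up roughly as follows (a sketch of the suggested v2, not the posted
patch; the NULL check and matching kfree are elided):

	struct netdev_hw_addr *ha;
	u8 *mta = kmalloc(IXGB_MAX_NUM_MULTICAST_ADDRESSES * ETH_ALEN,
			  GFP_KERNEL);	/* kmalloc: every byte is overwritten */
	u8 *addr = mta;

	netdev_for_each_mc_addr(ha, netdev) {
		memcpy(addr, ha->addr, ETH_ALEN);
		addr += ETH_ALEN;
	}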

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-17  8:04 ` [net-next 11/13] igb: Make Tx budget for NAPI user adjustable Jeff Kirsher
@ 2011-09-17 17:04   ` Ben Hutchings
  2011-09-19 15:48     ` Alexander Duyck
  2011-09-19 20:57   ` David Miller
  1 sibling, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-09-17 17:04 UTC (permalink / raw)
  To: Jeff Kirsher; +Cc: davem, Alexander Duyck, netdev, gospo

On Sat, 2011-09-17 at 01:04 -0700, Jeff Kirsher wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This change is meant to make the NAPI budget limits for transmit
> adjustable.  By doing this it is possible to tune the value for optimal
> performance with applications such as routing.
[...]
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> @@ -1989,6 +1989,9 @@ static int igb_set_coalesce(struct net_device *netdev,
>  	if ((adapter->flags & IGB_FLAG_QUEUE_PAIRS) && ec->tx_coalesce_usecs)
>  		return -EINVAL;
>  
> +	if (ec->tx_max_coalesced_frames_irq)
> +		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
> +
[...]

I don't think it really makes sense to conflate NAPI and interrupt
moderation parameters.  This really ought to be added to NAPI itself.

(NAPI contexts really ought to be exposed through sysfs somehow.  I
think we've discussed this before, and it's tricky due to the lack of a
consistent mapping between those contexts and net devices.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 36+ messages in thread
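
For context on what the knob under discussion bounds: with this patch
the driver stops cleaning completed Tx descriptors after tx_work_limit
iterations and lets the poll be rescheduled instead.  A simplified
sketch of that pattern (igb_tx_desc_done() is a stand-in for the DD-bit
writeback check; this is not the driver's exact code):

	static bool igb_clean_tx_irq(struct igb_q_vector *q_vector)
	{
		struct igb_ring *tx_ring = q_vector->tx_ring;
		unsigned int budget = q_vector->adapter->tx_work_limit;
		unsigned int i = tx_ring->next_to_clean;

		while (budget) {
			struct igb_tx_buffer *buf = &tx_ring->tx_buffer_info[i];

			/* stop at the first descriptor the hardware has
			 * not written back yet */
			if (!igb_tx_desc_done(tx_ring, buf->next_to_watch))
				break;

			igb_unmap_and_free_tx_resource(tx_ring, buf);

			if (++i == tx_ring->count)
				i = 0;
			budget--;
		}

		tx_ring->next_to_clean = i;

		/* returning false keeps the NAPI poll scheduled, so the
		 * limit bounds softirq time instead of firing interrupts */
		return !!budget;
	}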

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-17 17:04   ` Ben Hutchings
@ 2011-09-19 15:48     ` Alexander Duyck
  2011-09-19 16:05       ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Alexander Duyck @ 2011-09-19 15:48 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, netdev, gospo

On 09/17/2011 10:04 AM, Ben Hutchings wrote:
> On Sat, 2011-09-17 at 01:04 -0700, Jeff Kirsher wrote:
>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>
>> This change is meant to make the NAPI budget limits for transmit
>> adjustable.  By doing this it is possible to tune the value for optimal
>> performance with applications such as routing.
> [...]
>> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
>> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
>> @@ -1989,6 +1989,9 @@ static int igb_set_coalesce(struct net_device *netdev,
>>   	if ((adapter->flags&  IGB_FLAG_QUEUE_PAIRS)&&  ec->tx_coalesce_usecs)
>>   		return -EINVAL;
>>
>> +	if (ec->tx_max_coalesced_frames_irq)
>> +		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
>> +
> [...]
>
> I don't think it really makes sense to conflate NAPI and interrupt
> moderation parameters.  This really ought to be added to NAPI itself.
>
> (NAPI contexts really ought to be exposed through sysfs somehow.  I
> think we've discussed this before, and it's tricky due to the lack of a
> consistent mapping between those contexts and net devices.)
>
> Ben.

All NAPI does is move things from a hard interrupt to a soft interrupt 
in the case of TX cleanup.  If it wasn't for NAPI we would be calling 
ixgbe_clean_tx_irq directly from the interrupt handler and would still 
be using the same limiting value.  This is why placing it here makes sense.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 15:48     ` Alexander Duyck
@ 2011-09-19 16:05       ` Ben Hutchings
  2011-09-19 16:32         ` Alexander Duyck
  2011-09-19 20:56         ` David Miller
  0 siblings, 2 replies; 36+ messages in thread
From: Ben Hutchings @ 2011-09-19 16:05 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Jeff Kirsher, davem, netdev, gospo

On Mon, 2011-09-19 at 08:48 -0700, Alexander Duyck wrote:
> On 09/17/2011 10:04 AM, Ben Hutchings wrote:
> > On Sat, 2011-09-17 at 01:04 -0700, Jeff Kirsher wrote:
> >> From: Alexander Duyck<alexander.h.duyck@intel.com>
> >>
> >> This change is meant to make the NAPI budget limits for transmit
> >> adjustable.  By doing this it is possible to tune the value for optimal
> >> performance with applications such as routing.
> > [...]
> >> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> >> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> >> @@ -1989,6 +1989,9 @@ static int igb_set_coalesce(struct net_device *netdev,
> >>   	if ((adapter->flags&  IGB_FLAG_QUEUE_PAIRS)&&  ec->tx_coalesce_usecs)
> >>   		return -EINVAL;
> >>
> >> +	if (ec->tx_max_coalesced_frames_irq)
> >> +		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
> >> +
> > [...]
> >
> > I don't think it really makes sense to conflate NAPI and interrupt
> > moderation parameters.  This really ought to be added to NAPI itself.
> >
> > (NAPI contexts really ought to be exposed through sysfs somehow.  I
> > think we've discussed this before, and it's tricky due to the lack of a
> > consistent mapping between those contexts and net devices.)
> >
> > Ben.
> 
> All NAPI does is move things from a hard interrupt to a soft interrupt 
> in the case of TX cleanup.  If it wasn't for NAPI we would be calling 
> ixgbe_clean_tx_irq directly from the interrupt handler and would still 
> be using the same limiting value.  This is why placing it here makes sense.

But tx_max_coalesced_frames_irq is not supposed to be a work limit (and
such a work limit doesn't seem useful in the absence of NAPI).  As I
understand it, it is supposed to be an alternate moderation value for
the hardware to use if a frame is sent while the IRQ handler is running.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 16:05       ` Ben Hutchings
@ 2011-09-19 16:32         ` Alexander Duyck
  2011-09-19 21:00           ` David Miller
  2011-09-19 20:56         ` David Miller
  1 sibling, 1 reply; 36+ messages in thread
From: Alexander Duyck @ 2011-09-19 16:32 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Jeff Kirsher, davem, netdev, gospo

On 09/19/2011 09:05 AM, Ben Hutchings wrote:
> On Mon, 2011-09-19 at 08:48 -0700, Alexander Duyck wrote:
>> On 09/17/2011 10:04 AM, Ben Hutchings wrote:
>>> On Sat, 2011-09-17 at 01:04 -0700, Jeff Kirsher wrote:
>>>> From: Alexander Duyck<alexander.h.duyck@intel.com>
>>>>
>>>> This change is meant to make the NAPI budget limits for transmit
>>>> adjustable.  By doing this it is possible to tune the value for optimal
>>>> performance with applications such as routing.
>>> [...]
>>>> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
>>>> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
>>>> @@ -1989,6 +1989,9 @@ static int igb_set_coalesce(struct net_device *netdev,
>>>>    	if ((adapter->flags&   IGB_FLAG_QUEUE_PAIRS)&&   ec->tx_coalesce_usecs)
>>>>    		return -EINVAL;
>>>>
>>>> +	if (ec->tx_max_coalesced_frames_irq)
>>>> +		adapter->tx_work_limit = ec->tx_max_coalesced_frames_irq;
>>>> +
>>> [...]
>>>
>>> I don't think it really makes sense to conflate NAPI and interrupt
>>> moderation parameters.  This really ought to be added to NAPI itself.
>>>
>>> (NAPI contexts really ought to be exposed through sysfs somehow.  I
>>> think we've discussed this before, and it's tricky due to the lack of a
>>> consistent mapping between those contexts and net devices.)
>>>
>>> Ben.
>> All NAPI does is move things from a hard interrupt to a soft interrupt
>> in the case of TX cleanup.  If it wasn't for NAPI we would be calling
>> ixgbe_clean_tx_irq directly from the interrupt handler and would still
>> be using the same limiting value.  This is why placing it here makes sense.
> But tx_max_coalesced_frames_irq is not supposed to be a work limit (and
> such a work limit doesn't seem useful in the absence of NAPI).  As I
> understand it, it is supposed to be an alternate moderation value for
> the hardware to use if a frame is sent while the IRQ handler is running.
>
> Ben.
The fact is ixgbe has been using this parameter this way for over 2 
years now and the main goal of this patch was just to synchronize how 
things work on igb and ixgbe.

Our hardware doesn't have a mechanism for firing an interrupt after X 
number of frames so instead we simply have modified things so that we 
will only process X number of frames and then fire another 
interrupt/poll if needed.  As such we aren't that far out of compliance 
with the meaning of how this parameter is supposed to be used.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 16:05       ` Ben Hutchings
  2011-09-19 16:32         ` Alexander Duyck
@ 2011-09-19 20:56         ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2011-09-19 20:56 UTC (permalink / raw)
  To: bhutchings; +Cc: alexander.h.duyck, jeffrey.t.kirsher, netdev, gospo

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 19 Sep 2011 17:05:52 +0100

> But tx_max_coalesced_frames_irq is not supposed to be a work limit (and
> such a work limit doesn't seem useful in the absence of NAPI).  As I
> understand it, it is supposed to be an alternate moderation value for
> the hardware to use if a frame is sent while the IRQ handler is running.

Exactly, the ethtool settings modify what the hardware interrupt mechanisms
do, regardless of whether those hardware interrupts trigger NAPI or not.

And this is precisely what we want, because optimal behavior of NAPI
absolutely depends upon having the interrupt moderated in hardware
at least a little bit.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-17  8:04 ` [net-next 11/13] igb: Make Tx budget for NAPI user adjustable Jeff Kirsher
  2011-09-17 17:04   ` Ben Hutchings
@ 2011-09-19 20:57   ` David Miller
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2011-09-19 20:57 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: alexander.h.duyck, netdev, gospo

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Sat, 17 Sep 2011 01:04:35 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This change is meant to make the NAPI budget limits for transmit
> adjustable.  By doing this it is possible to tune the value for optimal
> performance with applications such as routing.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Tested-by:  Aaron Brown  <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

As Ben and I have mentioned, this change is not acceptable.

Ethtool settings for "hardware interrupt" coalescing and mitigation
should have no effect on NAPI behavior.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 16:32         ` Alexander Duyck
@ 2011-09-19 21:00           ` David Miller
  2011-09-19 22:27             ` Alexander Duyck
  0 siblings, 1 reply; 36+ messages in thread
From: David Miller @ 2011-09-19 21:00 UTC (permalink / raw)
  To: alexander.h.duyck; +Cc: bhutchings, jeffrey.t.kirsher, netdev, gospo

From: Alexander Duyck <alexander.h.duyck@intel.com>
Date: Mon, 19 Sep 2011 09:32:18 -0700

> The fact is ixgbe has been using this parameter this way for over 2
> years now and the main goal of this patch was just to synchronize how
> things work on igb and ixgbe.
> 
> Our hardware doesn't have a mechanism for firing an interrupt after X
> number of frames so instead we simply have modified things so that we
> will only process X number of frames and then fire another
> interrupt/poll if needed.  As such we aren't that far out of
> compliance with the meaning of how this parameter is supposed to be
> used.

All I can say is this was a huge mistake; you therefore need to revert
the IXGBE change.  These ethtool settings are not for changing NAPI or
software interrupt behavior.

And if you guys plan to be difficult on this and refuse to remove the
IXGBE bits, I'm letting you guys know ahead of time that I'll do it
for you.

If the hardware can't support this facility, neither should these
ethtool hooks, because the whole point is to avoid hardware interrupts
from firing using these parameters.

Propose new mechanisms to control NAPI behavior if you want.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 01/13] ixgb: eliminate checkstack warnings
  2011-09-17  8:43   ` Joe Perches
@ 2011-09-19 21:36     ` Jesse Brandeburg
  2011-09-19 22:28       ` [PATCH] intel: Convert <FOO>_LENGTH_OF_ADDRESS to ETH_ALEN Joe Perches
  0 siblings, 1 reply; 36+ messages in thread
From: Jesse Brandeburg @ 2011-09-19 21:36 UTC (permalink / raw)
  To: Joe Perches; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo

On Sat, 17 Sep 2011 01:43:26 -0700
Joe Perches <joe@perches.com> wrote:

> On Sat, 2011-09-17 at 01:04 -0700, Jeff Kirsher wrote:
> > From: Jesse Brandeburg <jesse.brandeburg@intel.com>
> > Really trivial fix, use kzalloc/kfree instead of stack space.
> 
> Some more trivialities...
> 
> > diff --git a/drivers/net/ethernet/intel/ixgb/ixgb_main.c
> > b/drivers/net/ethernet/intel/ixgb/ixgb_main.c
> []
> > @@ -1120,8 +1120,12 @@ ixgb_set_multi(struct net_device *netdev)
> >  		rctl |= IXGB_RCTL_MPE;
> >  		IXGB_WRITE_REG(hw, RCTL, rctl);
> >  	} else {
> > -		u8 mta[IXGB_MAX_NUM_MULTICAST_ADDRESSES *
> > -			    IXGB_ETH_LENGTH_OF_ADDRESS];
> > +		u8 *mta = kzalloc(IXGB_MAX_NUM_MULTICAST_ADDRESSES *
> > +			      IXGB_ETH_LENGTH_OF_ADDRESS, GFP_KERNEL);
> 
> This doesn't need to be kzalloc as every byte is overwritten.
> It should be kmalloc.

done, V2 on its way

> Maybe delete the #define IXGB_ETH_LENGTH_OF_ADDRESS and
> sed 's/\bIXGB_ETH_LENGTH_OF_ADDRESS\b/ETH_ALEN/g' ?

done

 
> Perhaps this loop could be clearer without the multiply:
> 
> 		i = 0;
> 		netdev_for_each_mc_addr(ha, netdev)
> 			memcpy(&mta[i++ * IXGB_ETH_LENGTH_OF_ADDRESS],
> 			       ha->addr, IXGB_ETH_LENGTH_OF_ADDRESS);
> 
> Perhaps:
> 
> 		u8 *addr = mta;
> 		netdev_for_each_mc_addr(ha, netdev) {
> 			memcpy(addr, ha->addr, ETH_ALEN);
> 			addr += ETH_ALEN;
> 		}

done, but because the changes alter code flow, I'm going to retest
through our lab.  V2 will hopefully be on the list shortly.

Thanks for the feedback,
 Jesse

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 21:00           ` David Miller
@ 2011-09-19 22:27             ` Alexander Duyck
  2011-09-19 23:36               ` Jeff Kirsher
  0 siblings, 1 reply; 36+ messages in thread
From: Alexander Duyck @ 2011-09-19 22:27 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings, jeffrey.t.kirsher, netdev, gospo

On 09/19/2011 02:00 PM, David Miller wrote:
> From: Alexander Duyck<alexander.h.duyck@intel.com>
> Date: Mon, 19 Sep 2011 09:32:18 -0700
>> The fact is ixgbe has been using this parameter this way for over 2
>> years now and the main goal of this patch was just to synchronize how
>> things work on igb and ixgbe.
>>
>> Our hardware doesn't have a mechanism for firing an interrupt after X
>> number of frames so instead we simply have modified things so that we
>> will only process X number of frames and then fire another
>> interrupt/poll if needed.  As such we aren't that far out of
>> compliance with the meaning of how this parameter is supposed to be
>> used.
> All I can say is this was a huge mistake; you therefore need to revert
> the IXGBE change.  These ethtool settings are not for changing NAPI or
> software interrupt behavior.
>
> And if you guys plan to be difficult on this and refuse to remove the
> IXGBE bits, I'm letting you guys know ahead of time that I'll do it
> for you.
>
> If the hardware can't support this facility, neither should these
> ethtool hooks, because the whole point is to avoid hardware interrupts
> from firing using these parameters.
>
> Propose new mechanisms to control NAPI behavior if you want.
I'll remove the ixgbe code if that is what you want.  It may be a month 
or so before I can get to it though since I am slammed with work so if 
you are in a hurry for it you might want to work with Jeff Kirsher to 
have the code removed.

As far as this current patch goes I honestly don't have the time to add 
or rewrite yet another ethtool interface so I will probably just see 
about dropping the ethtool portion of this patch and update the 
description in order to make it acceptable.

Thanks,

Alex

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH] intel: Convert <FOO>_LENGTH_OF_ADDRESS to ETH_ALEN
  2011-09-19 21:36     ` Jesse Brandeburg
@ 2011-09-19 22:28       ` Joe Perches
  2011-09-19 23:33         ` Jeff Kirsher
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Perches @ 2011-09-19 22:28 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: Kirsher, Jeffrey T, davem, netdev, gospo

Use the normal #defines not module specific ones.

Signed-off-by: Joe Perches <joe@perches.com>
---
Here are the other, not ixgb uses of LENGTH_OF_ADDRESS
converted to ETH_ALEN.  Compile tested allyesconfig only.

 drivers/net/ethernet/intel/e1000/e1000_hw.h    |    1 -
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |    2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h  |    8 +++-----
 drivers/net/ethernet/intel/ixgbevf/defines.h   |    1 -
 drivers/net/ethernet/intel/ixgbevf/vf.c        |    4 ++--
 5 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_hw.h b/drivers/net/ethernet/intel/e1000/e1000_hw.h
index 5c9a840..cf7e3c0 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_hw.h
+++ b/drivers/net/ethernet/intel/e1000/e1000_hw.h
@@ -448,7 +448,6 @@ void e1000_io_write(struct e1000_hw *hw, unsigned long port, u32 value);
 #define E1000_DEV_ID_INTEL_CE4100_GBE    0x2E6E
 
 #define NODE_ADDRESS_SIZE 6
-#define ETH_LENGTH_OF_ADDRESS 6
 
 /* MAC decode size is 128K - This is the size of BAR0 */
 #define MAC_DECODE_SIZE (128 * 1024)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index d99d01e..29e092c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -363,7 +363,7 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 
 		/* reply to reset with ack and vf mac address */
 		msgbuf[0] = IXGBE_VF_RESET | IXGBE_VT_MSGTYPE_ACK;
-		memcpy(new_mac, vf_mac, IXGBE_ETH_LENGTH_OF_ADDRESS);
+		memcpy(new_mac, vf_mac, ETH_ALEN);
 		/*
 		 * Piggyback the multicast filter type so VF can compute the
 		 * correct vectors
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 9f618ee..ef34606 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -1699,8 +1699,6 @@ enum {
 #define IXGBE_NVM_POLL_WRITE       1  /* Flag for polling for write complete */
 #define IXGBE_NVM_POLL_READ        0  /* Flag for polling for read complete */
 
-#define IXGBE_ETH_LENGTH_OF_ADDRESS   6
-
 #define IXGBE_EEPROM_PAGE_SIZE_MAX       128
 #define IXGBE_EEPROM_RD_BUFFER_MAX_COUNT 512 /* EEPROM words # read in burst */
 #define IXGBE_EEPROM_WR_BUFFER_MAX_COUNT 256 /* EEPROM words # wr in burst */
@@ -2737,9 +2735,9 @@ struct ixgbe_eeprom_info {
 struct ixgbe_mac_info {
 	struct ixgbe_mac_operations     ops;
 	enum ixgbe_mac_type             type;
-	u8                              addr[IXGBE_ETH_LENGTH_OF_ADDRESS];
-	u8                              perm_addr[IXGBE_ETH_LENGTH_OF_ADDRESS];
-	u8                              san_addr[IXGBE_ETH_LENGTH_OF_ADDRESS];
+	u8                              addr[ETH_ALEN];
+	u8                              perm_addr[ETH_ALEN];
+	u8                              san_addr[ETH_ALEN];
 	/* prefix for World Wide Node Name (WWNN) */
 	u16                             wwnn_prefix;
 	/* prefix for World Wide Port Name (WWPN) */
diff --git a/drivers/net/ethernet/intel/ixgbevf/defines.h b/drivers/net/ethernet/intel/ixgbevf/defines.h
index 78abb6f..2eb89cb 100644
--- a/drivers/net/ethernet/intel/ixgbevf/defines.h
+++ b/drivers/net/ethernet/intel/ixgbevf/defines.h
@@ -35,7 +35,6 @@
 #define IXGBE_VF_IRQ_CLEAR_MASK         7
 #define IXGBE_VF_MAX_TX_QUEUES          1
 #define IXGBE_VF_MAX_RX_QUEUES          1
-#define IXGBE_ETH_LENGTH_OF_ADDRESS     6
 
 /* Link speed */
 typedef u32 ixgbe_link_speed;
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index aa3682e..21533e3 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -108,7 +108,7 @@ static s32 ixgbevf_reset_hw_vf(struct ixgbe_hw *hw)
 	if (msgbuf[0] != (IXGBE_VF_RESET | IXGBE_VT_MSGTYPE_ACK))
 		return IXGBE_ERR_INVALID_MAC_ADDR;
 
-	memcpy(hw->mac.perm_addr, addr, IXGBE_ETH_LENGTH_OF_ADDRESS);
+	memcpy(hw->mac.perm_addr, addr, ETH_ALEN);
 	hw->mac.mc_filter_type = msgbuf[IXGBE_VF_MC_TYPE_WORD];
 
 	return 0;
@@ -211,7 +211,7 @@ static s32 ixgbevf_mta_vector(struct ixgbe_hw *hw, u8 *mc_addr)
  **/
 static s32 ixgbevf_get_mac_addr_vf(struct ixgbe_hw *hw, u8 *mac_addr)
 {
-	memcpy(mac_addr, hw->mac.perm_addr, IXGBE_ETH_LENGTH_OF_ADDRESS);
+	memcpy(mac_addr, hw->mac.perm_addr, ETH_ALEN);
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH] intel: Convert <FOO>_LENGTH_OF_ADDRESS to ETH_ALEN
  2011-09-19 22:28       ` [PATCH] intel: Convert <FOO>_LENGTH_OF_ADDRESS to ETH_ALEN Joe Perches
@ 2011-09-19 23:33         ` Jeff Kirsher
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-19 23:33 UTC (permalink / raw)
  To: Joe Perches; +Cc: Brandeburg, Jesse, davem, netdev, gospo


On Mon, 2011-09-19 at 15:28 -0700, Joe Perches wrote:
> Use the normal #defines not module specific ones.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
> Here are the other, not ixgb uses of LENGTH_OF_ADDRESS
> converted to ETH_ALEN.  Compile tested allyesconfig only.
> 
>  drivers/net/ethernet/intel/e1000/e1000_hw.h    |    1 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |    2 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_type.h  |    8 +++-----
>  drivers/net/ethernet/intel/ixgbevf/defines.h   |    1 -
>  drivers/net/ethernet/intel/ixgbevf/vf.c        |    4 ++--
>  5 files changed, 6 insertions(+), 10 deletions(-) 

Thanks Joe.  I have added the patch to my queue.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 22:27             ` Alexander Duyck
@ 2011-09-19 23:36               ` Jeff Kirsher
  2011-09-19 23:42                 ` Stephen Hemminger
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Kirsher @ 2011-09-19 23:36 UTC (permalink / raw)
  To: Duyck, Alexander H; +Cc: David Miller, bhutchings, netdev, gospo


On Mon, 2011-09-19 at 15:27 -0700, Duyck, Alexander H wrote:
> On 09/19/2011 02:00 PM, David Miller wrote:
> > From: Alexander Duyck<alexander.h.duyck@intel.com>
> > Date: Mon, 19 Sep 2011 09:32:18 -0700
> >> The fact is ixgbe has been using this parameter this way for over 2
> >> years now and the main goal of this patch was just to synchronize how
> >> things work on igb and ixgbe.
> >>
> >> Our hardware doesn't have a mechanism for firing an interrupt after X
> >> number of frames so instead we simply have modified things so that we
> >> will only process X number of frames and then fire another
> >> interrupt/poll if needed.  As such we aren't that far out of
> >> compliance with the meaning of how this parameter is supposed to be
> >> used.
> > All I can say is this was a huge mistake; you therefore need to revert
> > the IXGBE change.  These ethtool settings are not for changing NAPI or
> > software interrupt behavior.
> >
> > And if you guys plan to be difficult on this and refuse to remove the
> > IXGBE bits, I'm letting you guys know ahead of time that I'll do it
> > for you.
> >
> > If the hardware can't support this facility, neither should these
> > ethtool hooks, because the whole point is to avoid hardware interrupts
> > from firing using these parameters.
> >
> > Propose new mechanisms to control NAPI behavior if you want.
> I'll remove the ixgbe code if that is what you want.  It may be a month 
> or so before I can get to it though since I am slammed with work so if 
> you are in a hurry for it you might want to work with Jeff Kirsher to 
> have the code removed.

Alex - I will work on this to resolve the issues that Ben and Dave have
pointed out.

> 
> As far as this current patch goes I honestly don't have the time to add 
> or rewrite yet another ethtool interface so I will probably just see 
> about dropping the ethtool portion of this patch and update the 
> description in order to make it acceptable.
> 
> Thanks,
> 
> Alex




^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 23:36               ` Jeff Kirsher
@ 2011-09-19 23:42                 ` Stephen Hemminger
  2011-09-19 23:47                   ` David Miller
  2011-09-20  0:10                   ` Ben Hutchings
  0 siblings, 2 replies; 36+ messages in thread
From: Stephen Hemminger @ 2011-09-19 23:42 UTC (permalink / raw)
  To: jeffrey t kirsher
  Cc: David Miller, bhutchings, netdev, gospo, Alexander H Duyck

I would like to see a general solution to allow configuring
napi weight. The Rx weight isn't easily configurable either.
Probably needs to be through ethtool callback since actual value range
and dev -> napi relationship is device specific.

^ permalink raw reply	[flat|nested] 36+ messages in thread
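
For reference, the weight Stephen mentions is fixed when the driver
registers its NAPI context, and there is currently no in-tree knob to
change it afterwards.  The usual pattern, with hypothetical driver
names (foo_poll, foo_clean_rx_irq) for illustration:

	/* poll routine: the core never passes a budget larger than the
	 * weight registered below */
	static int foo_poll(struct napi_struct *napi, int budget)
	{
		int work_done = foo_clean_rx_irq(napi, budget);

		if (work_done < budget)
			napi_complete(napi);

		return work_done;
	}

	/* at setup time; 64 is the conventional default weight */
	netif_napi_add(netdev, &q_vector->napi, foo_poll, 64);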

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 23:42                 ` Stephen Hemminger
@ 2011-09-19 23:47                   ` David Miller
  2011-09-20  0:10                   ` Ben Hutchings
  1 sibling, 0 replies; 36+ messages in thread
From: David Miller @ 2011-09-19 23:47 UTC (permalink / raw)
  To: stephen.hemminger
  Cc: jeffrey.t.kirsher, bhutchings, netdev, gospo, alexander.h.duyck

From: Stephen Hemminger <stephen.hemminger@vyatta.com>
Date: Mon, 19 Sep 2011 16:42:31 -0700 (PDT)

> I would like to see a general solution to allow configuring
> napi weight. The Rx weight isn't easily configurable either.
> Probably needs to be through ethtool callback since actual value range
> and dev -> napi relationship is device specific.

Agreed, it probably has to be per-queue too.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-19 23:42                 ` Stephen Hemminger
  2011-09-19 23:47                   ` David Miller
@ 2011-09-20  0:10                   ` Ben Hutchings
  2011-09-20 18:59                     ` Andy Gospodarek
  1 sibling, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-09-20  0:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: jeffrey t kirsher, David Miller, netdev, gospo, Alexander H Duyck

On Mon, 2011-09-19 at 16:42 -0700, Stephen Hemminger wrote:
> I would like to see a general solution to allow configuring
> napi weight. The Rx weight isn't easily configurable either.

Indeed.

> Probably needs to be through ethtool callback since actual value range
> and dev -> napi relationship is device specific.

The maximum meaningful value is device specific but I'm not sure that
really matters.

And as David said it's really a many-to-one mapping of queue -> NAPI.
At netconf we talked about having 'irq' as an attribute of each queue
but maybe we should expose NAPI contexts through sysfs and make queues
refer to them instead.  NAPI contexts would be named (in the same way as
the corresponding IRQ handlers) and have irq, weight, etc.

(Still short of time to work on this myself, alas.)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-20  0:10                   ` Ben Hutchings
@ 2011-09-20 18:59                     ` Andy Gospodarek
  2011-09-20 20:23                       ` Neil Horman
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Gospodarek @ 2011-09-20 18:59 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Stephen Hemminger, jeffrey t kirsher, David Miller, netdev,
	gospo, Alexander H Duyck, nhorman

On Tue, Sep 20, 2011 at 01:10:01AM +0100, Ben Hutchings wrote:
> On Mon, 2011-09-19 at 16:42 -0700, Stephen Hemminger wrote:
> > I would like to see a general solution to allow configuring
> > napi weight. The Rx weight isn't easily configurable either.
> 
> Indeed.
> 
> > Probably needs to be through ethtool callback since actual value range
> > and dev -> napi relationship is device specific.
> 
> The maximum meaningful value is device specific but I'm not sure that
> really matters.
> 
> And as David said it's really a many-to-one mapping of queue -> NAPI.
> At netconf we talked about having 'irq' as an attribute of each queue
> but maybe we should expose NAPI contexts through sysfs and make queues
> refer to them instead.  NAPI contexts would be named (in the same way as
> the corresponding IRQ handlers) and have irq, weight, etc.
> 
> (Still short of time to work on this myself, alas.)
> 

I've been having a similar discussion with Neil Horman about how we can
better control all sorts of device values (interrupts, number of queues,
queue to node mapping, etc.).  Some of this is based on what I would
like to see and some is from Stephen's talk at LPC two weeks ago.  I
think the sysfs work Neil has done and posted to lkml can easily be
expanded to allow enhanced configuration of each device.  Having napi
weight in there too seems like a reasonable addition to this.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-20 18:59                     ` Andy Gospodarek
@ 2011-09-20 20:23                       ` Neil Horman
  2011-09-27 23:45                         ` Ben Hutchings
  0 siblings, 1 reply; 36+ messages in thread
From: Neil Horman @ 2011-09-20 20:23 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Ben Hutchings, Stephen Hemminger, jeffrey t kirsher,
	David Miller, netdev, gospo, Alexander H Duyck

On Tue, Sep 20, 2011 at 02:59:18PM -0400, Andy Gospodarek wrote:
> On Tue, Sep 20, 2011 at 01:10:01AM +0100, Ben Hutchings wrote:
> > On Mon, 2011-09-19 at 16:42 -0700, Stephen Hemminger wrote:
> > > I would like to see a general solution to allow configuring
> > > napi weight. The Rx weight isn't easily configurable either.
> > 
> > Indeed.
> > 
> > > Probably needs to be through ethtool callback since actual value range
> > > and dev -> napi relationship is device specific.
> > 
> > The maximum meaningful value is device specific but I'm not sure that
> > really matters.
> > 
> > And as David said it's really a many-to-one mapping of queue -> NAPI.
> > At netconf we talked about having 'irq' as an attribute of each queue
> > but maybe we should expose NAPI contexts through sysfs and make queues
> > refer to them instead.  NAPI contexts would be named (in the same way as
> > the corresponding IRQ handlers) and have irq, weight, etc.
> > 
> > (Still short of time to work on this myself, alas.)
> > 
> 
> I've been having a similar discussion with Neil Horman about how we can
> better control all sorts of device values (interrupts, number of queues,
> queue to node mapping, etc.).  Some of this is based on what I would
> like to see and some is from Stephen's talk at LPC two weeks ago.  I
> think the sysfs work Neil has done and posted to lkml can easily be
> expanded to allow enhanced configuration of each device.  Having napi
> weight in there too seems like a reasonable addition to this.
> 
> 


This is the work Andy is referring to for those interested:
http://marc.info/?l=linux-kernel&m=131644727521409&w=2

This version has Greg's Ack, and is waiting for an Ack from Jesse Barnes at the
moment.  I think Andy's probably right, there's room here for expansion to create
a relationship between a given interrupt and a napi weight.  I expect what would
be most direct would be adding a napi_weight attribute that was conditional on
the class of the pci device allocating the irqs (make it visible for class 0x200
devs, invisible for others).

Thoughts?
Neil

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-20 20:23                       ` Neil Horman
@ 2011-09-27 23:45                         ` Ben Hutchings
  2011-09-28 11:00                           ` Neil Horman
  0 siblings, 1 reply; 36+ messages in thread
From: Ben Hutchings @ 2011-09-27 23:45 UTC (permalink / raw)
  To: Neil Horman
  Cc: Andy Gospodarek, Stephen Hemminger, jeffrey t kirsher,
	David Miller, netdev, gospo, Alexander H Duyck

On Tue, 2011-09-20 at 16:23 -0400, Neil Horman wrote:
[...]
> This is the work Andy is referring to for those interested:
> http://marc.info/?l=linux-kernel&m=131644727521409&w=2
> 
> This version has Greg's Ack, and is waiting for an Ack from Jesse Barnes at the
> moment.

While I think it's useful to be able to list all IRQs assigned to a PCI
device, this doesn't tell us anything about the way they're associated
with queues.

> I think Andy's probably right, there's room here for expansion to create
> a relationship between a given interrupt and a napi weight.  I expect what would
> be most direct would be adding a napi_weight attribute that was conditional on
> the class of the pci device allocating the irqs (make it visible for class 0x200
> devs, invisible for others).

That's a terrible idea; what has NAPI got to do with PCI devices?

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-27 23:45                         ` Ben Hutchings
@ 2011-09-28 11:00                           ` Neil Horman
  2011-09-28 15:11                             ` Stephen Hemminger
  0 siblings, 1 reply; 36+ messages in thread
From: Neil Horman @ 2011-09-28 11:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Andy Gospodarek, Stephen Hemminger, jeffrey t kirsher,
	David Miller, netdev, gospo, Alexander H Duyck

On Wed, Sep 28, 2011 at 12:45:46AM +0100, Ben Hutchings wrote:
> On Tue, 2011-09-20 at 16:23 -0400, Neil Horman wrote:
> [...]
> > This is the work Andy is referring to for those interested:
> > http://marc.info/?l=linux-kernel&m=131644727521409&w=2
> > 
> > This version has Greg's Ack, and is waiting for an Ack from Jesse Barnes at the
> > moment.
> 
> While I think it's useful to be able to list all IRQs assigned to a PCI
> device, this doesn't tell us anything about the way they're associated
> with queues.
> 
I never claimed that it did.  This just lets us definitively correlate msi irq
instances to pci devices, and their device class.  Previously we were limited to
guessing what kind of device an irq belonged to by doing hopeful string matches on
device names from /proc/interrupts, something which was easily broken by a change
in driver naming practice, or an administrative device name change.

> > I think Andy's probably right, there's room here for expansion to create
> > a relationship between a given interrupt and a napi weight.  I expect what would
> > be most direct would be adding a napi_weight attribute that was conditional on
> > the class of the pci device allocating the irqs (make it visible for class 0x200
> > devs, invisible for others).
> 
> That's a terrible idea; what has NAPI got to do with PCI devices?
> 
Nothing directly, but stop making up an implementation and deciding, based on
that, that the whole notion is stupid.  There are a few ways to do this, some of which
make sense.

I was thinking of something along the lines of two more attributes in
/sys/class/net/<if>/queues:
napi_weight
irq

The former is the napi weight of a given napi instance associated with a queue,
while the latter is a symlink either to ../device/irq or ../device/msi_irqs/<n>/
(or perhaps to ../device/msi_irqs/<n>/irq if we want more consistency).  This
lets us tune the napi weight of a queue and know what interrupt is associated
with it.  That seems fairly sane to me.
Neil

^ permalink raw reply	[flat|nested] 36+ messages in thread
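
A rough sketch of what such a napi_weight attribute could look like,
following the existing netdev_queue attribute conventions from
net/core/net-sysfs.c.  Nothing like this exists in-tree, and the
queue-to-NAPI lookup helper is invented for illustration:

	static ssize_t napi_weight_show(struct netdev_queue *queue,
					struct netdev_queue_attribute *attr,
					char *buf)
	{
		/* netdev_queue_to_napi() is hypothetical: only the
		 * driver knows which napi instance serves this queue */
		struct napi_struct *napi = netdev_queue_to_napi(queue);

		return sprintf(buf, "%d\n", napi->weight);
	}

	static struct netdev_queue_attribute napi_weight_attribute =
		__ATTR(napi_weight, S_IRUGO, napi_weight_show, NULL);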

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-28 11:00                           ` Neil Horman
@ 2011-09-28 15:11                             ` Stephen Hemminger
  2011-09-28 17:07                               ` Neil Horman
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen Hemminger @ 2011-09-28 15:11 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ben Hutchings, Andy Gospodarek, Stephen Hemminger,
	jeffrey t kirsher, David Miller, netdev, gospo,
	Alexander H Duyck

On Wed, 28 Sep 2011 07:00:55 -0400
Neil Horman <nhorman@tuxdriver.com> wrote:

> I was thinking of something along the lines of two more attributes in
> /sys/class/net/<if>/queues:
> napi_weight
> irq
> 
> The former is the napi weight of a given napi instance associated with a queue,
> while the latter is a symlink either to ../device/irq or ../device/msi_irqs/<n>/
> (or perhaps to ../devices/msi_irqs/<n>/irq if we want more consistency).   This
> lets us tune the napi weight of a queue and know what interrupt is associated
> with it.  That seems fairly sane to me.

This breaks for the case of some corner case devices like multi-port Marvell boards.
There can be a N to 1 or 1 to N relationship between NAPI and the device.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [net-next 11/13] igb: Make Tx budget for NAPI user adjustable
  2011-09-28 15:11                             ` Stephen Hemminger
@ 2011-09-28 17:07                               ` Neil Horman
  0 siblings, 0 replies; 36+ messages in thread
From: Neil Horman @ 2011-09-28 17:07 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ben Hutchings, Andy Gospodarek, Stephen Hemminger,
	jeffrey t kirsher, David Miller, netdev, gospo,
	Alexander H Duyck

On Wed, Sep 28, 2011 at 08:11:58AM -0700, Stephen Hemminger wrote:
> On Wed, 28 Sep 2011 07:00:55 -0400
> Neil Horman <nhorman@tuxdriver.com> wrote:
> 
> > I was thinking of something along the lines of two more attributes in
> > /sys/class/net/<if>/queues:
> > napi_weight
> > irq
> > 
> > The former is the napi weight of a given napi instance associated with a queue,
> > while the latter is a symlink either to ../device/irq or ../device/msi_irqs/<n>/
> > (or perhaps to ../device/msi_irqs/<n>/irq if we want more consistency).  This
> > lets us tune the napi weight of a queue and know what interrupt is associated
> > with it.  That seems fairly sane to me.
> 
> This breaks for the case of some corner case devices like multi-port Marvell boards.
There can be an N-to-1 or 1-to-N relationship between NAPI and the device.
> 

I don't see that it has to explicitly _break_ anything.  If a multiqueue device
uses a single napi instance to handle all queues, we can still create a napi
weight attribute for each queue, and simply let the device driver tie all the
weight sysfs objects to the same napi instance.  It would be odd for certain,
but doable.  In fact, since we'd have to get the driver involved in the creation
of such a per-queue napi weight attribute (since the driver is the only thing
with any knowledge about which queue maps to which napi instance), we might be
able to explicitly export this information by allowing a single queue to hold
the napi weight, and giving the other queues sysfs symlinks back to that
object.

I'm just spitballing here on implementation, but I don't think the idea is
broken.
Neil

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2011-09-28 17:07 UTC | newest]

Thread overview: 36+ messages
2011-09-17  8:04 [net-next 00/13][pull request] Intel Wired LAN Driver Updates Jeff Kirsher
2011-09-17  8:04 ` [net-next 01/13] ixgb: eliminate checkstack warnings Jeff Kirsher
2011-09-17  8:43   ` Joe Perches
2011-09-19 21:36     ` Jesse Brandeburg
2011-09-19 22:28       ` [PATCH] intel: Convert <FOO>_LENGTH_OF_ADDRESS to ETH_ALEN Joe Perches
2011-09-19 23:33         ` Jeff Kirsher
2011-09-17  8:04 ` [net-next 02/13] igb: Update RXDCTL/TXDCTL configurations Jeff Kirsher
2011-09-17  8:04 ` [net-next 03/13] igb: Update max_frame_size to account for an optional VLAN tag if present Jeff Kirsher
2011-09-17  8:04 ` [net-next 04/13] igb: drop support for single buffer mode Jeff Kirsher
2011-09-17  8:04 ` [net-next 05/13] igb: streamline Rx buffer allocation and cleanup Jeff Kirsher
2011-09-17  8:04 ` [net-next 06/13] igb: update ring and adapter structure to improve performance Jeff Kirsher
2011-09-17  8:04 ` [net-next 07/13] igb: Refactor clean_rx_irq to reduce overhead and " Jeff Kirsher
2011-09-17  8:04 ` [net-next 08/13] igb: drop the "adv" off function names relating to descriptors Jeff Kirsher
2011-09-17  8:04 ` [net-next 09/13] igb: Replace E1000_XX_DESC_ADV with IGB_XX_DESC Jeff Kirsher
2011-09-17  8:04 ` [net-next 10/13] igb: Remove multi_tx_table and simplify igb_xmit_frame Jeff Kirsher
2011-09-17  8:04 ` [net-next 11/13] igb: Make Tx budget for NAPI user adjustable Jeff Kirsher
2011-09-17 17:04   ` Ben Hutchings
2011-09-19 15:48     ` Alexander Duyck
2011-09-19 16:05       ` Ben Hutchings
2011-09-19 16:32         ` Alexander Duyck
2011-09-19 21:00           ` David Miller
2011-09-19 22:27             ` Alexander Duyck
2011-09-19 23:36               ` Jeff Kirsher
2011-09-19 23:42                 ` Stephen Hemminger
2011-09-19 23:47                   ` David Miller
2011-09-20  0:10                   ` Ben Hutchings
2011-09-20 18:59                     ` Andy Gospodarek
2011-09-20 20:23                       ` Neil Horman
2011-09-27 23:45                         ` Ben Hutchings
2011-09-28 11:00                           ` Neil Horman
2011-09-28 15:11                             ` Stephen Hemminger
2011-09-28 17:07                               ` Neil Horman
2011-09-19 20:56         ` David Miller
2011-09-19 20:57   ` David Miller
2011-09-17  8:04 ` [net-next 12/13] igb: split buffer_info into tx_buffer_info and rx_buffer_info Jeff Kirsher
2011-09-17  8:04 ` [net-next 13/13] igb: Consolidate creation of Tx context descriptors into a single function Jeff Kirsher
