* [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates
@ 2016-04-14 13:19 Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 01/14] i40e: Allow RSS Hash set with less than four parameters Harshitha Ramamurthy
                   ` (13 more replies)
  0 siblings, 14 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

Carolyn Wyborny allows setting RSS hash options using fewer than four parameters
and adds a message for the unsupported MFP mode.

Catherine Sullivan renames a version macro.

Jesse Brandeburg fixes up 32-bit timespec references, refactors the tunnel
interpretation and receive routines, removes unused hardware receive
descriptor code, and adds code to test memory before ethtool allocation
succeeds.

Mitch Williams adds code to report link speed, sets the netdev carrier properly,
clears the MAC filter count on reset, enables adaptive interrupt throttling,
and allocates RX buffers properly.

Shannon Nelson adds elements to fd filter compare.

 drivers/net/ethernet/intel/i40e/i40e.h             |  14 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |  30 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     | 259 +++---
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  78 +-
 drivers/net/ethernet/intel/i40e/i40e_ptp.c         |   5 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 964 ++++++++++-----------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |  69 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   1 +
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      | 924 ++++++++++----------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |  69 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h         |   8 +-
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  78 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  54 +-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    |  64 +-
 14 files changed, 1264 insertions(+), 1353 deletions(-)

-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 01/14] i40e: Allow RSS Hash set with less than four parameters.
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 02/14] i40e: Fix up 32 bit timespec references Harshitha Ramamurthy
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

This patch allows using ethtool to set RSS hash options with fewer than
four parameters if desired. It also avoids duplicating code from the RSS
input set change code path.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Change-Id: I324bac8672178d5d9214583b039abe5476fdc377
---
Testing Hints : Test that the RSS hash changes
appropriately when changing the hash input set for different traffic
flow types. X710/XL710 device design means this change is global
and not per port as on X550 and other devices.
More testing hints - (ethtool -N ethx rx-flow-hash ....)
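
A rough sketch of the kind of commands this enables (ethX and the tcp4
flow type are placeholders; field letters per ethtool's rx-flow-hash option):

  # hash TCP/IPv4 flows on IP src/dst only - accepted with this patch
  ethtool -N ethX rx-flow-hash tcp4 sd
  # hash on IP src/dst plus L4 src/dst ports (previously the only accepted set)
  ethtool -N ethX rx-flow-hash tcp4 sdfn
  # read back the currently configured hash fields
  ethtool -n ethX rx-flow-hash tcp4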

 drivers/net/ethernet/intel/i40e/i40e.h         |   3 -
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 193 +++++++++++++++----------
 2 files changed, 115 insertions(+), 81 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 618a2a8..b5fcd9c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -595,9 +595,6 @@ struct i40e_vsi {
 
 	/* VSI specific handlers */
 	irqreturn_t (*irq_handler)(int irq, void *data);
-
-	/* current rxnfc data */
-	struct ethtool_rxnfc rxnfc; /* current rss hash opts */
 } ____cacheline_internodealigned_in_smp;
 
 struct i40e_netdev_priv {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 865c8db..ea8afde 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2066,41 +2066,72 @@ static int i40e_set_per_queue_coalesce(struct net_device *netdev, u32 queue,
  **/
 static int i40e_get_rss_hash_opts(struct i40e_pf *pf, struct ethtool_rxnfc *cmd)
 {
+	struct i40e_hw *hw = &pf->hw;
+	u8 flow_pctype = 0;
+	u64 i_set = 0;
+
 	cmd->data = 0;
 
-	if (pf->vsi[pf->lan_vsi]->rxnfc.data != 0) {
-		cmd->data = pf->vsi[pf->lan_vsi]->rxnfc.data;
-		cmd->flow_type = pf->vsi[pf->lan_vsi]->rxnfc.flow_type;
-		return 0;
-	}
-	/* Report default options for RSS on i40e */
 	switch (cmd->flow_type) {
 	case TCP_V4_FLOW:
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV4_TCP;
+		break;
 	case UDP_V4_FLOW:
-		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
-	/* fall through to add IP fields */
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV4_UDP;
+		break;
+	case TCP_V6_FLOW:
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV6_TCP;
+		break;
+	case UDP_V6_FLOW:
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV6_UDP;
+		break;
 	case SCTP_V4_FLOW:
 	case AH_ESP_V4_FLOW:
 	case AH_V4_FLOW:
 	case ESP_V4_FLOW:
 	case IPV4_FLOW:
-		cmd->data |= RXH_IP_SRC | RXH_IP_DST;
-		break;
-	case TCP_V6_FLOW:
-	case UDP_V6_FLOW:
-		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
-	/* fall through to add IP fields */
 	case SCTP_V6_FLOW:
 	case AH_ESP_V6_FLOW:
 	case AH_V6_FLOW:
 	case ESP_V6_FLOW:
 	case IPV6_FLOW:
+		/* Default is src/dest for IP, no matter the L4 hashing */
 		cmd->data |= RXH_IP_SRC | RXH_IP_DST;
 		break;
 	default:
 		return -EINVAL;
 	}
 
+	/* Read flow based hash input set register */
+	if (flow_pctype) {
+		i_set = (u64)i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(0,
+					      flow_pctype)) |
+			((u64)i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(1,
+					       flow_pctype)) << 32);
+	}
+
+	/* Process bits of hash input set */
+	if (i_set) {
+		if (i_set & BIT_ULL(I40E_L4_SRC_SHIFT))
+			cmd->data |= RXH_L4_B_0_1;
+		if (i_set & BIT_ULL(I40E_L4_DST_SHIFT))
+			cmd->data |= RXH_L4_B_2_3;
+
+		if (cmd->flow_type == TCP_V4_FLOW ||
+		    cmd->flow_type == UDP_V4_FLOW) {
+			if (i_set & BIT_ULL(I40E_L3_SRC_SHIFT))
+				cmd->data |= RXH_IP_SRC;
+			if (i_set & BIT_ULL(I40E_L3_DST_SHIFT))
+				cmd->data |= RXH_IP_DST;
+		} else if (cmd->flow_type == TCP_V6_FLOW ||
+			  cmd->flow_type == UDP_V6_FLOW) {
+			if (i_set & BIT_ULL(I40E_L3_V6_SRC_SHIFT))
+				cmd->data |= RXH_IP_SRC;
+			if (i_set & BIT_ULL(I40E_L3_V6_DST_SHIFT))
+				cmd->data |= RXH_IP_DST;
+		}
+	}
+
 	return 0;
 }
 
@@ -2243,6 +2274,36 @@ static int i40e_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 }
 
 /**
+ * i40e_get_rss_hash_bits - Read RSS Hash bits from register
+ * @nfc: pointer to user request
+ * @i_setc bits currently set
+ *
+ * Returns value of bits to be set per user request
+ **/
+static u64 i40e_get_rss_hash_bits(struct ethtool_rxnfc *nfc, u64 i_setc)
+{
+	u64 i_set = i_setc;
+
+		if (nfc->data & RXH_L4_B_0_1)
+			i_set |= BIT_ULL(I40E_L4_SRC_SHIFT);
+		else
+			i_set &= ~BIT_ULL(I40E_L4_SRC_SHIFT);
+		if (nfc->data & RXH_L4_B_2_3)
+			i_set |= BIT_ULL(I40E_L4_DST_SHIFT);
+		else
+			i_set &= ~BIT_ULL(I40E_L4_DST_SHIFT);
+		if (nfc->data & RXH_IP_SRC)
+			i_set |= BIT_ULL(I40E_L3_SRC_SHIFT);
+		else
+			i_set &= ~BIT_ULL(I40E_L3_SRC_SHIFT);
+		if (nfc->data & RXH_IP_DST)
+			i_set |= BIT_ULL(I40E_L3_DST_SHIFT);
+		else
+			i_set &= ~BIT_ULL(I40E_L3_DST_SHIFT);
+	return i_set;
+}
+
+/**
  * i40e_set_rss_hash_opt - Enable/Disable flow types for RSS hash
  * @pf: pointer to the physical function struct
  * @cmd: ethtool rxnfc command
@@ -2254,6 +2315,8 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc)
 	struct i40e_hw *hw = &pf->hw;
 	u64 hena = (u64)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(0)) |
 		   ((u64)i40e_read_rx_ctl(hw, I40E_PFQF_HENA(1)) << 32);
+	u8 flow_pctype = 0;
+	u64 i_set, i_setc;
 
 	/* RSS does not support anything other than hashing
 	 * to queues on src and dst IPs and ports
@@ -2262,75 +2325,39 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc)
 			  RXH_L4_B_0_1 | RXH_L4_B_2_3))
 		return -EINVAL;
 
-	/* We need at least the IP SRC and DEST fields for hashing */
-	if (!(nfc->data & RXH_IP_SRC) ||
-	    !(nfc->data & RXH_IP_DST))
-		return -EINVAL;
-
 	switch (nfc->flow_type) {
 	case TCP_V4_FLOW:
-		switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
-		case 0:
-			return -EINVAL;
-		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
-			if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
-				hena |=
-			   BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
-
-			hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP);
-			break;
-		default:
-			return -EINVAL;
-		}
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV4_TCP;
+		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+			hena |=
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
 		break;
 	case TCP_V6_FLOW:
-		switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
-		case 0:
-			return -EINVAL;
-		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
-			if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
-				hena |=
-			   BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
-
-			hena |= BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP);
-			break;
-		default:
-			return -EINVAL;
-		}
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV6_TCP;
+		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+			hena |=
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_TCP_SYN_NO_ACK);
+		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+			hena |=
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_TCP_SYN_NO_ACK);
 		break;
 	case UDP_V4_FLOW:
-		switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
-		case 0:
-			return -EINVAL;
-		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
-			if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
-				hena |=
-			    BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) |
-			    BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
-
-			hena |= (BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV4_UDP) |
-				 BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV4));
-			break;
-		default:
-			return -EINVAL;
-		}
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV4_UDP;
+		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+			hena |=
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV4_UDP) |
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV4_UDP);
+
+		hena |= BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV4);
 		break;
 	case UDP_V6_FLOW:
-		switch (nfc->data & (RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
-		case 0:
-			return -EINVAL;
-		case (RXH_L4_B_0_1 | RXH_L4_B_2_3):
-			if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
-				hena |=
-			    BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) |
-			    BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
-
-			hena |= (BIT_ULL(I40E_FILTER_PCTYPE_NONF_IPV6_UDP) |
-				 BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV6));
-			break;
-		default:
-			return -EINVAL;
-		}
+		flow_pctype = I40E_FILTER_PCTYPE_NONF_IPV6_UDP;
+		if (pf->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE)
+			hena |=
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) |
+			  BIT_ULL(I40E_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP);
+
+		hena |= BIT_ULL(I40E_FILTER_PCTYPE_FRAG_IPV6);
 		break;
 	case AH_ESP_V4_FLOW:
 	case AH_V4_FLOW:
@@ -2362,13 +2389,23 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc)
 		return -EINVAL;
 	}
 
+	if (flow_pctype) {
+		i_setc = (u64)i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(0,
+					       flow_pctype)) |
+			((u64)i40e_read_rx_ctl(hw, I40E_GLQF_HASH_INSET(1,
+					       flow_pctype)) << 32);
+		i_set = i40e_get_rss_hash_bits(nfc, i_setc);
+		i40e_write_rx_ctl(hw, I40E_GLQF_HASH_INSET(0, flow_pctype),
+				  (u32)i_set);
+		i40e_write_rx_ctl(hw, I40E_GLQF_HASH_INSET(1, flow_pctype),
+				  (u32)(i_set >> 32));
+		hena |= BIT_ULL(flow_pctype);
+	}
+
 	i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (u32)hena);
 	i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (u32)(hena >> 32));
 	i40e_flush(hw);
 
-	/* Save setting for future output/update */
-	pf->vsi[pf->lan_vsi]->rxnfc = *nfc;
-
 	return 0;
 }
 
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 02/14] i40e: Fix up 32 bit timespec references
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 01/14] i40e: Allow RSS Hash set with less than four parameters Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 03/14] i40e: Add elements to fd filter compare Harshitha Ramamurthy
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

As it turns out, there was only a small set of errors
on 32-bit builds, and we just needed to use the right calls
for dealing with timespec64 variables.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Change-Id: I307345bf658b34a1d573f4f4263fe9ba2fa72051
---
Testing Hints : build tests, test PTP works.
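
One way to exercise the adjtime path, assuming linuxptp is available
(the /dev/ptpN index comes from 'ethtool -T' and is a placeholder):

  ethtool -T ethX | grep 'PTP Hardware Clock'
  phc_ctl /dev/ptpN get      # read the current PHC time
  phc_ctl /dev/ptpN adj 5    # shift the clock forward 5 seconds
  phc_ctl /dev/ptpN get      # confirm the offset took effect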

 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index a1b878a..a5a6d2d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -158,14 +158,13 @@ static int i40e_ptp_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
 static int i40e_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
 {
 	struct i40e_pf *pf = container_of(ptp, struct i40e_pf, ptp_caps);
-	struct timespec64 now, then;
+	struct timespec64 now;
 	unsigned long flags;
 
-	then = ns_to_timespec64(delta);
 	spin_lock_irqsave(&pf->tmreg_lock, flags);
 
 	i40e_ptp_read(pf, &now);
-	now = timespec64_add(now, then);
+	timespec64_add_ns(&now, delta);
 	i40e_ptp_write(pf, (const struct timespec64 *)&now);
 
 	spin_unlock_irqrestore(&pf->tmreg_lock, flags);
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 03/14] i40e: Add elements to fd filter compare
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 01/14] i40e: Allow RSS Hash set with less than four parameters Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 02/14] i40e: Fix up 32 bit timespec references Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 04/14] i40evf: Report link speed Harshitha Ramamurthy
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Shannon Nelson <shannon.nelson@intel.com>

Make sure we're comparing enough elements between filters to
know whether they are really the same.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Change-Id: Ibbf097906a4a8c4fe8d973e3092a8fcbd09ce6ff
---
Testing Hints :
	Add a flow director filter and then print the content
	Add the same filter, but with a different queue
	Print the output and make sure it reflects the new queue
	Make sure the traffic goes to the new queue
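	For example, with ethtool's ntuple interface (addresses, ports and
	queue numbers below are placeholders):

	  ethtool -U ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip 192.168.10.2 \
	          src-port 2000 dst-port 2001 action 4
	  ethtool -u ethX     # print the filter and note the queue
	  ethtool -U ethX flow-type tcp4 src-ip 192.168.10.1 dst-ip 192.168.10.2 \
	          src-port 2000 dst-port 2001 action 6
	  ethtool -u ethX     # the same 5-tuple should now show the new queue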

 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index ea8afde..3409702 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2422,7 +2422,11 @@ static bool i40e_match_fdir_input_set(struct i40e_fdir_filter *rule,
 	if ((rule->dst_ip[0] != input->dst_ip[0]) ||
 	    (rule->src_ip[0] != input->src_ip[0]) ||
 	    (rule->dst_port != input->dst_port) ||
-	    (rule->src_port != input->src_port))
+	    (rule->src_port != input->src_port) ||
+	    (rule->flow_type != input->flow_type) ||
+	    (rule->ip4_proto != input->ip4_proto) ||
+	    (rule->sctp_v_tag != input->sctp_v_tag) ||
+	    (rule->q_index != input->q_index))
 		return false;
 	return true;
 }
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 04/14] i40evf: Report link speed
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (2 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 03/14] i40e: Add elements to fd filter compare Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 05/14] i40evf: Set netdev carrier properly Harshitha Ramamurthy
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Mitch Williams <mitch.a.williams@intel.com>

The PF driver tells us the link speed, so do something with that
information. Add link speed to log messages, and report speed through
ethtool.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Change-Id: I279dc9540cc5203376406050a3e8d67e128d5882
---
Testing Hints : Test reporting of link up/down and
various speeds
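
A minimal check, assuming ethX is the VF interface and <pf> the host-side
port (names are placeholders):

  ethtool ethX              # 'Speed:' should match the PF-reported link speed
  ip link set dev <pf> down # force a link change from the host
  ip link set dev <pf> up
  dmesg | tail              # VF should log 'NIC Link is Up ... Full Duplex'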

 drivers/net/ethernet/intel/i40evf/i40evf.h         |  1 +
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c | 26 ++++++++--
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    | 55 ++++++++++++++++++----
 3 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index 3b8075f..e8dee48 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -261,6 +261,7 @@ struct i40evf_adapter {
 	struct work_struct watchdog_task;
 	bool netdev_registered;
 	bool link_up;
+	enum i40e_aq_link_speed link_speed;
 	enum i40e_virtchnl_ops current_op;
 #define CLIENT_ENABLED(_a) ((_a)->vf_res ? \
 			    (_a)->vf_res->vf_offload_flags & \
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 3a95c7c..179fa6a 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -80,13 +80,33 @@ static const char i40evf_priv_flags_strings[][ETH_GSTRING_LEN] = {
 static int i40evf_get_settings(struct net_device *netdev,
 			       struct ethtool_cmd *ecmd)
 {
-	/* In the future the VF will be able to query the PF for
-	 * some information - for now use a dummy value
-	 */
+	struct i40evf_adapter *adapter = netdev_priv(netdev);
+
 	ecmd->supported = 0;
 	ecmd->autoneg = AUTONEG_DISABLE;
 	ecmd->transceiver = XCVR_DUMMY1;
 	ecmd->port = PORT_NONE;
+	/* Set speed and duplex */
+	switch (adapter->link_speed) {
+	case I40E_LINK_SPEED_40GB:
+		ethtool_cmd_speed_set(ecmd, SPEED_40000);
+		break;
+	case I40E_LINK_SPEED_20GB:
+		ethtool_cmd_speed_set(ecmd, SPEED_20000);
+		break;
+	case I40E_LINK_SPEED_10GB:
+		ethtool_cmd_speed_set(ecmd, SPEED_10000);
+		break;
+	case I40E_LINK_SPEED_1GB:
+		ethtool_cmd_speed_set(ecmd, SPEED_1000);
+		break;
+	case I40E_LINK_SPEED_100MB:
+		ethtool_cmd_speed_set(ecmd, SPEED_100);
+		break;
+	default:
+		break;
+	}
+	ecmd->duplex = DUPLEX_FULL;
 
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index ba7fbc0..4c0ae43 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -802,6 +802,45 @@ void i40evf_set_rss_lut(struct i40evf_adapter *adapter)
 }
 
 /**
+ * i40evf_print_link_message - print link up or down
+ * @adapter: adapter structure
+ *
+ * Log a message telling the world of our wonderous link status
+ */
+static void i40evf_print_link_message(struct i40evf_adapter *adapter)
+{
+	struct net_device *netdev = adapter->netdev;
+	char *speed = "Unknown ";
+
+	if (!adapter->link_up) {
+		netdev_info(netdev, "NIC Link is Down\n");
+		return;
+	}
+
+	switch (adapter->link_speed) {
+	case I40E_LINK_SPEED_40GB:
+		speed = "40 G";
+		break;
+	case I40E_LINK_SPEED_20GB:
+		speed = "20 G";
+		break;
+	case I40E_LINK_SPEED_10GB:
+		speed = "10 G";
+		break;
+	case I40E_LINK_SPEED_1GB:
+		speed = "1000 M";
+		break;
+	case I40E_LINK_SPEED_100MB:
+		speed = "100 M";
+		break;
+	default:
+		break;
+	}
+
+	netdev_info(netdev, "NIC Link is Up %sbps Full Duplex\n", speed);
+}
+
+/**
  * i40evf_request_reset
  * @adapter: adapter structure
  *
@@ -838,15 +877,13 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 			(struct i40e_virtchnl_pf_event *)msg;
 		switch (vpe->event) {
 		case I40E_VIRTCHNL_EVENT_LINK_CHANGE:
-			adapter->link_up =
-				vpe->event_data.link_event.link_status;
-			if (adapter->link_up && !netif_carrier_ok(netdev)) {
-				dev_info(&adapter->pdev->dev, "NIC Link is Up\n");
-				netif_carrier_on(netdev);
-				netif_tx_wake_all_queues(netdev);
-			} else if (!adapter->link_up) {
-				dev_info(&adapter->pdev->dev, "NIC Link is Down\n");
-				netif_carrier_off(netdev);
+			adapter->link_speed =
+				vpe->event_data.link_event.link_speed;
+			if (adapter->link_up !=
+			    vpe->event_data.link_event.link_status) {
+				adapter->link_up =
+					vpe->event_data.link_event.link_status;
+				i40evf_print_link_message(adapter);
 				netif_tx_stop_all_queues(netdev);
 			}
 			break;
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 05/14] i40evf: Set netdev carrier properly
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (3 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 04/14] i40evf: Report link speed Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 06/14] i40e: Clear mac filter count on reset Harshitha Ramamurthy
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Mitch Williams <mitch.a.williams@intel.com>

If the link status changes, a polite driver, a well-behaved driver, a
driver that does not want to DISAPPOINT its MOTHER, will notify the
network layer. Good drivers know not to cross their mommy.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Change-Id: I1df8ab55885b0c89e6fe08938af87bb910d19da6
---
Testing Hints : Toggle link state via cable and the
'ip' command and make sure ethtool reports correctly.
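
One way to toggle link state from the host without touching the cable
(PF/VF interface names and the VF number are placeholders):

  ip link set dev <pf> vf 0 state disable
  cat /sys/class/net/<vf>/carrier    # should now read 0
  ip link set dev <pf> vf 0 state enable
  cat /sys/class/net/<vf>/carrier    # should return to 1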

 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index 4c0ae43..e0ea64b 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -884,7 +884,10 @@ void i40evf_virtchnl_completion(struct i40evf_adapter *adapter,
 				adapter->link_up =
 					vpe->event_data.link_event.link_status;
 				i40evf_print_link_message(adapter);
-				netif_tx_stop_all_queues(netdev);
+				if (adapter->link_up)
+					netif_carrier_on(netdev);
+				else
+					netif_carrier_off(netdev);
 			}
 			break;
 		case I40E_VIRTCHNL_EVENT_RESET_IMPENDING:
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 06/14] i40e: Clear mac filter count on reset
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (4 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 05/14] i40evf: Set netdev carrier properly Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 07/14] i40evf: Enable adaptive interrupt throttling Harshitha Ramamurthy
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Mitch Williams <mitch.a.williams@intel.com>

When a VF is reset, it gets a new VSI, so all of its MAC filters go
away. Correctly set the number of filters to 0 when freeing VF
resources. This corrects a problem with failure to add filters when the
VF driver is reloaded.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Change-Id: Ic8c1ee87118b9b45fe23b5b8a9cd935160d1b2d3
---
Testing Hints : Reload the VF driver and observe
dmesg on the host.
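
For example (interface and module handling on the VF side are placeholders
for whatever setup is in use):

  # in the VM, or wherever the VF is bound
  modprobe -r i40evf && modprobe i40evf
  # on the host, the MAC filter add failures previously seen on reload
  # should no longer appear
  dmesg -w | grep -i i40e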

 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 41be42d..205eca6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -805,6 +805,7 @@ static void i40e_free_vf_res(struct i40e_vf *vf)
 		i40e_vsi_release(pf->vsi[vf->lan_vsi_idx]);
 		vf->lan_vsi_idx = 0;
 		vf->lan_vsi_id = 0;
+		vf->num_mac = 0;
 	}
 	msix_vf = pf->hw.func_caps.num_msix_vectors_vf;
 
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 07/14] i40evf: Enable adaptive interrupt throttling
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (5 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 06/14] i40e: Clear mac filter count on reset Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 08/14] i40e: Add message for unsupported MFP mode Harshitha Ramamurthy
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Mitch Williams <mitch.a.williams@intel.com>

All of the code to support adaptive interrupt throttling is already in
the interrupt handler; it just needs to be enabled. Fill out the data
structures properly to make it happen. Single-flow traffic tests may
show slightly lower throughput, but interrupts per second will drop by
about 75%.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Change-Id: I5fb6185ec1cff491bd295c2587b415162393dc6d
---
Testing Hints : Use itop to monitor interrupt load
and run netperf.
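
A rough sketch, assuming the MSI-X vectors are named after the interface
(server address and interface name are placeholders):

  netperf -H <server> -t TCP_STREAM -l 60 &
  watch -d -n1 'grep <iface> /proc/interrupts'
  # interrupts/sec should drop substantially versus the previous driver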

 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index b9b1dd8..4c86687 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -368,6 +368,8 @@ i40evf_map_vector_to_rxq(struct i40evf_adapter *adapter, int v_idx, int r_idx)
 {
 	struct i40e_q_vector *q_vector = &adapter->q_vectors[v_idx];
 	struct i40e_ring *rx_ring = &adapter->rx_rings[r_idx];
+	struct i40e_vsi *vsi = &adapter->vsi;
+	struct i40e_hw *hw = &adapter->hw;
 
 	rx_ring->q_vector = q_vector;
 	rx_ring->next = q_vector->rx.ring;
@@ -375,7 +377,10 @@ i40evf_map_vector_to_rxq(struct i40evf_adapter *adapter, int v_idx, int r_idx)
 	q_vector->rx.ring = rx_ring;
 	q_vector->rx.count++;
 	q_vector->rx.latency_range = I40E_LOW_LATENCY;
+	q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
+	q_vector->ring_mask |= BIT(r_idx);
 	q_vector->itr_countdown = ITR_COUNTDOWN_START;
+	wr32(hw, I40E_VFINT_ITRN1(I40E_RX_ITR, v_idx - 1), q_vector->rx.itr);
 }
 
 /**
@@ -389,6 +394,8 @@ i40evf_map_vector_to_txq(struct i40evf_adapter *adapter, int v_idx, int t_idx)
 {
 	struct i40e_q_vector *q_vector = &adapter->q_vectors[v_idx];
 	struct i40e_ring *tx_ring = &adapter->tx_rings[t_idx];
+	struct i40e_vsi *vsi = &adapter->vsi;
+	struct i40e_hw *hw = &adapter->hw;
 
 	tx_ring->q_vector = q_vector;
 	tx_ring->next = q_vector->tx.ring;
@@ -396,9 +403,10 @@ i40evf_map_vector_to_txq(struct i40evf_adapter *adapter, int v_idx, int t_idx)
 	q_vector->tx.ring = tx_ring;
 	q_vector->tx.count++;
 	q_vector->tx.latency_range = I40E_LOW_LATENCY;
+	q_vector->tx.itr = ITR_TO_REG(vsi->tx_itr_setting);
 	q_vector->itr_countdown = ITR_COUNTDOWN_START;
 	q_vector->num_ringpairs++;
-	q_vector->ring_mask |= BIT(t_idx);
+	wr32(hw, I40E_VFINT_ITRN1(I40E_TX_ITR, v_idx - 1), q_vector->tx.itr);
 }
 
 /**
@@ -2271,10 +2279,8 @@ int i40evf_process_config(struct i40evf_adapter *adapter)
 	adapter->vsi.back = adapter;
 	adapter->vsi.base_vector = 1;
 	adapter->vsi.work_limit = I40E_DEFAULT_IRQ_WORK;
-	adapter->vsi.rx_itr_setting = (I40E_ITR_DYNAMIC |
-				       ITR_REG_TO_USEC(I40E_ITR_RX_DEF));
-	adapter->vsi.tx_itr_setting = (I40E_ITR_DYNAMIC |
-				       ITR_REG_TO_USEC(I40E_ITR_TX_DEF));
+	adapter->vsi.rx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF);
+	adapter->vsi.tx_itr_setting = (I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF);
 	vsi->netdev = adapter->netdev;
 	vsi->qs_handle = adapter->vsi_res->qset_handle;
 	if (vfres->vf_offload_flags & I40E_VIRTCHNL_VF_OFFLOAD_RSS_PF) {
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 08/14] i40e: Add message for unsupported MFP mode
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (6 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 07/14] i40evf: Enable adaptive interrupt throttling Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 09/14] i40e/i40evf: Rename version macro Harshitha Ramamurthy
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Carolyn Wyborny <carolyn.wyborny@intel.com>

This patch adds a check and message if the device is in
MFP mode, as changing the RSS input set is not supported in
MFP mode per the DCR.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Change-Id: I4c066c6124fd77ce9d3587ac1164e07b49c49d89
---
Testing Hints : Check that trying to
change RSS Hash options with ethtool returns a not
supported message.  There should also be a similar
message in the log.
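
For example, on a port in MFP mode (ethX is a placeholder):

  ethtool -N ethX rx-flow-hash tcp4 sdfn   # expected to fail: Operation not supported
  dmesg | tail   # should show the new 'not supported when MFP mode is enabled' message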

 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 3409702..35c211f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2318,6 +2318,12 @@ static int i40e_set_rss_hash_opt(struct i40e_pf *pf, struct ethtool_rxnfc *nfc)
 	u8 flow_pctype = 0;
 	u64 i_set, i_setc;
 
+	if (pf->flags & I40E_FLAG_MFP_ENABLED) {
+		dev_err(&pf->pdev->dev,
+			"Change of RSS hash input set is not supported when MFP mode is enabled\n");
+		return -EOPNOTSUPP;
+	}
+
 	/* RSS does not support anything other than hashing
 	 * to queues on src and dst IPs and ports
 	 */
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 09/14] i40e/i40evf: Rename version macro
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (7 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 08/14] i40e: Add message for unsupported MFP mode Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 10/14] i40e/i40evf: Refactor tunnel interpretation Harshitha Ramamurthy
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Catherine Sullivan <catherine.sullivan@intel.com>

Change the '-k' version description macro to DRV_VERSION_DESC
to be more generic.

Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Change-Id: I422bc6727b854f230217f04c61e750447435d183
---
 drivers/net/ethernet/intel/i40e/i40e_main.c     | 5 +++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 19a2d30..6308218 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -42,14 +42,15 @@ const char i40e_driver_name[] = "i40e";
 static const char i40e_driver_string[] =
 			"Intel(R) Ethernet Connection XL710 Network Driver";
 
-#define DRV_KERN "-k"
+#define DRV_VERSION_DESC "-k"
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 5
 #define DRV_VERSION_BUILD 10
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 	     __stringify(DRV_VERSION_MINOR) "." \
-	     __stringify(DRV_VERSION_BUILD)    DRV_KERN
+	     __stringify(DRV_VERSION_BUILD)    \
+	     DRV_VERSION_DESC
 const char i40e_driver_version_str[] = DRV_VERSION;
 static const char i40e_copyright[] = "Copyright (c) 2013 - 2014 Intel Corporation.";
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 4c86687..1b8dc22 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -34,7 +34,7 @@ char i40evf_driver_name[] = "i40evf";
 static const char i40evf_driver_string[] =
 	"Intel(R) 40-10 Gigabit Virtual Function Network Driver";
 
-#define DRV_KERN "-k"
+#define DRV_VF_VERSION_DESC "-k"
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 5
@@ -42,7 +42,7 @@ static const char i40evf_driver_string[] =
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 	     __stringify(DRV_VERSION_MINOR) "." \
 	     __stringify(DRV_VERSION_BUILD) \
-	     DRV_KERN
+			DRV_VF_VERSION_DESC
 const char i40evf_driver_version[] = DRV_VERSION;
 static const char i40evf_copyright[] =
 	"Copyright (c) 2013 - 2015 Intel Corporation.";
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 10/14] i40e/i40evf: Refactor tunnel interpretation
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (8 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 09/14] i40e/i40evf: Rename version macro Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine Harshitha Ramamurthy
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Refactor the interpretation of a tunnel.  This removes
some code and lets us start using the hardware's parsing.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
Testing Hints: make sure VxLAN, Geneve, GRE tunnels all still work over
both ipv4 and ipv6 tunnel headers.
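
A minimal VXLAN round trip as one example (addresses are placeholders;
Geneve and GRE tunnels can be set up analogously with iproute2):

  ip link add vxlan0 type vxlan id 42 remote 192.168.1.2 dstport 4789 dev ethX
  ip addr add 10.0.0.1/24 dev vxlan0 && ip link set vxlan0 up
  iperf3 -c 10.0.0.2      # traffic over the tunnel should still be received
                          # with checksums validated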

 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 13 ++++++-------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 13 ++++++-------
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 17ee438..ce4d94b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1410,7 +1410,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 				    u16 rx_ptype)
 {
 	struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
-	bool ipv4, ipv6, ipv4_tunnel, ipv6_tunnel;
+	bool ipv4, ipv6, tunnel = false;
 
 	skb->ip_summed = CHECKSUM_NONE;
 
@@ -1459,14 +1459,13 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 	 * doesn't make it a hard requirement so if we have validated the
 	 * inner checksum report CHECKSUM_UNNECESSARY.
 	 */
-
-	ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
-		     (rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
-	ipv6_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT6_MAC_PAY3) &&
-		     (rx_ptype <= I40E_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4);
+	if (decoded.inner_prot & (I40E_RX_PTYPE_INNER_PROT_TCP |
+				  I40E_RX_PTYPE_INNER_PROT_UDP |
+				  I40E_RX_PTYPE_INNER_PROT_SCTP))
+		tunnel = true;
 
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
-	skb->csum_level = ipv4_tunnel || ipv6_tunnel;
+	skb->csum_level = tunnel ? 1 : 0;
 
 	return;
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 4633235..cf42f16 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -864,7 +864,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 				    u16 rx_ptype)
 {
 	struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
-	bool ipv4, ipv6, ipv4_tunnel, ipv6_tunnel;
+	bool ipv4, ipv6, tunnel = false;
 
 	skb->ip_summed = CHECKSUM_NONE;
 
@@ -913,14 +913,13 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 	 * doesn't make it a hard requirement so if we have validated the
 	 * inner checksum report CHECKSUM_UNNECESSARY.
 	 */
-
-	ipv4_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT4_MAC_PAY3) &&
-		     (rx_ptype <= I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4);
-	ipv6_tunnel = (rx_ptype >= I40E_RX_PTYPE_GRENAT6_MAC_PAY3) &&
-		     (rx_ptype <= I40E_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4);
+	if (decoded.inner_prot & (I40E_RX_PTYPE_INNER_PROT_TCP |
+				  I40E_RX_PTYPE_INNER_PROT_UDP |
+				  I40E_RX_PTYPE_INNER_PROT_SCTP))
+		tunnel = true;
 
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
-	skb->csum_level = ipv4_tunnel || ipv6_tunnel;
+	skb->csum_level = tunnel ? 1 : 0;
 
 	return;
 
-- 
2.4.3



* [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (9 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 10/14] i40e/i40evf: Refactor tunnel interpretation Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-15 17:25   ` Alexander Duyck
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 12/14] i40e/i40evf: Remove unused hardware receive descriptor code Harshitha Ramamurthy
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

This refactor aligns the receive routine with the one in
ixgbe, which was highly optimized. It reduces the code we have
to maintain and allows for a (hopefully) more readable and
maintainable RX hot path.

In order to do this:
- consolidate the receive path into a single function that doesn't
  use packet split but *does* use pages for rx buffers.
- remove the old _1buf and _ps routines
- consolidate several routines into helper functions
- remove ethtool control over packet split
- remove VF ethtool control over packet split

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0
---
Testing Hints : lots of receive traffic, make
sure it's all working, PF and VF. Please test on a machine with 8kB
or larger pages, and check for memory leaks.
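
A rough leak check under sustained receive load (iperf3 endpoints and the
run length are arbitrary):

  getconf PAGESIZE        # confirm the page size under test
  iperf3 -c <server> -t 600 -R &
  watch -n5 'grep -E "MemFree|Slab" /proc/meminfo'
  # free memory should level off rather than fall continuously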

 drivers/net/ethernet/intel/i40e/i40e.h             |   4 -
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |  12 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  20 -
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  57 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 951 ++++++++++-----------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h        |  45 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c      | 911 ++++++++++----------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |  45 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h         |   7 -
 drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  52 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  31 +-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    |   4 -
 12 files changed, 977 insertions(+), 1162 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index b5fcd9c..5e23cc9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -101,7 +101,6 @@
 #define I40E_PRIV_FLAGS_LINKPOLL_FLAG	BIT(1)
 #define I40E_PRIV_FLAGS_FD_ATR		BIT(2)
 #define I40E_PRIV_FLAGS_VEB_STATS	BIT(3)
-#define I40E_PRIV_FLAGS_PS		BIT(4)
 #define I40E_PRIV_FLAGS_HW_ATR_EVICT	BIT(5)
 
 #define I40E_NVM_VERSION_LO_SHIFT  0
@@ -335,8 +334,6 @@ struct i40e_pf {
 #define I40E_FLAG_RX_CSUM_ENABLED		BIT_ULL(1)
 #define I40E_FLAG_MSI_ENABLED			BIT_ULL(2)
 #define I40E_FLAG_MSIX_ENABLED			BIT_ULL(3)
-#define I40E_FLAG_RX_1BUF_ENABLED		BIT_ULL(4)
-#define I40E_FLAG_RX_PS_ENABLED			BIT_ULL(5)
 #define I40E_FLAG_RSS_ENABLED			BIT_ULL(6)
 #define I40E_FLAG_VMDQ_ENABLED			BIT_ULL(7)
 #define I40E_FLAG_FDIR_REQUIRES_REINIT		BIT_ULL(8)
@@ -549,7 +546,6 @@ struct i40e_vsi {
 	u8  *rss_lut_user;  /* User configured lookup table entries */
 
 	u16 max_frame;
-	u16 rx_hdr_len;
 	u16 rx_buf_len;
 	u8  dtype;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 83dccf1..519cfc8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -268,13 +268,13 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 rx_ring->queue_index,
 			 rx_ring->reg_idx);
 		dev_info(&pf->pdev->dev,
-			 "    rx_rings[%i]: rx_hdr_len = %d, rx_buf_len = %d, dtype = %d\n",
-			 i, rx_ring->rx_hdr_len,
+			 "    rx_rings[%i]: rx_buf_len = %d, dtype = %d\n",
+			 i,
 			 rx_ring->rx_buf_len,
 			 rx_ring->dtype);
 		dev_info(&pf->pdev->dev,
-			 "    rx_rings[%i]: hsplit = %d, next_to_use = %d, next_to_clean = %d, ring_active = %i\n",
-			 i, ring_is_ps_enabled(rx_ring),
+			 "    rx_rings[%i]: next_to_use = %d, next_to_clean = %d, ring_active = %i\n",
+			 i,
 			 rx_ring->next_to_use,
 			 rx_ring->next_to_clean,
 			 rx_ring->ring_active);
@@ -365,8 +365,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 		 "    work_limit = %d\n",
 		 vsi->work_limit);
 	dev_info(&pf->pdev->dev,
-		 "    max_frame = %d, rx_hdr_len = %d, rx_buf_len = %d dtype = %d\n",
-		 vsi->max_frame, vsi->rx_hdr_len, vsi->rx_buf_len, vsi->dtype);
+		 "    max_frame = %d, rx_buf_len = %d dtype = %d\n",
+		 vsi->max_frame, vsi->rx_buf_len, vsi->dtype);
 	dev_info(&pf->pdev->dev,
 		 "    num_q_vectors = %i, base_vector = %i\n",
 		 vsi->num_q_vectors, vsi->base_vector);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 35c211f..feb370b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -235,7 +235,6 @@ static const char i40e_priv_flags_strings[][ETH_GSTRING_LEN] = {
 	"LinkPolling",
 	"flow-director-atr",
 	"veb-stats",
-	"packet-split",
 	"hw-atr-eviction",
 };
 
@@ -2997,8 +2996,6 @@ static u32 i40e_get_priv_flags(struct net_device *dev)
 		I40E_PRIV_FLAGS_FD_ATR : 0;
 	ret_flags |= pf->flags & I40E_FLAG_VEB_STATS_ENABLED ?
 		I40E_PRIV_FLAGS_VEB_STATS : 0;
-	ret_flags |= pf->flags & I40E_FLAG_RX_PS_ENABLED ?
-		I40E_PRIV_FLAGS_PS : 0;
 	ret_flags |= pf->auto_disable_flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE ?
 		0 : I40E_PRIV_FLAGS_HW_ATR_EVICT;
 
@@ -3019,23 +3016,6 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
 
 	/* NOTE: MFP is not settable */
 
-	/* allow the user to control the method of receive
-	 * buffer DMA, whether the packet is split at header
-	 * boundaries into two separate buffers.  In some cases
-	 * one routine or the other will perform better.
-	 */
-	if ((flags & I40E_PRIV_FLAGS_PS) &&
-	    !(pf->flags & I40E_FLAG_RX_PS_ENABLED)) {
-		pf->flags |= I40E_FLAG_RX_PS_ENABLED;
-		pf->flags &= ~I40E_FLAG_RX_1BUF_ENABLED;
-		reset_required = true;
-	} else if (!(flags & I40E_PRIV_FLAGS_PS) &&
-		   (pf->flags & I40E_FLAG_RX_PS_ENABLED)) {
-		pf->flags &= ~I40E_FLAG_RX_PS_ENABLED;
-		pf->flags |= I40E_FLAG_RX_1BUF_ENABLED;
-		reset_required = true;
-	}
-
 	if (flags & I40E_PRIV_FLAGS_LINKPOLL_FLAG)
 		pf->flags |= I40E_FLAG_LINK_POLLING_ENABLED;
 	else
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6308218..b5713ae 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2856,10 +2856,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	memset(&rx_ctx, 0, sizeof(rx_ctx));
 
 	ring->rx_buf_len = vsi->rx_buf_len;
-	ring->rx_hdr_len = vsi->rx_hdr_len;
 
 	rx_ctx.dbuff = ring->rx_buf_len >> I40E_RXQ_CTX_DBUFF_SHIFT;
-	rx_ctx.hbuff = ring->rx_hdr_len >> I40E_RXQ_CTX_HBUFF_SHIFT;
 
 	rx_ctx.base = (ring->dma / 128);
 	rx_ctx.qlen = ring->count;
@@ -2872,18 +2870,9 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	}
 
 	rx_ctx.dtype = vsi->dtype;
-	if (vsi->dtype) {
-		set_ring_ps_enabled(ring);
-		rx_ctx.hsplit_0 = I40E_RX_SPLIT_L2      |
-				  I40E_RX_SPLIT_IP      |
-				  I40E_RX_SPLIT_TCP_UDP |
-				  I40E_RX_SPLIT_SCTP;
-	} else {
-		rx_ctx.hsplit_0 = 0;
-	}
+	rx_ctx.hsplit_0 = 0;
 
-	rx_ctx.rxmax = min_t(u16, vsi->max_frame,
-				  (chain_len * ring->rx_buf_len));
+	rx_ctx.rxmax = min_t(u16, vsi->max_frame, chain_len * ring->rx_buf_len);
 	if (hw->revision_id == 0)
 		rx_ctx.lrxqthresh = 0;
 	else
@@ -2920,12 +2909,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
 	writel(0, ring->tail);
 
-	if (ring_is_ps_enabled(ring)) {
-		i40e_alloc_rx_headers(ring);
-		i40e_alloc_rx_buffers_ps(ring, I40E_DESC_UNUSED(ring));
-	} else {
-		i40e_alloc_rx_buffers_1buf(ring, I40E_DESC_UNUSED(ring));
-	}
+	i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
 
 	return 0;
 }
@@ -2964,31 +2948,13 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
 	else
 		vsi->max_frame = I40E_RXBUFFER_2048;
 
-	/* figure out correct receive buffer length */
-	switch (vsi->back->flags & (I40E_FLAG_RX_1BUF_ENABLED |
-				    I40E_FLAG_RX_PS_ENABLED)) {
-	case I40E_FLAG_RX_1BUF_ENABLED:
-		vsi->rx_hdr_len = 0;
-		vsi->rx_buf_len = vsi->max_frame;
-		vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
-		break;
-	case I40E_FLAG_RX_PS_ENABLED:
-		vsi->rx_hdr_len = I40E_RX_HDR_SIZE;
-		vsi->rx_buf_len = I40E_RXBUFFER_2048;
-		vsi->dtype = I40E_RX_DTYPE_HEADER_SPLIT;
-		break;
-	default:
-		vsi->rx_hdr_len = I40E_RX_HDR_SIZE;
-		vsi->rx_buf_len = I40E_RXBUFFER_2048;
-		vsi->dtype = I40E_RX_DTYPE_SPLIT_ALWAYS;
-		break;
-	}
+	vsi->rx_buf_len = I40E_RXBUFFER_2048;
+	vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
 
 #ifdef I40E_FCOE
 	/* setup rx buffer for FCoE */
 	if ((vsi->type == I40E_VSI_FCOE) &&
 	    (vsi->back->flags & I40E_FLAG_FCOE_ENABLED)) {
-		vsi->rx_hdr_len = 0;
 		vsi->rx_buf_len = I40E_RXBUFFER_3072;
 		vsi->max_frame = I40E_RXBUFFER_3072;
 		vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
@@ -2996,8 +2962,6 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
 
 #endif /* I40E_FCOE */
 	/* round up for the chip's needs */
-	vsi->rx_hdr_len = ALIGN(vsi->rx_hdr_len,
-				BIT_ULL(I40E_RXQ_CTX_HBUFF_SHIFT));
 	vsi->rx_buf_len = ALIGN(vsi->rx_buf_len,
 				BIT_ULL(I40E_RXQ_CTX_DBUFF_SHIFT));
 
@@ -8461,11 +8425,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
 		    I40E_FLAG_MSI_ENABLED     |
 		    I40E_FLAG_MSIX_ENABLED;
 
-	if (iommu_present(&pci_bus_type))
-		pf->flags |= I40E_FLAG_RX_PS_ENABLED;
-	else
-		pf->flags |= I40E_FLAG_RX_1BUF_ENABLED;
-
 	/* Set default ITR */
 	pf->rx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF;
 	pf->tx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF;
@@ -10691,11 +10650,9 @@ static void i40e_print_features(struct i40e_pf *pf)
 #ifdef CONFIG_PCI_IOV
 	i += snprintf(&buf[i], REMAIN(i), " VFs: %d", pf->num_req_vfs);
 #endif
-	i += snprintf(&buf[i], REMAIN(i), " VSIs: %d QP: %d RX: %s",
+	i += snprintf(&buf[i], REMAIN(i), " VSIs: %d QP: %d",
 		      pf->hw.func_caps.num_vsis,
-		      pf->vsi[pf->lan_vsi]->num_queue_pairs,
-		      pf->flags & I40E_FLAG_RX_PS_ENABLED ? "PS" : "1BUF");
-
+		      pf->vsi[pf->lan_vsi]->num_queue_pairs);
 	if (pf->flags & I40E_FLAG_RSS_ENABLED)
 		i += snprintf(&buf[i], REMAIN(i), " RSS");
 	if (pf->flags & I40E_FLAG_FD_ATR_ENABLED)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ce4d94b..2f50ab8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1042,7 +1042,6 @@ err:
 void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
-	struct i40e_rx_buffer *rx_bi;
 	unsigned long bi_size;
 	u16 i;
 
@@ -1050,48 +1049,22 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 	if (!rx_ring->rx_bi)
 		return;
 
-	if (ring_is_ps_enabled(rx_ring)) {
-		int bufsz = ALIGN(rx_ring->rx_hdr_len, 256) * rx_ring->count;
-
-		rx_bi = &rx_ring->rx_bi[0];
-		if (rx_bi->hdr_buf) {
-			dma_free_coherent(dev,
-					  bufsz,
-					  rx_bi->hdr_buf,
-					  rx_bi->dma);
-			for (i = 0; i < rx_ring->count; i++) {
-				rx_bi = &rx_ring->rx_bi[i];
-				rx_bi->dma = 0;
-				rx_bi->hdr_buf = NULL;
-			}
-		}
-	}
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
-		rx_bi = &rx_ring->rx_bi[i];
-		if (rx_bi->dma) {
-			dma_unmap_single(dev,
-					 rx_bi->dma,
-					 rx_ring->rx_buf_len,
-					 DMA_FROM_DEVICE);
-			rx_bi->dma = 0;
-		}
+		struct i40e_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
+
 		if (rx_bi->skb) {
 			dev_kfree_skb(rx_bi->skb);
 			rx_bi->skb = NULL;
 		}
-		if (rx_bi->page) {
-			if (rx_bi->page_dma) {
-				dma_unmap_page(dev,
-					       rx_bi->page_dma,
-					       PAGE_SIZE,
-					       DMA_FROM_DEVICE);
-				rx_bi->page_dma = 0;
-			}
-			__free_page(rx_bi->page);
-			rx_bi->page = NULL;
-			rx_bi->page_offset = 0;
-		}
+		if (!rx_bi->page)
+			continue;
+
+		dma_unmap_page(dev, rx_bi->dma, PAGE_SIZE, DMA_FROM_DEVICE);
+		__free_pages(rx_bi->page, 0);
+
+		rx_bi->page = NULL;
+		rx_bi->page_offset = 0;
 	}
 
 	bi_size = sizeof(struct i40e_rx_buffer) * rx_ring->count;
@@ -1100,6 +1073,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 	/* Zero out the descriptor ring */
 	memset(rx_ring->desc, 0, rx_ring->size);
 
+	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
 }
@@ -1124,37 +1098,6 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring)
 }
 
 /**
- * i40e_alloc_rx_headers - allocate rx header buffers
- * @rx_ring: ring to alloc buffers
- *
- * Allocate rx header buffers for the entire ring. As these are static,
- * this is only called when setting up a new ring.
- **/
-void i40e_alloc_rx_headers(struct i40e_ring *rx_ring)
-{
-	struct device *dev = rx_ring->dev;
-	struct i40e_rx_buffer *rx_bi;
-	dma_addr_t dma;
-	void *buffer;
-	int buf_size;
-	int i;
-
-	if (rx_ring->rx_bi[0].hdr_buf)
-		return;
-	/* Make sure the buffers don't cross cache line boundaries. */
-	buf_size = ALIGN(rx_ring->rx_hdr_len, 256);
-	buffer = dma_alloc_coherent(dev, buf_size * rx_ring->count,
-				    &dma, GFP_KERNEL);
-	if (!buffer)
-		return;
-	for (i = 0; i < rx_ring->count; i++) {
-		rx_bi = &rx_ring->rx_bi[i];
-		rx_bi->dma = dma + (i * buf_size);
-		rx_bi->hdr_buf = buffer + (i * buf_size);
-	}
-}
-
-/**
  * i40e_setup_rx_descriptors - Allocate Rx descriptors
  * @rx_ring: Rx descriptor ring (for a specific queue) to setup
  *
@@ -1175,9 +1118,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	u64_stats_init(&rx_ring->syncp);
 
 	/* Round up to nearest 4K */
-	rx_ring->size = ring_is_16byte_desc_enabled(rx_ring)
-		? rx_ring->count * sizeof(union i40e_16byte_rx_desc)
-		: rx_ring->count * sizeof(union i40e_32byte_rx_desc);
+	rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
 	rx_ring->size = ALIGN(rx_ring->size, 4096);
 	rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
 					   &rx_ring->dma, GFP_KERNEL);
@@ -1188,6 +1129,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 		goto err;
 	}
 
+	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
 
@@ -1206,6 +1148,10 @@ err:
 static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 {
 	rx_ring->next_to_use = val;
+
+	/* update next to alloc since we have filled the ring */
+	rx_ring->next_to_alloc = val;
+
 	/* Force memory writes to complete before letting h/w
 	 * know there are new descriptors to fetch.  (Only
 	 * applicable for weak-ordered memory model archs,
@@ -1216,160 +1162,122 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 }
 
 /**
- * i40e_alloc_rx_buffers_ps - Replace used receive buffers; packet split
- * @rx_ring: ring to place buffers on
- * @cleaned_count: number of buffers to replace
+ * i40e_alloc_mapped_page - recycle or make a new page
+ * @rx_ring: ring to use
+ * @bi: rx_buffer struct to modify
  *
- * Returns true if any errors on allocation
+ * Returns true if the page was successfully allocated or
+ * reused.
  **/
-bool i40e_alloc_rx_buffers_ps(struct i40e_ring *rx_ring, u16 cleaned_count)
+static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
+				   struct i40e_rx_buffer *bi)
 {
-	u16 i = rx_ring->next_to_use;
-	union i40e_rx_desc *rx_desc;
-	struct i40e_rx_buffer *bi;
-	const int current_node = numa_node_id();
+	struct page *page = bi->page;
+	dma_addr_t dma;
 
-	/* do nothing if no valid netdev defined */
-	if (!rx_ring->netdev || !cleaned_count)
-		return false;
+	/* since we are recycling buffers we should seldom need to alloc */
+	if (likely(page)) {
+		rx_ring->rx_stats.page_reuse_count++;
+		return true;
+	}
 
-	while (cleaned_count--) {
-		rx_desc = I40E_RX_DESC(rx_ring, i);
-		bi = &rx_ring->rx_bi[i];
+	/* alloc new page for storage */
+	page = dev_alloc_page();
+	if (unlikely(!page)) {
+		rx_ring->rx_stats.alloc_page_failed++;
+		return false;
+	}
 
-		if (bi->skb) /* desc is in use */
-			goto no_buffers;
+	/* map page for use */
+	dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
 
-	/* If we've been moved to a different NUMA node, release the
-	 * page so we can get a new one on the current node.
+	/* if mapping failed free memory back to system since
+	 * there isn't much point in holding memory we can't use
 	 */
-		if (bi->page &&  page_to_nid(bi->page) != current_node) {
-			dma_unmap_page(rx_ring->dev,
-				       bi->page_dma,
-				       PAGE_SIZE,
-				       DMA_FROM_DEVICE);
-			__free_page(bi->page);
-			bi->page = NULL;
-			bi->page_dma = 0;
-			rx_ring->rx_stats.realloc_count++;
-		} else if (bi->page) {
-			rx_ring->rx_stats.page_reuse_count++;
-		}
-
-		if (!bi->page) {
-			bi->page = alloc_page(GFP_ATOMIC);
-			if (!bi->page) {
-				rx_ring->rx_stats.alloc_page_failed++;
-				goto no_buffers;
-			}
-			bi->page_dma = dma_map_page(rx_ring->dev,
-						    bi->page,
-						    0,
-						    PAGE_SIZE,
-						    DMA_FROM_DEVICE);
-			if (dma_mapping_error(rx_ring->dev, bi->page_dma)) {
-				rx_ring->rx_stats.alloc_page_failed++;
-				__free_page(bi->page);
-				bi->page = NULL;
-				bi->page_dma = 0;
-				bi->page_offset = 0;
-				goto no_buffers;
-			}
-			bi->page_offset = 0;
-		}
-
-		/* Refresh the desc even if buffer_addrs didn't change
-		 * because each write-back erases this info.
-		 */
-		rx_desc->read.pkt_addr =
-				cpu_to_le64(bi->page_dma + bi->page_offset);
-		rx_desc->read.hdr_addr = cpu_to_le64(bi->dma);
-		i++;
-		if (i == rx_ring->count)
-			i = 0;
+	if (dma_mapping_error(rx_ring->dev, dma)) {
+		__free_pages(page, 0);
+		rx_ring->rx_stats.alloc_page_failed++;
+		return false;
 	}
 
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+	bi->dma = dma;
+	bi->page = page;
+	bi->page_offset = 0;
 
-	return false;
+	return true;
+}
 
-no_buffers:
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+/**
+ * i40e_receive_skb - Send a completed packet up the stack
+ * @rx_ring:  rx ring in play
+ * @skb: packet to send up
+ * @vlan_tag: vlan tag for packet
+ **/
+static void i40e_receive_skb(struct i40e_ring *rx_ring,
+			     struct sk_buff *skb, u16 vlan_tag)
+{
+	struct i40e_q_vector *q_vector = rx_ring->q_vector;
 
-	/* make sure to come back via polling to try again after
-	 * allocation failure
-	 */
-	return true;
+	if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
+	    (vlan_tag & VLAN_VID_MASK))
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
+
+	napi_gro_receive(&q_vector->napi, skb);
 }
 
 /**
- * i40e_alloc_rx_buffers_1buf - Replace used receive buffers; single buffer
+ * i40e_alloc_rx_buffers - Replace used receive buffers
  * @rx_ring: ring to place buffers on
  * @cleaned_count: number of buffers to replace
  *
- * Returns true if any errors on allocation
+ * Returns false if all allocations were successful, true if any fail
  **/
-bool i40e_alloc_rx_buffers_1buf(struct i40e_ring *rx_ring, u16 cleaned_count)
+bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count)
 {
-	u16 i = rx_ring->next_to_use;
+	u16 ntu = rx_ring->next_to_use;
 	union i40e_rx_desc *rx_desc;
 	struct i40e_rx_buffer *bi;
-	struct sk_buff *skb;
 
 	/* do nothing if no valid netdev defined */
 	if (!rx_ring->netdev || !cleaned_count)
 		return false;
 
-	while (cleaned_count--) {
-		rx_desc = I40E_RX_DESC(rx_ring, i);
-		bi = &rx_ring->rx_bi[i];
-		skb = bi->skb;
-
-		if (!skb) {
-			skb = __netdev_alloc_skb_ip_align(rx_ring->netdev,
-							  rx_ring->rx_buf_len,
-							  GFP_ATOMIC |
-							  __GFP_NOWARN);
-			if (!skb) {
-				rx_ring->rx_stats.alloc_buff_failed++;
-				goto no_buffers;
-			}
-			/* initialize queue mapping */
-			skb_record_rx_queue(skb, rx_ring->queue_index);
-			bi->skb = skb;
-		}
+	rx_desc = I40E_RX_DESC(rx_ring, ntu);
+	bi = &rx_ring->rx_bi[ntu];
 
-		if (!bi->dma) {
-			bi->dma = dma_map_single(rx_ring->dev,
-						 skb->data,
-						 rx_ring->rx_buf_len,
-						 DMA_FROM_DEVICE);
-			if (dma_mapping_error(rx_ring->dev, bi->dma)) {
-				rx_ring->rx_stats.alloc_buff_failed++;
-				bi->dma = 0;
-				dev_kfree_skb(bi->skb);
-				bi->skb = NULL;
-				goto no_buffers;
-			}
-		}
+	do {
+		if (!i40e_alloc_mapped_page(rx_ring, bi))
+			goto no_buffers;
 
-		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma);
+		/* Refresh the desc even if buffer_addrs didn't change
+		 * because each write-back erases this info.
+		 */
+		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
 		rx_desc->read.hdr_addr = 0;
-		i++;
-		if (i == rx_ring->count)
-			i = 0;
-	}
 
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+		rx_desc++;
+		bi++;
+		ntu++;
+		if (unlikely(ntu == rx_ring->count)) {
+			rx_desc = I40E_RX_DESC(rx_ring, 0);
+			bi = rx_ring->rx_bi;
+			ntu = 0;
+		}
+
+		/* clear the status bits for the next_to_use descriptor */
+		rx_desc->wb.qword1.status_error_len = 0;
+
+		cleaned_count--;
+	} while (cleaned_count);
+
+	if (rx_ring->next_to_use != ntu)
+		i40e_release_rx_desc(rx_ring, ntu);
 
 	return false;
 
 no_buffers:
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+	if (rx_ring->next_to_use != ntu)
+		i40e_release_rx_desc(rx_ring, ntu);
 
 	/* make sure to come back via polling to try again after
 	 * allocation failure
@@ -1378,42 +1286,35 @@ no_buffers:
 }
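
The refill helper is intended to be called in batches rather than once per
descriptor; a minimal caller sketch, mirroring the pattern used further down
in i40e_clean_rx_irq():

	if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
		failure = failure ||
			  i40e_alloc_rx_buffers(rx_ring, cleaned_count);
		cleaned_count = 0;
	}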
 
 /**
- * i40e_receive_skb - Send a completed packet up the stack
- * @rx_ring:  rx ring in play
- * @skb: packet to send up
- * @vlan_tag: vlan tag for packet
- **/
-static void i40e_receive_skb(struct i40e_ring *rx_ring,
-			     struct sk_buff *skb, u16 vlan_tag)
-{
-	struct i40e_q_vector *q_vector = rx_ring->q_vector;
-
-	if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
-	    (vlan_tag & VLAN_VID_MASK))
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
-
-	napi_gro_receive(&q_vector->napi, skb);
-}
-
-/**
  * i40e_rx_checksum - Indicate in skb if hw indicated a good cksum
  * @vsi: the VSI we care about
  * @skb: skb currently being received and modified
- * @rx_status: status value of last descriptor in packet
- * @rx_error: error value of last descriptor in packet
- * @rx_ptype: ptype value of last descriptor in packet
+ * @rx_desc: the receive descriptor
+ *
+ * skb->protocol must be set before this function is called
  **/
 static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 				    struct sk_buff *skb,
-				    u32 rx_status,
-				    u32 rx_error,
-				    u16 rx_ptype)
+				    union i40e_rx_desc *rx_desc)
 {
-	struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
+	struct i40e_rx_ptype_decoded decoded;
 	bool ipv4, ipv6, tunnel = false;
+	u32 rx_error, rx_status;
+	u8 ptype;
+	u64 qword;
+
+	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+	ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
+	rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
+		   I40E_RXD_QW1_ERROR_SHIFT;
+	rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
+		    I40E_RXD_QW1_STATUS_SHIFT;
+	decoded = decode_rx_desc_ptype(ptype);
 
 	skb->ip_summed = CHECKSUM_NONE;
 
+	skb_checksum_none_assert(skb);
+
 	/* Rx csum enabled and ip headers found? */
 	if (!(vsi->netdev->features & NETIF_F_RXCSUM))
 		return;
@@ -1479,7 +1380,7 @@ checksum_fail:
  *
  * Returns a hash type to be used by skb_set_hash
  **/
-static inline enum pkt_hash_types i40e_ptype_to_htype(u8 ptype)
+static inline int i40e_ptype_to_htype(u8 ptype)
 {
 	struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
 
@@ -1507,7 +1408,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
 				u8 rx_ptype)
 {
 	u32 hash;
-	const __le64 rss_mask  =
+	const __le64 rss_mask =
 		cpu_to_le64((u64)I40E_RX_DESC_FLTSTAT_RSS_HASH <<
 			    I40E_RX_DESC_STATUS_FLTSTAT_SHIFT);
 
@@ -1521,338 +1422,409 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
 }
 
 /**
- * i40e_clean_rx_irq_ps - Reclaim resources after receive; packet split
- * @rx_ring:  rx ring to clean
- * @budget:   how many cleans we're allowed
+ * i40e_process_skb_fields - Populate skb header fields from Rx descriptor
+ * @rx_ring: rx descriptor ring packet is being transacted on
+ * @rx_desc: pointer to the EOP Rx descriptor
+ * @skb: pointer to current skb being populated
+ * @rx_ptype: the packet type decoded by hardware
  *
- * Returns true if there's any budget left (e.g. the clean is finished)
+ * This function checks the ring, descriptor, and packet information in
+ * order to populate the hash, checksum, VLAN, protocol, and
+ * other fields within the skb.
  **/
-static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, const int budget)
+static inline
+void i40e_process_skb_fields(struct i40e_ring *rx_ring,
+			     union i40e_rx_desc *rx_desc, struct sk_buff *skb,
+			     u8 rx_ptype)
 {
-	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-	u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo;
-	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	struct i40e_vsi *vsi = rx_ring->vsi;
-	u16 i = rx_ring->next_to_clean;
-	union i40e_rx_desc *rx_desc;
-	u32 rx_error, rx_status;
-	bool failure = false;
-	u8 rx_ptype;
-	u64 qword;
-	u32 copysize;
+	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+	u32 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
+			I40E_RXD_QW1_STATUS_SHIFT;
+	u32 rsyn = (rx_status & I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
+		   I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT;
 
-	if (budget <= 0)
-		return 0;
+	if (unlikely(rsyn)) {
+		i40e_ptp_rx_hwtstamp(rx_ring->vsi->back, skb, rsyn);
+		rx_ring->last_rx_timestamp = jiffies;
+	}
 
-	do {
-		struct i40e_rx_buffer *rx_bi;
-		struct sk_buff *skb;
-		u16 vlan_tag;
-		/* return some buffers to hardware, one at a time is too slow */
-		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
-			failure = failure ||
-				  i40e_alloc_rx_buffers_ps(rx_ring,
-							   cleaned_count);
-			cleaned_count = 0;
-		}
+	i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
 
-		i = rx_ring->next_to_clean;
-		rx_desc = I40E_RX_DESC(rx_ring, i);
-		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
-		rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
-			I40E_RXD_QW1_STATUS_SHIFT;
+	/* modifies the skb - consumes the enet header */
+	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
-		if (!(rx_status & BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
-			break;
+	i40e_rx_checksum(rx_ring->vsi, skb, rx_desc);
+}
 
-		/* This memory barrier is needed to keep us from reading
-		 * any other fields out of the rx_desc until we know the
-		 * DD bit is set.
-		 */
-		dma_rmb();
-		/* sync header buffer for reading */
-		dma_sync_single_range_for_cpu(rx_ring->dev,
-					      rx_ring->rx_bi[0].dma,
-					      i * rx_ring->rx_hdr_len,
-					      rx_ring->rx_hdr_len,
-					      DMA_FROM_DEVICE);
-		if (i40e_rx_is_programming_status(qword)) {
-			i40e_clean_programming_status(rx_ring, rx_desc);
-			I40E_RX_INCREMENT(rx_ring, i);
-			continue;
-		}
-		rx_bi = &rx_ring->rx_bi[i];
-		skb = rx_bi->skb;
-		if (likely(!skb)) {
-			skb = __netdev_alloc_skb_ip_align(rx_ring->netdev,
-							  rx_ring->rx_hdr_len,
-							  GFP_ATOMIC |
-							  __GFP_NOWARN);
-			if (!skb) {
-				rx_ring->rx_stats.alloc_buff_failed++;
-				failure = true;
-				break;
-			}
+/**
+ * i40e_pull_tail - i40e specific version of skb_pull_tail
+ * @rx_ring: rx descriptor ring packet is being transacted on
+ * @skb: pointer to current skb being adjusted
+ *
+ * This function is an i40e specific version of __pskb_pull_tail.  The
+ * main difference between this version and the original function is that
+ * this function can make several assumptions about the state of things
+ * that allow for significant optimizations versus the standard function.
+ * As a result we can do things like drop a frag and maintain an accurate
+ * truesize for the skb.
+ */
+static void i40e_pull_tail(struct i40e_ring *rx_ring, struct sk_buff *skb)
+{
+	struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
+	unsigned char *va;
+	unsigned int pull_len;
 
-			/* initialize queue mapping */
-			skb_record_rx_queue(skb, rx_ring->queue_index);
-			/* we are reusing so sync this buffer for CPU use */
-			dma_sync_single_range_for_cpu(rx_ring->dev,
-						      rx_ring->rx_bi[0].dma,
-						      i * rx_ring->rx_hdr_len,
-						      rx_ring->rx_hdr_len,
-						      DMA_FROM_DEVICE);
-		}
-		rx_packet_len = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
-				I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
-		rx_header_len = (qword & I40E_RXD_QW1_LENGTH_HBUF_MASK) >>
-				I40E_RXD_QW1_LENGTH_HBUF_SHIFT;
-		rx_sph = (qword & I40E_RXD_QW1_LENGTH_SPH_MASK) >>
-			 I40E_RXD_QW1_LENGTH_SPH_SHIFT;
-
-		rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
-			   I40E_RXD_QW1_ERROR_SHIFT;
-		rx_hbo = rx_error & BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
-		rx_error &= ~BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
+	/* it is valid to use page_address instead of kmap since we are
+	 * working with pages allocated out of the lowmem pool per
+	 * alloc_page(GFP_ATOMIC)
+	 */
+	va = skb_frag_address(frag);
 
-		rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
-			   I40E_RXD_QW1_PTYPE_SHIFT;
-		/* sync half-page for reading */
-		dma_sync_single_range_for_cpu(rx_ring->dev,
-					      rx_bi->page_dma,
-					      rx_bi->page_offset,
-					      PAGE_SIZE / 2,
-					      DMA_FROM_DEVICE);
-		prefetch(page_address(rx_bi->page) + rx_bi->page_offset);
-		rx_bi->skb = NULL;
-		cleaned_count++;
-		copysize = 0;
-		if (rx_hbo || rx_sph) {
-			int len;
+	/* we need the header to contain the greater of either ETH_HLEN or
+	 * 60 bytes if the skb->len is less than 60 for skb_pad.
+	 */
+	pull_len = eth_get_headlen(va, I40E_RX_HDR_SIZE);
 
-			if (rx_hbo)
-				len = I40E_RX_HDR_SIZE;
-			else
-				len = rx_header_len;
-			memcpy(__skb_put(skb, len), rx_bi->hdr_buf, len);
-		} else if (skb->len == 0) {
-			int len;
-			unsigned char *va = page_address(rx_bi->page) +
-					    rx_bi->page_offset;
-
-			len = min(rx_packet_len, rx_ring->rx_hdr_len);
-			memcpy(__skb_put(skb, len), va, len);
-			copysize = len;
-			rx_packet_len -= len;
-		}
-		/* Get the rest of the data if this was a header split */
-		if (rx_packet_len) {
-			skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-					rx_bi->page,
-					rx_bi->page_offset + copysize,
-					rx_packet_len, I40E_RXBUFFER_2048);
-
-			/* If the page count is more than 2, then both halves
-			 * of the page are used and we need to free it. Do it
-			 * here instead of in the alloc code. Otherwise one
-			 * of the half-pages might be released between now and
-			 * then, and we wouldn't know which one to use.
-			 * Don't call get_page and free_page since those are
-			 * both expensive atomic operations that just change
-			 * the refcount in opposite directions. Just give the
-			 * page to the stack; he can have our refcount.
-			 */
-			if (page_count(rx_bi->page) > 2) {
-				dma_unmap_page(rx_ring->dev,
-					       rx_bi->page_dma,
-					       PAGE_SIZE,
-					       DMA_FROM_DEVICE);
-				rx_bi->page = NULL;
-				rx_bi->page_dma = 0;
-				rx_ring->rx_stats.realloc_count++;
-			} else {
-				get_page(rx_bi->page);
-				/* switch to the other half-page here; the
-				 * allocation code programs the right addr
-				 * into HW. If we haven't used this half-page,
-				 * the address won't be changed, and HW can
-				 * just use it next time through.
-				 */
-				rx_bi->page_offset ^= PAGE_SIZE / 2;
-			}
+	/* align pull length to size of long to optimize memcpy performance */
+	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
 
-		}
-		I40E_RX_INCREMENT(rx_ring, i);
+	/* update all of the pointers */
+	skb_frag_size_sub(frag, pull_len);
+	frag->page_offset += pull_len;
+	skb->data_len -= pull_len;
+	skb->tail += pull_len;
+}
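
As a rough illustration (not taken from this patch): for a plain IPv4/TCP
frame, eth_get_headlen() reports roughly ETH_HLEN + 20 + 20 = 54 bytes of
headers (more with options or VLAN tags), so only that much is copied into
skb->data and the payload stays in the page fragment:

	/* hypothetical values for a simple IPv4/TCP frame */
	pull_len = eth_get_headlen(va, I40E_RX_HDR_SIZE);	/* ~54 */
	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));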
 
-		if (unlikely(
-		    !(rx_status & BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))) {
-			struct i40e_rx_buffer *next_buffer;
+/**
+ * i40e_cleanup_headers - Correct empty headers
+ * @rx_ring: rx descriptor ring packet is being transacted on
+ * @skb: pointer to current skb being fixed
+ *
+ * Also address the case where we are pulling data in on pages only
+ * and as such no data is present in the skb header.
+ *
+ * In addition if skb is not at least 60 bytes we need to pad it so that
+ * it is large enough to qualify as a valid Ethernet frame.
+ *
+ * Returns true if an error was encountered and skb was freed.
+ **/
+static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb)
+{
+	/* place header in linear portion of buffer */
+	if (skb_is_nonlinear(skb))
+		i40e_pull_tail(rx_ring, skb);
 
-			next_buffer = &rx_ring->rx_bi[i];
-			next_buffer->skb = skb;
-			rx_ring->rx_stats.non_eop_descs++;
-			continue;
-		}
+	/* if eth_skb_pad returns an error the skb was freed */
+	if (eth_skb_pad(skb))
+		return true;
 
-		/* ERR_MASK will only have valid bits if EOP set */
-		if (unlikely(rx_error & BIT(I40E_RX_DESC_ERROR_RXE_SHIFT))) {
-			dev_kfree_skb_any(skb);
-			continue;
-		}
+	return false;
+}
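
Because eth_skb_pad() frees the skb on failure, a caller must not touch the
skb once this returns true; the clean routine below simply drops it and moves
on to the next descriptor:

	if (i40e_cleanup_headers(rx_ring, skb))
		continue;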
 
-		i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
+/**
+ * i40e_reuse_rx_page - page flip buffer and store it back on the ring
+ * @rx_ring: rx descriptor ring to store buffers on
+ * @old_buff: donor buffer to have page reused
+ *
+ * Synchronizes page for reuse by the adapter
+ **/
+static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
+			       struct i40e_rx_buffer *old_buff)
+{
+	struct i40e_rx_buffer *new_buff;
+	u16 nta = rx_ring->next_to_alloc;
 
-		if (unlikely(rx_status & I40E_RXD_QW1_STATUS_TSYNVALID_MASK)) {
-			i40e_ptp_rx_hwtstamp(vsi->back, skb, (rx_status &
-					   I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
-					   I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT);
-			rx_ring->last_rx_timestamp = jiffies;
-		}
+	new_buff = &rx_ring->rx_bi[nta];
 
-		/* probably a little skewed due to removing CRC */
-		total_rx_bytes += skb->len;
-		total_rx_packets++;
+	/* update, and store next to alloc */
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
+	/* transfer page from old buffer to new buffer */
+	*new_buff = *old_buff;
+}
 
-		i40e_rx_checksum(vsi, skb, rx_status, rx_error, rx_ptype);
+/**
+ * i40e_page_is_reserved - check if reuse is possible
+ * @page: page struct to check
+ */
+static inline bool i40e_page_is_reserved(struct page *page)
+{
+	return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
+}
 
-		vlan_tag = rx_status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)
-			 ? le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1)
-			 : 0;
-#ifdef I40E_FCOE
-		if (unlikely(
-		    i40e_rx_is_fcoe(rx_ptype) &&
-		    !i40e_fcoe_handle_offload(rx_ring, rx_desc, skb))) {
-			dev_kfree_skb_any(skb);
-			continue;
-		}
+/**
+ * i40e_add_rx_frag - Add contents of Rx buffer to sk_buff
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @rx_buffer: buffer containing page to add
+ * @rx_desc: descriptor containing length of buffer written by hardware
+ * @skb: sk_buff to place the data into
+ *
+ * This function will add the data contained in rx_buffer->page to the skb.
+ * This is done either through a direct copy if the data in the buffer is
+ * less than the skb header size, otherwise it will just attach the page as
+ * a frag to the skb.
+ *
+ * The function will then update the page offset if necessary and return
+ * true if the buffer can be reused by the adapter.
+ **/
+static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
+			     struct i40e_rx_buffer *rx_buffer,
+			     union i40e_rx_desc *rx_desc,
+			     struct sk_buff *skb)
+{
+	struct page *page = rx_buffer->page;
+	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+	unsigned int size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
+			    I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
+#if (PAGE_SIZE < 8192)
+	unsigned int truesize = I40E_RXBUFFER_2048;
+#else
+	unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+	unsigned int last_offset = PAGE_SIZE - I40E_RXBUFFER_2048;
 #endif
-		i40e_receive_skb(rx_ring, skb, vlan_tag);
 
-		rx_desc->wb.qword1.status_error_len = 0;
+	/* will the data fit in the skb we allocated? if so, just
+	 * copy it as it is pretty small anyway
+	 */
+	if ((size <= I40E_RX_HDR_SIZE) && !skb_is_nonlinear(skb)) {
+		unsigned char *va = page_address(page) + rx_buffer->page_offset;
 
-	} while (likely(total_rx_packets < budget));
+		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
 
-	u64_stats_update_begin(&rx_ring->syncp);
-	rx_ring->stats.packets += total_rx_packets;
-	rx_ring->stats.bytes += total_rx_bytes;
-	u64_stats_update_end(&rx_ring->syncp);
-	rx_ring->q_vector->rx.total_packets += total_rx_packets;
-	rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
+		/* page is not reserved, we can reuse buffer as-is */
+		if (likely(!i40e_page_is_reserved(page)))
+			return true;
 
-	return failure ? budget : total_rx_packets;
+		/* this page cannot be reused so discard it */
+		__free_pages(page, 0);
+		return false;
+	}
+
+	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+			rx_buffer->page_offset, size, truesize);
+
+	/* avoid re-using remote pages */
+	if (unlikely(i40e_page_is_reserved(page)))
+		return false;
+
+#if (PAGE_SIZE < 8192)
+	/* if we are only owner of page we can reuse it */
+	if (unlikely(page_count(page) != 1))
+		return false;
+
+	/* flip page offset to other buffer */
+	rx_buffer->page_offset ^= truesize;
+#else
+	/* move offset up to the next cache line */
+	rx_buffer->page_offset += truesize;
+
+	if (rx_buffer->page_offset > last_offset)
+		return false;
+#endif
+
+	/* Even if we own the page, we are not allowed to use atomic_set()
+	 * This would break get_page_unless_zero() users.
+	 */
+	get_page(rx_buffer->page);
+
+	return true;
 }
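
With 4K pages the XOR above just toggles page_offset between the two 2K
halves of one page, so hardware and the stack are never handed the same half
at the same time. A tiny sketch of the arithmetic (illustration only, not
driver code):

	unsigned int offset = 0;

	offset ^= I40E_RXBUFFER_2048;	/* first flip:  offset == 2048 */
	offset ^= I40E_RXBUFFER_2048;	/* second flip: offset == 0 again */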
 
 /**
- * i40e_clean_rx_irq_1buf - Reclaim resources after receive; single buffer
- * @rx_ring:  rx ring to clean
- * @budget:   how many cleans we're allowed
+ * i40e_fetch_rx_buffer - Allocate skb and populate it
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @rx_desc: descriptor containing info written by hardware
  *
- * Returns number of packets cleaned
+ * This function allocates an skb on the fly, and populates it with the page
+ * data from the current receive descriptor, taking care to set up the skb
+ * correctly, as well as handling calling the page recycle function if
+ * necessary.
+ */
+static inline
+struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
+				     union i40e_rx_desc *rx_desc)
+{
+	struct i40e_rx_buffer *rx_buffer;
+	struct sk_buff *skb;
+	struct page *page;
+
+	rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
+	page = rx_buffer->page;
+	prefetchw(page);
+
+	skb = rx_buffer->skb;
+
+	if (likely(!skb)) {
+		void *page_addr = page_address(page) + rx_buffer->page_offset;
+
+		/* prefetch first cache line of first page */
+		prefetch(page_addr);
+#if L1_CACHE_BYTES < 128
+		prefetch(page_addr + L1_CACHE_BYTES);
+#endif
+
+		/* allocate a skb to store the frags */
+		skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
+				       I40E_RX_HDR_SIZE,
+				       GFP_ATOMIC | __GFP_NOWARN);
+		if (unlikely(!skb)) {
+			rx_ring->rx_stats.alloc_buff_failed++;
+			return NULL;
+		}
+
+		/* we will be copying header into skb->data in
+		 * pskb_may_pull so it is in our interest to prefetch
+		 * it now to avoid a possible cache miss
+		 */
+		prefetchw(skb->data);
+
+		skb_record_rx_queue(skb, rx_ring->queue_index);
+	} else {
+		/* we are reusing so sync this buffer for CPU use */
+		dma_sync_single_range_for_cpu(rx_ring->dev,
+					      rx_buffer->dma,
+					      rx_buffer->page_offset,
+					      I40E_RXBUFFER_2048,
+					      DMA_FROM_DEVICE);
+
+		rx_buffer->skb = NULL;
+	}
+
+	/* pull page into skb */
+	if (i40e_add_rx_frag(rx_ring, rx_buffer, rx_desc, skb)) {
+		/* hand second half of page back to the ring */
+		i40e_reuse_rx_page(rx_ring, rx_buffer);
+		rx_ring->rx_stats.page_reuse_count++;
+	} else {
+		/* we are not reusing the buffer so unmap it */
+		dma_unmap_page(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
+			       DMA_FROM_DEVICE);
+	}
+
+	/* clear contents of buffer_info */
+	rx_buffer->page = NULL;
+
+	return skb;
+}
+
+/**
+ * i40e_is_non_eop - process handling of non-EOP buffers
+ * @rx_ring: Rx ring being processed
+ * @rx_desc: Rx descriptor for current buffer
+ * @skb: Current socket buffer containing buffer in progress
+ *
+ * This function updates next to clean.  If the buffer is an EOP buffer
+ * this function exits returning false, otherwise it will place the
+ * sk_buff in the next buffer to be chained and return true indicating
+ * that this is in fact a non-EOP buffer.
+ **/
+static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
+			    union i40e_rx_desc *rx_desc,
+			    struct sk_buff *skb)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	/* fetch, update, and store next to clean */
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+
+	prefetch(I40E_RX_DESC(rx_ring, ntc));
+
+#define staterrlen rx_desc->wb.qword1.status_error_len
+	if (unlikely(i40e_rx_is_programming_status(le64_to_cpu(staterrlen)))) {
+		i40e_clean_programming_status(rx_ring, rx_desc);
+		rx_ring->rx_bi[ntc].skb = skb;
+		return true;
+	}
+	/* if we are the last buffer then there is nothing else to do */
+#define I40E_RXD_EOF BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)
+	if (likely(i40e_test_staterr(rx_desc, I40E_RXD_EOF)))
+		return false;
+
+	/* place skb in next buffer to be received */
+	rx_ring->rx_bi[ntc].skb = skb;
+	rx_ring->rx_stats.non_eop_descs++;
+
+	return true;
+}
+
+/**
+ * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @budget: Total limit on number of packets to process
+ *
+ * This function provides a "bounce buffer" approach to Rx interrupt
+ * processing.  The advantage to this is that on systems that have
+ * expensive overhead for IOMMU access this provides a means of avoiding
+ * it by maintaining the mapping of the page to the system.
+ *
+ * Returns amount of work completed
  **/
-static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
+static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	struct i40e_vsi *vsi = rx_ring->vsi;
-	union i40e_rx_desc *rx_desc;
-	u32 rx_error, rx_status;
-	u16 rx_packet_len;
 	bool failure = false;
-	u8 rx_ptype;
-	u64 qword;
-	u16 i;
 
-	do {
-		struct i40e_rx_buffer *rx_bi;
+	while (likely(total_rx_packets < budget)) {
+		union i40e_rx_desc *rx_desc;
 		struct sk_buff *skb;
+		u32 rx_status;
 		u16 vlan_tag;
+		u8 rx_ptype;
+		u64 qword;
+
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
 			failure = failure ||
-				  i40e_alloc_rx_buffers_1buf(rx_ring,
-							     cleaned_count);
+				  i40e_alloc_rx_buffers(rx_ring, cleaned_count);
 			cleaned_count = 0;
 		}
 
-		i = rx_ring->next_to_clean;
-		rx_desc = I40E_RX_DESC(rx_ring, i);
+		rx_desc = I40E_RX_DESC(rx_ring, rx_ring->next_to_clean);
+
 		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+		rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
+			   I40E_RXD_QW1_PTYPE_SHIFT;
 		rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
-			I40E_RXD_QW1_STATUS_SHIFT;
+			    I40E_RXD_QW1_STATUS_SHIFT;
 
 		if (!(rx_status & BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
 			break;
 
+		/* status_error_len will always be zero for unused descriptors
+		 * because it's cleared in cleanup, and overlaps with hdr_addr
+		 * which is always zero because packet split isn't used, if the
+		 * hardware wrote DD then it will be non-zero
+		 */
+		if (!rx_desc->wb.qword1.status_error_len)
+			break;
+
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
 		 * DD bit is set.
 		 */
 		dma_rmb();
 
-		if (i40e_rx_is_programming_status(qword)) {
-			i40e_clean_programming_status(rx_ring, rx_desc);
-			I40E_RX_INCREMENT(rx_ring, i);
-			continue;
-		}
-		rx_bi = &rx_ring->rx_bi[i];
-		skb = rx_bi->skb;
-		prefetch(skb->data);
-
-		rx_packet_len = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
-				I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
-
-		rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
-			   I40E_RXD_QW1_ERROR_SHIFT;
-		rx_error &= ~BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
+		skb = i40e_fetch_rx_buffer(rx_ring, rx_desc);
+		if (!skb)
+			break;
 
-		rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
-			   I40E_RXD_QW1_PTYPE_SHIFT;
-		rx_bi->skb = NULL;
 		cleaned_count++;
 
-		/* Get the header and possibly the whole packet
-		 * If this is an skb from previous receive dma will be 0
-		 */
-		skb_put(skb, rx_packet_len);
-		dma_unmap_single(rx_ring->dev, rx_bi->dma, rx_ring->rx_buf_len,
-				 DMA_FROM_DEVICE);
-		rx_bi->dma = 0;
-
-		I40E_RX_INCREMENT(rx_ring, i);
-
-		if (unlikely(
-		    !(rx_status & BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))) {
-			rx_ring->rx_stats.non_eop_descs++;
+		if (i40e_is_non_eop(rx_ring, rx_desc, skb))
 			continue;
-		}
 
-		/* ERR_MASK will only have valid bits if EOP set */
-		if (unlikely(rx_error & BIT(I40E_RX_DESC_ERROR_RXE_SHIFT))) {
-			dev_kfree_skb_any(skb);
+		if (i40e_cleanup_headers(rx_ring, skb))
 			continue;
-		}
-
-		i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
-		if (unlikely(rx_status & I40E_RXD_QW1_STATUS_TSYNVALID_MASK)) {
-			i40e_ptp_rx_hwtstamp(vsi->back, skb, (rx_status &
-					   I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
-					   I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT);
-			rx_ring->last_rx_timestamp = jiffies;
-		}
 
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
-		total_rx_packets++;
 
-		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
+		/* populate checksum, VLAN, and protocol */
+		i40e_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
 
-		i40e_rx_checksum(vsi, skb, rx_status, rx_error, rx_ptype);
-
-		vlan_tag = rx_status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)
-			 ? le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1)
-			 : 0;
 #ifdef I40E_FCOE
 		if (unlikely(
 		    i40e_rx_is_fcoe(rx_ptype) &&
@@ -1861,10 +1833,15 @@ static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
 			continue;
 		}
 #endif
+
+		vlan_tag = (qword & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
+			   le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1) : 0;
+
 		i40e_receive_skb(rx_ring, skb, vlan_tag);
 
-		rx_desc->wb.qword1.status_error_len = 0;
-	} while (likely(total_rx_packets < budget));
+		/* update budget accounting */
+		total_rx_packets++;
+	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
 	rx_ring->stats.packets += total_rx_packets;
@@ -1873,6 +1850,7 @@ static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
 	rx_ring->q_vector->rx.total_packets += total_rx_packets;
 	rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
 
+	/* guarantee a trip back through this routine if there was a failure */
 	return failure ? budget : total_rx_packets;
 }
 
@@ -2017,12 +1995,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
 	budget_per_ring = max(budget/q_vector->num_ringpairs, 1);
 
 	i40e_for_each_ring(ring, q_vector->rx) {
-		int cleaned;
-
-		if (ring_is_ps_enabled(ring))
-			cleaned = i40e_clean_rx_irq_ps(ring, budget_per_ring);
-		else
-			cleaned = i40e_clean_rx_irq_1buf(ring, budget_per_ring);
+		int cleaned = i40e_clean_rx_irq(ring, budget_per_ring);
 
 		work_done += cleaned;
 		/* if we clean as many as budgeted, we must not be done */
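
As a worked example (numbers illustrative only): with a NAPI budget of 64 and
four ring pairs on the vector, budget_per_ring = max(64 / 4, 1) = 16, so
i40e_clean_rx_irq() may process at most 16 packets per Rx ring per poll.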
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 6b2b191..54ddbd4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -102,8 +102,8 @@ enum i40e_dyn_idx_t {
 	(((pf)->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
 	  I40E_DEFAULT_RSS_HENA_EXPANDED : I40E_DEFAULT_RSS_HENA)
 
-/* Supported Rx Buffer Sizes */
-#define I40E_RXBUFFER_512   512    /* Used for packet split */
+/* Supported Rx Buffer Sizes (a multiple of 128) */
+#define I40E_RXBUFFER_256   256
 #define I40E_RXBUFFER_2048  2048
 #define I40E_RXBUFFER_3072  3072   /* For FCoE MTU of 2158 */
 #define I40E_RXBUFFER_4096  4096
@@ -114,9 +114,28 @@ enum i40e_dyn_idx_t {
  * reserve 2 more, and skb_shared_info adds an additional 384 bytes more,
  * this adds up to 512 bytes of extra data meaning the smallest allocation
  * we could have is 1K.
- * i.e. RXBUFFER_512 --> size-1024 slab
+ * i.e. RXBUFFER_256 --> 960 byte skb (size-1024 slab)
+ * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
  */
-#define I40E_RX_HDR_SIZE  I40E_RXBUFFER_512
+#define I40E_RX_HDR_SIZE I40E_RXBUFFER_256
+#define i40e_rx_desc i40e_32byte_rx_desc
+
+/**
+ * i40e_test_staterr - tests bits in Rx descriptor status and error fields
+ * @rx_desc: pointer to receive descriptor (in le64 format)
+ * @stat_err_bits: value to mask
+ *
+ * This function does some fast chicanery in order to return the
+ * value of the mask which is really only used for boolean tests.
+ * The status_error_len doesn't need to be shifted because it begins
+ * at offset zero.
+ */
+static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,
+				     const u64 stat_err_bits)
+{
+	return !!(rx_desc->wb.qword1.status_error_len &
+		  cpu_to_le64(stat_err_bits));
+}
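
A typical use, mirroring i40e_is_non_eop() in i40e_txrx.c, is testing the EOP
bit without any shifting:

	if (i40e_test_staterr(rx_desc, BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))
		return false;	/* last buffer of the packet */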
 
 /* How many Rx Buffers do we bundle into one write to the hardware ? */
 #define I40E_RX_BUFFER_WRITE	16	/* Must be power of 2 */
@@ -142,8 +161,6 @@ enum i40e_dyn_idx_t {
 		prefetch((n));				\
 	} while (0)
 
-#define i40e_rx_desc i40e_32byte_rx_desc
-
 #define I40E_MAX_BUFFER_TXD	8
 #define I40E_MIN_TX_LEN		17
 
@@ -213,10 +230,8 @@ struct i40e_tx_buffer {
 
 struct i40e_rx_buffer {
 	struct sk_buff *skb;
-	void *hdr_buf;
 	dma_addr_t dma;
 	struct page *page;
-	dma_addr_t page_dma;
 	unsigned int page_offset;
 };
 
@@ -245,16 +260,10 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
-	__I40E_RX_PS_ENABLED,
+	__UNUSED,
 	__I40E_RX_16BYTE_DESC_ENABLED,
 };
 
-#define ring_is_ps_enabled(ring) \
-	test_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
-#define set_ring_ps_enabled(ring) \
-	set_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
-#define clear_ring_ps_enabled(ring) \
-	clear_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
 #define ring_is_16byte_desc_enabled(ring) \
 	test_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
 #define set_ring_16byte_desc_enabled(ring) \
@@ -287,7 +296,6 @@ struct i40e_ring {
 
 	u16 count;			/* Number of descriptors */
 	u16 reg_idx;			/* HW register index of the ring */
-	u16 rx_hdr_len;
 	u16 rx_buf_len;
 	u8  dtype;
 #define I40E_RX_DTYPE_NO_SPLIT      0
@@ -330,6 +338,7 @@ struct i40e_ring {
 	struct i40e_q_vector *q_vector;	/* Backreference to associated vector */
 
 	struct rcu_head rcu;		/* to avoid race on free */
+	u16 next_to_alloc;
 } ____cacheline_internodealigned_in_smp;
 
 enum i40e_latency_range {
@@ -353,9 +362,7 @@ struct i40e_ring_container {
 #define i40e_for_each_ring(pos, head) \
 	for (pos = (head).ring; pos != NULL; pos = pos->next)
 
-bool i40e_alloc_rx_buffers_ps(struct i40e_ring *rxr, u16 cleaned_count);
-bool i40e_alloc_rx_buffers_1buf(struct i40e_ring *rxr, u16 cleaned_count);
-void i40e_alloc_rx_headers(struct i40e_ring *rxr);
+bool i40e_alloc_rx_buffers(struct i40e_ring *rxr, u16 cleaned_count);
 netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
 void i40e_clean_tx_ring(struct i40e_ring *tx_ring);
 void i40e_clean_rx_ring(struct i40e_ring *rx_ring);
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index cf42f16..6435347 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -496,7 +496,6 @@ err:
 void i40evf_clean_rx_ring(struct i40e_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
-	struct i40e_rx_buffer *rx_bi;
 	unsigned long bi_size;
 	u16 i;
 
@@ -504,48 +503,22 @@ void i40evf_clean_rx_ring(struct i40e_ring *rx_ring)
 	if (!rx_ring->rx_bi)
 		return;
 
-	if (ring_is_ps_enabled(rx_ring)) {
-		int bufsz = ALIGN(rx_ring->rx_hdr_len, 256) * rx_ring->count;
-
-		rx_bi = &rx_ring->rx_bi[0];
-		if (rx_bi->hdr_buf) {
-			dma_free_coherent(dev,
-					  bufsz,
-					  rx_bi->hdr_buf,
-					  rx_bi->dma);
-			for (i = 0; i < rx_ring->count; i++) {
-				rx_bi = &rx_ring->rx_bi[i];
-				rx_bi->dma = 0;
-				rx_bi->hdr_buf = NULL;
-			}
-		}
-	}
 	/* Free all the Rx ring sk_buffs */
 	for (i = 0; i < rx_ring->count; i++) {
-		rx_bi = &rx_ring->rx_bi[i];
-		if (rx_bi->dma) {
-			dma_unmap_single(dev,
-					 rx_bi->dma,
-					 rx_ring->rx_buf_len,
-					 DMA_FROM_DEVICE);
-			rx_bi->dma = 0;
-		}
+		struct i40e_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
+
 		if (rx_bi->skb) {
 			dev_kfree_skb(rx_bi->skb);
 			rx_bi->skb = NULL;
 		}
-		if (rx_bi->page) {
-			if (rx_bi->page_dma) {
-				dma_unmap_page(dev,
-					       rx_bi->page_dma,
-					       PAGE_SIZE,
-					       DMA_FROM_DEVICE);
-				rx_bi->page_dma = 0;
-			}
-			__free_page(rx_bi->page);
-			rx_bi->page = NULL;
-			rx_bi->page_offset = 0;
-		}
+		if (!rx_bi->page)
+			continue;
+
+		dma_unmap_page(dev, rx_bi->dma, PAGE_SIZE, DMA_FROM_DEVICE);
+		__free_pages(rx_bi->page, 0);
+
+		rx_bi->page = NULL;
+		rx_bi->page_offset = 0;
 	}
 
 	bi_size = sizeof(struct i40e_rx_buffer) * rx_ring->count;
@@ -554,6 +527,7 @@ void i40evf_clean_rx_ring(struct i40e_ring *rx_ring)
 	/* Zero out the descriptor ring */
 	memset(rx_ring->desc, 0, rx_ring->size);
 
+	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
 }
@@ -578,37 +552,6 @@ void i40evf_free_rx_resources(struct i40e_ring *rx_ring)
 }
 
 /**
- * i40evf_alloc_rx_headers - allocate rx header buffers
- * @rx_ring: ring to alloc buffers
- *
- * Allocate rx header buffers for the entire ring. As these are static,
- * this is only called when setting up a new ring.
- **/
-void i40evf_alloc_rx_headers(struct i40e_ring *rx_ring)
-{
-	struct device *dev = rx_ring->dev;
-	struct i40e_rx_buffer *rx_bi;
-	dma_addr_t dma;
-	void *buffer;
-	int buf_size;
-	int i;
-
-	if (rx_ring->rx_bi[0].hdr_buf)
-		return;
-	/* Make sure the buffers don't cross cache line boundaries. */
-	buf_size = ALIGN(rx_ring->rx_hdr_len, 256);
-	buffer = dma_alloc_coherent(dev, buf_size * rx_ring->count,
-				    &dma, GFP_KERNEL);
-	if (!buffer)
-		return;
-	for (i = 0; i < rx_ring->count; i++) {
-		rx_bi = &rx_ring->rx_bi[i];
-		rx_bi->dma = dma + (i * buf_size);
-		rx_bi->hdr_buf = buffer + (i * buf_size);
-	}
-}
-
-/**
  * i40evf_setup_rx_descriptors - Allocate Rx descriptors
  * @rx_ring: Rx descriptor ring (for a specific queue) to setup
  *
@@ -629,9 +572,7 @@ int i40evf_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	u64_stats_init(&rx_ring->syncp);
 
 	/* Round up to nearest 4K */
-	rx_ring->size = ring_is_16byte_desc_enabled(rx_ring)
-		? rx_ring->count * sizeof(union i40e_16byte_rx_desc)
-		: rx_ring->count * sizeof(union i40e_32byte_rx_desc);
+	rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
 	rx_ring->size = ALIGN(rx_ring->size, 4096);
 	rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
 					   &rx_ring->dma, GFP_KERNEL);
@@ -642,6 +583,7 @@ int i40evf_setup_rx_descriptors(struct i40e_ring *rx_ring)
 		goto err;
 	}
 
+	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
 
@@ -660,6 +602,10 @@ err:
 static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 {
 	rx_ring->next_to_use = val;
+
+	/* update next to alloc since we have filled the ring */
+	rx_ring->next_to_alloc = val;
+
 	/* Force memory writes to complete before letting h/w
 	 * know there are new descriptors to fetch.  (Only
 	 * applicable for weak-ordered memory model archs,
@@ -670,160 +616,122 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
 }
 
 /**
- * i40evf_alloc_rx_buffers_ps - Replace used receive buffers; packet split
- * @rx_ring: ring to place buffers on
- * @cleaned_count: number of buffers to replace
+ * i40e_alloc_mapped_page - recycle or make a new page
+ * @rx_ring: ring to use
+ * @bi: rx_buffer struct to modify
  *
- * Returns true if any errors on allocation
+ * Returns true if the page was successfully allocated or
+ * reused.
  **/
-bool i40evf_alloc_rx_buffers_ps(struct i40e_ring *rx_ring, u16 cleaned_count)
+static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
+				   struct i40e_rx_buffer *bi)
 {
-	u16 i = rx_ring->next_to_use;
-	union i40e_rx_desc *rx_desc;
-	struct i40e_rx_buffer *bi;
-	const int current_node = numa_node_id();
+	struct page *page = bi->page;
+	dma_addr_t dma;
 
-	/* do nothing if no valid netdev defined */
-	if (!rx_ring->netdev || !cleaned_count)
-		return false;
+	/* since we are recycling buffers we should seldom need to alloc */
+	if (likely(page)) {
+		rx_ring->rx_stats.page_reuse_count++;
+		return true;
+	}
 
-	while (cleaned_count--) {
-		rx_desc = I40E_RX_DESC(rx_ring, i);
-		bi = &rx_ring->rx_bi[i];
+	/* alloc new page for storage */
+	page = dev_alloc_page();
+	if (unlikely(!page)) {
+		rx_ring->rx_stats.alloc_page_failed++;
+		return false;
+	}
 
-		if (bi->skb) /* desc is in use */
-			goto no_buffers;
+	/* map page for use */
+	dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
 
-	/* If we've been moved to a different NUMA node, release the
-	 * page so we can get a new one on the current node.
+	/* if mapping failed free memory back to system since
+	 * there isn't much point in holding memory we can't use
 	 */
-		if (bi->page &&  page_to_nid(bi->page) != current_node) {
-			dma_unmap_page(rx_ring->dev,
-				       bi->page_dma,
-				       PAGE_SIZE,
-				       DMA_FROM_DEVICE);
-			__free_page(bi->page);
-			bi->page = NULL;
-			bi->page_dma = 0;
-			rx_ring->rx_stats.realloc_count++;
-		} else if (bi->page) {
-			rx_ring->rx_stats.page_reuse_count++;
-		}
-
-		if (!bi->page) {
-			bi->page = alloc_page(GFP_ATOMIC);
-			if (!bi->page) {
-				rx_ring->rx_stats.alloc_page_failed++;
-				goto no_buffers;
-			}
-			bi->page_dma = dma_map_page(rx_ring->dev,
-						    bi->page,
-						    0,
-						    PAGE_SIZE,
-						    DMA_FROM_DEVICE);
-			if (dma_mapping_error(rx_ring->dev, bi->page_dma)) {
-				rx_ring->rx_stats.alloc_page_failed++;
-				__free_page(bi->page);
-				bi->page = NULL;
-				bi->page_dma = 0;
-				bi->page_offset = 0;
-				goto no_buffers;
-			}
-			bi->page_offset = 0;
-		}
-
-		/* Refresh the desc even if buffer_addrs didn't change
-		 * because each write-back erases this info.
-		 */
-		rx_desc->read.pkt_addr =
-				cpu_to_le64(bi->page_dma + bi->page_offset);
-		rx_desc->read.hdr_addr = cpu_to_le64(bi->dma);
-		i++;
-		if (i == rx_ring->count)
-			i = 0;
+	if (dma_mapping_error(rx_ring->dev, dma)) {
+		__free_pages(page, 0);
+		rx_ring->rx_stats.alloc_page_failed++;
+		return false;
 	}
 
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+	bi->dma = dma;
+	bi->page = page;
+	bi->page_offset = 0;
 
-	return false;
+	return true;
+}
 
-no_buffers:
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+/**
+ * i40e_receive_skb - Send a completed packet up the stack
+ * @rx_ring:  rx ring in play
+ * @skb: packet to send up
+ * @vlan_tag: vlan tag for packet
+ **/
+static void i40e_receive_skb(struct i40e_ring *rx_ring,
+			     struct sk_buff *skb, u16 vlan_tag)
+{
+	struct i40e_q_vector *q_vector = rx_ring->q_vector;
 
-	/* make sure to come back via polling to try again after
-	 * allocation failure
-	 */
-	return true;
+	if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
+	    (vlan_tag & VLAN_VID_MASK))
+		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
+
+	napi_gro_receive(&q_vector->napi, skb);
 }
 
 /**
- * i40evf_alloc_rx_buffers_1buf - Replace used receive buffers; single buffer
+ * i40evf_alloc_rx_buffers - Replace used receive buffers
  * @rx_ring: ring to place buffers on
  * @cleaned_count: number of buffers to replace
  *
- * Returns true if any errors on allocation
+ * Returns false if all allocations were successful, true if any fail
  **/
-bool i40evf_alloc_rx_buffers_1buf(struct i40e_ring *rx_ring, u16 cleaned_count)
+bool i40evf_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count)
 {
-	u16 i = rx_ring->next_to_use;
+	u16 ntu = rx_ring->next_to_use;
 	union i40e_rx_desc *rx_desc;
 	struct i40e_rx_buffer *bi;
-	struct sk_buff *skb;
 
 	/* do nothing if no valid netdev defined */
 	if (!rx_ring->netdev || !cleaned_count)
 		return false;
 
-	while (cleaned_count--) {
-		rx_desc = I40E_RX_DESC(rx_ring, i);
-		bi = &rx_ring->rx_bi[i];
-		skb = bi->skb;
-
-		if (!skb) {
-			skb = __netdev_alloc_skb_ip_align(rx_ring->netdev,
-							  rx_ring->rx_buf_len,
-							  GFP_ATOMIC |
-							  __GFP_NOWARN);
-			if (!skb) {
-				rx_ring->rx_stats.alloc_buff_failed++;
-				goto no_buffers;
-			}
-			/* initialize queue mapping */
-			skb_record_rx_queue(skb, rx_ring->queue_index);
-			bi->skb = skb;
-		}
+	rx_desc = I40E_RX_DESC(rx_ring, ntu);
+	bi = &rx_ring->rx_bi[ntu];
 
-		if (!bi->dma) {
-			bi->dma = dma_map_single(rx_ring->dev,
-						 skb->data,
-						 rx_ring->rx_buf_len,
-						 DMA_FROM_DEVICE);
-			if (dma_mapping_error(rx_ring->dev, bi->dma)) {
-				rx_ring->rx_stats.alloc_buff_failed++;
-				bi->dma = 0;
-				dev_kfree_skb(bi->skb);
-				bi->skb = NULL;
-				goto no_buffers;
-			}
-		}
+	do {
+		if (!i40e_alloc_mapped_page(rx_ring, bi))
+			goto no_buffers;
 
-		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma);
+		/* Refresh the desc even if buffer_addrs didn't change
+		 * because each write-back erases this info.
+		 */
+		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
 		rx_desc->read.hdr_addr = 0;
-		i++;
-		if (i == rx_ring->count)
-			i = 0;
-	}
 
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+		rx_desc++;
+		bi++;
+		ntu++;
+		if (unlikely(ntu == rx_ring->count)) {
+			rx_desc = I40E_RX_DESC(rx_ring, 0);
+			bi = rx_ring->rx_bi;
+			ntu = 0;
+		}
+
+		/* clear the status bits for the next_to_use descriptor */
+		rx_desc->wb.qword1.status_error_len = 0;
+
+		cleaned_count--;
+	} while (cleaned_count);
+
+	if (rx_ring->next_to_use != ntu)
+		i40e_release_rx_desc(rx_ring, ntu);
 
 	return false;
 
 no_buffers:
-	if (rx_ring->next_to_use != i)
-		i40e_release_rx_desc(rx_ring, i);
+	if (rx_ring->next_to_use != ntu)
+		i40e_release_rx_desc(rx_ring, ntu);
 
 	/* make sure to come back via polling to try again after
 	 * allocation failure
@@ -832,42 +740,35 @@ no_buffers:
 }
 
 /**
- * i40e_receive_skb - Send a completed packet up the stack
- * @rx_ring:  rx ring in play
- * @skb: packet to send up
- * @vlan_tag: vlan tag for packet
- **/
-static void i40e_receive_skb(struct i40e_ring *rx_ring,
-			     struct sk_buff *skb, u16 vlan_tag)
-{
-	struct i40e_q_vector *q_vector = rx_ring->q_vector;
-
-	if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
-	    (vlan_tag & VLAN_VID_MASK))
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
-
-	napi_gro_receive(&q_vector->napi, skb);
-}
-
-/**
  * i40e_rx_checksum - Indicate in skb if hw indicated a good cksum
  * @vsi: the VSI we care about
  * @skb: skb currently being received and modified
- * @rx_status: status value of last descriptor in packet
- * @rx_error: error value of last descriptor in packet
- * @rx_ptype: ptype value of last descriptor in packet
+ * @rx_desc: the receive descriptor
+ *
+ * skb->protocol must be set before this function is called
  **/
 static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 				    struct sk_buff *skb,
-				    u32 rx_status,
-				    u32 rx_error,
-				    u16 rx_ptype)
+				    union i40e_rx_desc *rx_desc)
 {
-	struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
+	struct i40e_rx_ptype_decoded decoded;
 	bool ipv4, ipv6, tunnel = false;
+	u32 rx_error, rx_status;
+	u8 ptype;
+	u64 qword;
+
+	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+	ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
+	rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
+		   I40E_RXD_QW1_ERROR_SHIFT;
+	rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
+		    I40E_RXD_QW1_STATUS_SHIFT;
+	decoded = decode_rx_desc_ptype(ptype);
 
 	skb->ip_summed = CHECKSUM_NONE;
 
+	skb_checksum_none_assert(skb);
+
 	/* Rx csum enabled and ip headers found? */
 	if (!(vsi->netdev->features & NETIF_F_RXCSUM))
 		return;
@@ -933,7 +834,7 @@ checksum_fail:
  *
  * Returns a hash type to be used by skb_set_hash
  **/
-static inline enum pkt_hash_types i40e_ptype_to_htype(u8 ptype)
+static inline int i40e_ptype_to_htype(u8 ptype)
 {
 	struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
 
@@ -961,7 +862,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
 				u8 rx_ptype)
 {
 	u32 hash;
-	const __le64 rss_mask  =
+	const __le64 rss_mask =
 		cpu_to_le64((u64)I40E_RX_DESC_FLTSTAT_RSS_HASH <<
 			    I40E_RX_DESC_STATUS_FLTSTAT_SHIFT);
 
@@ -975,315 +876,401 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
 }
 
 /**
- * i40e_clean_rx_irq_ps - Reclaim resources after receive; packet split
- * @rx_ring:  rx ring to clean
- * @budget:   how many cleans we're allowed
+ * i40evf_process_skb_fields - Populate skb header fields from Rx descriptor
+ * @rx_ring: rx descriptor ring packet is being transacted on
+ * @rx_desc: pointer to the EOP Rx descriptor
+ * @skb: pointer to current skb being populated
+ * @rx_ptype: the packet type decoded by hardware
  *
- * Returns true if there's any budget left (e.g. the clean is finished)
+ * This function checks the ring, descriptor, and packet information in
+ * order to populate the hash, checksum, VLAN, protocol, and
+ * other fields within the skb.
  **/
-static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, const int budget)
+static inline
+void i40evf_process_skb_fields(struct i40e_ring *rx_ring,
+			       union i40e_rx_desc *rx_desc, struct sk_buff *skb,
+			       u8 rx_ptype)
 {
-	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-	u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo;
-	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	struct i40e_vsi *vsi = rx_ring->vsi;
-	u16 i = rx_ring->next_to_clean;
-	union i40e_rx_desc *rx_desc;
-	u32 rx_error, rx_status;
-	bool failure = false;
-	u8 rx_ptype;
-	u64 qword;
-	u32 copysize;
+	i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
 
-	do {
-		struct i40e_rx_buffer *rx_bi;
-		struct sk_buff *skb;
-		u16 vlan_tag;
-		/* return some buffers to hardware, one at a time is too slow */
-		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
-			failure = failure ||
-				  i40evf_alloc_rx_buffers_ps(rx_ring,
-							     cleaned_count);
-			cleaned_count = 0;
-		}
+	/* modifies the skb - consumes the enet header */
+	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
-		i = rx_ring->next_to_clean;
-		rx_desc = I40E_RX_DESC(rx_ring, i);
-		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
-		rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
-			I40E_RXD_QW1_STATUS_SHIFT;
+	i40e_rx_checksum(rx_ring->vsi, skb, rx_desc);
+}
 
-		if (!(rx_status & BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
-			break;
+/**
+ * i40e_pull_tail - i40e specific version of skb_pull_tail
+ * @rx_ring: rx descriptor ring packet is being transacted on
+ * @skb: pointer to current skb being adjusted
+ *
+ * This function is an i40e specific version of __pskb_pull_tail.  The
+ * main difference between this version and the original function is that
+ * this function can make several assumptions about the state of things
+ * that allow for significant optimizations versus the standard function.
+ * As a result we can do things like drop a frag and maintain an accurate
+ * truesize for the skb.
+ */
+static void i40e_pull_tail(struct i40e_ring *rx_ring, struct sk_buff *skb)
+{
+	struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
+	unsigned char *va;
+	unsigned int pull_len;
 
-		/* This memory barrier is needed to keep us from reading
-		 * any other fields out of the rx_desc until we know the
-		 * DD bit is set.
-		 */
-		dma_rmb();
-		/* sync header buffer for reading */
-		dma_sync_single_range_for_cpu(rx_ring->dev,
-					      rx_ring->rx_bi[0].dma,
-					      i * rx_ring->rx_hdr_len,
-					      rx_ring->rx_hdr_len,
-					      DMA_FROM_DEVICE);
-		rx_bi = &rx_ring->rx_bi[i];
-		skb = rx_bi->skb;
-		if (likely(!skb)) {
-			skb = __netdev_alloc_skb_ip_align(rx_ring->netdev,
-							  rx_ring->rx_hdr_len,
-							  GFP_ATOMIC |
-							  __GFP_NOWARN);
-			if (!skb) {
-				rx_ring->rx_stats.alloc_buff_failed++;
-				failure = true;
-				break;
-			}
+	/* it is valid to use page_address instead of kmap since we are
+	 * working with pages allocated out of the lowmem pool per
+	 * alloc_page(GFP_ATOMIC)
+	 */
+	va = skb_frag_address(frag);
 
-			/* initialize queue mapping */
-			skb_record_rx_queue(skb, rx_ring->queue_index);
-			/* we are reusing so sync this buffer for CPU use */
-			dma_sync_single_range_for_cpu(rx_ring->dev,
-						      rx_ring->rx_bi[0].dma,
-						      i * rx_ring->rx_hdr_len,
-						      rx_ring->rx_hdr_len,
-						      DMA_FROM_DEVICE);
-		}
-		rx_packet_len = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
-				I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
-		rx_header_len = (qword & I40E_RXD_QW1_LENGTH_HBUF_MASK) >>
-				I40E_RXD_QW1_LENGTH_HBUF_SHIFT;
-		rx_sph = (qword & I40E_RXD_QW1_LENGTH_SPH_MASK) >>
-			 I40E_RXD_QW1_LENGTH_SPH_SHIFT;
-
-		rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
-			   I40E_RXD_QW1_ERROR_SHIFT;
-		rx_hbo = rx_error & BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
-		rx_error &= ~BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
+	/* we need the header to contain the greater of either ETH_HLEN or
+	 * 60 bytes if the skb->len is less than 60 for skb_pad.
+	 */
+	pull_len = eth_get_headlen(va, I40E_RX_HDR_SIZE);
 
-		rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
-			   I40E_RXD_QW1_PTYPE_SHIFT;
-		/* sync half-page for reading */
-		dma_sync_single_range_for_cpu(rx_ring->dev,
-					      rx_bi->page_dma,
-					      rx_bi->page_offset,
-					      PAGE_SIZE / 2,
-					      DMA_FROM_DEVICE);
-		prefetch(page_address(rx_bi->page) + rx_bi->page_offset);
-		rx_bi->skb = NULL;
-		cleaned_count++;
-		copysize = 0;
-		if (rx_hbo || rx_sph) {
-			int len;
-
-			if (rx_hbo)
-				len = I40E_RX_HDR_SIZE;
-			else
-				len = rx_header_len;
-			memcpy(__skb_put(skb, len), rx_bi->hdr_buf, len);
-		} else if (skb->len == 0) {
-			int len;
-			unsigned char *va = page_address(rx_bi->page) +
-					    rx_bi->page_offset;
-
-			len = min(rx_packet_len, rx_ring->rx_hdr_len);
-			memcpy(__skb_put(skb, len), va, len);
-			copysize = len;
-			rx_packet_len -= len;
-		}
-		/* Get the rest of the data if this was a header split */
-		if (rx_packet_len) {
-			skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-					rx_bi->page,
-					rx_bi->page_offset + copysize,
-					rx_packet_len, I40E_RXBUFFER_2048);
-
-			/* If the page count is more than 2, then both halves
-			 * of the page are used and we need to free it. Do it
-			 * here instead of in the alloc code. Otherwise one
-			 * of the half-pages might be released between now and
-			 * then, and we wouldn't know which one to use.
-			 * Don't call get_page and free_page since those are
-			 * both expensive atomic operations that just change
-			 * the refcount in opposite directions. Just give the
-			 * page to the stack; he can have our refcount.
-			 */
-			if (page_count(rx_bi->page) > 2) {
-				dma_unmap_page(rx_ring->dev,
-					       rx_bi->page_dma,
-					       PAGE_SIZE,
-					       DMA_FROM_DEVICE);
-				rx_bi->page = NULL;
-				rx_bi->page_dma = 0;
-				rx_ring->rx_stats.realloc_count++;
-			} else {
-				get_page(rx_bi->page);
-				/* switch to the other half-page here; the
-				 * allocation code programs the right addr
-				 * into HW. If we haven't used this half-page,
-				 * the address won't be changed, and HW can
-				 * just use it next time through.
-				 */
-				rx_bi->page_offset ^= PAGE_SIZE / 2;
-			}
+	/* align pull length to size of long to optimize memcpy performance */
+	skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
 
-		}
-		I40E_RX_INCREMENT(rx_ring, i);
+	/* update all of the pointers */
+	skb_frag_size_sub(frag, pull_len);
+	frag->page_offset += pull_len;
+	skb->data_len -= pull_len;
+	skb->tail += pull_len;
+}
 
-		if (unlikely(
-		    !(rx_status & BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))) {
-			struct i40e_rx_buffer *next_buffer;
+/**
+ * i40e_cleanup_headers - Correct empty headers
+ * @rx_ring: rx descriptor ring packet is being transacted on
+ * @skb: pointer to current skb being fixed
+ *
+ * Also address the case where we are pulling data in on pages only
+ * and as such no data is present in the skb header.
+ *
+ * In addition if skb is not at least 60 bytes we need to pad it so that
+ * it is large enough to qualify as a valid Ethernet frame.
+ *
+ * Returns true if an error was encountered and skb was freed.
+ **/
+static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb)
+{
+	/* place header in linear portion of buffer */
+	if (skb_is_nonlinear(skb))
+		i40e_pull_tail(rx_ring, skb);
 
-			next_buffer = &rx_ring->rx_bi[i];
-			next_buffer->skb = skb;
-			rx_ring->rx_stats.non_eop_descs++;
-			continue;
-		}
+	/* if eth_skb_pad returns an error the skb was freed */
+	if (eth_skb_pad(skb))
+		return true;
 
-		/* ERR_MASK will only have valid bits if EOP set */
-		if (unlikely(rx_error & BIT(I40E_RX_DESC_ERROR_RXE_SHIFT))) {
-			dev_kfree_skb_any(skb);
-			continue;
-		}
+	return false;
+}
 
-		i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
+/**
+ * i40e_reuse_rx_page - page flip buffer and store it back on the ring
+ * @rx_ring: rx descriptor ring to store buffers on
+ * @old_buff: donor buffer to have page reused
+ *
+ * Synchronizes page for reuse by the adapter
+ **/
+static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
+			       struct i40e_rx_buffer *old_buff)
+{
+	struct i40e_rx_buffer *new_buff;
+	u16 nta = rx_ring->next_to_alloc;
 
-		/* probably a little skewed due to removing CRC */
-		total_rx_bytes += skb->len;
-		total_rx_packets++;
+	new_buff = &rx_ring->rx_bi[nta];
 
-		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
+	/* update, and store next to alloc */
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
 
-		i40e_rx_checksum(vsi, skb, rx_status, rx_error, rx_ptype);
+	/* transfer page from old buffer to new buffer */
+	*new_buff = *old_buff;
+}
 
-		vlan_tag = rx_status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)
-			 ? le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1)
-			 : 0;
-#ifdef I40E_FCOE
-		if (unlikely(
-		    i40e_rx_is_fcoe(rx_ptype) &&
-		    !i40e_fcoe_handle_offload(rx_ring, rx_desc, skb))) {
-			dev_kfree_skb_any(skb);
-			continue;
-		}
+/**
+ * i40e_page_is_reserved - check if reuse is possible
+ * @page: page struct to check
+ */
+static inline bool i40e_page_is_reserved(struct page *page)
+{
+	return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
+}
+
+/**
+ * i40e_add_rx_frag - Add contents of Rx buffer to sk_buff
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @rx_buffer: buffer containing page to add
+ * @rx_desc: descriptor containing length of buffer written by hardware
+ * @skb: sk_buff to place the data into
+ *
+ * This function will add the data contained in rx_buffer->page to the skb.
+ * This is done either through a direct copy if the data in the buffer is
+ * less than the skb header size, otherwise it will just attach the page as
+ * a frag to the skb.
+ *
+ * The function will then update the page offset if necessary and return
+ * true if the buffer can be reused by the adapter.
+ **/
+static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
+			     struct i40e_rx_buffer *rx_buffer,
+			     union i40e_rx_desc *rx_desc,
+			     struct sk_buff *skb)
+{
+	struct page *page = rx_buffer->page;
+	u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+	unsigned int size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
+			    I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
+#if (PAGE_SIZE < 8192)
+	unsigned int truesize = I40E_RXBUFFER_2048;
+#else
+	unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
+	unsigned int last_offset = PAGE_SIZE - I40E_RXBUFFER_2048;
 #endif
-		i40e_receive_skb(rx_ring, skb, vlan_tag);
 
-		rx_desc->wb.qword1.status_error_len = 0;
+	/* will the data fit in the skb we allocated? if so, just
+	 * copy it as it is pretty small anyway
+	 */
+	if ((size <= I40E_RX_HDR_SIZE) && !skb_is_nonlinear(skb)) {
+		unsigned char *va = page_address(page) + rx_buffer->page_offset;
 
-	} while (likely(total_rx_packets < budget));
+		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
 
-	u64_stats_update_begin(&rx_ring->syncp);
-	rx_ring->stats.packets += total_rx_packets;
-	rx_ring->stats.bytes += total_rx_bytes;
-	u64_stats_update_end(&rx_ring->syncp);
-	rx_ring->q_vector->rx.total_packets += total_rx_packets;
-	rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
+		/* page is not reserved, we can reuse buffer as-is */
+		if (likely(!i40e_page_is_reserved(page)))
+			return true;
 
-	return failure ? budget : total_rx_packets;
+		/* this page cannot be reused so discard it */
+		__free_pages(page, 0);
+		return false;
+	}
+
+	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+			rx_buffer->page_offset, size, truesize);
+
+	/* avoid re-using remote pages */
+	if (unlikely(i40e_page_is_reserved(page)))
+		return false;
+
+#if (PAGE_SIZE < 8192)
+	/* if we are only owner of page we can reuse it */
+	if (unlikely(page_count(page) != 1))
+		return false;
+
+	/* flip page offset to other buffer */
+	rx_buffer->page_offset ^= truesize;
+#else
+	/* move offset up to the next cache line */
+	rx_buffer->page_offset += truesize;
+
+	if (rx_buffer->page_offset > last_offset)
+		return false;
+#endif
+
+	/* Even if we own the page, we are not allowed to use atomic_set()
+	 * This would break get_page_unless_zero() users.
+	 */
+	get_page(rx_buffer->page);
+
+	return true;
+}
+
+/**
+ * i40evf_fetch_rx_buffer - Allocate skb and populate it
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @rx_desc: descriptor containing info written by hardware
+ *
+ * This function allocates an skb on the fly, and populates it with the page
+ * data from the current receive descriptor, taking care to set up the skb
+ * correctly, as well as handling calling the page recycle function if
+ * necessary.
+ */
+static inline
+struct sk_buff *i40evf_fetch_rx_buffer(struct i40e_ring *rx_ring,
+				       union i40e_rx_desc *rx_desc)
+{
+	struct i40e_rx_buffer *rx_buffer;
+	struct sk_buff *skb;
+	struct page *page;
+
+	rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
+	page = rx_buffer->page;
+	prefetchw(page);
+
+	skb = rx_buffer->skb;
+
+	if (likely(!skb)) {
+		void *page_addr = page_address(page) + rx_buffer->page_offset;
+
+		/* prefetch first cache line of first page */
+		prefetch(page_addr);
+#if L1_CACHE_BYTES < 128
+		prefetch(page_addr + L1_CACHE_BYTES);
+#endif
+
+		/* allocate a skb to store the frags */
+		skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
+				       I40E_RX_HDR_SIZE,
+				       GFP_ATOMIC | __GFP_NOWARN);
+		if (unlikely(!skb)) {
+			rx_ring->rx_stats.alloc_buff_failed++;
+			return NULL;
+		}
+
+		/* we will be copying header into skb->data in
+		 * pskb_may_pull so it is in our interest to prefetch
+		 * it now to avoid a possible cache miss
+		 */
+		prefetchw(skb->data);
+
+		skb_record_rx_queue(skb, rx_ring->queue_index);
+	} else {
+		/* we are reusing so sync this buffer for CPU use */
+		dma_sync_single_range_for_cpu(rx_ring->dev,
+					      rx_buffer->dma,
+					      rx_buffer->page_offset,
+					      I40E_RXBUFFER_2048,
+					      DMA_FROM_DEVICE);
+
+		rx_buffer->skb = NULL;
+	}
+
+	/* pull page into skb */
+	if (i40e_add_rx_frag(rx_ring, rx_buffer, rx_desc, skb)) {
+		/* hand second half of page back to the ring */
+		i40e_reuse_rx_page(rx_ring, rx_buffer);
+		rx_ring->rx_stats.page_reuse_count++;
+	} else {
+		/* we are not reusing the buffer so unmap it */
+		dma_unmap_page(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
+			       DMA_FROM_DEVICE);
+	}
+
+	/* clear contents of buffer_info */
+	rx_buffer->page = NULL;
+
+	return skb;
 }
 
 /**
- * i40e_clean_rx_irq_1buf - Reclaim resources after receive; single buffer
- * @rx_ring:  rx ring to clean
- * @budget:   how many cleans we're allowed
+ * i40e_is_non_eop - process handling of non-EOP buffers
+ * @rx_ring: Rx ring being processed
+ * @rx_desc: Rx descriptor for current buffer
+ * @skb: Current socket buffer containing buffer in progress
  *
- * Returns number of packets cleaned
+ * This function updates next to clean.  If the buffer is an EOP buffer
+ * this function exits returning false, otherwise it will place the
+ * sk_buff in the next buffer to be chained and return true indicating
+ * that this is in fact a non-EOP buffer.
  **/
-static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
+static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
+			    union i40e_rx_desc *rx_desc,
+			    struct sk_buff *skb)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	/* fetch, update, and store next to clean */
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+
+	prefetch(I40E_RX_DESC(rx_ring, ntc));
+
+	/* if we are the last buffer then there is nothing else to do */
+#define I40E_RXD_EOF BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)
+	if (likely(i40e_test_staterr(rx_desc, I40E_RXD_EOF)))
+		return false;
+
+	/* place skb in next buffer to be received */
+	rx_ring->rx_bi[ntc].skb = skb;
+	rx_ring->rx_stats.non_eop_descs++;
+
+	return true;
+}
+
+/**
+ * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
+ * @rx_ring: rx descriptor ring to transact packets on
+ * @budget: Total limit on number of packets to process
+ *
+ * This function provides a "bounce buffer" approach to Rx interrupt
+ * processing.  The advantage to this is that on systems that have
+ * expensive overhead for IOMMU access this provides a means of avoiding
+ * it by maintaining the mapping of the page to the system.
+ *
+ * Returns amount of work completed
+ **/
+static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	struct i40e_vsi *vsi = rx_ring->vsi;
-	union i40e_rx_desc *rx_desc;
-	u32 rx_error, rx_status;
-	u16 rx_packet_len;
 	bool failure = false;
-	u8 rx_ptype;
-	u64 qword;
-	u16 i;
 
-	do {
-		struct i40e_rx_buffer *rx_bi;
+	while (likely(total_rx_packets < budget)) {
+		union i40e_rx_desc *rx_desc;
 		struct sk_buff *skb;
+		u32 rx_status;
 		u16 vlan_tag;
+		u8 rx_ptype;
+		u64 qword;
+
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
 			failure = failure ||
-				  i40evf_alloc_rx_buffers_1buf(rx_ring,
-							       cleaned_count);
+				  i40evf_alloc_rx_buffers(rx_ring, cleaned_count);
 			cleaned_count = 0;
 		}
 
-		i = rx_ring->next_to_clean;
-		rx_desc = I40E_RX_DESC(rx_ring, i);
+		rx_desc = I40E_RX_DESC(rx_ring, rx_ring->next_to_clean);
+
 		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
+		rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
+			   I40E_RXD_QW1_PTYPE_SHIFT;
 		rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
-			I40E_RXD_QW1_STATUS_SHIFT;
+			    I40E_RXD_QW1_STATUS_SHIFT;
 
 		if (!(rx_status & BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
 			break;
 
+		/* status_error_len will always be zero for unused descriptors
+		 * because it's cleared in cleanup, and overlaps with hdr_addr
+		 * which is always zero because packet split isn't used, if the
+		 * hardware wrote DD then it will be non-zero
+		 */
+		if (!rx_desc->wb.qword1.status_error_len)
+			break;
+
 		/* This memory barrier is needed to keep us from reading
 		 * any other fields out of the rx_desc until we know the
 		 * DD bit is set.
 		 */
 		dma_rmb();
 
-		rx_bi = &rx_ring->rx_bi[i];
-		skb = rx_bi->skb;
-		prefetch(skb->data);
-
-		rx_packet_len = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
-				I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
-
-		rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
-			   I40E_RXD_QW1_ERROR_SHIFT;
-		rx_error &= ~BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
+		skb = i40evf_fetch_rx_buffer(rx_ring, rx_desc);
+		if (!skb)
+			break;
 
-		rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
-			   I40E_RXD_QW1_PTYPE_SHIFT;
-		rx_bi->skb = NULL;
 		cleaned_count++;
 
-		/* Get the header and possibly the whole packet
-		 * If this is an skb from previous receive dma will be 0
-		 */
-		skb_put(skb, rx_packet_len);
-		dma_unmap_single(rx_ring->dev, rx_bi->dma, rx_ring->rx_buf_len,
-				 DMA_FROM_DEVICE);
-		rx_bi->dma = 0;
-
-		I40E_RX_INCREMENT(rx_ring, i);
-
-		if (unlikely(
-		    !(rx_status & BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))) {
-			rx_ring->rx_stats.non_eop_descs++;
+		if (i40e_is_non_eop(rx_ring, rx_desc, skb))
 			continue;
-		}
 
-		/* ERR_MASK will only have valid bits if EOP set */
-		if (unlikely(rx_error & BIT(I40E_RX_DESC_ERROR_RXE_SHIFT))) {
-			dev_kfree_skb_any(skb);
+		if (i40e_cleanup_headers(rx_ring, skb))
 			continue;
-		}
 
-		i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
 		/* probably a little skewed due to removing CRC */
 		total_rx_bytes += skb->len;
-		total_rx_packets++;
 
-		skb->protocol = eth_type_trans(skb, rx_ring->netdev);
+		/* populate checksum, VLAN, and protocol */
+		i40evf_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
 
-		i40e_rx_checksum(vsi, skb, rx_status, rx_error, rx_ptype);
 
-		vlan_tag = rx_status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)
-			 ? le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1)
-			 : 0;
+		vlan_tag = (qword & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
+			   le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1) : 0;
+
 		i40e_receive_skb(rx_ring, skb, vlan_tag);
 
-		rx_desc->wb.qword1.status_error_len = 0;
-	} while (likely(total_rx_packets < budget));
+		/* update budget accounting */
+		total_rx_packets++;
+	}
 
 	u64_stats_update_begin(&rx_ring->syncp);
 	rx_ring->stats.packets += total_rx_packets;
@@ -1292,6 +1279,7 @@ static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
 	rx_ring->q_vector->rx.total_packets += total_rx_packets;
 	rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
 
+	/* guarantee a trip back through this routine if there was a failure */
 	return failure ? budget : total_rx_packets;
 }
 
@@ -1433,12 +1421,7 @@ int i40evf_napi_poll(struct napi_struct *napi, int budget)
 	budget_per_ring = max(budget/q_vector->num_ringpairs, 1);
 
 	i40e_for_each_ring(ring, q_vector->rx) {
-		int cleaned;
-
-		if (ring_is_ps_enabled(ring))
-			cleaned = i40e_clean_rx_irq_ps(ring, budget_per_ring);
-		else
-			cleaned = i40e_clean_rx_irq_1buf(ring, budget_per_ring);
+		int cleaned = i40e_clean_rx_irq(ring, budget_per_ring);
 
 		work_done += cleaned;
 		/* if we clean as many as budgeted, we must not be done */
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 54b52e8..0bffbd9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -102,8 +102,8 @@ enum i40e_dyn_idx_t {
 	(((pf)->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
 	  I40E_DEFAULT_RSS_HENA_EXPANDED : I40E_DEFAULT_RSS_HENA)
 
-/* Supported Rx Buffer Sizes */
-#define I40E_RXBUFFER_512   512    /* Used for packet split */
+/* Supported Rx Buffer Sizes (a multiple of 128) */
+#define I40E_RXBUFFER_256   256
 #define I40E_RXBUFFER_2048  2048
 #define I40E_RXBUFFER_3072  3072   /* For FCoE MTU of 2158 */
 #define I40E_RXBUFFER_4096  4096
@@ -114,9 +114,28 @@ enum i40e_dyn_idx_t {
  * reserve 2 more, and skb_shared_info adds an additional 384 bytes more,
  * this adds up to 512 bytes of extra data meaning the smallest allocation
  * we could have is 1K.
- * i.e. RXBUFFER_512 --> size-1024 slab
+ * i.e. RXBUFFER_256 --> 960 byte skb (size-1024 slab)
+ * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
  */
-#define I40E_RX_HDR_SIZE  I40E_RXBUFFER_512
+#define I40E_RX_HDR_SIZE I40E_RXBUFFER_256
+#define i40e_rx_desc i40e_32byte_rx_desc
+
+/**
+ * i40e_test_staterr - tests bits in Rx descriptor status and error fields
+ * @rx_desc: pointer to receive descriptor (in le64 format)
+ * @stat_err_bits: value to mask
+ *
+ * This function does some fast chicanery in order to return the
+ * value of the mask which is really only used for boolean tests.
+ * The status_error_len doesn't need to be shifted because it begins
+ * at offset zero.
+ */
+static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,
+				     const u64 stat_err_bits)
+{
+	return !!(rx_desc->wb.qword1.status_error_len &
+		  cpu_to_le64(stat_err_bits));
+}
 
 /* How many Rx Buffers do we bundle into one write to the hardware ? */
 #define I40E_RX_BUFFER_WRITE	16	/* Must be power of 2 */
@@ -142,8 +161,6 @@ enum i40e_dyn_idx_t {
 		prefetch((n));				\
 	} while (0)
 
-#define i40e_rx_desc i40e_32byte_rx_desc
-
 #define I40E_MAX_BUFFER_TXD	8
 #define I40E_MIN_TX_LEN		17
 
@@ -212,10 +229,8 @@ struct i40e_tx_buffer {
 
 struct i40e_rx_buffer {
 	struct sk_buff *skb;
-	void *hdr_buf;
 	dma_addr_t dma;
 	struct page *page;
-	dma_addr_t page_dma;
 	unsigned int page_offset;
 };
 
@@ -244,16 +259,10 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
-	__I40E_RX_PS_ENABLED,
+	__UNUSED,
 	__I40E_RX_16BYTE_DESC_ENABLED,
 };
 
-#define ring_is_ps_enabled(ring) \
-	test_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
-#define set_ring_ps_enabled(ring) \
-	set_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
-#define clear_ring_ps_enabled(ring) \
-	clear_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
 #define ring_is_16byte_desc_enabled(ring) \
 	test_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
 #define set_ring_16byte_desc_enabled(ring) \
@@ -278,7 +287,6 @@ struct i40e_ring {
 
 	u16 count;			/* Number of descriptors */
 	u16 reg_idx;			/* HW register index of the ring */
-	u16 rx_hdr_len;
 	u16 rx_buf_len;
 	u8  dtype;
 #define I40E_RX_DTYPE_NO_SPLIT      0
@@ -319,6 +327,7 @@ struct i40e_ring {
 	struct i40e_q_vector *q_vector;	/* Backreference to associated vector */
 
 	struct rcu_head rcu;		/* to avoid race on free */
+	u16 next_to_alloc;
 } ____cacheline_internodealigned_in_smp;
 
 enum i40e_latency_range {
@@ -342,9 +351,7 @@ struct i40e_ring_container {
 #define i40e_for_each_ring(pos, head) \
 	for (pos = (head).ring; pos != NULL; pos = pos->next)
 
-bool i40evf_alloc_rx_buffers_ps(struct i40e_ring *rxr, u16 cleaned_count);
-bool i40evf_alloc_rx_buffers_1buf(struct i40e_ring *rxr, u16 cleaned_count);
-void i40evf_alloc_rx_headers(struct i40e_ring *rxr);
+bool i40evf_alloc_rx_buffers(struct i40e_ring *rxr, u16 cleaned_count);
 netdev_tx_t i40evf_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
 void i40evf_clean_tx_ring(struct i40e_ring *tx_ring);
 void i40evf_clean_rx_ring(struct i40e_ring *rx_ring);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index e8dee48..4275e39 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -80,9 +80,6 @@ struct i40e_vsi {
 #define I40EVF_REQ_DESCRIPTOR_MULTIPLE  32
 
 /* Supported Rx Buffer Sizes */
-#define I40EVF_RXBUFFER_64    64     /* Used for packet split */
-#define I40EVF_RXBUFFER_128   128    /* Used for packet split */
-#define I40EVF_RXBUFFER_256   256    /* Used for packet split */
 #define I40EVF_RXBUFFER_2048  2048
 #define I40EVF_MAX_RXBUFFER   16384  /* largest size for single descriptor */
 #define I40EVF_MAX_AQ_BUF_SIZE    4096
@@ -208,9 +205,6 @@ struct i40evf_adapter {
 
 	u32 flags;
 #define I40EVF_FLAG_RX_CSUM_ENABLED              BIT(0)
-#define I40EVF_FLAG_RX_1BUF_CAPABLE              BIT(1)
-#define I40EVF_FLAG_RX_PS_CAPABLE                BIT(2)
-#define I40EVF_FLAG_RX_PS_ENABLED                BIT(3)
 #define I40EVF_FLAG_IMIR_ENABLED                 BIT(5)
 #define I40EVF_FLAG_MQ_CAPABLE                   BIT(6)
 #define I40EVF_FLAG_NEED_LINK_UPDATE             BIT(7)
@@ -296,7 +290,6 @@ struct i40evf_adapter {
 
 
 /* Ethtool Private Flags */
-#define I40EVF_PRIV_FLAGS_PS		BIT(0)
 
 /* needed by i40evf_ethtool.c */
 extern char i40evf_driver_name[];
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
index 179fa6a..89f3067 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c
@@ -64,7 +64,7 @@ static const struct i40evf_stats i40evf_gstrings_stats[] = {
 	(I40EVF_GLOBAL_STATS_LEN + I40EVF_QUEUE_STATS_LEN(_dev))
 
 static const char i40evf_priv_flags_strings[][ETH_GSTRING_LEN] = {
-	"packet-split",
+	/* this is now empty but ready for priv flags */
 };
 
 #define I40EVF_PRIV_FLAGS_STR_LEN ARRAY_SIZE(i40evf_priv_flags_strings)
@@ -535,54 +535,6 @@ static int i40evf_set_rxfh(struct net_device *netdev, const u32 *indir,
 	return i40evf_config_rss(adapter);
 }
 
-/**
- * i40evf_get_priv_flags - report device private flags
- * @dev: network interface device structure
- *
- * The get string set count and the string set should be matched for each
- * flag returned.  Add new strings for each flag to the i40e_priv_flags_strings
- * array.
- *
- * Returns a u32 bitmap of flags.
- **/
-static u32 i40evf_get_priv_flags(struct net_device *dev)
-{
-	struct i40evf_adapter *adapter = netdev_priv(dev);
-	u32 ret_flags = 0;
-
-	ret_flags |= adapter->flags & I40EVF_FLAG_RX_PS_ENABLED ?
-		I40EVF_PRIV_FLAGS_PS : 0;
-
-	return ret_flags;
-}
-
-/**
- * i40evf_set_priv_flags - set private flags
- * @dev: network interface device structure
- * @flags: bit flags to be set
- **/
-static int i40evf_set_priv_flags(struct net_device *dev, u32 flags)
-{
-	struct i40evf_adapter *adapter = netdev_priv(dev);
-	bool reset_required = false;
-
-	if ((flags & I40EVF_PRIV_FLAGS_PS) &&
-	    !(adapter->flags & I40EVF_FLAG_RX_PS_ENABLED)) {
-		adapter->flags |= I40EVF_FLAG_RX_PS_ENABLED;
-		reset_required = true;
-	} else if (!(flags & I40EVF_PRIV_FLAGS_PS) &&
-		   (adapter->flags & I40EVF_FLAG_RX_PS_ENABLED)) {
-		adapter->flags &= ~I40EVF_FLAG_RX_PS_ENABLED;
-		reset_required = true;
-	}
-
-	/* if needed, issue reset to cause things to take effect */
-	if (reset_required)
-		i40evf_schedule_reset(adapter);
-
-	return 0;
-}
-
 static const struct ethtool_ops i40evf_ethtool_ops = {
 	.get_settings		= i40evf_get_settings,
 	.get_drvinfo		= i40evf_get_drvinfo,
@@ -592,8 +544,6 @@ static const struct ethtool_ops i40evf_ethtool_ops = {
 	.get_strings		= i40evf_get_strings,
 	.get_ethtool_stats	= i40evf_get_ethtool_stats,
 	.get_sset_count		= i40evf_get_sset_count,
-	.get_priv_flags		= i40evf_get_priv_flags,
-	.set_priv_flags		= i40evf_set_priv_flags,
 	.get_msglevel		= i40evf_get_msglevel,
 	.set_msglevel		= i40evf_set_msglevel,
 	.get_coalesce		= i40evf_get_coalesce,
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 1b8dc22..7b186d1 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -649,28 +649,11 @@ static void i40evf_configure_tx(struct i40evf_adapter *adapter)
 static void i40evf_configure_rx(struct i40evf_adapter *adapter)
 {
 	struct i40e_hw *hw = &adapter->hw;
-	struct net_device *netdev = adapter->netdev;
-	int max_frame = netdev->mtu + ETH_HLEN + ETH_FCS_LEN;
 	int i;
-	int rx_buf_len;
-
-
-	/* Set the RX buffer length according to the mode */
-	if (adapter->flags & I40EVF_FLAG_RX_PS_ENABLED ||
-	    netdev->mtu <= ETH_DATA_LEN)
-		rx_buf_len = I40EVF_RXBUFFER_2048;
-	else
-		rx_buf_len = ALIGN(max_frame, 1024);
 
 	for (i = 0; i < adapter->num_active_queues; i++) {
 		adapter->rx_rings[i].tail = hw->hw_addr + I40E_QRX_TAIL1(i);
-		adapter->rx_rings[i].rx_buf_len = rx_buf_len;
-		if (adapter->flags & I40EVF_FLAG_RX_PS_ENABLED) {
-			set_ring_ps_enabled(&adapter->rx_rings[i]);
-			adapter->rx_rings[i].rx_hdr_len = I40E_RX_HDR_SIZE;
-		} else {
-			clear_ring_ps_enabled(&adapter->rx_rings[i]);
-		}
+		adapter->rx_rings[i].rx_buf_len = I40EVF_RXBUFFER_2048;
 	}
 }
 
@@ -1015,12 +998,7 @@ static void i40evf_configure(struct i40evf_adapter *adapter)
 	for (i = 0; i < adapter->num_active_queues; i++) {
 		struct i40e_ring *ring = &adapter->rx_rings[i];
 
-	if (adapter->flags & I40EVF_FLAG_RX_PS_ENABLED) {
-		i40evf_alloc_rx_headers(ring);
-		i40evf_alloc_rx_buffers_ps(ring, ring->count);
-	} else {
-		i40evf_alloc_rx_buffers_1buf(ring, ring->count);
-	}
+		i40evf_alloc_rx_buffers(ring, ring->count);
 		ring->next_to_use = ring->count - 1;
 		writel(ring->next_to_use, ring->tail);
 	}
@@ -2423,11 +2401,6 @@ static void i40evf_init_task(struct work_struct *work)
 	adapter->current_op = I40E_VIRTCHNL_OP_UNKNOWN;
 
 	adapter->flags |= I40EVF_FLAG_RX_CSUM_ENABLED;
-	adapter->flags |= I40EVF_FLAG_RX_1BUF_CAPABLE;
-	adapter->flags |= I40EVF_FLAG_RX_PS_CAPABLE;
-
-	/* Default to single buffer rx, can be changed through ethtool. */
-	adapter->flags &= ~I40EVF_FLAG_RX_PS_ENABLED;
 
 	netdev->netdev_ops = &i40evf_netdev_ops;
 	i40evf_set_ethtool_ops(netdev);
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index e0ea64b..6b422b9 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -270,10 +270,6 @@ void i40evf_configure_queues(struct i40evf_adapter *adapter)
 		vqpi->rxq.max_pkt_size = adapter->netdev->mtu
 					+ ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;
 		vqpi->rxq.databuffer_size = adapter->rx_rings[i].rx_buf_len;
-		if (adapter->flags & I40EVF_FLAG_RX_PS_ENABLED) {
-			vqpi->rxq.splithdr_enabled = true;
-			vqpi->rxq.hdr_size = I40E_RX_HDR_SIZE;
-		}
 		vqpi++;
 	}
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 12/14] i40e/i40evf: Remove unused hardware receive descriptor code
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (10 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 13/14] i40evf: Allocate rx buffers properly Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 14/14] i40e: Test memory before ethtool alloc succeeds Harshitha Ramamurthy
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The hardware supports a 16 byte descriptor for receive, but the driver
never used it in production.  The 16 byte descriptors brought no
performance benefit to the real driver, so drop a whole lot of
complexity while getting rid of the code.

Also, since the previous patch made us use no-split mode all the time,
drop driver support for any other value in dtype and assume it is
always zero (aka no-split).

Hooray for code removal!
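
For illustration only (a sketch, not part of the patch): with the 16 byte
layout gone, every Rx path indexes the 32 byte descriptor directly and
reads its writeback words, along the lines of what the refactored clean
routine already does:

	union i40e_rx_desc *rx_desc;
	unsigned int size;
	u64 qword;

	rx_desc = I40E_RX_DESC(rx_ring, rx_ring->next_to_clean);
	qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
	size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
	       I40E_RXD_QW1_LENGTH_PBUF_SHIFT;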

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Change-ID: I2257e902e4dad84a07b94db6d2e6f4ce69b27bc0
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  7 +------
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 20 +++-----------------
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 18 +++++-------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    | 26 ++++++++++----------------
 drivers/net/ethernet/intel/i40evf/i40e_txrx.h  | 26 ++++++++++----------------
 5 files changed, 29 insertions(+), 68 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 5e23cc9..5831a7f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -122,10 +122,7 @@
 #define XSTRINGIFY(bar) STRINGIFY(bar)
 
 #define I40E_RX_DESC(R, i)			\
-	((ring_is_16byte_desc_enabled(R))	\
-		? (union i40e_32byte_rx_desc *)	\
-			(&(((union i40e_16byte_rx_desc *)((R)->desc))[i])) \
-		: (&(((union i40e_32byte_rx_desc *)((R)->desc))[i])))
+	(&(((union i40e_32byte_rx_desc *)((R)->desc))[i]))
 #define I40E_TX_DESC(R, i)			\
 	(&(((struct i40e_tx_desc *)((R)->desc))[i]))
 #define I40E_TX_CTXTDESC(R, i)			\
@@ -342,7 +339,6 @@ struct i40e_pf {
 #ifdef I40E_FCOE
 #define I40E_FLAG_FCOE_ENABLED			BIT_ULL(11)
 #endif /* I40E_FCOE */
-#define I40E_FLAG_16BYTE_RX_DESC_ENABLED	BIT_ULL(13)
 #define I40E_FLAG_CLEAN_ADMINQ			BIT_ULL(14)
 #define I40E_FLAG_FILTER_SYNC			BIT_ULL(15)
 #define I40E_FLAG_SERVICE_CLIENT_REQUESTED	BIT_ULL(16)
@@ -547,7 +543,6 @@ struct i40e_vsi {
 
 	u16 max_frame;
 	u16 rx_buf_len;
-	u8  dtype;
 
 	/* List of q_vectors allocated to this VSI */
 	struct i40e_q_vector **q_vectors;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 519cfc8..ee9ba06 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -271,7 +271,7 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 "    rx_rings[%i]: rx_buf_len = %d, dtype = %d\n",
 			 i,
 			 rx_ring->rx_buf_len,
-			 rx_ring->dtype);
+			 0);
 		dev_info(&pf->pdev->dev,
 			 "    rx_rings[%i]: next_to_use = %d, next_to_clean = %d, ring_active = %i\n",
 			 i,
@@ -327,7 +327,7 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 			 tx_ring->reg_idx);
 		dev_info(&pf->pdev->dev,
 			 "    tx_rings[%i]: dtype = %d\n",
-			 i, tx_ring->dtype);
+			 i, 0);
 		dev_info(&pf->pdev->dev,
 			 "    tx_rings[%i]: next_to_use = %d, next_to_clean = %d, ring_active = %i\n",
 			 i,
@@ -366,7 +366,7 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 		 vsi->work_limit);
 	dev_info(&pf->pdev->dev,
 		 "    max_frame = %d, rx_buf_len = %d dtype = %d\n",
-		 vsi->max_frame, vsi->rx_buf_len, vsi->dtype);
+		 vsi->max_frame, vsi->rx_buf_len, 0);
 	dev_info(&pf->pdev->dev,
 		 "    num_q_vectors = %i, base_vector = %i\n",
 		 vsi->num_q_vectors, vsi->base_vector);
@@ -591,13 +591,6 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
 					 "   d[%03x] = 0x%016llx 0x%016llx\n",
 					 i, txd->buffer_addr,
 					 txd->cmd_type_offset_bsz);
-			} else if (sizeof(union i40e_rx_desc) ==
-				   sizeof(union i40e_16byte_rx_desc)) {
-				rxd = I40E_RX_DESC(ring, i);
-				dev_info(&pf->pdev->dev,
-					 "   d[%03x] = 0x%016llx 0x%016llx\n",
-					 i, rxd->read.pkt_addr,
-					 rxd->read.hdr_addr);
 			} else {
 				rxd = I40E_RX_DESC(ring, i);
 				dev_info(&pf->pdev->dev,
@@ -619,13 +612,6 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
 				 "vsi = %02i tx ring = %02i d[%03x] = 0x%016llx 0x%016llx\n",
 				 vsi_seid, ring_id, desc_n,
 				 txd->buffer_addr, txd->cmd_type_offset_bsz);
-		} else if (sizeof(union i40e_rx_desc) ==
-			   sizeof(union i40e_16byte_rx_desc)) {
-			rxd = I40E_RX_DESC(ring, desc_n);
-			dev_info(&pf->pdev->dev,
-				 "vsi = %02i rx ring = %02i d[%03x] = 0x%016llx 0x%016llx\n",
-				 vsi_seid, ring_id, desc_n,
-				 rxd->read.pkt_addr, rxd->read.hdr_addr);
 		} else {
 			rxd = I40E_RX_DESC(ring, desc_n);
 			dev_info(&pf->pdev->dev,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b5713ae..80390ea 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2862,14 +2862,12 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	rx_ctx.base = (ring->dma / 128);
 	rx_ctx.qlen = ring->count;
 
-	if (vsi->back->flags & I40E_FLAG_16BYTE_RX_DESC_ENABLED) {
-		set_ring_16byte_desc_enabled(ring);
-		rx_ctx.dsize = 0;
-	} else {
-		rx_ctx.dsize = 1;
-	}
+	/* use 32 byte descriptors */
+	rx_ctx.dsize = 1;
 
-	rx_ctx.dtype = vsi->dtype;
+	/* descriptor type is always zero
+	 * rx_ctx.dtype = 0;
+	 */
 	rx_ctx.hsplit_0 = 0;
 
 	rx_ctx.rxmax = min_t(u16, vsi->max_frame, chain_len * ring->rx_buf_len);
@@ -2949,7 +2947,6 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
 		vsi->max_frame = I40E_RXBUFFER_2048;
 
 	vsi->rx_buf_len = I40E_RXBUFFER_2048;
-	vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
 
 #ifdef I40E_FCOE
 	/* setup rx buffer for FCoE */
@@ -2957,7 +2954,6 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
 	    (vsi->back->flags & I40E_FLAG_FCOE_ENABLED)) {
 		vsi->rx_buf_len = I40E_RXBUFFER_3072;
 		vsi->max_frame = I40E_RXBUFFER_3072;
-		vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
 	}
 
 #endif /* I40E_FCOE */
@@ -7477,10 +7473,6 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
 		rx_ring->count = vsi->num_desc;
 		rx_ring->size = 0;
 		rx_ring->dcb_tc = 0;
-		if (pf->flags & I40E_FLAG_16BYTE_RX_DESC_ENABLED)
-			set_ring_16byte_desc_enabled(rx_ring);
-		else
-			clear_ring_16byte_desc_enabled(rx_ring);
 		rx_ring->rx_itr_setting = pf->rx_itr_default;
 		vsi->rx_rings[i] = rx_ring;
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 54ddbd4..b78c810 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -260,16 +260,18 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
-	__UNUSED,
-	__I40E_RX_16BYTE_DESC_ENABLED,
 };
 
-#define ring_is_16byte_desc_enabled(ring) \
-	test_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
-#define set_ring_16byte_desc_enabled(ring) \
-	set_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
-#define clear_ring_16byte_desc_enabled(ring) \
-	clear_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
+/* some useful defines for virtchannel interface, which
+ * is the only remaining user of header split
+ */
+#define I40E_RX_DTYPE_NO_SPLIT      0
+#define I40E_RX_DTYPE_HEADER_SPLIT  1
+#define I40E_RX_DTYPE_SPLIT_ALWAYS  2
+#define I40E_RX_SPLIT_L2      0x1
+#define I40E_RX_SPLIT_IP      0x2
+#define I40E_RX_SPLIT_TCP_UDP 0x4
+#define I40E_RX_SPLIT_SCTP    0x8
 
 /* struct that defines a descriptor ring, associated with a VSI */
 struct i40e_ring {
@@ -297,14 +299,6 @@ struct i40e_ring {
 	u16 count;			/* Number of descriptors */
 	u16 reg_idx;			/* HW register index of the ring */
 	u16 rx_buf_len;
-	u8  dtype;
-#define I40E_RX_DTYPE_NO_SPLIT      0
-#define I40E_RX_DTYPE_HEADER_SPLIT  1
-#define I40E_RX_DTYPE_SPLIT_ALWAYS  2
-#define I40E_RX_SPLIT_L2      0x1
-#define I40E_RX_SPLIT_IP      0x2
-#define I40E_RX_SPLIT_TCP_UDP 0x4
-#define I40E_RX_SPLIT_SCTP    0x8
 
 	/* used in interrupt processing */
 	u16 next_to_use;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
index 0bffbd9..0112277 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.h
@@ -259,16 +259,18 @@ struct i40e_rx_queue_stats {
 enum i40e_ring_state_t {
 	__I40E_TX_FDIR_INIT_DONE,
 	__I40E_TX_XPS_INIT_DONE,
-	__UNUSED,
-	__I40E_RX_16BYTE_DESC_ENABLED,
 };
 
-#define ring_is_16byte_desc_enabled(ring) \
-	test_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
-#define set_ring_16byte_desc_enabled(ring) \
-	set_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
-#define clear_ring_16byte_desc_enabled(ring) \
-	clear_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
+/* some useful defines for virtchannel interface, which
+ * is the only remaining user of header split
+ */
+#define I40E_RX_DTYPE_NO_SPLIT      0
+#define I40E_RX_DTYPE_HEADER_SPLIT  1
+#define I40E_RX_DTYPE_SPLIT_ALWAYS  2
+#define I40E_RX_SPLIT_L2      0x1
+#define I40E_RX_SPLIT_IP      0x2
+#define I40E_RX_SPLIT_TCP_UDP 0x4
+#define I40E_RX_SPLIT_SCTP    0x8
 
 /* struct that defines a descriptor ring, associated with a VSI */
 struct i40e_ring {
@@ -288,14 +290,6 @@ struct i40e_ring {
 	u16 count;			/* Number of descriptors */
 	u16 reg_idx;			/* HW register index of the ring */
 	u16 rx_buf_len;
-	u8  dtype;
-#define I40E_RX_DTYPE_NO_SPLIT      0
-#define I40E_RX_DTYPE_HEADER_SPLIT  1
-#define I40E_RX_DTYPE_SPLIT_ALWAYS  2
-#define I40E_RX_SPLIT_L2      0x1
-#define I40E_RX_SPLIT_IP      0x2
-#define I40E_RX_SPLIT_TCP_UDP 0x4
-#define I40E_RX_SPLIT_SCTP    0x8
 
 	/* used in interrupt processing */
 	u16 next_to_use;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 13/14] i40evf: Allocate rx buffers properly
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (11 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 12/14] i40e/i40evf: Remove unused hardware receive descriptor code Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 14/14] i40e: Test memory before ethtool alloc succeeds Harshitha Ramamurthy
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Mitch Williams <mitch.a.williams@intel.com>

Allocate the correct number of RX buffers, and don't fiddle with
next_to_use. The common RX code handles all of this. This fixes a memory
leak of one page each time the driver is opened.
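
For reference, I40E_DESC_UNUSED() is defined along these lines in
i40e_txrx.h (reproduced here only for illustration); passing it instead
of ring->count always leaves one descriptor slot empty and lets the
common Rx code own next_to_use:

	#define I40E_DESC_UNUSED(R)	\
		((((R)->next_to_clean > (R)->next_to_use) ? 0 : (R)->count) + \
		(R)->next_to_clean - (R)->next_to_use - 1)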

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Change-Id: Id06eca353086e084921f047acad28c14745684ee
---
Testing Hints : use kedr to look for memory leaks

 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 7b186d1..b1f7c1e 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -998,9 +998,7 @@ static void i40evf_configure(struct i40evf_adapter *adapter)
 	for (i = 0; i < adapter->num_active_queues; i++) {
 		struct i40e_ring *ring = &adapter->rx_rings[i];
 
-		i40evf_alloc_rx_buffers(ring, ring->count);
-		ring->next_to_use = ring->count - 1;
-		writel(ring->next_to_use, ring->tail);
+		i40evf_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
 	}
 }
 
@@ -2768,7 +2766,6 @@ static void i40evf_remove(struct pci_dev *pdev)
 
 	iounmap(hw->hw_addr);
 	pci_release_regions(pdev);
-
 	i40evf_free_all_tx_resources(adapter);
 	i40evf_free_all_rx_resources(adapter);
 	i40evf_free_queues(adapter);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 14/14] i40e: Test memory before ethtool alloc succeeds
  2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
                   ` (12 preceding siblings ...)
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 13/14] i40evf: Allocate rx buffers properly Harshitha Ramamurthy
@ 2016-04-14 13:19 ` Harshitha Ramamurthy
  13 siblings, 0 replies; 20+ messages in thread
From: Harshitha Ramamurthy @ 2016-04-14 13:19 UTC (permalink / raw)
  To: intel-wired-lan

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

When testing on systems with very limited amounts of RAM, a bug was
found where, while changing the number of descriptors using ethtool,
the driver didn't test the limits of system memory before permanently
assuming it would be able to get receive buffer memory.

Work around this issue by pre-allocating the receive buffer memory in a
"ghost" ring, which is then used during the reinit with the new ring
length.
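
As a rough sketch of the idea (plain userspace C with made-up names, not
the driver code): build and fill a ghost ring first, and adopt it only if
every allocation succeeded.

	#include <stdlib.h>

	struct ghost_ring {
		void **bufs;
		unsigned int count;
	};

	/* Returns 0 on success; on failure the live ring is left untouched. */
	static int resize_ring(struct ghost_ring *live, unsigned int new_count)
	{
		struct ghost_ring ghost = { calloc(new_count, sizeof(void *)), new_count };
		unsigned int i;

		if (!ghost.bufs)
			return -1;
		for (i = 0; i < new_count; i++) {
			ghost.bufs[i] = malloc(4096);	/* stand-in for an Rx page */
			if (!ghost.bufs[i])
				goto unwind;
		}
		/* all memory is in hand; only now tear down and replace the live ring */
		for (i = 0; i < live->count; i++)
			free(live->bufs[i]);
		free(live->bufs);
		*live = ghost;
		return 0;
	unwind:
		while (i--)
			free(ghost.bufs[i]);
		free(ghost.bufs);
		return -1;
	}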

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Change-Id: I92d7a5fb59a6c884b2efdd1ec652845f101c3359
---
Testing Hints : see original report in
bugzilla https://bugzilla.redhat.com/show_bug.cgi?id=1312502,
generally do ethtool -G changes on i686 RHEL6.7

 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 34 +++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index feb370b..deb8762 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1274,6 +1274,13 @@ static int i40e_set_ringparam(struct net_device *netdev,
 		}
 
 		for (i = 0; i < vsi->num_queue_pairs; i++) {
+			/* this is to allow wr32 to have something to write to
+			 * during early allocation of rx buffers
+			 */
+			u32 __iomem faketail = 0;
+			struct i40e_ring *ring;
+			u16 unused;
+
 			/* clone ring and setup updated count */
 			rx_rings[i] = *vsi->rx_rings[i];
 			rx_rings[i].count = new_rx_count;
@@ -1282,12 +1289,22 @@ static int i40e_set_ringparam(struct net_device *netdev,
 			 */
 			rx_rings[i].desc = NULL;
 			rx_rings[i].rx_bi = NULL;
+			rx_rings[i].tail = (u8 __iomem *)&faketail;
 			err = i40e_setup_rx_descriptors(&rx_rings[i]);
+			if (err)
+				goto rx_unwind;
+
+			/* now allocate the rx buffers to make sure the OS
+			 * has enough memory, any failure here means abort
+			 */
+			ring = &rx_rings[i];
+			unused = I40E_DESC_UNUSED(ring);
+			err = i40e_alloc_rx_buffers(ring, unused);
+rx_unwind:
 			if (err) {
-				while (i) {
-					i--;
+				do {
 					i40e_free_rx_resources(&rx_rings[i]);
-				}
+				} while (i--);
 				kfree(rx_rings);
 				rx_rings = NULL;
 
@@ -1313,6 +1330,17 @@ static int i40e_set_ringparam(struct net_device *netdev,
 	if (rx_rings) {
 		for (i = 0; i < vsi->num_queue_pairs; i++) {
 			i40e_free_rx_resources(vsi->rx_rings[i]);
+			/* get the real tail offset */
+			rx_rings[i].tail = vsi->rx_rings[i]->tail;
+			/* this is to fake out the allocation routine
+			 * into thinking it has to realloc everything
+			 * but the recycling logic will let us re-use
+			 * the buffers allocated above
+			 */
+			rx_rings[i].next_to_use = 0;
+			rx_rings[i].next_to_clean = 0;
+			rx_rings[i].next_to_alloc = 0;
+			/* do a struct copy */
 			*vsi->rx_rings[i] = rx_rings[i];
 		}
 		kfree(rx_rings);
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine
  2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine Harshitha Ramamurthy
@ 2016-04-15 17:25   ` Alexander Duyck
  2016-04-15 18:19     ` Jesse Brandeburg
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Duyck @ 2016-04-15 17:25 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, Apr 14, 2016 at 6:19 AM, Harshitha Ramamurthy
<harshitha.ramamurthy@intel.com> wrote:
> From: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
> This refactor aligns the receive routine with the one in
> ixgbe which was highly optimized.  This reduces the code
> we have to maintain and allows for (hopefully) more readable
> and maintainable RX hot path.
>
> In order to do this:
> - consolidate the receive path into a single function that doesn't
>   use packet split but *does* use pages for rx buffers.
> - remove the old _1buf and _ps routines
> - consolidate several routines into helper functions
> - remove ethtool control over packet split
> - remove VF ethtool control over packet split
>
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0
> ---
> Testing Hints : lots of receive traffic, make
> sure it's all working, PF and VF. Please test on a machine with 8kB
> or larger pages, and check for memory leaks.
>
>  drivers/net/ethernet/intel/i40e/i40e.h             |   4 -
>  drivers/net/ethernet/intel/i40e/i40e_debugfs.c     |  12 +-
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c     |  20 -
>  drivers/net/ethernet/intel/i40e/i40e_main.c        |  57 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 951 ++++++++++-----------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h        |  45 +-
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c      | 911 ++++++++++----------
>  drivers/net/ethernet/intel/i40evf/i40e_txrx.h      |  45 +-
>  drivers/net/ethernet/intel/i40evf/i40evf.h         |   7 -
>  drivers/net/ethernet/intel/i40evf/i40evf_ethtool.c |  52 +-
>  drivers/net/ethernet/intel/i40evf/i40evf_main.c    |  31 +-
>  .../net/ethernet/intel/i40evf/i40evf_virtchnl.c    |   4 -
>  12 files changed, 977 insertions(+), 1162 deletions(-)

If only due to the size, I really think this should probably be split
into two patches, one for the VF and one for the PF.  That way we
should only be looking at about 1000 lines of change per patch instead
of 2000, which becomes a bit unwieldy.  If nothing else, it makes the
reviews easier to read as we don't end up with a novel with review
comments scattered throughout.

> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index b5fcd9c..5e23cc9 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -101,7 +101,6 @@
>  #define I40E_PRIV_FLAGS_LINKPOLL_FLAG  BIT(1)
>  #define I40E_PRIV_FLAGS_FD_ATR         BIT(2)
>  #define I40E_PRIV_FLAGS_VEB_STATS      BIT(3)
> -#define I40E_PRIV_FLAGS_PS             BIT(4)
>  #define I40E_PRIV_FLAGS_HW_ATR_EVICT   BIT(5)
>
>  #define I40E_NVM_VERSION_LO_SHIFT  0
> @@ -335,8 +334,6 @@ struct i40e_pf {
>  #define I40E_FLAG_RX_CSUM_ENABLED              BIT_ULL(1)
>  #define I40E_FLAG_MSI_ENABLED                  BIT_ULL(2)
>  #define I40E_FLAG_MSIX_ENABLED                 BIT_ULL(3)
> -#define I40E_FLAG_RX_1BUF_ENABLED              BIT_ULL(4)
> -#define I40E_FLAG_RX_PS_ENABLED                        BIT_ULL(5)
>  #define I40E_FLAG_RSS_ENABLED                  BIT_ULL(6)
>  #define I40E_FLAG_VMDQ_ENABLED                 BIT_ULL(7)
>  #define I40E_FLAG_FDIR_REQUIRES_REINIT         BIT_ULL(8)
> @@ -549,7 +546,6 @@ struct i40e_vsi {
>         u8  *rss_lut_user;  /* User configured lookup table entries */
>
>         u16 max_frame;
> -       u16 rx_hdr_len;
>         u16 rx_buf_len;
>         u8  dtype;
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> index 83dccf1..519cfc8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
> @@ -268,13 +268,13 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
>                          rx_ring->queue_index,
>                          rx_ring->reg_idx);
>                 dev_info(&pf->pdev->dev,
> -                        "    rx_rings[%i]: rx_hdr_len = %d, rx_buf_len = %d, dtype = %d\n",
> -                        i, rx_ring->rx_hdr_len,
> +                        "    rx_rings[%i]: rx_buf_len = %d, dtype = %d\n",
> +                        i,
>                          rx_ring->rx_buf_len,
>                          rx_ring->dtype);
>                 dev_info(&pf->pdev->dev,
> -                        "    rx_rings[%i]: hsplit = %d, next_to_use = %d, next_to_clean = %d, ring_active = %i\n",
> -                        i, ring_is_ps_enabled(rx_ring),
> +                        "    rx_rings[%i]: next_to_use = %d, next_to_clean = %d, ring_active = %i\n",
> +                        i,
>                          rx_ring->next_to_use,
>                          rx_ring->next_to_clean,
>                          rx_ring->ring_active);
> @@ -365,8 +365,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
>                  "    work_limit = %d\n",
>                  vsi->work_limit);
>         dev_info(&pf->pdev->dev,
> -                "    max_frame = %d, rx_hdr_len = %d, rx_buf_len = %d dtype = %d\n",
> -                vsi->max_frame, vsi->rx_hdr_len, vsi->rx_buf_len, vsi->dtype);
> +                "    max_frame = %d, rx_buf_len = %d dtype = %d\n",
> +                vsi->max_frame, vsi->rx_buf_len, vsi->dtype);
>         dev_info(&pf->pdev->dev,
>                  "    num_q_vectors = %i, base_vector = %i\n",
>                  vsi->num_q_vectors, vsi->base_vector);
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index 35c211f..feb370b 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -235,7 +235,6 @@ static const char i40e_priv_flags_strings[][ETH_GSTRING_LEN] = {
>         "LinkPolling",
>         "flow-director-atr",
>         "veb-stats",
> -       "packet-split",
>         "hw-atr-eviction",
>  };
>
> @@ -2997,8 +2996,6 @@ static u32 i40e_get_priv_flags(struct net_device *dev)
>                 I40E_PRIV_FLAGS_FD_ATR : 0;
>         ret_flags |= pf->flags & I40E_FLAG_VEB_STATS_ENABLED ?
>                 I40E_PRIV_FLAGS_VEB_STATS : 0;
> -       ret_flags |= pf->flags & I40E_FLAG_RX_PS_ENABLED ?
> -               I40E_PRIV_FLAGS_PS : 0;
>         ret_flags |= pf->auto_disable_flags & I40E_FLAG_HW_ATR_EVICT_CAPABLE ?
>                 0 : I40E_PRIV_FLAGS_HW_ATR_EVICT;
>

One thing you might want to consider would be to pull the PS_ENABLED
option removal off into a separate patch.  If I recall correctly, I did
something similar when I was making these changes for igb and ixgbe.
Then at least you will be looking at fewer files that have to change in
the next one.  In addition, you can somewhat automate the process that
way, as it would just mean that every case where the flag is false
becomes the default.

> @@ -3019,23 +3016,6 @@ static int i40e_set_priv_flags(struct net_device *dev, u32 flags)
>
>         /* NOTE: MFP is not settable */
>
> -       /* allow the user to control the method of receive
> -        * buffer DMA, whether the packet is split at header
> -        * boundaries into two separate buffers.  In some cases
> -        * one routine or the other will perform better.
> -        */
> -       if ((flags & I40E_PRIV_FLAGS_PS) &&
> -           !(pf->flags & I40E_FLAG_RX_PS_ENABLED)) {
> -               pf->flags |= I40E_FLAG_RX_PS_ENABLED;
> -               pf->flags &= ~I40E_FLAG_RX_1BUF_ENABLED;
> -               reset_required = true;
> -       } else if (!(flags & I40E_PRIV_FLAGS_PS) &&
> -                  (pf->flags & I40E_FLAG_RX_PS_ENABLED)) {
> -               pf->flags &= ~I40E_FLAG_RX_PS_ENABLED;
> -               pf->flags |= I40E_FLAG_RX_1BUF_ENABLED;
> -               reset_required = true;
> -       }
> -
>         if (flags & I40E_PRIV_FLAGS_LINKPOLL_FLAG)
>                 pf->flags |= I40E_FLAG_LINK_POLLING_ENABLED;
>         else
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 6308218..b5713ae 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -2856,10 +2856,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
>         memset(&rx_ctx, 0, sizeof(rx_ctx));
>
>         ring->rx_buf_len = vsi->rx_buf_len;
> -       ring->rx_hdr_len = vsi->rx_hdr_len;
>
>         rx_ctx.dbuff = ring->rx_buf_len >> I40E_RXQ_CTX_DBUFF_SHIFT;
> -       rx_ctx.hbuff = ring->rx_hdr_len >> I40E_RXQ_CTX_HBUFF_SHIFT;
>
>         rx_ctx.base = (ring->dma / 128);
>         rx_ctx.qlen = ring->count;

So one concern I would have here would be to verify there isn't
anything that depends on the header length being defined for parsing.
I ask because I recall the 82599 had a dependency on the header
buffer length being set for RSC to work correctly.  You might want to
make sure that setting this to 0 is correct.

> @@ -2872,18 +2870,9 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
>         }
>
>         rx_ctx.dtype = vsi->dtype;
> -       if (vsi->dtype) {
> -               set_ring_ps_enabled(ring);
> -               rx_ctx.hsplit_0 = I40E_RX_SPLIT_L2      |
> -                                 I40E_RX_SPLIT_IP      |
> -                                 I40E_RX_SPLIT_TCP_UDP |
> -                                 I40E_RX_SPLIT_SCTP;
> -       } else {
> -               rx_ctx.hsplit_0 = 0;
> -       }
> +       rx_ctx.hsplit_0 = 0;
>
> -       rx_ctx.rxmax = min_t(u16, vsi->max_frame,
> -                                 (chain_len * ring->rx_buf_len));
> +       rx_ctx.rxmax = min_t(u16, vsi->max_frame, chain_len * ring->rx_buf_len);
>         if (hw->revision_id == 0)
>                 rx_ctx.lrxqthresh = 0;
>         else

You might want to look at updating this rxmax value to something that
supports whatever your maximum jumbo frame size is.  There shouldn't
be any need to block any frame here because your maximum supported
frame size should be something like 32K with descriptor chaining.
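For example, a part that can chain 16 buffers of 2048 bytes already
covers 16 * 2048 = 32768 bytes, well past any jumbo MTU.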

> @@ -2920,12 +2909,7 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
>         ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
>         writel(0, ring->tail);
>
> -       if (ring_is_ps_enabled(ring)) {
> -               i40e_alloc_rx_headers(ring);
> -               i40e_alloc_rx_buffers_ps(ring, I40E_DESC_UNUSED(ring));
> -       } else {
> -               i40e_alloc_rx_buffers_1buf(ring, I40E_DESC_UNUSED(ring));
> -       }
> +       i40e_alloc_rx_buffers(ring, I40E_DESC_UNUSED(ring));
>
>         return 0;
>  }
> @@ -2964,31 +2948,13 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
>         else
>                 vsi->max_frame = I40E_RXBUFFER_2048;
>
> -       /* figure out correct receive buffer length */
> -       switch (vsi->back->flags & (I40E_FLAG_RX_1BUF_ENABLED |
> -                                   I40E_FLAG_RX_PS_ENABLED)) {
> -       case I40E_FLAG_RX_1BUF_ENABLED:
> -               vsi->rx_hdr_len = 0;
> -               vsi->rx_buf_len = vsi->max_frame;
> -               vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
> -               break;
> -       case I40E_FLAG_RX_PS_ENABLED:
> -               vsi->rx_hdr_len = I40E_RX_HDR_SIZE;
> -               vsi->rx_buf_len = I40E_RXBUFFER_2048;
> -               vsi->dtype = I40E_RX_DTYPE_HEADER_SPLIT;
> -               break;
> -       default:
> -               vsi->rx_hdr_len = I40E_RX_HDR_SIZE;
> -               vsi->rx_buf_len = I40E_RXBUFFER_2048;
> -               vsi->dtype = I40E_RX_DTYPE_SPLIT_ALWAYS;
> -               break;
> -       }
> +       vsi->rx_buf_len = I40E_RXBUFFER_2048;
> +       vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
>
>  #ifdef I40E_FCOE
>         /* setup rx buffer for FCoE */
>         if ((vsi->type == I40E_VSI_FCOE) &&
>             (vsi->back->flags & I40E_FLAG_FCOE_ENABLED)) {
> -               vsi->rx_hdr_len = 0;
>                 vsi->rx_buf_len = I40E_RXBUFFER_3072;
>                 vsi->max_frame = I40E_RXBUFFER_3072;
>                 vsi->dtype = I40E_RX_DTYPE_NO_SPLIT;
> @@ -2996,8 +2962,6 @@ static int i40e_vsi_configure_rx(struct i40e_vsi *vsi)
>
>  #endif /* I40E_FCOE */
>         /* round up for the chip's needs */
> -       vsi->rx_hdr_len = ALIGN(vsi->rx_hdr_len,
> -                               BIT_ULL(I40E_RXQ_CTX_HBUFF_SHIFT));
>         vsi->rx_buf_len = ALIGN(vsi->rx_buf_len,
>                                 BIT_ULL(I40E_RXQ_CTX_DBUFF_SHIFT));
>
> @@ -8461,11 +8425,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
>                     I40E_FLAG_MSI_ENABLED     |
>                     I40E_FLAG_MSIX_ENABLED;
>
> -       if (iommu_present(&pci_bus_type))
> -               pf->flags |= I40E_FLAG_RX_PS_ENABLED;
> -       else
> -               pf->flags |= I40E_FLAG_RX_1BUF_ENABLED;
> -
>         /* Set default ITR */
>         pf->rx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_RX_DEF;
>         pf->tx_itr_default = I40E_ITR_DYNAMIC | I40E_ITR_TX_DEF;
> @@ -10691,11 +10650,9 @@ static void i40e_print_features(struct i40e_pf *pf)
>  #ifdef CONFIG_PCI_IOV
>         i += snprintf(&buf[i], REMAIN(i), " VFs: %d", pf->num_req_vfs);
>  #endif
> -       i += snprintf(&buf[i], REMAIN(i), " VSIs: %d QP: %d RX: %s",
> +       i += snprintf(&buf[i], REMAIN(i), " VSIs: %d QP: %d",
>                       pf->hw.func_caps.num_vsis,
> -                     pf->vsi[pf->lan_vsi]->num_queue_pairs,
> -                     pf->flags & I40E_FLAG_RX_PS_ENABLED ? "PS" : "1BUF");
> -
> +                     pf->vsi[pf->lan_vsi]->num_queue_pairs);
>         if (pf->flags & I40E_FLAG_RSS_ENABLED)
>                 i += snprintf(&buf[i], REMAIN(i), " RSS");
>         if (pf->flags & I40E_FLAG_FD_ATR_ENABLED)
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index ce4d94b..2f50ab8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -1042,7 +1042,6 @@ err:
>  void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>  {
>         struct device *dev = rx_ring->dev;
> -       struct i40e_rx_buffer *rx_bi;
>         unsigned long bi_size;
>         u16 i;
>
> @@ -1050,48 +1049,22 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>         if (!rx_ring->rx_bi)
>                 return;
>
> -       if (ring_is_ps_enabled(rx_ring)) {
> -               int bufsz = ALIGN(rx_ring->rx_hdr_len, 256) * rx_ring->count;
> -
> -               rx_bi = &rx_ring->rx_bi[0];
> -               if (rx_bi->hdr_buf) {
> -                       dma_free_coherent(dev,
> -                                         bufsz,
> -                                         rx_bi->hdr_buf,
> -                                         rx_bi->dma);
> -                       for (i = 0; i < rx_ring->count; i++) {
> -                               rx_bi = &rx_ring->rx_bi[i];
> -                               rx_bi->dma = 0;
> -                               rx_bi->hdr_buf = NULL;
> -                       }
> -               }
> -       }
>         /* Free all the Rx ring sk_buffs */
>         for (i = 0; i < rx_ring->count; i++) {
> -               rx_bi = &rx_ring->rx_bi[i];
> -               if (rx_bi->dma) {
> -                       dma_unmap_single(dev,
> -                                        rx_bi->dma,
> -                                        rx_ring->rx_buf_len,
> -                                        DMA_FROM_DEVICE);
> -                       rx_bi->dma = 0;
> -               }
> +               struct i40e_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
> +
>                 if (rx_bi->skb) {
>                         dev_kfree_skb(rx_bi->skb);
>                         rx_bi->skb = NULL;
>                 }
> -               if (rx_bi->page) {
> -                       if (rx_bi->page_dma) {
> -                               dma_unmap_page(dev,
> -                                              rx_bi->page_dma,
> -                                              PAGE_SIZE,
> -                                              DMA_FROM_DEVICE);
> -                               rx_bi->page_dma = 0;
> -                       }
> -                       __free_page(rx_bi->page);
> -                       rx_bi->page = NULL;
> -                       rx_bi->page_offset = 0;
> -               }
> +               if (!rx_bi->page)
> +                       continue;
> +
> +               dma_unmap_page(dev, rx_bi->dma, PAGE_SIZE, DMA_FROM_DEVICE);
> +               __free_pages(rx_bi->page, 0);

If the order is going to be 0 you can just use __free_page instead of
__free_pages.

> +
> +               rx_bi->page = NULL;
> +               rx_bi->page_offset = 0;
>         }
>
>         bi_size = sizeof(struct i40e_rx_buffer) * rx_ring->count;
> @@ -1100,6 +1073,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>         /* Zero out the descriptor ring */
>         memset(rx_ring->desc, 0, rx_ring->size);
>
> +       rx_ring->next_to_alloc = 0;
>         rx_ring->next_to_clean = 0;
>         rx_ring->next_to_use = 0;
>  }
> @@ -1124,37 +1098,6 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring)
>  }
>
>  /**
> - * i40e_alloc_rx_headers - allocate rx header buffers
> - * @rx_ring: ring to alloc buffers
> - *
> - * Allocate rx header buffers for the entire ring. As these are static,
> - * this is only called when setting up a new ring.
> - **/
> -void i40e_alloc_rx_headers(struct i40e_ring *rx_ring)
> -{
> -       struct device *dev = rx_ring->dev;
> -       struct i40e_rx_buffer *rx_bi;
> -       dma_addr_t dma;
> -       void *buffer;
> -       int buf_size;
> -       int i;
> -
> -       if (rx_ring->rx_bi[0].hdr_buf)
> -               return;
> -       /* Make sure the buffers don't cross cache line boundaries. */
> -       buf_size = ALIGN(rx_ring->rx_hdr_len, 256);
> -       buffer = dma_alloc_coherent(dev, buf_size * rx_ring->count,
> -                                   &dma, GFP_KERNEL);
> -       if (!buffer)
> -               return;
> -       for (i = 0; i < rx_ring->count; i++) {
> -               rx_bi = &rx_ring->rx_bi[i];
> -               rx_bi->dma = dma + (i * buf_size);
> -               rx_bi->hdr_buf = buffer + (i * buf_size);
> -       }
> -}
> -
> -/**
>   * i40e_setup_rx_descriptors - Allocate Rx descriptors
>   * @rx_ring: Rx descriptor ring (for a specific queue) to setup
>   *
> @@ -1175,9 +1118,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>         u64_stats_init(&rx_ring->syncp);
>
>         /* Round up to nearest 4K */
> -       rx_ring->size = ring_is_16byte_desc_enabled(rx_ring)
> -               ? rx_ring->count * sizeof(union i40e_16byte_rx_desc)
> -               : rx_ring->count * sizeof(union i40e_32byte_rx_desc);
> +       rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
>         rx_ring->size = ALIGN(rx_ring->size, 4096);
>         rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
>                                            &rx_ring->dma, GFP_KERNEL);
> @@ -1188,6 +1129,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>                 goto err;
>         }
>
> +       rx_ring->next_to_alloc = 0;
>         rx_ring->next_to_clean = 0;
>         rx_ring->next_to_use = 0;
>
> @@ -1206,6 +1148,10 @@ err:
>  static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
>  {
>         rx_ring->next_to_use = val;
> +
> +       /* update next to alloc since we have filled the ring */
> +       rx_ring->next_to_alloc = val;
> +
>         /* Force memory writes to complete before letting h/w
>          * know there are new descriptors to fetch.  (Only
>          * applicable for weak-ordered memory model archs,
> @@ -1216,160 +1162,122 @@ static inline void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val)
>  }
>
>  /**
> - * i40e_alloc_rx_buffers_ps - Replace used receive buffers; packet split
> - * @rx_ring: ring to place buffers on
> - * @cleaned_count: number of buffers to replace
> + * i40e_alloc_mapped_page - recycle or make a new page
> + * @rx_ring: ring to use
> + * @bi: rx_buffer struct to modify
>   *
> - * Returns true if any errors on allocation
> + * Returns true if the page was successfully allocated or
> + * reused.
>   **/
> -bool i40e_alloc_rx_buffers_ps(struct i40e_ring *rx_ring, u16 cleaned_count)
> +static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
> +                                  struct i40e_rx_buffer *bi)
>  {
> -       u16 i = rx_ring->next_to_use;
> -       union i40e_rx_desc *rx_desc;
> -       struct i40e_rx_buffer *bi;
> -       const int current_node = numa_node_id();
> +       struct page *page = bi->page;
> +       dma_addr_t dma;
>
> -       /* do nothing if no valid netdev defined */
> -       if (!rx_ring->netdev || !cleaned_count)
> -               return false;
> +       /* since we are recycling buffers we should seldom need to alloc */
> +       if (likely(page)) {
> +               rx_ring->rx_stats.page_reuse_count++;
> +               return true;
> +       }

I'd say you can probably get rid of the reuse_count stat.  Instead I
would track the number of pages allocated if you need a statistic.
That way the fast path can remain fast while the slow path should only
take a minor hit.
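
Untested sketch of what I mean, with page_alloc_count as a made-up
stat name just to illustrate the idea:

static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
                                   struct i40e_rx_buffer *bi)
{
        struct page *page = bi->page;
        dma_addr_t dma;

        /* fast path: a recycled page needs no accounting at all */
        if (likely(page))
                return true;

        /* slow path: allocate and map a fresh page, and count it here */
        page = dev_alloc_page();
        if (unlikely(!page)) {
                rx_ring->rx_stats.alloc_page_failed++;
                return false;
        }

        dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
        if (dma_mapping_error(rx_ring->dev, dma)) {
                __free_page(page);
                rx_ring->rx_stats.alloc_page_failed++;
                return false;
        }

        /* hypothetical counter; only bumped when we really allocate */
        rx_ring->rx_stats.page_alloc_count++;

        bi->dma = dma;
        bi->page = page;
        bi->page_offset = 0;

        return true;
}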

>
> -       while (cleaned_count--) {
> -               rx_desc = I40E_RX_DESC(rx_ring, i);
> -               bi = &rx_ring->rx_bi[i];
> +       /* alloc new page for storage */
> +       page = dev_alloc_page();
> +       if (unlikely(!page)) {
> +               rx_ring->rx_stats.alloc_page_failed++;
> +               return false;
> +       }
>
> -               if (bi->skb) /* desc is in use */
> -                       goto no_buffers;
> +       /* map page for use */
> +       dma = dma_map_page(rx_ring->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
>
> -       /* If we've been moved to a different NUMA node, release the
> -        * page so we can get a new one on the current node.
> +       /* if mapping failed free memory back to system since
> +        * there isn't much point in holding memory we can't use
>          */
> -               if (bi->page &&  page_to_nid(bi->page) != current_node) {
> -                       dma_unmap_page(rx_ring->dev,
> -                                      bi->page_dma,
> -                                      PAGE_SIZE,
> -                                      DMA_FROM_DEVICE);
> -                       __free_page(bi->page);
> -                       bi->page = NULL;
> -                       bi->page_dma = 0;
> -                       rx_ring->rx_stats.realloc_count++;
> -               } else if (bi->page) {
> -                       rx_ring->rx_stats.page_reuse_count++;
> -               }
> -
> -               if (!bi->page) {
> -                       bi->page = alloc_page(GFP_ATOMIC);
> -                       if (!bi->page) {
> -                               rx_ring->rx_stats.alloc_page_failed++;
> -                               goto no_buffers;
> -                       }
> -                       bi->page_dma = dma_map_page(rx_ring->dev,
> -                                                   bi->page,
> -                                                   0,
> -                                                   PAGE_SIZE,
> -                                                   DMA_FROM_DEVICE);
> -                       if (dma_mapping_error(rx_ring->dev, bi->page_dma)) {
> -                               rx_ring->rx_stats.alloc_page_failed++;
> -                               __free_page(bi->page);
> -                               bi->page = NULL;
> -                               bi->page_dma = 0;
> -                               bi->page_offset = 0;
> -                               goto no_buffers;
> -                       }
> -                       bi->page_offset = 0;
> -               }
> -
> -               /* Refresh the desc even if buffer_addrs didn't change
> -                * because each write-back erases this info.
> -                */
> -               rx_desc->read.pkt_addr =
> -                               cpu_to_le64(bi->page_dma + bi->page_offset);
> -               rx_desc->read.hdr_addr = cpu_to_le64(bi->dma);
> -               i++;
> -               if (i == rx_ring->count)
> -                       i = 0;
> +       if (dma_mapping_error(rx_ring->dev, dma)) {
> +               __free_pages(page, 0);

An order-0 page can just use __free_page.

> +               rx_ring->rx_stats.alloc_page_failed++;
> +               return false;
>         }
>
> -       if (rx_ring->next_to_use != i)
> -               i40e_release_rx_desc(rx_ring, i);
> +       bi->dma = dma;
> +       bi->page = page;
> +       bi->page_offset = 0;
>
> -       return false;
> +       return true;
> +}
>
> -no_buffers:
> -       if (rx_ring->next_to_use != i)
> -               i40e_release_rx_desc(rx_ring, i);
> +/**
> + * i40e_receive_skb - Send a completed packet up the stack
> + * @rx_ring:  rx ring in play
> + * @skb: packet to send up
> + * @vlan_tag: vlan tag for packet
> + **/
> +static void i40e_receive_skb(struct i40e_ring *rx_ring,
> +                            struct sk_buff *skb, u16 vlan_tag)
> +{
> +       struct i40e_q_vector *q_vector = rx_ring->q_vector;
>
> -       /* make sure to come back via polling to try again after
> -        * allocation failure
> -        */
> -       return true;
> +       if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
> +           (vlan_tag & VLAN_VID_MASK))
> +               __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
> +

Really this piece does not fit here.  You would be better off
keeping it with the rest of the Rx descriptor processing, since that
is where the VLAN tag originates.

> +       napi_gro_receive(&q_vector->napi, skb);
>  }
>
>  /**
> - * i40e_alloc_rx_buffers_1buf - Replace used receive buffers; single buffer
> + * i40e_alloc_rx_buffers - Replace used receive buffers
>   * @rx_ring: ring to place buffers on
>   * @cleaned_count: number of buffers to replace
>   *
> - * Returns true if any errors on allocation
> + * Returns false if all allocations were successful, true if any fail
>   **/
> -bool i40e_alloc_rx_buffers_1buf(struct i40e_ring *rx_ring, u16 cleaned_count)
> +bool i40e_alloc_rx_buffers(struct i40e_ring *rx_ring, u16 cleaned_count)
>  {
> -       u16 i = rx_ring->next_to_use;
> +       u16 ntu = rx_ring->next_to_use;
>         union i40e_rx_desc *rx_desc;
>         struct i40e_rx_buffer *bi;
> -       struct sk_buff *skb;
>
>         /* do nothing if no valid netdev defined */
>         if (!rx_ring->netdev || !cleaned_count)
>                 return false;
>
> -       while (cleaned_count--) {
> -               rx_desc = I40E_RX_DESC(rx_ring, i);
> -               bi = &rx_ring->rx_bi[i];
> -               skb = bi->skb;
> -
> -               if (!skb) {
> -                       skb = __netdev_alloc_skb_ip_align(rx_ring->netdev,
> -                                                         rx_ring->rx_buf_len,
> -                                                         GFP_ATOMIC |
> -                                                         __GFP_NOWARN);
> -                       if (!skb) {
> -                               rx_ring->rx_stats.alloc_buff_failed++;
> -                               goto no_buffers;
> -                       }
> -                       /* initialize queue mapping */
> -                       skb_record_rx_queue(skb, rx_ring->queue_index);
> -                       bi->skb = skb;
> -               }
> +       rx_desc = I40E_RX_DESC(rx_ring, ntu);
> +       bi = &rx_ring->rx_bi[ntu];
>
> -               if (!bi->dma) {
> -                       bi->dma = dma_map_single(rx_ring->dev,
> -                                                skb->data,
> -                                                rx_ring->rx_buf_len,
> -                                                DMA_FROM_DEVICE);
> -                       if (dma_mapping_error(rx_ring->dev, bi->dma)) {
> -                               rx_ring->rx_stats.alloc_buff_failed++;
> -                               bi->dma = 0;
> -                               dev_kfree_skb(bi->skb);
> -                               bi->skb = NULL;
> -                               goto no_buffers;
> -                       }
> -               }
> +       do {
> +               if (!i40e_alloc_mapped_page(rx_ring, bi))
> +                       goto no_buffers;
>

Instead of using a goto you can just use a break here.  See my
comment below on how to deal with the return value.

> -               rx_desc->read.pkt_addr = cpu_to_le64(bi->dma);
> +               /* Refresh the desc even if buffer_addrs didn't change
> +                * because each write-back erases this info.
> +                */
> +               rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
>                 rx_desc->read.hdr_addr = 0;
> -               i++;
> -               if (i == rx_ring->count)
> -                       i = 0;
> -       }
>
> -       if (rx_ring->next_to_use != i)
> -               i40e_release_rx_desc(rx_ring, i);
> +               rx_desc++;
> +               bi++;
> +               ntu++;
> +               if (unlikely(ntu == rx_ring->count)) {
> +                       rx_desc = I40E_RX_DESC(rx_ring, 0);
> +                       bi = rx_ring->rx_bi;
> +                       ntu = 0;
> +               }
> +
> +               /* clear the status bits for the next_to_use descriptor */
> +               rx_desc->wb.qword1.status_error_len = 0;
> +
> +               cleaned_count--;
> +       } while (cleaned_count);
> +
> +       if (rx_ring->next_to_use != ntu)
> +               i40e_release_rx_desc(rx_ring, ntu);
>
>         return false;
>

You probably don't really need the branch.  If you have an allocation
failure you can just break and then return !cleaned_count at the end.
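
Roughly, and untested - the no_buffers label then goes away entirely:

        do {
                if (!i40e_alloc_mapped_page(rx_ring, bi))
                        break;

                /* Refresh the desc even if buffer_addrs didn't change
                 * because each write-back erases this info.
                 */
                rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
                rx_desc->read.hdr_addr = 0;

                rx_desc++;
                bi++;
                ntu++;
                if (unlikely(ntu == rx_ring->count)) {
                        rx_desc = I40E_RX_DESC(rx_ring, 0);
                        bi = rx_ring->rx_bi;
                        ntu = 0;
                }

                /* clear the status bits for the next_to_use descriptor */
                rx_desc->wb.qword1.status_error_len = 0;

                cleaned_count--;
        } while (cleaned_count);

        if (rx_ring->next_to_use != ntu)
                i40e_release_rx_desc(rx_ring, ntu);

        /* cleaned_count is only non-zero if an allocation failed above,
         * so a true return asks the caller to come back via polling
         */
        return !!cleaned_count;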

>  no_buffers:
> -       if (rx_ring->next_to_use != i)
> -               i40e_release_rx_desc(rx_ring, i);
> +       if (rx_ring->next_to_use != ntu)
> +               i40e_release_rx_desc(rx_ring, ntu);
>
>         /* make sure to come back via polling to try again after
>          * allocation failure
> @@ -1378,42 +1286,35 @@ no_buffers:
>  }
>
>  /**
> - * i40e_receive_skb - Send a completed packet up the stack
> - * @rx_ring:  rx ring in play
> - * @skb: packet to send up
> - * @vlan_tag: vlan tag for packet
> - **/
> -static void i40e_receive_skb(struct i40e_ring *rx_ring,
> -                            struct sk_buff *skb, u16 vlan_tag)
> -{
> -       struct i40e_q_vector *q_vector = rx_ring->q_vector;
> -
> -       if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
> -           (vlan_tag & VLAN_VID_MASK))
> -               __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vlan_tag);
> -
> -       napi_gro_receive(&q_vector->napi, skb);
> -}
> -
> -/**
>   * i40e_rx_checksum - Indicate in skb if hw indicated a good cksum
>   * @vsi: the VSI we care about
>   * @skb: skb currently being received and modified
> - * @rx_status: status value of last descriptor in packet
> - * @rx_error: error value of last descriptor in packet
> - * @rx_ptype: ptype value of last descriptor in packet
> + * @rx_desc: the receive descriptor
> + *
> + * skb->protocol must be set before this function is called
>   **/
>  static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
>                                     struct sk_buff *skb,
> -                                   u32 rx_status,
> -                                   u32 rx_error,
> -                                   u16 rx_ptype)
> +                                   union i40e_rx_desc *rx_desc)
>  {
> -       struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(rx_ptype);
> +       struct i40e_rx_ptype_decoded decoded;
>         bool ipv4, ipv6, tunnel = false;
> +       u32 rx_error, rx_status;
> +       u8 ptype;
> +       u64 qword;
> +

So the ordering here is off.  The checksum_none_assert and
NETIF_F_RXCSUM checks should be up at the top.  We don't need to do
any of this decoding if we aren't doing a checksum.

> +       qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> +       ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
> +       rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
> +                  I40E_RXD_QW1_ERROR_SHIFT;
> +       rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
> +                   I40E_RXD_QW1_STATUS_SHIFT;
> +       decoded = decode_rx_desc_ptype(ptype);

This code should only be run after we have already tested for
I40E_RX_DESC_STATUS_L3L4P_SHIFT in the descriptor.  It would be the
perfect spot to look at using the staterr check function that checks
for a bit without requiring you to do any byteswapping.

In addition all the Rx error checks can benefit from the same thing as
well.  Odds are you just need to go through and clean this code up so
that you are using the staterr function wherever it is applicable.

>         skb->ip_summed = CHECKSUM_NONE;
>
> +       skb_checksum_none_assert(skb);
> +

Choose one.  Either we force CHECKSUM_NONE or we assert that it must
be 0.  Doing both here is kind of pointless.
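
Pulling those three comments together, the top of the function could
look something like this (untested; the existing ipv4/ipv6/tunnel
handling stays as you have it):

static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
                                    struct sk_buff *skb,
                                    union i40e_rx_desc *rx_desc)
{
        struct i40e_rx_ptype_decoded decoded;
        u32 rx_error, rx_status;
        u8 ptype;
        u64 qword;

        skb_checksum_none_assert(skb);

        /* bail before decoding anything if Rx csum is disabled */
        if (!(vsi->netdev->features & NETIF_F_RXCSUM))
                return;

        /* only decode the descriptor once we know L3/L4 parsing was
         * done; i40e_test_staterr avoids the runtime byteswap here
         */
        if (!i40e_test_staterr(rx_desc,
                               BIT(I40E_RX_DESC_STATUS_L3L4P_SHIFT)))
                return;

        qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
        ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
        rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
                   I40E_RXD_QW1_ERROR_SHIFT;
        rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
                    I40E_RXD_QW1_STATUS_SHIFT;
        decoded = decode_rx_desc_ptype(ptype);

        /* ... the existing ipv4/ipv6/tunnel validation continues here ... */
}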

>         /* Rx csum enabled and ip headers found? */
>         if (!(vsi->netdev->features & NETIF_F_RXCSUM))
>                 return;
> @@ -1479,7 +1380,7 @@ checksum_fail:
>   *
>   * Returns a hash type to be used by skb_set_hash
>   **/
> -static inline enum pkt_hash_types i40e_ptype_to_htype(u8 ptype)
> +static inline int i40e_ptype_to_htype(u8 ptype)
>  {
>         struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
>
> @@ -1507,7 +1408,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
>                                 u8 rx_ptype)
>  {
>         u32 hash;
> -       const __le64 rss_mask  =
> +       const __le64 rss_mask =
>                 cpu_to_le64((u64)I40E_RX_DESC_FLTSTAT_RSS_HASH <<
>                             I40E_RX_DESC_STATUS_FLTSTAT_SHIFT);
>
> @@ -1521,338 +1422,409 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
>  }
>
>  /**
> - * i40e_clean_rx_irq_ps - Reclaim resources after receive; packet split
> - * @rx_ring:  rx ring to clean
> - * @budget:   how many cleans we're allowed
> + * i40e_process_skb_fields - Populate skb header fields from Rx descriptor
> + * @rx_ring: rx descriptor ring packet is being transacted on
> + * @rx_desc: pointer to the EOP Rx descriptor
> + * @skb: pointer to current skb being populated
> + * @rx_ptype: the packet type decoded by hardware
>   *
> - * Returns true if there's any budget left (e.g. the clean is finished)
> + * This function checks the ring, descriptor, and packet information in
> + * order to populate the hash, checksum, VLAN, protocol, and
> + * other fields within the skb.
>   **/
> -static int i40e_clean_rx_irq_ps(struct i40e_ring *rx_ring, const int budget)
> +static inline
> +void i40e_process_skb_fields(struct i40e_ring *rx_ring,
> +                            union i40e_rx_desc *rx_desc, struct sk_buff *skb,
> +                            u8 rx_ptype)
>  {
> -       unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> -       u16 rx_packet_len, rx_header_len, rx_sph, rx_hbo;
> -       u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
> -       struct i40e_vsi *vsi = rx_ring->vsi;
> -       u16 i = rx_ring->next_to_clean;
> -       union i40e_rx_desc *rx_desc;
> -       u32 rx_error, rx_status;
> -       bool failure = false;
> -       u8 rx_ptype;
> -       u64 qword;
> -       u32 copysize;
> +       u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> +       u32 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
> +                       I40E_RXD_QW1_STATUS_SHIFT;
> +       u32 rsyn = (rx_status & I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
> +                  I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT;

The fewer byteswaps you have to do, the better for non-Intel architectures.
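
For example, the timestamp handling can test the valid bit in its
little-endian form and only byteswap when a timestamp is actually
present.  Untested sketch; I'm assuming an
I40E_RX_DESC_STATUS_TSYNVALID_SHIFT define matching the existing
TSYNVALID mask:

        if (unlikely(i40e_test_staterr(rx_desc,
                        BIT(I40E_RX_DESC_STATUS_TSYNVALID_SHIFT)))) {
                u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
                u32 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
                                I40E_RXD_QW1_STATUS_SHIFT;
                u32 tsyn = (rx_status & I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
                           I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT;

                i40e_ptp_rx_hwtstamp(rx_ring->vsi->back, skb, tsyn);
                rx_ring->last_rx_timestamp = jiffies;
        }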

> -       if (budget <= 0)
> -               return 0;
> +       if (unlikely(rsyn)) {
> +               i40e_ptp_rx_hwtstamp(rx_ring->vsi->back, skb, rsyn);
> +               rx_ring->last_rx_timestamp = jiffies;
> +       }
>
> -       do {
> -               struct i40e_rx_buffer *rx_bi;
> -               struct sk_buff *skb;
> -               u16 vlan_tag;
> -               /* return some buffers to hardware, one at a time is too slow */
> -               if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
> -                       failure = failure ||
> -                                 i40e_alloc_rx_buffers_ps(rx_ring,
> -                                                          cleaned_count);
> -                       cleaned_count = 0;
> -               }
> +       i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);

Since both the Rx hash and the Rx checksum depend on the Rx packet
type information you might want to try combining some of that so that
you can avoid duplication of effort.

> -               i = rx_ring->next_to_clean;
> -               rx_desc = I40E_RX_DESC(rx_ring, i);
> -               qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> -               rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
> -                       I40E_RXD_QW1_STATUS_SHIFT;
> +       /* modifies the skb - consumes the enet header */
> +       skb->protocol = eth_type_trans(skb, rx_ring->netdev);
>
> -               if (!(rx_status & BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
> -                       break;
> +       i40e_rx_checksum(rx_ring->vsi, skb, rx_desc);
> +}

As a matter of personal preference, I like to see the Rx checksum
work done before you start removing header data.

>
> -               /* This memory barrier is needed to keep us from reading
> -                * any other fields out of the rx_desc until we know the
> -                * DD bit is set.
> -                */
> -               dma_rmb();
> -               /* sync header buffer for reading */
> -               dma_sync_single_range_for_cpu(rx_ring->dev,
> -                                             rx_ring->rx_bi[0].dma,
> -                                             i * rx_ring->rx_hdr_len,
> -                                             rx_ring->rx_hdr_len,
> -                                             DMA_FROM_DEVICE);
> -               if (i40e_rx_is_programming_status(qword)) {
> -                       i40e_clean_programming_status(rx_ring, rx_desc);
> -                       I40E_RX_INCREMENT(rx_ring, i);
> -                       continue;
> -               }
> -               rx_bi = &rx_ring->rx_bi[i];
> -               skb = rx_bi->skb;
> -               if (likely(!skb)) {
> -                       skb = __netdev_alloc_skb_ip_align(rx_ring->netdev,
> -                                                         rx_ring->rx_hdr_len,
> -                                                         GFP_ATOMIC |
> -                                                         __GFP_NOWARN);
> -                       if (!skb) {
> -                               rx_ring->rx_stats.alloc_buff_failed++;
> -                               failure = true;
> -                               break;
> -                       }
> +/**
> + * i40e_pull_tail - i40e specific version of skb_pull_tail
> + * @rx_ring: rx descriptor ring packet is being transacted on
> + * @skb: pointer to current skb being adjusted
> + *
> + * This function is an i40e specific version of __pskb_pull_tail.  The
> + * main difference between this version and the original function is that
> + * this function can make several assumptions about the state of things
> + * that allow for significant optimizations versus the standard function.
> + * As a result we can do things like drop a frag and maintain an accurate
> + * truesize for the skb.
> + */
> +static void i40e_pull_tail(struct i40e_ring *rx_ring, struct sk_buff *skb)
> +{
> +       struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
> +       unsigned char *va;
> +       unsigned int pull_len;
>
> -                       /* initialize queue mapping */
> -                       skb_record_rx_queue(skb, rx_ring->queue_index);
> -                       /* we are reusing so sync this buffer for CPU use */
> -                       dma_sync_single_range_for_cpu(rx_ring->dev,
> -                                                     rx_ring->rx_bi[0].dma,
> -                                                     i * rx_ring->rx_hdr_len,
> -                                                     rx_ring->rx_hdr_len,
> -                                                     DMA_FROM_DEVICE);
> -               }
> -               rx_packet_len = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> -                               I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
> -               rx_header_len = (qword & I40E_RXD_QW1_LENGTH_HBUF_MASK) >>
> -                               I40E_RXD_QW1_LENGTH_HBUF_SHIFT;
> -               rx_sph = (qword & I40E_RXD_QW1_LENGTH_SPH_MASK) >>
> -                        I40E_RXD_QW1_LENGTH_SPH_SHIFT;
> -
> -               rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
> -                          I40E_RXD_QW1_ERROR_SHIFT;
> -               rx_hbo = rx_error & BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
> -               rx_error &= ~BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
> +       /* it is valid to use page_address instead of kmap since we are
> +        * working with pages allocated out of the lomem pool per
> +        * alloc_page(GFP_ATOMIC)
> +        */
> +       va = skb_frag_address(frag);
>
> -               rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
> -                          I40E_RXD_QW1_PTYPE_SHIFT;
> -               /* sync half-page for reading */
> -               dma_sync_single_range_for_cpu(rx_ring->dev,
> -                                             rx_bi->page_dma,
> -                                             rx_bi->page_offset,
> -                                             PAGE_SIZE / 2,
> -                                             DMA_FROM_DEVICE);
> -               prefetch(page_address(rx_bi->page) + rx_bi->page_offset);
> -               rx_bi->skb = NULL;
> -               cleaned_count++;
> -               copysize = 0;
> -               if (rx_hbo || rx_sph) {
> -                       int len;
> +       /* we need the header to contain the greater of either ETH_HLEN or
> +        * 60 bytes if the skb->len is less than 60 for skb_pad.
> +        */
> +       pull_len = eth_get_headlen(va, I40E_RX_HDR_SIZE);
>
> -                       if (rx_hbo)
> -                               len = I40E_RX_HDR_SIZE;
> -                       else
> -                               len = rx_header_len;
> -                       memcpy(__skb_put(skb, len), rx_bi->hdr_buf, len);
> -               } else if (skb->len == 0) {
> -                       int len;
> -                       unsigned char *va = page_address(rx_bi->page) +
> -                                           rx_bi->page_offset;
> -
> -                       len = min(rx_packet_len, rx_ring->rx_hdr_len);
> -                       memcpy(__skb_put(skb, len), va, len);
> -                       copysize = len;
> -                       rx_packet_len -= len;
> -               }
> -               /* Get the rest of the data if this was a header split */
> -               if (rx_packet_len) {
> -                       skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
> -                                       rx_bi->page,
> -                                       rx_bi->page_offset + copysize,
> -                                       rx_packet_len, I40E_RXBUFFER_2048);
> -
> -                       /* If the page count is more than 2, then both halves
> -                        * of the page are used and we need to free it. Do it
> -                        * here instead of in the alloc code. Otherwise one
> -                        * of the half-pages might be released between now and
> -                        * then, and we wouldn't know which one to use.
> -                        * Don't call get_page and free_page since those are
> -                        * both expensive atomic operations that just change
> -                        * the refcount in opposite directions. Just give the
> -                        * page to the stack; he can have our refcount.
> -                        */
> -                       if (page_count(rx_bi->page) > 2) {
> -                               dma_unmap_page(rx_ring->dev,
> -                                              rx_bi->page_dma,
> -                                              PAGE_SIZE,
> -                                              DMA_FROM_DEVICE);
> -                               rx_bi->page = NULL;
> -                               rx_bi->page_dma = 0;
> -                               rx_ring->rx_stats.realloc_count++;
> -                       } else {
> -                               get_page(rx_bi->page);
> -                               /* switch to the other half-page here; the
> -                                * allocation code programs the right addr
> -                                * into HW. If we haven't used this half-page,
> -                                * the address won't be changed, and HW can
> -                                * just use it next time through.
> -                                */
> -                               rx_bi->page_offset ^= PAGE_SIZE / 2;
> -                       }
> +       /* align pull length to size of long to optimize memcpy performance */
> +       skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));
>
> -               }
> -               I40E_RX_INCREMENT(rx_ring, i);
> +       /* update all of the pointers */
> +       skb_frag_size_sub(frag, pull_len);
> +       frag->page_offset += pull_len;
> +       skb->data_len -= pull_len;
> +       skb->tail += pull_len;
> +}
>
> -               if (unlikely(
> -                   !(rx_status & BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))) {
> -                       struct i40e_rx_buffer *next_buffer;
> +/**
> + * i40e_cleanup_headers - Correct empty headers
> + * @rx_ring: rx descriptor ring packet is being transacted on
> + * @skb: pointer to current skb being fixed
> + *
> + * Also address the case where we are pulling data in on pages only
> + * and as such no data is present in the skb header.
> + *
> + * In addition if skb is not at least 60 bytes we need to pad it so that
> + * it is large enough to qualify as a valid Ethernet frame.
> + *
> + * Returns true if an error was encountered and skb was freed.
> + **/
> +static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb)
> +{

In the ixgbe approach I had a bit in here that was checking the Rx
descriptor and freeing the buffer if there was a packet error
detected.  I'm going to assume it is probably pushed off into a
separate piece in your Rx function, but it looks to me like there
should be a check for I40E_RX_DESC_ERROR_RXE_SHIFT here.
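
i.e. something like this at the top of i40e_cleanup_headers(), which
would also need the rx_desc passed in (untested):

static bool i40e_cleanup_headers(struct i40e_ring *rx_ring,
                                 struct sk_buff *skb,
                                 union i40e_rx_desc *rx_desc)
{
        /* ERR_MASK will only have valid bits if EOP set, so this is
         * the place to drop any packet the hardware flagged as bad
         */
        if (unlikely(i40e_test_staterr(rx_desc,
                                       BIT(I40E_RXD_QW1_ERROR_SHIFT +
                                           I40E_RX_DESC_ERROR_RXE_SHIFT)))) {
                dev_kfree_skb_any(skb);
                return true;
        }

        /* place header in linear portion of buffer */
        if (skb_is_nonlinear(skb))
                i40e_pull_tail(rx_ring, skb);

        /* if eth_skb_pad returns an error the skb was freed */
        if (eth_skb_pad(skb))
                return true;

        return false;
}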

> +       /* place header in linear portion of buffer */
> +       if (skb_is_nonlinear(skb))
> +               i40e_pull_tail(rx_ring, skb);
>
> -                       next_buffer = &rx_ring->rx_bi[i];
> -                       next_buffer->skb = skb;
> -                       rx_ring->rx_stats.non_eop_descs++;
> -                       continue;
> -               }
> +       /* if eth_skb_pad returns an error the skb was freed */
> +       if (eth_skb_pad(skb))
> +               return true;
>
> -               /* ERR_MASK will only have valid bits if EOP set */
> -               if (unlikely(rx_error & BIT(I40E_RX_DESC_ERROR_RXE_SHIFT))) {
> -                       dev_kfree_skb_any(skb);
> -                       continue;
> -               }
> +       return false;
> +}
>
> -               i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
> +/**
> + * i40e_reuse_rx_page - page flip buffer and store it back on the ring
> + * @rx_ring: rx descriptor ring to store buffers on
> + * @old_buff: donor buffer to have page reused
> + *
> + * Synchronizes page for reuse by the adapter
> + **/
> +static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
> +                              struct i40e_rx_buffer *old_buff)
> +{
> +       struct i40e_rx_buffer *new_buff;
> +       u16 nta = rx_ring->next_to_alloc;
>
> -               if (unlikely(rx_status & I40E_RXD_QW1_STATUS_TSYNVALID_MASK)) {
> -                       i40e_ptp_rx_hwtstamp(vsi->back, skb, (rx_status &
> -                                          I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
> -                                          I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT);
> -                       rx_ring->last_rx_timestamp = jiffies;
> -               }
> +       new_buff = &rx_ring->rx_bi[nta];
>
> -               /* probably a little skewed due to removing CRC */
> -               total_rx_bytes += skb->len;
> -               total_rx_packets++;
> +       /* update, and store next to alloc */
> +       nta++;
> +       rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
>
> -               skb->protocol = eth_type_trans(skb, rx_ring->netdev);
> +       /* transfer page from old buffer to new buffer */
> +       *new_buff = *old_buff;

The ixgbe driver had a call here to sync the page back to the device.
I'll keep an eye out for it elsewhere, but this would be a good spot
for something like that.
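
For reference, the ixgbe version ends with something like this, which
should translate pretty directly (untested here):

        /* transfer page from old buffer to new buffer */
        *new_buff = *old_buff;

        /* sync the half-page back for use by the device */
        dma_sync_single_range_for_device(rx_ring->dev, new_buff->dma,
                                         new_buff->page_offset,
                                         I40E_RXBUFFER_2048,
                                         DMA_FROM_DEVICE);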

> +}
>
> -               i40e_rx_checksum(vsi, skb, rx_status, rx_error, rx_ptype);
> +/**
> + * i40e_page_is_reserved - check if reuse is possible
> + * @page: page struct to check
> + */
> +static inline bool i40e_page_is_reserved(struct page *page)
> +{
> +       return (page_to_nid(page) != numa_mem_id()) || page_is_pfmemalloc(page);
> +}
>
> -               vlan_tag = rx_status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)
> -                        ? le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1)
> -                        : 0;
> -#ifdef I40E_FCOE
> -               if (unlikely(
> -                   i40e_rx_is_fcoe(rx_ptype) &&
> -                   !i40e_fcoe_handle_offload(rx_ring, rx_desc, skb))) {
> -                       dev_kfree_skb_any(skb);
> -                       continue;
> -               }
> +/**
> + * i40e_add_rx_frag - Add contents of Rx buffer to sk_buff
> + * @rx_ring: rx descriptor ring to transact packets on
> + * @rx_buffer: buffer containing page to add
> + * @rx_desc: descriptor containing length of buffer written by hardware
> + * @skb: sk_buff to place the data into
> + *
> + * This function will add the data contained in rx_buffer->page to the skb.
> + * This is done either through a direct copy if the data in the buffer is
> + * less than the skb header size, otherwise it will just attach the page as
> + * a frag to the skb.
> + *
> + * The function will then update the page offset if necessary and return
> + * true if the buffer can be reused by the adapter.
> + **/
> +static bool i40e_add_rx_frag(struct i40e_ring *rx_ring,
> +                            struct i40e_rx_buffer *rx_buffer,
> +                            union i40e_rx_desc *rx_desc,
> +                            struct sk_buff *skb)
> +{
> +       struct page *page = rx_buffer->page;
> +       u64 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> +       unsigned int size = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> +                           I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
> +#if (PAGE_SIZE < 8192)
> +       unsigned int truesize = I40E_RXBUFFER_2048;
> +#else
> +       unsigned int truesize = ALIGN(size, L1_CACHE_BYTES);
> +       unsigned int last_offset = PAGE_SIZE - I40E_RXBUFFER_2048;
>  #endif
> -               i40e_receive_skb(rx_ring, skb, vlan_tag);
>
> -               rx_desc->wb.qword1.status_error_len = 0;
> +       /* will the data fit in the skb we allocated? if so, just
> +        * copy it as it is pretty small anyway
> +        */
> +       if ((size <= I40E_RX_HDR_SIZE) && !skb_is_nonlinear(skb)) {
> +               unsigned char *va = page_address(page) + rx_buffer->page_offset;
>
> -       } while (likely(total_rx_packets < budget));
> +               memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
>
> -       u64_stats_update_begin(&rx_ring->syncp);
> -       rx_ring->stats.packets += total_rx_packets;
> -       rx_ring->stats.bytes += total_rx_bytes;
> -       u64_stats_update_end(&rx_ring->syncp);
> -       rx_ring->q_vector->rx.total_packets += total_rx_packets;
> -       rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
> +               /* page is not reserved, we can reuse buffer as-is */
> +               if (likely(!i40e_page_is_reserved(page)))
> +                       return true;
>
> -       return failure ? budget : total_rx_packets;
> +               /* this page cannot be reused so discard it */
> +               __free_pages(page, 0);
> +               return false;
> +       }
> +
> +       skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
> +                       rx_buffer->page_offset, size, truesize);
> +
> +       /* avoid re-using remote pages */
> +       if (unlikely(i40e_page_is_reserved(page)))
> +               return false;
> +
> +#if (PAGE_SIZE < 8192)
> +       /* if we are only owner of page we can reuse it */
> +       if (unlikely(page_count(page) != 1))
> +               return false;
> +
> +       /* flip page offset to other buffer */
> +       rx_buffer->page_offset ^= truesize;
> +#else
> +       /* move offset up to the next cache line */
> +       rx_buffer->page_offset += truesize;
> +
> +       if (rx_buffer->page_offset > last_offset)
> +               return false;
> +#endif
> +
> +       /* Even if we own the page, we are not allowed to use atomic_set()
> +        * This would break get_page_unless_zero() users.
> +        */
> +       get_page(rx_buffer->page);
> +
> +       return true;
>  }
>
>  /**
> - * i40e_clean_rx_irq_1buf - Reclaim resources after receive; single buffer
> - * @rx_ring:  rx ring to clean
> - * @budget:   how many cleans we're allowed
> + * i40e_fetch_rx_buffer - Allocate skb and populate it
> + * @rx_ring: rx descriptor ring to transact packets on
> + * @rx_desc: descriptor containing info written by hardware
>   *
> - * Returns number of packets cleaned
> + * This function allocates an skb on the fly, and populates it with the page
> + * data from the current receive descriptor, taking care to set up the skb
> + * correctly, as well as handling calling the page recycle function if
> + * necessary.
> + */
> +static inline
> +struct sk_buff *i40e_fetch_rx_buffer(struct i40e_ring *rx_ring,
> +                                    union i40e_rx_desc *rx_desc)
> +{
> +       struct i40e_rx_buffer *rx_buffer;
> +       struct sk_buff *skb;
> +       struct page *page;
> +
> +       rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
> +       page = rx_buffer->page;
> +       prefetchw(page);
> +
> +       skb = rx_buffer->skb;
> +

Keep the skb in the ring, not in the rx_buffer.

> +       if (likely(!skb)) {
> +               void *page_addr = page_address(page) + rx_buffer->page_offset;
> +
> +               /* prefetch first cache line of first page */
> +               prefetch(page_addr);
> +#if L1_CACHE_BYTES < 128
> +               prefetch(page_addr + L1_CACHE_BYTES);
> +#endif
> +
> +               /* allocate a skb to store the frags */
> +               skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
> +                                      I40E_RX_HDR_SIZE,
> +                                      GFP_ATOMIC | __GFP_NOWARN);
> +               if (unlikely(!skb)) {
> +                       rx_ring->rx_stats.alloc_buff_failed++;
> +                       return NULL;
> +               }
> +
> +               /* we will be copying header into skb->data in
> +                * pskb_may_pull so it is in our interest to prefetch
> +                * it now to avoid a possible cache miss
> +                */
> +               prefetchw(skb->data);
> +
> +               skb_record_rx_queue(skb, rx_ring->queue_index);

You can probably hold off on recording the Rx queue until later when
you are processing skb fields.

> +       } else {
> +               /* we are reusing so sync this buffer for CPU use */
> +               dma_sync_single_range_for_cpu(rx_ring->dev,
> +                                             rx_buffer->dma,
> +                                             rx_buffer->page_offset,
> +                                             I40E_RXBUFFER_2048,
> +                                             DMA_FROM_DEVICE);
> +
> +               rx_buffer->skb = NULL;
> +       }
> +

You don't support RSC, so you don't need to copy the code from ixgbe.
You would be better off basing this portion of the code on either igb
or fm10k.

> +       /* pull page into skb */
> +       if (i40e_add_rx_frag(rx_ring, rx_buffer, rx_desc, skb)) {
> +               /* hand second half of page back to the ring */
> +               i40e_reuse_rx_page(rx_ring, rx_buffer);
> +               rx_ring->rx_stats.page_reuse_count++;
> +       } else {
> +               /* we are not reusing the buffer so unmap it */
> +               dma_unmap_page(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
> +                              DMA_FROM_DEVICE);
> +       }
> +
> +       /* clear contents of buffer_info */
> +       rx_buffer->page = NULL;
> +
> +       return skb;
> +}
> +
> +/**
> + * i40e_is_non_eop - process handling of non-EOP buffers
> + * @rx_ring: Rx ring being processed
> + * @rx_desc: Rx descriptor for current buffer
> + * @skb: Current socket buffer containing buffer in progress
> + *
> + * This function updates next to clean.  If the buffer is an EOP buffer
> + * this function exits returning false, otherwise it will place the
> + * sk_buff in the next buffer to be chained and return true indicating
> + * that this is in fact a non-EOP buffer.
> + **/
> +static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
> +                           union i40e_rx_desc *rx_desc,
> +                           struct sk_buff *skb)
> +{
> +       u32 ntc = rx_ring->next_to_clean + 1;
> +
> +       /* fetch, update, and store next to clean */
> +       ntc = (ntc < rx_ring->count) ? ntc : 0;
> +       rx_ring->next_to_clean = ntc;
> +
> +       prefetch(I40E_RX_DESC(rx_ring, ntc));
> +
> +#define staterrlen rx_desc->wb.qword1.status_error_len
> +       if (unlikely(i40e_rx_is_programming_status(le64_to_cpu(staterrlen)))) {
> +               i40e_clean_programming_status(rx_ring, rx_desc);
> +               rx_ring->rx_bi[ntc].skb = skb;
> +               return true;
> +       }
> +       /* if we are the last buffer then there is nothing else to do */
> +#define I40E_RXD_EOF BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)
> +       if (likely(i40e_test_staterr(rx_desc, I40E_RXD_EOF)))
> +               return false;
> +
> +       /* place skb in next buffer to be received */
> +       rx_ring->rx_bi[ntc].skb = skb;
> +       rx_ring->rx_stats.non_eop_descs++;
> +
> +       return true;
> +}
> +
> +/**
> + * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
> + * @rx_ring: rx descriptor ring to transact packets on
> + * @budget: Total limit on number of packets to process
> + *
> + * This function provides a "bounce buffer" approach to Rx interrupt
> + * processing.  The advantage to this is that on systems that have
> + * expensive overhead for IOMMU access this provides a means of avoiding
> + * it by maintaining the mapping of the page to the system.
> + *
> + * Returns amount of work completed
>   **/
> -static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
> +static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>  {
>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>         u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
> -       struct i40e_vsi *vsi = rx_ring->vsi;
> -       union i40e_rx_desc *rx_desc;
> -       u32 rx_error, rx_status;
> -       u16 rx_packet_len;
>         bool failure = false;
> -       u8 rx_ptype;
> -       u64 qword;
> -       u16 i;
>
> -       do {
> -               struct i40e_rx_buffer *rx_bi;
> +       while (likely(total_rx_packets < budget)) {
> +               union i40e_rx_desc *rx_desc;
>                 struct sk_buff *skb;
> +               u32 rx_status;
>                 u16 vlan_tag;
> +               u8 rx_ptype;
> +               u64 qword;
> +
>                 /* return some buffers to hardware, one at a time is too slow */
>                 if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
>                         failure = failure ||
> -                                 i40e_alloc_rx_buffers_1buf(rx_ring,
> -                                                            cleaned_count);
> +                                 i40e_alloc_rx_buffers(rx_ring, cleaned_count);
>                         cleaned_count = 0;
>                 }
>
> -               i = rx_ring->next_to_clean;
> -               rx_desc = I40E_RX_DESC(rx_ring, i);
> +               rx_desc = I40E_RX_DESC(rx_ring, rx_ring->next_to_clean);
> +
>                 qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
> +               rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
> +                          I40E_RXD_QW1_PTYPE_SHIFT;
>                 rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
> -                       I40E_RXD_QW1_STATUS_SHIFT;
> +                           I40E_RXD_QW1_STATUS_SHIFT;

These need to be done in the spot where they are needed.  This
probably isn't the right spot.

>                 if (!(rx_status & BIT(I40E_RX_DESC_STATUS_DD_SHIFT)))
>                         break;
>
> +               /* status_error_len will always be zero for unused descriptors
> +                * because it's cleared in cleanup, and overlaps with hdr_addr
> +                * which is always zero because packet split isn't used, if the
> +                * hardware wrote DD then it will be non-zero
> +                */
> +               if (!rx_desc->wb.qword1.status_error_len)
> +                       break;
> +

You can remove the DD bit check since you have added this line.

>                 /* This memory barrier is needed to keep us from reading
>                  * any other fields out of the rx_desc until we know the
>                  * DD bit is set.
>                  */
>                 dma_rmb();
>
> -               if (i40e_rx_is_programming_status(qword)) {
> -                       i40e_clean_programming_status(rx_ring, rx_desc);
> -                       I40E_RX_INCREMENT(rx_ring, i);
> -                       continue;
> -               }
> -               rx_bi = &rx_ring->rx_bi[i];
> -               skb = rx_bi->skb;
> -               prefetch(skb->data);
> -
> -               rx_packet_len = (qword & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> -                               I40E_RXD_QW1_LENGTH_PBUF_SHIFT;
> -
> -               rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
> -                          I40E_RXD_QW1_ERROR_SHIFT;
> -               rx_error &= ~BIT(I40E_RX_DESC_ERROR_HBO_SHIFT);
> +               skb = i40e_fetch_rx_buffer(rx_ring, rx_desc);
> +               if (!skb)
> +                       break;
>
> -               rx_ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >>
> -                          I40E_RXD_QW1_PTYPE_SHIFT;
> -               rx_bi->skb = NULL;
>                 cleaned_count++;
>
> -               /* Get the header and possibly the whole packet
> -                * If this is an skb from previous receive dma will be 0
> -                */
> -               skb_put(skb, rx_packet_len);
> -               dma_unmap_single(rx_ring->dev, rx_bi->dma, rx_ring->rx_buf_len,
> -                                DMA_FROM_DEVICE);
> -               rx_bi->dma = 0;
> -
> -               I40E_RX_INCREMENT(rx_ring, i);
> -
> -               if (unlikely(
> -                   !(rx_status & BIT(I40E_RX_DESC_STATUS_EOF_SHIFT)))) {
> -                       rx_ring->rx_stats.non_eop_descs++;
> +               if (i40e_is_non_eop(rx_ring, rx_desc, skb))
>                         continue;
> -               }
>
> -               /* ERR_MASK will only have valid bits if EOP set */
> -               if (unlikely(rx_error & BIT(I40E_RX_DESC_ERROR_RXE_SHIFT))) {
> -                       dev_kfree_skb_any(skb);
> +               if (i40e_cleanup_headers(rx_ring, skb))
>                         continue;

You need to move the RXE check into i40e_cleanup_headers.

> -               }
> -
> -               i40e_rx_hash(rx_ring, rx_desc, skb, rx_ptype);
> -               if (unlikely(rx_status & I40E_RXD_QW1_STATUS_TSYNVALID_MASK)) {
> -                       i40e_ptp_rx_hwtstamp(vsi->back, skb, (rx_status &
> -                                          I40E_RXD_QW1_STATUS_TSYNINDX_MASK) >>
> -                                          I40E_RXD_QW1_STATUS_TSYNINDX_SHIFT);
> -                       rx_ring->last_rx_timestamp = jiffies;
> -               }
>
>                 /* probably a little skewed due to removing CRC */
>                 total_rx_bytes += skb->len;
> -               total_rx_packets++;
>
> -               skb->protocol = eth_type_trans(skb, rx_ring->netdev);
> +               /* populate checksum, VLAN, and protocol */
> +               i40e_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
>
> -               i40e_rx_checksum(vsi, skb, rx_status, rx_error, rx_ptype);
> -
> -               vlan_tag = rx_status & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)
> -                        ? le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1)
> -                        : 0;
>  #ifdef I40E_FCOE
>                 if (unlikely(
>                     i40e_rx_is_fcoe(rx_ptype) &&
> @@ -1861,10 +1833,15 @@ static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
>                         continue;
>                 }
>  #endif
> +
> +               vlan_tag = (qword & BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT)) ?
> +                          le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1) : 0;
> +

I'd say you could probably move this into process_skb_fields.
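
i.e. something along these lines at the end of
i40e_process_skb_fields(), untested, which would also keep the VLAN
handling next to the rest of the descriptor parsing as I mentioned
earlier:

        /* pull the VLAN tag out of the descriptor here so the hot
         * loop never has to look at qword0/qword1 itself
         */
        if (i40e_test_staterr(rx_desc,
                              BIT(I40E_RX_DESC_STATUS_L2TAG1P_SHIFT))) {
                u16 vlan_tag = le16_to_cpu(rx_desc->wb.qword0.lo_dword.l2tag1);

                if ((rx_ring->netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
                    (vlan_tag & VLAN_VID_MASK))
                        __vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
                                               vlan_tag);
        }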

>                 i40e_receive_skb(rx_ring, skb, vlan_tag);
>
> -               rx_desc->wb.qword1.status_error_len = 0;
> -       } while (likely(total_rx_packets < budget));
> +               /* update budget accounting */
> +               total_rx_packets++;
> +       }
>
>         u64_stats_update_begin(&rx_ring->syncp);
>         rx_ring->stats.packets += total_rx_packets;
> @@ -1873,6 +1850,7 @@ static int i40e_clean_rx_irq_1buf(struct i40e_ring *rx_ring, int budget)
>         rx_ring->q_vector->rx.total_packets += total_rx_packets;
>         rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
>
> +       /* guarantee a trip back through this routine if there was a failure */
>         return failure ? budget : total_rx_packets;
>  }

I'm not sure what you hope to gain by polling extra times.  I'd say if
you had an allocation failure you probably just need to wait for the
bad actor to free up memory.  In addition you should have something
that will force the interrupt to be triggered once every two seconds
anyway.

> @@ -2017,12 +1995,7 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
>         budget_per_ring = max(budget/q_vector->num_ringpairs, 1);
>
>         i40e_for_each_ring(ring, q_vector->rx) {
> -               int cleaned;
> -
> -               if (ring_is_ps_enabled(ring))
> -                       cleaned = i40e_clean_rx_irq_ps(ring, budget_per_ring);
> -               else
> -                       cleaned = i40e_clean_rx_irq_1buf(ring, budget_per_ring);
> +               int cleaned = i40e_clean_rx_irq(ring, budget_per_ring);
>
>                 work_done += cleaned;
>                 /* if we clean as many as budgeted, we must not be done */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index 6b2b191..54ddbd4 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -102,8 +102,8 @@ enum i40e_dyn_idx_t {
>         (((pf)->flags & I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE) ? \
>           I40E_DEFAULT_RSS_HENA_EXPANDED : I40E_DEFAULT_RSS_HENA)
>
> -/* Supported Rx Buffer Sizes */
> -#define I40E_RXBUFFER_512   512    /* Used for packet split */
> +/* Supported Rx Buffer Sizes (a multiple of 128) */
> +#define I40E_RXBUFFER_256   256
>  #define I40E_RXBUFFER_2048  2048
>  #define I40E_RXBUFFER_3072  3072   /* For FCoE MTU of 2158 */
>  #define I40E_RXBUFFER_4096  4096
> @@ -114,9 +114,28 @@ enum i40e_dyn_idx_t {
>   * reserve 2 more, and skb_shared_info adds an additional 384 bytes more,
>   * this adds up to 512 bytes of extra data meaning the smallest allocation
>   * we could have is 1K.
> - * i.e. RXBUFFER_512 --> size-1024 slab
> + * i.e. RXBUFFER_256 --> 960 byte skb (size-1024 slab)
> + * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
>   */
> -#define I40E_RX_HDR_SIZE  I40E_RXBUFFER_512
> +#define I40E_RX_HDR_SIZE I40E_RXBUFFER_256
> +#define i40e_rx_desc i40e_32byte_rx_desc
> +
> +/**
> + * i40e_test_staterr - tests bits in Rx descriptor status and error fields
> + * @rx_desc: pointer to receive descriptor (in le64 format)
> + * @stat_err_bits: value to mask
> + *
> + * This function does some fast chicanery in order to return the
> + * value of the mask which is really only used for boolean tests.
> + * The status_error_len doesn't need to be shifted because it begins
> + * at offset zero.
> + */
> +static inline bool i40e_test_staterr(union i40e_rx_desc *rx_desc,
> +                                    const u64 stat_err_bits)
> +{
> +       return !!(rx_desc->wb.qword1.status_error_len &
> +                 cpu_to_le64(stat_err_bits));
> +}

I'd say this should probably be two or three functions based on what
I have seen.  Basically you have several different shifts you can use,
so you could define a separate function for each one.
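
Something like this, as an untested sketch with made-up names; one
wrapper per field means callers never have to think about which shift
to add:

/* test bits in the Rx status field, which starts at bit 0 of qword1 */
static inline bool i40e_test_status_bit(union i40e_rx_desc *rx_desc, u32 shift)
{
        return !!(rx_desc->wb.qword1.status_error_len &
                  cpu_to_le64(BIT_ULL(shift)));
}

/* test bits in the Rx error field, which sits above the status bits */
static inline bool i40e_test_error_bit(union i40e_rx_desc *rx_desc, u32 shift)
{
        return !!(rx_desc->wb.qword1.status_error_len &
                  cpu_to_le64(BIT_ULL(I40E_RXD_QW1_ERROR_SHIFT + shift)));
}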

>  /* How many Rx Buffers do we bundle into one write to the hardware ? */
>  #define I40E_RX_BUFFER_WRITE   16      /* Must be power of 2 */
> @@ -142,8 +161,6 @@ enum i40e_dyn_idx_t {
>                 prefetch((n));                          \
>         } while (0)
>
> -#define i40e_rx_desc i40e_32byte_rx_desc
> -
>  #define I40E_MAX_BUFFER_TXD    8
>  #define I40E_MIN_TX_LEN                17
>
> @@ -213,10 +230,8 @@ struct i40e_tx_buffer {
>
>  struct i40e_rx_buffer {
>         struct sk_buff *skb;

The skb pointer can be moved to the rx_ring as well.  No need to keep
it unless you can support interleaving of frames on the ring.

> -       void *hdr_buf;
>         dma_addr_t dma;
>         struct page *page;
> -       dma_addr_t page_dma;
>         unsigned int page_offset;
>  };
>
> @@ -245,16 +260,10 @@ struct i40e_rx_queue_stats {
>  enum i40e_ring_state_t {
>         __I40E_TX_FDIR_INIT_DONE,
>         __I40E_TX_XPS_INIT_DONE,
> -       __I40E_RX_PS_ENABLED,
> +       __UNUSED,
>         __I40E_RX_16BYTE_DESC_ENABLED,
>  };

Just remove the line.  There is no need for a __UNUSED placeholder
unless you have an external entity that can see the ring state.

> -#define ring_is_ps_enabled(ring) \
> -       test_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
> -#define set_ring_ps_enabled(ring) \
> -       set_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
> -#define clear_ring_ps_enabled(ring) \
> -       clear_bit(__I40E_RX_PS_ENABLED, &(ring)->state)
>  #define ring_is_16byte_desc_enabled(ring) \
>         test_bit(__I40E_RX_16BYTE_DESC_ENABLED, &(ring)->state)
>  #define set_ring_16byte_desc_enabled(ring) \
> @@ -287,7 +296,6 @@ struct i40e_ring {
>
>         u16 count;                      /* Number of descriptors */
>         u16 reg_idx;                    /* HW register index of the ring */
> -       u16 rx_hdr_len;
>         u16 rx_buf_len;
>         u8  dtype;
>  #define I40E_RX_DTYPE_NO_SPLIT      0
> @@ -330,6 +338,7 @@ struct i40e_ring {
>         struct i40e_q_vector *q_vector; /* Backreference to associated vector */
>
>         struct rcu_head rcu;            /* to avoid race on free */
> +       u16 next_to_alloc;
>  } ____cacheline_internodealigned_in_smp;

The next_to_alloc value can be made part of a union with the ATR bits
used for the Tx.
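
Since a ring is only ever Tx or Rx, something like this in struct
i40e_ring would avoid growing the structure.  Untested sketch; it
assumes the ATR fields and next_to_alloc never need to coexist on the
same ring:

        union {
                /* Tx rings only: ATR sampling state */
                struct {
                        u8 atr_sample_rate;
                        u8 atr_count;
                };
                /* Rx rings only: page recycling cursor */
                u16 next_to_alloc;
        };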

>  enum i40e_latency_range {
> @@ -353,9 +362,7 @@ struct i40e_ring_container {
>  #define i40e_for_each_ring(pos, head) \
>         for (pos = (head).ring; pos != NULL; pos = pos->next)
>
> -bool i40e_alloc_rx_buffers_ps(struct i40e_ring *rxr, u16 cleaned_count);
> -bool i40e_alloc_rx_buffers_1buf(struct i40e_ring *rxr, u16 cleaned_count);
> -void i40e_alloc_rx_headers(struct i40e_ring *rxr);
> +bool i40e_alloc_rx_buffers(struct i40e_ring *rxr, u16 cleaned_count);
>  netdev_tx_t i40e_lan_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
>  void i40e_clean_tx_ring(struct i40e_ring *tx_ring);
>  void i40e_clean_rx_ring(struct i40e_ring *rx_ring);

I'm stopping my review here; I don't think my mail client can cut it,
as I am lagging hard when it saves a draft of the mail.  I figure
there is enough to fix based on just the comments I provided for the
PF that also apply to the VF.

- Alex

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine
  2016-04-15 17:25   ` Alexander Duyck
@ 2016-04-15 18:19     ` Jesse Brandeburg
  2016-04-15 19:21       ` Alexander Duyck
  0 siblings, 1 reply; 20+ messages in thread
From: Jesse Brandeburg @ 2016-04-15 18:19 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, 15 Apr 2016 10:25:37 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:
> If due to only the size I really think this should probably be split
> into two patches.  One for the VF and one for the PF.  That way we
> should only be looking at about 1000 lines of change per patch instead
> of 2000 which becomes a bit unwieldy.  If nothing else it makes the
> reviews easier to read as we don't end up with a novel with review
> comments scattered throughout.

Thanks for all your comments Alex, they're really useful.  I can split
the patches into (at least) two.

As far as the comments go, most seem to be tweaks/tuning, and unless I
missed it not any bugs yet (but I know the patch was huge)

So, would it be at all reasonable to proceed with the code basically
unchanged except for splitting it up, if no bugs are noticed during
review?  Then I will follow up with a cleanup series?

Thanks,
 Jesse

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine
  2016-04-15 18:19     ` Jesse Brandeburg
@ 2016-04-15 19:21       ` Alexander Duyck
  2016-04-16  0:10         ` Jesse Brandeburg
  0 siblings, 1 reply; 20+ messages in thread
From: Alexander Duyck @ 2016-04-15 19:21 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Apr 15, 2016 at 11:19 AM, Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
> On Fri, 15 Apr 2016 10:25:37 -0700
> Alexander Duyck <alexander.duyck@gmail.com> wrote:
>> If due to only the size I really think this should probably be split
>> into two patches.  One for the VF and one for the PF.  That way we
>> should only be looking at about 1000 lines of change per patch instead
>> of 2000 which becomes a bit unwieldy.  If nothing else it makes the
>> reviews easier to read as we don't end up with a novel with review
>> comments scattered throughout.
>
> Thanks for all your comments Alex, they're really useful.  I can split
> the patches into (at least) two.
>
> As far as the comments go, most seem to be tweaks/tuning, and unless I
> missed it not any bugs yet (but I know the patch was huge)
>
> So, would it be at all reasonable to proceed with the code basically
> unchanged except for splitting it up, if no bugs are noticed during
> review?  Then I will follow up with a cleanup series?

The only big issue I saw is the RXE bit check being dropped
completely.  If you can incorporate that into the cleanup_headers
function then I would say the rest is mostly just performance bits
that can be cleaned up later.

- Alex

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine
  2016-04-15 19:21       ` Alexander Duyck
@ 2016-04-16  0:10         ` Jesse Brandeburg
  2016-04-16  1:51           ` Alexander Duyck
  0 siblings, 1 reply; 20+ messages in thread
From: Jesse Brandeburg @ 2016-04-16  0:10 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, 15 Apr 2016 12:21:24 -0700
Alexander Duyck <alexander.duyck@gmail.com> wrote:

> On Fri, Apr 15, 2016 at 11:19 AM, Jesse Brandeburg
> <jesse.brandeburg@intel.com> wrote:
> > On Fri, 15 Apr 2016 10:25:37 -0700
> > Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >> If due to only the size I really think this should probably be split
> >> into two patches.  One for the VF and one for the PF.  That way we
> >> should only be looking at about 1000 lines of change per patch instead
> >> of 2000 which becomes a bit unwieldy.  If nothing else it makes the
> >> reviews easier to read as we don't end up with a novel with review
> >> comments scattered throughout.
> >
> > Thanks for all your comments Alex, they're really useful.  I can split
> > the patches into (at least) two.
> >
> > As far as the comments go, most seem to be tweaks/tuning, and unless I
> > missed it not any bugs yet (but I know the patch was huge)
> >
> > So, would it be at all reasonable to proceed with the code basically
> > unchanged except for splitting it up, if no bugs are noticed during
> > review?  Then I will follow up with a cleanup series?
> 
> The only big issue I saw is the RXE bit check being dropped
> completely.  If you can incorporate that into the cleanup_headers
> function then I would say the rest is mostly just performance bits
> that can be cleaned up later.
> 
> - Alex

Alex,

The patches will be coming from Harshitha, but in the meantime I
sent them to you after I broke them all apart and added a check for RXE.

I couldn't use cleanup_headers as it doesn't have access to the receive
descriptor, so rather than adding an argument I just put it inline.
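
Roughly like this, as a sketch (the surrounding loop in
i40e_clean_rx_irq() is trimmed away; i40e_test_staterr() is the helper
added earlier in this series):

                /* RXE is bit 0 of the error field, so testing the bit
                 * at the error shift position is a check for RXE
                 */
                if (unlikely(i40e_test_staterr(rx_desc,
                                               BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
                        dev_kfree_skb_any(skb);
                        continue;
                }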

v2:
split receive refactor patches into pieces that are smaller.
added check for RXE (receive error) bit being set in the descriptor.

Thanks again!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine
  2016-04-16  0:10         ` Jesse Brandeburg
@ 2016-04-16  1:51           ` Alexander Duyck
  0 siblings, 0 replies; 20+ messages in thread
From: Alexander Duyck @ 2016-04-16  1:51 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Apr 15, 2016 at 5:10 PM, Jesse Brandeburg
<jesse.brandeburg@intel.com> wrote:
> On Fri, 15 Apr 2016 12:21:24 -0700
> Alexander Duyck <alexander.duyck@gmail.com> wrote:
>
>> On Fri, Apr 15, 2016 at 11:19 AM, Jesse Brandeburg
>> <jesse.brandeburg@intel.com> wrote:
>> > On Fri, 15 Apr 2016 10:25:37 -0700
>> > Alexander Duyck <alexander.duyck@gmail.com> wrote:
>> >> If due to only the size I really think this should probably be split
>> >> into two patches.  One for the VF and one for the PF.  That way we
>> >> should only be looking at about 1000 lines of change per patch instead
>> >> of 2000 which becomes a bit unwieldy.  If nothing else it makes the
>> >> reviews easier to read as we don't end up with a novel with review
>> >> comments scattered throughout.
>> >
>> > Thanks for all your comments Alex, they're really useful.  I can split
>> > the patches into (at least) two.
>> >
>> > As far as the comments go, most seem to be tweaks/tuning, and unless I
>> > missed it not any bugs yet (but I know the patch was huge)
>> >
>> > So, would it be at all reasonable to proceed with the code basically
>> > unchanged except for splitting it up, if no bugs are noticed during
>> > review?  Then I will follow up with a cleanup series?
>>
>> The only big issue I saw is the RXE bit check being dropped
>> completely.  If you can incorporate that into the cleanup_headers
>> function then I would say the rest is mostly just performance bits
>> that can be cleaned up later.
>>
>> - Alex
>
> Alex,
>
> The patches will be coming from Harshitha, but in the meantime I
> sent them to you after I broke them all apart and added a check for RXE.
>
> I couldn't use cleanup_headers as it doesn't have access to the receive
> descriptor, so rather than adding an argument I just put it inline.
>
> v2:
> split receive refactor patches into pieces that are smaller.
> added check for RXE (receive error) bit being set in the descriptor.
>
> Thanks again!

I looked it over and it generally looks okay, but I did find one more
item: when is the data in the first fragment getting synced for the
CPU?  From what I can tell, it isn't.  That may cause issues on a
system that has a more temperamental DMA API.  The fix is pretty
simple; it is just a matter of pulling that dma_sync_single_for_cpu
out of the else and placing it a few lines down, like what we had for
fm10k or igb.
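
Something like this, just to illustrate the placement (hypothetical
helper name, the real function does more than shown here, and I'm
using the range variant of the sync call the way fm10k does):

static struct sk_buff *fetch_rx_buffer_sketch(struct i40e_ring *rx_ring,
                                              struct i40e_rx_buffer *rx_buffer,
                                              unsigned int size)
{
        struct sk_buff *skb = rx_buffer->skb;

        if (likely(!skb)) {
                skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
                                       I40E_RX_HDR_SIZE,
                                       GFP_ATOMIC | __GFP_NOWARN);
                if (unlikely(!skb))
                        return NULL;
        } else {
                rx_buffer->skb = NULL;
        }

        /* moved out of the else: sync the first fragment for CPU access
         * whether or not the skb was just allocated
         */
        dma_sync_single_range_for_cpu(rx_ring->dev, rx_buffer->dma,
                                      rx_buffer->page_offset, size,
                                      DMA_FROM_DEVICE);

        return skb;
}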

- Alex

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-04-16  1:51 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-14 13:19 [Intel-wired-lan] [next PATCH S35 00/14] i40e/i40evf updates Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 01/14] i40e: Allow RSS Hash set with less than four parameters Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 02/14] i40e: Fix up 32 bit timespec references Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 03/14] i40e: Add elements to fd filter compare Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 04/14] i40evf: Report link speed Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 05/14] i40evf: Set netdev carrier properly Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 06/14] i40e: Clear mac filter count on reset Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 07/14] i40evf: Enable adaptive interrupt throttling Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 08/14] i40e: Add message for unsupported MFP mode Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 09/14] i40e/i40evf: Rename version macro Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 10/14] i40e/i40evf: Refactor tunnel interpretation Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 11/14] i40e/i40evf: Refactor receive routine Harshitha Ramamurthy
2016-04-15 17:25   ` Alexander Duyck
2016-04-15 18:19     ` Jesse Brandeburg
2016-04-15 19:21       ` Alexander Duyck
2016-04-16  0:10         ` Jesse Brandeburg
2016-04-16  1:51           ` Alexander Duyck
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 12/14] i40e/i40evf: Remove unused hardware receive descriptor code Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 13/14] i40evf: Allocate rx buffers properly Harshitha Ramamurthy
2016-04-14 13:19 ` [Intel-wired-lan] [next PATCH S35 14/14] i40e: Test memory before ethtool alloc succeeds Harshitha Ramamurthy
