All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW
@ 2017-12-09  6:27 Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 1/5] net: " Michael Chan
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Michael Chan @ 2017-12-09  6:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, andrew.gospodarek

Introduce NETIF_F_GRO_HW feature flag and convert drivers that support
hardware GRO to use the new flag.

v3:
- Let driver's ndo_fix_features() disable NETIF_F_LRO when NETIF_F_GRO_HW
is set instead of doing it in common netdev_fix_features().

v2:
- NETIF_F_GRO_HW flag propagation between upper and lower devices not
required (see patch 1).
- NETIF_F_GRO_HW depends on NETIF_F_GRO and NETIF_F_RXCSUM.
- Add dev_disable_gro_hw() to disable GRO_HW for generic XDP.
- Use ndo_fix_features() on all 3 drivers to drop GRO_HW when it is not
supported

Michael Chan (5):
  net: Introduce NETIF_F_GRO_HW.
  net: Disable GRO_HW when generic XDP is installed on a device.
  bnxt_en: Use NETIF_F_GRO_HW.
  bnx2x: Use NETIF_F_GRO_HW.
  qede: Use NETIF_F_GRO_HW.

 Documentation/networking/netdev-features.txt     |  8 ++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 19 ++++++++-----
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  4 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c        | 24 +++++++++++-----
 drivers/net/ethernet/qlogic/qede/qede.h          |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c  |  3 ++
 drivers/net/ethernet/qlogic/qede/qede_filter.c   | 20 ++++++++-----
 drivers/net/ethernet/qlogic/qede/qede_main.c     | 17 ++++-------
 include/linux/netdev_features.h                  |  3 ++
 net/core/dev.c                                   | 36 ++++++++++++++++++++++++
 net/core/ethtool.c                               |  1 +
 11 files changed, 104 insertions(+), 33 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-09  6:27 [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW Michael Chan
@ 2017-12-09  6:27 ` Michael Chan
  2017-12-09 18:50   ` Alexander Duyck
  2017-12-09  6:27 ` [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device Michael Chan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Chan @ 2017-12-09  6:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, andrew.gospodarek, Ariel Elior, everest-linux-l2

Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
GRO.  With this flag, we can now independently turn on or off hardware
GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
control hardware GRO and so it cannot be independently turned on or
off without affecting GRO.

Hardware GRO (just like GRO) guarantees that packets can be re-segmented
by TSO/GSO to reconstruct the original packet stream.  It is a subset of
NETIF_F_GRO and depends on it, as well as NETIF_F_RXCSUM.

Since NETIF_F_GRO is not propagated between upper and lower devices,
NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
words, a lower device can independent have GRO/GRO_HW enabled or disabled
and no feature propagation is required.  This will preserve the current
GRO behavior.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 Documentation/networking/netdev-features.txt |  8 ++++++++
 include/linux/netdev_features.h              |  3 +++
 net/core/dev.c                               | 12 ++++++++++++
 net/core/ethtool.c                           |  1 +
 4 files changed, 24 insertions(+)

diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
index 7413eb0..8f36527 100644
--- a/Documentation/networking/netdev-features.txt
+++ b/Documentation/networking/netdev-features.txt
@@ -163,3 +163,11 @@ This requests that the NIC receive all possible frames, including errored
 frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
 bad packets on it.  Some NICs may receive more packets if also put into normal
 PROMISC mode.
+
+*  rx-gro-hw
+
+This requests that the NIC enables Hardware GRO (generic receive offload).
+Hardware GRO is basically the exact reverse of TSO, and is generally
+stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
+be re-segmentable by GSO or TSO back to the exact original packet stream.
+Hardware GRO is dependent on GRO and RXCSUM.
diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b1b0ca7..db84c51 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -78,6 +78,8 @@ enum {
 	NETIF_F_HW_ESP_TX_CSUM_BIT,	/* ESP with TX checksum offload */
 	NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
 
+	NETIF_F_GRO_HW_BIT,		/* Hardware Generic receive offload */
+
 	/*
 	 * Add your fresh new feature above and remember to update
 	 * netdev_features_strings[] in net/core/ethtool.c and maybe
@@ -97,6 +99,7 @@ enum {
 #define NETIF_F_FRAGLIST	__NETIF_F(FRAGLIST)
 #define NETIF_F_FSO		__NETIF_F(FSO)
 #define NETIF_F_GRO		__NETIF_F(GRO)
+#define NETIF_F_GRO_HW		__NETIF_F(GRO_HW)
 #define NETIF_F_GSO		__NETIF_F(GSO)
 #define NETIF_F_GSO_ROBUST	__NETIF_F(GSO_ROBUST)
 #define NETIF_F_HIGHDMA		__NETIF_F(HIGHDMA)
diff --git a/net/core/dev.c b/net/core/dev.c
index e32cf5c..6ebd0e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7424,6 +7424,18 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
 		features &= ~dev->gso_partial_features;
 	}
 
+	if (features & NETIF_F_GRO_HW) {
+		/* Hardware GRO depends on GRO and RXCSUM. */
+		if (!(features & NETIF_F_GRO)) {
+			netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
+			features &= ~NETIF_F_GRO_HW;
+		}
+		if (!(features & NETIF_F_RXCSUM)) {
+			netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
+			features &= ~NETIF_F_GRO_HW;
+		}
+	}
+
 	return features;
 }
 
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index f8fcf45..50a7920 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -73,6 +73,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
 	[NETIF_F_LLTX_BIT] =             "tx-lockless",
 	[NETIF_F_NETNS_LOCAL_BIT] =      "netns-local",
 	[NETIF_F_GRO_BIT] =              "rx-gro",
+	[NETIF_F_GRO_HW_BIT] =           "rx-gro-hw",
 	[NETIF_F_LRO_BIT] =              "rx-lro",
 
 	[NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device.
  2017-12-09  6:27 [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 1/5] net: " Michael Chan
@ 2017-12-09  6:27 ` Michael Chan
  2017-12-09 18:56   ` Alexander Duyck
  2017-12-09  6:27 ` [PATCH net-next v3 3/5] bnxt_en: Use NETIF_F_GRO_HW Michael Chan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Chan @ 2017-12-09  6:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, andrew.gospodarek, Ariel Elior, everest-linux-l2

Hardware should not aggregate any packets when generic XDP is installed.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 net/core/dev.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6ebd0e7..ec08ace 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1542,6 +1542,29 @@ void dev_disable_lro(struct net_device *dev)
 }
 EXPORT_SYMBOL(dev_disable_lro);
 
+/**
+ *	dev_disable_gro_hw - disable HW Generic Receive Offload on a device
+ *	@dev: device
+ *
+ *	Disable HW Generic Receive Offload (GRO_HW) on a net device.  Must be
+ *	called under RTNL.  This is needed if Generic XDP is installed on
+ *	the device.
+ */
+static void dev_disable_gro_hw(struct net_device *dev)
+{
+	struct net_device *lower_dev;
+	struct list_head *iter;
+
+	dev->wanted_features &= ~NETIF_F_GRO_HW;
+	netdev_update_features(dev);
+
+	if (unlikely(dev->features & NETIF_F_GRO_HW))
+		netdev_WARN(dev, "failed to disable GRO_HW!\n");
+
+	netdev_for_each_lower_dev(dev, lower_dev, iter)
+		dev_disable_gro_hw(lower_dev);
+}
+
 static int call_netdevice_notifier(struct notifier_block *nb, unsigned long val,
 				   struct net_device *dev)
 {
@@ -4564,6 +4587,7 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
 		} else if (new && !old) {
 			static_key_slow_inc(&generic_xdp_needed);
 			dev_disable_lro(dev);
+			dev_disable_gro_hw(dev);
 		}
 		break;
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v3 3/5] bnxt_en: Use NETIF_F_GRO_HW.
  2017-12-09  6:27 [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 1/5] net: " Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device Michael Chan
@ 2017-12-09  6:27 ` Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 4/5] bnx2x: " Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 5/5] qede: " Michael Chan
  4 siblings, 0 replies; 17+ messages in thread
From: Michael Chan @ 2017-12-09  6:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, andrew.gospodarek

Advertise NETIF_F_GRO_HW in hw_features if hardware GRO is supported.
In bnxt_fix_features(), disable GRO_HW and LRO if current hardware
configuration does not allow it.  XDP setup will now rely on
bnxt_fix_features() to turn off aggregation.  During chip init, turn on
or off hardware GRO based on NETIF_F_GRO_HW in features flag.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index af6c83f..04dddf6 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2755,7 +2755,7 @@ void bnxt_set_tpa_flags(struct bnxt *bp)
 		return;
 	if (bp->dev->features & NETIF_F_LRO)
 		bp->flags |= BNXT_FLAG_LRO;
-	if (bp->dev->features & NETIF_F_GRO)
+	else if (bp->dev->features & NETIF_F_GRO_HW)
 		bp->flags |= BNXT_FLAG_GRO;
 }
 
@@ -2843,10 +2843,10 @@ int bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode)
 			min_t(u16, bp->max_mtu, BNXT_MAX_PAGE_MODE_MTU);
 		bp->flags &= ~BNXT_FLAG_AGG_RINGS;
 		bp->flags |= BNXT_FLAG_NO_AGG_RINGS | BNXT_FLAG_RX_PAGE_MODE;
-		bp->dev->hw_features &= ~NETIF_F_LRO;
-		bp->dev->features &= ~NETIF_F_LRO;
 		bp->rx_dir = DMA_BIDIRECTIONAL;
 		bp->rx_skb_func = bnxt_rx_page_skb;
+		/* Disable LRO or GRO_HW */
+		netdev_update_features(bp->dev);
 	} else {
 		bp->dev->max_mtu = bp->max_mtu;
 		bp->flags &= ~BNXT_FLAG_RX_PAGE_MODE;
@@ -6788,6 +6788,12 @@ static netdev_features_t bnxt_fix_features(struct net_device *dev,
 	if ((features & NETIF_F_NTUPLE) && !bnxt_rfs_capable(bp))
 		features &= ~NETIF_F_NTUPLE;
 
+	if (bp->flags & BNXT_FLAG_NO_AGG_RINGS)
+		features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);
+
+	if (features & NETIF_F_GRO_HW)
+		features &= ~NETIF_F_LRO;
+
 	/* Both CTAG and STAG VLAN accelaration on the RX side have to be
 	 * turned on or off together.
 	 */
@@ -6821,9 +6827,9 @@ static int bnxt_set_features(struct net_device *dev, netdev_features_t features)
 	bool update_tpa = false;
 
 	flags &= ~BNXT_FLAG_ALL_CONFIG_FEATS;
-	if ((features & NETIF_F_GRO) && !BNXT_CHIP_TYPE_NITRO_A0(bp))
+	if (features & NETIF_F_GRO_HW)
 		flags |= BNXT_FLAG_GRO;
-	if (features & NETIF_F_LRO)
+	else if (features & NETIF_F_LRO)
 		flags |= BNXT_FLAG_LRO;
 
 	if (bp->flags & BNXT_FLAG_NO_AGG_RINGS)
@@ -7924,8 +7930,8 @@ static int bnxt_get_dflt_rings(struct bnxt *bp, int *max_rx, int *max_tx,
 		if (rc)
 			return rc;
 		bp->flags |= BNXT_FLAG_NO_AGG_RINGS;
-		bp->dev->hw_features &= ~NETIF_F_LRO;
-		bp->dev->features &= ~NETIF_F_LRO;
+		bp->dev->hw_features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);
+		bp->dev->features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);
 		bnxt_set_ring_params(bp);
 	}
 
@@ -8108,7 +8114,11 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	dev->vlan_features = dev->hw_features | NETIF_F_HIGHDMA;
 	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_TX |
 			    NETIF_F_HW_VLAN_STAG_RX | NETIF_F_HW_VLAN_STAG_TX;
+	if (!BNXT_CHIP_TYPE_NITRO_A0(bp))
+		dev->hw_features |= NETIF_F_GRO_HW;
 	dev->features |= dev->hw_features | NETIF_F_HIGHDMA;
+	if (dev->features & NETIF_F_GRO_HW)
+		dev->features &= ~NETIF_F_LRO;
 	dev->priv_flags |= IFF_UNICAST_FLT;
 
 #ifdef CONFIG_BNXT_SRIOV
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v3 4/5] bnx2x: Use NETIF_F_GRO_HW.
  2017-12-09  6:27 [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW Michael Chan
                   ` (2 preceding siblings ...)
  2017-12-09  6:27 ` [PATCH net-next v3 3/5] bnxt_en: Use NETIF_F_GRO_HW Michael Chan
@ 2017-12-09  6:27 ` Michael Chan
  2017-12-09  6:27 ` [PATCH net-next v3 5/5] qede: " Michael Chan
  4 siblings, 0 replies; 17+ messages in thread
From: Michael Chan @ 2017-12-09  6:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, andrew.gospodarek, Ariel Elior, everest-linux-l2

Advertise NETIF_F_GRO_HW and turn on TPA_MODE_GRO when NETIF_F_GRO_HW
is set.  Disable NETIF_F_GRO_HW in bnx2x_fix_features() if the MTU
does not support TPA_MODE_GRO.  bnx2x_change_mtu() also needs to disable
NETIF_F_GRO_HW if the MTU does not support it.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c  | 19 ++++++++++++-------
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |  4 +++-
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
index 4c739d5..c75cfdd 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
@@ -2482,8 +2482,7 @@ static void bnx2x_bz_fp(struct bnx2x *bp, int index)
 	 */
 	if (bp->dev->features & NETIF_F_LRO)
 		fp->mode = TPA_MODE_LRO;
-	else if (bp->dev->features & NETIF_F_GRO &&
-		 bnx2x_mtu_allows_gro(bp->dev->mtu))
+	else if (bp->dev->features & NETIF_F_GRO_HW)
 		fp->mode = TPA_MODE_GRO;
 	else
 		fp->mode = TPA_MODE_DISABLED;
@@ -4874,6 +4873,9 @@ int bnx2x_change_mtu(struct net_device *dev, int new_mtu)
 	 */
 	dev->mtu = new_mtu;
 
+	if (!bnx2x_mtu_allows_gro(new_mtu))
+		dev->features &= ~NETIF_F_GRO_HW;
+
 	if (IS_PF(bp) && SHMEM2_HAS(bp, curr_cfg))
 		SHMEM2_WR(bp, curr_cfg, CURR_CFG_MET_OS);
 
@@ -4907,6 +4909,10 @@ netdev_features_t bnx2x_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_LRO;
 		features &= ~NETIF_F_GRO;
 	}
+	if (!bnx2x_mtu_allows_gro(dev->mtu))
+		features &= ~NETIF_F_GRO_HW;
+	if (features & NETIF_F_GRO_HW)
+		features &= ~NETIF_F_LRO;
 
 	return features;
 }
@@ -4933,13 +4939,12 @@ int bnx2x_set_features(struct net_device *dev, netdev_features_t features)
 		}
 	}
 
-	/* if GRO is changed while LRO is enabled, don't force a reload */
-	if ((changes & NETIF_F_GRO) && (features & NETIF_F_LRO))
-		changes &= ~NETIF_F_GRO;
+	/* Don't care about GRO changes */
+	changes &= ~NETIF_F_GRO;
 
 	/* if GRO is changed while HW TPA is off, don't force a reload */
-	if ((changes & NETIF_F_GRO) && bp->disable_tpa)
-		changes &= ~NETIF_F_GRO;
+	if ((changes & NETIF_F_GRO_HW) && bp->disable_tpa)
+		changes &= ~NETIF_F_GRO_HW;
 
 	if (changes)
 		bnx2x_reload = true;
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 91e2a75..1c7f821 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13273,7 +13273,7 @@ static int bnx2x_init_dev(struct bnx2x *bp, struct pci_dev *pdev,
 
 	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 |
-		NETIF_F_RXCSUM | NETIF_F_LRO | NETIF_F_GRO |
+		NETIF_F_RXCSUM | NETIF_F_LRO | NETIF_F_GRO | NETIF_F_GRO_HW |
 		NETIF_F_RXHASH | NETIF_F_HW_VLAN_CTAG_TX;
 	if (!chip_is_e1x) {
 		dev->hw_features |= NETIF_F_GSO_GRE | NETIF_F_GSO_GRE_CSUM |
@@ -13309,6 +13309,8 @@ static int bnx2x_init_dev(struct bnx2x *bp, struct pci_dev *pdev,
 
 	dev->features |= dev->hw_features | NETIF_F_HW_VLAN_CTAG_RX;
 	dev->features |= NETIF_F_HIGHDMA;
+	if (dev->features & NETIF_F_GRO_HW)
+		dev->features &= ~NETIF_F_LRO;
 
 	/* Add Loopback capability to the device */
 	dev->hw_features |= NETIF_F_LOOPBACK;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH net-next v3 5/5] qede: Use NETIF_F_GRO_HW.
  2017-12-09  6:27 [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW Michael Chan
                   ` (3 preceding siblings ...)
  2017-12-09  6:27 ` [PATCH net-next v3 4/5] bnx2x: " Michael Chan
@ 2017-12-09  6:27 ` Michael Chan
  4 siblings, 0 replies; 17+ messages in thread
From: Michael Chan @ 2017-12-09  6:27 UTC (permalink / raw)
  To: davem; +Cc: netdev, andrew.gospodarek, Ariel Elior, everest-linux-l2

Advertise NETIF_F_GRO_HW and set edev->gro_disable according to the
feature flag.  Add qede_fix_features() to drop NETIF_F_GRO_HW if
XDP is running or MTU does not support GRO_HW.  qede_change_mtu()
also checks and disables GRO_HW if MTU is not supported.

Cc: Ariel Elior <Ariel.Elior@cavium.com>
Cc: everest-linux-l2@cavium.com
Acked-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
---
 drivers/net/ethernet/qlogic/qede/qede.h         |  2 ++
 drivers/net/ethernet/qlogic/qede/qede_ethtool.c |  3 +++
 drivers/net/ethernet/qlogic/qede/qede_filter.c  | 20 +++++++++++++-------
 drivers/net/ethernet/qlogic/qede/qede_main.c    | 17 ++++++-----------
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index a3a70ad..8a33651 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -494,6 +494,8 @@ int qede_free_tx_pkt(struct qede_dev *edev,
 void qede_vlan_mark_nonconfigured(struct qede_dev *edev);
 int qede_configure_vlan_filters(struct qede_dev *edev);
 
+netdev_features_t qede_fix_features(struct net_device *dev,
+				    netdev_features_t features);
 int qede_set_features(struct net_device *dev, netdev_features_t features);
 void qede_set_rx_mode(struct net_device *ndev);
 void qede_config_rx_mode(struct net_device *ndev);
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index dae7412..4ca3847 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -940,6 +940,9 @@ int qede_change_mtu(struct net_device *ndev, int new_mtu)
 	DP_VERBOSE(edev, (NETIF_MSG_IFUP | NETIF_MSG_IFDOWN),
 		   "Configuring MTU size of %d\n", new_mtu);
 
+	if (new_mtu > PAGE_SIZE)
+		ndev->features &= ~NETIF_F_GRO_HW;
+
 	/* Set the mtu field and re-start the interface if needed */
 	args.u.mtu = new_mtu;
 	args.func = &qede_update_mtu;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_filter.c b/drivers/net/ethernet/qlogic/qede/qede_filter.c
index c1a0708..2de947e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_filter.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_filter.c
@@ -895,19 +895,25 @@ static void qede_set_features_reload(struct qede_dev *edev,
 	edev->ndev->features = args->u.features;
 }
 
+netdev_features_t qede_fix_features(struct net_device *dev,
+				    netdev_features_t features)
+{
+	struct qede_dev *edev = netdev_priv(dev);
+
+	if (edev->xdp_prog || edev->ndev->mtu > PAGE_SIZE)
+		features &= ~NETIF_F_GRO_HW;
+
+	return features;
+}
+
 int qede_set_features(struct net_device *dev, netdev_features_t features)
 {
 	struct qede_dev *edev = netdev_priv(dev);
 	netdev_features_t changes = features ^ dev->features;
 	bool need_reload = false;
 
-	/* No action needed if hardware GRO is disabled during driver load */
-	if (changes & NETIF_F_GRO) {
-		if (dev->features & NETIF_F_GRO)
-			need_reload = !edev->gro_disable;
-		else
-			need_reload = edev->gro_disable;
-	}
+	if (changes & NETIF_F_GRO_HW)
+		need_reload = true;
 
 	if (need_reload) {
 		struct qede_reload_args args;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 57332b3..90d79ae 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -545,6 +545,7 @@ static int qede_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 #endif
 	.ndo_vlan_rx_add_vid = qede_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = qede_vlan_rx_kill_vid,
+	.ndo_fix_features = qede_fix_features,
 	.ndo_set_features = qede_set_features,
 	.ndo_get_stats64 = qede_get_stats64,
 #ifdef CONFIG_QED_SRIOV
@@ -572,6 +573,7 @@ static int qede_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	.ndo_change_mtu = qede_change_mtu,
 	.ndo_vlan_rx_add_vid = qede_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = qede_vlan_rx_kill_vid,
+	.ndo_fix_features = qede_fix_features,
 	.ndo_set_features = qede_set_features,
 	.ndo_get_stats64 = qede_get_stats64,
 	.ndo_udp_tunnel_add = qede_udp_tunnel_add,
@@ -589,6 +591,7 @@ static int qede_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	.ndo_change_mtu = qede_change_mtu,
 	.ndo_vlan_rx_add_vid = qede_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = qede_vlan_rx_kill_vid,
+	.ndo_fix_features = qede_fix_features,
 	.ndo_set_features = qede_set_features,
 	.ndo_get_stats64 = qede_get_stats64,
 	.ndo_udp_tunnel_add = qede_udp_tunnel_add,
@@ -676,7 +679,7 @@ static void qede_init_ndev(struct qede_dev *edev)
 	ndev->priv_flags |= IFF_UNICAST_FLT;
 
 	/* user-changeble features */
-	hw_features = NETIF_F_GRO | NETIF_F_SG |
+	hw_features = NETIF_F_GRO | NETIF_F_GRO_HW | NETIF_F_SG |
 		      NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		      NETIF_F_TSO | NETIF_F_TSO6;
 
@@ -1228,18 +1231,9 @@ static int qede_alloc_sge_mem(struct qede_dev *edev, struct qede_rx_queue *rxq)
 	dma_addr_t mapping;
 	int i;
 
-	/* Don't perform FW aggregations in case of XDP */
-	if (edev->xdp_prog)
-		edev->gro_disable = 1;
-
 	if (edev->gro_disable)
 		return 0;
 
-	if (edev->ndev->mtu > PAGE_SIZE) {
-		edev->gro_disable = 1;
-		return 0;
-	}
-
 	for (i = 0; i < ETH_TPA_MAX_AGGS_NUM; i++) {
 		struct qede_agg_info *tpa_info = &rxq->tpa_info[i];
 		struct sw_rx_data *replace_buf = &tpa_info->buffer;
@@ -1269,6 +1263,7 @@ static int qede_alloc_sge_mem(struct qede_dev *edev, struct qede_rx_queue *rxq)
 err:
 	qede_free_sge_mem(edev, rxq);
 	edev->gro_disable = 1;
+	edev->ndev->features &= ~NETIF_F_GRO_HW;
 	return -ENOMEM;
 }
 
@@ -1511,7 +1506,7 @@ static void qede_init_fp(struct qede_dev *edev)
 			 edev->ndev->name, queue_id);
 	}
 
-	edev->gro_disable = !(edev->ndev->features & NETIF_F_GRO);
+	edev->gro_disable = !(edev->ndev->features & NETIF_F_GRO_HW);
 }
 
 static int qede_set_real_num_queues(struct qede_dev *edev)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-09  6:27 ` [PATCH net-next v3 1/5] net: " Michael Chan
@ 2017-12-09 18:50   ` Alexander Duyck
  2017-12-09 21:31     ` Michael Chan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2017-12-09 18:50 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
> GRO.  With this flag, we can now independently turn on or off hardware
> GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
> control hardware GRO and so it cannot be independently turned on or
> off without affecting GRO.
>
> Hardware GRO (just like GRO) guarantees that packets can be re-segmented
> by TSO/GSO to reconstruct the original packet stream.  It is a subset of
> NETIF_F_GRO and depends on it, as well as NETIF_F_RXCSUM.

So I would disagree with it being a subset of NETIF_F_GRO. If anything
it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
at the device level in the case of hardware drivers. My concern is
this is probably going to end up applying to things other than just
hardware drivers though. For example what is to prevent this from
being applied to something like a virtio/tap interface? It seems like
this should be something that would be easy to implement in software.
In addition as I said in my earlier comments I think we should
probably look at using this new feature bit to indicate that we allow
GRO to occur at or below this device as opposed to just above it as
currently occurs with conventional GRO.

> Since NETIF_F_GRO is not propagated between upper and lower devices,
> NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
> words, a lower device can independent have GRO/GRO_HW enabled or disabled
> and no feature propagation is required.  This will preserve the current
> GRO behavior.

I'm going to back off on my requirement for you to handle propagation
since after spending a couple hours working on it I did find it was
more complex then I originally thought it would be. With that said
however I would want to see this feature implemented in such a way
that we can deal with propagating the bits in the future if we need to
and that is what I am basing my comments on. My concern is when this
ends up breaking we need to have a way to fix it and I don't want that
fix to end up being having to disable GRO across the board.

> Cc: Ariel Elior <Ariel.Elior@cavium.com>
> Cc: everest-linux-l2@cavium.com
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>



> ---
>  Documentation/networking/netdev-features.txt |  8 ++++++++
>  include/linux/netdev_features.h              |  3 +++
>  net/core/dev.c                               | 12 ++++++++++++
>  net/core/ethtool.c                           |  1 +
>  4 files changed, 24 insertions(+)
>
> diff --git a/Documentation/networking/netdev-features.txt b/Documentation/networking/netdev-features.txt
> index 7413eb0..8f36527 100644
> --- a/Documentation/networking/netdev-features.txt
> +++ b/Documentation/networking/netdev-features.txt
> @@ -163,3 +163,11 @@ This requests that the NIC receive all possible frames, including errored
>  frames (such as bad FCS, etc).  This can be helpful when sniffing a link with
>  bad packets on it.  Some NICs may receive more packets if also put into normal
>  PROMISC mode.
> +
> +*  rx-gro-hw
> +
> +This requests that the NIC enables Hardware GRO (generic receive offload).
> +Hardware GRO is basically the exact reverse of TSO, and is generally
> +stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
> +be re-segmentable by GSO or TSO back to the exact original packet stream.
> +Hardware GRO is dependent on GRO and RXCSUM.
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index b1b0ca7..db84c51 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -78,6 +78,8 @@ enum {
>         NETIF_F_HW_ESP_TX_CSUM_BIT,     /* ESP with TX checksum offload */
>         NETIF_F_RX_UDP_TUNNEL_PORT_BIT, /* Offload of RX port for UDP tunnels */
>
> +       NETIF_F_GRO_HW_BIT,             /* Hardware Generic receive offload */
> +
>         /*
>          * Add your fresh new feature above and remember to update
>          * netdev_features_strings[] in net/core/ethtool.c and maybe
> @@ -97,6 +99,7 @@ enum {
>  #define NETIF_F_FRAGLIST       __NETIF_F(FRAGLIST)
>  #define NETIF_F_FSO            __NETIF_F(FSO)
>  #define NETIF_F_GRO            __NETIF_F(GRO)
> +#define NETIF_F_GRO_HW         __NETIF_F(GRO_HW)
>  #define NETIF_F_GSO            __NETIF_F(GSO)
>  #define NETIF_F_GSO_ROBUST     __NETIF_F(GSO_ROBUST)
>  #define NETIF_F_HIGHDMA                __NETIF_F(HIGHDMA)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e32cf5c..6ebd0e7 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -7424,6 +7424,18 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>                 features &= ~dev->gso_partial_features;
>         }
>
> +       if (features & NETIF_F_GRO_HW) {
> +               /* Hardware GRO depends on GRO and RXCSUM. */
> +               if (!(features & NETIF_F_GRO)) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
> +                       features &= ~NETIF_F_GRO_HW;
> +               }

I still disagree with this bit. I think GRO is a pure software
offload, whereas GRO_HW can represent either a software offload of
some sort occurring in or before the driver, or in the hardware.
Basically the difference between the two as I view it is where the GRO
is occurring. I would like to keep that distinction and make use of
it. As I mentioned before in the case of bonding we currently have no
way to disable GRO on the lower devices partially because GRO is a
pure software feature and always happens at each device along the way.
The nice thing about this new bit is the assumption is that it is
pushing GRO to the lowest possible level and not triggering any side
effects like GRO currently does. I hope to use that logic with stacked
devices so that we could clear the bit and have it disable GRO,
GRO_HW, and LRO on all devices below the device that cleared it.

I think this linking of GRO and GRO_HW is something that would be
better served by moving it into the driver if you are wanting to
maintain the behavior of how this was previously linked to GRO. It
also makes it so that it is much easier to compare the performance for
GRO_HW against just a pure software GRO since you could then enable
them independently. Software GRO can come at a cost, and leaving it
enabled when you want to do it all in hardware is just adding a
penalty of sorts since I know for many of my routing tests I normally
disable GRO as it has a significant per-packet cost for small packet
workloads.

> +               if (!(features & NETIF_F_RXCSUM)) {
> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
> +                       features &= ~NETIF_F_GRO_HW;
> +               }

So I was thinking about this. For LRO it makes sense to disable it in
the case of RXCSUM being disabled since most implementations leave the
Rx checksum mangled. However for GRO I am not sure it makes complete
sense. For software GRO we perform checksum validation in either
tcp4_gro_receive or tcp6_gro_receive. Why should the hardware
implementation behave differently? When a GRO frame is assembled the
checksum is converted to CHECKSUM_PARTIAL anyway even if Rx checksum
validation is disabled for the driver.

I think this may be a hardware/driver specific implementation detail
and may not be generic enough to belong here. Regular GRO works
without RXCSUM, so why should we make an exception for the hardware
based version? I know in the case of the Intel NICs we don't ever
actually disable the checksum validation, we just don't report the
result we were given from the hardware and hand all the frames up the
stack. If we were implementing something like this we could still
support GRO in the hardware without reporting Rx check-sums otherwise.

The alternative way to look at it is that we shouldn't support any
form of packet mangling at he driver level if RXCSUM is disabled. In
which case, I would say we should probably frame it that way and
disable both LRO and GRO_HW if RXCSUM is enabled because this is
another case where this looks more like LRO than GRO.

> +       }
> +
>         return features;
>  }
>
> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index f8fcf45..50a7920 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -73,6 +73,7 @@ int ethtool_op_get_ts_info(struct net_device *dev, struct ethtool_ts_info *info)
>         [NETIF_F_LLTX_BIT] =             "tx-lockless",
>         [NETIF_F_NETNS_LOCAL_BIT] =      "netns-local",
>         [NETIF_F_GRO_BIT] =              "rx-gro",
> +       [NETIF_F_GRO_HW_BIT] =           "rx-gro-hw",
>         [NETIF_F_LRO_BIT] =              "rx-lro",
>
>         [NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device.
  2017-12-09  6:27 ` [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device Michael Chan
@ 2017-12-09 18:56   ` Alexander Duyck
  2017-12-09 21:40     ` Michael Chan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2017-12-09 18:56 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> Hardware should not aggregate any packets when generic XDP is installed.
>
> Cc: Ariel Elior <Ariel.Elior@cavium.com>
> Cc: everest-linux-l2@cavium.com
> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
> ---
>  net/core/dev.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6ebd0e7..ec08ace 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1542,6 +1542,29 @@ void dev_disable_lro(struct net_device *dev)
>  }
>  EXPORT_SYMBOL(dev_disable_lro);
>
> +/**
> + *     dev_disable_gro_hw - disable HW Generic Receive Offload on a device
> + *     @dev: device
> + *
> + *     Disable HW Generic Receive Offload (GRO_HW) on a net device.  Must be
> + *     called under RTNL.  This is needed if Generic XDP is installed on
> + *     the device.
> + */
> +static void dev_disable_gro_hw(struct net_device *dev)
> +{
> +       struct net_device *lower_dev;
> +       struct list_head *iter;
> +
> +       dev->wanted_features &= ~NETIF_F_GRO_HW;
> +       netdev_update_features(dev);
> +
> +       if (unlikely(dev->features & NETIF_F_GRO_HW))
> +               netdev_WARN(dev, "failed to disable GRO_HW!\n");
> +
> +       netdev_for_each_lower_dev(dev, lower_dev, iter)
> +               dev_disable_gro_hw(lower_dev);

I think these two lines are redundant in dev_disable_lro, since
netdev_update_features should propagate the disable to all of the
lower devices. Also this doesn't prevent the lower devices from
re-enabling gro_hw.

> +}
> +
>  static int call_netdevice_notifier(struct notifier_block *nb, unsigned long val,
>                                    struct net_device *dev)
>  {
> @@ -4564,6 +4587,7 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
>                 } else if (new && !old) {
>                         static_key_slow_inc(&generic_xdp_needed);
>                         dev_disable_lro(dev);
> +                       dev_disable_gro_hw(dev);
>                 }
>                 break;
>
> --
> 1.8.3.1
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-09 18:50   ` Alexander Duyck
@ 2017-12-09 21:31     ` Michael Chan
  2017-12-09 22:04       ` Alexander Duyck
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Chan @ 2017-12-09 21:31 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 10:50 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
>> GRO.  With this flag, we can now independently turn on or off hardware
>> GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
>> control hardware GRO and so it cannot be independently turned on or
>> off without affecting GRO.
>>
>> Hardware GRO (just like GRO) guarantees that packets can be re-segmented
>> by TSO/GSO to reconstruct the original packet stream.  It is a subset of
>> NETIF_F_GRO and depends on it, as well as NETIF_F_RXCSUM.
>
> So I would disagree with it being a subset of NETIF_F_GRO. If anything
> it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
> at the device level in the case of hardware drivers. My concern is
> this is probably going to end up applying to things other than just
> hardware drivers though. For example what is to prevent this from
> being applied to something like a virtio/tap interface? It seems like
> this should be something that would be easy to implement in software.

If you do it in software, it's called NETIF_F_GRO.  We already have
it.  The whole point of the new flag is that if the device has
software GRO enabled, and if the device supports GRO_HW, then we can
do a subset of GRO in hardware (hopefully faster).

> In addition as I said in my earlier comments I think we should
> probably look at using this new feature bit to indicate that we allow
> GRO to occur at or below this device as opposed to just above it as
> currently occurs with conventional GRO.
>
>> Since NETIF_F_GRO is not propagated between upper and lower devices,
>> NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
>> words, a lower device can independent have GRO/GRO_HW enabled or disabled
>> and no feature propagation is required.  This will preserve the current
>> GRO behavior.
>
> I'm going to back off on my requirement for you to handle propagation
> since after spending a couple hours working on it I did find it was
> more complex then I originally thought it would be. With that said
> however I would want to see this feature implemented in such a way
> that we can deal with propagating the bits in the future if we need to
> and that is what I am basing my comments on.

Nothing stops anyone from propagating the flag.  Just add
NETIF_F_GRO_HW to NETIF_F_UPPER_DISABLES and it will be propagated
just like LRO.


>> @@ -7424,6 +7424,18 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>                 features &= ~dev->gso_partial_features;
>>         }
>>
>> +       if (features & NETIF_F_GRO_HW) {
>> +               /* Hardware GRO depends on GRO and RXCSUM. */
>> +               if (!(features & NETIF_F_GRO)) {
>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
>> +                       features &= ~NETIF_F_GRO_HW;
>> +               }
>
> I still disagree with this bit. I think GRO is a pure software
> offload, whereas GRO_HW can represent either a software offload of
> some sort occurring in or before the driver, or in the hardware.
> Basically the difference between the two as I view it is where the GRO
> is occurring. I would like to keep that distinction and make use of
> it. As I mentioned before in the case of bonding we currently have no
> way to disable GRO on the lower devices partially because GRO is a
> pure software feature and always happens at each device along the way.
> The nice thing about this new bit is the assumption is that it is
> pushing GRO to the lowest possible level and not triggering any side
> effects like GRO currently does. I hope to use that logic with stacked
> devices so that we could clear the bit and have it disable GRO,
> GRO_HW, and LRO on all devices below the device that cleared it.
>
> I think this linking of GRO and GRO_HW is something that would be
> better served by moving it into the driver if you are wanting to
> maintain the behavior of how this was previously linked to GRO.

If you insist, I can move this to the driver's ndo_fix_features().
But I feel it is much better to enforce this dependency system wide.
Once again, GRO_HW is hardware accelerated GRO and should depend on
it.

> It
> also makes it so that it is much easier to compare the performance for
> GRO_HW against just a pure software GRO since you could then enable
> them independently. Software GRO can come at a cost, and leaving it
> enabled when you want to do it all in hardware is just adding a
> penalty of sorts since I know for many of my routing tests I normally
> disable GRO as it has a significant per-packet cost for small packet
> workloads.
>
>> +               if (!(features & NETIF_F_RXCSUM)) {
>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
>> +                       features &= ~NETIF_F_GRO_HW;
>> +               }
>
> So I was thinking about this. For LRO it makes sense to disable it in
> the case of RXCSUM being disabled since most implementations leave the
> Rx checksum mangled. However for GRO I am not sure it makes complete
> sense. For software GRO we perform checksum validation in either
> tcp4_gro_receive or tcp6_gro_receive. Why should the hardware
> implementation behave differently? When a GRO frame is assembled the
> checksum is converted to CHECKSUM_PARTIAL anyway even if Rx checksum
> validation is disabled for the driver.

This is a logical feature dependency that Yuval Mintz suggested.  For
GRO_HW to work, hardware must verify the checksum of a packet before
the packet can be merged.

So if the user does not want to do RXCSUM on this device for whatever
reason, it logically means that he also doesn't want to do GRO_HW with
implied RXCSUM performed on each packet that is merged.

So I agree with Yuval that this dependency makes sense.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device.
  2017-12-09 18:56   ` Alexander Duyck
@ 2017-12-09 21:40     ` Michael Chan
  2017-12-09 22:37       ` Alexander Duyck
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Chan @ 2017-12-09 21:40 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 10:56 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> Hardware should not aggregate any packets when generic XDP is installed.
>>
>> Cc: Ariel Elior <Ariel.Elior@cavium.com>
>> Cc: everest-linux-l2@cavium.com
>> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>> ---
>>  net/core/dev.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 6ebd0e7..ec08ace 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1542,6 +1542,29 @@ void dev_disable_lro(struct net_device *dev)
>>  }
>>  EXPORT_SYMBOL(dev_disable_lro);
>>
>> +/**
>> + *     dev_disable_gro_hw - disable HW Generic Receive Offload on a device
>> + *     @dev: device
>> + *
>> + *     Disable HW Generic Receive Offload (GRO_HW) on a net device.  Must be
>> + *     called under RTNL.  This is needed if Generic XDP is installed on
>> + *     the device.
>> + */
>> +static void dev_disable_gro_hw(struct net_device *dev)
>> +{
>> +       struct net_device *lower_dev;
>> +       struct list_head *iter;
>> +
>> +       dev->wanted_features &= ~NETIF_F_GRO_HW;
>> +       netdev_update_features(dev);
>> +
>> +       if (unlikely(dev->features & NETIF_F_GRO_HW))
>> +               netdev_WARN(dev, "failed to disable GRO_HW!\n");
>> +
>> +       netdev_for_each_lower_dev(dev, lower_dev, iter)
>> +               dev_disable_gro_hw(lower_dev);
>
> I think these two lines are redundant in dev_disable_lro, since
> netdev_update_features should propagate the disable to all of the
> lower devices.

Right.  But for GRO_HW, there is no automatic propagation.

> Also this doesn't prevent the lower devices from
> re-enabling gro_hw.

Right.  You can re-enable LRO on the upper device as well.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-09 21:31     ` Michael Chan
@ 2017-12-09 22:04       ` Alexander Duyck
  2017-12-10  6:40         ` Michael Chan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2017-12-09 22:04 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 1:31 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Sat, Dec 9, 2017 at 10:50 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> Introduce NETIF_F_GRO_HW feature flag for NICs that support hardware
>>> GRO.  With this flag, we can now independently turn on or off hardware
>>> GRO when GRO is on.  Previously, drivers were using NETIF_F_GRO to
>>> control hardware GRO and so it cannot be independently turned on or
>>> off without affecting GRO.
>>>
>>> Hardware GRO (just like GRO) guarantees that packets can be re-segmented
>>> by TSO/GSO to reconstruct the original packet stream.  It is a subset of
>>> NETIF_F_GRO and depends on it, as well as NETIF_F_RXCSUM.
>>
>> So I would disagree with it being a subset of NETIF_F_GRO. If anything
>> it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
>> at the device level in the case of hardware drivers. My concern is
>> this is probably going to end up applying to things other than just
>> hardware drivers though. For example what is to prevent this from
>> being applied to something like a virtio/tap interface? It seems like
>> this should be something that would be easy to implement in software.
>
> If you do it in software, it's called NETIF_F_GRO.  We already have
> it.  The whole point of the new flag is that if the device has
> software GRO enabled, and if the device supports GRO_HW, then we can
> do a subset of GRO in hardware (hopefully faster).

I can see what you are getting at. But GRO_HW with GRO stacked on top
of it won't necessarily be the fastest form of GRO. If you have a
GRO_HW implementation that is complete enough people may want to
disable Software GRO in order to avoid the extra overhead involved
with using it.

>> In addition as I said in my earlier comments I think we should
>> probably look at using this new feature bit to indicate that we allow
>> GRO to occur at or below this device as opposed to just above it as
>> currently occurs with conventional GRO.
>>
>>> Since NETIF_F_GRO is not propagated between upper and lower devices,
>>> NETIF_F_GRO_HW should follow suit since it is a subset of GRO.  In other
>>> words, a lower device can independent have GRO/GRO_HW enabled or disabled
>>> and no feature propagation is required.  This will preserve the current
>>> GRO behavior.
>>
>> I'm going to back off on my requirement for you to handle propagation
>> since after spending a couple hours working on it I did find it was
>> more complex then I originally thought it would be. With that said
>> however I would want to see this feature implemented in such a way
>> that we can deal with propagating the bits in the future if we need to
>> and that is what I am basing my comments on.
>
> Nothing stops anyone from propagating the flag.  Just add
> NETIF_F_GRO_HW to NETIF_F_UPPER_DISABLES and it will be propagated
> just like LRO.

Yes, but the problem then is it doesn't solve the secondary issue of
no way to propagate down the desire to disable GRO as well. That is
why I am thinking that the new bit could be used to indicate that we
want GRO to be supported either in the driver or below it instead of
only in "hardware". We are much better off with a generic solution and
that is why I think it might be better to use more of a pipeline or
staged type definition for this. Basically with GRO it occurs in the
GRO logic just after the driver hands off the packet, while this new
bit indicates that GRO happens somewhere before then. If we use that
definition for this then it becomes usable to deal with things like
the stacked devices problem where the stacked devices normally have
the GRO flag disabled since we don't want to run GRO multiple times,
but as a result the stacked devices have no way of saying they don't
want GRO. If we tweak the definition of this bit it solves that
problem since it would allow for us disabling GRO, GRO_HW, and LRO on
any devices below a given device.

>>> @@ -7424,6 +7424,18 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>                 features &= ~dev->gso_partial_features;
>>>         }
>>>
>>> +       if (features & NETIF_F_GRO_HW) {
>>> +               /* Hardware GRO depends on GRO and RXCSUM. */
>>> +               if (!(features & NETIF_F_GRO)) {
>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no GRO feature.\n");
>>> +                       features &= ~NETIF_F_GRO_HW;
>>> +               }
>>
>> I still disagree with this bit. I think GRO is a pure software
>> offload, whereas GRO_HW can represent either a software offload of
>> some sort occurring in or before the driver, or in the hardware.
>> Basically the difference between the two as I view it is where the GRO
>> is occurring. I would like to keep that distinction and make use of
>> it. As I mentioned before in the case of bonding we currently have no
>> way to disable GRO on the lower devices partially because GRO is a
>> pure software feature and always happens at each device along the way.
>> The nice thing about this new bit is the assumption is that it is
>> pushing GRO to the lowest possible level and not triggering any side
>> effects like GRO currently does. I hope to use that logic with stacked
>> devices so that we could clear the bit and have it disable GRO,
>> GRO_HW, and LRO on all devices below the device that cleared it.
>>
>> I think this linking of GRO and GRO_HW is something that would be
>> better served by moving it into the driver if you are wanting to
>> maintain the behavior of how this was previously linked to GRO.
>
> If you insist, I can move this to the driver's ndo_fix_features().
> But I feel it is much better to enforce this dependency system wide.
> Once again, GRO_HW is hardware accelerated GRO and should depend on
> it.

The question I would have is why? Where is the dependency? I don't see
it. It is GRO in one spot and/or GRO in the other. The two don't
interract directly and I don't believe you can do software GRO on a
frame that has already been coalesced in hardware, and you take a
performance penalty for trying to offload in software what you would
have already been handled in hardware.

Also, when we start propagating this up to indicate it is active we
don't want to have the GRO dependency since it would just make things
more expensive since we only need to do GRO in software once.

>> It
>> also makes it so that it is much easier to compare the performance for
>> GRO_HW against just a pure software GRO since you could then enable
>> them independently. Software GRO can come at a cost, and leaving it
>> enabled when you want to do it all in hardware is just adding a
>> penalty of sorts since I know for many of my routing tests I normally
>> disable GRO as it has a significant per-packet cost for small packet
>> workloads.
>>
>>> +               if (!(features & NETIF_F_RXCSUM)) {
>>> +                       netdev_dbg(dev, "Dropping NETIF_F_GSO_HW since no RXCSUM feature.\n");
>>> +                       features &= ~NETIF_F_GRO_HW;
>>> +               }
>>
>> So I was thinking about this. For LRO it makes sense to disable it in
>> the case of RXCSUM being disabled since most implementations leave the
>> Rx checksum mangled. However for GRO I am not sure it makes complete
>> sense. For software GRO we perform checksum validation in either
>> tcp4_gro_receive or tcp6_gro_receive. Why should the hardware
>> implementation behave differently? When a GRO frame is assembled the
>> checksum is converted to CHECKSUM_PARTIAL anyway even if Rx checksum
>> validation is disabled for the driver.
>
> This is a logical feature dependency that Yuval Mintz suggested.  For
> GRO_HW to work, hardware must verify the checksum of a packet before
> the packet can be merged.
>
> So if the user does not want to do RXCSUM on this device for whatever
> reason, it logically means that he also doesn't want to do GRO_HW with
> implied RXCSUM performed on each packet that is merged.
>
> So I agree with Yuval that this dependency makes sense.

Okay then, so if we are going to go that route we may want to be
complete on this and just disable GRO_HW and LRO if RXCSUM is not
enabled. We might also want to add a comment indicating that we don't
support anything that might mangle a packet at the driver level if
RXCSUM is not enabled. Comments explaining all this would be a good
thing just to keep someone from grabbing GRO and lumping it in at some
point in the future.

I'm still working on trying to propagate the Rx checksum properly
since it should probably follow the same UPPER_DISABLES behavior as
LRO, but I will probably only have a few hours over the next week to
really work on any code and there end up being a number of stacked
drivers that have to be updated. I would be good with just flipping
this logic for now and if RXCSUM is not set, and GRO_HW (just noticed
the typo in your message) is set, then print your message and clear
the bit. I can probably come back later and add LRO once I get the
propagation bits worked out.

As far as patch 2 in the set it would probably be better to either
drop it and just accept it as an outstanding issue, or you could take
on the propagation problems with GRO_HW and RXCSUM since we really
need to get those solved in order for this functionality to fully
work.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device.
  2017-12-09 21:40     ` Michael Chan
@ 2017-12-09 22:37       ` Alexander Duyck
  2017-12-10  6:49         ` Michael Chan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2017-12-09 22:37 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 1:40 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Sat, Dec 9, 2017 at 10:56 AM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Fri, Dec 8, 2017 at 10:27 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> Hardware should not aggregate any packets when generic XDP is installed.
>>>
>>> Cc: Ariel Elior <Ariel.Elior@cavium.com>
>>> Cc: everest-linux-l2@cavium.com
>>> Signed-off-by: Michael Chan <michael.chan@broadcom.com>
>>> ---
>>>  net/core/dev.c | 24 ++++++++++++++++++++++++
>>>  1 file changed, 24 insertions(+)
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 6ebd0e7..ec08ace 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -1542,6 +1542,29 @@ void dev_disable_lro(struct net_device *dev)
>>>  }
>>>  EXPORT_SYMBOL(dev_disable_lro);
>>>
>>> +/**
>>> + *     dev_disable_gro_hw - disable HW Generic Receive Offload on a device
>>> + *     @dev: device
>>> + *
>>> + *     Disable HW Generic Receive Offload (GRO_HW) on a net device.  Must be
>>> + *     called under RTNL.  This is needed if Generic XDP is installed on
>>> + *     the device.
>>> + */
>>> +static void dev_disable_gro_hw(struct net_device *dev)
>>> +{
>>> +       struct net_device *lower_dev;
>>> +       struct list_head *iter;
>>> +
>>> +       dev->wanted_features &= ~NETIF_F_GRO_HW;
>>> +       netdev_update_features(dev);
>>> +
>>> +       if (unlikely(dev->features & NETIF_F_GRO_HW))
>>> +               netdev_WARN(dev, "failed to disable GRO_HW!\n");
>>> +
>>> +       netdev_for_each_lower_dev(dev, lower_dev, iter)
>>> +               dev_disable_gro_hw(lower_dev);
>>
>> I think these two lines are redundant in dev_disable_lro, since
>> netdev_update_features should propagate the disable to all of the
>> lower devices.
>
> Right.  But for GRO_HW, there is no automatic propagation.

Right, but that is also an issue since the automatic propagation is
what prevents LRO from being re-enabled on the lower devices.

>> Also this doesn't prevent the lower devices from
>> re-enabling gro_hw.
>
> Right.  You can re-enable LRO on the upper device as well.

On the upper device yes, but not on the lower devices. That was what I
was getting at. With LRO there is netdev_sync_upper_features() and
that prevents you from enabling it if the upper device has it
disabled. The problem is right now there is nothing that sets it for
the upper devices when they are added to something like a bond so that
is one of the pieces that still has to be worked on before we can just
use the existing sync logic.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-09 22:04       ` Alexander Duyck
@ 2017-12-10  6:40         ` Michael Chan
  2017-12-10 17:02           ` Alexander Duyck
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Chan @ 2017-12-10  6:40 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 2:04 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 1:31 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> On Sat, Dec 9, 2017 at 10:50 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> So I would disagree with it being a subset of NETIF_F_GRO. If anything
>>> it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
>>> at the device level in the case of hardware drivers. My concern is
>>> this is probably going to end up applying to things other than just
>>> hardware drivers though. For example what is to prevent this from
>>> being applied to something like a virtio/tap interface? It seems like
>>> this should be something that would be easy to implement in software.
>>
>> If you do it in software, it's called NETIF_F_GRO.  We already have
>> it.  The whole point of the new flag is that if the device has
>> software GRO enabled, and if the device supports GRO_HW, then we can
>> do a subset of GRO in hardware (hopefully faster).
>
> I can see what you are getting at. But GRO_HW with GRO stacked on top
> of it won't necessarily be the fastest form of GRO. If you have a
> GRO_HW implementation that is complete enough people may want to
> disable Software GRO in order to avoid the extra overhead involved
> with using it.

It is possible that if you have incoming packets 1, 2, 3, 4, 5 for a
TCP connection, HW_GRO can aggregate packets 1, 2, 3, but cannot
aggregate packets 4 and 5 due to hardware resource limitation.
Software GRO aggregates 4 and 5.  So it works well together.

>>> I'm going to back off on my requirement for you to handle propagation
>>> since after spending a couple hours working on it I did find it was
>>> more complex then I originally thought it would be. With that said
>>> however I would want to see this feature implemented in such a way
>>> that we can deal with propagating the bits in the future if we need to
>>> and that is what I am basing my comments on.
>>
>> Nothing stops anyone from propagating the flag.  Just add
>> NETIF_F_GRO_HW to NETIF_F_UPPER_DISABLES and it will be propagated
>> just like LRO.
>
> Yes, but the problem then is it doesn't solve the secondary issue of
> no way to propagate down the desire to disable GRO as well. That is
> why I am thinking that the new bit could be used to indicate that we
> want GRO to be supported either in the driver or below it instead of
> only in "hardware". We are much better off with a generic solution and
> that is why I think it might be better to use more of a pipeline or
> staged type definition for this. Basically with GRO it occurs in the
> GRO logic just after the driver hands off the packet, while this new
> bit indicates that GRO happens somewhere before then. If we use that
> definition for this then it becomes usable to deal with things like
> the stacked devices problem where the stacked devices normally have
> the GRO flag disabled since we don't want to run GRO multiple times,
> but as a result the stacked devices have no way of saying they don't
> want GRO. If we tweak the definition of this bit it solves that
> problem since it would allow for us disabling GRO, GRO_HW, and LRO on
> any devices below a given device.

I just don't follow your logic.  First of all, GRO on an upper device
doesn't mean that we are doing GRO on the upper device.  The bonding
driver cannot do GRO because it doesn't call napi_gro_receive().  GRO
always happens on the lower device.  Propagation of GRO can only mean
that if GRO is set on the upper device, GRO is propagated and allowed
on lower devices.  Nothing stops you from doing that if you want to do
that.

>>> I still disagree with this bit. I think GRO is a pure software
>>> offload, whereas GRO_HW can represent either a software offload of
>>> some sort occurring in or before the driver, or in the hardware.
>>> Basically the difference between the two as I view it is where the GRO
>>> is occurring. I would like to keep that distinction and make use of
>>> it. As I mentioned before in the case of bonding we currently have no
>>> way to disable GRO on the lower devices partially because GRO is a
>>> pure software feature and always happens at each device along the way.
>>> The nice thing about this new bit is the assumption is that it is
>>> pushing GRO to the lowest possible level and not triggering any side
>>> effects like GRO currently does. I hope to use that logic with stacked
>>> devices so that we could clear the bit and have it disable GRO,
>>> GRO_HW, and LRO on all devices below the device that cleared it.
>>>
>>> I think this linking of GRO and GRO_HW is something that would be
>>> better served by moving it into the driver if you are wanting to
>>> maintain the behavior of how this was previously linked to GRO.
>>
>> If you insist, I can move this to the driver's ndo_fix_features().
>> But I feel it is much better to enforce this dependency system wide.
>> Once again, GRO_HW is hardware accelerated GRO and should depend on
>> it.
>
> The question I would have is why? Where is the dependency? I don't see
> it. It is GRO in one spot and/or GRO in the other. The two don't
> interract directly and I don't believe you can do software GRO on a
> frame that has already been coalesced in hardware,

Right.  But hardware can do a series of frames and software can do a
different series of frames that have not been aggregated.

>> This is a logical feature dependency that Yuval Mintz suggested.  For
>> GRO_HW to work, hardware must verify the checksum of a packet before
>> the packet can be merged.
>>
>> So if the user does not want to do RXCSUM on this device for whatever
>> reason, it logically means that he also doesn't want to do GRO_HW with
>> implied RXCSUM performed on each packet that is merged.
>>
>> So I agree with Yuval that this dependency makes sense.
>
> Okay then, so if we are going to go that route we may want to be
> complete on this and just disable GRO_HW and LRO if RXCSUM is not
> enabled. We might also want to add a comment indicating that we don't
> support anything that might mangle a packet at the driver level if
> RXCSUM is not enabled. Comments explaining all this would be a good
> thing just to keep someone from grabbing GRO and lumping it in at some
> point in the future.
>
> I'm still working on trying to propagate the Rx checksum properly
> since it should probably follow the same UPPER_DISABLES behavior as
> LRO, but I will probably only have a few hours over the next week to
> really work on any code and there end up being a number of stacked
> drivers that have to be updated. I would be good with just flipping
> this logic for now and if RXCSUM is not set, and GRO_HW (just noticed
> the typo in your message) is set, then print your message and clear
> the bit. I can probably come back later and add LRO once I get the
> propagation bits worked out.

Just fix the netdev_dbg() typo, right?  I don't understand what you
mean by flipping the logic.  It's the same whether you check RXCSUM
first or GRO_HW first.

May be you meant put the RXCSUM check in the outer if statement so
that someone could add more inner checks?  OK, I think that's what you
meant.

>
> As far as patch 2 in the set it would probably be better to either
> drop it and just accept it as an outstanding issue, or you could take
> on the propagation problems with GRO_HW and RXCSUM since we really
> need to get those solved in order for this functionality to fully
> work.

We need patch #2 otherwise generic GRO won't work on these 3 drivers.
I don't think I fully understand your concerns about propagation.  To
me propagation is just a usage model where an upper device will
control the common features of lower devices.  It is more convenient
to have propagation, but requires upper devices to be aware of all
features that propagate (GRO, RXCSUM).  Without propagation, it is
still fine.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device.
  2017-12-09 22:37       ` Alexander Duyck
@ 2017-12-10  6:49         ` Michael Chan
  2017-12-11  3:03           ` Alexander Duyck
  0 siblings, 1 reply; 17+ messages in thread
From: Michael Chan @ 2017-12-10  6:49 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 2:37 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 1:40 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> On Sat, Dec 9, 2017 at 10:56 AM, Alexander Duyck
>> <alexander.duyck@gmail.com> wrote:
>>> I think these two lines are redundant in dev_disable_lro, since
>>> netdev_update_features should propagate the disable to all of the
>>> lower devices.
>>
>> Right.  But for GRO_HW, there is no automatic propagation.
>
> Right, but that is also an issue since the automatic propagation is
> what prevents LRO from being re-enabled on the lower devices.
>
>>> Also this doesn't prevent the lower devices from
>>> re-enabling gro_hw.
>>
>> Right.  You can re-enable LRO on the upper device as well.
>
> On the upper device yes, but not on the lower devices. That was what I
> was getting at. With LRO there is netdev_sync_upper_features() and
> that prevents you from enabling it if the upper device has it
> disabled. The problem is right now there is nothing that sets it for
> the upper devices when they are added to something like a bond so that
> is one of the pieces that still has to be worked on before we can just
> use the existing sync logic.

I understand.  But if the user really wants to re-enable LRO, he can
just re-enable LRO on the upper device first and then re-enable LRO on
the lower devices.

To permanently disable a feature, I think additional infrastructure
may be required so that the feature can be cleared in hw_features and
then re-enabled later when it is allowed.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-10  6:40         ` Michael Chan
@ 2017-12-10 17:02           ` Alexander Duyck
  2017-12-11  6:39             ` Michael Chan
  0 siblings, 1 reply; 17+ messages in thread
From: Alexander Duyck @ 2017-12-10 17:02 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 10:40 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Sat, Dec 9, 2017 at 2:04 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Sat, Dec 9, 2017 at 1:31 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> On Sat, Dec 9, 2017 at 10:50 AM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> So I would disagree with it being a subset of NETIF_F_GRO. If anything
>>>> it is an alternative to NETIF_F_GRO. It is performing GRO much earlier
>>>> at the device level in the case of hardware drivers. My concern is
>>>> this is probably going to end up applying to things other than just
>>>> hardware drivers though. For example what is to prevent this from
>>>> being applied to something like a virtio/tap interface? It seems like
>>>> this should be something that would be easy to implement in software.
>>>
>>> If you do it in software, it's called NETIF_F_GRO.  We already have
>>> it.  The whole point of the new flag is that if the device has
>>> software GRO enabled, and if the device supports GRO_HW, then we can
>>> do a subset of GRO in hardware (hopefully faster).
>>
>> I can see what you are getting at. But GRO_HW with GRO stacked on top
>> of it won't necessarily be the fastest form of GRO. If you have a
>> GRO_HW implementation that is complete enough people may want to
>> disable Software GRO in order to avoid the extra overhead involved
>> with using it.
>
> It is possible that if you have incoming packets 1, 2, 3, 4, 5 for a
> TCP connection, HW_GRO can aggregate packets 1, 2, 3, but cannot
> aggregate packets 4 and 5 due to hardware resource limitation.
> Software GRO aggregates 4 and 5.  So it works well together.

Right. But in the case where 1, 2, 3, 4, and 5 were not aggregated by
hardware GRO because the frames could not be aggregated and then GRO
burns cycles coming to the same conclusion you have waste. Same thing
goes for if hardware GRO aggregates 1 through 5 and then SW GRO tries
to see if it can do more.

They are both doing the same thing, but what I see it as is two
passes, not something where they are working together. The hardware
GRO can rely on software GRO for a second pass, but it doesn't need
to. The fact that it doesn't need to tells me that it isn't a hard
requirement to have GRO in order to make use of software GRO.

>>>> I'm going to back off on my requirement for you to handle propagation
>>>> since after spending a couple hours working on it I did find it was
>>>> more complex then I originally thought it would be. With that said
>>>> however I would want to see this feature implemented in such a way
>>>> that we can deal with propagating the bits in the future if we need to
>>>> and that is what I am basing my comments on.
>>>
>>> Nothing stops anyone from propagating the flag.  Just add
>>> NETIF_F_GRO_HW to NETIF_F_UPPER_DISABLES and it will be propagated
>>> just like LRO.
>>
>> Yes, but the problem then is it doesn't solve the secondary issue of
>> no way to propagate down the desire to disable GRO as well. That is
>> why I am thinking that the new bit could be used to indicate that we
>> want GRO to be supported either in the driver or below it instead of
>> only in "hardware". We are much better off with a generic solution and
>> that is why I think it might be better to use more of a pipeline or
>> staged type definition for this. Basically with GRO it occurs in the
>> GRO logic just after the driver hands off the packet, while this new
>> bit indicates that GRO happens somewhere before then. If we use that
>> definition for this then it becomes usable to deal with things like
>> the stacked devices problem where the stacked devices normally have
>> the GRO flag disabled since we don't want to run GRO multiple times,
>> but as a result the stacked devices have no way of saying they don't
>> want GRO. If we tweak the definition of this bit it solves that
>> problem since it would allow for us disabling GRO, GRO_HW, and LRO on
>> any devices below a given device.
>
> I just don't follow your logic.  First of all, GRO on an upper device
> doesn't mean that we are doing GRO on the upper device.  The bonding
> driver cannot do GRO because it doesn't call napi_gro_receive().  GRO
> always happens on the lower device.  Propagation of GRO can only mean
> that if GRO is set on the upper device, GRO is propagated and allowed
> on lower devices.  Nothing stops you from doing that if you want to do
> that.

If my understanding of things is correct it can mean doing GRO on an
upper device if that device does any sort of decapsulation as a result
of something like vxlan, geneve, or either ipsec or macsec encryption
occurring on top of it. It would be a side effect of the gro_cells
logic.

Admittedly I am more familiar with the segmentation side of things
then the reassembly. So my understanding of this could be incorrect.

>>>> I still disagree with this bit. I think GRO is a pure software
>>>> offload, whereas GRO_HW can represent either a software offload of
>>>> some sort occurring in or before the driver, or in the hardware.
>>>> Basically the difference between the two as I view it is where the GRO
>>>> is occurring. I would like to keep that distinction and make use of
>>>> it. As I mentioned before in the case of bonding we currently have no
>>>> way to disable GRO on the lower devices partially because GRO is a
>>>> pure software feature and always happens at each device along the way.
>>>> The nice thing about this new bit is the assumption is that it is
>>>> pushing GRO to the lowest possible level and not triggering any side
>>>> effects like GRO currently does. I hope to use that logic with stacked
>>>> devices so that we could clear the bit and have it disable GRO,
>>>> GRO_HW, and LRO on all devices below the device that cleared it.
>>>>
>>>> I think this linking of GRO and GRO_HW is something that would be
>>>> better served by moving it into the driver if you are wanting to
>>>> maintain the behavior of how this was previously linked to GRO.
>>>
>>> If you insist, I can move this to the driver's ndo_fix_features().
>>> But I feel it is much better to enforce this dependency system wide.
>>> Once again, GRO_HW is hardware accelerated GRO and should depend on
>>> it.
>>
>> The question I would have is why? Where is the dependency? I don't see
>> it. It is GRO in one spot and/or GRO in the other. The two don't
>> interract directly and I don't believe you can do software GRO on a
>> frame that has already been coalesced in hardware,
>
> Right.  But hardware can do a series of frames and software can do a
> different series of frames that have not been aggregated.

Right, but you have yet to define how the hardware offload would be
dependent on the software offload. I would say it makes more sense to
make LRO dependent on GRO_HW then it does to make GRO_HW dependent on
GRO. If I turn off GSO it doesn't turn off TSO and I would argue there
is a much stronger link there since GSO is the fallback for when TSO
fails, whereas for GRO you don't even necessarily need to have a
fallback.

There is a hierarchy to all of these features. GRO is the software
stack doing reversible aggregation, GRO_HW is allowing the hardware to
perform reversible aggregation, and LRO is allowing the hardware to
perform lossy/non-reversible aggregation. In my mind the
differentiation is the pure software solution is done outside of the
driver/hardware that you directly control. It basically just happens.
For the GRO_HW and LRO it requires the hardware/driver to participate
in it. In addition LRO might produce some frames that look identical
to GRO_HW, but it might also produce some frames that aren't
completely reversible depending on the implementation.

>>> This is a logical feature dependency that Yuval Mintz suggested.  For
>>> GRO_HW to work, hardware must verify the checksum of a packet before
>>> the packet can be merged.
>>>
>>> So if the user does not want to do RXCSUM on this device for whatever
>>> reason, it logically means that he also doesn't want to do GRO_HW with
>>> implied RXCSUM performed on each packet that is merged.
>>>
>>> So I agree with Yuval that this dependency makes sense.
>>
>> Okay then, so if we are going to go that route we may want to be
>> complete on this and just disable GRO_HW and LRO if RXCSUM is not
>> enabled. We might also want to add a comment indicating that we don't
>> support anything that might mangle a packet at the driver level if
>> RXCSUM is not enabled. Comments explaining all this would be a good
>> thing just to keep someone from grabbing GRO and lumping it in at some
>> point in the future.
>>
>> I'm still working on trying to propagate the Rx checksum properly
>> since it should probably follow the same UPPER_DISABLES behavior as
>> LRO, but I will probably only have a few hours over the next week to
>> really work on any code and there end up being a number of stacked
>> drivers that have to be updated. I would be good with just flipping
>> this logic for now and if RXCSUM is not set, and GRO_HW (just noticed
>> the typo in your message) is set, then print your message and clear
>> the bit. I can probably come back later and add LRO once I get the
>> propagation bits worked out.
>
> Just fix the netdev_dbg() typo, right?  I don't understand what you
> mean by flipping the logic.  It's the same whether you check RXCSUM
> first or GRO_HW first.

It just saves me work later when I get the propagation problems solved
and have to add LRO to the list of features dropped if RXCSUM is not
enabled.

> May be you meant put the RXCSUM check in the outer if statement so
> that someone could add more inner checks?  OK, I think that's what you
> meant.

Yes that is what I meant.

>>
>> As far as patch 2 in the set it would probably be better to either
>> drop it and just accept it as an outstanding issue, or you could take
>> on the propagation problems with GRO_HW and RXCSUM since we really
>> need to get those solved in order for this functionality to fully
>> work.
>
> We need patch #2 otherwise generic GRO won't work on these 3 drivers.
> I don't think I fully understand your concerns about propagation.  To
> me propagation is just a usage model where an upper device will
> control the common features of lower devices.  It is more convenient
> to have propagation, but requires upper devices to be aware of all
> features that propagate (GRO, RXCSUM).  Without propagation, it is
> still fine.

I'm not sure if it is. It depends on how much XDP depends on the
frames being non-linear. As-is I am pretty sure this doesn't work for
the stacked case anyway since GRO was still enabled for lower devices
anyway. So you might look at just modifying patch 2 to not worry about
the stacked devices case since I think that was already broken with
GRO anyway.

Feel free to try taking on the propagation setup if you want. I'm
stuck in a number of meetings over the next week so I probably won't
be able to have any patches to try to address the issue until a couple
weeks from now.

- Alex

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device.
  2017-12-10  6:49         ` Michael Chan
@ 2017-12-11  3:03           ` Alexander Duyck
  0 siblings, 0 replies; 17+ messages in thread
From: Alexander Duyck @ 2017-12-11  3:03 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sat, Dec 9, 2017 at 10:49 PM, Michael Chan <michael.chan@broadcom.com> wrote:
> On Sat, Dec 9, 2017 at 2:37 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Sat, Dec 9, 2017 at 1:40 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>>> On Sat, Dec 9, 2017 at 10:56 AM, Alexander Duyck
>>> <alexander.duyck@gmail.com> wrote:
>>>> I think these two lines are redundant in dev_disable_lro, since
>>>> netdev_update_features should propagate the disable to all of the
>>>> lower devices.
>>>
>>> Right.  But for GRO_HW, there is no automatic propagation.
>>
>> Right, but that is also an issue since the automatic propagation is
>> what prevents LRO from being re-enabled on the lower devices.
>>
>>>> Also this doesn't prevent the lower devices from
>>>> re-enabling gro_hw.
>>>
>>> Right.  You can re-enable LRO on the upper device as well.
>>
>> On the upper device yes, but not on the lower devices. That was what I
>> was getting at. With LRO there is netdev_sync_upper_features() and
>> that prevents you from enabling it if the upper device has it
>> disabled. The problem is right now there is nothing that sets it for
>> the upper devices when they are added to something like a bond so that
>> is one of the pieces that still has to be worked on before we can just
>> use the existing sync logic.
>
> I understand.  But if the user really wants to re-enable LRO, he can
> just re-enable LRO on the upper device first and then re-enable LRO on
> the lower devices.
>
> To permanently disable a feature, I think additional infrastructure
> may be required so that the feature can be cleared in hw_features and
> then re-enabled later when it is allowed.

I'm not saying we have to permanently ban it. But we want to prevent
any mix-ups. If someone re-enables LRO on the upper device and then
goes down the chain manually re-enabling it then they definitely
wanted to do that. However if they disable LRO on the bond they cannot
re-enable it on one of the slaves in the bond. That is the kind of
propagation I am looking at. Basically if I disable some feature in
the Rx path I should not see any frames that make use of that feature
come by after it has been disabled. That is why I am thinking GRO,
LRO, Rx Checksum, and GRO_HW all need to handle this sort of
propagation, but doing it right is going to take some time since we
have to make certain to enable things like Rx checksum on an upper
device then if the feature is present on a lower device and that logic
doesn't exist now and will be needed across multiple drivers such as
bond and team among others.

So for now I would say don't worry about lower devices. Just focus on
the immediate device and we can work on getting the synchronization
between the upper and lower devices sorted out over the next couple of
weeks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH net-next v3 1/5] net: Introduce NETIF_F_GRO_HW.
  2017-12-10 17:02           ` Alexander Duyck
@ 2017-12-11  6:39             ` Michael Chan
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Chan @ 2017-12-11  6:39 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, Netdev, Andrew Gospodarek, Ariel Elior, everest-linux-l2

On Sun, Dec 10, 2017 at 9:02 AM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 10:40 PM, Michael Chan <michael.chan@broadcom.com> wrote:
>> It is possible that if you have incoming packets 1, 2, 3, 4, 5 for a
>> TCP connection, HW_GRO can aggregate packets 1, 2, 3, but cannot
>> aggregate packets 4 and 5 due to hardware resource limitation.
>> Software GRO aggregates 4 and 5.  So it works well together.
>
> Right. But in the case where 1, 2, 3, 4, and 5 were not aggregated by
> hardware GRO because the frames could not be aggregated and then GRO
> burns cycles coming to the same conclusion you have waste. Same thing
> goes for if hardware GRO aggregates 1 through 5 and then SW GRO tries
> to see if it can do more.
>
> They are both doing the same thing, but what I see it as is two
> passes, not something where they are working together. The hardware
> GRO can rely on software GRO for a second pass, but it doesn't need
> to. The fact that it doesn't need to tells me that it isn't a hard
> requirement to have GRO in order to make use of software GRO.

I guess I look at this as feature propagation from net device to
hardware.  To me, it makes a lot of sense.

We've been doing hardware GRO for a while and I never think of it as
replacement for software GRO.  It's hardware accelerated GRO for a
subset of the connections that hardware can handle.

As for the additional GRO pass in software, I think it is quite
efficient.  When hardware has aggregated a GRO frame, software GRO
will effectively "flush" it and never hold it for more aggregation.
After this patchset is done, I can look at the code and see if we can
further optimize the "2nd pass" code path when hardware has already
aggregated the packet.

Anyway, I will move the GRO_HW/GRO dependency to the
ndo_fix_features() of the 3 drivers, so we can move on and get these
patches accepted.

>> May be you meant put the RXCSUM check in the outer if statement so
>> that someone could add more inner checks?  OK, I think that's what you
>> meant.
>
> Yes that is what I meant.

OK.  Will change in v4.

>> We need patch #2 otherwise generic GRO won't work on these 3 drivers.
>> I don't think I fully understand your concerns about propagation.  To
>> me propagation is just a usage model where an upper device will
>> control the common features of lower devices.  It is more convenient
>> to have propagation, but requires upper devices to be aware of all
>> features that propagate (GRO, RXCSUM).  Without propagation, it is
>> still fine.
>
> I'm not sure if it is. It depends on how much XDP depends on the
> frames being non-linear. As-is I am pretty sure this doesn't work for
> the stacked case anyway since GRO was still enabled for lower devices
> anyway. So you might look at just modifying patch 2 to not worry about
> the stacked devices case since I think that was already broken with
> GRO anyway.

OK.  I don't think anyone will run generic XDP on an upper device anyway.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-12-11  6:39 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-09  6:27 [PATCH net-next v3 0/5] Introduce NETIF_F_GRO_HW Michael Chan
2017-12-09  6:27 ` [PATCH net-next v3 1/5] net: " Michael Chan
2017-12-09 18:50   ` Alexander Duyck
2017-12-09 21:31     ` Michael Chan
2017-12-09 22:04       ` Alexander Duyck
2017-12-10  6:40         ` Michael Chan
2017-12-10 17:02           ` Alexander Duyck
2017-12-11  6:39             ` Michael Chan
2017-12-09  6:27 ` [PATCH net-next v3 2/5] net: Disable GRO_HW when generic XDP is installed on a device Michael Chan
2017-12-09 18:56   ` Alexander Duyck
2017-12-09 21:40     ` Michael Chan
2017-12-09 22:37       ` Alexander Duyck
2017-12-10  6:49         ` Michael Chan
2017-12-11  3:03           ` Alexander Duyck
2017-12-09  6:27 ` [PATCH net-next v3 3/5] bnxt_en: Use NETIF_F_GRO_HW Michael Chan
2017-12-09  6:27 ` [PATCH net-next v3 4/5] bnx2x: " Michael Chan
2017-12-09  6:27 ` [PATCH net-next v3 5/5] qede: " Michael Chan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.