* [PATCH v6 net-next 00/13] tcp: BIG TCP implementation
@ 2022-05-10  3:32 Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
                   ` (13 more replies)
  0 siblings, 14 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

This series implements BIG TCP as presented in netdev 0x15:

https://netdevconf.info/0x15/session.html?BIG-TCP

Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/

The standard TSO/GRO packet limit is 64KB.

With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.

Note that this feature is not enabled by default, because it might
break some eBPF programs that assume the TCP header immediately follows
the IPv6 header.

While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
are unable to skip over IPv6 extension headers.

Reducing the number of packets traversing the networking stack usually
improves performance, as shown in this experiment using a 100Gbit NIC
and a 4K MTU.

'Standard' performance with current (74KB) limits.
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
77           138          183          8542.19    
79           143          178          8215.28    
70           117          164          9543.39    
80           144          176          8183.71    
78           126          155          9108.47    
80           146          184          8115.19    
71           113          165          9510.96    
74           113          164          9518.74    
79           137          178          8575.04    
73           111          171          9561.73    

Now enable BIG TCP on both hosts.

ip link set dev eth0 gro_max_size 185000 gso_max_size 185000
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
57           83           117          13871.38   
64           118          155          11432.94   
65           116          148          11507.62   
60           105          136          12645.15   
60           103          135          12760.34   
60           102          134          12832.64   
62           109          132          10877.68   
58           82           115          14052.93   
57           83           124          14212.58   
57           82           119          14196.01   

We see an increase in transactions per second, as well as lower latencies.

v6: fix a compilation error for CONFIG_IPV6=n in
    "net: allow gso_max_size to exceed 65536", reported by kernel bots.

v5: Replaced two patches (that were adding new attributes) with patches
    from Alexander Duyck. The idea is to reuse the existing
    gso_max_size/gro_max_size attributes.

v4: Rebased on top of Jakub series (Merge branch 'tso-gso-limit-split')
    max_tso_size is now family independent.

v3: Fixed a typo in RFC number (Alexander)
    Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.

v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
    Addressed feedback from Alexander and the nvidia folks.


Alexander Duyck (2):
  net: allow gso_max_size to exceed 65536
  net: allow gro_max_size to exceed 65536

Coco Li (2):
  ipv6: Add hop-by-hop header to jumbograms in ip6_output
  mlx5: support BIG TCP packets

Eric Dumazet (9):
  net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
  net: limit GSO_MAX_SIZE to 524280 bytes
  tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  ipv6: add struct hop_jumbo_hdr definition
  ipv6/gso: remove temporary HBH/jumbo header
  ipv6/gro: insert temporary HBH/jumbo header
  net: loopback: enable BIG TCP packets
  veth: enable BIG TCP packets
  mlx4: support BIG TCP packets

 drivers/net/ethernet/amd/xgbe/xgbe.h          |  3 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 +
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 drivers/net/ethernet/sfc/ef100_nic.c          |  3 +-
 drivers/net/ethernet/sfc/falcon/tx.c          |  3 +-
 drivers/net/ethernet/sfc/tx_common.c          |  3 +-
 drivers/net/ethernet/synopsys/dwc-xlgmac.h    |  3 +-
 drivers/net/hyperv/rndis_filter.c             |  2 +-
 drivers/net/loopback.c                        |  2 +
 drivers/net/veth.c                            |  1 +
 drivers/scsi/fcoe/fcoe.c                      |  2 +-
 include/linux/ipv6.h                          |  1 +
 include/linux/netdevice.h                     | 15 +++-
 include/net/ipv6.h                            | 44 ++++++++++
 include/uapi/linux/if_link.h                  |  2 +
 net/bpf/test_run.c                            |  2 +-
 net/core/dev.c                                |  9 +-
 net/core/gro.c                                |  8 ++
 net/core/rtnetlink.c                          | 16 ++--
 net/core/sock.c                               | 14 ++++
 net/ipv4/tcp_bbr.c                            |  2 +-
 net/ipv4/tcp_cubic.c                          |  4 +-
 net/ipv4/tcp_output.c                         |  2 +-
 net/ipv6/ip6_offload.c                        | 56 ++++++++++++-
 net/ipv6/ip6_output.c                         | 22 ++++-
 net/sctp/output.c                             |  3 +-
 tools/include/uapi/linux/if_link.h            |  2 +
 30 files changed, 301 insertions(+), 60 deletions(-)

-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report to user-space the device TSO limits.

ip -d link sh dev eth1
...
   tso_max_size 65536 tso_max_segs 65535

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/if_link.h       | 2 ++
 net/core/rtnetlink.c               | 6 ++++++
 tools/include/uapi/linux/if_link.h | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d1e600816b82c2e73c3e0684c66ddf9841a75b04..5f58dcfe2787f308bb2aa5777cca0816dd32bbb9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -368,6 +368,8 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_MAX_SIZE,
+	IFLA_TSO_MAX_SEGS,
 
 	__IFLA_MAX
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6aff02df9ba51c99e8f1dd8e1c1da393c92b8ebf..21b117b710bf2154f11b6511de7d578d0eafb65e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1064,6 +1064,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_MAX_SEGS */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1769,6 +1771,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_MAX_SIZE, dev->tso_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_MAX_SEGS, dev->tso_max_segs) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1922,6 +1926,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_TSO_MAX_SIZE]	= { .type = NLA_REJECT },
+	[IFLA_TSO_MAX_SEGS]	= { .type = NLA_REJECT },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index e1ba2d51b717b7ac7f06e94ac9791cf4c8a5ab6f..b339bf2196ca160ed3040615ae624b9a028562fb 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -348,6 +348,8 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_MAX_SIZE,
+	IFLA_TSO_MAX_SEGS,
 
 	__IFLA_MAX
 };
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 02/13] net: allow gso_max_size to exceed 65536
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes Eric Dumazet
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Alexander Duyck <alexanderduyck@fb.com>

The gso_max_size code was originally added to allow debugging and
working around buggy devices that couldn't support TSO with blocks 64K
in size. The value was limited to 64K because that was the existing
limit of the IPv4 and non-jumbogram IPv6 length fields.

With the addition of Big TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.

So, in order to support this, we need to go through and clean up the
remaining users of gso_max_size so that their values cap at 64K for
non-TCPv6 flows. In addition, the 64K value becomes GSO_LEGACY_MAX_SIZE,
and UINT_MAX becomes the new upper limit for GSO_MAX_SIZE.

v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
               in a new sk_trim_gso_size() helper.
               netif_set_tso_max_size() caps the requested TSO size
               with GSO_MAX_SIZE.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe.h            |  3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c |  2 +-
 drivers/net/ethernet/sfc/ef100_nic.c            |  3 ++-
 drivers/net/ethernet/sfc/falcon/tx.c            |  3 ++-
 drivers/net/ethernet/sfc/tx_common.c            |  3 ++-
 drivers/net/ethernet/synopsys/dwc-xlgmac.h      |  3 ++-
 drivers/net/hyperv/rndis_filter.c               |  2 +-
 drivers/scsi/fcoe/fcoe.c                        |  2 +-
 include/linux/netdevice.h                       |  4 +++-
 net/bpf/test_run.c                              |  2 +-
 net/core/dev.c                                  |  7 ++++---
 net/core/rtnetlink.c                            |  2 +-
 net/core/sock.c                                 | 14 ++++++++++++++
 net/ipv4/tcp_bbr.c                              |  2 +-
 net/ipv4/tcp_output.c                           |  2 +-
 net/sctp/output.c                               |  3 ++-
 16 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h
index 607a2c90513b529ca0383410a3f513d98a75a72f..d9547552ceefe1d291155ab7619a5f2fa6296340 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -151,7 +151,8 @@
 #define XGBE_TX_MAX_BUF_SIZE	(0x3fff & ~(64 - 1))
 
 /* Descriptors required for maximum contiguous TSO/GSO packet */
-#define XGBE_TX_MAX_SPLIT	((GSO_MAX_SIZE / XGBE_TX_MAX_BUF_SIZE) + 1)
+#define XGBE_TX_MAX_SPLIT	\
+	((GSO_LEGACY_MAX_SIZE / XGBE_TX_MAX_BUF_SIZE) + 1)
 
 /* Maximum possible descriptors needed for an SKB:
  * - Maximum number of SKB frags
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index fb11081001a088fcddde68b88bae1da65a3f2c06..838870bc6dbd6e3a3d8c9443ff4675a0e411006b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2038,7 +2038,7 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt)
 {
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 
-	return PAGE_SIZE * nr_frags + data_bcnt <= GSO_MAX_SIZE;
+	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_MAX_SIZE;
 }
 
 static void
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index a69d756e09b9316660aea5a48d07d86af9cd9112..b2536d2c218a6db8acf1e8a5802860639c5e71a6 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -1008,7 +1008,8 @@ static int ef100_process_design_param(struct efx_nic *efx,
 		}
 		return 0;
 	case ESE_EF100_DP_GZ_TSO_MAX_PAYLOAD_LEN:
-		nic_data->tso_max_payload_len = min_t(u64, reader->value, GSO_MAX_SIZE);
+		nic_data->tso_max_payload_len = min_t(u64, reader->value,
+						      GSO_LEGACY_MAX_SIZE);
 		netif_set_tso_max_size(efx->net_dev,
 				       nic_data->tso_max_payload_len);
 		return 0;
diff --git a/drivers/net/ethernet/sfc/falcon/tx.c b/drivers/net/ethernet/sfc/falcon/tx.c
index f7306e93a8b8db9b220c5c3b95dc95c7eaaf2580..b9369483758cd6ebcd263852542175610b4d2789 100644
--- a/drivers/net/ethernet/sfc/falcon/tx.c
+++ b/drivers/net/ethernet/sfc/falcon/tx.c
@@ -98,7 +98,8 @@ unsigned int ef4_tx_max_skb_descs(struct ef4_nic *efx)
 	/* Possibly more for PCIe page boundaries within input fragments */
 	if (PAGE_SIZE > EF4_PAGE_SIZE)
 		max_descs += max_t(unsigned int, MAX_SKB_FRAGS,
-				   DIV_ROUND_UP(GSO_MAX_SIZE, EF4_PAGE_SIZE));
+				   DIV_ROUND_UP(GSO_LEGACY_MAX_SIZE,
+						EF4_PAGE_SIZE));
 
 	return max_descs;
 }
diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c
index 9bc8281b7f5bdd3d95924c6f8294d39202424a27..658ea2d340704d186bb9f94ad24497cbd2d15752 100644
--- a/drivers/net/ethernet/sfc/tx_common.c
+++ b/drivers/net/ethernet/sfc/tx_common.c
@@ -416,7 +416,8 @@ unsigned int efx_tx_max_skb_descs(struct efx_nic *efx)
 	/* Possibly more for PCIe page boundaries within input fragments */
 	if (PAGE_SIZE > EFX_PAGE_SIZE)
 		max_descs += max_t(unsigned int, MAX_SKB_FRAGS,
-				   DIV_ROUND_UP(GSO_MAX_SIZE, EFX_PAGE_SIZE));
+				   DIV_ROUND_UP(GSO_LEGACY_MAX_SIZE,
+						EFX_PAGE_SIZE));
 
 	return max_descs;
 }
diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac.h b/drivers/net/ethernet/synopsys/dwc-xlgmac.h
index 98e3a271e017ae17f23866beab8021d2f2ab26c0..a848e10f3ea457da1b17571df6a35b077a96c794 100644
--- a/drivers/net/ethernet/synopsys/dwc-xlgmac.h
+++ b/drivers/net/ethernet/synopsys/dwc-xlgmac.h
@@ -38,7 +38,8 @@
 #define XLGMAC_RX_DESC_MAX_DIRTY	(XLGMAC_RX_DESC_CNT >> 3)
 
 /* Descriptors required for maximum contiguous TSO/GSO packet */
-#define XLGMAC_TX_MAX_SPLIT	((GSO_MAX_SIZE / XLGMAC_TX_MAX_BUF_SIZE) + 1)
+#define XLGMAC_TX_MAX_SPLIT	\
+	((GSO_LEGACY_MAX_SIZE / XLGMAC_TX_MAX_BUF_SIZE) + 1)
 
 /* Maximum possible descriptors needed for a SKB */
 #define XLGMAC_TX_MAX_DESC_NR	(MAX_SKB_FRAGS + XLGMAC_TX_MAX_SPLIT + 2)
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 866af2cc27a3e0df11812d6ade17dde1d247ff4a..6da36cb8af8055eba338490b6bc7493181e8644c 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1349,7 +1349,7 @@ static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device,
 	struct net_device_context *net_device_ctx = netdev_priv(net);
 	struct ndis_offload hwcaps;
 	struct ndis_offload_params offloads;
-	unsigned int gso_max_size = GSO_MAX_SIZE;
+	unsigned int gso_max_size = GSO_LEGACY_MAX_SIZE;
 	int ret;
 
 	/* Find HW offload capabilities */
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 44ca6110213caaf7222c8b69c6c3fc2a08687495..79b2827e4081b4015fc51ace4e1467214c45fd48 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -667,7 +667,7 @@ static void fcoe_netdev_features_change(struct fc_lport *lport,
 
 	if (netdev->features & NETIF_F_FSO) {
 		lport->seq_offload = 1;
-		lport->lso_max = netdev->gso_max_size;
+		lport->lso_max = min(netdev->gso_max_size, GSO_LEGACY_MAX_SIZE);
 		FCOE_NETDEV_DBG(netdev, "Supports LSO for max len 0x%x\n",
 				lport->lso_max);
 	} else {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74c97a34921d48c593c08e2bed72e099f42520a3..6150e3a7ce9dc743129d3f4f240329dd688b49a4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2262,7 +2262,9 @@ struct net_device {
 	const struct rtnl_link_ops *rtnl_link_ops;
 
 	/* for setting kernel sock attribute on TCP connection setup */
-#define GSO_MAX_SIZE		65536
+#define GSO_LEGACY_MAX_SIZE	65536u
+#define GSO_MAX_SIZE		UINT_MAX
+
 	unsigned int		gso_max_size;
 #define TSO_LEGACY_MAX_SIZE	65536
 #define TSO_MAX_SIZE		UINT_MAX
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 8d54fef9a568a189d14253bcf01e3d586e746084..9b5a1f630bb0dbfe577c0f2a63094cb5872ade1d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -1001,7 +1001,7 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
 		cb->pkt_len = skb->len;
 	} else {
 		if (__skb->wire_len < skb->len ||
-		    __skb->wire_len > GSO_MAX_SIZE)
+		    __skb->wire_len > GSO_LEGACY_MAX_SIZE)
 			return -EINVAL;
 		cb->pkt_len = __skb->wire_len;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index f036ccb61da4da3ffc52c4f2402427054b831e8a..68b76b45a5ac5f2ea705bd3db5d1732b79034609 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2998,11 +2998,12 @@ EXPORT_SYMBOL(netif_set_real_num_queues);
  * @size:	max skb->len of a TSO frame
  *
  * Set the limit on the size of TSO super-frames the device can handle.
- * Unless explicitly set the stack will assume the value of %GSO_MAX_SIZE.
+ * Unless explicitly set the stack will assume the value of
+ * %GSO_LEGACY_MAX_SIZE.
  */
 void netif_set_tso_max_size(struct net_device *dev, unsigned int size)
 {
-	dev->tso_max_size = size;
+	dev->tso_max_size = min(GSO_MAX_SIZE, size);
 	if (size < READ_ONCE(dev->gso_max_size))
 		netif_set_gso_max_size(dev, size);
 }
@@ -10602,7 +10603,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
-	dev->gso_max_size = GSO_MAX_SIZE;
+	dev->gso_max_size = GSO_LEGACY_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
 	dev->tso_max_size = TSO_LEGACY_MAX_SIZE;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 21b117b710bf2154f11b6511de7d578d0eafb65e..823db8999a2c1d5959042393783492dbecf1352c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2809,7 +2809,7 @@ static int do_setlink(const struct sk_buff *skb,
 	if (tb[IFLA_GSO_MAX_SIZE]) {
 		u32 max_size = nla_get_u32(tb[IFLA_GSO_MAX_SIZE]);
 
-		if (max_size > GSO_MAX_SIZE || max_size > dev->tso_max_size) {
+		if (max_size > dev->tso_max_size) {
 			err = -EINVAL;
 			goto errout;
 		}
diff --git a/net/core/sock.c b/net/core/sock.c
index 6b287eb5427b32865d25fc22122fefeff3a4ccf5..24a46a1e4f282ada9370a1ecae66e29fcc832085 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2293,6 +2293,19 @@ void sk_free_unlock_clone(struct sock *sk)
 }
 EXPORT_SYMBOL_GPL(sk_free_unlock_clone);
 
+static void sk_trim_gso_size(struct sock *sk)
+{
+	if (sk->sk_gso_max_size <= GSO_LEGACY_MAX_SIZE)
+		return;
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6 &&
+	    sk_is_tcp(sk) &&
+	    !ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
+		return;
+#endif
+	sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
+}
+
 void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 {
 	u32 max_segs = 1;
@@ -2312,6 +2325,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
 			sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
+			sk_trim_gso_size(sk);
 			sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
 			max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index c7d30a3bbd81d27e16e800ec446569b93a4123ba..075e744bfb4829c087f4a85448e2f778dba439b4 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -310,7 +310,7 @@ static u32 bbr_tso_segs_goal(struct sock *sk)
 	 */
 	bytes = min_t(unsigned long,
 		      sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift),
-		      GSO_MAX_SIZE - 1 - MAX_TCP_HEADER);
+		      GSO_LEGACY_MAX_SIZE - 1 - MAX_TCP_HEADER);
 	segs = max_t(u32, bytes / tp->mss_cache, bbr_min_tso_segs(sk));
 
 	return min(segs, 0x7FU);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b092228e434261f45f79cc6c1fad613e0bb045c0..b4b2284ed4a2c9e2569bd945e3b4e023c5502f25 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1553,7 +1553,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	 * SO_SNDBUF values.
 	 * Also allow first and last skb in retransmit queue to be split.
 	 */
-	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_LEGACY_MAX_SIZE);
 	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
 		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE &&
 		     skb != tcp_rtx_queue_head(sk) &&
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 72fe6669c50de2c76842cf50d039b65a61943bd8..a63df055ac57d551e89edfb3a4982768a318cf67 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -134,7 +134,8 @@ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
 		dst_hold(tp->dst);
 		sk_setup_caps(sk, tp->dst);
 	}
-	packet->max_size = sk_can_gso(sk) ? READ_ONCE(tp->dst->dev->gso_max_size)
+	packet->max_size = sk_can_gso(sk) ? min(READ_ONCE(tp->dst->dev->gso_max_size),
+						GSO_LEGACY_MAX_SIZE)
 					  : asoc->pathmtu;
 	rcu_read_unlock();
 }
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Make sure we will not overflow shinfo->gso_segs.

The minimal TCP MSS is 8 bytes, and shinfo->gso_segs
is a 16-bit field.

TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h;
it seems cleaner not to bring TCP details into include/linux/netdevice.h.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6150e3a7ce9dc743129d3f4f240329dd688b49a4..673c444aae874428b117df45dffcaf702ac72a47 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2262,14 +2262,17 @@ struct net_device {
 	const struct rtnl_link_ops *rtnl_link_ops;
 
 	/* for setting kernel sock attribute on TCP connection setup */
+#define GSO_MAX_SEGS		65535u
 #define GSO_LEGACY_MAX_SIZE	65536u
-#define GSO_MAX_SIZE		UINT_MAX
+/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
+ * and shinfo->gso_segs is a 16bit field.
+ */
+#define GSO_MAX_SIZE		(8 * GSO_MAX_SEGS)
 
 	unsigned int		gso_max_size;
 #define TSO_LEGACY_MAX_SIZE	65536
 #define TSO_MAX_SIZE		UINT_MAX
 	unsigned int		tso_max_size;
-#define GSO_MAX_SEGS		65535
 	u16			gso_max_segs;
 #define TSO_MAX_SEGS		U16_MAX
 	u16			tso_max_segs;
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

hystart_ack_delay() assumed that a TSO packet
would not be bigger than GSO_MAX_SIZE.

This will no longer be true.

We should use sk->sk_gso_max_size instead.

This reduces the chances of spurious Hystart ACK train detections.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_cubic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index b0918839bee7cf0264ec3bbcdfc1417daa86d197..68178e7280ce24c26a48e48a51518d759e4d1718 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -372,7 +372,7 @@ static void cubictcp_state(struct sock *sk, u8 new_state)
  * We apply another 100% factor because @rate is doubled at this point.
  * We cap the cushion to 1ms.
  */
-static u32 hystart_ack_delay(struct sock *sk)
+static u32 hystart_ack_delay(const struct sock *sk)
 {
 	unsigned long rate;
 
@@ -380,7 +380,7 @@ static u32 hystart_ack_delay(struct sock *sk)
 	if (!rate)
 		return 0;
 	return min_t(u64, USEC_PER_MSEC,
-		     div64_ul((u64)GSO_MAX_SIZE * 4 * USEC_PER_SEC, rate));
+		     div64_ul((u64)sk->sk_gso_max_size * 4 * USEC_PER_SEC, rate));
 }
 
 static void hystart_update(struct sock *sk, u32 delay)
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (3 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patches will need to add and remove local IPv6 jumbogram
options to enable BIG TCP.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 213612f1680c7c39f4c07f0c05b4e6cf34a7878e..63d019953c47ea03d3b723a58c25e83c249489a9 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -151,6 +151,17 @@ struct frag_hdr {
 	__be32	identification;
 };
 
+/*
+ * Jumbo payload option, as described in RFC 2675 2.
+ */
+struct hop_jumbo_hdr {
+	u8	nexthdr;
+	u8	hdrlen;
+	u8	tlv_type;	/* IPV6_TLV_JUMBO, 0xC2 */
+	u8	tlv_len;	/* 4 */
+	__be32	jumbo_payload_len;
+};
+
 #define	IP6_MF		0x0001
 #define	IP6_OFFSET	0xFFF8
 
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (4 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 07/13] ipv6/gro: insert " Eric Dumazet
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The IPv6 TCP and GRO stacks will soon be able to build big TCP packets,
with an added temporary Hop-by-Hop header.

If GSO is involved for these large packets, we need to remove
the temporary HBH header before segmentation happens.

v2: perform HBH removal from ipv6_gso_segment() instead of
    skb_segment() (Alexander feedback)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h     | 33 +++++++++++++++++++++++++++++++++
 net/ipv6/ip6_offload.c | 24 +++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 63d019953c47ea03d3b723a58c25e83c249489a9..b6df0314aa02dd1c4094620145ccb24da7195b2b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -467,6 +467,39 @@ bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb,
 struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
 					   struct ipv6_txoptions *opt);
 
+/* This helper is specialized for BIG TCP needs.
+ * It assumes the hop_jumbo_hdr will immediately follow the IPV6 header.
+ * It assumes headers are already in skb->head.
+ * Returns 0, or IPPROTO_TCP if a BIG TCP packet is there.
+ */
+static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
+{
+	const struct hop_jumbo_hdr *jhdr;
+	const struct ipv6hdr *nhdr;
+
+	if (likely(skb->len <= GRO_MAX_SIZE))
+		return 0;
+
+	if (skb->protocol != htons(ETH_P_IPV6))
+		return 0;
+
+	if (skb_network_offset(skb) +
+	    sizeof(struct ipv6hdr) +
+	    sizeof(struct hop_jumbo_hdr) > skb_headlen(skb))
+		return 0;
+
+	nhdr = ipv6_hdr(skb);
+
+	if (nhdr->nexthdr != NEXTHDR_HOP)
+		return 0;
+
+	jhdr = (const struct hop_jumbo_hdr *) (nhdr + 1);
+	if (jhdr->tlv_type != IPV6_TLV_JUMBO || jhdr->hdrlen != 0 ||
+	    jhdr->nexthdr != IPPROTO_TCP)
+		return 0;
+	return jhdr->nexthdr;
+}
+
 static inline bool ipv6_accept_ra(struct inet6_dev *idev)
 {
 	/* If forwarding is enabled, RA are not accepted unless the special
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index c4fc03c1ac99dbecd92e2b47b2db65374197434d..a6a6c1539c28d242ef8c35fcd5ce900512ce912d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -77,7 +77,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct ipv6hdr *ipv6h;
 	const struct net_offload *ops;
-	int proto;
+	int proto, nexthdr;
 	struct frag_hdr *fptr;
 	unsigned int payload_len;
 	u8 *prevhdr;
@@ -87,6 +87,28 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	bool gso_partial;
 
 	skb_reset_network_header(skb);
+	nexthdr = ipv6_has_hopopt_jumbo(skb);
+	if (nexthdr) {
+		const int hophdr_len = sizeof(struct hop_jumbo_hdr);
+		int err;
+
+		err = skb_cow_head(skb, 0);
+		if (err < 0)
+			return ERR_PTR(err);
+
+		/* remove the HBH header.
+		 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+		 */
+		memmove(skb_mac_header(skb) + hophdr_len,
+			skb_mac_header(skb),
+			ETH_HLEN + sizeof(struct ipv6hdr));
+		skb->data += hophdr_len;
+		skb->len -= hophdr_len;
+		skb->network_header += hophdr_len;
+		skb->mac_header += hophdr_len;
+		ipv6h = (struct ipv6hdr *)skb->data;
+		ipv6h->nexthdr = nexthdr;
+	}
 	nhoff = skb_network_header(skb) - skb_mac_header(skb);
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v6 net-next 07/13] ipv6/gro: insert temporary HBH/jumbo header
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (5 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 08/13] net: allow gro_max_size to exceed 65536 Eric Dumazet
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patch will add GRO_IPV6_MAX_SIZE, allowing GRO to build
BIG TCP IPv6 packets (bigger than 64KB).

This patch changes ipv6_gro_complete() to insert an HBH/jumbo header
so that the resulting packet can go through the IPv6/TCP stacks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_offload.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a6a6c1539c28d242ef8c35fcd5ce900512ce912d..d12dba2dd5354dbb79bb80df4038dec2544cddeb 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -342,15 +342,43 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
-	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+	struct ipv6hdr *iph;
 	int err = -ENOSYS;
+	u32 payload_len;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
 		skb_set_inner_network_header(skb, nhoff);
 	}
 
-	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
+	payload_len = skb->len - nhoff - sizeof(*iph);
+	if (unlikely(payload_len > IPV6_MAXPLEN)) {
+		struct hop_jumbo_hdr *hop_jumbo;
+		int hoplen = sizeof(*hop_jumbo);
+
+		/* Move network header left */
+		memmove(skb_mac_header(skb) - hoplen, skb_mac_header(skb),
+			skb->transport_header - skb->mac_header);
+		skb->data -= hoplen;
+		skb->len += hoplen;
+		skb->mac_header -= hoplen;
+		skb->network_header -= hoplen;
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		hop_jumbo = (struct hop_jumbo_hdr *)(iph + 1);
+
+		/* Build hop-by-hop options */
+		hop_jumbo->nexthdr = iph->nexthdr;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(payload_len + hoplen);
+
+		iph->nexthdr = NEXTHDR_HOP;
+		iph->payload_len = 0;
+	} else {
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		iph->payload_len = htons(payload_len);
+	}
 
 	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 08/13] net: allow gro_max_size to exceed 65536
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (6 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 07/13] ipv6/gro: insert " Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Alexander Duyck <alexanderduyck@fb.com>

Allow gro_max_size to exceed 65536.

There weren't really any external limitations preventing this other
than the fact that IPv4 only supports a 16-bit length field. Since we
have the option of adding a hop-by-hop header for IPv6, we can allow
IPv6 to exceed this value, while IPv4 and non-TCP flows are capped at
65536 via a constant rather than relying on gro_max_size.

[edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 include/linux/netdevice.h                       | 6 +++++-
 include/net/ipv6.h                              | 2 +-
 net/core/dev.c                                  | 2 +-
 net/core/gro.c                                  | 8 ++++++++
 net/core/rtnetlink.c                            | 8 --------
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 838870bc6dbd6e3a3d8c9443ff4675a0e411006b..24de37b79f5a917b304c011fcebcd09748ee5c6a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2038,7 +2038,7 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt)
 {
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 
-	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_MAX_SIZE;
+	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_LEGACY_MAX_SIZE;
 }
 
 static void
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 673c444aae874428b117df45dffcaf702ac72a47..69743188f639c5afd20f06a5e301edca00aedaef 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2151,7 +2151,11 @@ struct net_device {
 	struct bpf_prog __rcu	*xdp_prog;
 	unsigned long		gro_flush_timeout;
 	int			napi_defer_hard_irqs;
-#define GRO_MAX_SIZE		65536
+#define GRO_LEGACY_MAX_SIZE	65536u
+/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
+ * and shinfo->gso_segs is a 16bit field.
+ */
+#define GRO_MAX_SIZE		(8 * 65535u)
 	unsigned int		gro_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b6df0314aa02dd1c4094620145ccb24da7195b2b..5b38bf1a586b9da55f43db30d140d364a70f6c11 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -477,7 +477,7 @@ static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
 	const struct hop_jumbo_hdr *jhdr;
 	const struct ipv6hdr *nhdr;
 
-	if (likely(skb->len <= GRO_MAX_SIZE))
+	if (likely(skb->len <= GRO_LEGACY_MAX_SIZE))
 		return 0;
 
 	if (skb->protocol != htons(ETH_P_IPV6))
diff --git a/net/core/dev.c b/net/core/dev.c
index 68b76b45a5ac5f2ea705bd3db5d1732b79034609..4be3695846520af18a687cdcaa70c5f327ba94e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10605,7 +10605,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 
 	dev->gso_max_size = GSO_LEGACY_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
-	dev->gro_max_size = GRO_MAX_SIZE;
+	dev->gro_max_size = GRO_LEGACY_MAX_SIZE;
 	dev->tso_max_size = TSO_LEGACY_MAX_SIZE;
 	dev->tso_max_segs = TSO_MAX_SEGS;
 	dev->upper_level = 1;
diff --git a/net/core/gro.c b/net/core/gro.c
index 78110edf5d4b36d2fa6f8a2676096efe0112aa0e..b4190eb084672fb4f2be8b437eccb4e8507ff63f 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -167,6 +167,14 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	if (unlikely(p->len + len >= gro_max_size || NAPI_GRO_CB(skb)->flush))
 		return -E2BIG;
 
+	if (unlikely(p->len + len >= GRO_LEGACY_MAX_SIZE)) {
+		if (p->protocol != htons(ETH_P_IPV6) ||
+		    skb_headroom(p) < sizeof(struct hop_jumbo_hdr) ||
+		    ipv6_hdr(p)->nexthdr != IPPROTO_TCP ||
+		    p->encapsulation)
+			return -E2BIG;
+	}
+
 	lp = NAPI_GRO_CB(p)->last;
 	pinfo = skb_shinfo(lp);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 823db8999a2c1d5959042393783492dbecf1352c..5d7d7fe1e63a972bbcbd5eed1404b2643c74cfcb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2347,14 +2347,6 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[],
 		}
 	}
 
-	if (tb[IFLA_GRO_MAX_SIZE]) {
-		u32 gro_max_size = nla_get_u32(tb[IFLA_GRO_MAX_SIZE]);
-
-		if (gro_max_size > GRO_MAX_SIZE) {
-			NL_SET_ERR_MSG(extack, "too big gro_max_size");
-			return -EINVAL;
-		}
-	}
 	return 0;
 }
 
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (7 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 08/13] net: allow gro_max_size to exceed 65536 Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 10/13] net: loopback: enable BIG TCP packets Eric Dumazet
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Instead of simply forcing a 0 payload_len in the IPv6 header,
implement RFC 2675 and insert a custom extension header.

Note that only the TCP stack is currently potentially generating
jumbograms, and that this extension header is purely local;
it won't be sent on a physical link.

This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/ipv6.h  |  1 +
 net/ipv6/ip6_output.c | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ec5ca392eaa31e83a022b1124fae6b607ba168cd..38c8203d52cbf39e523c43fe630a7b184b9991aa 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -145,6 +145,7 @@ struct inet6_skb_parm {
 #define IP6SKB_L3SLAVE         64
 #define IP6SKB_JUMBOGRAM      128
 #define IP6SKB_SEG6	      256
+#define IP6SKB_FAKEJUMBO      512
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index afa5bd4ad167c4a40878f33773d43be85e89c32f..4081b12a01ff22ecf94a6490aef0665808407a6e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -182,7 +182,9 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
 #endif
 
 	mtu = ip6_skb_dst_mtu(skb);
-	if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
+	if (skb_is_gso(skb) &&
+	    !(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
+	    !skb_gso_validate_network_len(skb, mtu))
 		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
 
 	if ((skb->len > mtu && !skb_is_gso(skb)) ||
@@ -252,6 +254,8 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct inet6_dev *idev = ip6_dst_idev(dst);
+	struct hop_jumbo_hdr *hop_jumbo;
+	int hoplen = sizeof(*hop_jumbo);
 	unsigned int head_room;
 	struct ipv6hdr *hdr;
 	u8  proto = fl6->flowi6_proto;
@@ -259,7 +263,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	int hlimit = -1;
 	u32 mtu;
 
-	head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dev);
+	head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev);
 	if (opt)
 		head_room += opt->opt_nflen + opt->opt_flen;
 
@@ -282,6 +286,20 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 					     &fl6->saddr);
 	}
 
+	if (unlikely(seg_len > IPV6_MAXPLEN)) {
+		hop_jumbo = skb_push(skb, hoplen);
+
+		hop_jumbo->nexthdr = proto;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen);
+
+		proto = IPPROTO_HOPOPTS;
+		seg_len = 0;
+		IP6CB(skb)->flags |= IP6SKB_FAKEJUMBO;
+	}
+
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	hdr = ipv6_hdr(skb);
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 10/13] net: loopback: enable BIG TCP packets
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (8 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 11/13] veth: " Eric Dumazet
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Set the driver limit to GSO_MAX_SIZE (512 KB).

This allows the admin/user to set a GSO limit up to this value.

Tested:

ip link set dev lo gso_max_size 200000
netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &

tcpdump shows:

18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/loopback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 720394c0639b20a2fd6262e4ee9d5813c02802f1..14e8d04cb4347cb7b9171d576156fb8e8ecebbe3 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -191,6 +191,8 @@ static void gen_lo_setup(struct net_device *dev,
 	dev->netdev_ops		= dev_ops;
 	dev->needs_free_netdev	= true;
 	dev->priv_destructor	= dev_destructor;
+
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
 }
 
 /* The loopback device is special. There is only one instance
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 11/13] veth: enable BIG TCP packets
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (9 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 10/13] net: loopback: enable BIG TCP packets Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 12/13] mlx4: support " Eric Dumazet
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Set the TSO driver limit to GSO_MAX_SIZE (512 KB).

This allows the admin/user to set a GSO limit up to this value.

ip link set dev veth10 gso_max_size 200000

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/veth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f474e79a774580e4cb67da44b5f0c796c3ce8abb..466da01ba2e3e97ba9eb16586b6d5d9f092b3d76 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1647,6 +1647,7 @@ static void veth_setup(struct net_device *dev)
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
 	dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE;
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
 }
 
 /*
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 12/13] mlx4: support BIG TCP packets
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (10 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 11/13] veth: " Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-10  3:32 ` [PATCH v6 net-next 13/13] mlx5: " Eric Dumazet
  2022-05-10 19:49 ` [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Alexander H Duyck
  13 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet,
	Tariq Toukan

From: Eric Dumazet <edumazet@google.com>

mlx4 supports LSOv2 just fine.

The IPv6 stack inserts a temporary Hop-by-Hop header
with a JUMBO TLV for big packets.

We need to ignore the HBH header when populating the TX descriptor.

Tested:

Before: (not enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_max_size 65536 gro_max_size 65536

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   6591.45  0.86   1.34   62.490  97.446
262144 540000

After: (enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_max_size 185000 gro_max_size 185000

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   8383.95  0.95   1.01   54.432  57.584
262144 540000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 ++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++++++++----
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c61dc7ae0c056a4dbcf24297549f6b1b5cc25d92..ca4b93a0103469b9629dad2f877a496c23fd727c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3417,6 +3417,9 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	dev->min_mtu = ETH_MIN_MTU;
 	dev->max_mtu = priv->max_mtu;
 
+	/* supports LSOv2 packets. */
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
+
 	mdev->pndev[port] = dev;
 	mdev->upper[port] = NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index f777151d226fb601f52366850f8c86358e214032..af3b2b59a2a6940a2839b277815ec7c3b4af1008 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -43,6 +43,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/indirect_call_wrapper.h>
+#include <net/ipv6.h>
 
 #include "mlx4_en.h"
 
@@ -634,19 +635,28 @@ static int get_real_size(const struct sk_buff *skb,
 			 struct net_device *dev,
 			 int *lso_header_size,
 			 bool *inline_ok,
-			 void **pfrag)
+			 void **pfrag,
+			 int *hopbyhop)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int real_size;
 
 	if (shinfo->gso_size) {
 		*inline_ok = false;
-		if (skb->encapsulation)
+		*hopbyhop = 0;
+		if (skb->encapsulation) {
 			*lso_header_size = (skb_inner_transport_header(skb) - skb->data) + inner_tcp_hdrlen(skb);
-		else
+		} else {
+			/* Detects large IPV6 TCP packets and prepares for removal of
+			 * HBH header that has been pushed by ip6_xmit(),
+			 * mainly so that tcpdump can dissect them.
+			 */
+			if (ipv6_has_hopopt_jumbo(skb))
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
 			*lso_header_size = skb_transport_offset(skb) + tcp_hdrlen(skb);
+		}
 		real_size = CTRL_SIZE + shinfo->nr_frags * DS_SIZE +
-			ALIGN(*lso_header_size + 4, DS_SIZE);
+			ALIGN(*lso_header_size - *hopbyhop + 4, DS_SIZE);
 		if (unlikely(*lso_header_size != skb_headlen(skb))) {
 			/* We add a segment for the skb linear buffer only if
 			 * it contains data */
@@ -873,6 +883,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	int desc_size;
 	int real_size;
 	u32 index, bf_index;
+	struct ipv6hdr *h6;
 	__be32 op_own;
 	int lso_header_size;
 	void *fragptr = NULL;
@@ -881,6 +892,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool stop_queue;
 	bool inline_ok;
 	u8 data_offset;
+	int hopbyhop;
 	bool bf_ok;
 
 	tx_ind = skb_get_queue_mapping(skb);
@@ -890,7 +902,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto tx_drop;
 
 	real_size = get_real_size(skb, shinfo, dev, &lso_header_size,
-				  &inline_ok, &fragptr);
+				  &inline_ok, &fragptr, &hopbyhop);
 	if (unlikely(!real_size))
 		goto tx_drop_count;
 
@@ -943,7 +955,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		data = &tx_desc->data;
 		data_offset = offsetof(struct mlx4_en_tx_desc, data);
 	} else {
-		int lso_align = ALIGN(lso_header_size + 4, DS_SIZE);
+		int lso_align = ALIGN(lso_header_size - hopbyhop + 4, DS_SIZE);
 
 		data = (void *)&tx_desc->lso + lso_align;
 		data_offset = offsetof(struct mlx4_en_tx_desc, lso) + lso_align;
@@ -1008,14 +1020,31 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			((ring->prod & ring->size) ?
 				cpu_to_be32(MLX4_EN_BIT_DESC_OWN) : 0);
 
+		lso_header_size -= hopbyhop;
 		/* Fill in the LSO prefix */
 		tx_desc->lso.mss_hdr_size = cpu_to_be32(
 			shinfo->gso_size << 16 | lso_header_size);
 
-		/* Copy headers;
-		 * note that we already verified that it is linear */
-		memcpy(tx_desc->lso.header, skb->data, lso_header_size);
 
+		if (unlikely(hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(tx_desc->lso.header, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)tx_desc->lso.header + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			/* Copy headers;
+			 * note that we already verified that it is linear
+			 */
+			memcpy(tx_desc->lso.header, skb->data, lso_header_size);
+		}
 		ring->tso_packets++;
 
 		i = shinfo->gso_segs;
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (11 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 12/13] mlx4: support " Eric Dumazet
@ 2022-05-10  3:32 ` Eric Dumazet
  2022-05-12  8:40   ` Saeed Mahameed
  2022-05-10 19:49 ` [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Alexander H Duyck
  13 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2022-05-10  3:32 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky

From: Coco Li <lixiaoyan@google.com>

mlx5 supports LSOv2.

The IPv6 GRO/TCP stacks insert a temporary Hop-by-Hop header
with a JUMBO TLV for big packets.

We need to ignore/skip this HBH header when populating the TX descriptor.

Note that ipv6_has_hopopt_jumbo() only recognizes a very specific
packet layout; thus mlx5e_sq_xmit_wqe() takes care of this layout only.

v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 2 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d27986869b8ba070d1a4f8bcdc7e14ab54ae984e..bf3bca79e160124abd128ac1e9910cb2f39a39ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4920,6 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
+	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
 	mlx5e_set_netdev_dev_addr(netdev);
 	mlx5e_ipsec_build_netdev(priv);
 	mlx5e_ktls_build_netdev(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2dc48406cd08d21ff94f665cd61ab9227f351215..b4fc45ba1b347fb9ad0f46b9c091cc45e4d3d84f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -40,6 +40,7 @@
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec_rxtx.h"
 #include "en/ptp.h"
+#include <net/ipv6.h>
 
 static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma)
 {
@@ -130,23 +131,32 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		sq->stats->csum_none++;
 }
 
+/* Returns the number of header bytes that we plan
+ * to inline later in the transmit descriptor
+ */
 static inline u16
-mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb)
+mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb, int *hopbyhop)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
 	u16 ihs;
 
+	*hopbyhop = 0;
 	if (skb->encapsulation) {
 		ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
 		stats->tso_inner_packets++;
 		stats->tso_inner_bytes += skb->len - ihs;
 	} else {
-		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
 			ihs = skb_transport_offset(skb) + sizeof(struct udphdr);
-		else
+		} else {
 			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
+			if (ipv6_has_hopopt_jumbo(skb)) {
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
+				ihs -= sizeof(struct hop_jumbo_hdr);
+			}
+		}
 		stats->tso_packets++;
-		stats->tso_bytes += skb->len - ihs;
+		stats->tso_bytes += skb->len - ihs - *hopbyhop;
 	}
 
 	return ihs;
@@ -208,6 +218,7 @@ struct mlx5e_tx_attr {
 	__be16 mss;
 	u16 insz;
 	u8 opcode;
+	u8 hopbyhop;
 };
 
 struct mlx5e_tx_wqe_attr {
@@ -244,14 +255,16 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5e_sq_stats *stats = sq->stats;
 
 	if (skb_is_gso(skb)) {
-		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb);
+		int hopbyhop;
+		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
 
 		*attr = (struct mlx5e_tx_attr) {
 			.opcode    = MLX5_OPCODE_LSO,
 			.mss       = cpu_to_be16(skb_shinfo(skb)->gso_size),
 			.ihs       = ihs,
 			.num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs,
-			.headlen   = skb_headlen(skb) - ihs,
+			.headlen   = skb_headlen(skb) - ihs - hopbyhop,
+			.hopbyhop  = hopbyhop,
 		};
 
 		stats->packets += skb_shinfo(skb)->gso_segs;
@@ -365,7 +378,8 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5_wqe_eth_seg  *eseg;
 	struct mlx5_wqe_data_seg *dseg;
 	struct mlx5e_tx_wqe_info *wi;
-
+	u16 ihs = attr->ihs;
+	struct ipv6hdr *h6;
 	struct mlx5e_sq_stats *stats = sq->stats;
 	int num_dma;
 
@@ -379,15 +393,36 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	eseg->mss = attr->mss;
 
-	if (attr->ihs) {
-		if (skb_vlan_tag_present(skb)) {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs + VLAN_HLEN);
-			mlx5e_insert_vlan(eseg->inline_hdr.start, skb, attr->ihs);
+	if (ihs) {
+		u8 *start = eseg->inline_hdr.start;
+
+		if (unlikely(attr->hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			if (skb_vlan_tag_present(skb)) {
+				mlx5e_insert_vlan(start, skb, ETH_HLEN + sizeof(*h6));
+				ihs += VLAN_HLEN;
+				h6 = (struct ipv6hdr *)(start + sizeof(struct vlan_ethhdr));
+			} else {
+				memcpy(start, skb->data, ETH_HLEN + sizeof(*h6));
+				h6 = (struct ipv6hdr *)(start + ETH_HLEN);
+			}
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else if (skb_vlan_tag_present(skb)) {
+			mlx5e_insert_vlan(start, skb, ihs);
+			ihs += VLAN_HLEN;
 			stats->added_vlan_packets++;
 		} else {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs);
-			memcpy(eseg->inline_hdr.start, skb->data, attr->ihs);
+			memcpy(start, skb->data, ihs);
 		}
+		eseg->inline_hdr.sz |= cpu_to_be16(ihs);
 		dseg += wqe_attr->ds_cnt_inl;
 	} else if (skb_vlan_tag_present(skb)) {
 		eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN);
@@ -398,7 +433,7 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	}
 
 	dseg += wqe_attr->ds_cnt_ids;
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs + attr->hopbyhop,
 					  attr->headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
@@ -918,12 +953,29 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	eseg->mss = attr.mss;
 
 	if (attr.ihs) {
-		memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		if (unlikely(attr.hopbyhop)) {
+			struct ipv6hdr *h6;
+
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(eseg->inline_hdr.start, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)eseg->inline_hdr.start + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		}
 		eseg->inline_hdr.sz = cpu_to_be16(attr.ihs);
 		dseg += wqe_attr.ds_cnt_inl;
 	}
 
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs + attr.hopbyhop,
 					  attr.headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v6 net-next 00/13] tcp: BIG TCP implementation
  2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (12 preceding siblings ...)
  2022-05-10  3:32 ` [PATCH v6 net-next 13/13] mlx5: " Eric Dumazet
@ 2022-05-10 19:49 ` Alexander H Duyck
  13 siblings, 0 replies; 23+ messages in thread
From: Alexander H Duyck @ 2022-05-10 19:49 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet

On Mon, 2022-05-09 at 20:32 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> This series implements BIG TCP as presented in netdev 0x15:
> 
> https://netdevconf.info/0x15/session.html?BIG-TCP
> 
> Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> 
> Standard TSO/GRO packet limit is 64KB
> 
> With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> 
> Note that this feature is by default not enabled, because it might
> break some eBPF programs assuming TCP header immediately follows IPv6 header.
> 
> While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> are unable to skip over IPv6 extension headers.
> 
> Reducing number of packets traversing networking stack usually improves
> performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> 
> 'Standard' performance with current (74KB) limits.
> for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> 77           138          183          8542.19    
> 79           143          178          8215.28    
> 70           117          164          9543.39    
> 80           144          176          8183.71    
> 78           126          155          9108.47    
> 80           146          184          8115.19    
> 71           113          165          9510.96    
> 74           113          164          9518.74    
> 79           137          178          8575.04    
> 73           111          171          9561.73    
> 
> Now enable BIG TCP on both hosts.
> 
> ip link set dev eth0 gro_max_size 185000 gso_max_size 185000
> for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> 57           83           117          13871.38   
> 64           118          155          11432.94   
> 65           116          148          11507.62   
> 60           105          136          12645.15   
> 60           103          135          12760.34   
> 60           102          134          12832.64   
> 62           109          132          10877.68   
> 58           82           115          14052.93   
> 57           83           124          14212.58   
> 57           82           119          14196.01   
> 
> We see an increase of transactions per second, and lower latencies as well.
> 
> v6: fix a compilation error for CONFIG_IPV6=n in
>     "net: allow gso_max_size to exceed 65536", reported by kernel bots.
> 
> v5: Replaced two patches (that were adding new attributes) with patches
>     from Alexander Duyck. Idea is to reuse existing gso_max_size/gro_max_size
> 
> v4: Rebased on top of Jakub series (Merge branch 'tso-gso-limit-split')
>     max_tso_size is now family independent.
> 
> v3: Fixed a typo in RFC number (Alexander)
>     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> 
> v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
>     Addressed feedback, for Alexander and nvidia folks.
> 
> 
> Alexander Duyck (2):
>   net: allow gso_max_size to exceed 65536
>   net: allow gro_max_size to exceed 65536
> 
> Coco Li (2):
>   ipv6: Add hop-by-hop header to jumbograms in ip6_output
>   mlx5: support BIG TCP packets
> 
> Eric Dumazet (9):
>   net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
>   net: limit GSO_MAX_SIZE to 524280 bytes
>   tcp_cubic: make hystart_ack_delay() aware of BIG TCP
>   ipv6: add struct hop_jumbo_hdr definition
>   ipv6/gso: remove temporary HBH/jumbo header
>   ipv6/gro: insert temporary HBH/jumbo header
>   net: loopback: enable BIG TCP packets
>   veth: enable BIG TCP packets
>   mlx4: support BIG TCP packets

Looked over the changes to my patches and they all look good (sorry for
not catching that myself). This approach addresses all the concerns I
had.

For the series:
Acked-by: Alexander Duyck <alexanderduyck@fb.com>



* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-10  3:32 ` [PATCH v6 net-next 13/13] mlx5: " Eric Dumazet
@ 2022-05-12  8:40   ` Saeed Mahameed
  2022-05-12  9:02     ` Paolo Abeni
  0 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2022-05-12  8:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Eric Dumazet, Tariq Toukan,
	Leon Romanovsky

On 09 May 20:32, Eric Dumazet wrote:
>From: Coco Li <lixiaoyan@google.com>
>
>mlx5 supports LSOv2.
>
>IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
>with JUMBO TLV for big packets.
>
>We need to ignore/skip this HBH header when populating TX descriptor.
>

Sorry I didn't go through all the documentation or previous discussions,
please bear with me: why not clear the HBH header just before calling the
driver xmit ndo?

Or, if the HBH header has to stay, why not provide some helpers to the
drivers, to make this less intrusive?

mlx5_xmit_skb() {
    skb_remove_hbh(skb);
    populate tx descriptor as usual;
    skb_restore_hbh(skb); //must be before doorbell
    ring doorbell
}

>Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
>layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
>
>v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
>v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
>
>Signed-off-by: Coco Li <lixiaoyan@google.com>
>Signed-off-by: Eric Dumazet <edumazet@google.com>
>Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
>Cc: Saeed Mahameed <saeedm@nvidia.com>
>Cc: Leon Romanovsky <leon@kernel.org>
>---
> .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
> .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
> 2 files changed, 69 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>index d27986869b8ba070d1a4f8bcdc7e14ab54ae984e..bf3bca79e160124abd128ac1e9910cb2f39a39ff 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>@@ -4920,6 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>
> 	netdev->priv_flags       |= IFF_UNICAST_FLT;
>
>+	netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
> 	mlx5e_set_netdev_dev_addr(netdev);
> 	mlx5e_ipsec_build_netdev(priv);
> 	mlx5e_ktls_build_netdev(priv);
>diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>index 2dc48406cd08d21ff94f665cd61ab9227f351215..b4fc45ba1b347fb9ad0f46b9c091cc45e4d3d84f 100644
>--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
>@@ -40,6 +40,7 @@
> #include "en_accel/en_accel.h"
> #include "en_accel/ipsec_rxtx.h"
> #include "en/ptp.h"
>+#include <net/ipv6.h>
>
> static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma)
> {
>@@ -130,23 +131,32 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> 		sq->stats->csum_none++;
> }
>
>+/* Returns the number of header bytes that we plan
>+ * to inline later in the transmit descriptor
>+ */
> static inline u16
>-mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb)
>+mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb, int *hopbyhop)
> {
> 	struct mlx5e_sq_stats *stats = sq->stats;
> 	u16 ihs;
>
>+	*hopbyhop = 0;
> 	if (skb->encapsulation) {
> 		ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
> 		stats->tso_inner_packets++;
> 		stats->tso_inner_bytes += skb->len - ihs;
> 	} else {
>-		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
>+		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
> 			ihs = skb_transport_offset(skb) + sizeof(struct udphdr);
>-		else
>+		} else {
> 			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
>+			if (ipv6_has_hopopt_jumbo(skb)) {
>+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
>+				ihs -= sizeof(struct hop_jumbo_hdr);
>+			}
>+		}
> 		stats->tso_packets++;
>-		stats->tso_bytes += skb->len - ihs;
>+		stats->tso_bytes += skb->len - ihs - *hopbyhop;
> 	}
>
> 	return ihs;
>@@ -208,6 +218,7 @@ struct mlx5e_tx_attr {
> 	__be16 mss;
> 	u16 insz;
> 	u8 opcode;
>+	u8 hopbyhop;
> };
>
> struct mlx5e_tx_wqe_attr {
>@@ -244,14 +255,16 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> 	struct mlx5e_sq_stats *stats = sq->stats;
>
> 	if (skb_is_gso(skb)) {
>-		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb);
>+		int hopbyhop;
>+		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
>
> 		*attr = (struct mlx5e_tx_attr) {
> 			.opcode    = MLX5_OPCODE_LSO,
> 			.mss       = cpu_to_be16(skb_shinfo(skb)->gso_size),
> 			.ihs       = ihs,
> 			.num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs,
>-			.headlen   = skb_headlen(skb) - ihs,
>+			.headlen   = skb_headlen(skb) - ihs - hopbyhop,
>+			.hopbyhop  = hopbyhop,
> 		};
>
> 		stats->packets += skb_shinfo(skb)->gso_segs;
>@@ -365,7 +378,8 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> 	struct mlx5_wqe_eth_seg  *eseg;
> 	struct mlx5_wqe_data_seg *dseg;
> 	struct mlx5e_tx_wqe_info *wi;
>-
>+	u16 ihs = attr->ihs;
>+	struct ipv6hdr *h6;
> 	struct mlx5e_sq_stats *stats = sq->stats;
> 	int num_dma;
>
>@@ -379,15 +393,36 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
>
> 	eseg->mss = attr->mss;
>
>-	if (attr->ihs) {
>-		if (skb_vlan_tag_present(skb)) {
>-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs + VLAN_HLEN);
>-			mlx5e_insert_vlan(eseg->inline_hdr.start, skb, attr->ihs);
>+	if (ihs) {
>+		u8 *start = eseg->inline_hdr.start;
>+
>+		if (unlikely(attr->hopbyhop)) {
>+			/* remove the HBH header.
>+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
>+			 */
>+			if (skb_vlan_tag_present(skb)) {
>+				mlx5e_insert_vlan(start, skb, ETH_HLEN + sizeof(*h6));
>+				ihs += VLAN_HLEN;
>+				h6 = (struct ipv6hdr *)(start + sizeof(struct vlan_ethhdr));
>+			} else {
>+				memcpy(start, skb->data, ETH_HLEN + sizeof(*h6));
>+				h6 = (struct ipv6hdr *)(start + ETH_HLEN);
>+			}
>+			h6->nexthdr = IPPROTO_TCP;
>+			/* Copy the TCP header after the IPv6 one */
>+			memcpy(h6 + 1,
>+			       skb->data + ETH_HLEN + sizeof(*h6) +
>+					sizeof(struct hop_jumbo_hdr),
>+			       tcp_hdrlen(skb));
>+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
>+		} else if (skb_vlan_tag_present(skb)) {
>+			mlx5e_insert_vlan(start, skb, ihs);
>+			ihs += VLAN_HLEN;
> 			stats->added_vlan_packets++;
> 		} else {
>-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs);
>-			memcpy(eseg->inline_hdr.start, skb->data, attr->ihs);
>+			memcpy(start, skb->data, ihs);
> 		}
>+		eseg->inline_hdr.sz |= cpu_to_be16(ihs);
> 		dseg += wqe_attr->ds_cnt_inl;
> 	} else if (skb_vlan_tag_present(skb)) {
> 		eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN);
>@@ -398,7 +433,7 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> 	}
>
> 	dseg += wqe_attr->ds_cnt_ids;
>-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs,
>+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs + attr->hopbyhop,
> 					  attr->headlen, dseg);
> 	if (unlikely(num_dma < 0))
> 		goto err_drop;
>@@ -918,12 +953,29 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
> 	eseg->mss = attr.mss;
>
> 	if (attr.ihs) {
>-		memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
>+		if (unlikely(attr.hopbyhop)) {
>+			struct ipv6hdr *h6;
>+
>+			/* remove the HBH header.
>+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
>+			 */
>+			memcpy(eseg->inline_hdr.start, skb->data, ETH_HLEN + sizeof(*h6));
>+			h6 = (struct ipv6hdr *)((char *)eseg->inline_hdr.start + ETH_HLEN);
>+			h6->nexthdr = IPPROTO_TCP;
>+			/* Copy the TCP header after the IPv6 one */
>+			memcpy(h6 + 1,
>+			       skb->data + ETH_HLEN + sizeof(*h6) +
>+					sizeof(struct hop_jumbo_hdr),
>+			       tcp_hdrlen(skb));
>+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
>+		} else {
>+			memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
>+		}
> 		eseg->inline_hdr.sz = cpu_to_be16(attr.ihs);
> 		dseg += wqe_attr.ds_cnt_inl;
> 	}
>
>-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs,
>+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs + attr.hopbyhop,
> 					  attr.headlen, dseg);
> 	if (unlikely(num_dma < 0))
> 		goto err_drop;
>-- 
>2.36.0.512.ge40c2bad7a-goog
>


* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-12  8:40   ` Saeed Mahameed
@ 2022-05-12  9:02     ` Paolo Abeni
  2022-05-13  4:29       ` Saeed Mahameed
  0 siblings, 1 reply; 23+ messages in thread
From: Paolo Abeni @ 2022-05-12  9:02 UTC (permalink / raw)
  To: Saeed Mahameed, Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Alexander Duyck,
	Coco Li, Eric Dumazet, Tariq Toukan, Leon Romanovsky

On Thu, 2022-05-12 at 01:40 -0700, Saeed Mahameed wrote:
> On 09 May 20:32, Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> > 
> > mlx5 supports LSOv2.
> > 
> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> > with JUMBO TLV for big packets.
> > 
> > We need to ignore/skip this HBH header when populating TX descriptor.
> > 
> 
> Sorry i didn't go through all the documentations or previous discussions,
> please bare with me, so why not clear HBH just before calling the
> driver xmit ndo ? 

I guess this way is more efficient: the driver copies the IP and TCP
headers directly into their correct/final location in the TX descriptor;
otherwise the caller would have to memmove the L2/L3 headers just before
the driver copies them again.
> 
> Or if HBH has to stick, 

My understanding is that this is not the case.

Cheers,

Paolo



* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-12  9:02     ` Paolo Abeni
@ 2022-05-13  4:29       ` Saeed Mahameed
  2022-05-13  4:34         ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2022-05-13  4:29 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li, Eric Dumazet, Tariq Toukan,
	Leon Romanovsky

On 12 May 11:02, Paolo Abeni wrote:
>On Thu, 2022-05-12 at 01:40 -0700, Saeed Mahameed wrote:
>> On 09 May 20:32, Eric Dumazet wrote:
>> > From: Coco Li <lixiaoyan@google.com>
>> >
>> > mlx5 supports LSOv2.
>> >
>> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
>> > with JUMBO TLV for big packets.
>> >
>> > We need to ignore/skip this HBH header when populating TX descriptor.
>> >
>>
>> Sorry i didn't go through all the documentations or previous discussions,
>> please bare with me, so why not clear HBH just before calling the
>> driver xmit ndo ?
>
>I guess this way is more efficient: the driver copies IP hdr and TCP
>hdr directly in the correct/final location into the tx descriptor,
>otherwise the caller would have to memmove L2/L3 just before the driver
>copies them again.
>>

memmove(sizeof(L2/L3)) is not that bad when done only once every 64KB+.
It's going to be hard to repeat and maintain this across all drivers
only to get a micro-optimization that I doubt will even be measurable.

>> Or if HBH has to stick, 
>
>My understanding is that this is not the case.
>
>Cheers,
>
>Paolo
>


* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-13  4:29       ` Saeed Mahameed
@ 2022-05-13  4:34         ` Eric Dumazet
  2022-05-13  5:49           ` Saeed Mahameed
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2022-05-13  4:34 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Paolo Abeni, Eric Dumazet, David S . Miller, Jakub Kicinski,
	netdev, Alexander Duyck, Coco Li, Tariq Toukan, Leon Romanovsky

On Thu, May 12, 2022 at 9:29 PM Saeed Mahameed <saeedm@nvidia.com> wrote:
>
> On 12 May 11:02, Paolo Abeni wrote:
> >On Thu, 2022-05-12 at 01:40 -0700, Saeed Mahameed wrote:
> >> On 09 May 20:32, Eric Dumazet wrote:
> >> > From: Coco Li <lixiaoyan@google.com>
> >> >
> >> > mlx5 supports LSOv2.
> >> >
> >> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> >> > with JUMBO TLV for big packets.
> >> >
> >> > We need to ignore/skip this HBH header when populating TX descriptor.
> >> >
> >>
> >> Sorry i didn't go through all the documentations or previous discussions,
> >> please bare with me, so why not clear HBH just before calling the
> >> driver xmit ndo ?
> >
> >I guess this way is more efficient: the driver copies IP hdr and TCP
> >hdr directly in the correct/final location into the tx descriptor,
> >otherwise the caller would have to memmove L2/L3 just before the driver
> >copies them again.
> >>
>
> memmove(sizeof(L2/L3)) is not that bad when done only every 64KB+.
> it's going to be hard to repeat this and maintain this across all drivers
> only to get this micro optimization that I doubt it will be even measurable.

We prefer not to change skb->head; this would break tcpdump.

Surely calling skb_cow_head() would incur a cost.

As I suggested, we can respin the series without the mlx5 patch, this
is totally fine for us, if we can avoid missing 5.19 train.


* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-13  4:34         ` Eric Dumazet
@ 2022-05-13  5:49           ` Saeed Mahameed
  2022-05-13 13:05             ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Saeed Mahameed @ 2022-05-13  5:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, Eric Dumazet, David S . Miller, Jakub Kicinski,
	netdev, Alexander Duyck, Coco Li, Tariq Toukan, Leon Romanovsky

On 12 May 21:34, Eric Dumazet wrote:
>On Thu, May 12, 2022 at 9:29 PM Saeed Mahameed <saeedm@nvidia.com> wrote:
>>
>> On 12 May 11:02, Paolo Abeni wrote:
>> >On Thu, 2022-05-12 at 01:40 -0700, Saeed Mahameed wrote:
>> >> On 09 May 20:32, Eric Dumazet wrote:
>> >> > From: Coco Li <lixiaoyan@google.com>
>> >> >
>> >> > mlx5 supports LSOv2.
>> >> >
>> >> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
>> >> > with JUMBO TLV for big packets.
>> >> >
>> >> > We need to ignore/skip this HBH header when populating TX descriptor.
>> >> >
>> >>
>> >> Sorry i didn't go through all the documentations or previous discussions,
>> >> please bare with me, so why not clear HBH just before calling the
>> >> driver xmit ndo ?
>> >
>> >I guess this way is more efficient: the driver copies IP hdr and TCP
>> >hdr directly in the correct/final location into the tx descriptor,
>> >otherwise the caller would have to memmove L2/L3 just before the driver
>> >copies them again.
>> >>
>>
>> memmove(sizeof(L2/L3)) is not that bad when done only every 64KB+.
>> it's going to be hard to repeat this and maintain this across all drivers
>> only to get this micro optimization that I doubt it will be even measurable.
>
>We prefer not changing skb->head, this would break tcpdump.
>

In that case we can provide a helper for the drivers to call just before
they start processing the skb.

>Surely calling skb_cow_head() would incur a cost.
>

Sure, but the benefit of this patch outweighs this cost by orders of
magnitude: you pay an extra 0.1$ for cleaner code, and you still
get your 64K$ BIG TCP cash.

>As I suggested, we can respin the series without the mlx5 patch, this
>is totally fine for us, if we can avoid missing 5.19 train.

To be clear, I am not nacking: Tariq already reviewed and gave his blessing,
and I won't resist this patch in v6. I am just suggesting an improvement
to code readability and scalability to other drivers.

FWIW:
Acked-by: Saeed Mahameed <saeedm@nvidia.com>



* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-13  5:49           ` Saeed Mahameed
@ 2022-05-13 13:05             ` Eric Dumazet
  2022-05-13 17:04               ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2022-05-13 13:05 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: Paolo Abeni, Eric Dumazet, David S . Miller, Jakub Kicinski,
	netdev, Alexander Duyck, Coco Li, Tariq Toukan, Leon Romanovsky

On Thu, May 12, 2022 at 10:49 PM Saeed Mahameed <saeedm@nvidia.com> wrote:
>
> On 12 May 21:34, Eric Dumazet wrote:
> >On Thu, May 12, 2022 at 9:29 PM Saeed Mahameed <saeedm@nvidia.com> wrote:
> >>
> >> On 12 May 11:02, Paolo Abeni wrote:
> >> >On Thu, 2022-05-12 at 01:40 -0700, Saeed Mahameed wrote:
> >> >> On 09 May 20:32, Eric Dumazet wrote:
> >> >> > From: Coco Li <lixiaoyan@google.com>
> >> >> >
> >> >> > mlx5 supports LSOv2.
> >> >> >
> >> >> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> >> >> > with JUMBO TLV for big packets.
> >> >> >
> >> >> > We need to ignore/skip this HBH header when populating TX descriptor.
> >> >> >
> >> >>
> >> >> Sorry i didn't go through all the documentations or previous discussions,
> >> >> please bare with me, so why not clear HBH just before calling the
> >> >> driver xmit ndo ?
> >> >
> >> >I guess this way is more efficient: the driver copies IP hdr and TCP
> >> >hdr directly in the correct/final location into the tx descriptor,
> >> >otherwise the caller would have to memmove L2/L3 just before the driver
> >> >copies them again.
> >> >>
> >>
> >> memmove(sizeof(L2/L3)) is not that bad when done only every 64KB+.
> >> it's going to be hard to repeat this and maintain this across all drivers
> >> only to get this micro optimization that I doubt it will be even measurable.
> >
> >We prefer not changing skb->head, this would break tcpdump.
> >
>
> in that case we can provide a helper to the drivers to call, just before
> they start processing the skb.
>
> >Surely calling skb_cow_head() would incur a cost.
> >
>
> Sure, but the benefit of this patch outweighs this cost by orders of
> magnitude, you pay an extra 0.1$ for a cleaner code, and you still
> get your 64K$ BIG TCP cash.
>
> >As I suggested, we can respin the series without the mlx5 patch, this
> >is totally fine for us, if we can avoid missing 5.19 train.
>
> To be clear, I am not nacking, Tariq already reviewed and gave his blessing,
> and i won't resist this patch on v6. I am Just suggesting an improvement
> to code readability and scalability to other drivers.

The problem is that skb_cow_head() can fail.

Really, we have thought about this already.

A common helper for drivers is mostly unusable: you would have to
pre-allocate a per-TX-ring slot to store the headers.
We would end up adding complexity at queue creation/dismantle.

We could do that later, because some NICs do not inline the headers in
the TX descriptor, but instead request one mapped buffer for the
headers part only.

BTW, I know Tariq already reviewed this; the issue at hand is
CONFIG_FORTIFY, which is blocking us.

This is why I was considering not submitting mlx5 change until Kees
Cook and others come up with a solution.


* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-13 13:05             ` Eric Dumazet
@ 2022-05-13 17:04               ` Jakub Kicinski
  2022-05-13 17:12                 ` Eric Dumazet
  0 siblings, 1 reply; 23+ messages in thread
From: Jakub Kicinski @ 2022-05-13 17:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Saeed Mahameed, Paolo Abeni, Eric Dumazet, David S . Miller,
	netdev, Alexander Duyck, Coco Li, Tariq Toukan, Leon Romanovsky

On Fri, 13 May 2022 06:05:36 -0700 Eric Dumazet wrote:
> The problem is that  skb_cow_head() can fail.
> 
> Really we have thought about this already.
> 
> A common helper for drivers is mostly unusable, you would have to
> pre-allocate a per TX-ring slot to store the headers.
> We would end up with adding complexity at queue creation/dismantle.
> 
> We could do that later, because some NICs do not inline the headers in
> TX descriptor, but instead request
> one mapped buffer for the headers part only.
> 
> BTW, I know Tariq already reviewed, the issue at hand is about
> CONFIG_FORTIFY which is blocking us.
> 
> This is why I was considering not submitting mlx5 change until Kees
> Cook and others come up with a solution.

We do have the solution, no?

commit 43213daed6d6 ("fortify: Provide a memcpy trap door for sharp
corners")


* Re: [PATCH v6 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-13 17:04               ` Jakub Kicinski
@ 2022-05-13 17:12                 ` Eric Dumazet
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Dumazet @ 2022-05-13 17:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Saeed Mahameed, Paolo Abeni, Eric Dumazet, David S . Miller,
	netdev, Alexander Duyck, Coco Li, Tariq Toukan, Leon Romanovsky

On Fri, May 13, 2022 at 10:04 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 13 May 2022 06:05:36 -0700 Eric Dumazet wrote:
> > The problem is that  skb_cow_head() can fail.
> >
> > Really we have thought about this already.
> >
> > A common helper for drivers is mostly unusable, you would have to
> > pre-allocate a per TX-ring slot to store the headers.
> > We would end up with adding complexity at queue creation/dismantle.
> >
> > We could do that later, because some NICs do not inline the headers in
> > TX descriptor, but instead request
> > one mapped buffer for the headers part only.
> >
> > BTW, I know Tariq already reviewed, the issue at hand is about
> > CONFIG_FORTIFY which is blocking us.
> >
> > This is why I was considering not submitting mlx5 change until Kees
> > Cook and others come up with a solution.
>
> We do have the solution, no?
>
> commit 43213daed6d6 ("fortify: Provide a memcpy trap door for sharp
> corners")

Oh I missed this was already merged.

I will rebase then.

Hopefully ARCH=hexagon|awesome won't trigger a new issue :)

Thanks.


end of thread, other threads:[~2022-05-13 17:13 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-10  3:32 [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 07/13] ipv6/gro: insert " Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 08/13] net: allow gro_max_size to exceed 65536 Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 10/13] net: loopback: enable BIG TCP packets Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 11/13] veth: " Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 12/13] mlx4: support " Eric Dumazet
2022-05-10  3:32 ` [PATCH v6 net-next 13/13] mlx5: " Eric Dumazet
2022-05-12  8:40   ` Saeed Mahameed
2022-05-12  9:02     ` Paolo Abeni
2022-05-13  4:29       ` Saeed Mahameed
2022-05-13  4:34         ` Eric Dumazet
2022-05-13  5:49           ` Saeed Mahameed
2022-05-13 13:05             ` Eric Dumazet
2022-05-13 17:04               ` Jakub Kicinski
2022-05-13 17:12                 ` Eric Dumazet
2022-05-10 19:49 ` [PATCH v6 net-next 00/13] tcp: BIG TCP implementation Alexander H Duyck
