* [PATCH v2 net-next 00/14] tcp: BIG TCP implementation
@ 2022-03-03 18:15 Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

This series implements BIG TCP as presented in netdev 0x15:

https://netdevconf.info/0x15/session.html?BIG-TCP

Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/

The standard TSO/GRO packet limit is 64KB.

With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.

Note that this feature is not enabled by default, because it might
break some eBPF programs that assume the TCP header immediately follows
the IPv6 header.
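
For illustration only (a plain-C sketch, not actual eBPF and not part of
this series), a parser that used to assume "IPv6 header, then TCP header"
would have to also accept the 8-byte Hop-by-Hop jumbo option added later
in this series (struct hop_jumbo_hdr, patch 4), along these lines:

/* Hypothetical sketch: locate the TCP header of an IPv6 packet,
 * skipping the local BIG TCP Hop-by-Hop jumbo option when present.
 */
static struct tcphdr *big_tcp_locate_tcp(struct ipv6hdr *ip6)
{
	struct hop_jumbo_hdr *jh;

	if (ip6->nexthdr == IPPROTO_TCP)
		return (struct tcphdr *)(ip6 + 1);	/* usual case */

	if (ip6->nexthdr != NEXTHDR_HOP)
		return NULL;

	jh = (struct hop_jumbo_hdr *)(ip6 + 1);
	if (jh->hdrlen != 0 || jh->nexthdr != IPPROTO_TCP ||
	    jh->tlv_type != IPV6_TLV_JUMBO)
		return NULL;	/* not the BIG TCP jumbo option */

	return (struct tcphdr *)(jh + 1);
}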

Reducing the number of packets traversing the networking stack usually
improves performance, as shown in this experiment using a 100Gbit NIC and
a 4K MTU.

'Standard' performance with current (74KB) limits.
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
77           138          183          8542.19    
79           143          178          8215.28    
70           117          164          9543.39    
80           144          176          8183.71    
78           126          155          9108.47    
80           146          184          8115.19    
71           113          165          9510.96    
74           113          164          9518.74    
79           137          178          8575.04    
73           111          171          9561.73    

Now enable BIG TCP on both hosts.

ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
57           83           117          13871.38   
64           118          155          11432.94   
65           116          148          11507.62   
60           105          136          12645.15   
60           103          135          12760.34   
60           102          134          12832.64   
62           109          132          10877.68   
58           82           115          14052.93   
57           83           124          14212.58   
57           82           119          14196.01   

We see an increase in transactions per second, and lower latencies as well.

v2: Removed the MAX_SKB_FRAGS change, as it belongs to a different series.
    Addressed feedback from Alexander and nvidia folks.

Coco Li (5):
  ipv6: add dev->gso_ipv6_max_size
  ipv6: add GRO_IPV6_MAX_SIZE
  ipv6: Add hop-by-hop header to jumbograms in ip6_output
  ipvlan: enable BIG TCP Packets
  mlx5: support BIG TCP packets

Eric Dumazet (9):
  net: add netdev->tso_ipv6_max_size attribute
  tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  ipv6: add struct hop_jumbo_hdr definition
  ipv6/gso: remove temporary HBH/jumbo header
  ipv6/gro: insert temporary HBH/jumbo header
  net: loopback: enable BIG TCP packets
  bonding: update dev->tso_ipv6_max_size
  macvlan: enable BIG TCP Packets
  mlx4: support BIG TCP packets

 drivers/net/bonding/bond_main.c               |  3 +
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 +
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 82 +++++++++++++++----
 drivers/net/ipvlan/ipvlan_main.c              |  1 +
 drivers/net/loopback.c                        |  2 +
 drivers/net/macvlan.c                         |  1 +
 include/linux/ipv6.h                          |  1 +
 include/linux/netdevice.h                     | 32 ++++++++
 include/net/ipv6.h                            | 44 ++++++++++
 include/uapi/linux/if_link.h                  |  3 +
 net/core/dev.c                                |  4 +
 net/core/gro.c                                | 20 ++++-
 net/core/rtnetlink.c                          | 33 ++++++++
 net/core/sock.c                               |  6 ++
 net/ipv4/tcp_cubic.c                          |  4 +-
 net/ipv6/ip6_offload.c                        | 56 ++++++++++++-
 net/ipv6/ip6_output.c                         | 22 ++++-
 tools/include/uapi/linux/if_link.h            |  3 +
 20 files changed, 334 insertions(+), 34 deletions(-)

-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
@ 2022-03-03 18:15 ` Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Some NICs (or virtual devices) are LSOv2 compatible.

BIG TCP plans to use the large LSOv2 feature for IPv6.

A new netlink attribute, IFLA_TSO_IPV6_MAX_SIZE, is defined.

Drivers should use netif_set_tso_ipv6_max_size() to advertise their limit.

Unchanged drivers do not allow big TSO packets to be sent.
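
As a sketch of the intended usage (the loopback, mlx4 and mlx5 patches later
in this series do exactly this), an opted-in driver simply calls the helper
once at netdev setup time:

	/* hypothetical driver setup snippet: advertise a 512KB LSOv2 limit */
	netif_set_tso_ipv6_max_size(netdev, 512 * 1024);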

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 10 ++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  2 ++
 net/core/rtnetlink.c               |  3 +++
 tools/include/uapi/linux/if_link.h |  1 +
 5 files changed, 17 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 19a27ac361efb64068e2b9954eb85261283b3d60..3b59359b5e4d35f40fb90d594e78cb88befbbcbf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1951,6 +1951,7 @@ enum netdev_ml_priv_type {
  *	@dev_registered_tracker:	tracker for reference held while
  *					registered
  *	@offload_xstats_l3:	L3 HW stats for this netdevice.
+ *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -2289,6 +2290,7 @@ struct net_device {
 	netdevice_tracker	watchdog_dev_tracker;
 	netdevice_tracker	dev_registered_tracker;
 	struct rtnl_hw_stats64	*offload_xstats_l3;
+	unsigned int		tso_ipv6_max_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -4898,6 +4900,14 @@ static inline void netif_set_gro_max_size(struct net_device *dev,
 	WRITE_ONCE(dev->gro_max_size, size);
 }
 
+/* Used by drivers to give their hardware/firmware limit for LSOv2 packets */
+static inline void netif_set_tso_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	dev->tso_ipv6_max_size = size;
+}
+
+
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
 					int mac_len)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ddca20357e7e89b5f204b3117ff3838735535470..c8af031b692e52690a2760e9d79c9462185e2fc9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -363,6 +363,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 5db2443c237132946fd0f3dc095d29711cccec12..aa37d3f2ca1afe53b05b7d71be1dbdccaeca4f6b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10461,6 +10461,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_size = GSO_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
+	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
+
 	dev->upper_level = 1;
 	dev->lower_level = 1;
 #ifdef CONFIG_LOCKDEP
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a66b6761b88b1c63c916fc085f4d9e8523bb0659..864c411c124040e2076289f8714f8b043563408c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1027,6 +1027,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1732,6 +1733,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1885,6 +1887,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index e1ba2d51b717b7ac7f06e94ac9791cf4c8a5ab6f..441615c39f0a24eeeb6e27b4ca88031bcc234cf8 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 02/14] ipv6: add dev->gso_ipv6_max_size
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
@ 2022-03-03 18:15 ` Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

This enables the TCP stack to build TSO packets bigger than
64KB if the driver is LSOv2 compatible.

This patch introduces a new variable, gso_ipv6_max_size,
that is modifiable through ip link:

ip link set dev eth0 gso_ipv6_max_size 185000

User input is capped by the driver limit (tso_ipv6_max_size)
added in the previous patch.
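
For example (illustrative sketch, assuming an iproute2 that understands the
new attributes), on a driver advertising a 512KB tso_ipv6_max_size:

ip link set dev eth0 gso_ipv6_max_size 185000
ip -d link show dev eth0    # reports gso_ipv6_max_size 185000

On an unchanged driver (tso_ipv6_max_size left at its 64KB default), the same
request is silently capped to 65536 by netif_set_gso_ipv6_max_size().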

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 12 ++++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  1 +
 net/core/rtnetlink.c               | 15 +++++++++++++++
 net/core/sock.c                    |  6 ++++++
 tools/include/uapi/linux/if_link.h |  1 +
 6 files changed, 36 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3b59359b5e4d35f40fb90d594e78cb88befbbcbf..6d559a0c4abd7cd1f5ee90e0c303fe9331a27841 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1952,6 +1952,7 @@ enum netdev_ml_priv_type {
  *					registered
  *	@offload_xstats_l3:	L3 HW stats for this netdevice.
  *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
+ *	@gso_ipv6_max_size:	Maximum size of IPv6 GSO packets (user/admin limit)
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -2291,6 +2292,7 @@ struct net_device {
 	netdevice_tracker	dev_registered_tracker;
 	struct rtnl_hw_stats64	*offload_xstats_l3;
 	unsigned int		tso_ipv6_max_size;
+	unsigned int		gso_ipv6_max_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -4884,6 +4886,10 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
 {
 	/* dev->gso_max_size is read locklessly from sk_setup_caps() */
 	WRITE_ONCE(dev->gso_max_size, size);
+
+	/* legacy drivers want to lower gso_max_size, regardless of family. */
+	size = min(size, dev->gso_ipv6_max_size);
+	WRITE_ONCE(dev->gso_ipv6_max_size, size);
 }
 
 static inline void netif_set_gso_max_segs(struct net_device *dev,
@@ -4907,6 +4913,12 @@ static inline void netif_set_tso_ipv6_max_size(struct net_device *dev,
 	dev->tso_ipv6_max_size = size;
 }
 
+static inline void netif_set_gso_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	size = min(size, dev->tso_ipv6_max_size);
+	WRITE_ONCE(dev->gso_ipv6_max_size, size);
+}
 
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c8af031b692e52690a2760e9d79c9462185e2fc9..048a9c848a3a39596b6c3135553fdfb9a1fe37d2 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -364,6 +364,7 @@ enum {
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
+	IFLA_GSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index aa37d3f2ca1afe53b05b7d71be1dbdccaeca4f6b..7dbedec0903279ece0cb1199969f732a4dc35cd2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10462,6 +10462,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
 	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
+	dev->gso_ipv6_max_size = GSO_MAX_SIZE;
 
 	dev->upper_level = 1;
 	dev->lower_level = 1;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 864c411c124040e2076289f8714f8b043563408c..a60efa6d0fac1b9ce209126bad946a3d2bd24ac3 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1028,6 +1028,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_GSO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1734,6 +1735,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
 	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
+	    nla_put_u32(skb, IFLA_GSO_IPV6_MAX_SIZE, dev->gso_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1888,6 +1890,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
 	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_GSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2774,6 +2777,15 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 	}
 
+	if (tb[IFLA_GSO_IPV6_MAX_SIZE]) {
+		u32 max_size = nla_get_u32(tb[IFLA_GSO_IPV6_MAX_SIZE]);
+
+		if (dev->gso_ipv6_max_size ^ max_size) {
+			netif_set_gso_ipv6_max_size(dev, max_size);
+			status |= DO_SETLINK_MODIFIED;
+		}
+	}
+
 	if (tb[IFLA_GSO_MAX_SEGS]) {
 		u32 max_segs = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]);
 
@@ -3249,6 +3261,9 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 		netif_set_gso_max_segs(dev, nla_get_u32(tb[IFLA_GSO_MAX_SEGS]));
 	if (tb[IFLA_GRO_MAX_SIZE])
 		netif_set_gro_max_size(dev, nla_get_u32(tb[IFLA_GRO_MAX_SIZE]));
+	if (tb[IFLA_GSO_IPV6_MAX_SIZE])
+		netif_set_gso_ipv6_max_size(dev,
+			nla_get_u32(tb[IFLA_GSO_IPV6_MAX_SIZE]));
 
 	return dev;
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 784c92eaded89fdb55be0ad11dd2dadc8548814b..7cd83bea205849ba7c3ee420d5a5e54ceff9979a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2279,6 +2279,12 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
 			sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
+#if IS_ENABLED(CONFIG_IPV6)
+			if (sk->sk_family == AF_INET6 &&
+			    sk_is_tcp(sk) &&
+			    !ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
+				sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_ipv6_max_size);
+#endif
 			sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
 			max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 441615c39f0a24eeeb6e27b4ca88031bcc234cf8..e40cd575607872d3bff3bc1971df8c6426290562 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -349,6 +349,7 @@ enum {
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
+	IFLA_GSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
@ 2022-03-03 18:15 ` Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

hystart_ack_delay() assumed that a TSO packet
would not be bigger than GSO_MAX_SIZE.

This will no longer be true.

We should use sk->sk_gso_max_size instead.

This reduces the chances of spurious Hystart ACK train detections.
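
As a rough, illustrative example: with sk->sk_gso_max_size around 185000 bytes
and a pacing rate around 2.5 GB/sec, the cushion becomes
min(1000 usec, 185000 * 4 * USEC_PER_SEC / rate) ~= 296 usec, instead of the
~105 usec that the fixed GSO_MAX_SIZE (64KB) constant would have produced.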

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_cubic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 24d562dd62254d6e50dd08236f8967400d81e1ea..dfc9dc951b7404776b2246c38273fbadf03c39fd 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -372,7 +372,7 @@ static void cubictcp_state(struct sock *sk, u8 new_state)
  * We apply another 100% factor because @rate is doubled at this point.
  * We cap the cushion to 1ms.
  */
-static u32 hystart_ack_delay(struct sock *sk)
+static u32 hystart_ack_delay(const struct sock *sk)
 {
 	unsigned long rate;
 
@@ -380,7 +380,7 @@ static u32 hystart_ack_delay(struct sock *sk)
 	if (!rate)
 		return 0;
 	return min_t(u64, USEC_PER_MSEC,
-		     div64_ul((u64)GSO_MAX_SIZE * 4 * USEC_PER_SEC, rate));
+		     div64_ul((u64)sk->sk_gso_max_size * 4 * USEC_PER_SEC, rate));
 }
 
 static void hystart_update(struct sock *sk, u32 delay)
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-03-03 18:15 ` [PATCH v2 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
@ 2022-03-03 18:15 ` Eric Dumazet
  2022-03-04 19:26   ` Alexander H Duyck
  2022-03-03 18:15 ` [PATCH v2 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patches will need to add and remove a local IPv6 jumbogram
option to enable BIG TCP.
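
For illustration (not part of this patch), later patches in the series fill
the option roughly like this, where 'tcp_len' stands for the number of bytes
following the IPv6 header (TCP header plus payload); per RFC 2675 the Jumbo
Payload Length also covers the Hop-by-Hop header itself:

	struct hop_jumbo_hdr jh = {
		.nexthdr		= IPPROTO_TCP,
		.hdrlen			= 0,		/* 8 bytes total */
		.tlv_type		= IPV6_TLV_JUMBO,	/* 0xC2 */
		.tlv_len		= 4,
		.jumbo_payload_len	= htonl(tcp_len + sizeof(jh)),
	};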

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 213612f1680c7c39f4c07f0c05b4e6cf34a7878e..95f405cde9e539d7909b6b89af2b956655f38b94 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -151,6 +151,17 @@ struct frag_hdr {
 	__be32	identification;
 };
 
+/*
 * Jumbo payload option, as described in RFC 2675, section 2.
+ */
+struct hop_jumbo_hdr {
+	u8	nexthdr;
+	u8	hdrlen;
+	u8	tlv_type;	/* IPV6_TLV_JUMBO, 0xC2 */
+	u8	tlv_len;	/* 4 */
+	__be32	jumbo_payload_len;
+};
+
 #define	IP6_MF		0x0001
 #define	IP6_OFFSET	0xFFF8
 
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (3 preceding siblings ...)
  2022-03-03 18:15 ` [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
@ 2022-03-03 18:15 ` Eric Dumazet
  2022-03-03 18:15 ` [PATCH v2 net-next 06/14] ipv6/gro: insert " Eric Dumazet
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The IPv6 TCP and GRO stacks will soon be able to build big TCP packets
with an added temporary Hop-by-Hop header.

If GSO is involved for these large packets, we need to remove
the temporary HBH header before segmentation happens.
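
For clarity, the (purely local) transformation undone here before
segmentation looks like this:

  before: [Ethernet][IPv6, nexthdr = NEXTHDR_HOP][HBH jumbo, 8 bytes][TCP][payload]
  after : [Ethernet][IPv6, nexthdr = IPPROTO_TCP][TCP][payload]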

v2: perform HBH removal from ipv6_gso_segment() instead of
    skb_segment() (Alexander feedback)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h     | 33 +++++++++++++++++++++++++++++++++
 net/ipv6/ip6_offload.c | 24 +++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 95f405cde9e539d7909b6b89af2b956655f38b94..efe0025bdbb9668ceb01989705ce8bccbf592350 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -467,6 +467,39 @@ bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb,
 struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
 					   struct ipv6_txoptions *opt);
 
+/* This helper is specialized for BIG TCP needs.
+ * It assumes the hop_jumbo_hdr will immediately follow the IPV6 header.
+ * It assumes headers are already in skb->head.
+ * Returns 0, or IPPROTO_TCP if a BIG TCP packet is there.
+ */
+static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
+{
+	const struct hop_jumbo_hdr *jhdr;
+	const struct ipv6hdr *nhdr;
+
+	if (likely(skb->len <= GRO_MAX_SIZE))
+		return 0;
+
+	if (skb->protocol != htons(ETH_P_IPV6))
+		return 0;
+
+	if (skb_network_offset(skb) +
+	    sizeof(struct ipv6hdr) +
+	    sizeof(struct hop_jumbo_hdr) > skb_headlen(skb))
+		return 0;
+
+	nhdr = ipv6_hdr(skb);
+
+	if (nhdr->nexthdr != NEXTHDR_HOP)
+		return 0;
+
+	jhdr = (const struct hop_jumbo_hdr *) (nhdr + 1);
+	if (jhdr->tlv_type != IPV6_TLV_JUMBO || jhdr->hdrlen != 0 ||
+	    jhdr->nexthdr != IPPROTO_TCP)
+		return 0;
+	return jhdr->nexthdr;
+}
+
 static inline bool ipv6_accept_ra(struct inet6_dev *idev)
 {
 	/* If forwarding is enabled, RA are not accepted unless the special
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index c4fc03c1ac99dbecd92e2b47b2db65374197434d..a6a6c1539c28d242ef8c35fcd5ce900512ce912d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -77,7 +77,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct ipv6hdr *ipv6h;
 	const struct net_offload *ops;
-	int proto;
+	int proto, nexthdr;
 	struct frag_hdr *fptr;
 	unsigned int payload_len;
 	u8 *prevhdr;
@@ -87,6 +87,28 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	bool gso_partial;
 
 	skb_reset_network_header(skb);
+	nexthdr = ipv6_has_hopopt_jumbo(skb);
+	if (nexthdr) {
+		const int hophdr_len = sizeof(struct hop_jumbo_hdr);
+		int err;
+
+		err = skb_cow_head(skb, 0);
+		if (err < 0)
+			return ERR_PTR(err);
+
+		/* remove the HBH header.
+		 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+		 */
+		memmove(skb_mac_header(skb) + hophdr_len,
+			skb_mac_header(skb),
+			ETH_HLEN + sizeof(struct ipv6hdr));
+		skb->data += hophdr_len;
+		skb->len -= hophdr_len;
+		skb->network_header += hophdr_len;
+		skb->mac_header += hophdr_len;
+		ipv6h = (struct ipv6hdr *)skb->data;
+		ipv6h->nexthdr = nexthdr;
+	}
 	nhoff = skb_network_header(skb) - skb_mac_header(skb);
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (4 preceding siblings ...)
  2022-03-03 18:15 ` [PATCH v2 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
@ 2022-03-03 18:15 ` Eric Dumazet
  2022-03-03 18:16 ` [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:15 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patch will add GRO_IPV6_MAX_SIZE, allowing GRO to build
BIG TCP IPv6 packets (bigger than 64KB).

This patch changes ipv6_gro_complete() to insert an HBH/jumbo header
so that the resulting packet can go through the IPv6/TCP stacks.
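
As an illustrative example: if GRO aggregated 110000 bytes of TCP header plus
payload behind the IPv6 header, ipv6_gro_complete() now leaves payload_len at
0, switches nexthdr to NEXTHDR_HOP, and stores a Jumbo Payload Length of
110008 (the 110000 bytes plus the 8-byte Hop-by-Hop header itself).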

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_offload.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a6a6c1539c28d242ef8c35fcd5ce900512ce912d..d12dba2dd5354dbb79bb80df4038dec2544cddeb 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -342,15 +342,43 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
-	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+	struct ipv6hdr *iph;
 	int err = -ENOSYS;
+	u32 payload_len;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
 		skb_set_inner_network_header(skb, nhoff);
 	}
 
-	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
+	payload_len = skb->len - nhoff - sizeof(*iph);
+	if (unlikely(payload_len > IPV6_MAXPLEN)) {
+		struct hop_jumbo_hdr *hop_jumbo;
+		int hoplen = sizeof(*hop_jumbo);
+
+		/* Move network header left */
+		memmove(skb_mac_header(skb) - hoplen, skb_mac_header(skb),
+			skb->transport_header - skb->mac_header);
+		skb->data -= hoplen;
+		skb->len += hoplen;
+		skb->mac_header -= hoplen;
+		skb->network_header -= hoplen;
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		hop_jumbo = (struct hop_jumbo_hdr *)(iph + 1);
+
+		/* Build hop-by-hop options */
+		hop_jumbo->nexthdr = iph->nexthdr;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(payload_len + hoplen);
+
+		iph->nexthdr = NEXTHDR_HOP;
+		iph->payload_len = 0;
+	} else {
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		iph->payload_len = htons(payload_len);
+	}
 
 	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (5 preceding siblings ...)
  2022-03-03 18:15 ` [PATCH v2 net-next 06/14] ipv6/gro: insert " Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-04  4:37   ` David Ahern
  2022-03-03 18:16 ` [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Enable GRO to have an IPv6-specific limit for the maximum packet size.

This patch introduces a new dev->gro_ipv6_max_size
that is modifiable through ip link:

ip link set dev eth0 gro_ipv6_max_size 185000

Note that this value is only considered if bigger than
gro_max_size, and only for non-encapsulated TCP/IPv6 packets.

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 10 ++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  1 +
 net/core/gro.c                     | 20 ++++++++++++++++++--
 net/core/rtnetlink.c               | 15 +++++++++++++++
 tools/include/uapi/linux/if_link.h |  1 +
 6 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6d559a0c4abd7cd1f5ee90e0c303fe9331a27841..30c9c6a4f51c364a0178bbb4ed8c2a57ede51d47 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1944,6 +1944,8 @@ enum netdev_ml_priv_type {
  *			keep a list of interfaces to be deleted.
  *	@gro_max_size:	Maximum size of aggregated packet in generic
  *			receive offload (GRO)
+ *	@gro_ipv6_max_size:	Maximum size of aggregated packet in generic
+ *				receive offload (GRO), for IPv6
  *
  *	@dev_addr_shadow:	Copy of @dev_addr to catch direct writes.
  *	@linkwatch_dev_tracker:	refcount tracker used by linkwatch.
@@ -2140,6 +2142,7 @@ struct net_device {
 	int			napi_defer_hard_irqs;
 #define GRO_MAX_SIZE		65536
 	unsigned int		gro_max_size;
+	unsigned int		gro_ipv6_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
 
@@ -4920,6 +4923,13 @@ static inline void netif_set_gso_ipv6_max_size(struct net_device *dev,
 	WRITE_ONCE(dev->gso_ipv6_max_size, size);
 }
 
+static inline void netif_set_gro_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	/* This pairs with the READ_ONCE() in skb_gro_receive() */
+	WRITE_ONCE(dev->gro_ipv6_max_size, size);
+}
+
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
 					int mac_len)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 048a9c848a3a39596b6c3135553fdfb9a1fe37d2..9baa084fe2c6762b05029c4692cfd9c4646bb916 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -365,6 +365,7 @@ enum {
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
 	IFLA_GSO_IPV6_MAX_SIZE,
+	IFLA_GRO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 7dbedec0903279ece0cb1199969f732a4dc35cd2..64ec72e5fdec9a642226e3efdefb93ad2c1d134d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10463,6 +10463,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gro_max_size = GRO_MAX_SIZE;
 	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
 	dev->gso_ipv6_max_size = GSO_MAX_SIZE;
+	dev->gro_ipv6_max_size = GRO_MAX_SIZE;
 
 	dev->upper_level = 1;
 	dev->lower_level = 1;
diff --git a/net/core/gro.c b/net/core/gro.c
index ee5e7e889d8bdd8db18715afc7bb6c1c759c9c23..f795393a883b08d71bfcfbd2d897e1ddcddf6fce 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -136,11 +136,27 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	unsigned int new_truesize;
 	struct sk_buff *lp;
 
+	if (unlikely(NAPI_GRO_CB(skb)->flush))
+		return -E2BIG;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
 
-	if (unlikely(p->len + len >= gro_max_size || NAPI_GRO_CB(skb)->flush))
-		return -E2BIG;
+	if (unlikely(p->len + len >= gro_max_size)) {
+		/* pairs with WRITE_ONCE() in netif_set_gro_ipv6_max_size() */
+		unsigned int gro6_max_size = READ_ONCE(p->dev->gro_ipv6_max_size);
+
+		if (gro6_max_size > gro_max_size &&
+		    p->protocol == htons(ETH_P_IPV6) &&
+		    skb_headroom(p) >= sizeof(struct hop_jumbo_hdr) &&
+		    ipv6_hdr(p)->nexthdr == IPPROTO_TCP &&
+		    !p->encapsulation)
+			gro_max_size = gro6_max_size;
+
+		if (p->len + len >= gro_max_size)
+			return -E2BIG;
+	}
+
 
 	lp = NAPI_GRO_CB(p)->last;
 	pinfo = skb_shinfo(lp);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a60efa6d0fac1b9ce209126bad946a3d2bd24ac3..48158119c6d24ef3d16b1cff80c49525bd51678c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1029,6 +1029,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GSO_IPV6_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_GRO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1736,6 +1737,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
 	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
 	    nla_put_u32(skb, IFLA_GSO_IPV6_MAX_SIZE, dev->gso_ipv6_max_size) ||
+	    nla_put_u32(skb, IFLA_GRO_IPV6_MAX_SIZE, dev->gro_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1891,6 +1893,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
 	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 	[IFLA_GSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_GRO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2786,6 +2789,15 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 	}
 
+	if (tb[IFLA_GRO_IPV6_MAX_SIZE]) {
+		u32 max_size = nla_get_u32(tb[IFLA_GRO_IPV6_MAX_SIZE]);
+
+		if (dev->gro_ipv6_max_size ^ max_size) {
+			netif_set_gro_ipv6_max_size(dev, max_size);
+			status |= DO_SETLINK_MODIFIED;
+		}
+	}
+
 	if (tb[IFLA_GSO_MAX_SEGS]) {
 		u32 max_segs = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]);
 
@@ -3264,6 +3276,9 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 	if (tb[IFLA_GSO_IPV6_MAX_SIZE])
 		netif_set_gso_ipv6_max_size(dev,
 			nla_get_u32(tb[IFLA_GSO_IPV6_MAX_SIZE]));
+	if (tb[IFLA_GRO_IPV6_MAX_SIZE])
+		netif_set_gro_ipv6_max_size(dev,
+			nla_get_u32(tb[IFLA_GRO_IPV6_MAX_SIZE]));
 
 	return dev;
 }
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index e40cd575607872d3bff3bc1971df8c6426290562..567008925a8be6900aa048c7ebb12684b2eebb4b 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -350,6 +350,7 @@ enum {
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
 	IFLA_GSO_IPV6_MAX_SIZE,
+	IFLA_GRO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (6 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-04  4:33   ` David Ahern
  2022-03-05 16:55   ` David Ahern
  2022-03-03 18:16 ` [PATCH v2 net-next 09/14] net: loopback: enable BIG TCP packets Eric Dumazet
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Instead of simply forcing a 0 payload_len in the IPv6 header,
implement RFC 2675 and insert a custom extension header.

Note that, currently, only the TCP stack potentially generates
jumbograms, and that this extension header is purely local:
it won't be sent on a physical link.

This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/ipv6.h  |  1 +
 net/ipv6/ip6_output.c | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 16870f86c74d3d1f5dfb7edac1e7db85f1ef6755..93b273db1c9926aba4199f486ce90778311916f5 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -144,6 +144,7 @@ struct inet6_skb_parm {
 #define IP6SKB_L3SLAVE         64
 #define IP6SKB_JUMBOGRAM      128
 #define IP6SKB_SEG6	      256
+#define IP6SKB_FAKEJUMBO      512
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 50db9b20d746bc59c7ef7114492db8b9585c575b..38a8e1c9894cd99ecbec5968fcc97549ea0c7508 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -180,7 +180,9 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
 #endif
 
 	mtu = ip6_skb_dst_mtu(skb);
-	if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
+	if (skb_is_gso(skb) &&
+	    !(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
+	    !skb_gso_validate_network_len(skb, mtu))
 		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
 
 	if ((skb->len > mtu && !skb_is_gso(skb)) ||
@@ -251,6 +253,8 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct inet6_dev *idev = ip6_dst_idev(dst);
+	struct hop_jumbo_hdr *hop_jumbo;
+	int hoplen = sizeof(*hop_jumbo);
 	unsigned int head_room;
 	struct ipv6hdr *hdr;
 	u8  proto = fl6->flowi6_proto;
@@ -258,7 +262,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	int hlimit = -1;
 	u32 mtu;
 
-	head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dev);
+	head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev);
 	if (opt)
 		head_room += opt->opt_nflen + opt->opt_flen;
 
@@ -281,6 +285,20 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 					     &fl6->saddr);
 	}
 
+	if (unlikely(seg_len > IPV6_MAXPLEN)) {
+		hop_jumbo = skb_push(skb, hoplen);
+
+		hop_jumbo->nexthdr = proto;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen);
+
+		proto = IPPROTO_HOPOPTS;
+		seg_len = 0;
+		IP6CB(skb)->flags |= IP6SKB_FAKEJUMBO;
+	}
+
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	hdr = ipv6_hdr(skb);
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 09/14] net: loopback: enable BIG TCP packets
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (7 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-03 18:16 ` [PATCH v2 net-next 10/14] bonding: update dev->tso_ipv6_max_size Eric Dumazet
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Set the driver limit to 512 KB per TSO ipv6 packet.

This allows the admin/user to set a GSO ipv6 limit up to this value.

Tested:

ip link set dev lo gso_ipv6_max_size 200000
netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &

tcpdump shows :

18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/loopback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 720394c0639b20a2fd6262e4ee9d5813c02802f1..9c21d18f0aa75a310ac600081b450f6312ff16fc 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -191,6 +191,8 @@ static void gen_lo_setup(struct net_device *dev,
 	dev->netdev_ops		= dev_ops;
 	dev->needs_free_netdev	= true;
 	dev->priv_destructor	= dev_destructor;
+
+	netif_set_tso_ipv6_max_size(dev, 512 * 1024);
 }
 
 /* The loopback device is special. There is only one instance
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 10/14] bonding: update dev->tso_ipv6_max_size
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (8 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 09/14] net: loopback: enable BIG TCP packets Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-03 18:16 ` [PATCH v2 net-next 11/14] macvlan: enable BIG TCP Packets Eric Dumazet
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Use the minimal value found in the set of lower devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/bonding/bond_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 55e0ba2a163d0d9c17fdaf47a49d7a2190959651..357188c1f00e6e3919740adb6369d75712fc4e64 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1420,6 +1420,7 @@ static void bond_compute_features(struct bonding *bond)
 	struct slave *slave;
 	unsigned short max_hard_header_len = ETH_HLEN;
 	unsigned int gso_max_size = GSO_MAX_SIZE;
+	unsigned int tso_ipv6_max_size = ~0U;
 	u16 gso_max_segs = GSO_MAX_SEGS;
 
 	if (!bond_has_slaves(bond))
@@ -1450,6 +1451,7 @@ static void bond_compute_features(struct bonding *bond)
 			max_hard_header_len = slave->dev->hard_header_len;
 
 		gso_max_size = min(gso_max_size, slave->dev->gso_max_size);
+		tso_ipv6_max_size = min(tso_ipv6_max_size, slave->dev->tso_ipv6_max_size);
 		gso_max_segs = min(gso_max_segs, slave->dev->gso_max_segs);
 	}
 	bond_dev->hard_header_len = max_hard_header_len;
@@ -1465,6 +1467,7 @@ static void bond_compute_features(struct bonding *bond)
 	bond_dev->mpls_features = mpls_features;
 	netif_set_gso_max_segs(bond_dev, gso_max_segs);
 	netif_set_gso_max_size(bond_dev, gso_max_size);
+	netif_set_tso_ipv6_max_size(bond_dev, tso_ipv6_max_size);
 
 	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
 	if ((bond_dev->priv_flags & IFF_XMIT_DST_RELEASE_PERM) &&
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 11/14] macvlan: enable BIG TCP Packets
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (9 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 10/14] bonding: update dev->tso_ipv6_max_size Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-03 18:16 ` [PATCH v2 net-next 12/14] ipvlan: " Eric Dumazet
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Inherit tso_ipv6_max_size from lower device.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/macvlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index d87c06c317ede4d757b10722a258668d24a25f1d..d921cd84b23818c3d4ea88134c77a2365e6d9caa 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -902,6 +902,7 @@ static int macvlan_init(struct net_device *dev)
 	dev->hw_enc_features    |= dev->features;
 	netif_set_gso_max_size(dev, lowerdev->gso_max_size);
 	netif_set_gso_max_segs(dev, lowerdev->gso_max_segs);
+	netif_set_tso_ipv6_max_size(dev, lowerdev->tso_ipv6_max_size);
 	dev->hard_header_len	= lowerdev->hard_header_len;
 	macvlan_set_lockdep_class(dev);
 
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 12/14] ipvlan: enable BIG TCP Packets
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (10 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 11/14] macvlan: enable BIG TCP Packets Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-03 18:16 ` [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
  2022-03-03 18:16 ` [PATCH v2 net-next 14/14] mlx5: " Eric Dumazet
  13 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Inherit tso_ipv6_max_size from physical device.

Tested:

eth0 tso_ipv6_max_size is set to 524288

ip link add link eth0 name ipvl1 type ipvlan
ip -d link show ipvl1
10: ipvl1@eth0:...
	ipvlan  mode l3 bridge addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 gso_ipv6_max_size 65535 tso_ipv6_max_size 524288 gro_ipv6_max_size 65536

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 696e245f6d009d4d5d4a9c3523e4aa1e5d0f8bb6..4de30df25f19b32a78a06d18c99e94662307b7fb 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -141,6 +141,7 @@ static int ipvlan_init(struct net_device *dev)
 	dev->hw_enc_features |= dev->features;
 	netif_set_gso_max_size(dev, phy_dev->gso_max_size);
 	netif_set_gso_max_segs(dev, phy_dev->gso_max_segs);
+	netif_set_tso_ipv6_max_size(dev, phy_dev->tso_ipv6_max_size);
 	dev->hard_header_len = phy_dev->hard_header_len;
 
 	netdev_lockdep_set_classes(dev);
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (11 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 12/14] ipvlan: " Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-08 16:03   ` Tariq Toukan
  2022-03-03 18:16 ` [PATCH v2 net-next 14/14] mlx5: " Eric Dumazet
  13 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet, Tariq Toukan

From: Eric Dumazet <edumazet@google.com>

mlx4 supports LSOv2 just fine.

The IPv6 stack inserts a temporary Hop-by-Hop header
with a JUMBO TLV for big packets.

We need to ignore the HBH header when populating the TX descriptor.

Tested:

Before: (not enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_ipv6_max_size 65536 gro_ipv6_max_size 65536

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   6591.45  0.86   1.34   62.490  97.446
262144 540000

After: (enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_ipv6_max_size 185000 gro_ipv6_max_size 185000

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   8383.95  0.95   1.01   54.432  57.584
262144 540000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 ++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++++++++----
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c61dc7ae0c056a4dbcf24297549f6b1b5cc25d92..76cb93f5e5240c54f6f4c57e39739376206b4f34 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3417,6 +3417,9 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	dev->min_mtu = ETH_MIN_MTU;
 	dev->max_mtu = priv->max_mtu;
 
+	/* supports LSOv2 packets, 512KB limit has been tested. */
+	netif_set_tso_ipv6_max_size(dev, 512 * 1024);
+
 	mdev->pndev[port] = dev;
 	mdev->upper[port] = NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 817f4154b86d599cd593876ec83529051d95fe2f..c89b3e8094e7d8cfb11aaa6cc4ad63bf3ad5934e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -44,6 +44,7 @@
 #include <linux/ipv6.h>
 #include <linux/moduleparam.h>
 #include <linux/indirect_call_wrapper.h>
+#include <net/ipv6.h>
 
 #include "mlx4_en.h"
 
@@ -635,19 +636,28 @@ static int get_real_size(const struct sk_buff *skb,
 			 struct net_device *dev,
 			 int *lso_header_size,
 			 bool *inline_ok,
-			 void **pfrag)
+			 void **pfrag,
+			 int *hopbyhop)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int real_size;
 
 	if (shinfo->gso_size) {
 		*inline_ok = false;
-		if (skb->encapsulation)
+		*hopbyhop = 0;
+		if (skb->encapsulation) {
 			*lso_header_size = (skb_inner_transport_header(skb) - skb->data) + inner_tcp_hdrlen(skb);
-		else
+		} else {
+			/* Detects large IPV6 TCP packets and prepares for removal of
+			 * HBH header that has been pushed by ip6_xmit(),
+			 * mainly so that tcpdump can dissect them.
+			 */
+			if (ipv6_has_hopopt_jumbo(skb))
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
 			*lso_header_size = skb_transport_offset(skb) + tcp_hdrlen(skb);
+		}
 		real_size = CTRL_SIZE + shinfo->nr_frags * DS_SIZE +
-			ALIGN(*lso_header_size + 4, DS_SIZE);
+			ALIGN(*lso_header_size - *hopbyhop + 4, DS_SIZE);
 		if (unlikely(*lso_header_size != skb_headlen(skb))) {
 			/* We add a segment for the skb linear buffer only if
 			 * it contains data */
@@ -874,6 +884,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	int desc_size;
 	int real_size;
 	u32 index, bf_index;
+	struct ipv6hdr *h6;
 	__be32 op_own;
 	int lso_header_size;
 	void *fragptr = NULL;
@@ -882,6 +893,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool stop_queue;
 	bool inline_ok;
 	u8 data_offset;
+	int hopbyhop;
 	bool bf_ok;
 
 	tx_ind = skb_get_queue_mapping(skb);
@@ -891,7 +903,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto tx_drop;
 
 	real_size = get_real_size(skb, shinfo, dev, &lso_header_size,
-				  &inline_ok, &fragptr);
+				  &inline_ok, &fragptr, &hopbyhop);
 	if (unlikely(!real_size))
 		goto tx_drop_count;
 
@@ -944,7 +956,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		data = &tx_desc->data;
 		data_offset = offsetof(struct mlx4_en_tx_desc, data);
 	} else {
-		int lso_align = ALIGN(lso_header_size + 4, DS_SIZE);
+		int lso_align = ALIGN(lso_header_size - hopbyhop + 4, DS_SIZE);
 
 		data = (void *)&tx_desc->lso + lso_align;
 		data_offset = offsetof(struct mlx4_en_tx_desc, lso) + lso_align;
@@ -1009,14 +1021,31 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			((ring->prod & ring->size) ?
 				cpu_to_be32(MLX4_EN_BIT_DESC_OWN) : 0);
 
+		lso_header_size -= hopbyhop;
 		/* Fill in the LSO prefix */
 		tx_desc->lso.mss_hdr_size = cpu_to_be32(
 			shinfo->gso_size << 16 | lso_header_size);
 
-		/* Copy headers;
-		 * note that we already verified that it is linear */
-		memcpy(tx_desc->lso.header, skb->data, lso_header_size);
 
+		if (unlikely(hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(tx_desc->lso.header, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)tx_desc->lso.header + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			/* Copy headers;
+			 * note that we already verified that it is linear
+			 */
+			memcpy(tx_desc->lso.header, skb->data, lso_header_size);
+		}
 		ring->tso_packets++;
 
 		i = shinfo->gso_segs;
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (12 preceding siblings ...)
  2022-03-03 18:16 ` [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
@ 2022-03-03 18:16 ` Eric Dumazet
  2022-03-04  4:42   ` David Ahern
  2022-03-08 16:02   ` Tariq Toukan
  13 siblings, 2 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-03 18:16 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Eric Dumazet, Saeed Mahameed, Leon Romanovsky

From: Coco Li <lixiaoyan@google.com>

mlx5 supports LSOv2.

IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
with JUMBO TLV for big packets.

We need to ignore/skip this HBH header when populating TX descriptor.

Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.

v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 82 +++++++++++++++----
 2 files changed, 67 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
+	netif_set_tso_ipv6_max_size(netdev, 512 * 1024);
 	mlx5e_set_netdev_dev_addr(netdev);
 	mlx5e_ipsec_build_netdev(priv);
 	mlx5e_tls_build_netdev(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2dc48406cd08d21ff94f665cd61ab9227f351215..c6f6ca2d216692e1d3fd99e540198b11145788cd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -40,6 +40,7 @@
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec_rxtx.h"
 #include "en/ptp.h"
+#include <net/ipv6.h>
 
 static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma)
 {
@@ -130,23 +131,32 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		sq->stats->csum_none++;
 }
 
+/* Returns the number of header bytes that we plan
+ * to inline later in the transmit descriptor
+ */
 static inline u16
-mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb)
+mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb, int *hopbyhop)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
 	u16 ihs;
 
+	*hopbyhop = 0;
 	if (skb->encapsulation) {
 		ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
 		stats->tso_inner_packets++;
 		stats->tso_inner_bytes += skb->len - ihs;
 	} else {
-		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
 			ihs = skb_transport_offset(skb) + sizeof(struct udphdr);
-		else
+		} else {
 			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
+			if (ipv6_has_hopopt_jumbo(skb)) {
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
+				ihs -= sizeof(struct hop_jumbo_hdr);
+			}
+		}
 		stats->tso_packets++;
-		stats->tso_bytes += skb->len - ihs;
+		stats->tso_bytes += skb->len - ihs - *hopbyhop;
 	}
 
 	return ihs;
@@ -208,6 +218,7 @@ struct mlx5e_tx_attr {
 	__be16 mss;
 	u16 insz;
 	u8 opcode;
+	u8 hopbyhop;
 };
 
 struct mlx5e_tx_wqe_attr {
@@ -244,14 +255,16 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5e_sq_stats *stats = sq->stats;
 
 	if (skb_is_gso(skb)) {
-		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb);
+		int hopbyhop;
+		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
 
 		*attr = (struct mlx5e_tx_attr) {
 			.opcode    = MLX5_OPCODE_LSO,
 			.mss       = cpu_to_be16(skb_shinfo(skb)->gso_size),
 			.ihs       = ihs,
 			.num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs,
-			.headlen   = skb_headlen(skb) - ihs,
+			.headlen   = skb_headlen(skb) - ihs - hopbyhop,
+			.hopbyhop  = hopbyhop,
 		};
 
 		stats->packets += skb_shinfo(skb)->gso_segs;
@@ -365,7 +378,8 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5_wqe_eth_seg  *eseg;
 	struct mlx5_wqe_data_seg *dseg;
 	struct mlx5e_tx_wqe_info *wi;
-
+	u16 ihs = attr->ihs;
+	struct ipv6hdr *h6;
 	struct mlx5e_sq_stats *stats = sq->stats;
 	int num_dma;
 
@@ -379,15 +393,36 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	eseg->mss = attr->mss;
 
-	if (attr->ihs) {
-		if (skb_vlan_tag_present(skb)) {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs + VLAN_HLEN);
-			mlx5e_insert_vlan(eseg->inline_hdr.start, skb, attr->ihs);
+	if (ihs) {
+		u8 *start = eseg->inline_hdr.start;
+
+		if (unlikely(attr->hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			if (skb_vlan_tag_present(skb)) {
+				mlx5e_insert_vlan(start, skb, ETH_HLEN + sizeof(*h6));
+				ihs += VLAN_HLEN;
+				h6 = (struct ipv6hdr *)(start + sizeof(struct vlan_ethhdr));
+			} else {
+				memcpy(start, skb->data, ETH_HLEN + sizeof(*h6));
+				h6 = (struct ipv6hdr *)(start + ETH_HLEN);
+			}
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else if (skb_vlan_tag_present(skb)) {
+			mlx5e_insert_vlan(start, skb, ihs);
+			ihs += VLAN_HLEN;
 			stats->added_vlan_packets++;
 		} else {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs);
-			memcpy(eseg->inline_hdr.start, skb->data, attr->ihs);
+			memcpy(start, skb->data, ihs);
 		}
+		eseg->inline_hdr.sz |= cpu_to_be16(ihs);
 		dseg += wqe_attr->ds_cnt_inl;
 	} else if (skb_vlan_tag_present(skb)) {
 		eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN);
@@ -398,7 +433,7 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	}
 
 	dseg += wqe_attr->ds_cnt_ids;
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs + attr->hopbyhop,
 					  attr->headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
@@ -918,12 +953,27 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	eseg->mss = attr.mss;
 
 	if (attr.ihs) {
-		memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		if (unlikely(attr.hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(eseg->inline_hdr.start, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)eseg->inline_hdr.start + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		}
 		eseg->inline_hdr.sz = cpu_to_be16(attr.ihs);
 		dseg += wqe_attr.ds_cnt_inl;
 	}
 
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs + attr.hopbyhop,
 					  attr.headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-03 18:16 ` [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
@ 2022-03-04  4:33   ` David Ahern
  2022-03-04 15:48     ` Alexander H Duyck
  2022-03-04 17:47     ` Eric Dumazet
  2022-03-05 16:55   ` David Ahern
  1 sibling, 2 replies; 36+ messages in thread
From: David Ahern @ 2022-03-04  4:33 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Alexander Duyck

On 3/3/22 11:16 AM, Eric Dumazet wrote:
> From: Coco Li <lixiaoyan@google.com>
> 
> Instead of simply forcing a 0 payload_len in IPv6 header,
> implement RFC 2675 and insert a custom extension header.
> 
> Note that only TCP stack is currently potentially generating
> jumbograms, and that this extension header is purely local,
> it wont be sent on a physical link.
> 
> This is needed so that packet capture (tcpdump and friends)
> can properly dissect these large packets.
> 


I am fairly certain I know how you are going to respond, but I will ask
this anyways :-) :

The networking stack as it stands today does not care that skb->len >
64kB and nothing stops a driver from setting max gso size to be > 64kB.
Sure, packet socket apps (tcpdump) get confused but if the h/w supports
the larger packet size it just works.

The jumbogram header is getting added at the L3/IPv6 layer and then
removed by the drivers before pushing to hardware. So, the only benefit
of the push and pop of the jumbogram header is for packet sockets and
tc/ebpf programs - assuming those programs understand the header
(tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
it is a standard header so apps have a chance to understand the larger
packet size, but what is the likelihood that random apps or even ebpf
programs will understand it?
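
For what it's worth, here is a minimal, purely illustrative sketch (not
part of this series; the helper name and hard-coded constants are made up
here) of what a packet-socket consumer that does understand RFC 2675 would
need in order to recover the real payload length:

	#include <stdint.h>
	#include <string.h>
	#include <arpa/inet.h>		/* ntohs()/ntohl() */
	#include <linux/ipv6.h>		/* struct ipv6hdr */

	/* Assumes the Hop-by-Hop header, when present, immediately follows
	 * the IPv6 header (the only layout this series generates), and that
	 * at least 8 bytes of it are available in the capture buffer.
	 */
	static uint32_t example_payload_len(const struct ipv6hdr *ip6)
	{
		const uint8_t *hbh = (const uint8_t *)(ip6 + 1);
		uint32_t jumbo;

		if (ip6->payload_len == 0 &&
		    ip6->nexthdr == 0 /* NEXTHDR_HOP */ &&
		    hbh[2] == 0xC2 /* IPV6_TLV_JUMBO */ && hbh[3] == 4) {
			memcpy(&jumbo, hbh + 4, sizeof(jumbo));
			return ntohl(jumbo);	/* real payload length */
		}
		return ntohs(ip6->payload_len);
	}

A consumer without this logic will keep seeing payload_len == 0 for these
packets.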

Alternative solutions to the packet socket (ebpf programs have access to
skb->len) problem would allow IPv4 to join the Big TCP party. I am
wondering how feasible an alternative solution is to get large packet
sizes across the board with less overhead and changes.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE
  2022-03-03 18:16 ` [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
@ 2022-03-04  4:37   ` David Ahern
  2022-03-04 17:16     ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: David Ahern @ 2022-03-04  4:37 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Alexander Duyck

On 3/3/22 11:16 AM, Eric Dumazet wrote:
> From: Coco Li <lixiaoyan@google.com>
> 
> Enable GRO to have IPv6 specific limit for max packet size.
> 
> This patch introduces new dev->gro_ipv6_max_size
> that is modifiable through ip link.
> 
> ip link set dev eth0 gro_ipv6_max_size 185000
> 
> Note that this value is only considered if bigger than
> gro_max_size, and for non encapsulated TCP/ipv6 packets.
> 

What is the point of a max size for the Rx path that is per ingress
device? If the stack understands the larger packets, then the ingress
device limits should not matter. (Yes, I realize the existing code has
it this way, so I guess this is a historical question.)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-03 18:16 ` [PATCH v2 net-next 14/14] mlx5: " Eric Dumazet
@ 2022-03-04  4:42   ` David Ahern
  2022-03-04 17:14     ` Eric Dumazet
  2022-03-08 16:02   ` Tariq Toukan
  1 sibling, 1 reply; 36+ messages in thread
From: David Ahern @ 2022-03-04  4:42 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Alexander Duyck, Saeed Mahameed,
	Leon Romanovsky

On 3/3/22 11:16 AM, Eric Dumazet wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>  
>  	netdev->priv_flags       |= IFF_UNICAST_FLT;
>  
> +	netif_set_tso_ipv6_max_size(netdev, 512 * 1024);


How does the ConnectX hardware handle fairness for such large packet
sizes? For a 1500 MTU this means a single large TSO can cause the H/W to
generate 349 MTU-sized packets. Even a 4k MTU means 128 packets. This
has an effect on the rate of packets hitting the next-hop switch, for
example.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04  4:33   ` David Ahern
@ 2022-03-04 15:48     ` Alexander H Duyck
  2022-03-04 17:09       ` Eric Dumazet
  2022-03-04 17:47     ` Eric Dumazet
  1 sibling, 1 reply; 36+ messages in thread
From: Alexander H Duyck @ 2022-03-04 15:48 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Alexander Duyck

On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> > 
> > Instead of simply forcing a 0 payload_len in IPv6 header,
> > implement RFC 2675 and insert a custom extension header.
> > 
> > Note that only TCP stack is currently potentially generating
> > jumbograms, and that this extension header is purely local,
> > it wont be sent on a physical link.
> > 
> > This is needed so that packet capture (tcpdump and friends)
> > can properly dissect these large packets.
> > 
> 
> 
> I am fairly certain I know how you are going to respond, but I will ask
> this anyways :-) :
> 
> The networking stack as it stands today does not care that skb->len >
> 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> the larger packet size it just works.
> 
> The jumbogram header is getting adding at the L3/IPv6 layer and then
> removed by the drivers before pushing to hardware. So, the only benefit
> of the push and pop of the jumbogram header is for packet sockets and
> tc/ebpf programs - assuming those programs understand the header
> (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> it is a standard header so apps have a chance to understand the larger
> packet size, but what is the likelihood that random apps or even ebpf
> programs will understand it?
> 
> Alternative solutions to the packet socket (ebpf programs have access to
> skb->len) problem would allow IPv4 to join the Big TCP party. I am
> wondering how feasible an alternative solution is to get large packet
> sizes across the board with less overhead and changes.

I agree that the header insertion and removal seems like a lot of extra
overhead for the sake of correctness. In the Microsoft case I am pretty
sure their LSOv2 supported both v4 and v6. I think we could do
something similar, we would just need to make certain the device
supports it and as such maybe it would make sense to implement it as a
gso type flag?

Could we handle the length field like we handle the checksum and place
a value in there that we know is wrong, but could be used to provide
additional data? Perhaps we could even use it to store the MSS in the
form of the length of the first packet so if examined, the packet would
look like the first frame of the flow with a set of trailing data.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04 15:48     ` Alexander H Duyck
@ 2022-03-04 17:09       ` Eric Dumazet
  2022-03-04 19:00         ` Alexander H Duyck
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-04 17:09 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: David Ahern, Eric Dumazet, David S . Miller, Jakub Kicinski,
	netdev, Coco Li, Alexander Duyck

On Fri, Mar 4, 2022 at 7:48 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> > On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > > From: Coco Li <lixiaoyan@google.com>
> > >
> > > Instead of simply forcing a 0 payload_len in IPv6 header,
> > > implement RFC 2675 and insert a custom extension header.
> > >
> > > Note that only TCP stack is currently potentially generating
> > > jumbograms, and that this extension header is purely local,
> > > it wont be sent on a physical link.
> > >
> > > This is needed so that packet capture (tcpdump and friends)
> > > can properly dissect these large packets.
> > >
> >
> >
> > I am fairly certain I know how you are going to respond, but I will ask
> > this anyways :-) :
> >
> > The networking stack as it stands today does not care that skb->len >
> > 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> > Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> > the larger packet size it just works.
> >
> > The jumbogram header is getting adding at the L3/IPv6 layer and then
> > removed by the drivers before pushing to hardware. So, the only benefit
> > of the push and pop of the jumbogram header is for packet sockets and
> > tc/ebpf programs - assuming those programs understand the header
> > (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> > it is a standard header so apps have a chance to understand the larger
> > packet size, but what is the likelihood that random apps or even ebpf
> > programs will understand it?
> >
> > Alternative solutions to the packet socket (ebpf programs have access to
> > skb->len) problem would allow IPv4 to join the Big TCP party. I am
> > wondering how feasible an alternative solution is to get large packet
> > sizes across the board with less overhead and changes.
>
> I agree that the header insertion and removal seems like a lot of extra
> overhead for the sake of correctness. In the Microsoft case I am pretty
> sure their LSOv2 supported both v4 and v6. I think we could do
> something similar, we would just need to make certain the device
> supports it and as such maybe it would make sense to implement it as a
> gso type flag?
>
> Could we handle the length field like we handle the checksum and place
> a value in there that we know is wrong, but could be used to provide
> additional data? Perhaps we could even use it to store the MSS in the
> form of the length of the first packet so if examined, the packet would
> look like the first frame of the flow with a set of trailing data.
>

I am a bit sad you did not give all this feedback back in August when
I presented BIG TCP.

We did a lot of work in the last 6 months to implement and test all this,
making sure it worked.

I am not sure I want to spend another 6 months implementing what you suggest.

For instance, the input path will not like packets larger than 64KB.

There is this thing trimming padding bytes; you probably do not want
to mess with that.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-04  4:42   ` David Ahern
@ 2022-03-04 17:14     ` Eric Dumazet
  2022-03-05 16:36       ` David Ahern
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-04 17:14 UTC (permalink / raw)
  To: David Ahern
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck, Saeed Mahameed, Leon Romanovsky

On Thu, Mar 3, 2022 at 8:43 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >
> >       netdev->priv_flags       |= IFF_UNICAST_FLT;
> >
> > +     netif_set_tso_ipv6_max_size(netdev, 512 * 1024);
>
>
> How does the ConnectX hardware handle fairness for such large packet
> sizes? For 1500 MTU this means a single large TSO can cause the H/W to
> generate 349 MTU sized packets. Even a 4k MTU means 128 packets. This
> has an effect on the rate of packets hitting the next hop switch for
> example.

I think ConnectX cards interleave packets from all TX queues; at least
old CX3 cards have a parameter to control that.

Given that we can already send at line rate from a single TX queue, I
do not see why presenting larger TSO packets
would change anything on the wire?

Do you think ConnectX adds an extra gap on the wire at the end of a TSO train?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE
  2022-03-04  4:37   ` David Ahern
@ 2022-03-04 17:16     ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-04 17:16 UTC (permalink / raw)
  To: David Ahern
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck

On Thu, Mar 3, 2022 at 8:37 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> >
> > Enable GRO to have IPv6 specific limit for max packet size.
> >
> > This patch introduces new dev->gro_ipv6_max_size
> > that is modifiable through ip link.
> >
> > ip link set dev eth0 gro_ipv6_max_size 185000
> >
> > Note that this value is only considered if bigger than
> > gro_max_size, and for non encapsulated TCP/ipv6 packets.
> >
>
> What is the point of a max size for the Rx path that is per ingress
> device? If the stack understands the larger packets then the ingress
> device limits should not matter. (yes, I realize the existing code has
> it this way, so I guess this is a historical question)

The point is really to opt in to this feature.

Some software stacks might not be ready yet.

For example, maybe you do not want to let GRO build skbs with a
frag_list, because you know these skbs might cause problems later.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04  4:33   ` David Ahern
  2022-03-04 15:48     ` Alexander H Duyck
@ 2022-03-04 17:47     ` Eric Dumazet
  2022-03-05 16:46       ` David Ahern
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-04 17:47 UTC (permalink / raw)
  To: David Ahern
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck

On Thu, Mar 3, 2022 at 8:33 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> >
> > Instead of simply forcing a 0 payload_len in IPv6 header,
> > implement RFC 2675 and insert a custom extension header.
> >
> > Note that only TCP stack is currently potentially generating
> > jumbograms, and that this extension header is purely local,
> > it wont be sent on a physical link.
> >
> > This is needed so that packet capture (tcpdump and friends)
> > can properly dissect these large packets.
> >
>
>
> I am fairly certain I know how you are going to respond, but I will ask
> this anyways :-) :
>
> The networking stack as it stands today does not care that skb->len >
> 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> the larger packet size it just works.

Observability is key. "just works" is a bold claim.

>
> The jumbogram header is getting adding at the L3/IPv6 layer and then
> removed by the drivers before pushing to hardware. So, the only benefit
> of the push and pop of the jumbogram header is for packet sockets and
> tc/ebpf programs - assuming those programs understand the header
> (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> it is a standard header so apps have a chance to understand the larger
> packet size, but what is the likelihood that random apps or even ebpf
> programs will understand it?

Can you explain to me what you are referring to by "random apps" exactly?
TCP does not expose any individual packet length to user space.



>
> Alternative solutions to the packet socket (ebpf programs have access to
> skb->len) problem would allow IPv4 to join the Big TCP party. I am
> wondering how feasible an alternative solution is to get large packet
> sizes across the board with less overhead and changes.

You know, I think I already answered this question 6 months ago.

We need extra metadata to carry how much TCP payload is in a packet,
on both the RX and TX sides.

Adding an skb field for that was not an option for me.

Adding an 8-byte header is basically free: the headers need to be in CPU caches
anyway when the header is added/removed.

This is zero cost on current CPUs, compared to the gains.

I think you are focusing on the TSO side, which is only 25% of the possible gains
that BIG TCP was seeking.

We covered both RX and TX with a common mechanism.
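
For reference, the 8-byte header in question is the Hop-by-Hop option from
RFC 2675, added in patch 04/14 as struct hop_jumbo_hdr (sketched here from
the RFC; see that patch for the exact definition):

	struct hop_jumbo_hdr {
		u8	nexthdr;
		u8	hdrlen;			/* 0: this header is 8 bytes */
		u8	tlv_type;		/* IPV6_TLV_JUMBO (0xC2) */
		u8	tlv_len;		/* 4 */
		__be32	jumbo_payload_len;	/* real payload length */
	};

It is pushed right after the IPv6 header when a big packet is built, and
stripped again before the packet reaches the wire.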

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04 17:09       ` Eric Dumazet
@ 2022-03-04 19:00         ` Alexander H Duyck
  2022-03-04 19:13           ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Alexander H Duyck @ 2022-03-04 19:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Ahern, Eric Dumazet, David S . Miller, Jakub Kicinski,
	netdev, Coco Li, Alexander Duyck

On Fri, 2022-03-04 at 09:09 -0800, Eric Dumazet wrote:
> On Fri, Mar 4, 2022 at 7:48 AM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> > 
> > On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> > > On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > > > From: Coco Li <lixiaoyan@google.com>
> > > > 
> > > > Instead of simply forcing a 0 payload_len in IPv6 header,
> > > > implement RFC 2675 and insert a custom extension header.
> > > > 
> > > > Note that only TCP stack is currently potentially generating
> > > > jumbograms, and that this extension header is purely local,
> > > > it wont be sent on a physical link.
> > > > 
> > > > This is needed so that packet capture (tcpdump and friends)
> > > > can properly dissect these large packets.
> > > > 
> > > 
> > > 
> > > I am fairly certain I know how you are going to respond, but I will ask
> > > this anyways :-) :
> > > 
> > > The networking stack as it stands today does not care that skb->len >
> > > 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> > > Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> > > the larger packet size it just works.
> > > 
> > > The jumbogram header is getting adding at the L3/IPv6 layer and then
> > > removed by the drivers before pushing to hardware. So, the only benefit
> > > of the push and pop of the jumbogram header is for packet sockets and
> > > tc/ebpf programs - assuming those programs understand the header
> > > (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> > > it is a standard header so apps have a chance to understand the larger
> > > packet size, but what is the likelihood that random apps or even ebpf
> > > programs will understand it?
> > > 
> > > Alternative solutions to the packet socket (ebpf programs have access to
> > > skb->len) problem would allow IPv4 to join the Big TCP party. I am
> > > wondering how feasible an alternative solution is to get large packet
> > > sizes across the board with less overhead and changes.
> > 
> > I agree that the header insertion and removal seems like a lot of extra
> > overhead for the sake of correctness. In the Microsoft case I am pretty
> > sure their LSOv2 supported both v4 and v6. I think we could do
> > something similar, we would just need to make certain the device
> > supports it and as such maybe it would make sense to implement it as a
> > gso type flag?
> > 
> > Could we handle the length field like we handle the checksum and place
> > a value in there that we know is wrong, but could be used to provide
> > additional data? Perhaps we could even use it to store the MSS in the
> > form of the length of the first packet so if examined, the packet would
> > look like the first frame of the flow with a set of trailing data.
> > 
> 
> I am a bit sad you did not give all this feedback back in August when
> I presented BIG TCP.
> 

As I recall, I was thinking along the same lines as what you have done
here, but Dave's question about including IPv4 does bring up an
interesting point. And the Microsoft version supported both.

> We did a lot of work in the last 6 months to implement, test all this,
> making sure this worked.
> 
> I am not sure I want to spend another 6 months implementing what you suggest.

I am not saying we have to do this. I am simply stating a "what if"
just to gauge this approach. You could think of it as thinking out
loud, but in written form.

> For instance, input path will not like packets larger than 64KB.
> 
> There is this thing trimming padding bytes, you probably do not want
> to mess with this.

I had overlooked the fact that this is being used on the input path;
the trimming would be an issue. I suppose the fact that the LSOv2
didn't have an Rx counterpart would be one reason for us not to
consider the IPv4 approach.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04 19:00         ` Alexander H Duyck
@ 2022-03-04 19:13           ` Eric Dumazet
  2022-03-05 16:53             ` David Ahern
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-04 19:13 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: David Ahern, Eric Dumazet, David S . Miller, Jakub Kicinski,
	netdev, Coco Li, Alexander Duyck

On Fri, Mar 4, 2022 at 11:00 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Fri, 2022-03-04 at 09:09 -0800, Eric Dumazet wrote:
> > On Fri, Mar 4, 2022 at 7:48 AM Alexander H Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Thu, 2022-03-03 at 21:33 -0700, David Ahern wrote:
> > > > On 3/3/22 11:16 AM, Eric Dumazet wrote:
> > > > > From: Coco Li <lixiaoyan@google.com>
> > > > >
> > > > > Instead of simply forcing a 0 payload_len in IPv6 header,
> > > > > implement RFC 2675 and insert a custom extension header.
> > > > >
> > > > > Note that only TCP stack is currently potentially generating
> > > > > jumbograms, and that this extension header is purely local,
> > > > > it wont be sent on a physical link.
> > > > >
> > > > > This is needed so that packet capture (tcpdump and friends)
> > > > > can properly dissect these large packets.
> > > > >
> > > >
> > > >
> > > > I am fairly certain I know how you are going to respond, but I will ask
> > > > this anyways :-) :
> > > >
> > > > The networking stack as it stands today does not care that skb->len >
> > > > 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> > > > Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> > > > the larger packet size it just works.
> > > >
> > > > The jumbogram header is getting adding at the L3/IPv6 layer and then
> > > > removed by the drivers before pushing to hardware. So, the only benefit
> > > > of the push and pop of the jumbogram header is for packet sockets and
> > > > tc/ebpf programs - assuming those programs understand the header
> > > > (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> > > > it is a standard header so apps have a chance to understand the larger
> > > > packet size, but what is the likelihood that random apps or even ebpf
> > > > programs will understand it?
> > > >
> > > > Alternative solutions to the packet socket (ebpf programs have access to
> > > > skb->len) problem would allow IPv4 to join the Big TCP party. I am
> > > > wondering how feasible an alternative solution is to get large packet
> > > > sizes across the board with less overhead and changes.
> > >
> > > I agree that the header insertion and removal seems like a lot of extra
> > > overhead for the sake of correctness. In the Microsoft case I am pretty
> > > sure their LSOv2 supported both v4 and v6. I think we could do
> > > something similar, we would just need to make certain the device
> > > supports it and as such maybe it would make sense to implement it as a
> > > gso type flag?
> > >
> > > Could we handle the length field like we handle the checksum and place
> > > a value in there that we know is wrong, but could be used to provide
> > > additional data? Perhaps we could even use it to store the MSS in the
> > > form of the length of the first packet so if examined, the packet would
> > > look like the first frame of the flow with a set of trailing data.
> > >
> >
> > I am a bit sad you did not give all this feedback back in August when
> > I presented BIG TCP.
> >
>
> As I recall, I was thinking along the same lines as what you have done
> here, but Dave's question about including IPv4 does bring up an
> interesting point. And the Microsoft version supported both.

Yes, maybe they added metadata for that and decided to leave packet capture
in the dark, or changed tcpdump/wireshark to fetch/use this metadata?

This was the first thing I tried one year ago, and I eventually gave up,
because this was a no-go for us.

Then, seeing HBH Jumbo support recently being added to tcpdump,
I understood we could finally get visibility, and started BIG TCP using this.

I guess someone might add extra logic to allow IPv4 BIG TCP, if they
really need it;
I will not object to it.

>
> > We did a lot of work in the last 6 months to implement, test all this,
> > making sure this worked.
> >
> > I am not sure I want to spend another 6 months implementing what you suggest.
>
> I am not saying we have to do this. I am simply stating a "what if"
> just to gauge this approach. You could think of it as thinking out
> loud, but in written form.

Understood.

BTW I spent time adding a new gso_type flag, but also gave up because we have
no more room in the features_t type.

Solving features_t exhaustion alone is a delicate topic.


>
> > For instance, input path will not like packets larger than 64KB.
> >
> > There is this thing trimming padding bytes, you probably do not want
> > to mess with this.
>
> I had overlooked the fact that this is being used on the input path,
> the trimming would be an issue. I suppose the fact that the LSOv2
> didn't have an Rx counterpart would be one reason for us to not
> consider the IPv4 approach.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition
  2022-03-03 18:15 ` [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
@ 2022-03-04 19:26   ` Alexander H Duyck
  2022-03-04 19:28     ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: Alexander H Duyck @ 2022-03-04 19:26 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck

On Thu, 2022-03-03 at 10:15 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Following patches will need to add and remove local IPv6 jumbogram
> options to enable BIG TCP.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/net/ipv6.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 213612f1680c7c39f4c07f0c05b4e6cf34a7878e..95f405cde9e539d7909b6b89af2b956655f38b94 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -151,6 +151,17 @@ struct frag_hdr {
>  	__be32	identification;
>  };
>  
> +/*
> + * Jumbo payload option, as described in RFC 2676 2.
> + */

The RFC number is 2675, isn't it?





^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition
  2022-03-04 19:26   ` Alexander H Duyck
@ 2022-03-04 19:28     ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-04 19:28 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	David Ahern, Alexander Duyck

On Fri, Mar 4, 2022 at 11:26 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Thu, 2022-03-03 at 10:15 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Following patches will need to add and remove local IPv6 jumbogram
> > options to enable BIG TCP.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  include/net/ipv6.h | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> >
> > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > index 213612f1680c7c39f4c07f0c05b4e6cf34a7878e..95f405cde9e539d7909b6b89af2b956655f38b94 100644
> > --- a/include/net/ipv6.h
> > +++ b/include/net/ipv6.h
> > @@ -151,6 +151,17 @@ struct frag_hdr {
> >       __be32  identification;
> >  };
> >
> > +/*
> > + * Jumbo payload option, as described in RFC 2676 2.
> > + */
>
> The RFC number is 2675 isn't it?
>

You are right, thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-04 17:14     ` Eric Dumazet
@ 2022-03-05 16:36       ` David Ahern
  2022-03-05 17:57         ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: David Ahern @ 2022-03-05 16:36 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck, Saeed Mahameed, Leon Romanovsky

On 3/4/22 10:14 AM, Eric Dumazet wrote:
> On Thu, Mar 3, 2022 at 8:43 PM David Ahern <dsahern@kernel.org> wrote:
>>
>> On 3/3/22 11:16 AM, Eric Dumazet wrote:
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>> @@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>>>
>>>       netdev->priv_flags       |= IFF_UNICAST_FLT;
>>>
>>> +     netif_set_tso_ipv6_max_size(netdev, 512 * 1024);
>>
>>
>> How does the ConnectX hardware handle fairness for such large packet
>> sizes? For 1500 MTU this means a single large TSO can cause the H/W to
>> generate 349 MTU sized packets. Even a 4k MTU means 128 packets. This
>> has an effect on the rate of packets hitting the next hop switch for
>> example.
> 
> I think ConnectX cards interleave packets from all TX queues, at least
> old CX3 have a parameter to control that.
> 
> Given that we already can send at line rate, from a single TX queue, I
> do not see why presenting larger TSO packets
> would change anything on the wire ?
> 
> Do you think ConnectX adds an extra gap on the wire at the end of a TSO train ?

It's not about 1 queue; my question was along several lines, e.g.:
1. The inter-packet gap for TSO-generated packets. With 512kB packets
the burst is 8x what it is today.

2. The fairness within the hardware, as 1 queue potentially has many 512kB
packets, and the impact on other queues (e.g., higher latency?) since it
will take longer to split the larger packets into MTU-sized packets.

It is really about understanding the impact this new default size is
going to have on users.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04 17:47     ` Eric Dumazet
@ 2022-03-05 16:46       ` David Ahern
  2022-03-05 18:08         ` Eric Dumazet
  0 siblings, 1 reply; 36+ messages in thread
From: David Ahern @ 2022-03-05 16:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck

On 3/4/22 10:47 AM, Eric Dumazet wrote:
> On Thu, Mar 3, 2022 at 8:33 PM David Ahern <dsahern@kernel.org> wrote:
>>
>> On 3/3/22 11:16 AM, Eric Dumazet wrote:
>>> From: Coco Li <lixiaoyan@google.com>
>>>
>>> Instead of simply forcing a 0 payload_len in IPv6 header,
>>> implement RFC 2675 and insert a custom extension header.
>>>
>>> Note that only TCP stack is currently potentially generating
>>> jumbograms, and that this extension header is purely local,
>>> it wont be sent on a physical link.
>>>
>>> This is needed so that packet capture (tcpdump and friends)
>>> can properly dissect these large packets.
>>>
>>
>>
>> I am fairly certain I know how you are going to respond, but I will ask
>> this anyways :-) :
>>
>> The networking stack as it stands today does not care that skb->len >
>> 64kB and nothing stops a driver from setting max gso size to be > 64kB.
>> Sure, packet socket apps (tcpdump) get confused but if the h/w supports
>> the larger packet size it just works.
> 
> Observability is key. "just works" is a bold claim.
> 
>>
>> The jumbogram header is getting adding at the L3/IPv6 layer and then
>> removed by the drivers before pushing to hardware. So, the only benefit
>> of the push and pop of the jumbogram header is for packet sockets and
>> tc/ebpf programs - assuming those programs understand the header
>> (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
>> it is a standard header so apps have a chance to understand the larger
>> packet size, but what is the likelihood that random apps or even ebpf
>> programs will understand it?
> 
> Can you explain to me what you are referring to by " random apps" exactly ?
> TCP does not expose to user space any individual packet length.

TCP apps are not affected; they do not have direct access to L3 headers.
This is about packet sockets and ebpf programs and their knowledge of
the HBH header. This does not seem like a widely used feature, and even
tcpdump only recently gained support for it (e.g., Ubuntu 20.04 does
not support it, 21.10 does). Given that, what are the odds that most packet
programs are affected by the change? And if they need support, we
could just as easily add that support in a way that gets both networking
layers working.

> 
> 
> 
>>
>> Alternative solutions to the packet socket (ebpf programs have access to
>> skb->len) problem would allow IPv4 to join the Big TCP party. I am
>> wondering how feasible an alternative solution is to get large packet
>> sizes across the board with less overhead and changes.
> 
> You know, I think I already answered this question 6 months ago.
> 
> We need to carry an extra metadata to carry how much TCP payload is in a packet,
> both on RX and TX side.
> 
> Adding an skb field for that was not an option for me.

Why? skb->len is not limited to a u16. The only effect is when skb->len
is used to fill in the IPv4/IPv6 header.

> 
> Adding a 8 bytes header is basically free, the headers need to be in cpu caches
> when the header is added/removed.
> 
> This is zero cost on current cpus, compared to the gains.
> 
> I think you focus on TSO side, which is only 25% of the possible gains
> that BIG TCP was seeking for.
> 
> We covered both RX and TX with a common mechanism.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-04 19:13           ` Eric Dumazet
@ 2022-03-05 16:53             ` David Ahern
  0 siblings, 0 replies; 36+ messages in thread
From: David Ahern @ 2022-03-05 16:53 UTC (permalink / raw)
  To: Eric Dumazet, Alexander H Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck

On 3/4/22 12:13 PM, Eric Dumazet wrote:
>> I am not saying we have to do this. I am simply stating a "what if"
>> just to gauge this approach. You could think of it as thinking out
>> loud, but in written form.

my point as well.

> 
> Understood.
> 
> BTW I spent time adding a new gso_type flag, but also gave up because we have
> no more room in features_t type.
> 
> Solving features_t exhaustion alone is a delicate topic.
> 
> 
>>
>>> For instance, input path will not like packets larger than 64KB.
>>>
>>> There is this thing trimming padding bytes, you probably do not want
>>> to mess with this.
>>
>> I had overlooked the fact that this is being used on the input path,
>> the trimming would be an issue. I suppose the fact that the LSOv2
>> didn't have an Rx counterpart would be one reason for us to not
>> consider the IPv4 approach.
>>

I'm aware of the trim on ingress; it can be properly handled. Drivers
(LRO) and the S/W GRO stack would know when it is exceeding the 64kB
length, so the skb can be marked as a large packet.

I am not trying to derail this set from getting merged. v6 has a
standard header for the large packet support, so certainly use it. That
said, it is always best in the long run for IPv4 and IPv6 to have
consistent feature support and implementation. Hence the question about
alternative solutions that work for both.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-03 18:16 ` [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
  2022-03-04  4:33   ` David Ahern
@ 2022-03-05 16:55   ` David Ahern
  1 sibling, 0 replies; 36+ messages in thread
From: David Ahern @ 2022-03-05 16:55 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, Alexander Duyck

On 3/3/22 11:16 AM, Eric Dumazet wrote:
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 16870f86c74d3d1f5dfb7edac1e7db85f1ef6755..93b273db1c9926aba4199f486ce90778311916f5 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -144,6 +144,7 @@ struct inet6_skb_parm {
>  #define IP6SKB_L3SLAVE         64
>  #define IP6SKB_JUMBOGRAM      128
>  #define IP6SKB_SEG6	      256
> +#define IP6SKB_FAKEJUMBO      512
>  };
>  

Why is this considered a FAKEJUMBO? The proper header is getting added,
correct?


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-05 16:36       ` David Ahern
@ 2022-03-05 17:57         ` Eric Dumazet
  0 siblings, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2022-03-05 17:57 UTC (permalink / raw)
  To: David Ahern
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck, Saeed Mahameed, Leon Romanovsky

On Sat, Mar 5, 2022 at 8:36 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 3/4/22 10:14 AM, Eric Dumazet wrote:
> > On Thu, Mar 3, 2022 at 8:43 PM David Ahern <dsahern@kernel.org> wrote:
> >>
> >> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> >>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >>> index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
> >>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> >>> @@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
> >>>
> >>>       netdev->priv_flags       |= IFF_UNICAST_FLT;
> >>>
> >>> +     netif_set_tso_ipv6_max_size(netdev, 512 * 1024);
> >>
> >>
> >> How does the ConnectX hardware handle fairness for such large packet
> >> sizes? For 1500 MTU this means a single large TSO can cause the H/W to
> >> generate 349 MTU sized packets. Even a 4k MTU means 128 packets. This
> >> has an effect on the rate of packets hitting the next hop switch for
> >> example.
> >
> > I think ConnectX cards interleave packets from all TX queues, at least
> > old CX3 have a parameter to control that.
> >
> > Given that we already can send at line rate, from a single TX queue, I
> > do not see why presenting larger TSO packets
> > would change anything on the wire ?
> >
> > Do you think ConnectX adds an extra gap on the wire at the end of a TSO train ?
>
> It's not about 1 queue, my question was along several lines. e.g,
> 1. the inter-packet gap for TSO generated packets. With 512kB packets
> the burst is 8x from what it is today.

We did experiments with 185 KB (or 45 4K segments in our case [1]),
and got no increase in drops.
We are deploying these limits.
[1] We increased MAX_SKB_FRAGS to 45, so that zero copy is possible for
both TX and RX.

Once your switches are 100Gbit rated, just send them 100Gbit traffic.

Note that Linux TCP already has a lot of burst-control and pacing features.

>
> 2. the fairness within hardware as 1 queue has potentially many 512kB
> packets and the impact on other queues (e.g., higher latency?) since it
> will take longer to split the larger packets into MTU sized packets.

It depends on the NIC. Many NICs (including mlx4) have a per-queue quantum,
usually configurable in power-of-two steps (4K, 8K, 16K, 32K, ...).

It means that one TSO packet is split into smaller chunks, depending on
concurrent eligible TX queues.

Our NIC of the day at Google has an MTU quantum per queue.

(This is one of the reasons I added
/sys/class/net/ethX/gro_flush_timeout: sending TSO packets
does not mean the receiver will receive this TSO as a single train
of received packets.)

>
> It is really about understanding the change this new default size is
> going to have on users.

Sure, but to be able to conduct experiments, and allow TCP congestion control
to probe for bigger bursts, we need the core to support bigger packets.

Then, one can precisely tune the max GSO size one wants, per
ethernet device,
if existing rate-limiting features really do not help.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-05 16:46       ` David Ahern
@ 2022-03-05 18:08         ` Eric Dumazet
  2022-03-05 19:06           ` David Ahern
  0 siblings, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2022-03-05 18:08 UTC (permalink / raw)
  To: David Ahern
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck

On Sat, Mar 5, 2022 at 8:46 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 3/4/22 10:47 AM, Eric Dumazet wrote:
> > On Thu, Mar 3, 2022 at 8:33 PM David Ahern <dsahern@kernel.org> wrote:
> >>
> >> On 3/3/22 11:16 AM, Eric Dumazet wrote:
> >>> From: Coco Li <lixiaoyan@google.com>
> >>>
> >>> Instead of simply forcing a 0 payload_len in IPv6 header,
> >>> implement RFC 2675 and insert a custom extension header.
> >>>
> >>> Note that only TCP stack is currently potentially generating
> >>> jumbograms, and that this extension header is purely local,
> >>> it wont be sent on a physical link.
> >>>
> >>> This is needed so that packet capture (tcpdump and friends)
> >>> can properly dissect these large packets.
> >>>
> >>
> >>
> >> I am fairly certain I know how you are going to respond, but I will ask
> >> this anyways :-) :
> >>
> >> The networking stack as it stands today does not care that skb->len >
> >> 64kB and nothing stops a driver from setting max gso size to be > 64kB.
> >> Sure, packet socket apps (tcpdump) get confused but if the h/w supports
> >> the larger packet size it just works.
> >
> > Observability is key. "just works" is a bold claim.
> >
> >>
> >> The jumbogram header is getting adding at the L3/IPv6 layer and then
> >> removed by the drivers before pushing to hardware. So, the only benefit
> >> of the push and pop of the jumbogram header is for packet sockets and
> >> tc/ebpf programs - assuming those programs understand the header
> >> (tcpdump (libpcap?) yes, random packet socket program maybe not). Yes,
> >> it is a standard header so apps have a chance to understand the larger
> >> packet size, but what is the likelihood that random apps or even ebpf
> >> programs will understand it?
> >
> > Can you explain to me what you are referring to by " random apps" exactly ?
> > TCP does not expose to user space any individual packet length.
>
> TCP apps are not affected; they do not have direct access to L3 headers.
> This is about packet sockets and ebpf programs and their knowledge of
> the HBH header. This does not seem like a widely used feature and even
> tcpdump only recently gained support for it (e.g.,  Ubuntu 20.04 does
> not support it, 21.10 does). Given that what are the odds most packet
> programs are affected by the change and if they need to have support we
> could just as easily add that support in a way that gets both networking
> layers working.
>
> >
> >
> >
> >>
> >> Alternative solutions to the packet socket (ebpf programs have access to
> >> skb->len) problem would allow IPv4 to join the Big TCP party. I am
> >> wondering how feasible an alternative solution is to get large packet
> >> sizes across the board with less overhead and changes.
> >
> > You know, I think I already answered this question 6 months ago.
> >
> > We need to carry an extra metadata to carry how much TCP payload is in a packet,
> > both on RX and TX side.
> >
> > Adding an skb field for that was not an option for me.
>
> Why? skb->len is not limited to a u16. The only affect is when skb->len
> is used to fill in the ipv4/ipv6 header.

Seriously?

Have you looked recently at the core networking stack, and have you read
my netdev presentation?

The core networking stack will trim your packet's skb->len based on what is
found in the network header,
which is a 16-bit field, unless you use HBH.

Look at ip6_rcv_core().
Do you want to modify it?
Let us know how exactly, and why it is not going to break things.

pkt_len = ntohs(hdr->payload_len);

/* pkt_len may be zero if Jumbo payload option is present */
if (pkt_len || hdr->nexthdr != NEXTHDR_HOP) {
    if (pkt_len + sizeof(struct ipv6hdr) > skb->len) {
        __IP6_INC_STATS(net, idev, IPSTATS_MIB_INTRUNCATEDPKTS);
        goto drop;
    }
    if (pskb_trim_rcsum(skb, pkt_len + sizeof(struct ipv6hdr))) {
        __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
        goto drop;
    }
    hdr = ipv6_hdr(skb);
}

if (hdr->nexthdr == NEXTHDR_HOP) {
    if (ipv6_parse_hopopts(skb) < 0) {
        __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
        rcu_read_unlock();
        return NULL;
    }
}







>
> >
> > Adding a 8 bytes header is basically free, the headers need to be in cpu caches
> > when the header is added/removed.
> >
> > This is zero cost on current cpus, compared to the gains.
> >
> > I think you focus on TSO side, which is only 25% of the possible gains
> > that BIG TCP was seeking for.
> >
> > We covered both RX and TX with a common mechanism.
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-05 18:08         ` Eric Dumazet
@ 2022-03-05 19:06           ` David Ahern
  0 siblings, 0 replies; 36+ messages in thread
From: David Ahern @ 2022-03-05 19:06 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li,
	Alexander Duyck

On 3/5/22 11:08 AM, Eric Dumazet wrote:
>>
>> Why? skb->len is not limited to a u16. The only affect is when skb->len
>> is used to fill in the ipv4/ipv6 header.
> 
> Seriously ?
> 
> Have you looked recently at core networking stack, and have you read
> my netdev presentation ?

yes I have.

> 
> Core networking stack will trim your packets skb->len based on what is
> found in the network header,

The core networking stack is software under our control; much more complicated
and intricate changes have been made.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-03 18:16 ` [PATCH v2 net-next 14/14] mlx5: " Eric Dumazet
  2022-03-04  4:42   ` David Ahern
@ 2022-03-08 16:02   ` Tariq Toukan
  1 sibling, 0 replies; 36+ messages in thread
From: Tariq Toukan @ 2022-03-08 16:02 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan



On 3/3/2022 8:16 PM, Eric Dumazet wrote:
> From: Coco Li <lixiaoyan@google.com>
> 
> mlx5 supports LSOv2.
> 
> IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> with JUMBO TLV for big packets.
> 
> We need to ignore/skip this HBH header when populating TX descriptor.
> 
> Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
> layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
> 
> v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
> 
> Signed-off-by: Coco Li <lixiaoyan@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Saeed Mahameed <saeedm@nvidia.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> ---
>   .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
>   .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 82 +++++++++++++++----
>   2 files changed, 67 insertions(+), 16 deletions(-)
> 
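
Roughly, skipping the HBH header when populating the TX descriptor amounts
to the following (a sketch only: example_copy_inline_headers() and its
arguments are hypothetical names, not the mlx5 code from this patch, and
ihs is assumed to already exclude the 8 HBH bytes, as the v2 note about
mlx5e_tx_get_gso_ihs() implies):

static void example_copy_inline_headers(struct sk_buff *skb, u8 *inl, int ihs)
{
    if (ipv6_has_hopopt_jumbo(skb)) {
        int pre = skb_network_offset(skb) + sizeof(struct ipv6hdr);
        struct ipv6hdr *h6;

        /* copy Ethernet + IPv6 headers as-is */
        memcpy(inl, skb->data, pre);
        /* skip the 8-byte HBH jumbo header, keep the TCP header */
        memcpy(inl + pre,
               skb->data + pre + sizeof(struct hop_jumbo_hdr),
               ihs - pre);
        /* the copied IPv6 header must now point straight at TCP */
        h6 = (struct ipv6hdr *)(inl + skb_network_offset(skb));
        h6->nexthdr = IPPROTO_TCP;
    } else {
        memcpy(inl, skb->data, ihs);
    }
}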

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

Thanks.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets
  2022-03-03 18:16 ` [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
@ 2022-03-08 16:03   ` Tariq Toukan
  0 siblings, 0 replies; 36+ messages in thread
From: Tariq Toukan @ 2022-03-08 16:03 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Coco Li, David Ahern, Alexander Duyck,
	Tariq Toukan



On 3/3/2022 8:16 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> mlx4 supports LSOv2 just fine.
> 
> IPv6 stack inserts a temporary Hop-by-Hop header
> with JUMBO TLV for big packets.
> 
> We need to ignore the HBH header when populating TX descriptor.
> 
> Tested:
> 
> Before: (not enabling bigger TSO/GRO packets)
> 
> ip link set dev eth0 gso_ipv6_max_size 65536 gro_ipv6_max_size 65536
> 
> netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
> MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
> Local /Remote
> Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr
> 
> 262144 540000 70000   70000  10.00   6591.45  0.86   1.34   62.490  97.446
> 262144 540000
> 
> After: (enabling bigger TSO/GRO packets)
> 
> ip link set dev eth0 gso_ipv6_max_size 185000 gro_ipv6_max_size 185000
> 
> netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
> MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
> Local /Remote
> Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr
> 
> 262144 540000 70000   70000  10.00   8383.95  0.95   1.01   54.432  57.584
> 262144 540000
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Tariq Toukan <tariqt@nvidia.com>
> ---
>   .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 ++
>   drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++++++++----
>   2 files changed, 41 insertions(+), 9 deletions(-)
> 

Reviewed-by: Tariq Toukan <tariqt@nvidia.com>

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2022-03-08 16:03 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-03 18:15 [PATCH v2 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
2022-03-03 18:15 ` [PATCH v2 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
2022-03-03 18:15 ` [PATCH v2 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
2022-03-03 18:15 ` [PATCH v2 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
2022-03-03 18:15 ` [PATCH v2 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
2022-03-04 19:26   ` Alexander H Duyck
2022-03-04 19:28     ` Eric Dumazet
2022-03-03 18:15 ` [PATCH v2 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
2022-03-03 18:15 ` [PATCH v2 net-next 06/14] ipv6/gro: insert " Eric Dumazet
2022-03-03 18:16 ` [PATCH v2 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
2022-03-04  4:37   ` David Ahern
2022-03-04 17:16     ` Eric Dumazet
2022-03-03 18:16 ` [PATCH v2 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
2022-03-04  4:33   ` David Ahern
2022-03-04 15:48     ` Alexander H Duyck
2022-03-04 17:09       ` Eric Dumazet
2022-03-04 19:00         ` Alexander H Duyck
2022-03-04 19:13           ` Eric Dumazet
2022-03-05 16:53             ` David Ahern
2022-03-04 17:47     ` Eric Dumazet
2022-03-05 16:46       ` David Ahern
2022-03-05 18:08         ` Eric Dumazet
2022-03-05 19:06           ` David Ahern
2022-03-05 16:55   ` David Ahern
2022-03-03 18:16 ` [PATCH v2 net-next 09/14] net: loopback: enable BIG TCP packets Eric Dumazet
2022-03-03 18:16 ` [PATCH v2 net-next 10/14] bonding: update dev->tso_ipv6_max_size Eric Dumazet
2022-03-03 18:16 ` [PATCH v2 net-next 11/14] macvlan: enable BIG TCP Packets Eric Dumazet
2022-03-03 18:16 ` [PATCH v2 net-next 12/14] ipvlan: " Eric Dumazet
2022-03-03 18:16 ` [PATCH v2 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
2022-03-08 16:03   ` Tariq Toukan
2022-03-03 18:16 ` [PATCH v2 net-next 14/14] mlx5: " Eric Dumazet
2022-03-04  4:42   ` David Ahern
2022-03-04 17:14     ` Eric Dumazet
2022-03-05 16:36       ` David Ahern
2022-03-05 17:57         ` Eric Dumazet
2022-03-08 16:02   ` Tariq Toukan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.