* [PATCH v4 net-next 00/14] tcp: BIG TCP implementation
@ 2022-03-10  5:46 Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

This series implements BIG TCP as presented in netdev 0x15:

https://netdevconf.info/0x15/session.html?BIG-TCP

Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/

The standard TSO/GRO packet limit is 64KB.

With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.

Note that this feature is not enabled by default, because it might
break some eBPF programs that assume the TCP header immediately
follows the IPv6 header.

While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
are unable to skip over IPv6 extension headers.
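
For reference, a BIG TCP packet inside the local stack has this layout
(a sketch based on the patches below; the temporary HBH header is
removed again before segmentation, or skipped by the few drivers that
understand it):

[Ethernet header][IPv6 header, payload_len=0][HBH/Jumbo option][TCP header][payload > 64KB]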

Reducing the number of packets traversing the networking stack usually
improves performance, as shown in this experiment using a 100Gbit NIC
and a 4K MTU.

'Standard' performance with current (74KB) limits:
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
(columns: MIN_LATENCY, P90_LATENCY, P99_LATENCY in usec; THROUGHPUT in transactions/sec)
77           138          183          8542.19    
79           143          178          8215.28    
70           117          164          9543.39    
80           144          176          8183.71    
78           126          155          9108.47    
80           146          184          8115.19    
71           113          165          9510.96    
74           113          164          9518.74    
79           137          178          8575.04    
73           111          171          9561.73    

Now enable BIG TCP on both hosts.

ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
57           83           117          13871.38   
64           118          155          11432.94   
65           116          148          11507.62   
60           105          136          12645.15   
60           103          135          12760.34   
60           102          134          12832.64   
62           109          132          10877.68   
58           82           115          14052.93   
57           83           124          14212.58   
57           82           119          14196.01   

We see an increase in transactions per second, and lower latencies as well.

v4: Fixed a compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)

v3: Fixed a typo in RFC number (Alexander)
    Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.

v2: Removed the MAX_SKB_FRAGS change, as it belongs to a different series.
    Addressed feedback from Alexander and the nvidia folks.

Coco Li (5):
  ipv6: add dev->gso_ipv6_max_size
  ipv6: add GRO_IPV6_MAX_SIZE
  ipv6: Add hop-by-hop header to jumbograms in ip6_output
  ipvlan: enable BIG TCP Packets
  mlx5: support BIG TCP packets

Eric Dumazet (9):
  net: add netdev->tso_ipv6_max_size attribute
  tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  ipv6: add struct hop_jumbo_hdr definition
  ipv6/gso: remove temporary HBH/jumbo header
  ipv6/gro: insert temporary HBH/jumbo header
  net: loopback: enable BIG TCP packets
  bonding: update dev->tso_ipv6_max_size
  macvlan: enable BIG TCP Packets
  mlx4: support BIG TCP packets

 drivers/net/bonding/bond_main.c               |  3 +
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 +
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 drivers/net/ipvlan/ipvlan_main.c              |  1 +
 drivers/net/loopback.c                        |  2 +
 drivers/net/macvlan.c                         |  1 +
 include/linux/ipv6.h                          |  1 +
 include/linux/netdevice.h                     | 32 +++++++
 include/net/ipv6.h                            | 44 ++++++++++
 include/uapi/linux/if_link.h                  |  3 +
 net/core/dev.c                                |  4 +
 net/core/gro.c                                | 20 ++++-
 net/core/rtnetlink.c                          | 33 ++++++++
 net/core/sock.c                               |  6 ++
 net/ipv4/tcp_cubic.c                          |  4 +-
 net/ipv6/ip6_offload.c                        | 56 ++++++++++++-
 net/ipv6/ip6_output.c                         | 22 ++++-
 tools/include/uapi/linux/if_link.h            |  3 +
 20 files changed, 336 insertions(+), 34 deletions(-)

-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Some NICs (or virtual devices) are LSOv2 compatible.

BIG TCP plans to use the large LSOv2 feature for IPv6.

A new netlink attribute, IFLA_TSO_IPV6_MAX_SIZE, is defined.

Drivers should use netif_set_tso_ipv6_max_size() to advertise their limit.

Unchanged drivers will not allow big TSO packets to be sent.
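
As a minimal sketch (mirroring what the loopback, mlx4 and mlx5 patches
later in this series do), a driver that has validated a 512KB limit
would advertise it from its netdev setup path:

	/* advertise the hardware/firmware LSOv2 limit */
	netif_set_tso_ipv6_max_size(netdev, 512 * 1024);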

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 10 ++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  2 ++
 net/core/rtnetlink.c               |  3 +++
 tools/include/uapi/linux/if_link.h |  1 +
 5 files changed, 17 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 29a850a8d4604bb2ac43b582595f301aaa96a0bc..61db67222c47664c179b6a5d3b6f15fdf8a02bdd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1951,6 +1951,7 @@ enum netdev_ml_priv_type {
  *	@dev_registered_tracker:	tracker for reference held while
  *					registered
  *	@offload_xstats_l3:	L3 HW stats for this netdevice.
+ *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -2289,6 +2290,7 @@ struct net_device {
 	netdevice_tracker	watchdog_dev_tracker;
 	netdevice_tracker	dev_registered_tracker;
 	struct rtnl_hw_stats64	*offload_xstats_l3;
+	unsigned int		tso_ipv6_max_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -4888,6 +4890,14 @@ static inline void netif_set_gro_max_size(struct net_device *dev,
 	WRITE_ONCE(dev->gro_max_size, size);
 }
 
+/* Used by drivers to give their hardware/firmware limit for LSOv2 packets */
+static inline void netif_set_tso_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	dev->tso_ipv6_max_size = size;
+}
+
+
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
 					int mac_len)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ddca20357e7e89b5f204b3117ff3838735535470..c8af031b692e52690a2760e9d79c9462185e2fc9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -363,6 +363,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index ba69ddf85af6b4543caa91f314caf54794a3a02a..de28f634c18a65d1948a96db5678d38e9c871b1f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10467,6 +10467,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_size = GSO_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
+	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
+
 	dev->upper_level = 1;
 	dev->lower_level = 1;
 #ifdef CONFIG_LOCKDEP
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a759f9e0a8476538fb41311113daed998a7193fd..ab51b18cdb5d46b87d4a11d2f66a68968ba737d6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1027,6 +1027,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1732,6 +1733,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1885,6 +1887,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index e1ba2d51b717b7ac7f06e94ac9791cf4c8a5ab6f..441615c39f0a24eeeb6e27b4ca88031bcc234cf8 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -348,6 +348,7 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-11 16:21   ` Alexander H Duyck
  2022-03-10  5:46 ` [PATCH v4 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

This enables the TCP stack to build TSO packets bigger than
64KB if the driver is LSOv2 compatible.

This patch introduces a new variable, gso_ipv6_max_size,
that is modifiable through ip link:

ip link set dev eth0 gso_ipv6_max_size 185000

User input is capped by the driver limit (tso_ipv6_max_size)
added in the previous patch.
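
The configured value can then be inspected with ip -d link (see the
ipvlan patch later in this series for sample output):

ip -d link show dev eth0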

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 12 ++++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  1 +
 net/core/rtnetlink.c               | 15 +++++++++++++++
 net/core/sock.c                    |  6 ++++++
 tools/include/uapi/linux/if_link.h |  1 +
 6 files changed, 36 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 61db67222c47664c179b6a5d3b6f15fdf8a02bdd..9ed348d8b6f1195514c3b5f85fbe2c45b3fa997f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1952,6 +1952,7 @@ enum netdev_ml_priv_type {
  *					registered
  *	@offload_xstats_l3:	L3 HW stats for this netdevice.
  *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
+ *	@gso_ipv6_max_size:	Maximum size of IPv6 GSO packets (user/admin limit)
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -2291,6 +2292,7 @@ struct net_device {
 	netdevice_tracker	dev_registered_tracker;
 	struct rtnl_hw_stats64	*offload_xstats_l3;
 	unsigned int		tso_ipv6_max_size;
+	unsigned int		gso_ipv6_max_size;
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
@@ -4874,6 +4876,10 @@ static inline void netif_set_gso_max_size(struct net_device *dev,
 {
 	/* dev->gso_max_size is read locklessly from sk_setup_caps() */
 	WRITE_ONCE(dev->gso_max_size, size);
+
+	/* legacy drivers want to lower gso_max_size, regardless of family. */
+	size = min(size, dev->gso_ipv6_max_size);
+	WRITE_ONCE(dev->gso_ipv6_max_size, size);
 }
 
 static inline void netif_set_gso_max_segs(struct net_device *dev,
@@ -4897,6 +4903,12 @@ static inline void netif_set_tso_ipv6_max_size(struct net_device *dev,
 	dev->tso_ipv6_max_size = size;
 }
 
+static inline void netif_set_gso_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	size = min(size, dev->tso_ipv6_max_size);
+	WRITE_ONCE(dev->gso_ipv6_max_size, size);
+}
 
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c8af031b692e52690a2760e9d79c9462185e2fc9..048a9c848a3a39596b6c3135553fdfb9a1fe37d2 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -364,6 +364,7 @@ enum {
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
+	IFLA_GSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index de28f634c18a65d1948a96db5678d38e9c871b1f..87f8b8cb39a61c8f5a444e3b341a97ba0a4c06d9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10468,6 +10468,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
 	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
+	dev->gso_ipv6_max_size = GSO_MAX_SIZE;
 
 	dev->upper_level = 1;
 	dev->lower_level = 1;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ab51b18cdb5d46b87d4a11d2f66a68968ba737d6..172de404c595c89e30651a091242a75be8f786b7 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1028,6 +1028,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_GSO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1734,6 +1735,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
 	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
+	    nla_put_u32(skb, IFLA_GSO_IPV6_MAX_SIZE, dev->gso_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1888,6 +1890,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
 	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_GSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2774,6 +2777,15 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 	}
 
+	if (tb[IFLA_GSO_IPV6_MAX_SIZE]) {
+		u32 max_size = nla_get_u32(tb[IFLA_GSO_IPV6_MAX_SIZE]);
+
+		if (dev->gso_ipv6_max_size ^ max_size) {
+			netif_set_gso_ipv6_max_size(dev, max_size);
+			status |= DO_SETLINK_MODIFIED;
+		}
+	}
+
 	if (tb[IFLA_GSO_MAX_SEGS]) {
 		u32 max_segs = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]);
 
@@ -3249,6 +3261,9 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 		netif_set_gso_max_segs(dev, nla_get_u32(tb[IFLA_GSO_MAX_SEGS]));
 	if (tb[IFLA_GRO_MAX_SIZE])
 		netif_set_gro_max_size(dev, nla_get_u32(tb[IFLA_GRO_MAX_SIZE]));
+	if (tb[IFLA_GSO_IPV6_MAX_SIZE])
+		netif_set_gso_ipv6_max_size(dev,
+			nla_get_u32(tb[IFLA_GSO_IPV6_MAX_SIZE]));
 
 	return dev;
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 1180a0cb01104561befa1f96deb71f36efcf12da..e0858e82bc386eb2779a0d6af6063b2078e6ea7b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2279,6 +2279,12 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
 			sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
+#if IS_ENABLED(CONFIG_IPV6)
+			if (sk->sk_family == AF_INET6 &&
+			    sk_is_tcp(sk) &&
+			    !ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr))
+				sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_ipv6_max_size);
+#endif
 			sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
 			max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 441615c39f0a24eeeb6e27b4ca88031bcc234cf8..e40cd575607872d3bff3bc1971df8c6426290562 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -349,6 +349,7 @@ enum {
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
+	IFLA_GSO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

hystart_ack_delay() assumed that a TSO packet
would not be bigger than GSO_MAX_SIZE.

This will no longer be true.

We should use sk->sk_gso_max_size instead.

This reduces the chances of spurious Hystart ACK train detections.
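
A rough worked example (assuming @rate below is the socket pacing rate
in bytes per second): at ~100Gbit (12.5e9 B/s), the old cushion
min(1 ms, 65536 * 4 * USEC_PER_SEC / rate) is about 21 usec; with
sk_gso_max_size raised to 185000 it becomes about 59 usec, matching
the larger ACK trains a BIG TCP flow can legitimately generate.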

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_cubic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index 24d562dd62254d6e50dd08236f8967400d81e1ea..dfc9dc951b7404776b2246c38273fbadf03c39fd 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -372,7 +372,7 @@ static void cubictcp_state(struct sock *sk, u8 new_state)
  * We apply another 100% factor because @rate is doubled at this point.
  * We cap the cushion to 1ms.
  */
-static u32 hystart_ack_delay(struct sock *sk)
+static u32 hystart_ack_delay(const struct sock *sk)
 {
 	unsigned long rate;
 
@@ -380,7 +380,7 @@ static u32 hystart_ack_delay(struct sock *sk)
 	if (!rate)
 		return 0;
 	return min_t(u64, USEC_PER_MSEC,
-		     div64_ul((u64)GSO_MAX_SIZE * 4 * USEC_PER_SEC, rate));
+		     div64_ul((u64)sk->sk_gso_max_size * 4 * USEC_PER_SEC, rate));
 }
 
 static void hystart_update(struct sock *sk, u32 delay)
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patches will need to add and remove local IPv6 jumbogram
options to enable BIG TCP.
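
For a TCP jumbogram the fields end up filled as follows (a sketch of
what the GRO and ip6_output patches later in this series do; hop_jumbo
points right after the IPv6 header, payload_len is the payload size
before the option is accounted for, and per RFC 2675 the jumbo length
covers everything after the IPv6 header, including this 8-byte option):

	hop_jumbo->nexthdr = IPPROTO_TCP;
	hop_jumbo->hdrlen = 0;		/* (hdrlen + 1) * 8 == 8 bytes */
	hop_jumbo->tlv_type = IPV6_TLV_JUMBO;	/* 0xC2 */
	hop_jumbo->tlv_len = 4;
	hop_jumbo->jumbo_payload_len = htonl(payload_len + sizeof(*hop_jumbo));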

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 213612f1680c7c39f4c07f0c05b4e6cf34a7878e..63d019953c47ea03d3b723a58c25e83c249489a9 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -151,6 +151,17 @@ struct frag_hdr {
 	__be32	identification;
 };
 
+/*
+ * Jumbo payload option, as described in RFC 2675 2.
+ */
+struct hop_jumbo_hdr {
+	u8	nexthdr;
+	u8	hdrlen;
+	u8	tlv_type;	/* IPV6_TLV_JUMBO, 0xC2 */
+	u8	tlv_len;	/* 4 */
+	__be32	jumbo_payload_len;
+};
+
 #define	IP6_MF		0x0001
 #define	IP6_OFFSET	0xFFF8
 
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (3 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 06/14] ipv6/gro: insert " Eric Dumazet
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The IPv6 TCP and GRO stacks will soon be able to build big TCP packets
with an added temporary Hop-by-Hop header.

If GSO is involved for these large packets, we need to remove
the temporary HBH header before segmentation happens.

v2: perform HBH removal from ipv6_gso_segment() instead of
    skb_segment() (Alexander feedback)
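
A sketch of the removal done below in ipv6_gso_segment(): the Ethernet
and IPv6 headers are shifted right over the 8-byte option and nexthdr
is restored before normal segmentation proceeds:

before: [Ethernet][IPv6, nexthdr=NEXTHDR_HOP][HBH/Jumbo][TCP header][payload]
after:  ..8 bytes..[Ethernet][IPv6, nexthdr=IPPROTO_TCP][TCP header][payload]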

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h     | 33 +++++++++++++++++++++++++++++++++
 net/ipv6/ip6_offload.c | 24 +++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 63d019953c47ea03d3b723a58c25e83c249489a9..b6df0314aa02dd1c4094620145ccb24da7195b2b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -467,6 +467,39 @@ bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb,
 struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
 					   struct ipv6_txoptions *opt);
 
+/* This helper is specialized for BIG TCP needs.
+ * It assumes the hop_jumbo_hdr will immediately follow the IPV6 header.
+ * It assumes headers are already in skb->head.
+ * Returns 0, or IPPROTO_TCP if a BIG TCP packet is there.
+ */
+static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
+{
+	const struct hop_jumbo_hdr *jhdr;
+	const struct ipv6hdr *nhdr;
+
+	if (likely(skb->len <= GRO_MAX_SIZE))
+		return 0;
+
+	if (skb->protocol != htons(ETH_P_IPV6))
+		return 0;
+
+	if (skb_network_offset(skb) +
+	    sizeof(struct ipv6hdr) +
+	    sizeof(struct hop_jumbo_hdr) > skb_headlen(skb))
+		return 0;
+
+	nhdr = ipv6_hdr(skb);
+
+	if (nhdr->nexthdr != NEXTHDR_HOP)
+		return 0;
+
+	jhdr = (const struct hop_jumbo_hdr *) (nhdr + 1);
+	if (jhdr->tlv_type != IPV6_TLV_JUMBO || jhdr->hdrlen != 0 ||
+	    jhdr->nexthdr != IPPROTO_TCP)
+		return 0;
+	return jhdr->nexthdr;
+}
+
 static inline bool ipv6_accept_ra(struct inet6_dev *idev)
 {
 	/* If forwarding is enabled, RA are not accepted unless the special
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index c4fc03c1ac99dbecd92e2b47b2db65374197434d..a6a6c1539c28d242ef8c35fcd5ce900512ce912d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -77,7 +77,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct ipv6hdr *ipv6h;
 	const struct net_offload *ops;
-	int proto;
+	int proto, nexthdr;
 	struct frag_hdr *fptr;
 	unsigned int payload_len;
 	u8 *prevhdr;
@@ -87,6 +87,28 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	bool gso_partial;
 
 	skb_reset_network_header(skb);
+	nexthdr = ipv6_has_hopopt_jumbo(skb);
+	if (nexthdr) {
+		const int hophdr_len = sizeof(struct hop_jumbo_hdr);
+		int err;
+
+		err = skb_cow_head(skb, 0);
+		if (err < 0)
+			return ERR_PTR(err);
+
+		/* remove the HBH header.
+		 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+		 */
+		memmove(skb_mac_header(skb) + hophdr_len,
+			skb_mac_header(skb),
+			ETH_HLEN + sizeof(struct ipv6hdr));
+		skb->data += hophdr_len;
+		skb->len -= hophdr_len;
+		skb->network_header += hophdr_len;
+		skb->mac_header += hophdr_len;
+		ipv6h = (struct ipv6hdr *)skb->data;
+		ipv6h->nexthdr = nexthdr;
+	}
 	nhoff = skb_network_header(skb) - skb_mac_header(skb);
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (4 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-11 16:24   ` Alexander H Duyck
  2022-03-10  5:46 ` [PATCH v4 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patch will add GRO_IPV6_MAX_SIZE, allowing GRO to build
BIG TCP IPv6 packets (bigger than 64K).

This patch changes ipv6_gro_complete() to insert an HBH/jumbo header
so that the resulting packet can go through the IPv6/TCP stacks.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_offload.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a6a6c1539c28d242ef8c35fcd5ce900512ce912d..d12dba2dd5354dbb79bb80df4038dec2544cddeb 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -342,15 +342,43 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
-	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+	struct ipv6hdr *iph;
 	int err = -ENOSYS;
+	u32 payload_len;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
 		skb_set_inner_network_header(skb, nhoff);
 	}
 
-	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
+	payload_len = skb->len - nhoff - sizeof(*iph);
+	if (unlikely(payload_len > IPV6_MAXPLEN)) {
+		struct hop_jumbo_hdr *hop_jumbo;
+		int hoplen = sizeof(*hop_jumbo);
+
+		/* Move network header left */
+		memmove(skb_mac_header(skb) - hoplen, skb_mac_header(skb),
+			skb->transport_header - skb->mac_header);
+		skb->data -= hoplen;
+		skb->len += hoplen;
+		skb->mac_header -= hoplen;
+		skb->network_header -= hoplen;
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		hop_jumbo = (struct hop_jumbo_hdr *)(iph + 1);
+
+		/* Build hop-by-hop options */
+		hop_jumbo->nexthdr = iph->nexthdr;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(payload_len + hoplen);
+
+		iph->nexthdr = NEXTHDR_HOP;
+		iph->payload_len = 0;
+	} else {
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		iph->payload_len = htons(payload_len);
+	}
 
 	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (5 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 06/14] ipv6/gro: insert " Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Enable GRO to have an IPv6-specific limit for the maximum packet size.

This patch introduces a new dev->gro_ipv6_max_size
that is modifiable through ip link:

ip link set dev eth0 gro_ipv6_max_size 185000

Note that this value is only considered if bigger than
gro_max_size, and only for non-encapsulated TCP/IPv6 packets.

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h          | 10 ++++++++++
 include/uapi/linux/if_link.h       |  1 +
 net/core/dev.c                     |  1 +
 net/core/gro.c                     | 20 ++++++++++++++++++--
 net/core/rtnetlink.c               | 15 +++++++++++++++
 tools/include/uapi/linux/if_link.h |  1 +
 6 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9ed348d8b6f1195514c3b5f85fbe2c45b3fa997f..771440f6f8a8fa6cdadd398be8f2bacb4841138c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1944,6 +1944,8 @@ enum netdev_ml_priv_type {
  *			keep a list of interfaces to be deleted.
  *	@gro_max_size:	Maximum size of aggregated packet in generic
  *			receive offload (GRO)
+ *	@gro_ipv6_max_size:	Maximum size of aggregated packet in generic
+ *				receive offload (GRO), for IPv6
  *
  *	@dev_addr_shadow:	Copy of @dev_addr to catch direct writes.
  *	@linkwatch_dev_tracker:	refcount tracker used by linkwatch.
@@ -2140,6 +2142,7 @@ struct net_device {
 	int			napi_defer_hard_irqs;
 #define GRO_MAX_SIZE		65536
 	unsigned int		gro_max_size;
+	unsigned int		gro_ipv6_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
 
@@ -4910,6 +4913,13 @@ static inline void netif_set_gso_ipv6_max_size(struct net_device *dev,
 	WRITE_ONCE(dev->gso_ipv6_max_size, size);
 }
 
+static inline void netif_set_gro_ipv6_max_size(struct net_device *dev,
+					       unsigned int size)
+{
+	/* This pairs with the READ_ONCE() in skb_gro_receive() */
+	WRITE_ONCE(dev->gro_ipv6_max_size, size);
+}
+
 static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 					int pulled_hlen, u16 mac_offset,
 					int mac_len)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 048a9c848a3a39596b6c3135553fdfb9a1fe37d2..9baa084fe2c6762b05029c4692cfd9c4646bb916 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -365,6 +365,7 @@ enum {
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
 	IFLA_GSO_IPV6_MAX_SIZE,
+	IFLA_GRO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
diff --git a/net/core/dev.c b/net/core/dev.c
index 87f8b8cb39a61c8f5a444e3b341a97ba0a4c06d9..9921cee9c20d2bc396ef1f4d783ac01604b1e8be 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10469,6 +10469,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	dev->gro_max_size = GRO_MAX_SIZE;
 	dev->tso_ipv6_max_size = GSO_MAX_SIZE;
 	dev->gso_ipv6_max_size = GSO_MAX_SIZE;
+	dev->gro_ipv6_max_size = GRO_MAX_SIZE;
 
 	dev->upper_level = 1;
 	dev->lower_level = 1;
diff --git a/net/core/gro.c b/net/core/gro.c
index ee5e7e889d8bdd8db18715afc7bb6c1c759c9c23..f795393a883b08d71bfcfbd2d897e1ddcddf6fce 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -136,11 +136,27 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	unsigned int new_truesize;
 	struct sk_buff *lp;
 
+	if (unlikely(NAPI_GRO_CB(skb)->flush))
+		return -E2BIG;
+
 	/* pairs with WRITE_ONCE() in netif_set_gro_max_size() */
 	gro_max_size = READ_ONCE(p->dev->gro_max_size);
 
-	if (unlikely(p->len + len >= gro_max_size || NAPI_GRO_CB(skb)->flush))
-		return -E2BIG;
+	if (unlikely(p->len + len >= gro_max_size)) {
+		/* pairs with WRITE_ONCE() in netif_set_gro_ipv6_max_size() */
+		unsigned int gro6_max_size = READ_ONCE(p->dev->gro_ipv6_max_size);
+
+		if (gro6_max_size > gro_max_size &&
+		    p->protocol == htons(ETH_P_IPV6) &&
+		    skb_headroom(p) >= sizeof(struct hop_jumbo_hdr) &&
+		    ipv6_hdr(p)->nexthdr == IPPROTO_TCP &&
+		    !p->encapsulation)
+			gro_max_size = gro6_max_size;
+
+		if (p->len + len >= gro_max_size)
+			return -E2BIG;
+	}
+
 
 	lp = NAPI_GRO_CB(p)->last;
 	pinfo = skb_shinfo(lp);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 172de404c595c89e30651a091242a75be8f786b7..39c5a9fb792df3992b4e7177f4dfeba2553eaa08 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1029,6 +1029,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GSO_IPV6_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_GRO_IPV6_MAX_SIZE */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1736,6 +1737,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
 	    nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) ||
 	    nla_put_u32(skb, IFLA_GSO_IPV6_MAX_SIZE, dev->gso_ipv6_max_size) ||
+	    nla_put_u32(skb, IFLA_GRO_IPV6_MAX_SIZE, dev->gro_ipv6_max_size) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1891,6 +1893,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
 	[IFLA_TSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 	[IFLA_GSO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_GRO_IPV6_MAX_SIZE]	= { .type = NLA_U32 },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2786,6 +2789,15 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 	}
 
+	if (tb[IFLA_GRO_IPV6_MAX_SIZE]) {
+		u32 max_size = nla_get_u32(tb[IFLA_GRO_IPV6_MAX_SIZE]);
+
+		if (dev->gro_ipv6_max_size ^ max_size) {
+			netif_set_gro_ipv6_max_size(dev, max_size);
+			status |= DO_SETLINK_MODIFIED;
+		}
+	}
+
 	if (tb[IFLA_GSO_MAX_SEGS]) {
 		u32 max_segs = nla_get_u32(tb[IFLA_GSO_MAX_SEGS]);
 
@@ -3264,6 +3276,9 @@ struct net_device *rtnl_create_link(struct net *net, const char *ifname,
 	if (tb[IFLA_GSO_IPV6_MAX_SIZE])
 		netif_set_gso_ipv6_max_size(dev,
 			nla_get_u32(tb[IFLA_GSO_IPV6_MAX_SIZE]));
+	if (tb[IFLA_GRO_IPV6_MAX_SIZE])
+		netif_set_gro_ipv6_max_size(dev,
+			nla_get_u32(tb[IFLA_GRO_IPV6_MAX_SIZE]));
 
 	return dev;
 }
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index e40cd575607872d3bff3bc1971df8c6426290562..567008925a8be6900aa048c7ebb12684b2eebb4b 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -350,6 +350,7 @@ enum {
 	IFLA_GRO_MAX_SIZE,
 	IFLA_TSO_IPV6_MAX_SIZE,
 	IFLA_GSO_IPV6_MAX_SIZE,
+	IFLA_GRO_IPV6_MAX_SIZE,
 
 	__IFLA_MAX
 };
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (6 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 09/14] net: loopback: enable BIG TCP packets Eric Dumazet
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Instead of simply forcing a 0 payload_len in the IPv6 header,
implement RFC 2675 and insert a custom extension header.

Note that only the TCP stack currently potentially generates
jumbograms, and that this extension header is purely local:
it won't be sent on a physical link.

This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/ipv6.h  |  1 +
 net/ipv6/ip6_output.c | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 16870f86c74d3d1f5dfb7edac1e7db85f1ef6755..93b273db1c9926aba4199f486ce90778311916f5 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -144,6 +144,7 @@ struct inet6_skb_parm {
 #define IP6SKB_L3SLAVE         64
 #define IP6SKB_JUMBOGRAM      128
 #define IP6SKB_SEG6	      256
+#define IP6SKB_FAKEJUMBO      512
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e69fac576970a9b85fb68aa02822c0e2df67e1a2..941ceff83b616cec11c6bb7ccaf81bc041f8d9cc 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -180,7 +180,9 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
 #endif
 
 	mtu = ip6_skb_dst_mtu(skb);
-	if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
+	if (skb_is_gso(skb) &&
+	    !(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
+	    !skb_gso_validate_network_len(skb, mtu))
 		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
 
 	if ((skb->len > mtu && !skb_is_gso(skb)) ||
@@ -251,6 +253,8 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct inet6_dev *idev = ip6_dst_idev(dst);
+	struct hop_jumbo_hdr *hop_jumbo;
+	int hoplen = sizeof(*hop_jumbo);
 	unsigned int head_room;
 	struct ipv6hdr *hdr;
 	u8  proto = fl6->flowi6_proto;
@@ -258,7 +262,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	int hlimit = -1;
 	u32 mtu;
 
-	head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dev);
+	head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev);
 	if (opt)
 		head_room += opt->opt_nflen + opt->opt_flen;
 
@@ -281,6 +285,20 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 					     &fl6->saddr);
 	}
 
+	if (unlikely(seg_len > IPV6_MAXPLEN)) {
+		hop_jumbo = skb_push(skb, hoplen);
+
+		hop_jumbo->nexthdr = proto;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen);
+
+		proto = IPPROTO_HOPOPTS;
+		seg_len = 0;
+		IP6CB(skb)->flags |= IP6SKB_FAKEJUMBO;
+	}
+
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	hdr = ipv6_hdr(skb);
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 09/14] net: loopback: enable BIG TCP packets
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (7 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:46 ` [PATCH v4 net-next 10/14] bonding: update dev->tso_ipv6_max_size Eric Dumazet
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Set the driver limit to 512 KB per TSO IPv6 packet.

This allows the admin/user to set a GSO IPv6 limit up to this value.

Tested:

ip link set dev lo gso_ipv6_max_size 200000
netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &

tcpdump shows :

18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/loopback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 720394c0639b20a2fd6262e4ee9d5813c02802f1..9c21d18f0aa75a310ac600081b450f6312ff16fc 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -191,6 +191,8 @@ static void gen_lo_setup(struct net_device *dev,
 	dev->netdev_ops		= dev_ops;
 	dev->needs_free_netdev	= true;
 	dev->priv_destructor	= dev_destructor;
+
+	netif_set_tso_ipv6_max_size(dev, 512 * 1024);
 }
 
 /* The loopback device is special. There is only one instance
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 10/14] bonding: update dev->tso_ipv6_max_size
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (8 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 09/14] net: loopback: enable BIG TCP packets Eric Dumazet
@ 2022-03-10  5:46 ` Eric Dumazet
  2022-03-10  5:47 ` [PATCH v4 net-next 11/14] macvlan: enable BIG TCP Packets Eric Dumazet
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:46 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Use the minimal value found in the set of lower devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/bonding/bond_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 55e0ba2a163d0d9c17fdaf47a49d7a2190959651..357188c1f00e6e3919740adb6369d75712fc4e64 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1420,6 +1420,7 @@ static void bond_compute_features(struct bonding *bond)
 	struct slave *slave;
 	unsigned short max_hard_header_len = ETH_HLEN;
 	unsigned int gso_max_size = GSO_MAX_SIZE;
+	unsigned int tso_ipv6_max_size = ~0U;
 	u16 gso_max_segs = GSO_MAX_SEGS;
 
 	if (!bond_has_slaves(bond))
@@ -1450,6 +1451,7 @@ static void bond_compute_features(struct bonding *bond)
 			max_hard_header_len = slave->dev->hard_header_len;
 
 		gso_max_size = min(gso_max_size, slave->dev->gso_max_size);
+		tso_ipv6_max_size = min(tso_ipv6_max_size, slave->dev->tso_ipv6_max_size);
 		gso_max_segs = min(gso_max_segs, slave->dev->gso_max_segs);
 	}
 	bond_dev->hard_header_len = max_hard_header_len;
@@ -1465,6 +1467,7 @@ static void bond_compute_features(struct bonding *bond)
 	bond_dev->mpls_features = mpls_features;
 	netif_set_gso_max_segs(bond_dev, gso_max_segs);
 	netif_set_gso_max_size(bond_dev, gso_max_size);
+	netif_set_tso_ipv6_max_size(bond_dev, tso_ipv6_max_size);
 
 	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
 	if ((bond_dev->priv_flags & IFF_XMIT_DST_RELEASE_PERM) &&
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 11/14] macvlan: enable BIG TCP Packets
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (9 preceding siblings ...)
  2022-03-10  5:46 ` [PATCH v4 net-next 10/14] bonding: update dev->tso_ipv6_max_size Eric Dumazet
@ 2022-03-10  5:47 ` Eric Dumazet
  2022-03-10  5:47 ` [PATCH v4 net-next 12/14] ipvlan: " Eric Dumazet
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Inherit tso_ipv6_max_size from lower device.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/macvlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 33753a2fde292f8f415eefe957d09be5db1c4d55..0a41228d4efabb6bcd36bc954cecb9fe3626b63a 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -902,6 +902,7 @@ static int macvlan_init(struct net_device *dev)
 	dev->hw_enc_features    |= dev->features;
 	netif_set_gso_max_size(dev, lowerdev->gso_max_size);
 	netif_set_gso_max_segs(dev, lowerdev->gso_max_segs);
+	netif_set_tso_ipv6_max_size(dev, lowerdev->tso_ipv6_max_size);
 	dev->hard_header_len	= lowerdev->hard_header_len;
 	macvlan_set_lockdep_class(dev);
 
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 12/14] ipvlan: enable BIG TCP Packets
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (10 preceding siblings ...)
  2022-03-10  5:47 ` [PATCH v4 net-next 11/14] macvlan: enable BIG TCP Packets Eric Dumazet
@ 2022-03-10  5:47 ` Eric Dumazet
  2022-03-10  5:47 ` [PATCH v4 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Inherit tso_ipv6_max_size from physical device.

Tested:

eth0 tso_ipv6_max_size is set to 524288

ip link add link eth0 name ipvl1 type ipvlan
ip -d link show ipvl1
10: ipvl1@eth0:...
	ipvlan  mode l3 bridge addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 gso_ipv6_max_size 65535 tso_ipv6_max_size 524288 gro_ipv6_max_size 65536

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 696e245f6d009d4d5d4a9c3523e4aa1e5d0f8bb6..4de30df25f19b32a78a06d18c99e94662307b7fb 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -141,6 +141,7 @@ static int ipvlan_init(struct net_device *dev)
 	dev->hw_enc_features |= dev->features;
 	netif_set_gso_max_size(dev, phy_dev->gso_max_size);
 	netif_set_gso_max_segs(dev, phy_dev->gso_max_segs);
+	netif_set_tso_ipv6_max_size(dev, phy_dev->tso_ipv6_max_size);
 	dev->hard_header_len = phy_dev->hard_header_len;
 
 	netdev_lockdep_set_classes(dev);
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 13/14] mlx4: support BIG TCP packets
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (11 preceding siblings ...)
  2022-03-10  5:47 ` [PATCH v4 net-next 12/14] ipvlan: " Eric Dumazet
@ 2022-03-10  5:47 ` Eric Dumazet
  2022-03-10  5:47 ` [PATCH v4 net-next 14/14] mlx5: " Eric Dumazet
  2022-03-11 17:13 ` [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Alexander H Duyck
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet,
	Tariq Toukan

From: Eric Dumazet <edumazet@google.com>

mlx4 supports LSOv2 just fine.

The IPv6 stack inserts a temporary Hop-by-Hop header
with a JUMBO TLV for big packets.

We need to ignore the HBH header when populating the TX descriptor.

Tested:

Before: (not enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_ipv6_max_size 65536 gro_ipv6_max_size 65536

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   6591.45  0.86   1.34   62.490  97.446
262144 540000

After: (enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_ipv6_max_size 185000 gro_ipv6_max_size 185000

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   8383.95  0.95   1.01   54.432  57.584
262144 540000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 ++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++++++++----
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c61dc7ae0c056a4dbcf24297549f6b1b5cc25d92..76cb93f5e5240c54f6f4c57e39739376206b4f34 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3417,6 +3417,9 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	dev->min_mtu = ETH_MIN_MTU;
 	dev->max_mtu = priv->max_mtu;
 
+	/* supports LSOv2 packets, 512KB limit has been tested. */
+	netif_set_tso_ipv6_max_size(dev, 512 * 1024);
+
 	mdev->pndev[port] = dev;
 	mdev->upper[port] = NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 817f4154b86d599cd593876ec83529051d95fe2f..c89b3e8094e7d8cfb11aaa6cc4ad63bf3ad5934e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -44,6 +44,7 @@
 #include <linux/ipv6.h>
 #include <linux/moduleparam.h>
 #include <linux/indirect_call_wrapper.h>
+#include <net/ipv6.h>
 
 #include "mlx4_en.h"
 
@@ -635,19 +636,28 @@ static int get_real_size(const struct sk_buff *skb,
 			 struct net_device *dev,
 			 int *lso_header_size,
 			 bool *inline_ok,
-			 void **pfrag)
+			 void **pfrag,
+			 int *hopbyhop)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int real_size;
 
 	if (shinfo->gso_size) {
 		*inline_ok = false;
-		if (skb->encapsulation)
+		*hopbyhop = 0;
+		if (skb->encapsulation) {
 			*lso_header_size = (skb_inner_transport_header(skb) - skb->data) + inner_tcp_hdrlen(skb);
-		else
+		} else {
+			/* Detects large IPV6 TCP packets and prepares for removal of
+			 * HBH header that has been pushed by ip6_xmit(),
+			 * mainly so that tcpdump can dissect them.
+			 */
+			if (ipv6_has_hopopt_jumbo(skb))
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
 			*lso_header_size = skb_transport_offset(skb) + tcp_hdrlen(skb);
+		}
 		real_size = CTRL_SIZE + shinfo->nr_frags * DS_SIZE +
-			ALIGN(*lso_header_size + 4, DS_SIZE);
+			ALIGN(*lso_header_size - *hopbyhop + 4, DS_SIZE);
 		if (unlikely(*lso_header_size != skb_headlen(skb))) {
 			/* We add a segment for the skb linear buffer only if
 			 * it contains data */
@@ -874,6 +884,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	int desc_size;
 	int real_size;
 	u32 index, bf_index;
+	struct ipv6hdr *h6;
 	__be32 op_own;
 	int lso_header_size;
 	void *fragptr = NULL;
@@ -882,6 +893,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool stop_queue;
 	bool inline_ok;
 	u8 data_offset;
+	int hopbyhop;
 	bool bf_ok;
 
 	tx_ind = skb_get_queue_mapping(skb);
@@ -891,7 +903,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto tx_drop;
 
 	real_size = get_real_size(skb, shinfo, dev, &lso_header_size,
-				  &inline_ok, &fragptr);
+				  &inline_ok, &fragptr, &hopbyhop);
 	if (unlikely(!real_size))
 		goto tx_drop_count;
 
@@ -944,7 +956,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		data = &tx_desc->data;
 		data_offset = offsetof(struct mlx4_en_tx_desc, data);
 	} else {
-		int lso_align = ALIGN(lso_header_size + 4, DS_SIZE);
+		int lso_align = ALIGN(lso_header_size - hopbyhop + 4, DS_SIZE);
 
 		data = (void *)&tx_desc->lso + lso_align;
 		data_offset = offsetof(struct mlx4_en_tx_desc, lso) + lso_align;
@@ -1009,14 +1021,31 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			((ring->prod & ring->size) ?
 				cpu_to_be32(MLX4_EN_BIT_DESC_OWN) : 0);
 
+		lso_header_size -= hopbyhop;
 		/* Fill in the LSO prefix */
 		tx_desc->lso.mss_hdr_size = cpu_to_be32(
 			shinfo->gso_size << 16 | lso_header_size);
 
-		/* Copy headers;
-		 * note that we already verified that it is linear */
-		memcpy(tx_desc->lso.header, skb->data, lso_header_size);
 
+		if (unlikely(hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(tx_desc->lso.header, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)tx_desc->lso.header + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			/* Copy headers;
+			 * note that we already verified that it is linear
+			 */
+			memcpy(tx_desc->lso.header, skb->data, lso_header_size);
+		}
 		ring->tso_packets++;
 
 		i = shinfo->gso_segs;
-- 
2.35.1.616.g0bdcbb4464-goog



* [PATCH v4 net-next 14/14] mlx5: support BIG TCP packets
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (12 preceding siblings ...)
  2022-03-10  5:47 ` [PATCH v4 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
@ 2022-03-10  5:47 ` Eric Dumazet
  2022-03-11 17:13 ` [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Alexander H Duyck
  14 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-10  5:47 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky

From: Coco Li <lixiaoyan@google.com>

mlx5 supports LSOv2.

The IPv6 GRO/TCP stacks insert a temporary Hop-by-Hop header
with a JUMBO TLV for big packets.

We need to ignore/skip this HBH header when populating the TX
descriptor.

Note that ipv6_has_hopopt_jumbo() only recognizes a very specific
packet layout, so mlx5e_sq_xmit_wqe() takes care of this layout only.
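
For reference, the layout it recognizes is:

  [IPv6 header (nexthdr = NEXTHDR_HOP)][8-byte HBH with one JUMBO TLV][TCP]

A simplified sketch of that check, in the spirit of the series (the
actual helper performs additional length and linearity sanity checks):

	static bool hopopt_jumbo_before_tcp(const struct sk_buff *skb)
	{
		const struct hop_jumbo_hdr *jhdr;
		const struct ipv6hdr *h6;

		if (skb->protocol != htons(ETH_P_IPV6))
			return false;

		h6 = ipv6_hdr(skb);
		if (h6->nexthdr != NEXTHDR_HOP)
			return false;

		/* The HBH header must immediately follow the IPv6 header,
		 * carry a single jumbo TLV, and be followed by TCP.
		 */
		jhdr = (const struct hop_jumbo_hdr *)(h6 + 1);
		return jhdr->tlv_type == IPV6_TLV_JUMBO &&
		       jhdr->hdrlen == 0 &&
		       jhdr->nexthdr == IPPROTO_TCP;
	}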

v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 2 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b2ed2f6d4a9208aebfd17fd0c503cd1e37c39ee1..1e51ce1d74486392a26568852c5068fe9047296d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4910,6 +4910,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
+	netif_set_tso_ipv6_max_size(netdev, 512 * 1024);
 	mlx5e_set_netdev_dev_addr(netdev);
 	mlx5e_ipsec_build_netdev(priv);
 	mlx5e_tls_build_netdev(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2dc48406cd08d21ff94f665cd61ab9227f351215..b4fc45ba1b347fb9ad0f46b9c091cc45e4d3d84f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -40,6 +40,7 @@
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec_rxtx.h"
 #include "en/ptp.h"
+#include <net/ipv6.h>
 
 static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma)
 {
@@ -130,23 +131,32 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		sq->stats->csum_none++;
 }
 
+/* Returns the number of header bytes that we plan
+ * to inline later in the transmit descriptor
+ */
 static inline u16
-mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb)
+mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb, int *hopbyhop)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
 	u16 ihs;
 
+	*hopbyhop = 0;
 	if (skb->encapsulation) {
 		ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
 		stats->tso_inner_packets++;
 		stats->tso_inner_bytes += skb->len - ihs;
 	} else {
-		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
 			ihs = skb_transport_offset(skb) + sizeof(struct udphdr);
-		else
+		} else {
 			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
+			if (ipv6_has_hopopt_jumbo(skb)) {
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
+				ihs -= sizeof(struct hop_jumbo_hdr);
+			}
+		}
 		stats->tso_packets++;
-		stats->tso_bytes += skb->len - ihs;
+		stats->tso_bytes += skb->len - ihs - *hopbyhop;
 	}
 
 	return ihs;
@@ -208,6 +218,7 @@ struct mlx5e_tx_attr {
 	__be16 mss;
 	u16 insz;
 	u8 opcode;
+	u8 hopbyhop;
 };
 
 struct mlx5e_tx_wqe_attr {
@@ -244,14 +255,16 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5e_sq_stats *stats = sq->stats;
 
 	if (skb_is_gso(skb)) {
-		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb);
+		int hopbyhop;
+		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
 
 		*attr = (struct mlx5e_tx_attr) {
 			.opcode    = MLX5_OPCODE_LSO,
 			.mss       = cpu_to_be16(skb_shinfo(skb)->gso_size),
 			.ihs       = ihs,
 			.num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs,
-			.headlen   = skb_headlen(skb) - ihs,
+			.headlen   = skb_headlen(skb) - ihs - hopbyhop,
+			.hopbyhop  = hopbyhop,
 		};
 
 		stats->packets += skb_shinfo(skb)->gso_segs;
@@ -365,7 +378,8 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5_wqe_eth_seg  *eseg;
 	struct mlx5_wqe_data_seg *dseg;
 	struct mlx5e_tx_wqe_info *wi;
-
+	u16 ihs = attr->ihs;
+	struct ipv6hdr *h6;
 	struct mlx5e_sq_stats *stats = sq->stats;
 	int num_dma;
 
@@ -379,15 +393,36 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	eseg->mss = attr->mss;
 
-	if (attr->ihs) {
-		if (skb_vlan_tag_present(skb)) {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs + VLAN_HLEN);
-			mlx5e_insert_vlan(eseg->inline_hdr.start, skb, attr->ihs);
+	if (ihs) {
+		u8 *start = eseg->inline_hdr.start;
+
+		if (unlikely(attr->hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			if (skb_vlan_tag_present(skb)) {
+				mlx5e_insert_vlan(start, skb, ETH_HLEN + sizeof(*h6));
+				ihs += VLAN_HLEN;
+				h6 = (struct ipv6hdr *)(start + sizeof(struct vlan_ethhdr));
+			} else {
+				memcpy(start, skb->data, ETH_HLEN + sizeof(*h6));
+				h6 = (struct ipv6hdr *)(start + ETH_HLEN);
+			}
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else if (skb_vlan_tag_present(skb)) {
+			mlx5e_insert_vlan(start, skb, ihs);
+			ihs += VLAN_HLEN;
 			stats->added_vlan_packets++;
 		} else {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs);
-			memcpy(eseg->inline_hdr.start, skb->data, attr->ihs);
+			memcpy(start, skb->data, ihs);
 		}
+		eseg->inline_hdr.sz |= cpu_to_be16(ihs);
 		dseg += wqe_attr->ds_cnt_inl;
 	} else if (skb_vlan_tag_present(skb)) {
 		eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN);
@@ -398,7 +433,7 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	}
 
 	dseg += wqe_attr->ds_cnt_ids;
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs + attr->hopbyhop,
 					  attr->headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
@@ -918,12 +953,29 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	eseg->mss = attr.mss;
 
 	if (attr.ihs) {
-		memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		if (unlikely(attr.hopbyhop)) {
+			struct ipv6hdr *h6;
+
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(eseg->inline_hdr.start, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)eseg->inline_hdr.start + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		}
 		eseg->inline_hdr.sz = cpu_to_be16(attr.ihs);
 		dseg += wqe_attr.ds_cnt_inl;
 	}
 
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs + attr.hopbyhop,
 					  attr.headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
-- 
2.35.1.616.g0bdcbb4464-goog


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size
  2022-03-10  5:46 ` [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
@ 2022-03-11 16:21   ` Alexander H Duyck
  2022-03-15 15:57     ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander H Duyck @ 2022-03-11 16:21 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet

On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> From: Coco Li <lixiaoyan@google.com>
> 
> This enable TCP stack to build TSO packets bigger than
> 64KB if the driver is LSOv2 compatible.
> 
> This patch introduces new variable gso_ipv6_max_size
> that is modifiable through ip link.
> 
> ip link set dev eth0 gso_ipv6_max_size 185000
> 
> User input is capped by driver limit (tso_ipv6_max_size)
> added in previous patch.
> 
> Signed-off-by: Coco Li <lixiaoyan@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  include/linux/netdevice.h          | 12 ++++++++++++
>  include/uapi/linux/if_link.h       |  1 +
>  net/core/dev.c                     |  1 +
>  net/core/rtnetlink.c               | 15 +++++++++++++++
>  net/core/sock.c                    |  6 ++++++
>  tools/include/uapi/linux/if_link.h |  1 +
>  6 files changed, 36 insertions(+)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 61db67222c47664c179b6a5d3b6f15fdf8a02bdd..9ed348d8b6f1195514c3b5f85fbe2c45b3fa997f 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1952,6 +1952,7 @@ enum netdev_ml_priv_type {
>   *					registered
>   *	@offload_xstats_l3:	L3 HW stats for this netdevice.
>   *	@tso_ipv6_max_size:	Maximum size of IPv6 TSO packets (driver/NIC limit)
> + *	@gso_ipv6_max_size:	Maximum size of IPv6 GSO packets (user/admin limit)
>   *
>   *	FIXME: cleanup struct net_device such that network protocol info
>   *	moves out.
> @@ -2291,6 +2292,7 @@ struct net_device {
>  	netdevice_tracker	dev_registered_tracker;
>  	struct rtnl_hw_stats64	*offload_xstats_l3;
>  	unsigned int		tso_ipv6_max_size;
> +	unsigned int		gso_ipv6_max_size;
>  };
>  #define to_net_dev(d) container_of(d, struct net_device, dev)
> 

Rather than having this as a device-specific value, would it be
advantageous to consider making it a namespace-specific sysctl value
instead? Something along the lines of:
  net.ipv6.conf.*.max_jumbogram_size

It could also be applied generically to GSO/GRO as the upper limit
for any frame assembled by the socket or by GRO.

The general idea is that it might be desirable for admins to be able
to just set the maximum size they want to see for IPv6 frames, and if
we could combine the GRO/GSO logic into a single sysctl, it could be
set on a namespace basis instead of a device basis, which is more
difficult to track down. We already have the per-device limits in
tso_ipv6_max_size for the outgoing frames, so it seems like it might
make sense to make this per network namespace and defaultable, rather
than per device and requiring an update for each device instance.
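
Purely as an illustration of that direction (the sysctl name and the
netns field below are hypothetical; none of this exists today):

	static struct ctl_table ipv6_jumbo_table[] = {
		{
			/* hypothetical per-netns knob, not in this series */
			.procname	= "max_jumbogram_size",
			.data		= &init_net.ipv6.sysctl.max_jumbogram_size,
			.maxlen		= sizeof(unsigned int),
			.mode		= 0644,
			.proc_handler	= proc_douintvec,
		},
		{ }
	};

GRO/GSO would then clamp against that value instead of (or on top of)
the per-device attributes.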


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-10  5:46 ` [PATCH v4 net-next 06/14] ipv6/gro: insert " Eric Dumazet
@ 2022-03-11 16:24   ` Alexander H Duyck
  2022-03-15 16:01     ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander H Duyck @ 2022-03-11 16:24 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet

On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build
> BIG TCP ipv6 packets (bigger than 64K).
> 

This looks like it belongs in the next patch, not this one. This patch
is adding the HBH header.

> This patch changes ipv6_gro_complete() to insert a HBH/jumbo header
> so that resulting packet can go through IPv6/TCP stacks.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv6/ip6_offload.c | 32 ++++++++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index a6a6c1539c28d242ef8c35fcd5ce900512ce912d..d12dba2dd5354dbb79bb80df4038dec2544cddeb 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -342,15 +342,43 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
>  INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  {
>  	const struct net_offload *ops;
> -	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
> +	struct ipv6hdr *iph;
>  	int err = -ENOSYS;
> +	u32 payload_len;
>  
>  	if (skb->encapsulation) {
>  		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
>  		skb_set_inner_network_header(skb, nhoff);
>  	}
>  
> -	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
> +	payload_len = skb->len - nhoff - sizeof(*iph);
> +	if (unlikely(payload_len > IPV6_MAXPLEN)) {
> +		struct hop_jumbo_hdr *hop_jumbo;
> +		int hoplen = sizeof(*hop_jumbo);
> +
> +		/* Move network header left */
> +		memmove(skb_mac_header(skb) - hoplen, skb_mac_header(skb),
> +			skb->transport_header - skb->mac_header);
> +		skb->data -= hoplen;
> +		skb->len += hoplen;
> +		skb->mac_header -= hoplen;
> +		skb->network_header -= hoplen;
> +		iph = (struct ipv6hdr *)(skb->data + nhoff);
> +		hop_jumbo = (struct hop_jumbo_hdr *)(iph + 1);
> +
> +		/* Build hop-by-hop options */
> +		hop_jumbo->nexthdr = iph->nexthdr;
> +		hop_jumbo->hdrlen = 0;
> +		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
> +		hop_jumbo->tlv_len = 4;
> +		hop_jumbo->jumbo_payload_len = htonl(payload_len + hoplen);
> +
> +		iph->nexthdr = NEXTHDR_HOP;
> +		iph->payload_len = 0;
> +	} else {
> +		iph = (struct ipv6hdr *)(skb->data + nhoff);
> +		iph->payload_len = htons(payload_len);
> +	}
>  
>  	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
>  	if (WARN_ON(!ops || !ops->callbacks.gro_complete))



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation
  2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
                   ` (13 preceding siblings ...)
  2022-03-10  5:47 ` [PATCH v4 net-next 14/14] mlx5: " Eric Dumazet
@ 2022-03-11 17:13 ` Alexander H Duyck
  2022-03-15 15:50   ` Eric Dumazet
  14 siblings, 1 reply; 27+ messages in thread
From: Alexander H Duyck @ 2022-03-11 17:13 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet

On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> This series implements BIG TCP as presented in netdev 0x15:
> 
> https://netdevconf.info/0x15/session.html?BIG-TCP
> 
> Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> 
> Standard TSO/GRO packet limit is 64KB
> 
> With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> 
> Note that this feature is by default not enabled, because it might
> break some eBPF programs assuming TCP header immediately follows IPv6 header.
> 
> While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> are unable to skip over IPv6 extension headers.
> 
> Reducing number of packets traversing networking stack usually improves
> performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> 
> 'Standard' performance with current (74KB) limits.
> for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> 77           138          183          8542.19    
> 79           143          178          8215.28    
> 70           117          164          9543.39    
> 80           144          176          8183.71    
> 78           126          155          9108.47    
> 80           146          184          8115.19    
> 71           113          165          9510.96    
> 74           113          164          9518.74    
> 79           137          178          8575.04    
> 73           111          171          9561.73    
> 
> Now enable BIG TCP on both hosts.
> 
> ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
> for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> 57           83           117          13871.38   
> 64           118          155          11432.94   
> 65           116          148          11507.62   
> 60           105          136          12645.15   
> 60           103          135          12760.34   
> 60           102          134          12832.64   
> 62           109          132          10877.68   
> 58           82           115          14052.93   
> 57           83           124          14212.58   
> 57           82           119          14196.01   
> 
> We see an increase of transactions per second, and lower latencies as well.
> 
> v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)
> 
> v3: Fixed a typo in RFC number (Alexander)
>     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> 
> v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
>     Addressed feedback, for Alexander and nvidia folks.

One concern with this patch set is the addition of all the max_size
netdev attributes for tsov6, gsov6, and grov6. For the gsov6 and grov6
maxes, I really think these make more sense as sysctl values, since
this feels more like a protocol change than a netdev-specific one.

If I recall correctly, gso_max_size and gso_max_segs were added as a
workaround for NICs that couldn't handle offloading frames larger than
a certain size. This feels like increasing the scope of the workaround
rather than adding a new feature.

I didn't see the patch that went by for gro_max_size, but I am not a
fan of the way it was added, since it would make more sense as a
sysctl controlling the stack rather than something device-specific:
as far as the device is concerned it received MTU-size frames, and
GRO happens above the device. I suppose it makes things symmetric
with gso_max_size, but at the same time it isn't really a
device-specific attribute, since the work happens in the stack above
the device.

Do we need to add the IPv6-specific tso_ipv6_max_size at all? Could we
instead just allow setting the gso_max_size value larger than 64K?
Then it would just be a matter of having a protocol-specific max size
check to pull us back down to GSO_MAX_SIZE in the case of non-IPv6
frames.
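
Roughly, and purely as an illustration (the helper below is made up,
not code from this series):

	/* Clamp non-IPv6 traffic back to the legacy 64K limit while
	 * letting IPv6 use a gso_max_size larger than 64K.
	 */
	static unsigned int sk_gso_size_cap(const struct sock *sk,
					    const struct net_device *dev)
	{
		unsigned int max = READ_ONCE(dev->gso_max_size);

		if (sk->sk_family != AF_INET6)
			max = min_t(unsigned int, max, GSO_MAX_SIZE);
		return max;
	}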







^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation
  2022-03-11 17:13 ` [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Alexander H Duyck
@ 2022-03-15 15:50   ` Eric Dumazet
  2022-03-15 16:17     ` Alexander Duyck
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2022-03-15 15:50 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li

On Fri, Mar 11, 2022 at 9:13 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > This series implements BIG TCP as presented in netdev 0x15:
> >
> > https://netdevconf.info/0x15/session.html?BIG-TCP
> >
> > Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> >
> > Standard TSO/GRO packet limit is 64KB
> >
> > With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> >
> > Note that this feature is by default not enabled, because it might
> > break some eBPF programs assuming TCP header immediately follows IPv6 header.
> >
> > While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> > are unable to skip over IPv6 extension headers.
> >
> > Reducing number of packets traversing networking stack usually improves
> > performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> >
> > 'Standard' performance with current (74KB) limits.
> > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > 77           138          183          8542.19
> > 79           143          178          8215.28
> > 70           117          164          9543.39
> > 80           144          176          8183.71
> > 78           126          155          9108.47
> > 80           146          184          8115.19
> > 71           113          165          9510.96
> > 74           113          164          9518.74
> > 79           137          178          8575.04
> > 73           111          171          9561.73
> >
> > Now enable BIG TCP on both hosts.
> >
> > ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
> > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > 57           83           117          13871.38
> > 64           118          155          11432.94
> > 65           116          148          11507.62
> > 60           105          136          12645.15
> > 60           103          135          12760.34
> > 60           102          134          12832.64
> > 62           109          132          10877.68
> > 58           82           115          14052.93
> > 57           83           124          14212.58
> > 57           82           119          14196.01
> >
> > We see an increase of transactions per second, and lower latencies as well.
> >
> > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)
> >
> > v3: Fixed a typo in RFC number (Alexander)
> >     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> >
> > v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
> >     Addressed feedback, for Alexander and nvidia folks.
>
> One concern with this patch set is the addition of all the max_size
> netdev attributes for tsov6, gsov6, and grov6. For the gsov6 and grov6
> maxes I really think these make more sense as sysctl values since it
> feels more like a protocol change rather than a netdev specific one.
>
> If I recall correctly the addition of gso_max_size and gso_max_segs
> were added as a workaround for NICs that couldn't handle offloading
> frames larger than a certain size. This feels like increasing the scope
> of the workaround rather than adding a new feature.
>
> I didn't see the patch that went by for gro_max_size but I am not a fan
> of the way it was added since it would make more sense as a sysctl
> which controlled the stack instead of something that is device specific
> since as far as the device is concerned it received MTU size frames,
> and GRO happens above the device. I suppose it makes things symmetric
> with gso_max_size, but at the same time it isn't really a device
> specific attribute since the work happens in the stack above the
> device.
>

We already have per-device gso_max_size and gso_max_segs.

GRO max size being per device is nice. There are cases where a host
has multiple NICs,
one of them being used for incoming traffic that needs to be forwarded.

Maybe the changelog was not clear enough, but being able to lower
gro_max_size is also a way to prevent frag_list from being used, so
that most NICs support TSO just fine.
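
To make the effect concrete, this is roughly (not the exact stack
code) the decision that limit drives during aggregation:

	static bool gro_would_exceed_limit(const struct sk_buff *p,
					   const struct sk_buff *skb)
	{
		/* Once an aggregate would grow past the device limit,
		 * stop merging; forwarded packets then never need the
		 * frag_list fallback that many NICs cannot TSO.
		 */
		return p->len + skb->len >= READ_ONCE(p->dev->gro_max_size);
	}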


> Do we need to add the IPv6 specific version of the tso_ipv6_max_size?
> Could we instead just allow setting the gso_max_size value larger than
> 64K? Then it would just be a matter of having a protocol specific max
> size check to pull us back down to GSO_MAX_SIZE in the case of non-ipv6
> frames.

Not sure why adding attributes is an issue, really; more flexibility
seems better to me.

One day, if someone adds LSOv2 to IPv4, I would prefer being able to
turn on this support selectively,
after tests have concluded that nothing broke.

Having to turn off LSOv2 in an emergency because of some bug in the
LSOv2 IPv4 implementation would be bad.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size
  2022-03-11 16:21   ` Alexander H Duyck
@ 2022-03-15 15:57     ` Eric Dumazet
  0 siblings, 0 replies; 27+ messages in thread
From: Eric Dumazet @ 2022-03-15 15:57 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li

On Fri, Mar 11, 2022 at 8:22 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> >
> > This enable TCP stack to build TSO packets bigger than
> > 64KB if the driver is LSOv2 compatible.
> >
> > This patch introduces new variable gso_ipv6_max_size
> > that is modifiable through ip link.
> >
> > ip link set dev eth0 gso_ipv6_max_size 185000
> >
> > User input is capped by driver limit (tso_ipv6_max_size)
> > added in previous patch.
> >
> > Signed-off-by: Coco Li <lixiaoyan@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  include/linux/netdevice.h          | 12 ++++++++++++
> >  include/uapi/linux/if_link.h       |  1 +
> >  net/core/dev.c                     |  1 +
> >  net/core/rtnetlink.c               | 15 +++++++++++++++
> >  net/core/sock.c                    |  6 ++++++
> >  tools/include/uapi/linux/if_link.h |  1 +
> >  6 files changed, 36 insertions(+)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 61db67222c47664c179b6a5d3b6f15fdf8a02bdd..9ed348d8b6f1195514c3b5f85fbe2c45b3fa997f 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1952,6 +1952,7 @@ enum netdev_ml_priv_type {
> >   *                                   registered
> >   *   @offload_xstats_l3:     L3 HW stats for this netdevice.
> >   *   @tso_ipv6_max_size:     Maximum size of IPv6 TSO packets (driver/NIC limit)
> > + *   @gso_ipv6_max_size:     Maximum size of IPv6 GSO packets (user/admin limit)
> >   *
> >   *   FIXME: cleanup struct net_device such that network protocol info
> >   *   moves out.
> > @@ -2291,6 +2292,7 @@ struct net_device {
> >       netdevice_tracker       dev_registered_tracker;
> >       struct rtnl_hw_stats64  *offload_xstats_l3;
> >       unsigned int            tso_ipv6_max_size;
> > +     unsigned int            gso_ipv6_max_size;
> >  };
> >  #define to_net_dev(d) container_of(d, struct net_device, dev)
> >
>
> Rather than have this as a device specific value would it be
> advantageous to consider making this a namespace specific sysctl value
> instead? Something along the lines of:
>   net.ipv6.conf.*.max_jumbogram_size
>
> It could also be applied generically to the GSO/GRO as the upper limit
> for any frame assembled by the socket or GRO.
>
> The general idea is that might be desirable for admins to be able to
> basically just set the maximum size they want to see for IPv6 frames
> and if we could combine the GRO/GSO logic into a single sysctl that
> could be set on a namespace basis instead of a device basis which would
> be more difficult to track down. We already have the per-device limits
> in the tso_ipv6_max_size for the outgoing frames so it seems like it
> might make sense to make this per network namespace and defaultable
> rather than per device and requiring an update for each device
> instance.
>

At least Google found it was easier to have per-device controls, in
terms of testing the feature
and deploying it gradually.

We have hosts with multiple NICs, of different types. We want to be
able to control BIG TCP on a per-device basis.
For instance, I had a bug in the implementation for one (non-upstream)
driver, which I could mitigate
by setting a different limit only for this NIC, until the host could
boot with a fixed kernel.

We use ipvlan, with one private netns and IPv6 address per job; we
wanted to deploy BIG TCP on a per-job basis.

I guess that if you want to add a sysctl that automatically overrides
the per-device setting, this could be done later?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-11 16:24   ` Alexander H Duyck
@ 2022-03-15 16:01     ` Eric Dumazet
  2022-03-15 16:04       ` Alexander Duyck
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2022-03-15 16:01 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li

On Fri, Mar 11, 2022 at 8:24 AM Alexander H Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build
> > BIG TCP ipv6 packets (bigger than 64K).
> >
>
> This looks like it belongs in the next patch, not this one. This patch
> is adding the HBH header.

What do you mean by "it belongs"?

Do you want me to squash the patches, or remove the first sentence?

I am confused.

>
> > This patch changes ipv6_gro_complete() to insert a HBH/jumbo header
> > so that resulting packet can go through IPv6/TCP stacks.
> >

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-15 16:01     ` Eric Dumazet
@ 2022-03-15 16:04       ` Alexander Duyck
  2022-03-15 16:10         ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Duyck @ 2022-03-15 16:04 UTC (permalink / raw)
  To: Eric Dumazet, Alexander H Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev, Coco Li



> -----Original Message-----
> From: Eric Dumazet <edumazet@google.com>
> Sent: Tuesday, March 15, 2022 9:02 AM
> To: Alexander H Duyck <alexander.duyck@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>; David S . Miller
> <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; netdev
> <netdev@vger.kernel.org>; Alexander Duyck <alexanderduyck@fb.com>;
> Coco Li <lixiaoyan@google.com>
> Subject: Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary
> HBH/jumbo header
> 
> On Fri, Mar 11, 2022 at 8:24 AM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build
> > > BIG TCP ipv6 packets (bigger than 64K).
> > >
> >
> > This looks like it belongs in the next patch, not this one. This patch
> > is adding the HBH header.
> 
> What do you mean by "it belongs" ?
> 
> Do you want me to squash the patches, or remove the first sentence ?
> 
> I am confused.

It is about the sentence. Your next patch essentially has that as the
title and actually does add GRO_IPV6_MAX_SIZE. I wasn't sure if you
had reordered the patches or split them. However, as I recall, I
didn't see anything in this patch that added GRO_IPV6_MAX_SIZE.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-15 16:04       ` Alexander Duyck
@ 2022-03-15 16:10         ` Eric Dumazet
  2022-03-15 17:35           ` Alexander Duyck
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2022-03-15 16:10 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander H Duyck, Eric Dumazet, David S . Miller,
	Jakub Kicinski, netdev, Coco Li

On Tue, Mar 15, 2022 at 9:04 AM Alexander Duyck <alexanderduyck@fb.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Eric Dumazet <edumazet@google.com>
> > Sent: Tuesday, March 15, 2022 9:02 AM
> > To: Alexander H Duyck <alexander.duyck@gmail.com>
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>; David S . Miller
> > <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; netdev
> > <netdev@vger.kernel.org>; Alexander Duyck <alexanderduyck@fb.com>;
> > Coco Li <lixiaoyan@google.com>
> > Subject: Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary
> > HBH/jumbo header
> >
> > On Fri, Mar 11, 2022 at 8:24 AM Alexander H Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > > > From: Eric Dumazet <edumazet@google.com>
> > > >
> > > > Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to build
> > > > BIG TCP ipv6 packets (bigger than 64K).
> > > >
> > >
> > > This looks like it belongs in the next patch, not this one. This patch
> > > is adding the HBH header.
> >
> > What do you mean by "it belongs" ?
> >
> > Do you want me to squash the patches, or remove the first sentence ?
> >
> > I am confused.
>
> It is about the sentence. Your next patch essentially has that as the title and actually does add GRO_IPV6_MAX_SIZE. I wasn't sure if you reordered the patches or split them. However as I recall I didn't see anything in this patch that added GRO_IPV6_MAX_SIZE.


I used "Following patch will", meaning the patch following _this_ one;
sorry if this is confusing.

I would have used "This patch is ..." if I wanted to describe what
this patch is doing.

The patches were not reordered, and they have two different authors.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation
  2022-03-15 15:50   ` Eric Dumazet
@ 2022-03-15 16:17     ` Alexander Duyck
  2022-03-15 16:33       ` Eric Dumazet
  0 siblings, 1 reply; 27+ messages in thread
From: Alexander Duyck @ 2022-03-15 16:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li

On Tue, Mar 15, 2022 at 8:50 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Mar 11, 2022 at 9:13 AM Alexander H Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > This series implements BIG TCP as presented in netdev 0x15:
> > >
> > > https://netdevconf.info/0x15/session.html?BIG-TCP
> > >
> > > Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> > >
> > > Standard TSO/GRO packet limit is 64KB
> > >
> > > With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> > >
> > > Note that this feature is by default not enabled, because it might
> > > break some eBPF programs assuming TCP header immediately follows IPv6 header.
> > >
> > > While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> > > are unable to skip over IPv6 extension headers.
> > >
> > > Reducing number of packets traversing networking stack usually improves
> > > performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> > >
> > > 'Standard' performance with current (74KB) limits.
> > > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > > 77           138          183          8542.19
> > > 79           143          178          8215.28
> > > 70           117          164          9543.39
> > > 80           144          176          8183.71
> > > 78           126          155          9108.47
> > > 80           146          184          8115.19
> > > 71           113          165          9510.96
> > > 74           113          164          9518.74
> > > 79           137          178          8575.04
> > > 73           111          171          9561.73
> > >
> > > Now enable BIG TCP on both hosts.
> > >
> > > ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
> > > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > > 57           83           117          13871.38
> > > 64           118          155          11432.94
> > > 65           116          148          11507.62
> > > 60           105          136          12645.15
> > > 60           103          135          12760.34
> > > 60           102          134          12832.64
> > > 62           109          132          10877.68
> > > 58           82           115          14052.93
> > > 57           83           124          14212.58
> > > 57           82           119          14196.01
> > >
> > > We see an increase of transactions per second, and lower latencies as well.
> > >
> > > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)
> > >
> > > v3: Fixed a typo in RFC number (Alexander)
> > >     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> > >
> > > v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
> > >     Addressed feedback, for Alexander and nvidia folks.
> >
> > One concern with this patch set is the addition of all the max_size
> > netdev attributes for tsov6, gsov6, and grov6. For the gsov6 and grov6
> > maxes I really think these make more sense as sysctl values since it
> > feels more like a protocol change rather than a netdev specific one.
> >
> > If I recall correctly the addition of gso_max_size and gso_max_segs
> > were added as a workaround for NICs that couldn't handle offloading
> > frames larger than a certain size. This feels like increasing the scope
> > of the workaround rather than adding a new feature.
> >
> > I didn't see the patch that went by for gro_max_size but I am not a fan
> > of the way it was added since it would make more sense as a sysctl
> > which controlled the stack instead of something that is device specific
> > since as far as the device is concerned it received MTU size frames,
> > and GRO happens above the device. I suppose it makes things symmetric
> > with gso_max_size, but at the same time it isn't really a device
> > specific attribute since the work happens in the stack above the
> > device.
> >
>
> We already have per device gso_max_size and gso_max_segs.
>
> GRO max size being per device is nice. There are cases where a host
> has multiple NIC,
> one of them being used for incoming traffic that needs to be forwarded.
>
> Maybe the changelog was not clear enough, but being able to lower gro_max_size
> is also a way to prevent frag_list being used, so that most NIC
> support TSO just fine.

The point is that gso_max_size and gso_max_segs were workarounds for
devices. Basically, they weren't added until it was found that
specific NICs were having issues segmenting frames larger than a
specific size or made up of more than a certain number of segments.

I suppose we can keep gro_max_size in place if we want to say that it
ties back into the device. There may be some benefit there if we end
up with some devices allocating skbs that can aggregate more segments
than others; however, in that case it seems more like a segment limit
than a size limit. Maybe something like gro_max_segs would make more
sense, or I suppose we end up with both eventually.

> > Do we need to add the IPv6 specific version of the tso_ipv6_max_size?
> > Could we instead just allow setting the gso_max_size value larger than
> > 64K? Then it would just be a matter of having a protocol specific max
> > size check to pull us back down to GSO_MAX_SIZE in the case of non-ipv6
> > frames.
>
> Not sure why adding attributes is an issue really, more flexibility
> seems better to me.
>
> One day, if someone adds LSOv2 to IPv4, I prefer being able to
> selectively turn on this support,
> after tests have concluded nothing broke.
>
> Having to turn off LSOv2 in emergency because of some bug in LSOv2
> IPv4 implementation would be bad.

You already have a means of turning off LSOv2 in the form of
gso_max_size. Remember, it was put there as a workaround for broken
devices that couldn't fully support LSOv1. In my mind we would reuse
gso_max_size, and I suppose gro_max_size, to control the GSO types
supported by the device. If it is set to 64K or less then the device
only supports LSOv1; if it is set higher, then it can support LSOv2.
I'm not sure it makes sense to split gso_max_size into two versions.
If a device supports GSO larger than 64K, then what is the point in
having gso_max_size around, when it will always be set to 64K because
the device isn't broken? Otherwise we are going to end up creating a
bunch of duplications.

What I am getting at is that it would be nice to have a stack-level
control and then a device-level control. The device control is there
to say what size drivers support when segmenting, whereas the
stack-level control says whether the protocol wants to try sending
down frames larger than 64K. So essentially all non-IPv6 protocols
would cap at 64K, whereas IPv6 can go beyond that with huge frames.
Then you get the best of both worlds.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation
  2022-03-15 16:17     ` Alexander Duyck
@ 2022-03-15 16:33       ` Eric Dumazet
  2022-03-15 17:20         ` Alexander Duyck
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Dumazet @ 2022-03-15 16:33 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li

On Tue, Mar 15, 2022 at 9:17 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Tue, Mar 15, 2022 at 8:50 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Mar 11, 2022 at 9:13 AM Alexander H Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > > > From: Eric Dumazet <edumazet@google.com>
> > > >
> > > > This series implements BIG TCP as presented in netdev 0x15:
> > > >
> > > > https://netdevconf.info/0x15/session.html?BIG-TCP
> > > >
> > > > Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> > > >
> > > > Standard TSO/GRO packet limit is 64KB
> > > >
> > > > With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> > > >
> > > > Note that this feature is by default not enabled, because it might
> > > > break some eBPF programs assuming TCP header immediately follows IPv6 header.
> > > >
> > > > While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> > > > are unable to skip over IPv6 extension headers.
> > > >
> > > > Reducing number of packets traversing networking stack usually improves
> > > > performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> > > >
> > > > 'Standard' performance with current (74KB) limits.
> > > > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > > > 77           138          183          8542.19
> > > > 79           143          178          8215.28
> > > > 70           117          164          9543.39
> > > > 80           144          176          8183.71
> > > > 78           126          155          9108.47
> > > > 80           146          184          8115.19
> > > > 71           113          165          9510.96
> > > > 74           113          164          9518.74
> > > > 79           137          178          8575.04
> > > > 73           111          171          9561.73
> > > >
> > > > Now enable BIG TCP on both hosts.
> > > >
> > > > ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
> > > > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > > > 57           83           117          13871.38
> > > > 64           118          155          11432.94
> > > > 65           116          148          11507.62
> > > > 60           105          136          12645.15
> > > > 60           103          135          12760.34
> > > > 60           102          134          12832.64
> > > > 62           109          132          10877.68
> > > > 58           82           115          14052.93
> > > > 57           83           124          14212.58
> > > > 57           82           119          14196.01
> > > >
> > > > We see an increase of transactions per second, and lower latencies as well.
> > > >
> > > > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)
> > > >
> > > > v3: Fixed a typo in RFC number (Alexander)
> > > >     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> > > >
> > > > v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
> > > >     Addressed feedback, for Alexander and nvidia folks.
> > >
> > > One concern with this patch set is the addition of all the max_size
> > > netdev attributes for tsov6, gsov6, and grov6. For the gsov6 and grov6
> > > maxes I really think these make more sense as sysctl values since it
> > > feels more like a protocol change rather than a netdev specific one.
> > >
> > > If I recall correctly the addition of gso_max_size and gso_max_segs
> > > were added as a workaround for NICs that couldn't handle offloading
> > > frames larger than a certain size. This feels like increasing the scope
> > > of the workaround rather than adding a new feature.
> > >
> > > I didn't see the patch that went by for gro_max_size but I am not a fan
> > > of the way it was added since it would make more sense as a sysctl
> > > which controlled the stack instead of something that is device specific
> > > since as far as the device is concerned it received MTU size frames,
> > > and GRO happens above the device. I suppose it makes things symmetric
> > > with gso_max_size, but at the same time it isn't really a device
> > > specific attribute since the work happens in the stack above the
> > > device.
> > >
> >
> > We already have per device gso_max_size and gso_max_segs.
> >
> > GRO max size being per device is nice. There are cases where a host
> > has multiple NIC,
> > one of them being used for incoming traffic that needs to be forwarded.
> >
> > Maybe the changelog was not clear enough, but being able to lower gro_max_size
> > is also a way to prevent frag_list being used, so that most NIC
> > support TSO just fine.
>
> The point is gso_max_size and gso_max_segs were workarounds for
> devices. Basically they weren't added until it was found that specific
> NICs were having issues with segmenting frames either larger than a
> specific size or number of segments.

These settings are used in our tests, when we want to precisely
control the size of TSO packets.

They are not only used by drivers.


>
> I suppose we can keep gro_max_size in place if we are wanting to say
> that it ties back into the device. There may be some benefit there if
> we end up with some devices allocating skbs that can aggregate more
> segments than others, however in that case that seems more like a
> segment limit than a size limit. Maybe something like gro_max_segs
> would make more sense, or I suppose we end up with both eventually.
>
> > > Do we need to add the IPv6 specific version of the tso_ipv6_max_size?
> > > Could we instead just allow setting the gso_max_size value larger than
> > > 64K? Then it would just be a matter of having a protocol specific max
> > > size check to pull us back down to GSO_MAX_SIZE in the case of non-ipv6
> > > frames.
> >
> > Not sure why adding attributes is an issue really, more flexibility
> > seems better to me.
> >
> > One day, if someone adds LSOv2 to IPv4, I prefer being able to
> > selectively turn on this support,
> > after tests have concluded nothing broke.
> >
> > Having to turn off LSOv2 in emergency because of some bug in LSOv2
> > IPv4 implementation would be bad.
>
> You already have a means of turning off LSOv2 in the form of
> gso_max_size. Remember it was put there as a workaround for broken
> devices that couldn't fully support LSOv1. In my mind we would reuse
> the gso_max_size, and I suppose gro_max_size to control the GSO types
> supported by the device. If it is set for 64K or less then it only
> supports LSOv1, if it is set higher, then it can support GSOv2. I'm
> not sure it makes sense to split gso_max_size into two versions. If a
> device supports GSO larger than 64K then what is the point in having
> gso_max_size around when it will always be set to 64K because the
> device isn't broken. Otherwise we are going to end up creating a bunch
> of duplications.

OK, but consider a buggy user script that is currently doing this:

ip link set dev eth1 gso_max_size 70000

Old kernels were ignoring this request.

Suddenly a new kernel comes along, and the user ends up with a broken setup.

>
> What I am getting at is that it would be nice to have a stack level
> control and then a device level control. The device control is there
> to say what size drivers support when segmenting, whereas the stack
> level control says if the protocol wants to try sending down frames
> larger than 64K. So essentially all non-IPv6 protocols will cap at
> 64K, whereas IPv6 can go beyond that with huge frames. Then you get
> the best of both worlds.

Can you explain how the per-ipvlan setting would be allowed?

To me, sysctls are the old way of controlling things, not to mention
that adding more of them slows down netns creation and dismantling.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 net-next 00/14] tcp: BIG TCP implementation
  2022-03-15 16:33       ` Eric Dumazet
@ 2022-03-15 17:20         ` Alexander Duyck
  0 siblings, 0 replies; 27+ messages in thread
From: Alexander Duyck @ 2022-03-15 17:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Coco Li

On Tue, Mar 15, 2022 at 9:33 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Mar 15, 2022 at 9:17 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Tue, Mar 15, 2022 at 8:50 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Fri, Mar 11, 2022 at 9:13 AM Alexander H Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > > > > From: Eric Dumazet <edumazet@google.com>
> > > > >
> > > > > This series implements BIG TCP as presented in netdev 0x15:
> > > > >
> > > > > https://netdevconf.info/0x15/session.html?BIG-TCP
> > > > >
> > > > > Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/
> > > > >
> > > > > Standard TSO/GRO packet limit is 64KB
> > > > >
> > > > > With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.
> > > > >
> > > > > Note that this feature is by default not enabled, because it might
> > > > > break some eBPF programs assuming TCP header immediately follows IPv6 header.
> > > > >
> > > > > While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
> > > > > are unable to skip over IPv6 extension headers.
> > > > >
> > > > > Reducing number of packets traversing networking stack usually improves
> > > > > performance, as shown on this experiment using a 100Gbit NIC, and 4K MTU.
> > > > >
> > > > > 'Standard' performance with current (74KB) limits.
> > > > > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > > > > 77           138          183          8542.19
> > > > > 79           143          178          8215.28
> > > > > 70           117          164          9543.39
> > > > > 80           144          176          8183.71
> > > > > 78           126          155          9108.47
> > > > > 80           146          184          8115.19
> > > > > 71           113          165          9510.96
> > > > > 74           113          164          9518.74
> > > > > 79           137          178          8575.04
> > > > > 73           111          171          9561.73
> > > > >
> > > > > Now enable BIG TCP on both hosts.
> > > > >
> > > > > ip link set dev eth0 gro_ipv6_max_size 185000 gso_ipv6_max_size 185000
> > > > > for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
> > > > > 57           83           117          13871.38
> > > > > 64           118          155          11432.94
> > > > > 65           116          148          11507.62
> > > > > 60           105          136          12645.15
> > > > > 60           103          135          12760.34
> > > > > 60           102          134          12832.64
> > > > > 62           109          132          10877.68
> > > > > 58           82           115          14052.93
> > > > > 57           83           124          14212.58
> > > > > 57           82           119          14196.01
> > > > >
> > > > > We see an increase of transactions per second, and lower latencies as well.
> > > > >
> > > > > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y in mlx5 (Jakub)
> > > > >
> > > > > v3: Fixed a typo in RFC number (Alexander)
> > > > >     Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.
> > > > >
> > > > > v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
> > > > >     Addressed feedback, for Alexander and nvidia folks.
> > > >
> > > > One concern with this patch set is the addition of all the max_size
> > > > netdev attributes for tsov6, gsov6, and grov6. For the gsov6 and grov6
> > > > maxes I really think these make more sense as sysctl values since it
> > > > feels more like a protocol change rather than a netdev specific one.
> > > >
> > > > If I recall correctly the addition of gso_max_size and gso_max_segs
> > > > were added as a workaround for NICs that couldn't handle offloading
> > > > frames larger than a certain size. This feels like increasing the scope
> > > > of the workaround rather than adding a new feature.
> > > >
> > > > I didn't see the patch that went by for gro_max_size but I am not a fan
> > > > of the way it was added since it would make more sense as a sysctl
> > > > which controlled the stack instead of something that is device specific
> > > > since as far as the device is concerned it received MTU size frames,
> > > > and GRO happens above the device. I suppose it makes things symmetric
> > > > with gso_max_size, but at the same time it isn't really a device
> > > > specific attribute since the work happens in the stack above the
> > > > device.
> > > >
> > >
> > > We already have per device gso_max_size and gso_max_segs.
> > >
> > > GRO max size being per device is nice. There are cases where a host
> > > has multiple NIC,
> > > one of them being used for incoming traffic that needs to be forwarded.
> > >
> > > Maybe the changelog was not clear enough, but being able to lower gro_max_size
> > > is also a way to prevent frag_list being used, so that most NIC
> > > support TSO just fine.
> >
> > The point is gso_max_size and gso_max_segs were workarounds for
> > devices. Basically they weren't added until it was found that specific
> > NICs were having issues with segmenting frames either larger than a
> > specific size or number of segments.
>
> These settings are used in our tests, when we want to precisely
> control size of TSO.
>
> There are not only used by drivers.

Yes, but what I am getting at is that for most drivers they are always
set to 64K. I get that you are using them to test things, and you
could still use them after the change. All I am asking is that we not
fork them into three different things, each supporting a specific
protocol.

> >
> > I suppose we can keep gro_max_size in place if we are wanting to say
> > that it ties back into the device. There may be some benefit there if
> > we end up with some devices allocating skbs that can aggregate more
> > segments than others, however in that case that seems more like a
> > segment limit than a size limit. Maybe something like gro_max_segs
> > would make more sense, or I suppose we end up with both eventually.
> >
> > > > Do we need to add the IPv6 specific version of the tso_ipv6_max_size?
> > > > Could we instead just allow setting the gso_max_size value larger than
> > > > 64K? Then it would just be a matter of having a protocol specific max
> > > > size check to pull us back down to GSO_MAX_SIZE in the case of non-ipv6
> > > > frames.
> > >
> > > Not sure why adding attributes is an issue really, more flexibility
> > > seems better to me.
> > >
> > > One day, if someone adds LSOv2 to IPv4, I prefer being able to
> > > selectively turn on this support,
> > > after tests have concluded nothing broke.
> > >
> > > Having to turn off LSOv2 in emergency because of some bug in LSOv2
> > > IPv4 implementation would be bad.
> >
> > You already have a means of turning off LSOv2 in the form of
> > gso_max_size. Remember, it was put there as a workaround for broken
> > devices that couldn't fully support LSOv1. In my mind we would reuse
> > gso_max_size, and I suppose gro_max_size, to control the GSO types
> > supported by the device. If it is set to 64K or less then the device
> > only supports LSOv1; if it is set higher, then it can support LSOv2.
> > I'm not sure it makes sense to split gso_max_size into two versions.
> > If a device supports GSO larger than 64K, then what is the point of
> > keeping a separate gso_max_size around that will always be set to 64K
> > simply because the device isn't broken? Otherwise we are going to end
> > up creating a bunch of duplicated attributes.
>
> OK, but if a buggy user script is currently doing this:
>
> ip link set dev eth1 gso_max_size 70000
>
> Old kernels ignored this request.
>
> Suddenly a new kernel comes along, and the user ends up with a broken setup.

That could already happen. Recall that I mentioned this was added as a
workaround, so I would argue the code was already broken: there were
devices that didn't support 64K, and somehow we were allowing that.

We probably need to add a function call to the drivers in order to
control the setting of this value, as there are some drivers that
cannot support even the current limit of 64K. Then, if it isn't set,
we just default to 64K. For the drivers that support larger sizes we
can then override the function when setting up the jumbogram
segmentation support value.
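
Roughly what I have in mind, as a minimal sketch; the hook name, its
placement, and the 128K hardware limit are all hypothetical, not
something in this series:

#include <linux/errno.h>
#include <linux/netdevice.h>

/* Hypothetical per-driver validation hook; illustrative names only. */
static int mydrv_validate_gso_max_size(struct net_device *dev,
				       unsigned int size)
{
	/* Assume this example device's segmentation engine caps at 128K. */
	if (size > 128 * 1024) {
		netdev_err(dev, "gso_max_size %u beyond hardware limit\n",
			   size);
		return -EINVAL;
	}
	return 0;
}

A driver that doesn't provide such a hook would simply keep the
existing 64K default.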

> >
> > What I am getting at is that it would be nice to have a stack-level
> > control and then a device-level control. The device control says what
> > size the driver supports when segmenting, whereas the stack-level
> > control says whether the protocol wants to try sending down frames
> > larger than 64K. So essentially all non-IPv6 protocols would cap at
> > 64K, whereas IPv6 could go beyond that with huge frames. Then you get
> > the best of both worlds.
>
> Can you explain how the per-ipvlan setting would be allowed?
>
> To me, sysctls are the old way of controlling things, not to mention
> that adding more of them slows down netns creation and dismantling.

So what I am suggesting is splitting it into two pieces. One is
gso_max_size, which can be set to values larger than 64K if that is
what we want to support on that device. The other is a
protocol-specific value where you can configure things to use a value
greater than 64K if the protocol supports it. That way, if there is a
problem in the stack you can turn it off in one place, and if there is
a problem in the device you can turn it off in another.
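
As a rough sketch of how that split could look from userspace
(gso_ipv6_max_size is the per-protocol attribute from this series;
letting plain gso_max_size exceed 64K is an assumption of this
proposal, and the device name and sizes are illustrative):

# device-level limit: what the NIC's segmentation engine can handle
ip link set dev eth0 gso_max_size 131072

# protocol-level limit: opt IPv6 into frames larger than 64K
ip link set dev eth0 gso_ipv6_max_size 185000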

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary HBH/jumbo header
  2022-03-15 16:10         ` Eric Dumazet
@ 2022-03-15 17:35           ` Alexander Duyck
  0 siblings, 0 replies; 27+ messages in thread
From: Alexander Duyck @ 2022-03-15 17:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander H Duyck, Eric Dumazet, David S . Miller,
	Jakub Kicinski, netdev, Coco Li

> -----Original Message-----
> From: Eric Dumazet <edumazet@google.com>
> Sent: Tuesday, March 15, 2022 9:11 AM
> To: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Alexander H Duyck <alexander.duyck@gmail.com>; Eric Dumazet
> <eric.dumazet@gmail.com>; David S . Miller <davem@davemloft.net>;
> Jakub Kicinski <kuba@kernel.org>; netdev <netdev@vger.kernel.org>; Coco
> Li <lixiaoyan@google.com>
> Subject: Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary
> HBH/jumbo header
> 
> On Tue, Mar 15, 2022 at 9:04 AM Alexander Duyck
> <alexanderduyck@fb.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Eric Dumazet <edumazet@google.com>
> > > Sent: Tuesday, March 15, 2022 9:02 AM
> > > To: Alexander H Duyck <alexander.duyck@gmail.com>
> > > Cc: Eric Dumazet <eric.dumazet@gmail.com>; David S . Miller
> > > <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; netdev
> > > <netdev@vger.kernel.org>; Alexander Duyck
> <alexanderduyck@fb.com>;
> > > Coco Li <lixiaoyan@google.com>
> > > Subject: Re: [PATCH v4 net-next 06/14] ipv6/gro: insert temporary
> > > HBH/jumbo header
> > >
> > > On Fri, Mar 11, 2022 at 8:24 AM Alexander H Duyck
> > > <alexander.duyck@gmail.com> wrote:
> > > >
> > > > On Wed, 2022-03-09 at 21:46 -0800, Eric Dumazet wrote:
> > > > > From: Eric Dumazet <edumazet@google.com>
> > > > >
> > > > > Following patch will add GRO_IPV6_MAX_SIZE, allowing gro to
> > > > > build BIG TCP ipv6 packets (bigger than 64K).
> > > > >
> > > >
> > > > This looks like it belongs in the next patch, not this one. This
> > > > patch is adding the HBH header.
> > >
> > > What do you mean by "it belongs"?
> > >
> > > Do you want me to squash the patches, or remove the first sentence?
> > >
> > > I am confused.
> >
> > It is about the sentence. Your next patch essentially has that as the title and
> actually does add GRO_IPV6_MAX_SIZE. I wasn't sure if you reordered the
> patches or split them. However, as I recall, I didn't see anything in this patch
> that added GRO_IPV6_MAX_SIZE.
> 
> 
> I used "Following patch will", meaning the patch following _this_ one; sorry if
> this is confusing.
> 
> I would have used "This patch is ..." if I wanted to describe what this patch is
> doing.
> 
> Patches were not reordered, and have two different authors.

Yeah, the problem is I read "Following patch" as "the patch below". I would probably drop the line since it doesn't add much to the patch itself.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2022-03-15 17:35 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10  5:46 [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 01/14] net: add netdev->tso_ipv6_max_size attribute Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 02/14] ipv6: add dev->gso_ipv6_max_size Eric Dumazet
2022-03-11 16:21   ` Alexander H Duyck
2022-03-15 15:57     ` Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 03/14] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 04/14] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 05/14] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 06/14] ipv6/gro: insert " Eric Dumazet
2022-03-11 16:24   ` Alexander H Duyck
2022-03-15 16:01     ` Eric Dumazet
2022-03-15 16:04       ` Alexander Duyck
2022-03-15 16:10         ` Eric Dumazet
2022-03-15 17:35           ` Alexander Duyck
2022-03-10  5:46 ` [PATCH v4 net-next 07/14] ipv6: add GRO_IPV6_MAX_SIZE Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 08/14] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 09/14] net: loopback: enable BIG TCP packets Eric Dumazet
2022-03-10  5:46 ` [PATCH v4 net-next 10/14] bonding: update dev->tso_ipv6_max_size Eric Dumazet
2022-03-10  5:47 ` [PATCH v4 net-next 11/14] macvlan: enable BIG TCP Packets Eric Dumazet
2022-03-10  5:47 ` [PATCH v4 net-next 12/14] ipvlan: " Eric Dumazet
2022-03-10  5:47 ` [PATCH v4 net-next 13/14] mlx4: support BIG TCP packets Eric Dumazet
2022-03-10  5:47 ` [PATCH v4 net-next 14/14] mlx5: " Eric Dumazet
2022-03-11 17:13 ` [PATCH v4 net-next 00/14] tcp: BIG TCP implementation Alexander H Duyck
2022-03-15 15:50   ` Eric Dumazet
2022-03-15 16:17     ` Alexander Duyck
2022-03-15 16:33       ` Eric Dumazet
2022-03-15 17:20         ` Alexander Duyck
