netdev.vger.kernel.org archive mirror
* [PATCH v5 net-next 00/13] tcp: BIG TCP implementation
@ 2022-05-09 22:21 Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

This series implements BIG TCP as presented in netdev 0x15:

https://netdevconf.info/0x15/session.html?BIG-TCP

Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/

Standard TSO/GRO packet limit is 64KB

With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.

Note that this feature is not enabled by default, because it might
break some eBPF programs that assume the TCP header immediately follows
the IPv6 header.

While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
are unable to skip over IPv6 extension headers.
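
To illustrate the parsing concern, here is a minimal userspace sketch
(not part of this series) of a parser that tolerates the HBH header.
The helper name ipv6_tcp_offset() and the constants are hypothetical;
it assumes the buffer starts at the IPv6 header and skips at most one
Hop-by-Hop header, which is the only extension header BIG TCP adds.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN	40	/* fixed IPv6 header size */
#define PROTO_HOPOPTS	0	/* Hop-by-Hop Options next-header value */
#define PROTO_TCP	6	/* TCP next-header value */

/* Return the offset of the TCP header inside an IPv6 packet, or -1.
 * Hypothetical helper: a parser that reads pkt[6] and assumes TCP
 * starts at offset 40 would misparse a BIG TCP packet.
 */
static int ipv6_tcp_offset(const uint8_t *pkt, size_t len)
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nexthdr;

	if (len < IPV6_HDR_LEN || (pkt[0] >> 4) != 6)
		return -1;

	nexthdr = pkt[6];		/* Next Header field of the IPv6 header */
	if (nexthdr == PROTO_HOPOPTS) {
		if (len < off + 2)
			return -1;
		nexthdr = pkt[off];	/* Next Header field of the HBH header */
		/* Hdr Ext Len counts 8-byte units beyond the first 8 bytes. */
		off += ((size_t)pkt[off + 1] + 1) * 8;
		if (len < off)
			return -1;
	}
	return nexthdr == PROTO_TCP ? (int)off : -1;
}
```

With a plain IPv6/TCP packet this returns 40; with the 8-byte HBH/Jumbo
header in front of TCP it returns 48.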

Reducing the number of packets traversing the networking stack usually
improves performance, as shown in this experiment using a 100Gbit NIC
and 4K MTU.

'Standard' performance with current (64KB) limits.
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
77           138          183          8542.19    
79           143          178          8215.28    
70           117          164          9543.39    
80           144          176          8183.71    
78           126          155          9108.47    
80           146          184          8115.19    
71           113          165          9510.96    
74           113          164          9518.74    
79           137          178          8575.04    
73           111          171          9561.73    

Now enable BIG TCP on both hosts.

ip link set dev eth0 gro_max_size 185000 gso_max_size 185000
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23  -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done
57           83           117          13871.38   
64           118          155          11432.94   
65           116          148          11507.62   
60           105          136          12645.15   
60           103          135          12760.34   
60           102          134          12832.64   
62           109          132          10877.68   
58           82           115          14052.93   
57           83           124          14212.58   
57           82           119          14196.01   

We see an increase in transactions per second (roughly 44% when averaging
the ten runs above), and lower latencies as well.

v5: Replaced two patches (that were adding new attributes) with patches
    from Alexander Duyck. The idea is to reuse the existing
    gso_max_size/gro_max_size attributes.

v4: Rebased on top of Jakub's series (Merge branch 'tso-gso-limit-split')
    max_tso_size is now family independent.

v3: Fixed a typo in RFC number (Alexander)
    Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.

v2: Removed the MAX_SKB_FRAGS change, which belongs to a different series.
    Addressed feedback from Alexander and NVIDIA folks.




Alexander Duyck (2):
  net: allow gso_max_size to exceed 65536
  net: allow gro_max_size to exceed 65536

Coco Li (2):
  ipv6: Add hop-by-hop header to jumbograms in ip6_output
  mlx5: support BIG TCP packets

Eric Dumazet (9):
  net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
  net: limit GSO_MAX_SIZE to 524280 bytes
  tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  ipv6: add struct hop_jumbo_hdr definition
  ipv6/gso: remove temporary HBH/jumbo header
  ipv6/gro: insert temporary HBH/jumbo header
  net: loopback: enable BIG TCP packets
  veth: enable BIG TCP packets
  mlx4: support BIG TCP packets

 drivers/net/ethernet/amd/xgbe/xgbe.h          |  3 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 +
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 drivers/net/ethernet/sfc/ef100_nic.c          |  3 +-
 drivers/net/ethernet/sfc/falcon/tx.c          |  3 +-
 drivers/net/ethernet/sfc/tx_common.c          |  3 +-
 drivers/net/ethernet/synopsys/dwc-xlgmac.h    |  3 +-
 drivers/net/hyperv/rndis_filter.c             |  2 +-
 drivers/net/loopback.c                        |  2 +
 drivers/net/veth.c                            |  1 +
 drivers/scsi/fcoe/fcoe.c                      |  2 +-
 include/linux/ipv6.h                          |  1 +
 include/linux/netdevice.h                     | 16 +++-
 include/net/ipv6.h                            | 44 ++++++++++
 include/uapi/linux/if_link.h                  |  2 +
 net/bpf/test_run.c                            |  2 +-
 net/core/dev.c                                |  7 +-
 net/core/gro.c                                |  8 ++
 net/core/rtnetlink.c                          | 16 ++--
 net/core/sock.c                               |  4 +
 net/ipv4/tcp_bbr.c                            |  2 +-
 net/ipv4/tcp_cubic.c                          |  4 +-
 net/ipv4/tcp_output.c                         |  2 +-
 net/ipv6/ip6_offload.c                        | 56 ++++++++++++-
 net/ipv6/ip6_output.c                         | 22 ++++-
 net/sctp/output.c                             |  3 +-
 tools/include/uapi/linux/if_link.h            |  2 +
 30 files changed, 291 insertions(+), 59 deletions(-)

-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v5 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report to user-space the device TSO limits.

ip -d link sh dev eth1
...
   tso_max_size 65536 tso_max_segs 65535

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/uapi/linux/if_link.h       | 2 ++
 net/core/rtnetlink.c               | 6 ++++++
 tools/include/uapi/linux/if_link.h | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index d1e600816b82c2e73c3e0684c66ddf9841a75b04..5f58dcfe2787f308bb2aa5777cca0816dd32bbb9 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -368,6 +368,8 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_MAX_SIZE,
+	IFLA_TSO_MAX_SEGS,
 
 	__IFLA_MAX
 };
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 6aff02df9ba51c99e8f1dd8e1c1da393c92b8ebf..21b117b710bf2154f11b6511de7d578d0eafb65e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1064,6 +1064,8 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */
 	       + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */
 	       + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_MAX_SIZE */
+	       + nla_total_size(4) /* IFLA_TSO_MAX_SEGS */
 	       + nla_total_size(1) /* IFLA_OPERSTATE */
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_CARRIER_CHANGES */
@@ -1769,6 +1771,8 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	    nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) ||
 	    nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) ||
 	    nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_MAX_SIZE, dev->tso_max_size) ||
+	    nla_put_u32(skb, IFLA_TSO_MAX_SEGS, dev->tso_max_segs) ||
 #ifdef CONFIG_RPS
 	    nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) ||
 #endif
@@ -1922,6 +1926,8 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_NEW_IFINDEX]	= NLA_POLICY_MIN(NLA_S32, 1),
 	[IFLA_PARENT_DEV_NAME]	= { .type = NLA_NUL_STRING },
 	[IFLA_GRO_MAX_SIZE]	= { .type = NLA_U32 },
+	[IFLA_TSO_MAX_SIZE]	= { .type = NLA_REJECT },
+	[IFLA_TSO_MAX_SEGS]	= { .type = NLA_REJECT },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index e1ba2d51b717b7ac7f06e94ac9791cf4c8a5ab6f..b339bf2196ca160ed3040615ae624b9a028562fb 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -348,6 +348,8 @@ enum {
 	IFLA_PARENT_DEV_NAME,
 	IFLA_PARENT_DEV_BUS_NAME,
 	IFLA_GRO_MAX_SIZE,
+	IFLA_TSO_MAX_SIZE,
+	IFLA_TSO_MAX_SEGS,
 
 	__IFLA_MAX
 };
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-10  1:35   ` kernel test robot
  2022-05-10  3:08   ` kernel test robot
  2022-05-09 22:21 ` [PATCH v5 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes Eric Dumazet
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Alexander Duyck <alexanderduyck@fb.com>

The code for gso_max_size was added originally to allow for debugging and
workaround of buggy devices that couldn't support TSO with blocks 64K in
size. It was limited to 64K because that was the existing limit of the
IPv4 and non-jumbogram IPv6 length fields.

With the addition of BIG TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.

So in order to support this we need to go through and clean up the
remaining users of the gso_max_size value so that the values will cap at
64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value
so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper
limit for GSO_MAX_SIZE.
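
As a reading aid, the clamp this patch adds to sk_setup_caps() can be
modelled in userspace as below. This is only a sketch: the function name
and boolean parameters are hypothetical stand-ins for the socket checks
in the diff, and the CONFIG_IPV6 test and the later subtraction of
(MAX_TCP_HEADER + 1) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define GSO_LEGACY_MAX_SIZE	65536u

/* Userspace model of the clamp: only TCP sockets on native (not
 * v4-mapped) IPv6 may keep a device limit above 64K.
 */
static uint32_t effective_gso_max_size(uint32_t dev_gso_max_size,
				       bool is_ipv6, bool is_tcp,
				       bool is_v4mapped)
{
	uint32_t size = dev_gso_max_size;

	if (size > GSO_LEGACY_MAX_SIZE &&
	    (!is_ipv6 || !is_tcp || is_v4mapped))
		size = GSO_LEGACY_MAX_SIZE;
	return size;
}
```

For a device configured with gso_max_size 185000, a native TCPv6 socket
keeps 185000 while an IPv4 or v4-mapped socket falls back to 65536.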

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe.h            | 3 ++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 drivers/net/ethernet/sfc/ef100_nic.c            | 3 ++-
 drivers/net/ethernet/sfc/falcon/tx.c            | 3 ++-
 drivers/net/ethernet/sfc/tx_common.c            | 3 ++-
 drivers/net/ethernet/synopsys/dwc-xlgmac.h      | 3 ++-
 drivers/net/hyperv/rndis_filter.c               | 2 +-
 drivers/scsi/fcoe/fcoe.c                        | 2 +-
 include/linux/netdevice.h                       | 3 ++-
 net/bpf/test_run.c                              | 2 +-
 net/core/dev.c                                  | 5 +++--
 net/core/rtnetlink.c                            | 2 +-
 net/core/sock.c                                 | 4 ++++
 net/ipv4/tcp_bbr.c                              | 2 +-
 net/ipv4/tcp_output.c                           | 2 +-
 net/sctp/output.c                               | 3 ++-
 16 files changed, 28 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe.h b/drivers/net/ethernet/amd/xgbe/xgbe.h
index 607a2c90513b529ca0383410a3f513d98a75a72f..d9547552ceefe1d291155ab7619a5f2fa6296340 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe.h
+++ b/drivers/net/ethernet/amd/xgbe/xgbe.h
@@ -151,7 +151,8 @@
 #define XGBE_TX_MAX_BUF_SIZE	(0x3fff & ~(64 - 1))
 
 /* Descriptors required for maximum contiguous TSO/GSO packet */
-#define XGBE_TX_MAX_SPLIT	((GSO_MAX_SIZE / XGBE_TX_MAX_BUF_SIZE) + 1)
+#define XGBE_TX_MAX_SPLIT	\
+	((GSO_LEGACY_MAX_SIZE / XGBE_TX_MAX_BUF_SIZE) + 1)
 
 /* Maximum possible descriptors needed for an SKB:
  * - Maximum number of SKB frags
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index fb11081001a088fcddde68b88bae1da65a3f2c06..838870bc6dbd6e3a3d8c9443ff4675a0e411006b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2038,7 +2038,7 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt)
 {
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 
-	return PAGE_SIZE * nr_frags + data_bcnt <= GSO_MAX_SIZE;
+	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_MAX_SIZE;
 }
 
 static void
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index a69d756e09b9316660aea5a48d07d86af9cd9112..b2536d2c218a6db8acf1e8a5802860639c5e71a6 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -1008,7 +1008,8 @@ static int ef100_process_design_param(struct efx_nic *efx,
 		}
 		return 0;
 	case ESE_EF100_DP_GZ_TSO_MAX_PAYLOAD_LEN:
-		nic_data->tso_max_payload_len = min_t(u64, reader->value, GSO_MAX_SIZE);
+		nic_data->tso_max_payload_len = min_t(u64, reader->value,
+						      GSO_LEGACY_MAX_SIZE);
 		netif_set_tso_max_size(efx->net_dev,
 				       nic_data->tso_max_payload_len);
 		return 0;
diff --git a/drivers/net/ethernet/sfc/falcon/tx.c b/drivers/net/ethernet/sfc/falcon/tx.c
index f7306e93a8b8db9b220c5c3b95dc95c7eaaf2580..b9369483758cd6ebcd263852542175610b4d2789 100644
--- a/drivers/net/ethernet/sfc/falcon/tx.c
+++ b/drivers/net/ethernet/sfc/falcon/tx.c
@@ -98,7 +98,8 @@ unsigned int ef4_tx_max_skb_descs(struct ef4_nic *efx)
 	/* Possibly more for PCIe page boundaries within input fragments */
 	if (PAGE_SIZE > EF4_PAGE_SIZE)
 		max_descs += max_t(unsigned int, MAX_SKB_FRAGS,
-				   DIV_ROUND_UP(GSO_MAX_SIZE, EF4_PAGE_SIZE));
+				   DIV_ROUND_UP(GSO_LEGACY_MAX_SIZE,
+						EF4_PAGE_SIZE));
 
 	return max_descs;
 }
diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c
index 9bc8281b7f5bdd3d95924c6f8294d39202424a27..658ea2d340704d186bb9f94ad24497cbd2d15752 100644
--- a/drivers/net/ethernet/sfc/tx_common.c
+++ b/drivers/net/ethernet/sfc/tx_common.c
@@ -416,7 +416,8 @@ unsigned int efx_tx_max_skb_descs(struct efx_nic *efx)
 	/* Possibly more for PCIe page boundaries within input fragments */
 	if (PAGE_SIZE > EFX_PAGE_SIZE)
 		max_descs += max_t(unsigned int, MAX_SKB_FRAGS,
-				   DIV_ROUND_UP(GSO_MAX_SIZE, EFX_PAGE_SIZE));
+				   DIV_ROUND_UP(GSO_LEGACY_MAX_SIZE,
+						EFX_PAGE_SIZE));
 
 	return max_descs;
 }
diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac.h b/drivers/net/ethernet/synopsys/dwc-xlgmac.h
index 98e3a271e017ae17f23866beab8021d2f2ab26c0..a848e10f3ea457da1b17571df6a35b077a96c794 100644
--- a/drivers/net/ethernet/synopsys/dwc-xlgmac.h
+++ b/drivers/net/ethernet/synopsys/dwc-xlgmac.h
@@ -38,7 +38,8 @@
 #define XLGMAC_RX_DESC_MAX_DIRTY	(XLGMAC_RX_DESC_CNT >> 3)
 
 /* Descriptors required for maximum contiguous TSO/GSO packet */
-#define XLGMAC_TX_MAX_SPLIT	((GSO_MAX_SIZE / XLGMAC_TX_MAX_BUF_SIZE) + 1)
+#define XLGMAC_TX_MAX_SPLIT	\
+	((GSO_LEGACY_MAX_SIZE / XLGMAC_TX_MAX_BUF_SIZE) + 1)
 
 /* Maximum possible descriptors needed for a SKB */
 #define XLGMAC_TX_MAX_DESC_NR	(MAX_SKB_FRAGS + XLGMAC_TX_MAX_SPLIT + 2)
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 866af2cc27a3e0df11812d6ade17dde1d247ff4a..6da36cb8af8055eba338490b6bc7493181e8644c 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -1349,7 +1349,7 @@ static int rndis_netdev_set_hwcaps(struct rndis_device *rndis_device,
 	struct net_device_context *net_device_ctx = netdev_priv(net);
 	struct ndis_offload hwcaps;
 	struct ndis_offload_params offloads;
-	unsigned int gso_max_size = GSO_MAX_SIZE;
+	unsigned int gso_max_size = GSO_LEGACY_MAX_SIZE;
 	int ret;
 
 	/* Find HW offload capabilities */
diff --git a/drivers/scsi/fcoe/fcoe.c b/drivers/scsi/fcoe/fcoe.c
index 44ca6110213caaf7222c8b69c6c3fc2a08687495..79b2827e4081b4015fc51ace4e1467214c45fd48 100644
--- a/drivers/scsi/fcoe/fcoe.c
+++ b/drivers/scsi/fcoe/fcoe.c
@@ -667,7 +667,7 @@ static void fcoe_netdev_features_change(struct fc_lport *lport,
 
 	if (netdev->features & NETIF_F_FSO) {
 		lport->seq_offload = 1;
-		lport->lso_max = netdev->gso_max_size;
+		lport->lso_max = min(netdev->gso_max_size, GSO_LEGACY_MAX_SIZE);
 		FCOE_NETDEV_DBG(netdev, "Supports LSO for max len 0x%x\n",
 				lport->lso_max);
 	} else {
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 74c97a34921d48c593c08e2bed72e099f42520a3..9a34cc45b20a4465a9e1532c39f410b26604144f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2262,7 +2262,8 @@ struct net_device {
 	const struct rtnl_link_ops *rtnl_link_ops;
 
 	/* for setting kernel sock attribute on TCP connection setup */
-#define GSO_MAX_SIZE		65536
+#define GSO_LEGACY_MAX_SIZE	65536u
+#define GSO_MAX_SIZE		UINT_MAX
 	unsigned int		gso_max_size;
 #define TSO_LEGACY_MAX_SIZE	65536
 #define TSO_MAX_SIZE		UINT_MAX
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 8d54fef9a568a189d14253bcf01e3d586e746084..9b5a1f630bb0dbfe577c0f2a63094cb5872ade1d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -1001,7 +1001,7 @@ static int convert___skb_to_skb(struct sk_buff *skb, struct __sk_buff *__skb)
 		cb->pkt_len = skb->len;
 	} else {
 		if (__skb->wire_len < skb->len ||
-		    __skb->wire_len > GSO_MAX_SIZE)
+		    __skb->wire_len > GSO_LEGACY_MAX_SIZE)
 			return -EINVAL;
 		cb->pkt_len = __skb->wire_len;
 	}
diff --git a/net/core/dev.c b/net/core/dev.c
index f036ccb61da4da3ffc52c4f2402427054b831e8a..a1bbe000953f9365b4419f2ddbef96ddada42d3a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2998,7 +2998,8 @@ EXPORT_SYMBOL(netif_set_real_num_queues);
  * @size:	max skb->len of a TSO frame
  *
  * Set the limit on the size of TSO super-frames the device can handle.
- * Unless explicitly set the stack will assume the value of %GSO_MAX_SIZE.
+ * Unless explicitly set the stack will assume the value of
+ * %GSO_LEGACY_MAX_SIZE.
  */
 void netif_set_tso_max_size(struct net_device *dev, unsigned int size)
 {
@@ -10602,7 +10603,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
-	dev->gso_max_size = GSO_MAX_SIZE;
+	dev->gso_max_size = GSO_LEGACY_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
 	dev->gro_max_size = GRO_MAX_SIZE;
 	dev->tso_max_size = TSO_LEGACY_MAX_SIZE;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 21b117b710bf2154f11b6511de7d578d0eafb65e..823db8999a2c1d5959042393783492dbecf1352c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2809,7 +2809,7 @@ static int do_setlink(const struct sk_buff *skb,
 	if (tb[IFLA_GSO_MAX_SIZE]) {
 		u32 max_size = nla_get_u32(tb[IFLA_GSO_MAX_SIZE]);
 
-		if (max_size > GSO_MAX_SIZE || max_size > dev->tso_max_size) {
+		if (max_size > dev->tso_max_size) {
 			err = -EINVAL;
 			goto errout;
 		}
diff --git a/net/core/sock.c b/net/core/sock.c
index 6b287eb5427b32865d25fc22122fefeff3a4ccf5..f7c3171078b6fccd25757e8fe54dd56a2a674238 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2312,6 +2312,10 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
 			sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
+			if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
+			    (!IS_ENABLED(CONFIG_IPV6) || sk->sk_family != AF_INET6 ||
+			     !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
+				sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
 			sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
 			max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index c7d30a3bbd81d27e16e800ec446569b93a4123ba..075e744bfb4829c087f4a85448e2f778dba439b4 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -310,7 +310,7 @@ static u32 bbr_tso_segs_goal(struct sock *sk)
 	 */
 	bytes = min_t(unsigned long,
 		      sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift),
-		      GSO_MAX_SIZE - 1 - MAX_TCP_HEADER);
+		      GSO_LEGACY_MAX_SIZE - 1 - MAX_TCP_HEADER);
 	segs = max_t(u32, bytes / tp->mss_cache, bbr_min_tso_segs(sk));
 
 	return min(segs, 0x7FU);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b092228e434261f45f79cc6c1fad613e0bb045c0..b4b2284ed4a2c9e2569bd945e3b4e023c5502f25 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1553,7 +1553,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
 	 * SO_SNDBUF values.
 	 * Also allow first and last skb in retransmit queue to be split.
 	 */
-	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_LEGACY_MAX_SIZE);
 	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
 		     tcp_queue != TCP_FRAG_IN_WRITE_QUEUE &&
 		     skb != tcp_rtx_queue_head(sk) &&
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 72fe6669c50de2c76842cf50d039b65a61943bd8..a63df055ac57d551e89edfb3a4982768a318cf67 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -134,7 +134,8 @@ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
 		dst_hold(tp->dst);
 		sk_setup_caps(sk, tp->dst);
 	}
-	packet->max_size = sk_can_gso(sk) ? READ_ONCE(tp->dst->dev->gso_max_size)
+	packet->max_size = sk_can_gso(sk) ? min(READ_ONCE(tp->dst->dev->gso_max_size),
+						GSO_LEGACY_MAX_SIZE)
 					  : asoc->pathmtu;
 	rcu_read_unlock();
 }
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v5 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Make sure we will not overflow shinfo->gso_segs.

Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs
is a 16bit field, so GSO_MAX_SIZE is capped at 8 * 65535 = 524280 bytes.

TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h;
it seems cleaner to not bring TCP details into include/linux/netdevice.h,
so the value 8 is used directly.
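
The bound can be checked with a small standalone sketch. TCP_MIN_MSS and
segs_needed() are hypothetical names used only for this illustration;
the point is that one byte past 8 * 65535 would need 65536 segments,
which no longer fits in a 16-bit counter.

```c
#include <stdint.h>

#define GSO_MAX_SEGS	65535u	/* largest value a 16-bit gso_segs can hold */
#define TCP_MIN_MSS	8u	/* hypothetical name, mirrors TCP_MIN_GSO_SIZE */
#define GSO_MAX_SIZE	(TCP_MIN_MSS * GSO_MAX_SEGS)

/* Compile-time check of the value quoted in the patch subject. */
_Static_assert(GSO_MAX_SIZE == 524280u, "GSO_MAX_SIZE must be 8 * 65535");

/* Number of segments needed to carry len bytes at a given MSS (rounds up). */
static uint32_t segs_needed(uint32_t len, uint32_t mss)
{
	return (len + mss - 1) / mss;
}
```

At the minimal MSS, segs_needed(GSO_MAX_SIZE, 8) is exactly 65535, and
truncating segs_needed(GSO_MAX_SIZE + 1, 8) to 16 bits wraps to 0.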

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9a34cc45b20a4465a9e1532c39f410b26604144f..2ef9254a9d3a57403f510d32194d8be6730b1645 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2263,12 +2263,17 @@ struct net_device {
 
 	/* for setting kernel sock attribute on TCP connection setup */
 #define GSO_LEGACY_MAX_SIZE	65536u
-#define GSO_MAX_SIZE		UINT_MAX
+#define GSO_MAX_SEGS		65535u
+
+/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
+ * and shinfo->gso_segs is a 16bit field.
+ */
+#define GSO_MAX_SIZE		(8 * GSO_MAX_SEGS)
+
 	unsigned int		gso_max_size;
 #define TSO_LEGACY_MAX_SIZE	65536
 #define TSO_MAX_SIZE		UINT_MAX
 	unsigned int		tso_max_size;
-#define GSO_MAX_SEGS		65535
 	u16			gso_max_segs;
 #define TSO_MAX_SEGS		U16_MAX
 	u16			tso_max_segs;
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v5 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (2 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

hystart_ack_delay() had the assumption that a TSO packet
would not be bigger than GSO_MAX_SIZE.

This will no longer be true.

We should use sk->sk_gso_max_size instead.

This reduces chances of spurious Hystart ACK train detections.
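
To see the effect of the larger size, the cushion formula can be modelled
in userspace as below. This is a sketch only: ack_delay_us() is a
hypothetical name, and `rate` stands in for the socket pacing rate in
bytes per second that the kernel function reads.

```c
#include <stdint.h>

#define USEC_PER_MSEC	1000ULL
#define USEC_PER_SEC	1000000ULL

/* Userspace model of the hystart_ack_delay() arithmetic: the time, in
 * usec, needed to send 4 * gso_max_size bytes at `rate` bytes/sec,
 * capped to 1 ms. Returns 0 when no rate is known.
 */
static uint32_t ack_delay_us(uint32_t gso_max_size, uint64_t rate)
{
	uint64_t delay;

	if (!rate)
		return 0;
	delay = (uint64_t)gso_max_size * 4 * USEC_PER_SEC / rate;
	return (uint32_t)(delay < USEC_PER_MSEC ? delay : USEC_PER_MSEC);
}
```

At a rate of 1,000,000,000 bytes/sec, a 65536 byte limit gives a 262 usec
cushion, while a 185000 byte limit gives 740 usec; a fixed 64K assumption
would therefore underestimate the cushion for BIG TCP flows.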

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_cubic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c
index b0918839bee7cf0264ec3bbcdfc1417daa86d197..68178e7280ce24c26a48e48a51518d759e4d1718 100644
--- a/net/ipv4/tcp_cubic.c
+++ b/net/ipv4/tcp_cubic.c
@@ -372,7 +372,7 @@ static void cubictcp_state(struct sock *sk, u8 new_state)
  * We apply another 100% factor because @rate is doubled at this point.
  * We cap the cushion to 1ms.
  */
-static u32 hystart_ack_delay(struct sock *sk)
+static u32 hystart_ack_delay(const struct sock *sk)
 {
 	unsigned long rate;
 
@@ -380,7 +380,7 @@ static u32 hystart_ack_delay(struct sock *sk)
 	if (!rate)
 		return 0;
 	return min_t(u64, USEC_PER_MSEC,
-		     div64_ul((u64)GSO_MAX_SIZE * 4 * USEC_PER_SEC, rate));
+		     div64_ul((u64)sk->sk_gso_max_size * 4 * USEC_PER_SEC, rate));
 }
 
 static void hystart_update(struct sock *sk, u32 delay)
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v5 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (3 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patches will need to add and remove local IPv6 jumbogram
options to enable BIG TCP. Define the 8-byte header they operate on.
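
For readers unfamiliar with the layout, here is a byte-level userspace
sketch of the same 8-byte header. build_hop_jumbo() is a hypothetical
helper, not something this series adds; the field order follows the
struct in the diff below and the length is written in network byte
order.

```c
#include <stdint.h>

#define IPV6_TLV_JUMBO	0xC2	/* option type, as noted in the struct */

/* Fill the 8-byte Hop-by-Hop header carrying one Jumbo Payload option.
 * `out` must have room for 8 bytes.
 */
static void build_hop_jumbo(uint8_t out[8], uint8_t nexthdr,
			    uint32_t jumbo_payload_len)
{
	out[0] = nexthdr;		/* protocol following this header */
	out[1] = 0;			/* hdrlen: 0 means 8 bytes in total */
	out[2] = IPV6_TLV_JUMBO;	/* tlv_type */
	out[3] = 4;			/* tlv_len: a 4-byte length follows */
	out[4] = (uint8_t)(jumbo_payload_len >> 24);
	out[5] = (uint8_t)(jumbo_payload_len >> 16);
	out[6] = (uint8_t)(jumbo_payload_len >> 8);
	out[7] = (uint8_t)jumbo_payload_len;
}
```

For example, nexthdr 6 (TCP) and a length of 185000 produce the bytes
06 00 c2 04 00 02 d2 a8.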

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 213612f1680c7c39f4c07f0c05b4e6cf34a7878e..63d019953c47ea03d3b723a58c25e83c249489a9 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -151,6 +151,17 @@ struct frag_hdr {
 	__be32	identification;
 };
 
+/*
+ * Jumbo payload option, as described in RFC 2675 2.
+ */
+struct hop_jumbo_hdr {
+	u8	nexthdr;
+	u8	hdrlen;
+	u8	tlv_type;	/* IPV6_TLV_JUMBO, 0xC2 */
+	u8	tlv_len;	/* 4 */
+	__be32	jumbo_payload_len;
+};
+
 #define	IP6_MF		0x0001
 #define	IP6_OFFSET	0xFFF8
 
-- 
2.36.0.512.ge40c2bad7a-goog



* [PATCH v5 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (4 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 07/13] ipv6/gro: insert " Eric Dumazet
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The IPv6 TCP and GRO stacks will soon be able to build big TCP packets,
with an added temporary Hop By Hop header.

If GSO is involved for these large packets, we need to remove
the temporary HBH header before segmentation happens.
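
The removal amounts to sliding the Ethernet and IPv6 headers forward
over the HBH header. Below is a userspace sketch of that move on a flat
buffer; strip_hop_jumbo() and the constants are hypothetical, and the
frame is assumed to be laid out as [Ethernet][IPv6][HBH][TCP...] with
no VLAN tag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN	14
#define IPV6_HDR_LEN	40
#define HOP_JUMBO_LEN	8

/* Remove the HBH header from a flat frame by moving the two headers in
 * front of it forward by 8 bytes. Returns the offset of the new frame
 * start; the first 8 bytes of the buffer are no longer part of it.
 */
static size_t strip_hop_jumbo(uint8_t *frame, uint8_t nexthdr)
{
	/* Regions overlap, so memmove() is required rather than memcpy(). */
	memmove(frame + HOP_JUMBO_LEN, frame, ETH_HDR_LEN + IPV6_HDR_LEN);

	/* Byte 6 of the IPv6 header is Next Header: point it at TCP again. */
	frame[HOP_JUMBO_LEN + ETH_HDR_LEN + 6] = nexthdr;

	return HOP_JUMBO_LEN;
}
```

After the call, the TCP header directly follows the relocated IPv6
header, which is the layout the segmentation code expects.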

v2: perform HBH removal from ipv6_gso_segment() instead of
    skb_segment() (Alexander feedback)

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/ipv6.h     | 33 +++++++++++++++++++++++++++++++++
 net/ipv6/ip6_offload.c | 24 +++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 63d019953c47ea03d3b723a58c25e83c249489a9..b6df0314aa02dd1c4094620145ccb24da7195b2b 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -467,6 +467,39 @@ bool ipv6_opt_accepted(const struct sock *sk, const struct sk_buff *skb,
 struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
 					   struct ipv6_txoptions *opt);
 
+/* This helper is specialized for BIG TCP needs.
+ * It assumes the hop_jumbo_hdr will immediately follow the IPV6 header.
+ * It assumes headers are already in skb->head.
+ * Returns 0, or IPPROTO_TCP if a BIG TCP packet is there.
+ */
+static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
+{
+	const struct hop_jumbo_hdr *jhdr;
+	const struct ipv6hdr *nhdr;
+
+	if (likely(skb->len <= GRO_MAX_SIZE))
+		return 0;
+
+	if (skb->protocol != htons(ETH_P_IPV6))
+		return 0;
+
+	if (skb_network_offset(skb) +
+	    sizeof(struct ipv6hdr) +
+	    sizeof(struct hop_jumbo_hdr) > skb_headlen(skb))
+		return 0;
+
+	nhdr = ipv6_hdr(skb);
+
+	if (nhdr->nexthdr != NEXTHDR_HOP)
+		return 0;
+
+	jhdr = (const struct hop_jumbo_hdr *) (nhdr + 1);
+	if (jhdr->tlv_type != IPV6_TLV_JUMBO || jhdr->hdrlen != 0 ||
+	    jhdr->nexthdr != IPPROTO_TCP)
+		return 0;
+	return jhdr->nexthdr;
+}
+
 static inline bool ipv6_accept_ra(struct inet6_dev *idev)
 {
 	/* If forwarding is enabled, RA are not accepted unless the special
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index c4fc03c1ac99dbecd92e2b47b2db65374197434d..a6a6c1539c28d242ef8c35fcd5ce900512ce912d 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -77,7 +77,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	struct ipv6hdr *ipv6h;
 	const struct net_offload *ops;
-	int proto;
+	int proto, nexthdr;
 	struct frag_hdr *fptr;
 	unsigned int payload_len;
 	u8 *prevhdr;
@@ -87,6 +87,28 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	bool gso_partial;
 
 	skb_reset_network_header(skb);
+	nexthdr = ipv6_has_hopopt_jumbo(skb);
+	if (nexthdr) {
+		const int hophdr_len = sizeof(struct hop_jumbo_hdr);
+		int err;
+
+		err = skb_cow_head(skb, 0);
+		if (err < 0)
+			return ERR_PTR(err);
+
+		/* remove the HBH header.
+		 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+		 */
+		memmove(skb_mac_header(skb) + hophdr_len,
+			skb_mac_header(skb),
+			ETH_HLEN + sizeof(struct ipv6hdr));
+		skb->data += hophdr_len;
+		skb->len -= hophdr_len;
+		skb->network_header += hophdr_len;
+		skb->mac_header += hophdr_len;
+		ipv6h = (struct ipv6hdr *)skb->data;
+		ipv6h->nexthdr = nexthdr;
+	}
 	nhoff = skb_network_header(skb) - skb_mac_header(skb);
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 07/13] ipv6/gro: insert temporary HBH/jumbo header
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (5 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 08/13] net: allow gro_max_size to exceed 65536 Eric Dumazet
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

The following patch will raise GRO_MAX_SIZE, allowing GRO to build
BIG TCP IPv6 packets (bigger than 64K).

This patch changes ipv6_gro_complete() to insert an HBH/jumbo header
so that the resulting packet can go through the IPv6/TCP stacks.
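
For illustration only, here is a minimal userspace sketch of the same
header insertion on a plain byte buffer. It is not the kernel code: the
sk_* names and constants are hypothetical, the buffer is assumed to have
at least 8 bytes of headroom in front of the IPv6 header, and only the
IPv6 header is moved (the patch also moves the MAC header).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All sk_* names are hypothetical helpers for this sketch. */
#define SK_IPV6_HLEN     40u    /* fixed IPv6 header size */
#define SK_HBH_LEN        8u    /* size of the HBH/jumbo header */
#define SK_NEXTHDR_HOP    0u    /* Hop-by-Hop Options next-header value */
#define SK_TLV_JUMBO    194u    /* RFC 2675 Jumbo Payload option type */
#define SK_IPV6_MAXPLEN 65535u

/* ip6 points at an IPv6 header preceded by >= SK_HBH_LEN bytes of headroom.
 * payload_len is the number of bytes following the IPv6 header.
 * Returns the new start of the IPv6 header.
 */
static uint8_t *sk_insert_jumbo(uint8_t *ip6, uint32_t payload_len)
{
	uint8_t *new_ip6 = ip6 - SK_HBH_LEN;
	uint8_t *hbh = new_ip6 + SK_IPV6_HLEN;
	uint32_t jumbo_len = payload_len + SK_HBH_LEN;

	assert(payload_len > SK_IPV6_MAXPLEN);

	/* Move the IPv6 header left to open an 8-byte gap after it. */
	memmove(new_ip6, ip6, SK_IPV6_HLEN);

	/* Build the hop-by-hop header with a single Jumbo Payload TLV. */
	hbh[0] = new_ip6[6];            /* nexthdr: original upper protocol */
	hbh[1] = 0;                     /* hdrlen: 0 means 8 bytes total */
	hbh[2] = SK_TLV_JUMBO;
	hbh[3] = 4;                     /* TLV data length */
	hbh[4] = (uint8_t)(jumbo_len >> 24);   /* length, big-endian */
	hbh[5] = (uint8_t)(jumbo_len >> 16);
	hbh[6] = (uint8_t)(jumbo_len >> 8);
	hbh[7] = (uint8_t)jumbo_len;

	/* The IPv6 header now announces HBH and a zero 16-bit payload length. */
	new_ip6[6] = SK_NEXTHDR_HOP;
	new_ip6[4] = 0;
	new_ip6[5] = 0;
	return new_ip6;
}
```

Note that the 32-bit length covers the HBH header itself, which is why
the patch stores payload_len + hoplen.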

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv6/ip6_offload.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a6a6c1539c28d242ef8c35fcd5ce900512ce912d..d12dba2dd5354dbb79bb80df4038dec2544cddeb 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -342,15 +342,43 @@ static struct sk_buff *ip4ip6_gro_receive(struct list_head *head,
 INDIRECT_CALLABLE_SCOPE int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
 	const struct net_offload *ops;
-	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
+	struct ipv6hdr *iph;
 	int err = -ENOSYS;
+	u32 payload_len;
 
 	if (skb->encapsulation) {
 		skb_set_inner_protocol(skb, cpu_to_be16(ETH_P_IPV6));
 		skb_set_inner_network_header(skb, nhoff);
 	}
 
-	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
+	payload_len = skb->len - nhoff - sizeof(*iph);
+	if (unlikely(payload_len > IPV6_MAXPLEN)) {
+		struct hop_jumbo_hdr *hop_jumbo;
+		int hoplen = sizeof(*hop_jumbo);
+
+		/* Move network header left */
+		memmove(skb_mac_header(skb) - hoplen, skb_mac_header(skb),
+			skb->transport_header - skb->mac_header);
+		skb->data -= hoplen;
+		skb->len += hoplen;
+		skb->mac_header -= hoplen;
+		skb->network_header -= hoplen;
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		hop_jumbo = (struct hop_jumbo_hdr *)(iph + 1);
+
+		/* Build hop-by-hop options */
+		hop_jumbo->nexthdr = iph->nexthdr;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(payload_len + hoplen);
+
+		iph->nexthdr = NEXTHDR_HOP;
+		iph->payload_len = 0;
+	} else {
+		iph = (struct ipv6hdr *)(skb->data + nhoff);
+		iph->payload_len = htons(payload_len);
+	}
 
 	nhoff += sizeof(*iph) + ipv6_exthdrs_len(iph, &ops);
 	if (WARN_ON(!ops || !ops->callbacks.gro_complete))
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 08/13] net: allow gro_max_size to exceed 65536
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (6 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 07/13] ipv6/gro: insert " Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Alexander Duyck <alexanderduyck@fb.com>

Allow gro_max_size to exceed 65536.

There weren't really any external limitations that prevented this other
than the fact that IPv4 only supports a 16 bit length field. Since we have
the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to
exceed this value and for IPv4 and non-TCP flows we can cap things at 65536
via a constant rather than relying on gro_max_size.

[edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.
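
The arithmetic behind that cap can be checked with a small sketch. It
assumes, as the comment added to netdevice.h states, a minimal TCP MSS
of 8 bytes and a 16-bit gso_segs counter. The sk_* names are
hypothetical, and the segment count is simplified to payload bytes
divided by MSS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for TCP_MIN_GSO_SIZE and the gso_segs range. */
#define SK_TCP_MIN_GSO_SIZE 8u
#define SK_GSO_SEGS_MAX     65535u
#define SK_GRO_MAX_SIZE     (SK_TCP_MIN_GSO_SIZE * SK_GSO_SEGS_MAX)

/* Number of MSS-sized segments needed to carry 'bytes' (rounded up). */
static uint32_t sk_segs_needed(uint32_t bytes, uint32_t mss)
{
	assert(mss != 0);
	return (bytes + mss - 1) / mss;
}

/* Nonzero if the segment count still fits in a 16-bit counter. */
static int sk_fits_gso_segs(uint32_t bytes, uint32_t mss)
{
	return sk_segs_needed(bytes, mss) <= SK_GSO_SEGS_MAX;
}
```

With the smallest MSS, 8 * 65535 bytes needs exactly 65535 segments;
one more byte would need 65536, which a 16-bit field cannot hold.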

Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
 include/linux/netdevice.h                       | 6 +++++-
 include/net/ipv6.h                              | 2 +-
 net/core/dev.c                                  | 2 +-
 net/core/gro.c                                  | 8 ++++++++
 net/core/rtnetlink.c                            | 8 --------
 6 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 838870bc6dbd6e3a3d8c9443ff4675a0e411006b..24de37b79f5a917b304c011fcebcd09748ee5c6a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -2038,7 +2038,7 @@ mlx5e_hw_gro_skb_has_enough_space(struct sk_buff *skb, u16 data_bcnt)
 {
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 
-	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_MAX_SIZE;
+	return PAGE_SIZE * nr_frags + data_bcnt <= GRO_LEGACY_MAX_SIZE;
 }
 
 static void
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2ef9254a9d3a57403f510d32194d8be6730b1645..dfd57a647c97ed0f400ffe89c73919367a900f75 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2151,7 +2151,11 @@ struct net_device {
 	struct bpf_prog __rcu	*xdp_prog;
 	unsigned long		gro_flush_timeout;
 	int			napi_defer_hard_irqs;
-#define GRO_MAX_SIZE		65536
+#define GRO_LEGACY_MAX_SIZE	65536u
+/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
+ * and shinfo->gso_segs is a 16bit field.
+ */
+#define GRO_MAX_SIZE		(8 * 65535u)
 	unsigned int		gro_max_size;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b6df0314aa02dd1c4094620145ccb24da7195b2b..5b38bf1a586b9da55f43db30d140d364a70f6c11 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -477,7 +477,7 @@ static inline int ipv6_has_hopopt_jumbo(const struct sk_buff *skb)
 	const struct hop_jumbo_hdr *jhdr;
 	const struct ipv6hdr *nhdr;
 
-	if (likely(skb->len <= GRO_MAX_SIZE))
+	if (likely(skb->len <= GRO_LEGACY_MAX_SIZE))
 		return 0;
 
 	if (skb->protocol != htons(ETH_P_IPV6))
diff --git a/net/core/dev.c b/net/core/dev.c
index a1bbe000953f9365b4419f2ddbef96ddada42d3a..7349f75891d5724a060781abc80a800bdf835f74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10605,7 +10605,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 
 	dev->gso_max_size = GSO_LEGACY_MAX_SIZE;
 	dev->gso_max_segs = GSO_MAX_SEGS;
-	dev->gro_max_size = GRO_MAX_SIZE;
+	dev->gro_max_size = GRO_LEGACY_MAX_SIZE;
 	dev->tso_max_size = TSO_LEGACY_MAX_SIZE;
 	dev->tso_max_segs = TSO_MAX_SEGS;
 	dev->upper_level = 1;
diff --git a/net/core/gro.c b/net/core/gro.c
index 78110edf5d4b36d2fa6f8a2676096efe0112aa0e..b4190eb084672fb4f2be8b437eccb4e8507ff63f 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -167,6 +167,14 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
 	if (unlikely(p->len + len >= gro_max_size || NAPI_GRO_CB(skb)->flush))
 		return -E2BIG;
 
+	if (unlikely(p->len + len >= GRO_LEGACY_MAX_SIZE)) {
+		if (p->protocol != htons(ETH_P_IPV6) ||
+		    skb_headroom(p) < sizeof(struct hop_jumbo_hdr) ||
+		    ipv6_hdr(p)->nexthdr != IPPROTO_TCP ||
+		    p->encapsulation)
+			return -E2BIG;
+	}
+
 	lp = NAPI_GRO_CB(p)->last;
 	pinfo = skb_shinfo(lp);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 823db8999a2c1d5959042393783492dbecf1352c..5d7d7fe1e63a972bbcbd5eed1404b2643c74cfcb 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2347,14 +2347,6 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[],
 		}
 	}
 
-	if (tb[IFLA_GRO_MAX_SIZE]) {
-		u32 gro_max_size = nla_get_u32(tb[IFLA_GRO_MAX_SIZE]);
-
-		if (gro_max_size > GRO_MAX_SIZE) {
-			NL_SET_ERR_MSG(extack, "too big gro_max_size");
-			return -EINVAL;
-		}
-	}
 	return 0;
 }
 
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (7 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 08/13] net: allow gro_max_size to exceed 65536 Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 10/13] net: loopback: enable BIG TCP packets Eric Dumazet
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Coco Li <lixiaoyan@google.com>

Instead of simply forcing a 0 payload_len in the IPv6 header,
implement RFC 2675 and insert a custom extension header.

Note that only the TCP stack can currently generate
jumbograms, and that this extension header is purely local:
it won't be sent on a physical link.

This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.
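
To show what the header buys a dissector, here is a rough sketch of how
a capture tool might recover the real length. It is an assumption-laden
illustration, not tcpdump's code: sk_ipv6_payload_len() is a
hypothetical helper, it only understands the exact layout this series
emits (one 8-byte HBH header holding a single Jumbo Payload option),
and it does not walk any other extension headers.

```c
#include <stddef.h>
#include <stdint.h>

/* pkt points at the IPv6 header, caplen is the number of captured bytes.
 * On success returns 0 and stores the payload length (bytes after the
 * IPv6 header) and the next-header value found. Returns -1 if the packet
 * is truncated or the HBH header is not the expected jumbo layout.
 */
static int sk_ipv6_payload_len(const uint8_t *pkt, size_t caplen,
			       uint32_t *plen, uint8_t *proto)
{
	const uint8_t *hbh;
	uint16_t len16;

	if (caplen < 40)
		return -1;

	len16 = (uint16_t)((pkt[4] << 8) | pkt[5]);
	if (len16 != 0 || pkt[6] != 0) {
		/* Ordinary packet: the 16-bit field is authoritative. */
		*plen = len16;
		*proto = pkt[6];
		return 0;
	}

	/* payload_len == 0 and nexthdr == HBH: expect a jumbo option. */
	if (caplen < 48)
		return -1;
	hbh = pkt + 40;
	if (hbh[1] != 0 || hbh[2] != 194 || hbh[3] != 4)
		return -1;

	*plen = ((uint32_t)hbh[4] << 24) | ((uint32_t)hbh[5] << 16) |
		((uint32_t)hbh[6] << 8) | (uint32_t)hbh[7];
	*proto = hbh[0];
	return 0;
}
```

The length returned for a jumbogram includes the 8 bytes of the HBH
header, so the TCP segment length is that value minus 8.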

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/ipv6.h  |  1 +
 net/ipv6/ip6_output.c | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index ec5ca392eaa31e83a022b1124fae6b607ba168cd..38c8203d52cbf39e523c43fe630a7b184b9991aa 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -145,6 +145,7 @@ struct inet6_skb_parm {
 #define IP6SKB_L3SLAVE         64
 #define IP6SKB_JUMBOGRAM      128
 #define IP6SKB_SEG6	      256
+#define IP6SKB_FAKEJUMBO      512
 };
 
 #if defined(CONFIG_NET_L3_MASTER_DEV)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index afa5bd4ad167c4a40878f33773d43be85e89c32f..4081b12a01ff22ecf94a6490aef0665808407a6e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -182,7 +182,9 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
 #endif
 
 	mtu = ip6_skb_dst_mtu(skb);
-	if (skb_is_gso(skb) && !skb_gso_validate_network_len(skb, mtu))
+	if (skb_is_gso(skb) &&
+	    !(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
+	    !skb_gso_validate_network_len(skb, mtu))
 		return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
 
 	if ((skb->len > mtu && !skb_is_gso(skb)) ||
@@ -252,6 +254,8 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	struct dst_entry *dst = skb_dst(skb);
 	struct net_device *dev = dst->dev;
 	struct inet6_dev *idev = ip6_dst_idev(dst);
+	struct hop_jumbo_hdr *hop_jumbo;
+	int hoplen = sizeof(*hop_jumbo);
 	unsigned int head_room;
 	struct ipv6hdr *hdr;
 	u8  proto = fl6->flowi6_proto;
@@ -259,7 +263,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 	int hlimit = -1;
 	u32 mtu;
 
-	head_room = sizeof(struct ipv6hdr) + LL_RESERVED_SPACE(dev);
+	head_room = sizeof(struct ipv6hdr) + hoplen + LL_RESERVED_SPACE(dev);
 	if (opt)
 		head_room += opt->opt_nflen + opt->opt_flen;
 
@@ -282,6 +286,20 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6,
 					     &fl6->saddr);
 	}
 
+	if (unlikely(seg_len > IPV6_MAXPLEN)) {
+		hop_jumbo = skb_push(skb, hoplen);
+
+		hop_jumbo->nexthdr = proto;
+		hop_jumbo->hdrlen = 0;
+		hop_jumbo->tlv_type = IPV6_TLV_JUMBO;
+		hop_jumbo->tlv_len = 4;
+		hop_jumbo->jumbo_payload_len = htonl(seg_len + hoplen);
+
+		proto = IPPROTO_HOPOPTS;
+		seg_len = 0;
+		IP6CB(skb)->flags |= IP6SKB_FAKEJUMBO;
+	}
+
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
 	hdr = ipv6_hdr(skb);
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 10/13] net: loopback: enable BIG TCP packets
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (8 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 11/13] veth: " Eric Dumazet
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Set the TSO driver limit to GSO_MAX_SIZE (512 KB).

This allows the admin/user to set a GSO limit up to this value.

Tested:

ip link set dev lo gso_max_size 200000
netperf -H ::1 -t TCP_RR -l 100 -- -r 80000,80000 &

tcpdump shows :

18:28:42.962116 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626480001:3626560001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962138 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962152 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626560001:3626640001, ack 3626560001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962157 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 0
18:28:42.962180 IP6 ::1 > ::1: HBH 40051 > 63780: Flags [P.], seq 3626560001:3626640001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179265 ecr 3771179265], length 80000
18:28:42.962214 IP6 ::1.63780 > ::1.40051: Flags [.], ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 0
18:28:42.962228 IP6 ::1 > ::1: HBH 63780 > 40051: Flags [P.], seq 3626640001:3626720001, ack 3626640001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179265], length 80000
18:28:42.962233 IP6 ::1.40051 > ::1.63780: Flags [.], ack 3626720001, win 17743, options [nop,nop,TS val 3771179266 ecr 3771179266], length 0

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/loopback.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 720394c0639b20a2fd6262e4ee9d5813c02802f1..14e8d04cb4347cb7b9171d576156fb8e8ecebbe3 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -191,6 +191,8 @@ static void gen_lo_setup(struct net_device *dev,
 	dev->netdev_ops		= dev_ops;
 	dev->needs_free_netdev	= true;
 	dev->priv_destructor	= dev_destructor;
+
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
 }
 
 /* The loopback device is special. There is only one instance
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 11/13] veth: enable BIG TCP packets
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (9 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 10/13] net: loopback: enable BIG TCP packets Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 12/13] mlx4: support " Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 13/13] mlx5: " Eric Dumazet
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

Set the TSO driver limit to GSO_MAX_SIZE (512 KB).

This allows the admin/user to set a GSO limit up to this value.

ip link set dev veth10 gso_max_size 200000

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/veth.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f474e79a774580e4cb67da44b5f0c796c3ce8abb..466da01ba2e3e97ba9eb16586b6d5d9f092b3d76 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1647,6 +1647,7 @@ static void veth_setup(struct net_device *dev)
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
 	dev->mpls_features = NETIF_F_HW_CSUM | NETIF_F_GSO_SOFTWARE;
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
 }
 
 /*
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 12/13] mlx4: support BIG TCP packets
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (10 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 11/13] veth: " Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:21 ` [PATCH v5 net-next 13/13] mlx5: " Eric Dumazet
  12 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet,
	Tariq Toukan

From: Eric Dumazet <edumazet@google.com>

mlx4 supports LSOv2 just fine.

The IPv6 stack inserts a temporary Hop-by-Hop header
with a JUMBO TLV for big packets.

We need to ignore the HBH header when populating the TX descriptor.
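
As a rough userspace illustration of that header copy, the sketch below
assumes the fixed layout [Ethernet][IPv6][HBH][TCP] with no VLAN tag,
and uses a hypothetical helper name, sk_copy_hdrs_skip_hbh(). It
mirrors the idea only and is not the driver code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETH_HLEN    14u
#define SK_IPV6_HLEN   40u
#define SK_HBH_LEN      8u
#define SK_IPPROTO_TCP  6u

/* Copy Ethernet + IPv6 + TCP headers from src to dst, dropping the
 * 8-byte HBH header that sits between IPv6 and TCP in src.
 * dst must hold at least 54 + tcp_hdrlen bytes.
 * Returns the number of header bytes written to dst.
 */
static size_t sk_copy_hdrs_skip_hbh(uint8_t *dst, const uint8_t *src,
				    size_t tcp_hdrlen)
{
	const size_t l3_end = SK_ETH_HLEN + SK_IPV6_HLEN;

	/* Ethernet and IPv6 headers are copied unchanged... */
	memcpy(dst, src, l3_end);
	/* ...except that nexthdr must now point directly at TCP. */
	dst[SK_ETH_HLEN + 6] = SK_IPPROTO_TCP;
	/* The TCP header follows the IPv6 header, skipping the HBH bytes. */
	memcpy(dst + l3_end, src + l3_end + SK_HBH_LEN, tcp_hdrlen);
	/* The IPv6 payload_len bytes are copied as-is (0 for a jumbogram). */
	return l3_end + tcp_hdrlen;
}
```

The result is 8 bytes shorter than the headers in the skb, which is why
the patch subtracts hopbyhop from lso_header_size.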

Tested:

Before: (not enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_max_size 65536 gro_max_size 65536

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   6591.45  0.86   1.34   62.490  97.446
262144 540000

After: (enabling bigger TSO/GRO packets)

ip link set dev eth0 gso_max_size 185000 gro_max_size 185000

netperf -H lpaa18 -t TCP_RR -T2,2 -l 10 -Cc -- -r 70000,70000
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to lpaa18.prod.google.com () port 0 AF_INET6 : first burst 0 : cpu bind
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  % S    % S    us/Tr   us/Tr

262144 540000 70000   70000  10.00   8383.95  0.95   1.01   54.432  57.584
262144 540000

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 ++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++++++++----
 2 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c61dc7ae0c056a4dbcf24297549f6b1b5cc25d92..ca4b93a0103469b9629dad2f877a496c23fd727c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3417,6 +3417,9 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	dev->min_mtu = ETH_MIN_MTU;
 	dev->max_mtu = priv->max_mtu;
 
+	/* supports LSOv2 packets. */
+	netif_set_tso_max_size(dev, GSO_MAX_SIZE);
+
 	mdev->pndev[port] = dev;
 	mdev->upper[port] = NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index f777151d226fb601f52366850f8c86358e214032..af3b2b59a2a6940a2839b277815ec7c3b4af1008 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -43,6 +43,7 @@
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/indirect_call_wrapper.h>
+#include <net/ipv6.h>
 
 #include "mlx4_en.h"
 
@@ -634,19 +635,28 @@ static int get_real_size(const struct sk_buff *skb,
 			 struct net_device *dev,
 			 int *lso_header_size,
 			 bool *inline_ok,
-			 void **pfrag)
+			 void **pfrag,
+			 int *hopbyhop)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	int real_size;
 
 	if (shinfo->gso_size) {
 		*inline_ok = false;
-		if (skb->encapsulation)
+		*hopbyhop = 0;
+		if (skb->encapsulation) {
 			*lso_header_size = (skb_inner_transport_header(skb) - skb->data) + inner_tcp_hdrlen(skb);
-		else
+		} else {
+			/* Detects large IPV6 TCP packets and prepares for removal of
+			 * HBH header that has been pushed by ip6_xmit(),
+			 * mainly so that tcpdump can dissect them.
+			 */
+			if (ipv6_has_hopopt_jumbo(skb))
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
 			*lso_header_size = skb_transport_offset(skb) + tcp_hdrlen(skb);
+		}
 		real_size = CTRL_SIZE + shinfo->nr_frags * DS_SIZE +
-			ALIGN(*lso_header_size + 4, DS_SIZE);
+			ALIGN(*lso_header_size - *hopbyhop + 4, DS_SIZE);
 		if (unlikely(*lso_header_size != skb_headlen(skb))) {
 			/* We add a segment for the skb linear buffer only if
 			 * it contains data */
@@ -873,6 +883,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	int desc_size;
 	int real_size;
 	u32 index, bf_index;
+	struct ipv6hdr *h6;
 	__be32 op_own;
 	int lso_header_size;
 	void *fragptr = NULL;
@@ -881,6 +892,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool stop_queue;
 	bool inline_ok;
 	u8 data_offset;
+	int hopbyhop;
 	bool bf_ok;
 
 	tx_ind = skb_get_queue_mapping(skb);
@@ -890,7 +902,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto tx_drop;
 
 	real_size = get_real_size(skb, shinfo, dev, &lso_header_size,
-				  &inline_ok, &fragptr);
+				  &inline_ok, &fragptr, &hopbyhop);
 	if (unlikely(!real_size))
 		goto tx_drop_count;
 
@@ -943,7 +955,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		data = &tx_desc->data;
 		data_offset = offsetof(struct mlx4_en_tx_desc, data);
 	} else {
-		int lso_align = ALIGN(lso_header_size + 4, DS_SIZE);
+		int lso_align = ALIGN(lso_header_size - hopbyhop + 4, DS_SIZE);
 
 		data = (void *)&tx_desc->lso + lso_align;
 		data_offset = offsetof(struct mlx4_en_tx_desc, lso) + lso_align;
@@ -1008,14 +1020,31 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 			((ring->prod & ring->size) ?
 				cpu_to_be32(MLX4_EN_BIT_DESC_OWN) : 0);
 
+		lso_header_size -= hopbyhop;
 		/* Fill in the LSO prefix */
 		tx_desc->lso.mss_hdr_size = cpu_to_be32(
 			shinfo->gso_size << 16 | lso_header_size);
 
-		/* Copy headers;
-		 * note that we already verified that it is linear */
-		memcpy(tx_desc->lso.header, skb->data, lso_header_size);
 
+		if (unlikely(hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(tx_desc->lso.header, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)tx_desc->lso.header + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			/* Copy headers;
+			 * note that we already verified that it is linear
+			 */
+			memcpy(tx_desc->lso.header, skb->data, lso_header_size);
+		}
 		ring->tso_packets++;
 
 		i = shinfo->gso_segs;
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
                   ` (11 preceding siblings ...)
  2022-05-09 22:21 ` [PATCH v5 net-next 12/13] mlx4: support " Eric Dumazet
@ 2022-05-09 22:21 ` Eric Dumazet
  2022-05-09 22:30   ` Eric Dumazet
  2022-05-10  1:38   ` Jakub Kicinski
  12 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:21 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: netdev, Alexander Duyck, Coco Li, Eric Dumazet, Eric Dumazet,
	Tariq Toukan, Saeed Mahameed, Leon Romanovsky

From: Coco Li <lixiaoyan@google.com>

mlx5 supports LSOv2.

IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
with JUMBO TLV for big packets.

We need to ignore/skip this HBH header when populating TX descriptor.

Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.

v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y

Signed-off-by: Coco Li <lixiaoyan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 2 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index d27986869b8ba070d1a4f8bcdc7e14ab54ae984e..226825410a1aa55b5b7941a7389a78abdb800521 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4920,6 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
 
 	netdev->priv_flags       |= IFF_UNICAST_FLT;
 
+	netif_set_tso_max_size(netdev, 512 * 1024);
 	mlx5e_set_netdev_dev_addr(netdev);
 	mlx5e_ipsec_build_netdev(priv);
 	mlx5e_ktls_build_netdev(priv);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2dc48406cd08d21ff94f665cd61ab9227f351215..b4fc45ba1b347fb9ad0f46b9c091cc45e4d3d84f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -40,6 +40,7 @@
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec_rxtx.h"
 #include "en/ptp.h"
+#include <net/ipv6.h>
 
 static void mlx5e_dma_unmap_wqe_err(struct mlx5e_txqsq *sq, u8 num_dma)
 {
@@ -130,23 +131,32 @@ mlx5e_txwqe_build_eseg_csum(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 		sq->stats->csum_none++;
 }
 
+/* Returns the number of header bytes that we plan
+ * to inline later in the transmit descriptor
+ */
 static inline u16
-mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb)
+mlx5e_tx_get_gso_ihs(struct mlx5e_txqsq *sq, struct sk_buff *skb, int *hopbyhop)
 {
 	struct mlx5e_sq_stats *stats = sq->stats;
 	u16 ihs;
 
+	*hopbyhop = 0;
 	if (skb->encapsulation) {
 		ihs = skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb);
 		stats->tso_inner_packets++;
 		stats->tso_inner_bytes += skb->len - ihs;
 	} else {
-		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) {
 			ihs = skb_transport_offset(skb) + sizeof(struct udphdr);
-		else
+		} else {
 			ihs = skb_transport_offset(skb) + tcp_hdrlen(skb);
+			if (ipv6_has_hopopt_jumbo(skb)) {
+				*hopbyhop = sizeof(struct hop_jumbo_hdr);
+				ihs -= sizeof(struct hop_jumbo_hdr);
+			}
+		}
 		stats->tso_packets++;
-		stats->tso_bytes += skb->len - ihs;
+		stats->tso_bytes += skb->len - ihs - *hopbyhop;
 	}
 
 	return ihs;
@@ -208,6 +218,7 @@ struct mlx5e_tx_attr {
 	__be16 mss;
 	u16 insz;
 	u8 opcode;
+	u8 hopbyhop;
 };
 
 struct mlx5e_tx_wqe_attr {
@@ -244,14 +255,16 @@ static void mlx5e_sq_xmit_prepare(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5e_sq_stats *stats = sq->stats;
 
 	if (skb_is_gso(skb)) {
-		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb);
+		int hopbyhop;
+		u16 ihs = mlx5e_tx_get_gso_ihs(sq, skb, &hopbyhop);
 
 		*attr = (struct mlx5e_tx_attr) {
 			.opcode    = MLX5_OPCODE_LSO,
 			.mss       = cpu_to_be16(skb_shinfo(skb)->gso_size),
 			.ihs       = ihs,
 			.num_bytes = skb->len + (skb_shinfo(skb)->gso_segs - 1) * ihs,
-			.headlen   = skb_headlen(skb) - ihs,
+			.headlen   = skb_headlen(skb) - ihs - hopbyhop,
+			.hopbyhop  = hopbyhop,
 		};
 
 		stats->packets += skb_shinfo(skb)->gso_segs;
@@ -365,7 +378,8 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	struct mlx5_wqe_eth_seg  *eseg;
 	struct mlx5_wqe_data_seg *dseg;
 	struct mlx5e_tx_wqe_info *wi;
-
+	u16 ihs = attr->ihs;
+	struct ipv6hdr *h6;
 	struct mlx5e_sq_stats *stats = sq->stats;
 	int num_dma;
 
@@ -379,15 +393,36 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 
 	eseg->mss = attr->mss;
 
-	if (attr->ihs) {
-		if (skb_vlan_tag_present(skb)) {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs + VLAN_HLEN);
-			mlx5e_insert_vlan(eseg->inline_hdr.start, skb, attr->ihs);
+	if (ihs) {
+		u8 *start = eseg->inline_hdr.start;
+
+		if (unlikely(attr->hopbyhop)) {
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			if (skb_vlan_tag_present(skb)) {
+				mlx5e_insert_vlan(start, skb, ETH_HLEN + sizeof(*h6));
+				ihs += VLAN_HLEN;
+				h6 = (struct ipv6hdr *)(start + sizeof(struct vlan_ethhdr));
+			} else {
+				memcpy(start, skb->data, ETH_HLEN + sizeof(*h6));
+				h6 = (struct ipv6hdr *)(start + ETH_HLEN);
+			}
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else if (skb_vlan_tag_present(skb)) {
+			mlx5e_insert_vlan(start, skb, ihs);
+			ihs += VLAN_HLEN;
 			stats->added_vlan_packets++;
 		} else {
-			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs);
-			memcpy(eseg->inline_hdr.start, skb->data, attr->ihs);
+			memcpy(start, skb->data, ihs);
 		}
+		eseg->inline_hdr.sz |= cpu_to_be16(ihs);
 		dseg += wqe_attr->ds_cnt_inl;
 	} else if (skb_vlan_tag_present(skb)) {
 		eseg->insert.type = cpu_to_be16(MLX5_ETH_WQE_INSERT_VLAN);
@@ -398,7 +433,7 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	}
 
 	dseg += wqe_attr->ds_cnt_ids;
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr->ihs + attr->hopbyhop,
 					  attr->headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
@@ -918,12 +953,29 @@ void mlx5i_sq_xmit(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 	eseg->mss = attr.mss;
 
 	if (attr.ihs) {
-		memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		if (unlikely(attr.hopbyhop)) {
+			struct ipv6hdr *h6;
+
+			/* remove the HBH header.
+			 * Layout: [Ethernet header][IPv6 header][HBH][TCP header]
+			 */
+			memcpy(eseg->inline_hdr.start, skb->data, ETH_HLEN + sizeof(*h6));
+			h6 = (struct ipv6hdr *)((char *)eseg->inline_hdr.start + ETH_HLEN);
+			h6->nexthdr = IPPROTO_TCP;
+			/* Copy the TCP header after the IPv6 one */
+			memcpy(h6 + 1,
+			       skb->data + ETH_HLEN + sizeof(*h6) +
+					sizeof(struct hop_jumbo_hdr),
+			       tcp_hdrlen(skb));
+			/* Leave ipv6 payload_len set to 0, as LSO v2 specs request. */
+		} else {
+			memcpy(eseg->inline_hdr.start, skb->data, attr.ihs);
+		}
 		eseg->inline_hdr.sz = cpu_to_be16(attr.ihs);
 		dseg += wqe_attr.ds_cnt_inl;
 	}
 
-	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs,
+	num_dma = mlx5e_txwqe_build_dsegs(sq, skb, skb->data + attr.ihs + attr.hopbyhop,
 					  attr.headlen, dseg);
 	if (unlikely(num_dma < 0))
 		goto err_drop;
-- 
2.36.0.512.ge40c2bad7a-goog


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-09 22:21 ` [PATCH v5 net-next 13/13] mlx5: " Eric Dumazet
@ 2022-05-09 22:30   ` Eric Dumazet
  2022-05-10  1:38   ` Jakub Kicinski
  1 sibling, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-09 22:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Tariq Toukan, Saeed Mahameed,
	Leon Romanovsky

On Mon, May 9, 2022 at 3:22 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Coco Li <lixiaoyan@google.com>
>
> mlx5 supports LSOv2.
>
> IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> with JUMBO TLV for big packets.
>
> We need to ignore/skip this HBH header when populating TX descriptor.
>
> Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
> layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
>
> v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
> v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
>
> Signed-off-by: Coco Li <lixiaoyan@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Cc: Saeed Mahameed <saeedm@nvidia.com>
> Cc: Leon Romanovsky <leon@kernel.org>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
>  .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
>  2 files changed, 69 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index d27986869b8ba070d1a4f8bcdc7e14ab54ae984e..226825410a1aa55b5b7941a7389a78abdb800521 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -4920,6 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)
>
>         netdev->priv_flags       |= IFF_UNICAST_FLT;
>
> +       netif_set_tso_max_size(netdev, 512 * 1024);

Apparently I forgot to amend this part on the final patch of the series.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 226825410a1aa55b5b7941a7389a78abdb800521..bf3bca79e160124abd128ac1e9910cb2f39a39ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4920,7 +4920,7 @@ static void mlx5e_build_nic_netdev(struct net_device *netdev)

        netdev->priv_flags       |= IFF_UNICAST_FLT;

-       netif_set_tso_max_size(netdev, 512 * 1024);
+       netif_set_tso_max_size(netdev, GSO_MAX_SIZE);
        mlx5e_set_netdev_dev_addr(netdev);
        mlx5e_ipsec_build_netdev(priv);
        mlx5e_ktls_build_netdev(priv);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536
  2022-05-09 22:21 ` [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
@ 2022-05-10  1:35   ` kernel test robot
  2022-05-10  2:09     ` Eric Dumazet
  2022-05-10  3:08   ` kernel test robot
  1 sibling, 1 reply; 25+ messages in thread
From: kernel test robot @ 2022-05-10  1:35 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: kbuild-all, netdev, Alexander Duyck, Coco Li, Eric Dumazet

Hi Eric,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220510-062530
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 9c095bd0d4c451d31d0fd1131cc09d3b60de815d
config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20220510/202205100923.RHeXqtNd-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/8f9b47ee99f57d1747010d002315092bfa17ed50
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220510-062530
        git checkout 8f9b47ee99f57d1747010d002315092bfa17ed50
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/net/inet_sock.h:22,
                    from include/net/ip.h:29,
                    from include/linux/errqueue.h:6,
                    from net/core/sock.c:91:
   net/core/sock.c: In function 'sk_setup_caps':
>> include/net/sock.h:389:37: error: 'struct sock_common' has no member named 'skc_v6_rcv_saddr'; did you mean 'skc_rcv_saddr'?
     389 | #define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr
         |                                     ^~~~~~~~~~~~~~~~
   net/core/sock.c:2317:72: note: in expansion of macro 'sk_v6_rcv_saddr'
    2317 |                              !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
         |                                                                        ^~~~~~~~~~~~~~~


vim +389 include/net/sock.h

4dc6dc7162c08b Eric Dumazet             2009-07-15  368  
68835aba4d9b74 Eric Dumazet             2010-11-30  369  #define sk_dontcopy_begin	__sk_common.skc_dontcopy_begin
68835aba4d9b74 Eric Dumazet             2010-11-30  370  #define sk_dontcopy_end		__sk_common.skc_dontcopy_end
4dc6dc7162c08b Eric Dumazet             2009-07-15  371  #define sk_hash			__sk_common.skc_hash
5080546682bae3 Eric Dumazet             2013-10-02  372  #define sk_portpair		__sk_common.skc_portpair
05dbc7b59481ca Eric Dumazet             2013-10-03  373  #define sk_num			__sk_common.skc_num
05dbc7b59481ca Eric Dumazet             2013-10-03  374  #define sk_dport		__sk_common.skc_dport
5080546682bae3 Eric Dumazet             2013-10-02  375  #define sk_addrpair		__sk_common.skc_addrpair
5080546682bae3 Eric Dumazet             2013-10-02  376  #define sk_daddr		__sk_common.skc_daddr
5080546682bae3 Eric Dumazet             2013-10-02  377  #define sk_rcv_saddr		__sk_common.skc_rcv_saddr
^1da177e4c3f41 Linus Torvalds           2005-04-16  378  #define sk_family		__sk_common.skc_family
^1da177e4c3f41 Linus Torvalds           2005-04-16  379  #define sk_state		__sk_common.skc_state
^1da177e4c3f41 Linus Torvalds           2005-04-16  380  #define sk_reuse		__sk_common.skc_reuse
055dc21a1d1d21 Tom Herbert              2013-01-22  381  #define sk_reuseport		__sk_common.skc_reuseport
9fe516ba3fb29b Eric Dumazet             2014-06-27  382  #define sk_ipv6only		__sk_common.skc_ipv6only
26abe14379f8e2 Eric W. Biederman        2015-05-08  383  #define sk_net_refcnt		__sk_common.skc_net_refcnt
^1da177e4c3f41 Linus Torvalds           2005-04-16  384  #define sk_bound_dev_if		__sk_common.skc_bound_dev_if
^1da177e4c3f41 Linus Torvalds           2005-04-16  385  #define sk_bind_node		__sk_common.skc_bind_node
8feaf0c0a5488b Arnaldo Carvalho de Melo 2005-08-09  386  #define sk_prot			__sk_common.skc_prot
07feaebfcc10cd Eric W. Biederman        2007-09-12  387  #define sk_net			__sk_common.skc_net
efe4208f47f907 Eric Dumazet             2013-10-03  388  #define sk_v6_daddr		__sk_common.skc_v6_daddr
efe4208f47f907 Eric Dumazet             2013-10-03 @389  #define sk_v6_rcv_saddr	__sk_common.skc_v6_rcv_saddr
33cf7c90fe2f97 Eric Dumazet             2015-03-11  390  #define sk_cookie		__sk_common.skc_cookie
70da268b569d32 Eric Dumazet             2015-10-08  391  #define sk_incoming_cpu		__sk_common.skc_incoming_cpu
8e5eb54d303b7c Eric Dumazet             2015-10-08  392  #define sk_flags		__sk_common.skc_flags
ed53d0ab761f5c Eric Dumazet             2015-10-08  393  #define sk_rxhash		__sk_common.skc_rxhash
efe4208f47f907 Eric Dumazet             2013-10-03  394  
43f51df4172955 Eric Dumazet             2021-11-15  395  	/* early demux fields */
8b3f91332291fa Jakub Kicinski           2021-12-23  396  	struct dst_entry __rcu	*sk_rx_dst;
43f51df4172955 Eric Dumazet             2021-11-15  397  	int			sk_rx_dst_ifindex;
43f51df4172955 Eric Dumazet             2021-11-15  398  	u32			sk_rx_dst_cookie;
43f51df4172955 Eric Dumazet             2021-11-15  399  
^1da177e4c3f41 Linus Torvalds           2005-04-16  400  	socket_lock_t		sk_lock;
9115e8cd2a0c6e Eric Dumazet             2016-12-03  401  	atomic_t		sk_drops;
9115e8cd2a0c6e Eric Dumazet             2016-12-03  402  	int			sk_rcvlowat;
9115e8cd2a0c6e Eric Dumazet             2016-12-03  403  	struct sk_buff_head	sk_error_queue;
b178bb3dfc30d9 Eric Dumazet             2010-11-16  404  	struct sk_buff_head	sk_receive_queue;
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  405  	/*
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  406  	 * The backlog queue is special, it is always used with
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  407  	 * the per-socket spinlock held and requires low latency
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  408  	 * access. Therefore we special case it's implementation.
b178bb3dfc30d9 Eric Dumazet             2010-11-16  409  	 * Note : rmem_alloc is in this structure to fill a hole
b178bb3dfc30d9 Eric Dumazet             2010-11-16  410  	 * on 64bit arches, not because its logically part of
b178bb3dfc30d9 Eric Dumazet             2010-11-16  411  	 * backlog.
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  412  	 */
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  413  	struct {
b178bb3dfc30d9 Eric Dumazet             2010-11-16  414  		atomic_t	rmem_alloc;
b178bb3dfc30d9 Eric Dumazet             2010-11-16  415  		int		len;
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  416  		struct sk_buff	*head;
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  417  		struct sk_buff	*tail;
fa438ccfdfd3f6 Eric Dumazet             2007-03-04  418  	} sk_backlog;
f35f821935d8df Eric Dumazet             2021-11-15  419  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-09 22:21 ` [PATCH v5 net-next 13/13] mlx5: " Eric Dumazet
  2022-05-09 22:30   ` Eric Dumazet
@ 2022-05-10  1:38   ` Jakub Kicinski
  2022-05-10  2:00     ` Eric Dumazet
                       ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Jakub Kicinski @ 2022-05-10  1:38 UTC (permalink / raw)
  To: Eric Dumazet, Kees Cook
  Cc: David S . Miller, Paolo Abeni, netdev, Alexander Duyck, Coco Li,
	Eric Dumazet, Tariq Toukan, Saeed Mahameed, Leon Romanovsky

On Mon,  9 May 2022 15:21:49 -0700 Eric Dumazet wrote:
> From: Coco Li <lixiaoyan@google.com>
> 
> mlx5 supports LSOv2.
> 
> IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> with JUMBO TLV for big packets.
> 
> We need to ignore/skip this HBH header when populating TX descriptor.
> 
> Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
> layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
> 
> v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
> v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
> 
> Signed-off-by: Coco Li <lixiaoyan@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> Cc: Saeed Mahameed <saeedm@nvidia.com>
> Cc: Leon Romanovsky <leon@kernel.org>

So we're leaving the warning for Kees to deal with?

Kees is there some form of "I know what I'm doing" cast 
that you could sneak us under the table?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-10  1:38   ` Jakub Kicinski
@ 2022-05-10  2:00     ` Eric Dumazet
  2022-05-10 15:49     ` Kees Cook
  2022-05-11  2:55     ` Kees Cook
  2 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-10  2:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, Kees Cook, David S . Miller, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Tariq Toukan, Saeed Mahameed,
	Leon Romanovsky

On Mon, May 9, 2022 at 6:38 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon,  9 May 2022 15:21:49 -0700 Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> >
> > mlx5 supports LSOv2.
> >
> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> > with JUMBO TLV for big packets.
> >
> > We need to ignore/skip this HBH header when populating TX descriptor.
> >
> > Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
> > layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
> >
> > v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
> > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
> >
> > Signed-off-by: Coco Li <lixiaoyan@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > Cc: Saeed Mahameed <saeedm@nvidia.com>
> > Cc: Leon Romanovsky <leon@kernel.org>
>
> So we're leaving the warning for Kees to deal with?

I think so. I do not see an easy way to avoid this, unless perhaps we add
some extra obfuscation so that gcc cannot determine the third memcpy()
argument at compile time.

The alternative is to remove the mlx5 patch from the upstream series.

>
> Kees is there some form of "I know what I'm doing" cast
> that you could sneak us under the table?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536
  2022-05-10  1:35   ` kernel test robot
@ 2022-05-10  2:09     ` Eric Dumazet
  2022-05-10  2:20       ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2022-05-10  2:09 UTC (permalink / raw)
  To: kernel test robot
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	kbuild-all, netdev, Alexander Duyck, Coco Li

On Mon, May 9, 2022 at 6:36 PM kernel test robot <lkp@intel.com> wrote:
>
> Hi Eric,
>
> Thank you for the patch! Yet something to improve:
>
> [auto build test ERROR on net-next/master]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220510-062530
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 9c095bd0d4c451d31d0fd1131cc09d3b60de815d
> config: um-i386_defconfig (https://download.01.org/0day-ci/archive/20220510/202205100923.RHeXqtNd-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
> reproduce (this is a W=1 build):
>         # https://github.com/intel-lab-lkp/linux/commit/8f9b47ee99f57d1747010d002315092bfa17ed50
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220510-062530
>         git checkout 8f9b47ee99f57d1747010d002315092bfa17ed50
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 O=build_dir ARCH=um SUBARCH=i386 SHELL=/bin/bash
>
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    In file included from include/net/inet_sock.h:22,
>                     from include/net/ip.h:29,
>                     from include/linux/errqueue.h:6,
>                     from net/core/sock.c:91:
>    net/core/sock.c: In function 'sk_setup_caps':
> >> include/net/sock.h:389:37: error: 'struct sock_common' has no member named 'skc_v6_rcv_saddr'; did you mean 'skc_rcv_saddr'?
>      389 | #define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr
>          |                                     ^~~~~~~~~~~~~~~~
>    net/core/sock.c:2317:72: note: in expansion of macro 'sk_v6_rcv_saddr'
>     2317 |                              !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
>          |                                                                        ^~~~~~~~~~~~~~~
>
>

Alexander used:

+                       if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
+                           (!IS_ENABLED(CONFIG_IPV6) || sk->sk_family != AF_INET6 ||
+                            !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
+                               sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;

I guess we could simply allow gso_max_size to be bigger than
GSO_LEGACY_MAX_SIZE only if IS_ENABLED(CONFIG_IPV6).

So the above code could really be:

#if IS_ENABLED(CONFIG_IPV6)
                       if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
                           (sk->sk_family != AF_INET6 ||
                            !sk_is_tcp(sk) ||
                            ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
                               sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
#endif

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536
  2022-05-10  2:09     ` Eric Dumazet
@ 2022-05-10  2:20       ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2022-05-10  2:20 UTC (permalink / raw)
  To: kernel test robot
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni,
	kbuild-all, netdev, Alexander Duyck, Coco Li

On Mon, May 9, 2022 at 7:09 PM Eric Dumazet <edumazet@google.com> wrote:
>

>
> Alexander used:
>
> +                       if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
> +                           (!IS_ENABLED(CONFIG_IPV6) || sk->sk_family != AF_INET6 ||
> +                            !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
> +                               sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
>
> I guess we could simply allow gso_max_size to be bigger than
> GSO_LEGACY_MAX_SIZE only if IS_ENABLED(CONFIG_IPV6).
>
> So the above code could really be:
>
> #if IS_ENABLED(CONFIG_IPV6)
>                        if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
>                            (sk->sk_family != AF_INET6 ||
>                             !sk_is_tcp(sk) ||
>                             ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
>                                sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
> #endif

In v6, I will squash the following diff to Alexander patch:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dfd57a647c97ed0f400ffe89c73919367a900f75..6bd9e09b34ec583a05a929ca979511e6423dbeb7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2271,8 +2271,13 @@ struct net_device {

 /* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE),
  * and shinfo->gso_segs is a 16bit field.
+ * If IPV6 is not enabled, we keep legacy value.
  */
+#if IS_ENABLED(CONFIG_IPV6)
 #define GSO_MAX_SIZE           (8 * GSO_MAX_SEGS)
+#else
+#define GSO_MAX_SIZE           GSO_LEGACY_MAX_SIZE
+#endif

        unsigned int            gso_max_size;
 #define TSO_LEGACY_MAX_SIZE    65536
diff --git a/net/core/dev.c b/net/core/dev.c
index 7349f75891d5724a060781abc80a800bdf835f74..4be3695846520af18a687cdcaa70c5f327ba94e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3003,7 +3003,7 @@ EXPORT_SYMBOL(netif_set_real_num_queues);
  */
 void netif_set_tso_max_size(struct net_device *dev, unsigned int size)
 {
-       dev->tso_max_size = size;
+       dev->tso_max_size = min(GSO_MAX_SIZE, size);
        if (size < READ_ONCE(dev->gso_max_size))
                netif_set_gso_max_size(dev, size);
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index f7c3171078b6fccd25757e8fe54dd56a2a674238..2a931f396472108ccedcd3d08189c63775caecff 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2312,10 +2312,13 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
                        sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
                        /* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
                        sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
+#if IS_ENABLED(CONFIG_IPV6)
                        if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
-                           (!IS_ENABLED(CONFIG_IPV6) || sk->sk_family != AF_INET6 ||
-                            !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
+                           (sk->sk_family != AF_INET6 ||
+                            !sk_is_tcp(sk) ||
+                            ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
                                sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
+#endif
                        sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
                        /* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
                        max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536
  2022-05-09 22:21 ` [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
  2022-05-10  1:35   ` kernel test robot
@ 2022-05-10  3:08   ` kernel test robot
  1 sibling, 0 replies; 25+ messages in thread
From: kernel test robot @ 2022-05-10  3:08 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: llvm, kbuild-all, netdev, Alexander Duyck, Coco Li, Eric Dumazet

Hi Eric,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/tcp-BIG-TCP-implementation/20220510-062530
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 9c095bd0d4c451d31d0fd1131cc09d3b60de815d
config: arm-spear3xx_defconfig (https://download.01.org/0day-ci/archive/20220510/202205101045.zaceqBiC-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 3abb68a626160e019c30a4860e569d7bc75e486a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/intel-lab-lkp/linux/commit/8f9b47ee99f57d1747010d002315092bfa17ed50
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Eric-Dumazet/tcp-BIG-TCP-implementation/20220510-062530
        git checkout 8f9b47ee99f57d1747010d002315092bfa17ed50
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> net/core/sock.c:2317:51: error: no member named 'skc_v6_rcv_saddr' in 'struct sock_common'; did you mean 'skc_rcv_saddr'?
                                !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
                                                                          ^
   include/net/sock.h:389:37: note: expanded from macro 'sk_v6_rcv_saddr'
   #define sk_v6_rcv_saddr __sk_common.skc_v6_rcv_saddr
                                       ^
   include/net/sock.h:171:11: note: 'skc_rcv_saddr' declared here
                           __be32  skc_rcv_saddr;
                                   ^
   1 error generated.


vim +2317 net/core/sock.c

  2295	
  2296	void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
  2297	{
  2298		u32 max_segs = 1;
  2299	
  2300		sk_dst_set(sk, dst);
  2301		sk->sk_route_caps = dst->dev->features;
  2302		if (sk_is_tcp(sk))
  2303			sk->sk_route_caps |= NETIF_F_GSO;
  2304		if (sk->sk_route_caps & NETIF_F_GSO)
  2305			sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
  2306		if (unlikely(sk->sk_gso_disabled))
  2307			sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
  2308		if (sk_can_gso(sk)) {
  2309			if (dst->header_len && !xfrm_dst_offload_ok(dst)) {
  2310				sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
  2311			} else {
  2312				sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
  2313				/* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
  2314				sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
  2315				if (sk->sk_gso_max_size > GSO_LEGACY_MAX_SIZE &&
  2316				    (!IS_ENABLED(CONFIG_IPV6) || sk->sk_family != AF_INET6 ||
> 2317				     !sk_is_tcp(sk) || ipv6_addr_v4mapped(&sk->sk_v6_rcv_saddr)))
  2318					sk->sk_gso_max_size = GSO_LEGACY_MAX_SIZE;
  2319				sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
  2320				/* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
  2321				max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);
  2322			}
  2323		}
  2324		sk->sk_gso_max_segs = max_segs;
  2325	}
  2326	EXPORT_SYMBOL_GPL(sk_setup_caps);
  2327	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-10  1:38   ` Jakub Kicinski
  2022-05-10  2:00     ` Eric Dumazet
@ 2022-05-10 15:49     ` Kees Cook
  2022-05-11  2:55     ` Kees Cook
  2 siblings, 0 replies; 25+ messages in thread
From: Kees Cook @ 2022-05-10 15:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Eric Dumazet, Tariq Toukan,
	Saeed Mahameed, Leon Romanovsky

On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote:
> On Mon,  9 May 2022 15:21:49 -0700 Eric Dumazet wrote:
> > From: Coco Li <lixiaoyan@google.com>
> > 
> > mlx5 supports LSOv2.
> > 
> > IPv6 gro/tcp stacks insert a temporary Hop-by-Hop header
> > with JUMBO TLV for big packets.
> > 
> > We need to ignore/skip this HBH header when populating TX descriptor.
> > 
> > Note that ipv6_has_hopopt_jumbo() only recognizes very specific packet
> > layout, thus mlx5e_sq_xmit_wqe() is taking care of this layout only.
> > 
> > v2: clear hopbyhop in mlx5e_tx_get_gso_ihs()
> > v4: fix compile error for CONFIG_MLX5_CORE_IPOIB=y
> > 
> > Signed-off-by: Coco Li <lixiaoyan@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
> > Cc: Saeed Mahameed <saeedm@nvidia.com>
> > Cc: Leon Romanovsky <leon@kernel.org>
> 
> So we're leaving the warning for Kees to deal with?
> 
> Kees is there some form of "I know what I'm doing" cast 
> that you could sneak us under the table?

Right now, it's switching that memcpy to __builtin_memcpy(), but I'll
send a patch that'll create an unsafe_memcpy() macro that does the right
things vs kasan, fortify, etc.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-10  1:38   ` Jakub Kicinski
  2022-05-10  2:00     ` Eric Dumazet
  2022-05-10 15:49     ` Kees Cook
@ 2022-05-11  2:55     ` Kees Cook
  2022-05-11 16:26       ` Jakub Kicinski
  2 siblings, 1 reply; 25+ messages in thread
From: Kees Cook @ 2022-05-11  2:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Eric Dumazet, Tariq Toukan,
	Saeed Mahameed, Leon Romanovsky

On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote:
> So we're leaving the warning for Kees to deal with?
> 
> Kees is there some form of "I know what I'm doing" cast 
> that you could sneak us under the table?

Okay, I've sent this[1] now. If that looks okay to you, I figure you'll
land it via netdev for the coming merge window?

-Kees

[1] https://lore.kernel.org/netdev/20220511025301.3636666-1-keescook@chromium.org/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-11  2:55     ` Kees Cook
@ 2022-05-11 16:26       ` Jakub Kicinski
  2022-05-11 17:27         ` Kees Cook
  0 siblings, 1 reply; 25+ messages in thread
From: Jakub Kicinski @ 2022-05-11 16:26 UTC (permalink / raw)
  To: Kees Cook
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Eric Dumazet, Tariq Toukan,
	Saeed Mahameed, Leon Romanovsky

On Tue, 10 May 2022 19:55:16 -0700 Kees Cook wrote:
> On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote:
> > So we're leaving the warning for Kees to deal with?
> > 
> > Kees is there some form of "I know what I'm doing" cast 
> > that you could sneak us under the table?  
> 
> Okay, I've sent this[1] now. If that looks okay to you, I figure you'll
> land it via netdev for the coming merge window?

I was about to say "great!" but perhaps given we're adding an unsafe_
flavor of something a "it is what it is" would be a more appropriate
reaction.

Thank you!

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v5 net-next 13/13] mlx5: support BIG TCP packets
  2022-05-11 16:26       ` Jakub Kicinski
@ 2022-05-11 17:27         ` Kees Cook
  0 siblings, 0 replies; 25+ messages in thread
From: Kees Cook @ 2022-05-11 17:27 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, David S . Miller, Paolo Abeni, netdev,
	Alexander Duyck, Coco Li, Eric Dumazet, Tariq Toukan,
	Saeed Mahameed, Leon Romanovsky

On Wed, May 11, 2022 at 09:26:48AM -0700, Jakub Kicinski wrote:
> On Tue, 10 May 2022 19:55:16 -0700 Kees Cook wrote:
> > On Mon, May 09, 2022 at 06:38:53PM -0700, Jakub Kicinski wrote:
> > > So we're leaving the warning for Kees to deal with?
> > > 
> > > Kees is there some form of "I know what I'm doing" cast 
> > > that you could sneak us under the table?  
> > 
> > Okay, I've sent this[1] now. If that looks okay to you, I figure you'll
> > land it via netdev for the coming merge window?
> 
> I was about to say "great!" but perhaps given we're adding an unsafe_
> flavor of something a "it is what it is" would be a more appropriate
> reaction.

Heh, well, I think it's just calling a spade a spade: plain memcpy is
already unsafe. The goal is for the kernel's (fortified) memcpy to be
"provably" safe. :) But yeah, I get what you mean. I'm sad that I don't
yet have a workable way to deal with this code pattern, but I'm getting
close, I think. My random notes currently are:

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
index 2dc48406cd08..595d0db4e97a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c
@@ -386,6 +386,14 @@ mlx5e_sq_xmit_wqe(struct mlx5e_txqsq *sq, struct sk_buff *skb,
 			stats->added_vlan_packets++;
 		} else {
 			eseg->inline_hdr.sz |= cpu_to_be16(attr->ihs);
+/* interface could take:
+	fas: wqe
+	dst: eth.inline_hdr.start
+	src: skb->data
+	bytes: attr->ihs
+	elements member: data
+	elements_count value: wqe_attr->ds_cnt_inl
+*/
 			memcpy(eseg->inline_hdr.start, skb->data, attr->ihs);
 		}
 		dseg += wqe_attr->ds_cnt_inl;

There's a similar case with how netlink constructs things (i.e.
performing a memcpy across some of the trailing header members and then
into the flex array) that may share this code pattern, and at least one
patch to mlx5 I'd sent before could be refactored back into this to
unsplit the memcpy there.

Anyway, I'll continue to chip away at it.

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-05-11 17:28 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09 22:21 [PATCH v5 net-next 00/13] tcp: BIG TCP implementation Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 01/13] net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 02/13] net: allow gso_max_size to exceed 65536 Eric Dumazet
2022-05-10  1:35   ` kernel test robot
2022-05-10  2:09     ` Eric Dumazet
2022-05-10  2:20       ` Eric Dumazet
2022-05-10  3:08   ` kernel test robot
2022-05-09 22:21 ` [PATCH v5 net-next 03/13] net: limit GSO_MAX_SIZE to 524280 bytes Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 04/13] tcp_cubic: make hystart_ack_delay() aware of BIG TCP Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 05/13] ipv6: add struct hop_jumbo_hdr definition Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 06/13] ipv6/gso: remove temporary HBH/jumbo header Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 07/13] ipv6/gro: insert " Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 08/13] net: allow gro_max_size to exceed 65536 Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 09/13] ipv6: Add hop-by-hop header to jumbograms in ip6_output Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 10/13] net: loopback: enable BIG TCP packets Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 11/13] veth: " Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 12/13] mlx4: support " Eric Dumazet
2022-05-09 22:21 ` [PATCH v5 net-next 13/13] mlx5: " Eric Dumazet
2022-05-09 22:30   ` Eric Dumazet
2022-05-10  1:38   ` Jakub Kicinski
2022-05-10  2:00     ` Eric Dumazet
2022-05-10 15:49     ` Kees Cook
2022-05-11  2:55     ` Kees Cook
2022-05-11 16:26       ` Jakub Kicinski
2022-05-11 17:27         ` Kees Cook
