netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/4] Support fraglist GRO/GSO
@ 2019-12-18 13:34 Steffen Klassert
  2019-12-18 13:34 ` [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags Steffen Klassert
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Steffen Klassert @ 2019-12-18 13:34 UTC (permalink / raw)
  To: David Miller
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner, netdev

This patchset adds support for doing GRO/GSO by chaining packets
of the same flow at the SKB frag_list pointer. This avoids
the overhead of merging payloads into one big packet and,
on the other end, if GSO is needed, the overhead of splitting
the big packet back into its native form.

Patch 1 adds netdev feature flags to enable listified GRO;
this implements one of the configuration options discussed
at netconf 2019.

Patch 2 adds a netdev software feature set that defaults to off
and assigns the new listified GRO feature flag to it.

Patch 3 adds the core infrastructure to do fraglist GRO/GSO.

Patch 4 enables UDP to use fraglist GRO/GSO if configured.

Meaningful performance measurements are only available for the
forwarding path. I did some tests of the local receive path with
netperf and iperf, but in that case the sender that generates the
packets is the bottleneck, so the benchmarks are not meaningful
for the receive path.

Paolo Abeni did some benchmarks of the local receive path for the v2
version of this patchset; the results can be found here:

https://www.spinics.net/lists/netdev/msg551158.html

I used my IPsec forwarding test setup for the performance measurements:

           ------------         ------------
        -->| router 1 |-------->| router 2 |--
        |  ------------         ------------  |
        |                                     |
        |       --------------------          |
        --------|Spirent Testcenter|<----------
                --------------------

net-next (September 7th):

Single stream, UDP frame size 1460 bytes: 1,161,000 fps (13.5 Gbps).

----------------------------------------------------------------------

net-next (September 7th) + standard UDP GRO/GSO (not implemented
in this patchset):

Single stream, UDP frame size 1460 bytes: 1,801,000 fps (21 Gbps).

----------------------------------------------------------------------
net-next (September 7th) + fraglist UDP GRO/GSO:

Single stream, UDP frame size 1460 bytes: 2,860,000 fps (33.4 Gbps).

-----------------------------------------------------------------------

Changes from RFC v1:

- Add IPv6 support.
- Split patchset to enable UDP GRO by default before adding
  fraglist GRO support.
- Mark fraglist GRO packets as CHECKSUM_NONE.
- Take a refcount on the first segment skb when doing fraglist
  segmentation. With this we can use the same error handling
  path as with standard segmentation.

Changes from RFC v2:

- Add a netdev feature flag to configure listified GRO.
- Fix UDP GRO enabling for IPv6.
- Fix a rcu_read_lock() imbalance.
- Fix error path in skb_segment_list().

Changes from RFC v3:

- Rename NETIF_F_GRO_LIST to NETIF_F_GRO_FRAGLIST and add
  NETIF_F_GSO_FRAGLIST.
- Move introduction of SKB_GSO_FRAGLIST to patch 2.
- Use udpv6_encap_needed_key instead of udp_encap_needed_key in IPv6.
- Move some misplaced code from patch 5 to patch 1 where it belongs.

Changes from RFC v4:

- Drop the 'UDP: enable GRO by default' patch for now. Standard UDP GRO
  is not changed with this patchset.
- Rebase to net-next current.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags
  2019-12-18 13:34 [PATCH net-next 0/4] Support fraglist GRO/GSO Steffen Klassert
@ 2019-12-18 13:34 ` Steffen Klassert
  2019-12-18 16:02   ` Willem de Bruijn
  2019-12-18 13:34 ` [PATCH net-next 2/4] net: Add a netdev software feature set that defaults to off Steffen Klassert
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-12-18 13:34 UTC (permalink / raw)
  To: David Miller
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner, netdev

This adds new fraglist GRO/GSO feature flags. They will be used
to configure fraglist GRO/GSO, which will be implemented in some
followup patches.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdev_features.h | 6 +++++-
 include/linux/netdevice.h       | 1 +
 include/linux/skbuff.h          | 2 ++
 net/ethtool/common.c            | 1 +
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 4b19c544c59a..b239507da2a0 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -53,8 +53,9 @@ enum {
 	NETIF_F_GSO_ESP_BIT,		/* ... ESP with TSO */
 	NETIF_F_GSO_UDP_BIT,		/* ... UFO, deprecated except tuntap */
 	NETIF_F_GSO_UDP_L4_BIT,		/* ... UDP payload GSO (not UFO) */
+	NETIF_F_GSO_FRAGLIST_BIT,		/* ... Fraglist GSO */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_UDP_L4_BIT,
+		NETIF_F_GSO_FRAGLIST_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
@@ -80,6 +81,7 @@ enum {
 
 	NETIF_F_GRO_HW_BIT,		/* Hardware Generic receive offload */
 	NETIF_F_HW_TLS_RECORD_BIT,	/* Offload TLS record */
+	NETIF_F_GRO_FRAGLIST_BIT,	/* Fraglist GRO */
 
 	/*
 	 * Add your fresh new feature above and remember to update
@@ -150,6 +152,8 @@ enum {
 #define NETIF_F_GSO_UDP_L4	__NETIF_F(GSO_UDP_L4)
 #define NETIF_F_HW_TLS_TX	__NETIF_F(HW_TLS_TX)
 #define NETIF_F_HW_TLS_RX	__NETIF_F(HW_TLS_RX)
+#define NETIF_F_GRO_FRAGLIST	__NETIF_F(GRO_FRAGLIST)
+#define NETIF_F_GSO_FRAGLIST	__NETIF_F(GSO_FRAGLIST)
 
 /* Finds the next feature with the highest number of the range of start till 0.
  */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 30745068fb39..1eddf5db0fb6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4553,6 +4553,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e9133bcf0544..82108f6bc198 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -592,6 +592,8 @@ enum {
 	SKB_GSO_UDP = 1 << 16,
 
 	SKB_GSO_UDP_L4 = 1 << 17,
+
+	SKB_GSO_FRAGLIST = 1 << 18,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 0a8728565356..041f9ba89caa 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -59,6 +59,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
 	[NETIF_F_HW_TLS_RECORD_BIT] =	"tls-hw-record",
 	[NETIF_F_HW_TLS_TX_BIT] =	 "tls-hw-tx-offload",
 	[NETIF_F_HW_TLS_RX_BIT] =	 "tls-hw-rx-offload",
+	[NETIF_F_GRO_FRAGLIST_BIT] =	 "rx-gro-list",
 };
 
 const char
-- 
2.17.1



* [PATCH net-next 2/4] net: Add a netdev software feature set that defaults to off.
  2019-12-18 13:34 [PATCH net-next 0/4] Support fraglist GRO/GSO Steffen Klassert
  2019-12-18 13:34 ` [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags Steffen Klassert
@ 2019-12-18 13:34 ` Steffen Klassert
  2019-12-18 16:02   ` Willem de Bruijn
  2019-12-18 13:34 ` [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining Steffen Klassert
  2019-12-18 13:34 ` [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO Steffen Klassert
  3 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-12-18 13:34 UTC (permalink / raw)
  To: David Miller
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner, netdev

The previous patch added the NETIF_F_GRO_FRAGLIST feature.
This is a software feature that should default to off.
Current software features default to on, so add a new
feature set that defaults to off.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdev_features.h | 3 +++
 net/core/dev.c                  | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index b239507da2a0..34d050bb1ae6 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -230,6 +230,9 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start)
 /* changeable features with no special hardware requirements */
 #define NETIF_F_SOFT_FEATURES	(NETIF_F_GSO | NETIF_F_GRO)
 
+/* Changeable features with no special hardware requirements that defaults to off. */
+#define NETIF_F_SOFT_FEATURES_OFF	NETIF_F_GRO_FRAGLIST
+
 #define NETIF_F_VLAN_FEATURES	(NETIF_F_HW_VLAN_CTAG_FILTER | \
 				 NETIF_F_HW_VLAN_CTAG_RX | \
 				 NETIF_F_HW_VLAN_CTAG_TX | \
diff --git a/net/core/dev.c b/net/core/dev.c
index 2c277b8aba38..092bdbcfb9f8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9269,7 +9269,7 @@ int register_netdevice(struct net_device *dev)
 	/* Transfer changeable features to wanted_features and enable
 	 * software offloads (GSO and GRO).
 	 */
-	dev->hw_features |= NETIF_F_SOFT_FEATURES;
+	dev->hw_features |= (NETIF_F_SOFT_FEATURES | NETIF_F_SOFT_FEATURES_OFF);
 	dev->features |= NETIF_F_SOFT_FEATURES;
 
 	if (dev->netdev_ops->ndo_udp_tunnel_add) {
-- 
2.17.1



* [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2019-12-18 13:34 [PATCH net-next 0/4] Support fraglist GRO/GSO Steffen Klassert
  2019-12-18 13:34 ` [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags Steffen Klassert
  2019-12-18 13:34 ` [PATCH net-next 2/4] net: Add a netdev software feature set that defaults to off Steffen Klassert
@ 2019-12-18 13:34 ` Steffen Klassert
  2019-12-18 16:02   ` Willem de Bruijn
  2019-12-18 13:34 ` [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO Steffen Klassert
  3 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-12-18 13:34 UTC (permalink / raw)
  To: David Miller
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner, netdev

This patch adds the core functions to chain/unchain
GSO skbs at the frag_list pointer. It also adds
a new GSO type, SKB_GSO_FRAGLIST, and an is_flist
flag to napi_gro_cb which indicates that this
flow will be GROed by fraglist chaining.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/linux/netdevice.h |   4 +-
 include/linux/skbuff.h    |   2 +
 net/core/dev.c            |   2 +-
 net/core/skbuff.c         | 106 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 112 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1eddf5db0fb6..f854c75c6e5d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2320,7 +2320,8 @@ struct napi_gro_cb {
 	/* Number of gro_receive callbacks this packet already went through */
 	u8 recursion_counter:4;
 
-	/* 1 bit hole */
+	/* GRO is done by frag_list pointer chaining. */
+	u8	is_flist:1;
 
 	/* used to support CHECKSUM_COMPLETE for tunneling protocols */
 	__wsum	csum;
@@ -2688,6 +2689,7 @@ struct net_device *dev_get_by_napi_id(unsigned int napi_id);
 int netdev_get_name(struct net *net, char *name, int ifindex);
 int dev_restart(struct net_device *dev);
 int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb);
 
 static inline unsigned int skb_gro_offset(const struct sk_buff *skb)
 {
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 82108f6bc198..213a8c7723ea 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3525,6 +3525,8 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet);
 bool skb_gso_validate_network_len(const struct sk_buff *skb, unsigned int mtu);
 bool skb_gso_validate_mac_len(const struct sk_buff *skb, unsigned int len);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
+struct sk_buff *skb_segment_list(struct sk_buff *skb, netdev_features_t features,
+				 unsigned int offset);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
 int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci);
diff --git a/net/core/dev.c b/net/core/dev.c
index 092bdbcfb9f8..139d6d5a7897 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3249,7 +3249,7 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 
 	segs = skb_mac_gso_segment(skb, features);
 
-	if (unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs)))
+	if (segs != skb && unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs)))
 		skb_warn_bad_offload(skb);
 
 	return segs;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 973a71f4bc89..885a5f76b6ee 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3638,6 +3638,112 @@ static inline skb_frag_t skb_head_frag_to_page_desc(struct sk_buff *frag_skb)
 	return head_frag;
 }
 
+struct sk_buff *skb_segment_list(struct sk_buff *skb,
+				 netdev_features_t features,
+				 unsigned int offset)
+{
+	struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
+	unsigned int tnl_hlen = skb_tnl_header_len(skb);
+	unsigned int delta_truesize = 0;
+	unsigned int delta_len = 0;
+	struct sk_buff *tail = NULL;
+	struct sk_buff *nskb;
+
+	skb_push(skb, -skb_network_offset(skb) + offset);
+
+	skb_shinfo(skb)->frag_list = NULL;
+
+	do {
+		nskb = list_skb;
+		list_skb = list_skb->next;
+
+		if (!tail)
+			skb->next = nskb;
+		else
+			tail->next = nskb;
+
+		tail = nskb;
+
+		delta_len += nskb->len;
+		delta_truesize += nskb->truesize;
+
+		skb_push(nskb, -skb_network_offset(nskb) + offset);
+
+		if (!secpath_exists(nskb))
+			__skb_ext_copy(nskb, skb);
+
+		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
+
+		nskb->tstamp = skb->tstamp;
+		nskb->dev = skb->dev;
+		nskb->queue_mapping = skb->queue_mapping;
+
+		nskb->mac_len = skb->mac_len;
+		nskb->mac_header = skb->mac_header;
+		nskb->transport_header = skb->transport_header;
+		nskb->network_header = skb->network_header;
+		skb_dst_copy(nskb, skb);
+
+		skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
+		skb_copy_from_linear_data_offset(skb, -tnl_hlen,
+						 nskb->data - tnl_hlen,
+						 offset + tnl_hlen);
+
+		if (skb_needs_linearize(nskb, features) &&
+		    __skb_linearize(nskb))
+			goto err_linearize;
+
+	} while (list_skb);
+
+	skb->truesize = skb->truesize - delta_truesize;
+	skb->data_len = skb->data_len - delta_len;
+	skb->len = skb->len - delta_len;
+	skb->ip_summed = nskb->ip_summed;
+	skb->csum_level = nskb->csum_level;
+
+	skb_gso_reset(skb);
+
+	skb->prev = tail;
+
+	if (skb_needs_linearize(skb, features) &&
+	    __skb_linearize(skb))
+		goto err_linearize;
+
+	skb_get(skb);
+
+	return skb;
+
+err_linearize:
+	kfree_skb_list(skb->next);
+	skb->next = NULL;
+	return ERR_PTR(-ENOMEM);
+}
+EXPORT_SYMBOL_GPL(skb_segment_list);
+
+int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
+{
+	if (unlikely(p->len + skb->len >= 65536))
+		return -E2BIG;
+
+	if (NAPI_GRO_CB(p)->last == p)
+		skb_shinfo(p)->frag_list = skb;
+	else
+		NAPI_GRO_CB(p)->last->next = skb;
+
+	skb_pull(skb, skb_gro_offset(skb));
+
+	NAPI_GRO_CB(p)->last = skb;
+	NAPI_GRO_CB(p)->count++;
+	p->data_len += skb->len;
+	p->truesize += skb->truesize;
+	p->len += skb->len;
+
+	NAPI_GRO_CB(skb)->same_flow = 1;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(skb_gro_receive_list);
+
 /**
  *	skb_segment - Perform protocol segmentation on skb.
  *	@head_skb: buffer to segment
-- 
2.17.1



* [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO.
  2019-12-18 13:34 [PATCH net-next 0/4] Support fraglist GRO/GSO Steffen Klassert
                   ` (2 preceding siblings ...)
  2019-12-18 13:34 ` [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining Steffen Klassert
@ 2019-12-18 13:34 ` Steffen Klassert
  2019-12-18 16:03   ` Willem de Bruijn
  3 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-12-18 13:34 UTC (permalink / raw)
  To: David Miller
  Cc: Steffen Klassert, Willem de Bruijn, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner, netdev

This patch extends UDP GRO to support fraglist GRO/GSO
by using the previously introduced infrastructure.
If the feature is enabled, all UDP packets go through
fraglist GRO (local input and forwarding).

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/udp.h      |  2 +-
 net/ipv4/udp_offload.c | 99 ++++++++++++++++++++++++++++++++----------
 net/ipv6/udp_offload.c | 19 +++++++-
 3 files changed, 94 insertions(+), 26 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index bad74f780831..44e0e52b585c 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -167,7 +167,7 @@ typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
 				     __be16 dport);
 
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
-				struct udphdr *uh, udp_lookup_t lookup);
+				struct udphdr *uh, struct sock *sk);
 int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup);
 
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index a3908e55ed89..03c67d37a5e5 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -184,6 +184,20 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_udp_tunnel_segment);
 
+static struct sk_buff *__udp_gso_segment_list(struct sk_buff *skb,
+					      netdev_features_t features)
+{
+	unsigned int mss = skb_shinfo(skb)->gso_size;
+
+	skb = skb_segment_list(skb, features, skb_mac_header_len(skb));
+	if (IS_ERR(skb))
+		return skb;
+
+	udp_hdr(skb)->len = htons(sizeof(struct udphdr) + mss);
+
+	return skb;
+}
+
 struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 				  netdev_features_t features)
 {
@@ -196,6 +210,9 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	__sum16 check;
 	__be16 newlen;
 
+	if (skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST)
+		return __udp_gso_segment_list(gso_skb, features);
+
 	mss = skb_shinfo(gso_skb)->gso_size;
 	if (gso_skb->len <= sizeof(*uh) + mss)
 		return ERR_PTR(-EINVAL);
@@ -354,6 +371,7 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	struct udphdr *uh2;
 	struct sk_buff *p;
 	unsigned int ulen;
+	int ret = 0;
 
 	/* requires non zero csum, for symmetry with GSO */
 	if (!uh->check) {
@@ -369,7 +387,6 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	}
 	/* pull encapsulating udp header */
 	skb_gro_pull(skb, sizeof(struct udphdr));
-	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
 
 	list_for_each_entry(p, head, list) {
 		if (!NAPI_GRO_CB(p)->same_flow)
@@ -383,14 +400,40 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 			continue;
 		}
 
+		if (NAPI_GRO_CB(skb)->is_flist != NAPI_GRO_CB(p)->is_flist) {
+			NAPI_GRO_CB(skb)->flush = 1;
+			return p;
+		}
+
 		/* Terminate the flow on len mismatch or if it grow "too much".
 		 * Under small packet flood GRO count could elsewhere grow a lot
 		 * leading to excessive truesize values.
 		 * On len mismatch merge the first packet shorter than gso_size,
 		 * otherwise complete the GRO packet.
 		 */
-		if (ulen > ntohs(uh2->len) || skb_gro_receive(p, skb) ||
-		    ulen != ntohs(uh2->len) ||
+		if (ulen > ntohs(uh2->len)) {
+			pp = p;
+		} else {
+			if (NAPI_GRO_CB(skb)->is_flist) {
+				if (!pskb_may_pull(skb, skb_gro_offset(skb))) {
+					NAPI_GRO_CB(skb)->flush = 1;
+					return NULL;
+				}
+				if ((skb->ip_summed != p->ip_summed) ||
+				    (skb->csum_level != p->csum_level)) {
+					NAPI_GRO_CB(skb)->flush = 1;
+					return NULL;
+				}
+				ret = skb_gro_receive_list(p, skb);
+			} else {
+				skb_gro_postpull_rcsum(skb, uh,
+						       sizeof(struct udphdr));
+
+				ret = skb_gro_receive(p, skb);
+			}
+		}
+
+		if (ret || ulen != ntohs(uh2->len) ||
 		    NAPI_GRO_CB(p)->count >= UDP_GRO_CNT_MAX)
 			pp = p;
 
@@ -401,36 +444,29 @@ static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 	return NULL;
 }
 
-INDIRECT_CALLABLE_DECLARE(struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
-						   __be16 sport, __be16 dport));
 struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
-				struct udphdr *uh, udp_lookup_t lookup)
+				struct udphdr *uh, struct sock *sk)
 {
 	struct sk_buff *pp = NULL;
 	struct sk_buff *p;
 	struct udphdr *uh2;
 	unsigned int off = skb_gro_offset(skb);
 	int flush = 1;
-	struct sock *sk;
 
-	rcu_read_lock();
-	sk = INDIRECT_CALL_INET(lookup, udp6_lib_lookup_skb,
-				udp4_lib_lookup_skb, skb, uh->source, uh->dest);
-	if (!sk)
-		goto out_unlock;
+	if (skb->dev->features & NETIF_F_GRO_FRAGLIST)
+		NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled: 1;
 
-	if (udp_sk(sk)->gro_enabled) {
+	if ((sk && udp_sk(sk)->gro_enabled) ||  NAPI_GRO_CB(skb)->is_flist) {
 		pp = call_gro_receive(udp_gro_receive_segment, head, skb);
-		rcu_read_unlock();
 		return pp;
 	}
 
-	if (NAPI_GRO_CB(skb)->encap_mark ||
+	if (!sk || NAPI_GRO_CB(skb)->encap_mark ||
 	    (skb->ip_summed != CHECKSUM_PARTIAL &&
 	     NAPI_GRO_CB(skb)->csum_cnt == 0 &&
 	     !NAPI_GRO_CB(skb)->csum_valid) ||
 	    !udp_sk(sk)->gro_receive)
-		goto out_unlock;
+		goto out;
 
 	/* mark that this skb passed once through the tunnel gro layer */
 	NAPI_GRO_CB(skb)->encap_mark = 1;
@@ -457,8 +493,7 @@ struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	skb_gro_postpull_rcsum(skb, uh, sizeof(struct udphdr));
 	pp = call_gro_receive_sk(udp_sk(sk)->gro_receive, sk, head, skb);
 
-out_unlock:
-	rcu_read_unlock();
+out:
 	skb_gro_flush_final(skb, pp, flush);
 	return pp;
 }
@@ -468,8 +503,10 @@ INDIRECT_CALLABLE_SCOPE
 struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
+	struct sk_buff *pp;
+	struct sock *sk;
 
-	if (unlikely(!uh) || !static_branch_unlikely(&udp_encap_needed_key))
+	if (unlikely(!uh))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */
@@ -484,7 +521,11 @@ struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
 					     inet_gro_compute_pseudo);
 skip:
 	NAPI_GRO_CB(skb)->is_ipv6 = 0;
-	return udp_gro_receive(head, skb, uh, udp4_lib_lookup_skb);
+	rcu_read_lock();
+	sk = static_branch_unlikely(&udp_encap_needed_key) ? udp4_lib_lookup_skb(skb, uh->source, uh->dest) : NULL;
+	pp = udp_gro_receive(head, skb, uh, sk);
+	rcu_read_unlock();
+	return pp;
 
 flush:
 	NAPI_GRO_CB(skb)->flush = 1;
@@ -517,9 +558,7 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 	rcu_read_lock();
 	sk = INDIRECT_CALL_INET(lookup, udp6_lib_lookup_skb,
 				udp4_lib_lookup_skb, skb, uh->source, uh->dest);
-	if (sk && udp_sk(sk)->gro_enabled) {
-		err = udp_gro_complete_segment(skb);
-	} else if (sk && udp_sk(sk)->gro_complete) {
+	if (sk && udp_sk(sk)->gro_complete) {
 		skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
 					: SKB_GSO_UDP_TUNNEL;
 
@@ -529,6 +568,8 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		skb->encapsulation = 1;
 		err = udp_sk(sk)->gro_complete(sk, skb,
 				nhoff + sizeof(struct udphdr));
+	} else {
+		err = udp_gro_complete_segment(skb);
 	}
 	rcu_read_unlock();
 
@@ -544,6 +585,18 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
+	if (NAPI_GRO_CB(skb)->is_flist) {
+		uh->len = htons(skb->len - nhoff);
+
+		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
+		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+		skb->ip_summed = CHECKSUM_UNNECESSARY;
+		skb->csum_level = ~0;
+
+		return 0;
+	}
+
 	if (uh->check)
 		uh->check = ~udp_v4_check(skb->len - nhoff, iph->saddr,
 					  iph->daddr, 0);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 64b8f05d6735..8836f2b69ef3 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -115,8 +115,10 @@ INDIRECT_CALLABLE_SCOPE
 struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 {
 	struct udphdr *uh = udp_gro_udphdr(skb);
+	struct sk_buff *pp;
+	struct sock *sk;
 
-	if (unlikely(!uh) || !static_branch_unlikely(&udpv6_encap_needed_key))
+	if (unlikely(!uh))
 		goto flush;
 
 	/* Don't bother verifying checksum if we're going to flush anyway. */
@@ -132,7 +134,11 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
 
 skip:
 	NAPI_GRO_CB(skb)->is_ipv6 = 1;
-	return udp_gro_receive(head, skb, uh, udp6_lib_lookup_skb);
+	rcu_read_lock();
+	sk = static_branch_unlikely(&udpv6_encap_needed_key) ? udp6_lib_lookup_skb(skb, uh->source, uh->dest) : NULL;
+	pp = udp_gro_receive(head, skb, uh, sk);
+	rcu_read_unlock();
+	return pp;
 
 flush:
 	NAPI_GRO_CB(skb)->flush = 1;
@@ -144,6 +150,15 @@ INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
+	if (NAPI_GRO_CB(skb)->is_flist) {
+		uh->len = htons(skb->len - nhoff);
+
+		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
+		skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
+
+		return 0;
+	}
+
 	if (uh->check)
 		uh->check = ~udp_v6_check(skb->len - nhoff, &ipv6h->saddr,
 					  &ipv6h->daddr, 0);
-- 
2.17.1



* Re: [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags
  2019-12-18 13:34 ` [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags Steffen Klassert
@ 2019-12-18 16:02   ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2019-12-18 16:02 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> This adds new fraglist GRO/GSO feature flags. They will be used
> to configure fraglist GRO/GSO, which will be implemented in some
> followup patches.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>


* Re: [PATCH net-next 2/4] net: Add a netdev software feature set that defaults to off.
  2019-12-18 13:34 ` [PATCH net-next 2/4] net: Add a netdev software feature set that defaults to off Steffen Klassert
@ 2019-12-18 16:02   ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2019-12-18 16:02 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> The previous patch added the NETIF_F_GRO_FRAGLIST feature.
> This is a software feature that should default to off.
> Current software features default to on, so add a new
> feature set that defaults to off.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>


* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2019-12-18 13:34 ` [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining Steffen Klassert
@ 2019-12-18 16:02   ` Willem de Bruijn
  2019-12-19  8:22     ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2019-12-18 16:02 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> This patch adds the core functions to chain/unchain
> GSO skbs at the frag_list pointer. It also adds
> a new GSO type, SKB_GSO_FRAGLIST, and an is_flist
> flag to napi_gro_cb which indicates that this
> flow will be GROed by fraglist chaining.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

> +struct sk_buff *skb_segment_list(struct sk_buff *skb,
> +                                netdev_features_t features,
> +                                unsigned int offset)
> +{
> +       struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
> +       unsigned int tnl_hlen = skb_tnl_header_len(skb);
> +       unsigned int delta_truesize = 0;
> +       unsigned int delta_len = 0;
> +       struct sk_buff *tail = NULL;
> +       struct sk_buff *nskb;
> +
> +       skb_push(skb, -skb_network_offset(skb) + offset);
> +
> +       skb_shinfo(skb)->frag_list = NULL;
> +
> +       do {
> +               nskb = list_skb;
> +               list_skb = list_skb->next;
> +
> +               if (!tail)
> +                       skb->next = nskb;
> +               else
> +                       tail->next = nskb;
> +
> +               tail = nskb;
> +
> +               delta_len += nskb->len;
> +               delta_truesize += nskb->truesize;
> +
> +               skb_push(nskb, -skb_network_offset(nskb) + offset);
> +
> +               if (!secpath_exists(nskb))
> +                       __skb_ext_copy(nskb, skb);

Of all the possible extensions, why is this only relevant to secpath?

More in general, this function open codes a variety of skb fields that
carry over from skb to nskb. How did you select this subset of fields?

> +
> +               memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
> +
> +               nskb->tstamp = skb->tstamp;
> +               nskb->dev = skb->dev;
> +               nskb->queue_mapping = skb->queue_mapping;
> +
> +               nskb->mac_len = skb->mac_len;
> +               nskb->mac_header = skb->mac_header;
> +               nskb->transport_header = skb->transport_header;
> +               nskb->network_header = skb->network_header;
> +               skb_dst_copy(nskb, skb);
> +
> +               skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
> +               skb_copy_from_linear_data_offset(skb, -tnl_hlen,
> +                                                nskb->data - tnl_hlen,
> +                                                offset + tnl_hlen);
> +
> +               if (skb_needs_linearize(nskb, features) &&
> +                   __skb_linearize(nskb))
> +                       goto err_linearize;
> +
> +       } while (list_skb);
> +
> +       skb->truesize = skb->truesize - delta_truesize;
> +       skb->data_len = skb->data_len - delta_len;
> +       skb->len = skb->len - delta_len;
> +       skb->ip_summed = nskb->ip_summed;
> +       skb->csum_level = nskb->csum_level;

This changed from the previous version, where nskb inherited ip_summed
and csum_level from skb. Why is that?


* Re: [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO.
  2019-12-18 13:34 ` [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO Steffen Klassert
@ 2019-12-18 16:03   ` Willem de Bruijn
  2019-12-19  8:26     ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2019-12-18 16:03 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> This patch extends UDP GRO to support fraglist GRO/GSO
> by using the previously introduced infrastructure.
> If the feature is enabled, all UDP packets are going to
> fraglist GRO (local input and forward).
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

> -INDIRECT_CALLABLE_DECLARE(struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
> -                                                  __be16 sport, __be16 dport));
>  struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb,
> -                               struct udphdr *uh, udp_lookup_t lookup)
> +                               struct udphdr *uh, struct sock *sk)
>  {
>         struct sk_buff *pp = NULL;
>         struct sk_buff *p;
>         struct udphdr *uh2;
>         unsigned int off = skb_gro_offset(skb);
>         int flush = 1;
> -       struct sock *sk;
>
> -       rcu_read_lock();
> -       sk = INDIRECT_CALL_INET(lookup, udp6_lib_lookup_skb,
> -                               udp4_lib_lookup_skb, skb, uh->source, uh->dest);
> -       if (!sk)
> -               goto out_unlock;
> +       if (skb->dev->features & NETIF_F_GRO_FRAGLIST)
> +               NAPI_GRO_CB(skb)->is_flist = sk ? !udp_sk(sk)->gro_enabled: 1;
>
> -       if (udp_sk(sk)->gro_enabled) {
> +       if ((sk && udp_sk(sk)->gro_enabled) ||  NAPI_GRO_CB(skb)->is_flist) {

nit: two spaces before NAPI_GRO_CB

> @@ -544,6 +585,18 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
>         const struct iphdr *iph = ip_hdr(skb);
>         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>
> +       if (NAPI_GRO_CB(skb)->is_flist) {
> +               uh->len = htons(skb->len - nhoff);
> +
> +               skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> +               skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
> +
> +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> +               skb->csum_level = ~0;

why is this needed for ipv4 only?


> @@ -144,6 +150,15 @@ INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
>         const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
>         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>
> +       if (NAPI_GRO_CB(skb)->is_flist) {
> +               uh->len = htons(skb->len - nhoff);
> +
> +               skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> +               skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
> +
> +               return 0;
> +       }

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2019-12-18 16:02   ` Willem de Bruijn
@ 2019-12-19  8:22     ` Steffen Klassert
  2019-12-19 16:28       ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2019-12-19  8:22 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Dec 18, 2019 at 11:02:39AM -0500, Willem de Bruijn wrote:
> On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> 
> > +struct sk_buff *skb_segment_list(struct sk_buff *skb,
> > +                                netdev_features_t features,
> > +                                unsigned int offset)
> > +{
> > +       struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
> > +       unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > +       unsigned int delta_truesize = 0;
> > +       unsigned int delta_len = 0;
> > +       struct sk_buff *tail = NULL;
> > +       struct sk_buff *nskb;
> > +
> > +       skb_push(skb, -skb_network_offset(skb) + offset);
> > +
> > +       skb_shinfo(skb)->frag_list = NULL;
> > +
> > +       do {
> > +               nskb = list_skb;
> > +               list_skb = list_skb->next;
> > +
> > +               if (!tail)
> > +                       skb->next = nskb;
> > +               else
> > +                       tail->next = nskb;
> > +
> > +               tail = nskb;
> > +
> > +               delta_len += nskb->len;
> > +               delta_truesize += nskb->truesize;
> > +
> > +               skb_push(nskb, -skb_network_offset(nskb) + offset);
> > +
> > +               if (!secpath_exists(nskb))
> > +                       __skb_ext_copy(nskb, skb);
> 
> Of all the possible extensions, why is this only relevant to secpath?

I wrote this before we had these extensions and adjusted it
when the extensions were introduced to make it compile again.
I think we can just copy the extensions unconditionally.

> 
> More generally, this function open codes a variety of skb fields that
> carry over from skb to nskb. How did you select this subset of fields?

I tried to find the subset of __copy_skb_header() that needs to be copied.
Some fields of nskb are still valid, and some (csum related) fields
should not be copied from skb to nskb.

> 
> > +
> > +               memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
> > +
> > +               nskb->tstamp = skb->tstamp;
> > +               nskb->dev = skb->dev;
> > +               nskb->queue_mapping = skb->queue_mapping;
> > +
> > +               nskb->mac_len = skb->mac_len;
> > +               nskb->mac_header = skb->mac_header;
> > +               nskb->transport_header = skb->transport_header;
> > +               nskb->network_header = skb->network_header;
> > +               skb_dst_copy(nskb, skb);
> > +
> > +               skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
> > +               skb_copy_from_linear_data_offset(skb, -tnl_hlen,
> > +                                                nskb->data - tnl_hlen,
> > +                                                offset + tnl_hlen);
> > +
> > +               if (skb_needs_linearize(nskb, features) &&
> > +                   __skb_linearize(nskb))
> > +                       goto err_linearize;
> > +
> > +       } while (list_skb);
> > +
> > +       skb->truesize = skb->truesize - delta_truesize;
> > +       skb->data_len = skb->data_len - delta_len;
> > +       skb->len = skb->len - delta_len;
> > +       skb->ip_summed = nskb->ip_summed;
> > +       skb->csum_level = nskb->csum_level;
> 
> This changed from the previous version, where nskb inherited ip_summed
> and csum_level from skb. Why is that?

I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
make sure that no one touches the checksum of the head
skb. Otherwise netfilter etc. tries to touch the csum.

Before chaining I make sure that ip_summed and csum_level are
the same for all chained skbs, and here I restore the original
values from nskb.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO.
  2019-12-18 16:03   ` Willem de Bruijn
@ 2019-12-19  8:26     ` Steffen Klassert
  0 siblings, 0 replies; 18+ messages in thread
From: Steffen Klassert @ 2019-12-19  8:26 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Dec 18, 2019 at 11:03:09AM -0500, Willem de Bruijn wrote:
> On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> 
> > @@ -544,6 +585,18 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
> >         const struct iphdr *iph = ip_hdr(skb);
> >         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> >
> > +       if (NAPI_GRO_CB(skb)->is_flist) {
> > +               uh->len = htons(skb->len - nhoff);
> > +
> > +               skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> > +               skb_shinfo(skb)->gso_segs = NAPI_GRO_CB(skb)->count;
> > +
> > +               skb->ip_summed = CHECKSUM_UNNECESSARY;
> > +               skb->csum_level = ~0;
> 
> why is this needed for ipv4 only?

It is needed for IPv6 too; I just forgot to add it there.

Thanks for the review!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2019-12-19  8:22     ` Steffen Klassert
@ 2019-12-19 16:28       ` Willem de Bruijn
  2020-01-13  8:51         ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2019-12-19 16:28 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Thu, Dec 19, 2019 at 3:22 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> On Wed, Dec 18, 2019 at 11:02:39AM -0500, Willem de Bruijn wrote:
> > On Wed, Dec 18, 2019 at 8:35 AM Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> >
> > > +struct sk_buff *skb_segment_list(struct sk_buff *skb,
> > > +                                netdev_features_t features,
> > > +                                unsigned int offset)
> > > +{
> > > +       struct sk_buff *list_skb = skb_shinfo(skb)->frag_list;
> > > +       unsigned int tnl_hlen = skb_tnl_header_len(skb);
> > > +       unsigned int delta_truesize = 0;
> > > +       unsigned int delta_len = 0;
> > > +       struct sk_buff *tail = NULL;
> > > +       struct sk_buff *nskb;
> > > +
> > > +       skb_push(skb, -skb_network_offset(skb) + offset);
> > > +
> > > +       skb_shinfo(skb)->frag_list = NULL;
> > > +
> > > +       do {
> > > +               nskb = list_skb;
> > > +               list_skb = list_skb->next;
> > > +
> > > +               if (!tail)
> > > +                       skb->next = nskb;
> > > +               else
> > > +                       tail->next = nskb;
> > > +
> > > +               tail = nskb;
> > > +
> > > +               delta_len += nskb->len;
> > > +               delta_truesize += nskb->truesize;
> > > +
> > > +               skb_push(nskb, -skb_network_offset(nskb) + offset);
> > > +
> > > +               if (!secpath_exists(nskb))
> > > +                       __skb_ext_copy(nskb, skb);
> >
> > Of all the possible extensions, why is this only relevant to secpath?
>
> I wrote this before we had these extensions and adjusted it
> when the extensions were introduced to make it compile again.
> I think we can just copy the extensions unconditionally.
>
> >
> > More generally, this function open codes a variety of skb fields that
> > carry over from skb to nskb. How did you select this subset of fields?
>
> I tried to find the subset of __copy_skb_header() that needs to be copied.
> Some fields of nskb are still valid, and some (csum related) fields
> should not be copied from skb to nskb.

Duplicating that code is kind of fragile wrt new fields being added to
skbs later (such as the recent skb_ext example).

Perhaps we can split __copy_skb_header further and call the
inner part directly.

Or, at least follow the order of operations exactly and add a comment
that this code was taken verbatim from __copy_skb_header.

> > > +
> > > +               memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
> > > +
> > > +               nskb->tstamp = skb->tstamp;
> > > +               nskb->dev = skb->dev;
> > > +               nskb->queue_mapping = skb->queue_mapping;
> > > +
> > > +               nskb->mac_len = skb->mac_len;
> > > +               nskb->mac_header = skb->mac_header;
> > > +               nskb->transport_header = skb->transport_header;
> > > +               nskb->network_header = skb->network_header;
> > > +               skb_dst_copy(nskb, skb);
> > > +
> > > +               skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
> > > +               skb_copy_from_linear_data_offset(skb, -tnl_hlen,
> > > +                                                nskb->data - tnl_hlen,
> > > +                                                offset + tnl_hlen);
> > > +
> > > +               if (skb_needs_linearize(nskb, features) &&
> > > +                   __skb_linearize(nskb))
> > > +                       goto err_linearize;
> > > +
> > > +       } while (list_skb);
> > > +
> > > +       skb->truesize = skb->truesize - delta_truesize;
> > > +       skb->data_len = skb->data_len - delta_len;
> > > +       skb->len = skb->len - delta_len;
> > > +       skb->ip_summed = nskb->ip_summed;
> > > +       skb->csum_level = nskb->csum_level;
> >
> > This changed from the previous version, where nskb inherited ip_summed
> > and csum_level from skb. Why is that?
>
> I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> make sure that no one touches the checksum of the head
> skb. Otherwise netfilter etc. tries to touch the csum.
>
> Before chaining I make sure that ip_summed and csum_level are
> the same for all chained skbs, and here I restore the original
> values from nskb.

This is safe because the skb_gro_checksum_validate will have validated
already on CHECKSUM_PARTIAL? What happens if there is decap or encap
in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
imagine.

Either way, would you mind briefly documenting the checksum behavior
in the commit message? It's not trivial and I doubt I'll recall the
details in six months.

Really about patch 4: that squashed in a lot of non-trivial scaffolding
from previous patch 'UDP: enable GRO by default'. Does it make sense
to keep that in a separate patch? That should be a noop, which we can
verify. And it makes patch 4 easier to reason about on its own, too.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2019-12-19 16:28       ` Willem de Bruijn
@ 2020-01-13  8:51         ` Steffen Klassert
  2020-01-13 16:21           ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2020-01-13  8:51 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Thu, Dec 19, 2019 at 11:28:34AM -0500, Willem de Bruijn wrote:
> On Thu, Dec 19, 2019 at 3:22 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> >
> > I tried to find the subset of __copy_skb_header() that needs to be copied.
> > Some fields of nskb are still valid, and some (csum related) fields
> > should not be copied from skb to nskb.
> 
> Duplicating that code is kind of fragile wrt new fields being added to
> skbs later (such as the recent skb_ext example).
> 
> Perhaps we can split __copy_skb_header further and call the
> inner part directly.

I already thought about that, but __copy_skb_header does a
memcpy over all the header fields. If we split this, we
would need to change the memcpy to direct assignments.

Maybe we can be conservative here and do a full
__copy_skb_header for now. The initial version
does not necessarily need to be the most performant
version. We could try to identify the correct subset
of header fields later then.

> >
> > I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> > make sure that no one touches the checksum of the head
> > skb. Otherwise netfilter etc. tries to touch the csum.
> >
> > Before chaining I make sure that ip_summed and csum_level are
> > the same for all chained skbs, and here I restore the original
> > values from nskb.
> 
> This is safe because the skb_gro_checksum_validate will have validated
> already on CHECKSUM_PARTIAL? What happens if there is decap or encap
> in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
> imagine.

Yes, the checksum is validated with skb_gro_checksum_validate. If the
packets are UDP encapsulated, they are segmented before decapsulation.
Original values are already restored. If an additional encapsulation
happens, the encap checksum will be calculated after segmentation.
Original values are restored before that.

> 
> Either way, would you mind briefly documenting the checksum behavior
> in the commit message? It's not trivial and I doubt I'll recall the
> details in six months.

Yes, can do this.

> Really about patch 4: that squashed in a lot of non-trivial scaffolding
> from previous patch 'UDP: enable GRO by default'. Does it make sense
> to keep that in a separate patch? That should be a noop, which we can
> verify. And it makes patch 4 easier to reason about on its own, too.

Patch 4 is not that big, so I'm not sure that really makes sense.
But I can split out a preparation patch if that is preferred.

Anyway, I'll likely do another RFC version because we are already
late in the development cycle.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2020-01-13  8:51         ` Steffen Klassert
@ 2020-01-13 16:21           ` Willem de Bruijn
  2020-01-15  9:47             ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2020-01-13 16:21 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Willem de Bruijn, David Miller, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner,
	Network Development

On Mon, Jan 13, 2020 at 3:51 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> On Thu, Dec 19, 2019 at 11:28:34AM -0500, Willem de Bruijn wrote:
> > On Thu, Dec 19, 2019 at 3:22 AM Steffen Klassert
> > <steffen.klassert@secunet.com> wrote:
> > >
> > > I tried to find the subset of __copy_skb_header() that needs to be copied.
> > > Some fields of nskb are still valid, and some (csum related) fields
> > > should not be copied from skb to nskb.
> >
> > Duplicating that code is kind of fragile wrt new fields being added to
> > skbs later (such as the recent skb_ext example).
> >
> > Perhaps we can split __copy_skb_header further and call the
> > inner part directly.
>
> I already thought about that, but __copy_skb_header does a
> memcpy over all the header fields. If we split this, we
> would need to change the memcpy to direct assignments.

Okay, if any of those fields should not be overwritten in this case,
then that's not an option. That memcpy is probably a lot more
efficient than all the direct assignments.

> Maybe we can be conservative here and do a full
> __copy_skb_header for now. The initial version
> does not necessarily need to be the most performant
> version. We could try to identify the correct subset
> of header fields later then.

We should probably aim for the right set from the start. If you think
this set is it, let's keep it.

Could you add an explicit comment that this is a subset of
__copy_skb_header. That might help remind us of this partial duplicate
whenever updating that function.

> > > I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> > > make sure that no one touches the checksum of the head
> > > skb. Otherwise netfilter etc. tries to touch the csum.
> > >
> > > Before chaining I make sure that ip_summed and csum_level are
> > > the same for all chained skbs, and here I restore the original
> > > values from nskb.
> >
> > This is safe because the skb_gro_checksum_validate will have validated
> > already on CHECKSUM_PARTIAL? What happens if there is decap or encap
> > in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
> > imagine.
>
> Yes, the checksum is validated with skb_gro_checksum_validate. If the
> packets are UDP encapsulated, they are segmented before decapsulation.
> Original values are already restored. If an additional encapsulation
> happens, the encap checksum will be calculated after segmentation.
> Original values are restored before that.

I was wondering more about additional other encapsulation protocols.

From a quick read, it seems like csum_level is associated only with
CHECKSUM_UNNECESSARY.

What if a device returns CHECKSUM_COMPLETE for packets with a tunnel
that is decapsulated before forwarding. Say, just VLAN. That gets
untagged in __netif_receive_skb_core with skb_vlan_untag calling
skb_pull_rcsum. After segmentation the ip_summed is restored, with
skb->csum still containing the unmodified csum that includes the VLAN
tag?

>
> >
> > Either way, would you mind briefly documenting the checksum behavior
> > in the commit message? It's not trivial and I doubt I'll recall the
> > details in six months.
>
> Yes, can do this.
>
> > Really about patch 4: that squashed in a lot of non-trivial scaffolding
> > from previous patch 'UDP: enable GRO by default'. Does it make sense
> > to keep that in a separate patch? That should be a noop, which we can
> > verify. And it makes patch 4 easier to reason about on its own, too.
>
> Patch 4 is not that big, so I'm not sure that really makes sense.
> But I can split out a preparation patch if that is preferred.

Thanks. If it's not too complex, I do think that it will help make the
non-obvious functional change stand out from the larger noop code
refactor. But perhaps it's indeed nitpicking. Leave as is if you
prefer, for sure.

> Anyway, I'll likely do another RFC version because we are already
> late in the development cycle.

The feature is now disabled by default, so it could definitely go in
late in the cycle and allow us to find and fix bugs during
stabilization. Up to you, obviously!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2020-01-13 16:21           ` Willem de Bruijn
@ 2020-01-15  9:47             ` Steffen Klassert
  2020-01-15 15:43               ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2020-01-15  9:47 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Mon, Jan 13, 2020 at 11:21:07AM -0500, Willem de Bruijn wrote:
> On Mon, Jan 13, 2020 at 3:51 AM Steffen Klassert
> <steffen.klassert@secunet.com> wrote:
> >
> > On Thu, Dec 19, 2019 at 11:28:34AM -0500, Willem de Bruijn wrote:
> > > On Thu, Dec 19, 2019 at 3:22 AM Steffen Klassert
> > > <steffen.klassert@secunet.com> wrote:
> > > >
> > > > I tried to find the subset of __copy_skb_header() that needs to be copied.
> > > > Some fields of nskb are still valid, and some (csum related) fields
> > > > should not be copied from skb to nskb.
> > >
> > > Duplicating that code is kind of fragile wrt new fields being added to
> > > skbs later (such as the recent skb_ext example).
> > >
> > > Perhaps we can split __copy_skb_header further and call the
> > > inner part directly.
> >
> > I already thought about that, but __copy_skb_header does a
> > memcpy over all the header fields. If we split this, we
> > would need to change the memcpy to direct assignments.
> 
> Okay, if any of those fields should not be overwritten in this case,
> then that's not an option. That memcpy is probably a lot more
> efficient than all the direct assignments.
> 
> > Maybe we can be conservative here and do a full
> > __copy_skb_header for now. The initial version
> > does not necessarily need to be the most performant
> > version. We could try to identify the correct subset
> > of header fields later then.
> 
> We should probably aim for the right set from the start. If you think
> this set is it, let's keep it.

I'd prefer to do a full __copy_skb_header for now and think a bit
longer about whether what I chose is really the correct subset.

> > > > I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> > > > make sure that no one touches the checksum of the head
> > > > skb. Otherwise netfilter etc. tries to touch the csum.
> > > >
> > > > Before chaining I make sure that ip_summed and csum_level are
> > > > the same for all chained skbs, and here I restore the original
> > > > values from nskb.
> > >
> > > This is safe because the skb_gro_checksum_validate will have validated
> > > already on CHECKSUM_PARTIAL? What happens if there is decap or encap
> > > in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
> > > imagine.
> >
> > Yes, the checksum is validated with skb_gro_checksum_validate. If the
> > packets are UDP encapsulated, they are segmented before decapsulation.
> > Original values are already restored. If an additional encapsulation
> > happens, the encap checksum will be calculated after segmentation.
> > Original values are restored before that.
> 
> I was wondering more about additional other encapsulation protocols.
> 
> From a quick read, it seems like csum_level is associated only with
> CHECKSUM_UNNECESSARY.
> 
> What if a device returns CHECKSUM_COMPLETE for packets with a tunnel
> that is decapsulated before forwarding. Say, just VLAN. That gets
> untagged in __netif_receive_skb_core with skb_vlan_untag calling
> skb_pull_rcsum. After segmentation the ip_summed is restored, with
> skb->csum still containing the unmodified csum that includes the VLAN
> tag?

Hm, that could really be a problem. So setting CHECKSUM_UNNECESSARY
should be ok, but restoring the old values is not. Our checksum
magic is rather complex; it's hard to get it right for all possible
cases. Maybe we can just set CHECKSUM_UNNECESSARY for all packets
and keep this value after segmentation.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2020-01-15  9:47             ` Steffen Klassert
@ 2020-01-15 15:43               ` Willem de Bruijn
  2020-01-20  8:35                 ` Steffen Klassert
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2020-01-15 15:43 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Willem de Bruijn, David Miller, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner,
	Network Development

> > > Maybe we can be conservative here and do a full
> > > __copy_skb_header for now. The initial version
> > > does not necessarily need to be the most performant
> > > version. We could try to identify the correct subset
> > > of header fields later then.
> >
> > We should probably aim for the right set from the start. If you think
> > this set is it, let's keep it.
>
> I'd prefer to do a full __copy_skb_header for now and think a bit
> longer about whether what I chose is really the correct subset.

Ok

> > > > > I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> > > > > make sure that no one touches the checksum of the head
> > > > > skb. Otherwise netfilter etc. tries to touch the csum.
> > > > >
> > > > > Before chaining I make sure that ip_summed and csum_level are
> > > > > the same for all chained skbs, and here I restore the original
> > > > > values from nskb.
> > > >
> > > > This is safe because the skb_gro_checksum_validate will have validated
> > > > already on CHECKSUM_PARTIAL? What happens if there is decap or encap
> > > > in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
> > > > imagine.
> > >
> > > Yes, the checksum is validated with skb_gro_checksum_validate. If the
> > > packets are UDP encapsulated, they are segmented before decapsulation.
> > > Original values are already restored. If an additional encapsulation
> > > happens, the encap checksum will be calculated after segmentation.
> > > Original values are restored before that.
> >
> > I was wondering more about additional other encapsulation protocols.
> >
> > From a quick read, it seems like csum_level is associated only with
> > CHECKSUM_UNNECESSARY.
> >
> > What if a device returns CHECKSUM_COMPLETE for packets with a tunnel
> > that is decapsulated before forwarding. Say, just VLAN. That gets
> > untagged in __netif_receive_skb_core with skb_vlan_untag calling
> > skb_pull_rcsum. After segmentation the ip_summed is restored, with
> > skb->csum still containing the unmodified csum that includes the VLAN
> > tag?
>
> Hm, that could really be a problem. So setting CHECKSUM_UNNECESSARY
> should be ok, but restoring the old values is not. Our checksum
> magic is rather complex; it's hard to get it right for all possible
> cases. Maybe we can just set CHECKSUM_UNNECESSARY for all packets
> and keep this value after segmentation.

Note that I'm not 100% sure that the issue can occur. But it seems likely.

Yes, inverse CHECKSUM_UNNECESSARY conversion after verifying the checksum is
probably the way to go. Inverse, because it is the opposite of
__skb_gro_checksum_convert.

Or forgo (this variant of) GRO when encountering unexpected outer encap headers?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2020-01-15 15:43               ` Willem de Bruijn
@ 2020-01-20  8:35                 ` Steffen Klassert
  2020-01-20 16:35                   ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Steffen Klassert @ 2020-01-20  8:35 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Paolo Abeni, Subash Abhinov Kasiviswanathan,
	Marcelo Ricardo Leitner, Network Development

On Wed, Jan 15, 2020 at 10:43:08AM -0500, Willem de Bruijn wrote:
> > > > Maybe we can be conservative here and do a full
> > > > __copy_skb_header for now. The initial version
> > > > does not necessarily need to be the most performant
> > > > version. We could try to identify the correct subset
> > > > of header fields later then.
> > >
> > > We should probably aim for the right set from the start. If you think
> > > this set is it, let's keep it.
> >
> > I'd prefer to do a full __copy_skb_header for now and think a bit
> > longer about whether what I chose is really the correct subset.
> 
> Ok
> 
> > > > > > I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> > > > > > make sure that no one touches the checksum of the head
> > > > > > skb. Otherwise netfilter etc. tries to touch the csum.
> > > > > >
> > > > > > Before chaining I make sure that ip_summed and csum_level are
> > > > > > the same for all chained skbs, and here I restore the original
> > > > > > values from nskb.
> > > > >
> > > > > This is safe because the skb_gro_checksum_validate will have validated
> > > > > already on CHECKSUM_PARTIAL? What happens if there is decap or encap
> > > > > in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
> > > > > imagine.
> > > >
> > > > Yes, the checksum is validated with skb_gro_checksum_validate. If the
> > > > packets are UDP encapsulated, they are segmented before decapsulation.
> > > > Original values are already restored. If an additional encapsulation
> > > > happens, the encap checksum will be calculated after segmentation.
> > > > Original values are restored before that.
> > >
> > > I was wondering more about additional other encapsulation protocols.
> > >
> > > From a quick read, it seems like csum_level is associated only with
> > > CHECKSUM_UNNECESSARY.
> > >
> > > What if a device returns CHECKSUM_COMPLETE for packets with a tunnel
> > > that is decapsulated before forwarding. Say, just VLAN. That gets
> > > untagged in __netif_receive_skb_core with skb_vlan_untag calling
> > > skb_pull_rcsum. After segmentation the ip_summed is restored, with
> > > skb->csum still containing the unmodified csum that includes the VLAN
> > > tag?
> >
> > Hm, that could really be a problem. So setting CHECKSUM_UNNECESSARY
> > should be ok, but restoring the old values is not. Our checksum
> > magic is rather complex; it's hard to get it right for all possible
> > cases. Maybe we can just set CHECKSUM_UNNECESSARY for all packets
> > and keep this value after segmentation.
> 
> Note that I'm not 100% sure that the issue can occur. But it seems likely.
> 
> Yes, inverse CHECKSUM_UNNECESSARY conversion after verifying the checksum is
> probably the way to go. Inverse, because it is the opposite of
> __skb_gro_checksum_convert.

I'm not sure if I understand what you mean here. I'd do the following
for fraglist GRO in udp4_gro_complete:

        if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
                if (skb->csum_level < SKB_MAX_CSUM_LEVEL)
                        skb->csum_level++;
        } else {
                skb->ip_summed = CHECKSUM_UNNECESSARY;
                skb->csum_level = 0;
        }

and then copy these values to the segments after segmentation.



* Re: [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining.
  2020-01-20  8:35                 ` Steffen Klassert
@ 2020-01-20 16:35                   ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2020-01-20 16:35 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Willem de Bruijn, David Miller, Paolo Abeni,
	Subash Abhinov Kasiviswanathan, Marcelo Ricardo Leitner,
	Network Development

On Mon, Jan 20, 2020 at 3:36 AM Steffen Klassert
<steffen.klassert@secunet.com> wrote:
>
> On Wed, Jan 15, 2020 at 10:43:08AM -0500, Willem de Bruijn wrote:
> > > > > Maybe we can be conservative here and do a full
> > > > > __copy_skb_header for now. The initial version
> > > > > does not necessarily need to be the most performant
> > > > > version. We could try to identify the correct subset
> > > > > of header fields later then.
> > > >
> > > > We should probably aim for the right set from the start. If you think
> > > > this set is it, let's keep it.
> > >
> > > I'd prefer to do a full __copy_skb_header for now and think a bit
> > > longer about whether what I chose is really the correct subset.
> >
> > Ok
> >
> > > > > > > I had to set ip_summed to CHECKSUM_UNNECESSARY on GRO to
> > > > > > > make sure that no one touches the checksum of the head
> > > > > > > skb. Otherwise netfilter etc. tries to touch the csum.
> > > > > > >
> > > > > > > Before chaining I make sure that ip_summed and csum_level are
> > > > > > > the same for all chained skbs, and here I restore the original
> > > > > > > values from nskb.
> > > > > >
> > > > > > This is safe because the skb_gro_checksum_validate will have validated
> > > > > > already on CHECKSUM_PARTIAL? What happens if there is decap or encap
> > > > > > in the path? We cannot revert to CHECKSUM_PARTIAL after that, I
> > > > > > imagine.
> > > > >
> > > > > Yes, the checksum is validated with skb_gro_checksum_validate. If the
> > > > > packets are UDP encapsulated, they are segmented before decapsulation.
> > > > > Original values are already restored. If an additional encapsulation
> > > > > happens, the encap checksum will be calculated after segmentation.
> > > > > Original values are restored before that.
> > > >
> > > > I was wondering more about additional other encapsulation protocols.
> > > >
> > > > From a quick read, it seems like csum_level is associated only with
> > > > CHECKSUM_UNNECESSARY.
> > > >
> > > > What if a device returns CHECKSUM_COMPLETE for packets with a tunnel
> > > > that is decapsulated before forwarding. Say, just VLAN. That gets
> > > > untagged in __netif_receive_skb_core with skb_vlan_untag calling
> > > > skb_pull_rcsum. After segmentation the ip_summed is restored, with
> > > > skb->csum still containing the unmodified csum that includes the VLAN
> > > > tag?
> > >
> > > Hm, that could really be a problem. So setting CHECKSUM_UNNECESSARY
> > > should be OK, but restoring the old values is not. Our checksum
> > > magic is rather complex; it's hard to get it right for all possible
> > > cases. Maybe we can just set CHECKSUM_UNNECESSARY for all packets
> > > and keep that value after segmentation.
> >
> > Note that I'm not 100% sure that the issue can occur. But it seems likely.
> >
> > Yes, inverse CHECKSUM_UNNECESSARY conversion after verifying the checksum is
> > probably the way to go. Inverse, because it is the opposite of
> > __skb_gro_checksum_convert.
>
> I'm not sure if I understand what you mean here.

I mean that I agree that it's okay to convert from CHECKSUM_COMPLETE
to CHECKSUM_UNNECESSARY if the UDP checksum has been validated.

> I'd do the following
> for fraglist GRO in udp4_gro_complete:
>
>         if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
>                 if (skb->csum_level < SKB_MAX_CSUM_LEVEL)
>                         skb->csum_level++;
>         } else {
>                 skb->ip_summed = CHECKSUM_UNNECESSARY;
>                 skb->csum_level = 0;
>         }

Akin to __skb_incr_checksum_unnecessary, but applying conversion both
in the case of CHECKSUM_NONE and CHECKSUM_COMPLETE. Makes sense.


> and then copy these values to the segments after segmentation.


end of thread [~2020-01-20 16:36 UTC]

Thread overview: 18+ messages
-- links below jump to the message on this page --
2019-12-18 13:34 [PATCH net-next 0/4] Support fraglist GRO/GSO Steffen Klassert
2019-12-18 13:34 ` [PATCH net-next 1/4] net: Add fraglist GRO/GSO feature flags Steffen Klassert
2019-12-18 16:02   ` Willem de Bruijn
2019-12-18 13:34 ` [PATCH net-next 2/4] net: Add a netdev software feature set that defaults to off Steffen Klassert
2019-12-18 16:02   ` Willem de Bruijn
2019-12-18 13:34 ` [PATCH net-next 3/4] net: Support GRO/GSO fraglist chaining Steffen Klassert
2019-12-18 16:02   ` Willem de Bruijn
2019-12-19  8:22     ` Steffen Klassert
2019-12-19 16:28       ` Willem de Bruijn
2020-01-13  8:51         ` Steffen Klassert
2020-01-13 16:21           ` Willem de Bruijn
2020-01-15  9:47             ` Steffen Klassert
2020-01-15 15:43               ` Willem de Bruijn
2020-01-20  8:35                 ` Steffen Klassert
2020-01-20 16:35                   ` Willem de Bruijn
2019-12-18 13:34 ` [PATCH net-next 4/4] udp: Support UDP fraglist GRO/GSO Steffen Klassert
2019-12-18 16:03   ` Willem de Bruijn
2019-12-19  8:26     ` Steffen Klassert
