All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] udp optimisation
@ 2022-01-11  1:21 Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 01/14] ipv6: optimise dst referencing Pavel Begunkov
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

A mainly UDP/IPv6 optimisation patch set. Zerocopy io_uring benchmark over
dummy netdev (CPU bound) gives 2068992 -> 2166481 tx/s, which is ~4.7% or
over 5% of net layer overhead. Should give similar results for small
packet non-zerocopy.

- 1/14 and 9/14 remove a get/put dst pair each, so saving 4 atomics per
  corkless UDP send.
- Patches 3-8 optimise iflow handling, in particular removes one 88B
  memset and one 88B copy.
- 10-14 are random improvements, which are not UDP-specific but also
  beneficial to TCP and others.

Pavel Begunkov (14):
  ipv6: optimise dst referencing
  ipv6: shuffle up->pending AF_INET bits
  ipv6: remove daddr temp buffer in __ip6_make_skb
  ipv6: clean up cork setup/release
  ipv6: don't zero cork's flowi after use
  ipv6: pass full cork into __ip6_append_data()
  ipv6: pass flow in ip6_make_skb together with cork
  ipv6/udp: don't make extra copies of iflow
  ipv6: hand dst refs to cork setup
  skbuff: drop zero check from skb_zcopy_set
  skbuff: drop null check from skb_zcopy
  skbuff: optimise alloc_skb_with_frags()
  net: inline part of skb_csum_hwoffload_help
  net: inline sock_alloc_send_skb

 include/linux/netdevice.h |  16 +++++-
 include/linux/skbuff.h    |  45 +++++++++++++---
 include/net/ipv6.h        |   2 +-
 include/net/sock.h        |  10 +++-
 net/core/dev.c            |  15 ++----
 net/core/skbuff.c         |  34 +++++-------
 net/core/sock.c           |   7 ---
 net/ipv4/ip_output.c      |  10 ++--
 net/ipv4/tcp.c            |   5 +-
 net/ipv6/ip6_output.c     | 105 +++++++++++++++++++++-----------------
 net/ipv6/udp.c            | 103 ++++++++++++++++++-------------------
 11 files changed, 197 insertions(+), 155 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/14] ipv6: optimise dst referencing
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 02/14] ipv6: shuffle up->pending AF_INET bits Pavel Begunkov
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

__ip6_make_skb() initialises skb's dst by taking an additional reference
to cork->dst. However, cork->dst comes into the function holding a ref,
which will be put shortly at the end of the function in
ip6_cork_release().

Avoid this extra pair of get/put atomics by stealing cork->dst and
NULL'ing the field, ip6_cork_release() already handles zero values.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 2995f8d89e7e..14d607ccfeea 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1807,6 +1807,15 @@ int ip6_append_data(struct sock *sk,
 }
 EXPORT_SYMBOL_GPL(ip6_append_data);
 
+static void ip6_cork_steal_dst(struct sk_buff *skb, struct inet_cork_full *cork)
+{
+	struct dst_entry *dst = cork->base.dst;
+
+	cork->base.dst = NULL;
+	cork->base.flags &= ~IPCORK_ALLFRAG;
+	skb_dst_set(skb, dst);
+}
+
 static void ip6_cork_release(struct inet_cork_full *cork,
 			     struct inet6_cork *v6_cork)
 {
@@ -1889,7 +1898,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	skb->tstamp = cork->base.transmit_time;
 
-	skb_dst_set(skb, dst_clone(&rt->dst));
+	ip6_cork_steal_dst(skb, cork);
 	IP6_UPD_PO_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUT, skb->len);
 	if (proto == IPPROTO_ICMPV6) {
 		struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/14] ipv6: shuffle up->pending AF_INET bits
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 01/14] ipv6: optimise dst referencing Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 03/14] ipv6: remove daddr temp buffer in __ip6_make_skb Pavel Begunkov
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

Corked AF_INET for ipv6 socket doesn't appear to be the hottest case,
so move it out of the common path under up->pending check to remove
overhead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index df216268cb02..0c10ee0124b5 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1363,9 +1363,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 	}
 
-	if (up->pending == AF_INET)
-		return udp_sendmsg(sk, msg, len);
-
 	/* Rough check on arithmetic overflow,
 	   better check is made in ip6_append_data().
 	   */
@@ -1374,6 +1371,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	getfrag  =  is_udplite ?  udplite_getfrag : ip_generic_getfrag;
 	if (up->pending) {
+		if (up->pending == AF_INET)
+			return udp_sendmsg(sk, msg, len);
 		/*
 		 * There are pending frames.
 		 * The socket lock must be held while it's corked.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/14] ipv6: remove daddr temp buffer in __ip6_make_skb
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 01/14] ipv6: optimise dst referencing Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 02/14] ipv6: shuffle up->pending AF_INET bits Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 04/14] ipv6: clean up cork setup/release Pavel Begunkov
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

__ip6_make_skb() doesn't actually need to keep an on-stack copy of
fl6->daddr because even though ipv6_push_nfrag_opts() may return a
different daddr it doesn't change the one that was passed in.
Just set final_dst to fl6->daddr and get rid of the temp copy.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 14d607ccfeea..4acd577d5ec5 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1843,7 +1843,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 {
 	struct sk_buff *skb, *tmp_skb;
 	struct sk_buff **tail_skb;
-	struct in6_addr final_dst_buf, *final_dst = &final_dst_buf;
+	struct in6_addr *final_dst;
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct net *net = sock_net(sk);
 	struct ipv6hdr *hdr;
@@ -1873,9 +1873,9 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	/* Allow local fragmentation. */
 	skb->ignore_df = ip6_sk_ignore_df(sk);
-
-	*final_dst = fl6->daddr;
 	__skb_pull(skb, skb_network_header_len(skb));
+
+	final_dst = &fl6->daddr;
 	if (opt && opt->opt_flen)
 		ipv6_push_frag_opts(skb, opt, &proto);
 	if (opt && opt->opt_nflen)
@@ -1895,7 +1895,6 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	skb->priority = sk->sk_priority;
 	skb->mark = cork->base.mark;
-
 	skb->tstamp = cork->base.transmit_time;
 
 	ip6_cork_steal_dst(skb, cork);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/14] ipv6: clean up cork setup/release
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 03/14] ipv6: remove daddr temp buffer in __ip6_make_skb Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 05/14] ipv6: don't zero cork's flowi after use Pavel Begunkov
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

A simple cleanup of ip6_setup_cork() and ip6_cork_release() adding a
local variable for v6_cork->opt instead of retyping it many times. It
serves as a preparation patch to make further work cleaner.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 44 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 23 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 4acd577d5ec5..88349e49717a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1354,7 +1354,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	unsigned int mtu;
-	struct ipv6_txoptions *opt = ipc6->opt;
+	struct ipv6_txoptions *nopt, *opt = ipc6->opt;
 
 	/*
 	 * setup for corking
@@ -1363,32 +1363,28 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (WARN_ON(v6_cork->opt))
 			return -EINVAL;
 
-		v6_cork->opt = kzalloc(sizeof(*opt), sk->sk_allocation);
-		if (unlikely(!v6_cork->opt))
+		nopt = v6_cork->opt = kzalloc(sizeof(*opt), sk->sk_allocation);
+		if (unlikely(!nopt))
 			return -ENOBUFS;
 
-		v6_cork->opt->tot_len = sizeof(*opt);
-		v6_cork->opt->opt_flen = opt->opt_flen;
-		v6_cork->opt->opt_nflen = opt->opt_nflen;
+		nopt->tot_len = sizeof(*opt);
+		nopt->opt_flen = opt->opt_flen;
+		nopt->opt_nflen = opt->opt_nflen;
 
-		v6_cork->opt->dst0opt = ip6_opt_dup(opt->dst0opt,
-						    sk->sk_allocation);
-		if (opt->dst0opt && !v6_cork->opt->dst0opt)
+		nopt->dst0opt = ip6_opt_dup(opt->dst0opt, sk->sk_allocation);
+		if (opt->dst0opt && !nopt->dst0opt)
 			return -ENOBUFS;
 
-		v6_cork->opt->dst1opt = ip6_opt_dup(opt->dst1opt,
-						    sk->sk_allocation);
-		if (opt->dst1opt && !v6_cork->opt->dst1opt)
+		nopt->dst1opt = ip6_opt_dup(opt->dst1opt, sk->sk_allocation);
+		if (opt->dst1opt && !nopt->dst1opt)
 			return -ENOBUFS;
 
-		v6_cork->opt->hopopt = ip6_opt_dup(opt->hopopt,
-						   sk->sk_allocation);
-		if (opt->hopopt && !v6_cork->opt->hopopt)
+		nopt->hopopt = ip6_opt_dup(opt->hopopt, sk->sk_allocation);
+		if (opt->hopopt && !nopt->hopopt)
 			return -ENOBUFS;
 
-		v6_cork->opt->srcrt = ip6_rthdr_dup(opt->srcrt,
-						    sk->sk_allocation);
-		if (opt->srcrt && !v6_cork->opt->srcrt)
+		nopt->srcrt = ip6_rthdr_dup(opt->srcrt, sk->sk_allocation);
+		if (opt->srcrt && !nopt->srcrt)
 			return -ENOBUFS;
 
 		/* need source address above miyazawa*/
@@ -1820,11 +1816,13 @@ static void ip6_cork_release(struct inet_cork_full *cork,
 			     struct inet6_cork *v6_cork)
 {
 	if (v6_cork->opt) {
-		kfree(v6_cork->opt->dst0opt);
-		kfree(v6_cork->opt->dst1opt);
-		kfree(v6_cork->opt->hopopt);
-		kfree(v6_cork->opt->srcrt);
-		kfree(v6_cork->opt);
+		struct ipv6_txoptions *opt = v6_cork->opt;
+
+		kfree(opt->dst0opt);
+		kfree(opt->dst1opt);
+		kfree(opt->hopopt);
+		kfree(opt->srcrt);
+		kfree(opt);
 		v6_cork->opt = NULL;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/14] ipv6: don't zero cork's flowi after use
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 04/14] ipv6: clean up cork setup/release Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 06/14] ipv6: pass full cork into __ip6_append_data() Pavel Begunkov
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

It doesn't appear there is any reason to zero cork->fl after use, i.e.
in ip6_cork_release(), especially when cork struct is on-stack. Not
only the memset accounts to 0.3-0.5% of total cycles (perf profiling),
but also prevents other optimisations implemented in further patches.
Also, now we can remove a relatively expensive flow copy in
udp_v6_push_pending_frames().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c |  1 -
 net/ipv6/udp.c        | 10 ++--------
 2 files changed, 2 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 88349e49717a..b8fdda9ac797 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1831,7 +1831,6 @@ static void ip6_cork_release(struct inet_cork_full *cork,
 		cork->base.dst = NULL;
 		cork->base.flags &= ~IPCORK_ALLFRAG;
 	}
-	memset(&cork->fl, 0, sizeof(cork->fl));
 }
 
 struct sk_buff *__ip6_make_skb(struct sock *sk,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 0c10ee0124b5..9a91b51d8e3f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1266,23 +1266,17 @@ static int udp_v6_push_pending_frames(struct sock *sk)
 {
 	struct sk_buff *skb;
 	struct udp_sock  *up = udp_sk(sk);
-	struct flowi6 fl6;
 	int err = 0;
 
 	if (up->pending == AF_INET)
 		return udp_push_pending_frames(sk);
 
-	/* ip6_finish_skb will release the cork, so make a copy of
-	 * fl6 here.
-	 */
-	fl6 = inet_sk(sk)->cork.fl.u.ip6;
-
 	skb = ip6_finish_skb(sk);
 	if (!skb)
 		goto out;
 
-	err = udp_v6_send_skb(skb, &fl6, &inet_sk(sk)->cork.base);
-
+	err = udp_v6_send_skb(skb, &inet_sk(sk)->cork.fl.u.ip6,
+			      &inet_sk(sk)->cork.base);
 out:
 	up->len = 0;
 	up->pending = 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/14] ipv6: pass full cork into __ip6_append_data()
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 05/14] ipv6: don't zero cork's flowi after use Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 07/14] ipv6: pass flow in ip6_make_skb together with cork Pavel Begunkov
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

Convert a struct inet_cork argument in __ip6_append_data() to struct
inet_cork_full. As one struct contains another inet_cork is still can
be accessed via ->base field. It's a preparation patch making further
changes a bit cleaner.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index b8fdda9ac797..62da09819750 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1424,7 +1424,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 static int __ip6_append_data(struct sock *sk,
 			     struct flowi6 *fl6,
 			     struct sk_buff_head *queue,
-			     struct inet_cork *cork,
+			     struct inet_cork_full *cork_full,
 			     struct inet6_cork *v6_cork,
 			     struct page_frag *pfrag,
 			     int getfrag(void *from, char *to, int offset,
@@ -1433,6 +1433,7 @@ static int __ip6_append_data(struct sock *sk,
 			     unsigned int flags, struct ipcm6_cookie *ipc6)
 {
 	struct sk_buff *skb, *skb_prev = NULL;
+	struct inet_cork *cork = &cork_full->base;
 	unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
 	struct ubuf_info *uarg = NULL;
 	int exthdrlen = 0;
@@ -1797,7 +1798,7 @@ int ip6_append_data(struct sock *sk,
 		transhdrlen = 0;
 	}
 
-	return __ip6_append_data(sk, fl6, &sk->sk_write_queue, &inet->cork.base,
+	return __ip6_append_data(sk, fl6, &sk->sk_write_queue, &inet->cork,
 				 &np->cork, sk_page_frag(sk), getfrag,
 				 from, length, transhdrlen, flags, ipc6);
 }
@@ -1993,7 +1994,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 	if (ipc6->dontfrag < 0)
 		ipc6->dontfrag = inet6_sk(sk)->dontfrag;
 
-	err = __ip6_append_data(sk, fl6, &queue, &cork->base, &v6_cork,
+	err = __ip6_append_data(sk, fl6, &queue, cork, &v6_cork,
 				&current->task_frag, getfrag, from,
 				length + exthdrlen, transhdrlen + exthdrlen,
 				flags, ipc6);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/14] ipv6: pass flow in ip6_make_skb together with cork
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 06/14] ipv6: pass full cork into __ip6_append_data() Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 08/14] ipv6/udp: don't make extra copies of iflow Pavel Begunkov
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

Another preparation patch. inet_cork_full already contains a field for
iflow, so we can avoid passing a separate struct iflow6 into
__ip6_append_data() and ip6_make_skb(), and use the flow stored in
inet_cork_full. Make sure callers set cork->fl right, i.e. we init it in
ip6_append_data() and right before the only ip6_make_skb() call.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/ipv6.h    |  2 +-
 net/ipv6/ip6_output.c | 20 +++++++++-----------
 net/ipv6/udp.c        |  4 +++-
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3afcb128e064..5e0b56d66724 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1020,7 +1020,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 			     int getfrag(void *from, char *to, int offset,
 					 int len, int odd, struct sk_buff *skb),
 			     void *from, int length, int transhdrlen,
-			     struct ipcm6_cookie *ipc6, struct flowi6 *fl6,
+			     struct ipcm6_cookie *ipc6,
 			     struct rt6_info *rt, unsigned int flags,
 			     struct inet_cork_full *cork);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 62da09819750..0cc490f2cfbf 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1350,7 +1350,7 @@ static void ip6_append_data_mtu(unsigned int *mtu,
 
 static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 			  struct inet6_cork *v6_cork, struct ipcm6_cookie *ipc6,
-			  struct rt6_info *rt, struct flowi6 *fl6)
+			  struct rt6_info *rt)
 {
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	unsigned int mtu;
@@ -1391,7 +1391,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	}
 	dst_hold(&rt->dst);
 	cork->base.dst = &rt->dst;
-	cork->fl.u.ip6 = *fl6;
 	v6_cork->hop_limit = ipc6->hlimit;
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
@@ -1422,7 +1421,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 }
 
 static int __ip6_append_data(struct sock *sk,
-			     struct flowi6 *fl6,
 			     struct sk_buff_head *queue,
 			     struct inet_cork_full *cork_full,
 			     struct inet6_cork *v6_cork,
@@ -1434,6 +1432,7 @@ static int __ip6_append_data(struct sock *sk,
 {
 	struct sk_buff *skb, *skb_prev = NULL;
 	struct inet_cork *cork = &cork_full->base;
+	struct flowi6 *fl6 = &cork_full->fl.u.ip6;
 	unsigned int maxfraglen, fragheaderlen, mtu, orig_mtu, pmtu;
 	struct ubuf_info *uarg = NULL;
 	int exthdrlen = 0;
@@ -1786,19 +1785,19 @@ int ip6_append_data(struct sock *sk,
 		 * setup for corking
 		 */
 		err = ip6_setup_cork(sk, &inet->cork, &np->cork,
-				     ipc6, rt, fl6);
+				     ipc6, rt);
 		if (err)
 			return err;
 
+		inet->cork.fl.u.ip6 = *fl6;
 		exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0);
 		length += exthdrlen;
 		transhdrlen += exthdrlen;
 	} else {
-		fl6 = &inet->cork.fl.u.ip6;
 		transhdrlen = 0;
 	}
 
-	return __ip6_append_data(sk, fl6, &sk->sk_write_queue, &inet->cork,
+	return __ip6_append_data(sk, &sk->sk_write_queue, &inet->cork,
 				 &np->cork, sk_page_frag(sk), getfrag,
 				 from, length, transhdrlen, flags, ipc6);
 }
@@ -1967,9 +1966,8 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 			     int getfrag(void *from, char *to, int offset,
 					 int len, int odd, struct sk_buff *skb),
 			     void *from, int length, int transhdrlen,
-			     struct ipcm6_cookie *ipc6, struct flowi6 *fl6,
-			     struct rt6_info *rt, unsigned int flags,
-			     struct inet_cork_full *cork)
+			     struct ipcm6_cookie *ipc6, struct rt6_info *rt,
+			     unsigned int flags, struct inet_cork_full *cork)
 {
 	struct inet6_cork v6_cork;
 	struct sk_buff_head queue;
@@ -1986,7 +1984,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 	cork->base.opt = NULL;
 	cork->base.dst = NULL;
 	v6_cork.opt = NULL;
-	err = ip6_setup_cork(sk, cork, &v6_cork, ipc6, rt, fl6);
+	err = ip6_setup_cork(sk, cork, &v6_cork, ipc6, rt);
 	if (err) {
 		ip6_cork_release(cork, &v6_cork);
 		return ERR_PTR(err);
@@ -1994,7 +1992,7 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 	if (ipc6->dontfrag < 0)
 		ipc6->dontfrag = inet6_sk(sk)->dontfrag;
 
-	err = __ip6_append_data(sk, fl6, &queue, cork, &v6_cork,
+	err = __ip6_append_data(sk, &queue, cork, &v6_cork,
 				&current->task_frag, getfrag, from,
 				length + exthdrlen, transhdrlen + exthdrlen,
 				flags, ipc6);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9a91b51d8e3f..2580705431ea 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1533,9 +1533,11 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		struct inet_cork_full cork;
 		struct sk_buff *skb;
 
+		cork.fl.u.ip6 = fl6;
+
 		skb = ip6_make_skb(sk, getfrag, msg, ulen,
 				   sizeof(struct udphdr), &ipc6,
-				   &fl6, (struct rt6_info *)dst,
+				   (struct rt6_info *)dst,
 				   msg->msg_flags, &cork);
 		err = PTR_ERR(skb);
 		if (!IS_ERR_OR_NULL(skb))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/14] ipv6/udp: don't make extra copies of iflow
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 07/14] ipv6: pass flow in ip6_make_skb together with cork Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 09/14] ipv6: hand dst refs to cork setup Pavel Begunkov
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

struct flowi takes 88 bytes and copying it is relatively expensive.
Currenly, udpv6_sendmsg() first initialises an on-stack struct flowi6
and then copies it into cork. Instead, directly initialise a flow in an
on-stack cork, i.e. cork->fl, so corkless udp can avoid making an extra
copy.

Note: moving inet_cork_full instance shouldn't grow stack too much,
it replaces 88 bytes for iflow with 160.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 85 +++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 43 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2580705431ea..eec83e34ae27 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1294,7 +1294,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	struct ipv6_txoptions *opt = NULL;
 	struct ipv6_txoptions *opt_to_free = NULL;
 	struct ip6_flowlabel *flowlabel = NULL;
-	struct flowi6 fl6;
+	struct inet_cork_full cork;
+	struct flowi6 *fl6 = &cork.fl.u.ip6;
 	struct dst_entry *dst;
 	struct ipcm6_cookie ipc6;
 	int addr_len = msg->msg_namelen;
@@ -1384,19 +1385,19 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 	ulen += sizeof(struct udphdr);
 
-	memset(&fl6, 0, sizeof(fl6));
+	memset(fl6, 0, sizeof(*fl6));
 
 	if (sin6) {
 		if (sin6->sin6_port == 0)
 			return -EINVAL;
 
-		fl6.fl6_dport = sin6->sin6_port;
+		fl6->fl6_dport = sin6->sin6_port;
 		daddr = &sin6->sin6_addr;
 
 		if (np->sndflow) {
-			fl6.flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
-			if (fl6.flowlabel&IPV6_FLOWLABEL_MASK) {
-				flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
+			fl6->flowlabel = sin6->sin6_flowinfo&IPV6_FLOWINFO_MASK;
+			if (fl6->flowlabel & IPV6_FLOWLABEL_MASK) {
+				flowlabel = fl6_sock_lookup(sk, fl6->flowlabel);
 				if (IS_ERR(flowlabel))
 					return -EINVAL;
 			}
@@ -1413,24 +1414,24 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    sin6->sin6_scope_id &&
 		    __ipv6_addr_needs_scope_id(__ipv6_addr_type(daddr)))
-			fl6.flowi6_oif = sin6->sin6_scope_id;
+			fl6->flowi6_oif = sin6->sin6_scope_id;
 	} else {
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 
-		fl6.fl6_dport = inet->inet_dport;
+		fl6->fl6_dport = inet->inet_dport;
 		daddr = &sk->sk_v6_daddr;
-		fl6.flowlabel = np->flow_label;
+		fl6->flowlabel = np->flow_label;
 		connected = true;
 	}
 
-	if (!fl6.flowi6_oif)
-		fl6.flowi6_oif = sk->sk_bound_dev_if;
+	if (!fl6->flowi6_oif)
+		fl6->flowi6_oif = sk->sk_bound_dev_if;
 
-	if (!fl6.flowi6_oif)
-		fl6.flowi6_oif = np->sticky_pktinfo.ipi6_ifindex;
+	if (!fl6->flowi6_oif)
+		fl6->flowi6_oif = np->sticky_pktinfo.ipi6_ifindex;
 
-	fl6.flowi6_uid = sk->sk_uid;
+	fl6->flowi6_uid = sk->sk_uid;
 
 	if (msg->msg_controllen) {
 		opt = &opt_space;
@@ -1440,14 +1441,14 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 		err = udp_cmsg_send(sk, msg, &ipc6.gso_size);
 		if (err > 0)
-			err = ip6_datagram_send_ctl(sock_net(sk), sk, msg, &fl6,
+			err = ip6_datagram_send_ctl(sock_net(sk), sk, msg, fl6,
 						    &ipc6);
 		if (err < 0) {
 			fl6_sock_release(flowlabel);
 			return err;
 		}
-		if ((fl6.flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
-			flowlabel = fl6_sock_lookup(sk, fl6.flowlabel);
+		if ((fl6->flowlabel&IPV6_FLOWLABEL_MASK) && !flowlabel) {
+			flowlabel = fl6_sock_lookup(sk, fl6->flowlabel);
 			if (IS_ERR(flowlabel))
 				return -EINVAL;
 		}
@@ -1464,16 +1465,17 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	opt = ipv6_fixup_options(&opt_space, opt);
 	ipc6.opt = opt;
 
-	fl6.flowi6_proto = sk->sk_protocol;
-	fl6.flowi6_mark = ipc6.sockc.mark;
-	fl6.daddr = *daddr;
-	if (ipv6_addr_any(&fl6.saddr) && !ipv6_addr_any(&np->saddr))
-		fl6.saddr = np->saddr;
-	fl6.fl6_sport = inet->inet_sport;
+	fl6->flowi6_proto = sk->sk_protocol;
+	fl6->flowi6_mark = ipc6.sockc.mark;
+	fl6->daddr = *daddr;
+	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
+		fl6->saddr = np->saddr;
+	fl6->fl6_sport = inet->inet_sport;
 
 	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
 		err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk,
-					   (struct sockaddr *)sin6, &fl6.saddr);
+					   (struct sockaddr *)sin6,
+					   &fl6->saddr);
 		if (err)
 			goto out_no_dst;
 		if (sin6) {
@@ -1489,32 +1491,32 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 				err = -EINVAL;
 				goto out_no_dst;
 			}
-			fl6.fl6_dport = sin6->sin6_port;
-			fl6.daddr = sin6->sin6_addr;
+			fl6->fl6_dport = sin6->sin6_port;
+			fl6->daddr = sin6->sin6_addr;
 		}
 	}
 
-	if (ipv6_addr_any(&fl6.daddr))
-		fl6.daddr.s6_addr[15] = 0x1; /* :: means loopback (BSD'ism) */
+	if (ipv6_addr_any(&fl6->daddr))
+		fl6->daddr.s6_addr[15] = 0x1; /* :: means loopback (BSD'ism) */
 
-	final_p = fl6_update_dst(&fl6, opt, &final);
+	final_p = fl6_update_dst(fl6, opt, &final);
 	if (final_p)
 		connected = false;
 
-	if (!fl6.flowi6_oif && ipv6_addr_is_multicast(&fl6.daddr)) {
-		fl6.flowi6_oif = np->mcast_oif;
+	if (!fl6->flowi6_oif && ipv6_addr_is_multicast(&fl6->daddr)) {
+		fl6->flowi6_oif = np->mcast_oif;
 		connected = false;
-	} else if (!fl6.flowi6_oif)
-		fl6.flowi6_oif = np->ucast_oif;
+	} else if (!fl6->flowi6_oif)
+		fl6->flowi6_oif = np->ucast_oif;
 
-	security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6));
+	security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6));
 
 	if (ipc6.tclass < 0)
 		ipc6.tclass = np->tclass;
 
-	fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
+	fl6->flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6->flowlabel);
 
-	dst = ip6_sk_dst_lookup_flow(sk, &fl6, final_p, connected);
+	dst = ip6_sk_dst_lookup_flow(sk, fl6, final_p, connected);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		dst = NULL;
@@ -1522,7 +1524,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	}
 
 	if (ipc6.hlimit < 0)
-		ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
+		ipc6.hlimit = ip6_sk_dst_hoplimit(np, fl6, dst);
 
 	if (msg->msg_flags&MSG_CONFIRM)
 		goto do_confirm;
@@ -1530,18 +1532,15 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	/* Lockless fast path for the non-corking case */
 	if (!corkreq) {
-		struct inet_cork_full cork;
 		struct sk_buff *skb;
 
-		cork.fl.u.ip6 = fl6;
-
 		skb = ip6_make_skb(sk, getfrag, msg, ulen,
 				   sizeof(struct udphdr), &ipc6,
 				   (struct rt6_info *)dst,
 				   msg->msg_flags, &cork);
 		err = PTR_ERR(skb);
 		if (!IS_ERR_OR_NULL(skb))
-			err = udp_v6_send_skb(skb, &fl6, &cork.base);
+			err = udp_v6_send_skb(skb, fl6, &cork.base);
 		goto out;
 	}
 
@@ -1563,7 +1562,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		ipc6.dontfrag = np->dontfrag;
 	up->len += ulen;
 	err = ip6_append_data(sk, getfrag, msg, ulen, sizeof(struct udphdr),
-			      &ipc6, &fl6, (struct rt6_info *)dst,
+			      &ipc6, fl6, (struct rt6_info *)dst,
 			      corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
 	if (err)
 		udp_v6_flush_pending_frames(sk);
@@ -1598,7 +1597,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 do_confirm:
 	if (msg->msg_flags & MSG_PROBE)
-		dst_confirm_neigh(dst, &fl6.daddr);
+		dst_confirm_neigh(dst, &fl6->daddr);
 	if (!(msg->msg_flags&MSG_PROBE) || len)
 		goto back_from_confirm;
 	err = 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 08/14] ipv6/udp: don't make extra copies of iflow Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11 15:39   ` Willem de Bruijn
  2022-01-11 17:11   ` Paolo Abeni
  2022-01-11  1:21 ` [PATCH 10/14] skbuff: drop zero check from skb_zcopy_set Pavel Begunkov
                   ` (4 subsequent siblings)
  13 siblings, 2 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

During cork->dst setup, ip6_make_skb() gets an additional reference to
a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
ip6_make_skb(), and so we can save two additional atomics by passing
dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
it's enough to make sure it doesn't use dst afterwards.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 9 ++++++---
 net/ipv6/udp.c        | 3 ++-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 0cc490f2cfbf..6a7bba4dd04d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1356,6 +1356,8 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	unsigned int mtu;
 	struct ipv6_txoptions *nopt, *opt = ipc6->opt;
 
+	cork->base.dst = &rt->dst;
+
 	/*
 	 * setup for corking
 	 */
@@ -1389,8 +1391,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 
 		/* need source address above miyazawa*/
 	}
-	dst_hold(&rt->dst);
-	cork->base.dst = &rt->dst;
 	v6_cork->hop_limit = ipc6->hlimit;
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
@@ -1784,6 +1784,7 @@ int ip6_append_data(struct sock *sk,
 		/*
 		 * setup for corking
 		 */
+		dst_hold(&rt->dst);
 		err = ip6_setup_cork(sk, &inet->cork, &np->cork,
 				     ipc6, rt);
 		if (err)
@@ -1974,8 +1975,10 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 	int exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0);
 	int err;
 
-	if (flags & MSG_PROBE)
+	if (flags & MSG_PROBE) {
+		dst_release(&rt->dst);
 		return NULL;
+	}
 
 	__skb_queue_head_init(&queue);
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index eec83e34ae27..3039dff7fe64 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1541,7 +1541,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		err = PTR_ERR(skb);
 		if (!IS_ERR_OR_NULL(skb))
 			err = udp_v6_send_skb(skb, fl6, &cork.base);
-		goto out;
+		/* ip6_make_skb steals dst reference */
+		goto out_no_dst;
 	}
 
 	lock_sock(sk);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/14] skbuff: drop zero check from skb_zcopy_set
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 09/14] ipv6: hand dst refs to cork setup Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 11/14] skbuff: drop null check from skb_zcopy Pavel Begunkov
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

Only two skb_zcopy_set() callers may pass a NULL skb, so kill the zero
check from inside the function, which can't be compiled out, and place
it where needed. It's also needed by the following patch.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 2 +-
 net/ipv4/ip_output.c   | 3 ++-
 net/ipv6/ip6_output.c  | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 642acb0d1646..8a7d0d03a100 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1499,7 +1499,7 @@ static inline void skb_zcopy_init(struct sk_buff *skb, struct ubuf_info *uarg)
 static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
 				 bool *have_ref)
 {
-	if (skb && uarg && !skb_zcopy(skb)) {
+	if (uarg && !skb_zcopy(skb)) {
 		if (unlikely(have_ref && *have_ref))
 			*have_ref = false;
 		else
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 57c1d8431386..87d4472545a5 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1010,7 +1010,8 @@ static int __ip_append_data(struct sock *sk,
 			paged = true;
 		} else {
 			uarg->zerocopy = 0;
-			skb_zcopy_set(skb, uarg, &extra_uref);
+			if (skb)
+				skb_zcopy_set(skb, uarg, &extra_uref);
 		}
 	}
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6a7bba4dd04d..9881b61da493 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1522,7 +1522,8 @@ static int __ip6_append_data(struct sock *sk,
 			paged = true;
 		} else {
 			uarg->zerocopy = 0;
-			skb_zcopy_set(skb, uarg, &extra_uref);
+			if (skb)
+				skb_zcopy_set(skb, uarg, &extra_uref);
 		}
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/14] skbuff: drop null check from skb_zcopy
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 10/14] skbuff: drop zero check from skb_zcopy_set Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 12/14] skbuff: optimise alloc_skb_with_frags() Pavel Begunkov
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

skb_zcopy() is used all around the networkong code with many of calls
sitting in generic not necessarily zerocopy paths. Many of callers
don't ever pass a NULL skb, however a NULL check inside skb_zcopy()
can't be optimised out. As with previous patch, move the check out of
the helper to a few places where it's needed.

It removes a bunch of extra ifs in non-zerocopy paths, which is nice.
E.g. before and after:

   text    data     bss     dec     hex filename
8521472       0       0 8521472  820700 arch/x86/boot/bzImage
8521056       0       0 8521056  820560 arch/x86/boot/bzImage
delta=416B

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 2 +-
 net/core/dev.c         | 2 +-
 net/core/skbuff.c      | 3 ++-
 net/ipv4/ip_output.c   | 7 +++++--
 net/ipv4/tcp.c         | 5 ++++-
 net/ipv6/ip6_output.c  | 7 +++++--
 6 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 8a7d0d03a100..7fd2b44aada0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1469,7 +1469,7 @@ static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
 
 static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb)
 {
-	bool is_zcopy = skb && skb_shinfo(skb)->flags & SKBFL_ZEROCOPY_ENABLE;
+	bool is_zcopy = skb_shinfo(skb)->flags & SKBFL_ZEROCOPY_ENABLE;
 
 	return is_zcopy ? skb_uarg(skb) : NULL;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 83a4089990a0..877ebc0f72bd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2239,7 +2239,7 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
 	}
 out_unlock:
 	if (pt_prev) {
-		if (!skb_orphan_frags_rx(skb2, GFP_ATOMIC))
+		if (!skb2 || !skb_orphan_frags_rx(skb2, GFP_ATOMIC))
 			pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
 		else
 			kfree_skb(skb2);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e514a36bcffc..a9b8ac38dc1a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -890,7 +890,8 @@ EXPORT_SYMBOL(skb_dump);
  */
 void skb_tx_error(struct sk_buff *skb)
 {
-	skb_zcopy_clear(skb, true);
+	if (skb)
+		skb_zcopy_clear(skb, true);
 }
 EXPORT_SYMBOL(skb_tx_error);
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 87d4472545a5..b63f307cc5ab 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1001,10 +1001,13 @@ static int __ip_append_data(struct sock *sk,
 		csummode = CHECKSUM_PARTIAL;
 
 	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+		if (skb)
+			uarg = skb_zcopy(skb);
+		extra_uref = !uarg; /* only ref on new uarg */
+
+		uarg = msg_zerocopy_realloc(sk, length, uarg);
 		if (!uarg)
 			return -ENOBUFS;
-		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3b75836db19b..f35e49ea08ec 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1188,7 +1188,10 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 	if (flags & MSG_ZEROCOPY && size && sock_flag(sk, SOCK_ZEROCOPY)) {
 		skb = tcp_write_queue_tail(sk);
-		uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
+		if (skb)
+			uarg = skb_zcopy(skb);
+
+		uarg = msg_zerocopy_realloc(sk, size, uarg);
 		if (!uarg) {
 			err = -ENOBUFS;
 			goto out_err;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9881b61da493..41abe83c3419 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1513,10 +1513,13 @@ static int __ip6_append_data(struct sock *sk,
 		csummode = CHECKSUM_PARTIAL;
 
 	if (flags & MSG_ZEROCOPY && length && sock_flag(sk, SOCK_ZEROCOPY)) {
-		uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
+		if (skb)
+			uarg = skb_zcopy(skb);
+		extra_uref = !uarg; /* only ref on new uarg */
+
+		uarg = msg_zerocopy_realloc(sk, length, uarg);
 		if (!uarg)
 			return -ENOBUFS;
-		extra_uref = !skb_zcopy(skb);	/* only ref on new uarg */
 		if (rt->dst.dev->features & NETIF_F_SG &&
 		    csummode == CHECKSUM_PARTIAL) {
 			paged = true;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/14] skbuff: optimise alloc_skb_with_frags()
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (10 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 11/14] skbuff: drop null check from skb_zcopy Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 13/14] net: inline part of skb_csum_hwoffload_help Pavel Begunkov
  2022-01-11  1:21 ` [PATCH 14/14] net: inline sock_alloc_send_skb Pavel Begunkov
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

Many users of alloc_skb_with_frags() pass zero datalen, e.g.
all callers sock_alloc_send_skb() including udp. Extract and inline a
part of it doing skb allocation. BTW, do a minor cleanup, e.g. don't
set errcode in advance as it can't be optimised.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 41 ++++++++++++++++++++++++++++++++++++-----
 net/core/skbuff.c      | 31 ++++++++++++-------------------
 2 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7fd2b44aada0..8ea145101b56 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1130,11 +1130,42 @@ static inline struct sk_buff *alloc_skb(unsigned int size,
 	return __alloc_skb(size, priority, 0, NUMA_NO_NODE);
 }
 
-struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
-				     unsigned long data_len,
-				     int max_page_order,
-				     int *errcode,
-				     gfp_t gfp_mask);
+struct sk_buff *alloc_skb_frags(struct sk_buff *skb,
+				unsigned long data_len,
+				int max_page_order,
+				int *errcode,
+				gfp_t gfp_mask);
+
+/**
+ * alloc_skb_with_frags - allocate skb with page frags
+ *
+ * @header_len: size of linear part
+ * @data_len: needed length in frags
+ * @max_page_order: max page order desired.
+ * @errcode: pointer to error code if any
+ * @gfp_mask: allocation mask
+ *
+ * This can be used to allocate a paged skb, given a maximal order for frags.
+ */
+static inline struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
+						   unsigned long data_len,
+						   int max_page_order,
+						   int *errcode,
+						   gfp_t gfp_mask)
+{
+	struct sk_buff *skb;
+
+	skb = alloc_skb(header_len, gfp_mask);
+	if (unlikely(!skb)) {
+		*errcode = -ENOBUFS;
+		return NULL;
+	}
+
+	if (!data_len)
+		return skb;
+	return alloc_skb_frags(skb, data_len, max_page_order, errcode, gfp_mask);
+}
+
 struct sk_buff *alloc_skb_for_msg(struct sk_buff *first);
 
 /* Layout of fast clones : [skb1][skb2][fclone_ref] */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a9b8ac38dc1a..7811dde22f26 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5922,40 +5922,32 @@ int skb_mpls_dec_ttl(struct sk_buff *skb)
 EXPORT_SYMBOL_GPL(skb_mpls_dec_ttl);
 
 /**
- * alloc_skb_with_frags - allocate skb with page frags
+ * alloc_skb_frags - allocate page frags for skb
  *
- * @header_len: size of linear part
+ * @skb: buffer
  * @data_len: needed length in frags
  * @max_page_order: max page order desired.
  * @errcode: pointer to error code if any
  * @gfp_mask: allocation mask
  *
- * This can be used to allocate a paged skb, given a maximal order for frags.
+ * This can be used to allocate pages for skb, given a maximal order for frags.
  */
-struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
-				     unsigned long data_len,
-				     int max_page_order,
-				     int *errcode,
-				     gfp_t gfp_mask)
+struct sk_buff *alloc_skb_frags(struct sk_buff *skb,
+				unsigned long data_len,
+				int max_page_order,
+				int *errcode,
+				gfp_t gfp_mask)
 {
 	int npages = (data_len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;
 	unsigned long chunk;
-	struct sk_buff *skb;
 	struct page *page;
 	int i;
 
-	*errcode = -EMSGSIZE;
 	/* Note this test could be relaxed, if we succeed to allocate
 	 * high order pages...
 	 */
-	if (npages > MAX_SKB_FRAGS)
-		return NULL;
-
-	*errcode = -ENOBUFS;
-	skb = alloc_skb(header_len, gfp_mask);
-	if (!skb)
-		return NULL;
-
+	if (unlikely(npages > MAX_SKB_FRAGS))
+		goto failure;
 	skb->truesize += npages << PAGE_SHIFT;
 
 	for (i = 0; npages > 0; i++) {
@@ -5989,9 +5981,10 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
 
 failure:
 	kfree_skb(skb);
+	*errcode = -EMSGSIZE;
 	return NULL;
 }
-EXPORT_SYMBOL(alloc_skb_with_frags);
+EXPORT_SYMBOL(alloc_skb_frags);
 
 /* carve out the first off bytes from skb when off < headlen */
 static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (11 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 12/14] skbuff: optimise alloc_skb_with_frags() Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  2022-01-11  9:24   ` David Laight
  2022-01-15 13:00   ` kernel test robot
  2022-01-11  1:21 ` [PATCH 14/14] net: inline sock_alloc_send_skb Pavel Begunkov
  13 siblings, 2 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

Inline a HW csum'ed part of skb_csum_hwoffload_help().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/netdevice.h | 16 ++++++++++++++--
 net/core/dev.c            | 13 +++----------
 2 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3213c7227b59..fbe6c764ce57 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
 
 int skb_checksum_help(struct sk_buff *skb);
 int skb_crc32c_csum_help(struct sk_buff *skb);
-int skb_csum_hwoffload_help(struct sk_buff *skb,
-			    const netdev_features_t features);
+int __skb_csum_hwoffload_help(struct sk_buff *skb,
+			      const netdev_features_t features);
+
+static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
+					  const netdev_features_t features)
+{
+	if (unlikely(skb_csum_is_sctp(skb)))
+		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
+			skb_crc32c_csum_help(skb);
+
+	if (features & NETIF_F_HW_CSUM)
+		return 0;
+	return __skb_csum_hwoffload_help(skb, features);
+}
 
 struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 				  netdev_features_t features, bool tx_path);
diff --git a/net/core/dev.c b/net/core/dev.c
index 877ebc0f72bd..e65a3b311810 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3513,16 +3513,9 @@ static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb,
 	return skb;
 }
 
-int skb_csum_hwoffload_help(struct sk_buff *skb,
-			    const netdev_features_t features)
+int __skb_csum_hwoffload_help(struct sk_buff *skb,
+			      const netdev_features_t features)
 {
-	if (unlikely(skb_csum_is_sctp(skb)))
-		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
-			skb_crc32c_csum_help(skb);
-
-	if (features & NETIF_F_HW_CSUM)
-		return 0;
-
 	if (features & (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM)) {
 		switch (skb->csum_offset) {
 		case offsetof(struct tcphdr, check):
@@ -3533,7 +3526,7 @@ int skb_csum_hwoffload_help(struct sk_buff *skb,
 
 	return skb_checksum_help(skb);
 }
-EXPORT_SYMBOL(skb_csum_hwoffload_help);
+EXPORT_SYMBOL(__skb_csum_hwoffload_help);
 
 static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device *dev, bool *again)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 14/14] net: inline sock_alloc_send_skb
  2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
                   ` (12 preceding siblings ...)
  2022-01-11  1:21 ` [PATCH 13/14] net: inline part of skb_csum_hwoffload_help Pavel Begunkov
@ 2022-01-11  1:21 ` Pavel Begunkov
  13 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11  1:21 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel, Pavel Begunkov

sock_alloc_send_skb() is simple and just proxying to another function,
so we can inline it and cut associated overhead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/sock.h | 10 ++++++++--
 net/core/sock.c    |  7 -------
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 7b4b4237e6e0..cde35481a152 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1810,11 +1810,17 @@ int sock_getsockopt(struct socket *sock, int level, int op,
 		    char __user *optval, int __user *optlen);
 int sock_gettstamp(struct socket *sock, void __user *userstamp,
 		   bool timeval, bool time32);
-struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
-				    int noblock, int *errcode);
 struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 				     unsigned long data_len, int noblock,
 				     int *errcode, int max_page_order);
+
+static inline struct sk_buff *sock_alloc_send_skb(struct sock *sk,
+						  unsigned long size,
+						  int noblock, int *errcode)
+{
+	return sock_alloc_send_pskb(sk, size, 0, noblock, errcode, 0);
+}
+
 void *sock_kmalloc(struct sock *sk, int size, gfp_t priority);
 void sock_kfree_s(struct sock *sk, void *mem, int size);
 void sock_kzfree_s(struct sock *sk, void *mem, int size);
diff --git a/net/core/sock.c b/net/core/sock.c
index e21485ab285d..25a266a429d4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2592,13 +2592,6 @@ struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
 }
 EXPORT_SYMBOL(sock_alloc_send_pskb);
 
-struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size,
-				    int noblock, int *errcode)
-{
-	return sock_alloc_send_pskb(sk, size, 0, noblock, errcode, 0);
-}
-EXPORT_SYMBOL(sock_alloc_send_skb);
-
 int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 		     struct sockcm_cookie *sockc)
 {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* RE: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11  1:21 ` [PATCH 13/14] net: inline part of skb_csum_hwoffload_help Pavel Begunkov
@ 2022-01-11  9:24   ` David Laight
  2022-01-11 16:59     ` Pavel Begunkov
  2022-01-15 13:00   ` kernel test robot
  1 sibling, 1 reply; 28+ messages in thread
From: David Laight @ 2022-01-11  9:24 UTC (permalink / raw)
  To: 'Pavel Begunkov', netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

From: Pavel Begunkov
> Sent: 11 January 2022 01:22
> 
> Inline a HW csum'ed part of skb_csum_hwoffload_help().
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/netdevice.h | 16 ++++++++++++++--
>  net/core/dev.c            | 13 +++----------
>  2 files changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3213c7227b59..fbe6c764ce57 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
> 
>  int skb_checksum_help(struct sk_buff *skb);
>  int skb_crc32c_csum_help(struct sk_buff *skb);
> -int skb_csum_hwoffload_help(struct sk_buff *skb,
> -			    const netdev_features_t features);
> +int __skb_csum_hwoffload_help(struct sk_buff *skb,
> +			      const netdev_features_t features);
> +
> +static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
> +					  const netdev_features_t features)
> +{
> +	if (unlikely(skb_csum_is_sctp(skb)))
> +		return !!(features & NETIF_F_SCTP_CRC) ? 0 :

If that !! doing anything? - doesn't look like it.

> +			skb_crc32c_csum_help(skb);
> +
> +	if (features & NETIF_F_HW_CSUM)
> +		return 0;
> +	return __skb_csum_hwoffload_help(skb, features);
> +}

Maybe you should remove some bloat by moving the sctp code
into the called function.
This probably needs something like?

{
	if (features & NETIF_F_HW_CSUM && !skb_csum_is_sctp(skb))
		return 0;
	return __skb_csum_hw_offload(skb, features);
}

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-11  1:21 ` [PATCH 09/14] ipv6: hand dst refs to cork setup Pavel Begunkov
@ 2022-01-11 15:39   ` Willem de Bruijn
  2022-01-11 15:57     ` Pavel Begunkov
  2022-01-11 17:11   ` Paolo Abeni
  1 sibling, 1 reply; 28+ messages in thread
From: Willem de Bruijn @ 2022-01-11 15:39 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: netdev, David S . Miller, Jakub Kicinski, Hideaki YOSHIFUJI,
	David Ahern, Eric Dumazet, Willem de Bruijn, linux-kernel

On Mon, Jan 10, 2022 at 8:25 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> During cork->dst setup, ip6_make_skb() gets an additional reference to
> a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
> ip6_make_skb(), and so we can save two additional atomics by passing
> dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
> it's enough to make sure it doesn't use dst afterwards.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---

There are two patches 9/14

>  net/ipv6/ip6_output.c | 9 ++++++---
>  net/ipv6/udp.c        | 3 ++-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 0cc490f2cfbf..6a7bba4dd04d 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1356,6 +1356,8 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>         unsigned int mtu;
>         struct ipv6_txoptions *nopt, *opt = ipc6->opt;
>
> +       cork->base.dst = &rt->dst;
> +

Is there a reason to move this up from its original location next to
the other cork initialization assignments?

That the reference is taken in ip6_append_data for corked requests
(once, in setup cork branch) and inherited from udpv6_send_skb
otherwise is non-trivial. Worth a comment.

>         /*
>          * setup for corking
>          */
> @@ -1389,8 +1391,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>
>                 /* need source address above miyazawa*/
>         }
> -       dst_hold(&rt->dst);
> -       cork->base.dst = &rt->dst;
>         v6_cork->hop_limit = ipc6->hlimit;
>         v6_cork->tclass = ipc6->tclass;
>         if (rt->dst.flags & DST_XFRM_TUNNEL)
> @@ -1784,6 +1784,7 @@ int ip6_append_data(struct sock *sk,
>                 /*
>                  * setup for corking
>                  */
> +               dst_hold(&rt->dst);
>                 err = ip6_setup_cork(sk, &inet->cork, &np->cork,
>                                      ipc6, rt);
>                 if (err)
> @@ -1974,8 +1975,10 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
>         int exthdrlen = (ipc6->opt ? ipc6->opt->opt_flen : 0);
>         int err;
>
> -       if (flags & MSG_PROBE)
> +       if (flags & MSG_PROBE) {
> +               dst_release(&rt->dst);
>                 return NULL;
> +       }
>
>         __skb_queue_head_init(&queue);
>
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index eec83e34ae27..3039dff7fe64 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1541,7 +1541,8 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>                 err = PTR_ERR(skb);
>                 if (!IS_ERR_OR_NULL(skb))
>                         err = udp_v6_send_skb(skb, fl6, &cork.base);
> -               goto out;
> +               /* ip6_make_skb steals dst reference */
> +               goto out_no_dst;
>         }
>
>         lock_sock(sk);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-11 15:39   ` Willem de Bruijn
@ 2022-01-11 15:57     ` Pavel Begunkov
  0 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11 15:57 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: netdev, David S . Miller, Jakub Kicinski, Hideaki YOSHIFUJI,
	David Ahern, Eric Dumazet, linux-kernel

On 1/11/22 15:39, Willem de Bruijn wrote:
> On Mon, Jan 10, 2022 at 8:25 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> During cork->dst setup, ip6_make_skb() gets an additional reference to
>> a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
>> ip6_make_skb(), and so we can save two additional atomics by passing
>> dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
>> it's enough to make sure it doesn't use dst afterwards.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
> 
> There are two patches 9/14

Weird, thanks for letting know

> 
>>   net/ipv6/ip6_output.c | 9 ++++++---
>>   net/ipv6/udp.c        | 3 ++-
>>   2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 0cc490f2cfbf..6a7bba4dd04d 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1356,6 +1356,8 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>>          unsigned int mtu;
>>          struct ipv6_txoptions *nopt, *opt = ipc6->opt;
>>
>> +       cork->base.dst = &rt->dst;
>> +
> 
> Is there a reason to move this up from its original location next to
> the other cork initialization assignments?

ip6_setup_cork() consumes a dst ref now even in error cases, moved
it to not patch up all error returns in there. On the other hand
I can add dst_release() to callers when it failed.


> That the reference is taken in ip6_append_data for corked requests
> (once, in setup cork branch) and inherited from udpv6_send_skb
> otherwise is non-trivial. Worth a comment.

Will update in v2, thanks

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11  9:24   ` David Laight
@ 2022-01-11 16:59     ` Pavel Begunkov
  2022-01-11 17:25       ` David Laight
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11 16:59 UTC (permalink / raw)
  To: David Laight, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On 1/11/22 09:24, David Laight wrote:
> From: Pavel Begunkov
>> Sent: 11 January 2022 01:22
>>
>> Inline a HW csum'ed part of skb_csum_hwoffload_help().
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/linux/netdevice.h | 16 ++++++++++++++--
>>   net/core/dev.c            | 13 +++----------
>>   2 files changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 3213c7227b59..fbe6c764ce57 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
>>
>>   int skb_checksum_help(struct sk_buff *skb);
>>   int skb_crc32c_csum_help(struct sk_buff *skb);
>> -int skb_csum_hwoffload_help(struct sk_buff *skb,
>> -			    const netdev_features_t features);
>> +int __skb_csum_hwoffload_help(struct sk_buff *skb,
>> +			      const netdev_features_t features);
>> +
>> +static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
>> +					  const netdev_features_t features)
>> +{
>> +	if (unlikely(skb_csum_is_sctp(skb)))
>> +		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
> 
> If that !! doing anything? - doesn't look like it.

It doesn't, but left the original style


>> +			skb_crc32c_csum_help(skb);
>> +
>> +	if (features & NETIF_F_HW_CSUM)
>> +		return 0;
>> +	return __skb_csum_hwoffload_help(skb, features);
>> +}
> 
> Maybe you should remove some bloat by moving the sctp code
> into the called function.
> This probably needs something like?
> 
> {
> 	if (features & NETIF_F_HW_CSUM && !skb_csum_is_sctp(skb))
> 		return 0;
> 	return __skb_csum_hw_offload(skb, features);
> }

I don't like inlining that sctp chunk myself. It seems your way would
need another skb_csum_is_sctp() in __skb_csum_hw_offload(), if so I
don't think it's worth it. Would've been great to put the
NETIF_F_HW_CSUM check first and hide sctp, but don't think it's
correct. Would be great to hear some ideas.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-11  1:21 ` [PATCH 09/14] ipv6: hand dst refs to cork setup Pavel Begunkov
  2022-01-11 15:39   ` Willem de Bruijn
@ 2022-01-11 17:11   ` Paolo Abeni
  2022-01-11 20:39     ` Pavel Begunkov
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2022-01-11 17:11 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On Tue, 2022-01-11 at 01:21 +0000, Pavel Begunkov wrote:
> During cork->dst setup, ip6_make_skb() gets an additional reference to
> a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
> ip6_make_skb(), and so we can save two additional atomics by passing
> dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
> it's enough to make sure it doesn't use dst afterwards.

What about the corked path in udp6_sendmsg()? I mean:

udp6_sendmsg(MSG_MORE) -> ip6_append_data() -> ip6_setup_cork() 

what if ip6_setup_cork() errors out in that path?

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11 16:59     ` Pavel Begunkov
@ 2022-01-11 17:25       ` David Laight
  2022-01-11 20:48         ` Pavel Begunkov
  0 siblings, 1 reply; 28+ messages in thread
From: David Laight @ 2022-01-11 17:25 UTC (permalink / raw)
  To: 'Pavel Begunkov', netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

From: Pavel Begunkov
> Sent: 11 January 2022 16:59
> 
> On 1/11/22 09:24, David Laight wrote:
> > From: Pavel Begunkov
> >> Sent: 11 January 2022 01:22
> >>
> >> Inline a HW csum'ed part of skb_csum_hwoffload_help().
> >>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >>   include/linux/netdevice.h | 16 ++++++++++++++--
> >>   net/core/dev.c            | 13 +++----------
> >>   2 files changed, 17 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >> index 3213c7227b59..fbe6c764ce57 100644
> >> --- a/include/linux/netdevice.h
> >> +++ b/include/linux/netdevice.h
> >> @@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
> >>
> >>   int skb_checksum_help(struct sk_buff *skb);
> >>   int skb_crc32c_csum_help(struct sk_buff *skb);
> >> -int skb_csum_hwoffload_help(struct sk_buff *skb,
> >> -			    const netdev_features_t features);
> >> +int __skb_csum_hwoffload_help(struct sk_buff *skb,
> >> +			      const netdev_features_t features);
> >> +
> >> +static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
> >> +					  const netdev_features_t features)
> >> +{
> >> +	if (unlikely(skb_csum_is_sctp(skb)))
> >> +		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
> >
> > If that !! doing anything? - doesn't look like it.
> 
> It doesn't, but left the original style

It just makes you think it is needed...

> >> +			skb_crc32c_csum_help(skb);
> >> +
> >> +	if (features & NETIF_F_HW_CSUM)
> >> +		return 0;
> >> +	return __skb_csum_hwoffload_help(skb, features);
> >> +}
> >
> > Maybe you should remove some bloat by moving the sctp code
> > into the called function.
> > This probably needs something like?
> >
> > {
> > 	if (features & NETIF_F_HW_CSUM && !skb_csum_is_sctp(skb))
> > 		return 0;
> > 	return __skb_csum_hw_offload(skb, features);
> > }
> 
> I don't like inlining that sctp chunk myself. It seems your way would
> need another skb_csum_is_sctp() in __skb_csum_hw_offload(), if so I
> don't think it's worth it. Would've been great to put the
> NETIF_F_HW_CSUM check first and hide sctp, but don't think it's
> correct. Would be great to hear some ideas.

Given the definition:

static inline bool skb_csum_is_sctp(struct sk_buff *skb)
{
	return skb->csum_not_inet;
}

I wouldn't worry about doing it twice.

Also skb_crc32_csum_help() is only called one.
Make it static (so inlined) and pass 'features' into it.

In reality sctp is such a slow crappy protocol that a few extra
function calls will make diddly-squit difference.
(And yes, we do actually use the sctp stack.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-11 17:11   ` Paolo Abeni
@ 2022-01-11 20:39     ` Pavel Begunkov
  2022-01-12 11:15       ` Paolo Abeni
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11 20:39 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On 1/11/22 17:11, Paolo Abeni wrote:
> On Tue, 2022-01-11 at 01:21 +0000, Pavel Begunkov wrote:
>> During cork->dst setup, ip6_make_skb() gets an additional reference to
>> a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
>> ip6_make_skb(), and so we can save two additional atomics by passing
>> dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
>> it's enough to make sure it doesn't use dst afterwards.
> 
> What about the corked path in udp6_sendmsg()? I mean:

It doesn't change it for callers, so the ref stays with udp6_sendmsg() when
corking. To compensate for ip6_setup_cork() there is an explicit dst_hold()
in ip6_append_data, should be fine.

@@ -1784,6 +1784,7 @@ int ip6_append_data(struct sock *sk,
  		/*
  		 * setup for corking
  		 */
+		dst_hold(&rt->dst);
  		err = ip6_setup_cork(sk, &inet->cork, &np->cork,
  				     ipc6, rt);


I don't care much about corking perf, but might be better to implement
this "handing away" for ip6_append_data() as well to be more consistent
with ip6_make_skb().


> udp6_sendmsg(MSG_MORE) -> ip6_append_data() -> ip6_setup_cork()
> 
> what if ip6_setup_cork() errors out in that path?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11 17:25       ` David Laight
@ 2022-01-11 20:48         ` Pavel Begunkov
  2022-01-12  2:41           ` David Laight
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-11 20:48 UTC (permalink / raw)
  To: David Laight, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On 1/11/22 17:25, David Laight wrote:
> From: Pavel Begunkov
>> Sent: 11 January 2022 16:59
>>
>> On 1/11/22 09:24, David Laight wrote:
>>> From: Pavel Begunkov
>>>> Sent: 11 January 2022 01:22
>>>>
>>>> Inline a HW csum'ed part of skb_csum_hwoffload_help().
>>>>
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
>>>>    include/linux/netdevice.h | 16 ++++++++++++++--
>>>>    net/core/dev.c            | 13 +++----------
>>>>    2 files changed, 17 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 3213c7227b59..fbe6c764ce57 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
>>>>
>>>>    int skb_checksum_help(struct sk_buff *skb);
>>>>    int skb_crc32c_csum_help(struct sk_buff *skb);
>>>> -int skb_csum_hwoffload_help(struct sk_buff *skb,
>>>> -			    const netdev_features_t features);
>>>> +int __skb_csum_hwoffload_help(struct sk_buff *skb,
>>>> +			      const netdev_features_t features);
>>>> +
>>>> +static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
>>>> +					  const netdev_features_t features)
>>>> +{
>>>> +	if (unlikely(skb_csum_is_sctp(skb)))
>>>> +		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
>>>
>>> If that !! doing anything? - doesn't look like it.
>>
>> It doesn't, but left the original style
> 
> It just makes you think it is needed...
> 
>>>> +			skb_crc32c_csum_help(skb);
>>>> +
>>>> +	if (features & NETIF_F_HW_CSUM)
>>>> +		return 0;
>>>> +	return __skb_csum_hwoffload_help(skb, features);
>>>> +}
>>>
>>> Maybe you should remove some bloat by moving the sctp code
>>> into the called function.
>>> This probably needs something like?
>>>
>>> {
>>> 	if (features & NETIF_F_HW_CSUM && !skb_csum_is_sctp(skb))
>>> 		return 0;
>>> 	return __skb_csum_hw_offload(skb, features);
>>> }
>>
>> I don't like inlining that sctp chunk myself. It seems your way would
>> need another skb_csum_is_sctp() in __skb_csum_hw_offload(), if so I
>> don't think it's worth it. Would've been great to put the
>> NETIF_F_HW_CSUM check first and hide sctp, but don't think it's
>> correct. Would be great to hear some ideas.
> 
> Given the definition:
> 
> static inline bool skb_csum_is_sctp(struct sk_buff *skb)
> {
> 	return skb->csum_not_inet;
> }
> 
> I wouldn't worry about doing it twice.
> 
> Also skb_crc32_csum_help() is only called one.
> Make it static (so inlined) and pass 'features' into it.
> 
> In reality sctp is such a slow crappy protocol that a few extra
> function calls will make diddly-squit difference.
> (And yes, we do actually use the sctp stack.)

I was more thinking about non-sctp path without NETIF_F_HW_CSUM


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11 20:48         ` Pavel Begunkov
@ 2022-01-12  2:41           ` David Laight
  2022-01-12 16:43             ` Pavel Begunkov
  0 siblings, 1 reply; 28+ messages in thread
From: David Laight @ 2022-01-12  2:41 UTC (permalink / raw)
  To: 'Pavel Begunkov', netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

From: Pavel Begunkov
> Sent: 11 January 2022 20:48
> On 1/11/22 17:25, David Laight wrote:
> > From: Pavel Begunkov
> >> Sent: 11 January 2022 16:59
> >>
> >> On 1/11/22 09:24, David Laight wrote:
> >>> From: Pavel Begunkov
> >>>> Sent: 11 January 2022 01:22
> >>>>
> >>>> Inline a HW csum'ed part of skb_csum_hwoffload_help().
> >>>>
> >>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >>>> ---
> >>>>    include/linux/netdevice.h | 16 ++++++++++++++--
> >>>>    net/core/dev.c            | 13 +++----------
> >>>>    2 files changed, 17 insertions(+), 12 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>>> index 3213c7227b59..fbe6c764ce57 100644
> >>>> --- a/include/linux/netdevice.h
> >>>> +++ b/include/linux/netdevice.h
> >>>> @@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
> >>>>
> >>>>    int skb_checksum_help(struct sk_buff *skb);
> >>>>    int skb_crc32c_csum_help(struct sk_buff *skb);
> >>>> -int skb_csum_hwoffload_help(struct sk_buff *skb,
> >>>> -			    const netdev_features_t features);
> >>>> +int __skb_csum_hwoffload_help(struct sk_buff *skb,
> >>>> +			      const netdev_features_t features);
> >>>> +
> >>>> +static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
> >>>> +					  const netdev_features_t features)
> >>>> +{
> >>>> +	if (unlikely(skb_csum_is_sctp(skb)))
> >>>> +		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
> >>>
> >>> If that !! doing anything? - doesn't look like it.
> >>
> >> It doesn't, but left the original style
> >
> > It just makes you think it is needed...
> >
> >>>> +			skb_crc32c_csum_help(skb);
> >>>> +
> >>>> +	if (features & NETIF_F_HW_CSUM)
> >>>> +		return 0;
> >>>> +	return __skb_csum_hwoffload_help(skb, features);
> >>>> +}
> >>>
> >>> Maybe you should remove some bloat by moving the sctp code
> >>> into the called function.
> >>> This probably needs something like?
> >>>
> >>> {
> >>> 	if (features & NETIF_F_HW_CSUM && !skb_csum_is_sctp(skb))
> >>> 		return 0;
> >>> 	return __skb_csum_hw_offload(skb, features);
> >>> }
> >>
> >> I don't like inlining that sctp chunk myself. It seems your way would
> >> need another skb_csum_is_sctp() in __skb_csum_hw_offload(), if so I
> >> don't think it's worth it. Would've been great to put the
> >> NETIF_F_HW_CSUM check first and hide sctp, but don't think it's
> >> correct. Would be great to hear some ideas.
> >
> > Given the definition:
> >
> > static inline bool skb_csum_is_sctp(struct sk_buff *skb)
> > {
> > 	return skb->csum_not_inet;
> > }
> >
> > I wouldn't worry about doing it twice.
> >
> > Also skb_crc32_csum_help() is only called one.
> > Make it static (so inlined) and pass 'features' into it.
> >
> > In reality sctp is such a slow crappy protocol that a few extra
> > function calls will make diddly-squit difference.
> > (And yes, we do actually use the sctp stack.)
> 
> I was more thinking about non-sctp path without NETIF_F_HW_CSUM

In which case you need the body of __skb_csum_hw_offload()
and end up doing the 'sctp' check once inside it.
The 'sctp' check is only done twice for sctp.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-11 20:39     ` Pavel Begunkov
@ 2022-01-12 11:15       ` Paolo Abeni
  2022-01-12 16:49         ` Pavel Begunkov
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Abeni @ 2022-01-12 11:15 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On Tue, 2022-01-11 at 20:39 +0000, Pavel Begunkov wrote:
> On 1/11/22 17:11, Paolo Abeni wrote:
> > On Tue, 2022-01-11 at 01:21 +0000, Pavel Begunkov wrote:
> > > During cork->dst setup, ip6_make_skb() gets an additional reference to
> > > a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
> > > ip6_make_skb(), and so we can save two additional atomics by passing
> > > dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
> > > it's enough to make sure it doesn't use dst afterwards.
> > 
> > What about the corked path in udp6_sendmsg()? I mean:
> 
> It doesn't change it for callers, so the ref stays with udp6_sendmsg() when
> corking. To compensate for ip6_setup_cork() there is an explicit dst_hold()
> in ip6_append_data, should be fine.

Whoops, I underlooked that chunk, thanks for pointing it out!

Yes, it looks fine.

> @@ -1784,6 +1784,7 @@ int ip6_append_data(struct sock *sk,
>   		/*
>   		 * setup for corking
>   		 */
> +		dst_hold(&rt->dst);
>   		err = ip6_setup_cork(sk, &inet->cork, &np->cork,
>   				     ipc6, rt);
> 
> 
> I don't care much about corking perf, but might be better to implement
> this "handing away" for ip6_append_data() as well to be more consistent
> with ip6_make_skb().

I'm personally fine with the the added dst_hold() in ip6_append_data()

Thanks!

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-12  2:41           ` David Laight
@ 2022-01-12 16:43             ` Pavel Begunkov
  0 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-12 16:43 UTC (permalink / raw)
  To: David Laight, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On 1/12/22 02:41, David Laight wrote:
> From: Pavel Begunkov
>> Sent: 11 January 2022 20:48
>> On 1/11/22 17:25, David Laight wrote:
>>> From: Pavel Begunkov
>>>> Sent: 11 January 2022 16:59
>>>>
>>>> On 1/11/22 09:24, David Laight wrote:
>>>>> From: Pavel Begunkov
>>>>>> Sent: 11 January 2022 01:22
>>>>>>
>>>>>> Inline a HW csum'ed part of skb_csum_hwoffload_help().
>>>>>>
>>>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>>>> ---
>>>>>>     include/linux/netdevice.h | 16 ++++++++++++++--
>>>>>>     net/core/dev.c            | 13 +++----------
>>>>>>     2 files changed, 17 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 3213c7227b59..fbe6c764ce57 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -4596,8 +4596,20 @@ void netdev_rss_key_fill(void *buffer, size_t len);
>>>>>>
>>>>>>     int skb_checksum_help(struct sk_buff *skb);
>>>>>>     int skb_crc32c_csum_help(struct sk_buff *skb);
>>>>>> -int skb_csum_hwoffload_help(struct sk_buff *skb,
>>>>>> -			    const netdev_features_t features);
>>>>>> +int __skb_csum_hwoffload_help(struct sk_buff *skb,
>>>>>> +			      const netdev_features_t features);
>>>>>> +
>>>>>> +static inline int skb_csum_hwoffload_help(struct sk_buff *skb,
>>>>>> +					  const netdev_features_t features)
>>>>>> +{
>>>>>> +	if (unlikely(skb_csum_is_sctp(skb)))
>>>>>> +		return !!(features & NETIF_F_SCTP_CRC) ? 0 :
>>>>>
>>>>> If that !! doing anything? - doesn't look like it.
>>>>
>>>> It doesn't, but left the original style
>>>
>>> It just makes you think it is needed...
>>>
>>>>>> +			skb_crc32c_csum_help(skb);
>>>>>> +
>>>>>> +	if (features & NETIF_F_HW_CSUM)
>>>>>> +		return 0;
>>>>>> +	return __skb_csum_hwoffload_help(skb, features);
>>>>>> +}
>>>>>
>>>>> Maybe you should remove some bloat by moving the sctp code
>>>>> into the called function.
>>>>> This probably needs something like?
>>>>>
>>>>> {
>>>>> 	if (features & NETIF_F_HW_CSUM && !skb_csum_is_sctp(skb))
>>>>> 		return 0;
>>>>> 	return __skb_csum_hw_offload(skb, features);
>>>>> }
>>>>
>>>> I don't like inlining that sctp chunk myself. It seems your way would
>>>> need another skb_csum_is_sctp() in __skb_csum_hw_offload(), if so I
>>>> don't think it's worth it. Would've been great to put the
>>>> NETIF_F_HW_CSUM check first and hide sctp, but don't think it's
>>>> correct. Would be great to hear some ideas.
>>>
>>> Given the definition:
>>>
>>> static inline bool skb_csum_is_sctp(struct sk_buff *skb)
>>> {
>>> 	return skb->csum_not_inet;
>>> }
>>>
>>> I wouldn't worry about doing it twice.
>>>
>>> Also skb_crc32_csum_help() is only called one.
>>> Make it static (so inlined) and pass 'features' into it.
>>>
>>> In reality sctp is such a slow crappy protocol that a few extra
>>> function calls will make diddly-squit difference.
>>> (And yes, we do actually use the sctp stack.)
>>
>> I was more thinking about non-sctp path without NETIF_F_HW_CSUM
> 
> In which case you need the body of __skb_csum_hw_offload()
> and end up doing the 'sctp' check once inside it.
> The 'sctp' check is only done twice for sctp.

Ah yes, might be better indeed

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/14] ipv6: hand dst refs to cork setup
  2022-01-12 11:15       ` Paolo Abeni
@ 2022-01-12 16:49         ` Pavel Begunkov
  0 siblings, 0 replies; 28+ messages in thread
From: Pavel Begunkov @ 2022-01-12 16:49 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: Hideaki YOSHIFUJI, David Ahern, Eric Dumazet, Willem de Bruijn,
	linux-kernel

On 1/12/22 11:15, Paolo Abeni wrote:
> On Tue, 2022-01-11 at 20:39 +0000, Pavel Begunkov wrote:
>> On 1/11/22 17:11, Paolo Abeni wrote:
>>> On Tue, 2022-01-11 at 01:21 +0000, Pavel Begunkov wrote:
>>>> During cork->dst setup, ip6_make_skb() gets an additional reference to
>>>> a passed in dst. However, udpv6_sendmsg() doesn't need dst after calling
>>>> ip6_make_skb(), and so we can save two additional atomics by passing
>>>> dst references to ip6_make_skb(). udpv6_sendmsg() is the only caller, so
>>>> it's enough to make sure it doesn't use dst afterwards.
>>>
>>> What about the corked path in udp6_sendmsg()? I mean:
>>
>> It doesn't change it for callers, so the ref stays with udp6_sendmsg() when
>> corking. To compensate for ip6_setup_cork() there is an explicit dst_hold()
>> in ip6_append_data, should be fine.
> 
> Whoops, I underlooked that chunk, thanks for pointing it out!
> 
> Yes, it looks fine.

perfect, thanks


>> @@ -1784,6 +1784,7 @@ int ip6_append_data(struct sock *sk,
>>    		/*
>>    		 * setup for corking
>>    		 */
>> +		dst_hold(&rt->dst);
>>    		err = ip6_setup_cork(sk, &inet->cork, &np->cork,
>>    				     ipc6, rt);
>>
>>
>> I don't care much about corking perf, but might be better to implement
>> this "handing away" for ip6_append_data() as well to be more consistent
>> with ip6_make_skb().
> 
> I'm personally fine with the the added dst_hold() in ip6_append_data()
-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/14] net: inline part of skb_csum_hwoffload_help
  2022-01-11  1:21 ` [PATCH 13/14] net: inline part of skb_csum_hwoffload_help Pavel Begunkov
  2022-01-11  9:24   ` David Laight
@ 2022-01-15 13:00   ` kernel test robot
  1 sibling, 0 replies; 28+ messages in thread
From: kernel test robot @ 2022-01-15 13:00 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 1857 bytes --]

Hi Pavel,

I love your patch! Yet something to improve:

[auto build test ERROR on net/master]
[also build test ERROR on net-next/master klassert-ipsec/master horms-ipvs/master linus/master v5.16 next-20220115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Pavel-Begunkov/udp-optimisation/20220111-092653
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git dd3ca4c5184ea98e40acb8eb293d85b88ea04ee2
config: s390-randconfig-r044-20220113 (https://download.01.org/0day-ci/archive/20220115/202201152052.qh2PH81Z-lkp(a)intel.com/config)
compiler: s390-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/e90b74891027be09208b388b5ff57657fce17c4a
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Pavel-Begunkov/udp-optimisation/20220111-092653
        git checkout e90b74891027be09208b388b5ff57657fce17c4a
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=s390 SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "skb_crc32c_csum_help" [net/openvswitch/openvswitch.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-01-15 13:00 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-11  1:21 [PATCH 00/14] udp optimisation Pavel Begunkov
2022-01-11  1:21 ` [PATCH 01/14] ipv6: optimise dst referencing Pavel Begunkov
2022-01-11  1:21 ` [PATCH 02/14] ipv6: shuffle up->pending AF_INET bits Pavel Begunkov
2022-01-11  1:21 ` [PATCH 03/14] ipv6: remove daddr temp buffer in __ip6_make_skb Pavel Begunkov
2022-01-11  1:21 ` [PATCH 04/14] ipv6: clean up cork setup/release Pavel Begunkov
2022-01-11  1:21 ` [PATCH 05/14] ipv6: don't zero cork's flowi after use Pavel Begunkov
2022-01-11  1:21 ` [PATCH 06/14] ipv6: pass full cork into __ip6_append_data() Pavel Begunkov
2022-01-11  1:21 ` [PATCH 07/14] ipv6: pass flow in ip6_make_skb together with cork Pavel Begunkov
2022-01-11  1:21 ` [PATCH 08/14] ipv6/udp: don't make extra copies of iflow Pavel Begunkov
2022-01-11  1:21 ` [PATCH 09/14] ipv6: hand dst refs to cork setup Pavel Begunkov
2022-01-11 15:39   ` Willem de Bruijn
2022-01-11 15:57     ` Pavel Begunkov
2022-01-11 17:11   ` Paolo Abeni
2022-01-11 20:39     ` Pavel Begunkov
2022-01-12 11:15       ` Paolo Abeni
2022-01-12 16:49         ` Pavel Begunkov
2022-01-11  1:21 ` [PATCH 10/14] skbuff: drop zero check from skb_zcopy_set Pavel Begunkov
2022-01-11  1:21 ` [PATCH 11/14] skbuff: drop null check from skb_zcopy Pavel Begunkov
2022-01-11  1:21 ` [PATCH 12/14] skbuff: optimise alloc_skb_with_frags() Pavel Begunkov
2022-01-11  1:21 ` [PATCH 13/14] net: inline part of skb_csum_hwoffload_help Pavel Begunkov
2022-01-11  9:24   ` David Laight
2022-01-11 16:59     ` Pavel Begunkov
2022-01-11 17:25       ` David Laight
2022-01-11 20:48         ` Pavel Begunkov
2022-01-12  2:41           ` David Laight
2022-01-12 16:43             ` Pavel Begunkov
2022-01-15 13:00   ` kernel test robot
2022-01-11  1:21 ` [PATCH 14/14] net: inline sock_alloc_send_skb Pavel Begunkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.