* [PATCH net-next 00/11] UDP/IPv6 refactoring
@ 2022-04-28 10:56 Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init Pavel Begunkov
                   ` (11 more replies)
  0 siblings, 12 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Refactor UDP/IPv6 and especially the udpv6_sendmsg() paths. The end result
looks cleaner than before, and the series also removes a bunch of instructions
and other overhead from the hot path, positively affecting performance.

It was part of a larger series, which includes some perf numbers for it, see
https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/

Pavel Begunkov (11):
  ipv6: optimise ipcm6 cookie init
  udp/ipv6: refactor udpv6_sendmsg udplite checks
  udp/ipv6: move pending section of udpv6_sendmsg
  udp/ipv6: prioritise the ip6 path over ip4 checks
  udp/ipv6: optimise udpv6_sendmsg() daddr checks
  udp/ipv6: optimise out daddr reassignment
  udp/ipv6: clean up udpv6_sendmsg's saddr init
  ipv6: partially inline fl6_update_dst()
  ipv6: refactor opts push in __ip6_make_skb()
  ipv6: improve opt-less __ip6_make_skb()
  ipv6: clean up ip6_setup_cork

 include/net/ipv6.h    |  24 +++----
 net/ipv6/datagram.c   |   4 +-
 net/ipv6/exthdrs.c    |  15 ++--
 net/ipv6/ip6_output.c |  53 +++++++-------
 net/ipv6/raw.c        |   8 +--
 net/ipv6/udp.c        | 158 ++++++++++++++++++++----------------------
 net/l2tp/l2tp_ip6.c   |   8 +--
 7 files changed, 122 insertions(+), 148 deletions(-)

-- 
2.36.0



* [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 14:04   ` Paolo Abeni
  2022-04-28 10:56 ` [PATCH net-next 02/11] udp/ipv6: refactor udpv6_sendmsg udplite checks Pavel Begunkov
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Users of ipcm6_init() have a somewhat complex post-initialisation
of ->dontfrag and ->tclass. Not only does it add overhead, it also
complicates the code.

First, replace ipcm6_init() with ipcm6_init_sk(). As it might not be an
equivalent change, let's first look at ->dontfrag. The logic was to set
it from cmsg if specified and otherwise fall back to np->dontfrag. Now
it's initialised to np->dontfrag in the beginning and then potentially
overridden by cmsg, which is exactly the same behaviour.

It's a bit more complex with ->tclass as ip6_datagram_send_ctl() might
set it to -1, which is the default and not a valid value. The solution
here is to skip -1's specified in cmsg, so it'll be left with the socket
default value, getting us the old behaviour.
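
For illustration only, the equivalence argued above can be checked with a
small userspace model. This is a sketch, not kernel code: the *_model
structs, cookie_old(), cookie_new() and cookie_flows_agree() are
hypothetical names, and it assumes the socket defaults are never negative
and that a dontfrag cmsg only ever carries 0 or 1.

```c
#include <assert.h>

/* Simplified stand-ins for struct ipv6_pinfo and struct ipcm6_cookie;
 * only the two fields discussed above are modelled. */
struct pinfo_model { int tclass; int dontfrag; };
struct cookie_model { int tclass; int dontfrag; };

/* What a control message asks for; has_* == 0 means "not present". */
struct cmsg_model { int has_tclass, tclass, has_dontfrag, dontfrag; };

/* Old flow: start from -1, apply cmsg, patch up the leftovers later. */
struct cookie_model cookie_old(const struct pinfo_model *np,
			       const struct cmsg_model *cm)
{
	struct cookie_model c = { .tclass = -1, .dontfrag = -1 };

	if (cm->has_tclass)
		c.tclass = cm->tclass;		/* may store -1 */
	if (cm->has_dontfrag)
		c.dontfrag = cm->dontfrag;
	if (c.tclass < 0)
		c.tclass = np->tclass;
	if (c.dontfrag < 0)
		c.dontfrag = np->dontfrag;
	return c;
}

/* New flow: start from the socket defaults, let cmsg override them,
 * but ignore a tclass of -1 coming from cmsg. */
struct cookie_model cookie_new(const struct pinfo_model *np,
			       const struct cmsg_model *cm)
{
	struct cookie_model c = { .tclass = np->tclass,
				  .dontfrag = np->dontfrag };

	/* assumption of this model: socket defaults are valid values */
	assert(np->tclass >= 0 && np->dontfrag >= 0);
	if (cm->has_tclass && cm->tclass != -1)
		c.tclass = cm->tclass;
	if (cm->has_dontfrag)
		c.dontfrag = cm->dontfrag;
	return c;
}

/* Returns 1 if both flows agree for every combination tried. */
int cookie_flows_agree(void)
{
	static const int tclasses[] = { -1, 0, 0x2e, 0xff };
	unsigned int bits;

	for (bits = 0; bits < 128; bits++) {
		struct pinfo_model np = {
			.tclass = (bits & 1) ? 0x10 : 0,
			.dontfrag = (bits >> 1) & 1,
		};
		struct cmsg_model cm = {
			.has_tclass = (bits >> 2) & 1,
			.tclass = tclasses[(bits >> 3) & 3],
			.has_dontfrag = (bits >> 5) & 1,
			.dontfrag = (bits >> 6) & 1,
		};
		struct cookie_model a = cookie_old(&np, &cm);
		struct cookie_model b = cookie_new(&np, &cm);

		if (a.tclass != b.tclass || a.dontfrag != b.dontfrag)
			return 0;
	}
	return 1;
}
```

The model only covers ->tclass and ->dontfrag; ->hlimit keeps its late
fixup in the callers and is left out on purpose.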

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/ipv6.h    | 9 ---------
 net/ipv6/datagram.c   | 4 ++--
 net/ipv6/ip6_output.c | 2 --
 net/ipv6/raw.c        | 8 +-------
 net/ipv6/udp.c        | 7 +------
 net/l2tp/l2tp_ip6.c   | 8 +-------
 6 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 213612f1680c..30a3447e34b4 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -352,15 +352,6 @@ struct ipcm6_cookie {
 	struct ipv6_txoptions *opt;
 };
 
-static inline void ipcm6_init(struct ipcm6_cookie *ipc6)
-{
-	*ipc6 = (struct ipcm6_cookie) {
-		.hlimit = -1,
-		.tclass = -1,
-		.dontfrag = -1,
-	};
-}
-
 static inline void ipcm6_init_sk(struct ipcm6_cookie *ipc6,
 				 const struct ipv6_pinfo *np)
 {
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 206f66310a88..1b334bc855ae 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -1003,9 +1003,9 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 			if (tc < -1 || tc > 0xff)
 				goto exit_f;
 
+			if (tc != -1)
+				ipc6->tclass = tc;
 			err = 0;
-			ipc6->tclass = tc;
-
 			break;
 		    }
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 1f3d777e7694..976554d0fdec 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -2001,8 +2001,6 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 		ip6_cork_release(cork, &v6_cork);
 		return ERR_PTR(err);
 	}
-	if (ipc6->dontfrag < 0)
-		ipc6->dontfrag = inet6_sk(sk)->dontfrag;
 
 	err = __ip6_append_data(sk, &queue, cork, &v6_cork,
 				&current->task_frag, getfrag, from,
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 0d7c13d33d1a..4582e432fa9f 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -808,7 +808,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	fl6.flowi6_mark = sk->sk_mark;
 	fl6.flowi6_uid = sk->sk_uid;
 
-	ipcm6_init(&ipc6);
+	ipcm6_init_sk(&ipc6, np);
 	ipc6.sockc.tsflags = sk->sk_tsflags;
 	ipc6.sockc.mark = sk->sk_mark;
 
@@ -920,9 +920,6 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (hdrincl)
 		fl6.flowi6_flags |= FLOWI_FLAG_KNOWN_NH;
 
-	if (ipc6.tclass < 0)
-		ipc6.tclass = np->tclass;
-
 	fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
 
 	dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p);
@@ -933,9 +930,6 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (ipc6.hlimit < 0)
 		ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
 
-	if (ipc6.dontfrag < 0)
-		ipc6.dontfrag = np->dontfrag;
-
 	if (msg->msg_flags&MSG_CONFIRM)
 		goto do_confirm;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index db9449b52dbe..de8382930910 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1313,7 +1313,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	int is_udplite = IS_UDPLITE(sk);
 	int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
 
-	ipcm6_init(&ipc6);
+	ipcm6_init_sk(&ipc6, np);
 	ipc6.gso_size = READ_ONCE(up->gso_size);
 	ipc6.sockc.tsflags = sk->sk_tsflags;
 	ipc6.sockc.mark = sk->sk_mark;
@@ -1518,9 +1518,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6));
 
-	if (ipc6.tclass < 0)
-		ipc6.tclass = np->tclass;
-
 	fl6->flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6->flowlabel);
 
 	dst = ip6_sk_dst_lookup_flow(sk, fl6, final_p, connected);
@@ -1566,8 +1563,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	up->pending = AF_INET6;
 
 do_append_data:
-	if (ipc6.dontfrag < 0)
-		ipc6.dontfrag = np->dontfrag;
 	up->len += ulen;
 	err = ip6_append_data(sk, getfrag, msg, ulen, sizeof(struct udphdr),
 			      &ipc6, fl6, (struct rt6_info *)dst,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 217c7192691e..12406789bb28 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -521,7 +521,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	fl6.flowi6_mark = sk->sk_mark;
 	fl6.flowi6_uid = sk->sk_uid;
 
-	ipcm6_init(&ipc6);
+	ipcm6_init_sk(&ipc6, np);
 
 	if (lsa) {
 		if (addr_len < SIN6_LEN_RFC2133)
@@ -608,9 +608,6 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6));
 
-	if (ipc6.tclass < 0)
-		ipc6.tclass = np->tclass;
-
 	fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
 
 	dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p);
@@ -622,9 +619,6 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (ipc6.hlimit < 0)
 		ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
 
-	if (ipc6.dontfrag < 0)
-		ipc6.dontfrag = np->dontfrag;
-
 	if (msg->msg_flags & MSG_CONFIRM)
 		goto do_confirm;
 
-- 
2.36.0



* [PATCH net-next 02/11] udp/ipv6: refactor udpv6_sendmsg udplite checks
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 14:09   ` Paolo Abeni
  2022-04-28 10:56 ` [PATCH net-next 03/11] udp/ipv6: move pending section of udpv6_sendmsg Pavel Begunkov
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Don't save the IS_UDPLITE() result in advance; evaluate it only where it's
really needed, so it isn't stored to and loaded from the stack. Same for
resolving the getfrag callback pointer.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index de8382930910..705eea080f5e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1310,7 +1310,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	int ulen = len;
 	int corkreq = READ_ONCE(up->corkflag) || msg->msg_flags&MSG_MORE;
 	int err;
-	int is_udplite = IS_UDPLITE(sk);
 	int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
 
 	ipcm6_init_sk(&ipc6, np);
@@ -1371,7 +1370,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (len > INT_MAX - sizeof(struct udphdr))
 		return -EMSGSIZE;
 
-	getfrag  =  is_udplite ?  udplite_getfrag : ip_generic_getfrag;
 	if (up->pending) {
 		if (up->pending == AF_INET)
 			return udp_sendmsg(sk, msg, len);
@@ -1538,6 +1536,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (!corkreq) {
 		struct sk_buff *skb;
 
+		getfrag = IS_UDPLITE(sk) ? udplite_getfrag : ip_generic_getfrag;
 		skb = ip6_make_skb(sk, getfrag, msg, ulen,
 				   sizeof(struct udphdr), &ipc6,
 				   (struct rt6_info *)dst,
@@ -1564,6 +1563,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 do_append_data:
 	up->len += ulen;
+	getfrag = IS_UDPLITE(sk) ? udplite_getfrag : ip_generic_getfrag;
 	err = ip6_append_data(sk, getfrag, msg, ulen, sizeof(struct udphdr),
 			      &ipc6, fl6, (struct rt6_info *)dst,
 			      corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
@@ -1594,7 +1594,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	 */
 	if (err == -ENOBUFS || test_bit(SOCK_NOSPACE, &sk->sk_socket->flags)) {
 		UDP6_INC_STATS(sock_net(sk),
-			       UDP_MIB_SNDBUFERRORS, is_udplite);
+			       UDP_MIB_SNDBUFERRORS, IS_UDPLITE(sk));
 	}
 	return err;
 
-- 
2.36.0



* [PATCH net-next 03/11] udp/ipv6: move pending section of udpv6_sendmsg
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 02/11] udp/ipv6: refactor udpv6_sendmsg udplite checks Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 04/11] udp/ipv6: prioritise the ip6 path over ip4 checks Pavel Begunkov
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Move the up->pending section of udpv6_sendmsg() to the beginning of the
function. Even though it requires some code duplication for sin6 parsing,
it clearly localises the pending handling in one place, removes an extra
if and, more importantly, prepares the code for further patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 67 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 705eea080f5e..d6aedd4dab25 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1317,6 +1317,44 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	ipc6.sockc.tsflags = sk->sk_tsflags;
 	ipc6.sockc.mark = sk->sk_mark;
 
+	/* Rough check on arithmetic overflow,
+	   better check is made in ip6_append_data().
+	   */
+	if (unlikely(len > INT_MAX - sizeof(struct udphdr)))
+		return -EMSGSIZE;
+
+	/* There are pending frames. */
+	if (up->pending) {
+		if (up->pending == AF_INET)
+			return udp_sendmsg(sk, msg, len);
+
+		/* Do a quick destination sanity check before corking. */
+		if (sin6) {
+			if (msg->msg_namelen < offsetof(struct sockaddr, sa_data))
+				return -EINVAL;
+			if (sin6->sin6_family == AF_INET6) {
+				if (msg->msg_namelen < SIN6_LEN_RFC2133)
+					return -EINVAL;
+				if (ipv6_addr_any(&sin6->sin6_addr) &&
+				    ipv6_addr_v4mapped(&np->saddr))
+					return -EINVAL;
+			} else if (sin6->sin6_family != AF_UNSPEC) {
+				return -EINVAL;
+			}
+		}
+
+		/* The socket lock must be held while it's corked. */
+		lock_sock(sk);
+		if (unlikely(up->pending != AF_INET6)) {
+			/* Just now it was seen corked, userspace is buggy */
+			err = up->pending ? -EAFNOSUPPORT : -EINVAL;
+			release_sock(sk);
+			return err;
+		}
+		dst = NULL;
+		goto do_append_data;
+	}
+
 	/* destination address check */
 	if (sin6) {
 		if (addr_len < offsetof(struct sockaddr, sa_data))
@@ -1342,12 +1380,11 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		default:
 			return -EINVAL;
 		}
-	} else if (!up->pending) {
+	} else {
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 		daddr = &sk->sk_v6_daddr;
-	} else
-		daddr = NULL;
+	}
 
 	if (daddr) {
 		if (ipv6_addr_v4mapped(daddr)) {
@@ -1364,30 +1401,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 	}
 
-	/* Rough check on arithmetic overflow,
-	   better check is made in ip6_append_data().
-	   */
-	if (len > INT_MAX - sizeof(struct udphdr))
-		return -EMSGSIZE;
-
-	if (up->pending) {
-		if (up->pending == AF_INET)
-			return udp_sendmsg(sk, msg, len);
-		/*
-		 * There are pending frames.
-		 * The socket lock must be held while it's corked.
-		 */
-		lock_sock(sk);
-		if (likely(up->pending)) {
-			if (unlikely(up->pending != AF_INET6)) {
-				release_sock(sk);
-				return -EAFNOSUPPORT;
-			}
-			dst = NULL;
-			goto do_append_data;
-		}
-		release_sock(sk);
-	}
 	ulen += sizeof(struct udphdr);
 
 	memset(fl6, 0, sizeof(*fl6));
-- 
2.36.0



* [PATCH net-next 04/11] udp/ipv6: prioritise the ip6 path over ip4 checks
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 03/11] udp/ipv6: move pending section of udpv6_sendmsg Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 05/11] udp/ipv6: optimise udpv6_sendmsg() daddr checks Pavel Begunkov
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

For AF_INET6 sockets we care the most about ipv6 and not ip4 mappings, as
the latter require some extra hops anyway. Take the AF_INET6 case out of the
address parsing switch and add an explicit path for it. This removes some
extra ifs from the path and removes the switch overhead.
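
As a rough userspace illustration of the reshuffle (not the kernel code
itself), the two classification orders can be compared side by side. The
enum, the function names and MODEL_SIN6_LEN_RFC2133 are hypothetical; the
constant is assumed to mirror the kernel's SIN6_LEN_RFC2133 value of 24.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Assumed to mirror the kernel's SIN6_LEN_RFC2133. */
#define MODEL_SIN6_LEN_RFC2133 24

enum dest_kind { DEST_IPV6, DEST_IPV4, DEST_NONE, DEST_INVALID };

/* Old order: every address family goes through the switch. */
enum dest_kind classify_switch(const struct sockaddr_in6 *sin6,
			       size_t addr_len)
{
	if (addr_len < offsetof(struct sockaddr, sa_data))
		return DEST_INVALID;

	switch (sin6->sin6_family) {
	case AF_INET6:
		if (addr_len < MODEL_SIN6_LEN_RFC2133)
			return DEST_INVALID;
		return DEST_IPV6;
	case AF_INET:
		return DEST_IPV4;
	case AF_UNSPEC:
		return DEST_NONE;
	default:
		return DEST_INVALID;
	}
}

/* New order: one combined test lets a well-formed AF_INET6 address
 * fall straight through; everything else takes the slower branch. */
enum dest_kind classify_fast(const struct sockaddr_in6 *sin6,
			     size_t addr_len)
{
	if (addr_len < MODEL_SIN6_LEN_RFC2133 ||
	    sin6->sin6_family != AF_INET6) {
		if (addr_len < offsetof(struct sockaddr, sa_data))
			return DEST_INVALID;

		switch (sin6->sin6_family) {
		case AF_INET:
			return DEST_IPV4;
		case AF_UNSPEC:
			return DEST_NONE;
		default:
			return DEST_INVALID;
		}
	}
	return DEST_IPV6;
}

/* Helper: classify a zeroed address of the given family and length. */
enum dest_kind dest_kind_for(int family, size_t addr_len)
{
	struct sockaddr_in6 sin6;

	assert(addr_len <= sizeof(sin6));
	memset(&sin6, 0, sizeof(sin6));
	sin6.sin6_family = family;
	return classify_fast(&sin6, addr_len);
}

/* Helper: 1 if both orders classify the input identically. */
int paths_agree(int family, size_t addr_len)
{
	struct sockaddr_in6 sin6;

	assert(addr_len <= sizeof(sin6));
	memset(&sin6, 0, sizeof(sin6));
	sin6.sin6_family = family;
	return classify_switch(&sin6, addr_len) ==
	       classify_fast(&sin6, addr_len);
}
```

A short AF_INET6 address still ends up rejected: it fails the length part
of the combined test and then hits the default case of the switch.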

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d6aedd4dab25..78ce5fc53b59 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1357,30 +1357,27 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	/* destination address check */
 	if (sin6) {
-		if (addr_len < offsetof(struct sockaddr, sa_data))
-			return -EINVAL;
+		if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
+			if (addr_len < offsetof(struct sockaddr, sa_data))
+				return -EINVAL;
 
-		switch (sin6->sin6_family) {
-		case AF_INET6:
-			if (addr_len < SIN6_LEN_RFC2133)
+			switch (sin6->sin6_family) {
+			case AF_INET:
+				goto do_udp_sendmsg;
+			case AF_UNSPEC:
+				msg->msg_name = sin6 = NULL;
+				msg->msg_namelen = addr_len = 0;
+				goto no_daddr;
+			default:
 				return -EINVAL;
-			daddr = &sin6->sin6_addr;
-			if (ipv6_addr_any(daddr) &&
-			    ipv6_addr_v4mapped(&np->saddr))
-				ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK),
-						       daddr);
-			break;
-		case AF_INET:
-			goto do_udp_sendmsg;
-		case AF_UNSPEC:
-			msg->msg_name = sin6 = NULL;
-			msg->msg_namelen = addr_len = 0;
-			daddr = NULL;
-			break;
-		default:
-			return -EINVAL;
+			}
 		}
+
+		daddr = &sin6->sin6_addr;
+		if (ipv6_addr_any(daddr) && ipv6_addr_v4mapped(&np->saddr))
+			ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK), daddr);
 	} else {
+no_daddr:
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 		daddr = &sk->sk_v6_daddr;
-- 
2.36.0



* [PATCH net-next 05/11] udp/ipv6: optimise udpv6_sendmsg() daddr checks
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 04/11] udp/ipv6: prioritise the ip6 path over ip4 checks Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 06/11] udp/ipv6: optimise out daddr reassignment Pavel Begunkov
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

All paths taking udpv6_sendmsg() to the ipv6_addr_v4mapped() check set a
non-NULL daddr, so we can safely kill the NULL check just before it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 78ce5fc53b59..1f05e165eb17 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1383,19 +1383,18 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		daddr = &sk->sk_v6_daddr;
 	}
 
-	if (daddr) {
-		if (ipv6_addr_v4mapped(daddr)) {
-			struct sockaddr_in sin;
-			sin.sin_family = AF_INET;
-			sin.sin_port = sin6 ? sin6->sin6_port : inet->inet_dport;
-			sin.sin_addr.s_addr = daddr->s6_addr32[3];
-			msg->msg_name = &sin;
-			msg->msg_namelen = sizeof(sin);
+	if (ipv6_addr_v4mapped(daddr)) {
+		struct sockaddr_in sin;
+
+		sin.sin_family = AF_INET;
+		sin.sin_port = sin6 ? sin6->sin6_port : inet->inet_dport;
+		sin.sin_addr.s_addr = daddr->s6_addr32[3];
+		msg->msg_name = &sin;
+		msg->msg_namelen = sizeof(sin);
 do_udp_sendmsg:
-			if (__ipv6_only_sock(sk))
-				return -ENETUNREACH;
-			return udp_sendmsg(sk, msg, len);
-		}
+		if (__ipv6_only_sock(sk))
+			return -ENETUNREACH;
+		return udp_sendmsg(sk, msg, len);
 	}
 
 	ulen += sizeof(struct udphdr);
-- 
2.36.0



* [PATCH net-next 06/11] udp/ipv6: optimise out daddr reassignment
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 05/11] udp/ipv6: optimise udpv6_sendmsg() daddr checks Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 07/11] udp/ipv6: clean up udpv6_sendmsg's saddr init Pavel Begunkov
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

There is nothing that checks daddr placement in udpv6_sendmsg(), so the
check reassigning it to ->sk_v6_daddr looks like an artifact from the past
that is no longer needed. Remove it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 1f05e165eb17..34c5919afa3e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1417,14 +1417,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			}
 		}
 
-		/*
-		 * Otherwise it will be difficult to maintain
-		 * sk->sk_dst_cache.
-		 */
-		if (sk->sk_state == TCP_ESTABLISHED &&
-		    ipv6_addr_equal(daddr, &sk->sk_v6_daddr))
-			daddr = &sk->sk_v6_daddr;
-
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    sin6->sin6_scope_id &&
 		    __ipv6_addr_needs_scope_id(__ipv6_addr_type(daddr)))
-- 
2.36.0



* [PATCH net-next 07/11] udp/ipv6: clean up udpv6_sendmsg's saddr init
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 06/11] udp/ipv6: optimise out daddr reassignment Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 08/11] ipv6: partially inline fl6_update_dst() Pavel Begunkov
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

We initialise fl6 in udpv6_sendmsg() to zeroes, which sets saddr to the any
addr. It might then be changed by cmsg, but only to a non-any addr.
Afterwards we check whether it is still set to "any", which is likely to be
so, and try to initialise it from the socket saddr.

The result is that fl6->saddr is set to cmsg's saddr if specified
and inet6_sk(sk)->saddr otherwise. We can achieve the same by
pre-setting it to the socket's saddr and potentially overriding it by cmsg
afterwards.

This looks a bit cleaner compared to the conditional init and also removes
extra checks from the path.
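
A minimal userspace sketch of that argument, assuming (as stated above)
that cmsg only ever supplies a non-any address. saddr_old(), saddr_new()
and saddr_flows_agree() are hypothetical helpers for illustration, not
kernel functions.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Old flow: zeroed flow, optional cmsg override, then a conditional
 * fallback to the socket source address. */
struct in6_addr saddr_old(const struct in6_addr *sk_saddr,
			  const struct in6_addr *cmsg_saddr)
{
	struct in6_addr saddr;

	memset(&saddr, 0, sizeof(saddr));	/* zeroed flow means "any" */
	if (cmsg_saddr)
		saddr = *cmsg_saddr;
	if (IN6_IS_ADDR_UNSPECIFIED(&saddr) &&
	    !IN6_IS_ADDR_UNSPECIFIED(sk_saddr))
		saddr = *sk_saddr;
	return saddr;
}

/* New flow: preset from the socket, then let cmsg override. */
struct in6_addr saddr_new(const struct in6_addr *sk_saddr,
			  const struct in6_addr *cmsg_saddr)
{
	struct in6_addr saddr = *sk_saddr;

	if (cmsg_saddr) {
		/* assumption carried over from the text above */
		assert(!IN6_IS_ADDR_UNSPECIFIED(cmsg_saddr));
		saddr = *cmsg_saddr;
	}
	return saddr;
}

/* Returns 1 if both flows pick the same source address.
 * cmsg_text == NULL means no source address was given via cmsg. */
int saddr_flows_agree(const char *sk_text, const char *cmsg_text)
{
	struct in6_addr sk, cm, a, b;
	const struct in6_addr *cmsg_saddr = NULL;

	if (inet_pton(AF_INET6, sk_text, &sk) != 1)
		return 0;
	if (cmsg_text) {
		if (inet_pton(AF_INET6, cmsg_text, &cm) != 1)
			return 0;
		cmsg_saddr = &cm;
	}
	a = saddr_old(&sk, cmsg_saddr);
	b = saddr_new(&sk, cmsg_saddr);
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

The addresses used to exercise it can be anything valid, for example the
2001:db8::/32 documentation prefix.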

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 34c5919afa3e..ae774766c116 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1431,14 +1431,15 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		connected = true;
 	}
 
+	fl6->flowi6_uid = sk->sk_uid;
+	fl6->saddr = np->saddr;
+	fl6->daddr = *daddr;
+
 	if (!fl6->flowi6_oif)
 		fl6->flowi6_oif = sk->sk_bound_dev_if;
-
 	if (!fl6->flowi6_oif)
 		fl6->flowi6_oif = np->sticky_pktinfo.ipi6_ifindex;
 
-	fl6->flowi6_uid = sk->sk_uid;
-
 	if (msg->msg_controllen) {
 		opt = &opt_space;
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
@@ -1473,9 +1474,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	fl6->flowi6_proto = sk->sk_protocol;
 	fl6->flowi6_mark = ipc6.sockc.mark;
-	fl6->daddr = *daddr;
-	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
-		fl6->saddr = np->saddr;
 	fl6->fl6_sport = inet->inet_sport;
 
 	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
-- 
2.36.0



* [PATCH net-next 08/11] ipv6: partially inline fl6_update_dst()
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 07/11] udp/ipv6: clean up udpv6_sendmsg's saddr init Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 09/11] ipv6: refactor opts push in __ip6_make_skb() Pavel Begunkov
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

fl6_update_dst() doesn't do anything when there are no opts passed.
Inline the null checking part.
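
The general pattern, sketched in userspace with hypothetical names
(flow_model, txopts_model, update_dst_slow_model() and friends are not
kernel symbols): a static inline wrapper does the cheap NULL checks at the
call site and only calls the out-of-line helper when there is work to do.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins: an int plays the role of an IPv6 address. */
struct srcrt_model { int first_hop; };
struct txopts_model { const struct srcrt_model *srcrt; };
struct flow_model { int daddr; };

/* Counts how often the out-of-line part actually runs. */
int slow_calls;

/* Out-of-line part: assumes opt and opt->srcrt are non-NULL. */
int *update_dst_slow_model(struct flow_model *fl,
			   const struct txopts_model *opt, int *orig)
{
	assert(orig != NULL);
	slow_calls++;
	*orig = fl->daddr;
	fl->daddr = opt->srcrt->first_hop;
	return orig;
}

/* Inline part: the common "no options" case costs two compares and
 * no function call. */
static inline int *update_dst_model(struct flow_model *fl,
				    const struct txopts_model *opt,
				    int *orig)
{
	if (!opt || !opt->srcrt)
		return NULL;
	return update_dst_slow_model(fl, opt, orig);
}

/* Usage: returns how many out-of-line calls one lookup triggered. */
int count_slow_calls(int with_opts, int with_srcrt)
{
	struct srcrt_model rt = { .first_hop = 42 };
	struct txopts_model opt = { .srcrt = with_srcrt ? &rt : NULL };
	struct flow_model fl = { .daddr = 7 };
	int orig = 0;
	int before = slow_calls;

	update_dst_model(&fl, with_opts ? &opt : NULL, &orig);
	return slow_calls - before;
}
```

In a real split the wrapper would live in a header and the slow part in a
separate .c file; here both sit in one file to keep the sketch
self-contained.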

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/ipv6.h | 15 ++++++++++++---
 net/ipv6/exthdrs.c | 15 ++++++---------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 30a3447e34b4..b9848fcd6954 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1094,9 +1094,18 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset, int target,
 
 int ipv6_find_tlv(const struct sk_buff *skb, int offset, int type);
 
-struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
-				const struct ipv6_txoptions *opt,
-				struct in6_addr *orig);
+struct in6_addr *__fl6_update_dst(struct flowi6 *fl6,
+				  const struct ipv6_txoptions *opt,
+				  struct in6_addr *orig);
+
+static inline struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
+					      const struct ipv6_txoptions *opt,
+					      struct in6_addr *orig)
+{
+	if (!opt || !opt->srcrt)
+		return NULL;
+	return __fl6_update_dst(fl6, opt, orig);
+}
 
 /*
  *	socket options (ipv6_sockglue.c)
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index a8d961d3a477..d02c27d4f2c2 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -1367,8 +1367,8 @@ struct ipv6_txoptions *__ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 EXPORT_SYMBOL_GPL(__ipv6_fixup_options);
 
 /**
- * fl6_update_dst - update flowi destination address with info given
- *                  by srcrt option, if any.
+ * __fl6_update_dst - update flowi destination address with info given
+ *                    by srcrt option.
  *
  * @fl6: flowi6 for which daddr is to be updated
  * @opt: struct ipv6_txoptions in which to look for srcrt opt
@@ -1377,13 +1377,10 @@ EXPORT_SYMBOL_GPL(__ipv6_fixup_options);
  * Returns NULL if no txoptions or no srcrt, otherwise returns orig
  * and initial value of fl6->daddr set in orig
  */
-struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
-				const struct ipv6_txoptions *opt,
-				struct in6_addr *orig)
+struct in6_addr *__fl6_update_dst(struct flowi6 *fl6,
+				  const struct ipv6_txoptions *opt,
+				  struct in6_addr *orig)
 {
-	if (!opt || !opt->srcrt)
-		return NULL;
-
 	*orig = fl6->daddr;
 
 	switch (opt->srcrt->type) {
@@ -1405,4 +1402,4 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
 
 	return orig;
 }
-EXPORT_SYMBOL_GPL(fl6_update_dst);
+EXPORT_SYMBOL_GPL(__fl6_update_dst);
-- 
2.36.0



* [PATCH net-next 09/11] ipv6: refactor opts push in __ip6_make_skb()
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 08/11] ipv6: partially inline fl6_update_dst() Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 10/11] ipv6: improve opt-less __ip6_make_skb() Pavel Begunkov
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Don't preload v6_cork->opt before we actually need it, as it's likely to be
saved on the stack and read again for no good reason.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 976554d0fdec..43a541bbcf5f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1855,7 +1855,6 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct net *net = sock_net(sk);
 	struct ipv6hdr *hdr;
-	struct ipv6_txoptions *opt = v6_cork->opt;
 	struct rt6_info *rt = (struct rt6_info *)cork->base.dst;
 	struct flowi6 *fl6 = &cork->fl.u.ip6;
 	unsigned char proto = fl6->flowi6_proto;
@@ -1884,10 +1883,14 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	__skb_pull(skb, skb_network_header_len(skb));
 
 	final_dst = &fl6->daddr;
-	if (opt && opt->opt_flen)
-		ipv6_push_frag_opts(skb, opt, &proto);
-	if (opt && opt->opt_nflen)
-		ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+	if (v6_cork->opt) {
+		struct ipv6_txoptions *opt = v6_cork->opt;
+
+		if (opt->opt_flen)
+			ipv6_push_frag_opts(skb, opt, &proto);
+		if (opt->opt_nflen)
+			ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+	}
 
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
-- 
2.36.0



* [PATCH net-next 10/11] ipv6: improve opt-less __ip6_make_skb()
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 09/11] ipv6: refactor opts push in __ip6_make_skb() Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 10:56 ` [PATCH net-next 11/11] ipv6: clean up ip6_setup_cork Pavel Begunkov
  2022-04-28 14:04 ` [PATCH net-next 00/11] UDP/IPv6 refactoring Paolo Abeni
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

We do a bit of network header pointer shuffling in __ip6_make_skb(),
expecting that ipv6_push_*frag_opts() might change the layout. Avoid it
and the associated overhead when there are no opts.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 43a541bbcf5f..416d14299242 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1880,22 +1880,20 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	/* Allow local fragmentation. */
 	skb->ignore_df = ip6_sk_ignore_df(sk);
-	__skb_pull(skb, skb_network_header_len(skb));
-
 	final_dst = &fl6->daddr;
 	if (v6_cork->opt) {
 		struct ipv6_txoptions *opt = v6_cork->opt;
 
+		__skb_pull(skb, skb_network_header_len(skb));
 		if (opt->opt_flen)
 			ipv6_push_frag_opts(skb, opt, &proto);
 		if (opt->opt_nflen)
 			ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+		skb_push(skb, sizeof(struct ipv6hdr));
+		skb_reset_network_header(skb);
 	}
 
-	skb_push(skb, sizeof(struct ipv6hdr));
-	skb_reset_network_header(skb);
 	hdr = ipv6_hdr(skb);
-
 	ip6_flow_hdr(hdr, v6_cork->tclass,
 		     ip6_make_flowlabel(net, skb, fl6->flowlabel,
 					ip6_autoflowlabel(net, np), fl6));
-- 
2.36.0



* [PATCH net-next 11/11] ipv6: clean up ip6_setup_cork
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 10/11] ipv6: improve opt-less __ip6_make_skb() Pavel Begunkov
@ 2022-04-28 10:56 ` Pavel Begunkov
  2022-04-28 14:04 ` [PATCH net-next 00/11] UDP/IPv6 refactoring Paolo Abeni
  11 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 10:56 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Do a bit of refactoring for ip6_setup_cork(). Cache the xfrm_dst_path()
result so it isn't called twice, reshuffle the ifs so some parts aren't
repeated, and so on.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 416d14299242..a17b26d5f34d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1358,15 +1358,13 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	unsigned int mtu;
 	struct ipv6_txoptions *nopt, *opt = ipc6->opt;
+	struct dst_entry *xrfm_dst;
 
 	/* callers pass dst together with a reference, set it first so
 	 * ip6_cork_release() can put it down even in case of an error.
 	 */
 	cork->base.dst = &rt->dst;
 
-	/*
-	 * setup for corking
-	 */
 	if (opt) {
 		if (WARN_ON(v6_cork->opt))
 			return -EINVAL;
@@ -1399,28 +1397,26 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	}
 	v6_cork->hop_limit = ipc6->hlimit;
 	v6_cork->tclass = ipc6->tclass;
-	if (rt->dst.flags & DST_XFRM_TUNNEL)
-		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
+
+	xrfm_dst = xfrm_dst_path(&rt->dst);
+	if (dst_allfrag(xrfm_dst))
+		cork->base.flags |= IPCORK_ALLFRAG;
+
+	if (np->pmtudisc < IPV6_PMTUDISC_PROBE)
+		mtu = dst_mtu(rt->dst.flags & DST_XFRM_TUNNEL ? &rt->dst : xrfm_dst);
 	else
-		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-			READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst));
-	if (np->frag_size < mtu) {
-		if (np->frag_size)
-			mtu = np->frag_size;
-	}
+		mtu = READ_ONCE(rt->dst.dev->mtu);
+
+	if (np->frag_size < mtu && np->frag_size)
+		mtu = np->frag_size;
+
 	cork->base.fragsize = mtu;
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
 	cork->base.mark = ipc6->sockc.mark;
 	sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
-
-	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
-		cork->base.flags |= IPCORK_ALLFRAG;
 	cork->base.length = 0;
-
 	cork->base.transmit_time = ipc6->sockc.transmit_time;
-
 	return 0;
 }
 
-- 
2.36.0



* Re: [PATCH net-next 00/11] UDP/IPv6 refactoring
  2022-04-28 10:56 [PATCH net-next 00/11] UDP/IPv6 refactoring Pavel Begunkov
                   ` (10 preceding siblings ...)
  2022-04-28 10:56 ` [PATCH net-next 11/11] ipv6: clean up ip6_setup_cork Pavel Begunkov
@ 2022-04-28 14:04 ` Paolo Abeni
  2022-04-28 15:03   ` Pavel Begunkov
  11 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2022-04-28 14:04 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
> cleaner than it was before and the series also removes a bunch of instructions
> and other overhead from the hot path positively affecting performance.
> 
> It was a part of a larger series, there were some perf numbers for it, see
> https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
> 
> Pavel Begunkov (11):
>   ipv6: optimise ipcm6 cookie init
>   udp/ipv6: refactor udpv6_sendmsg udplite checks
>   udp/ipv6: move pending section of udpv6_sendmsg
>   udp/ipv6: prioritise the ip6 path over ip4 checks
>   udp/ipv6: optimise udpv6_sendmsg() daddr checks
>   udp/ipv6: optimise out daddr reassignment
>   udp/ipv6: clean up udpv6_sendmsg's saddr init
>   ipv6: partially inline fl6_update_dst()
>   ipv6: refactor opts push in __ip6_make_skb()
>   ipv6: improve opt-less __ip6_make_skb()
>   ipv6: clean up ip6_setup_cork
> 
>  include/net/ipv6.h    |  24 +++----
>  net/ipv6/datagram.c   |   4 +-
>  net/ipv6/exthdrs.c    |  15 ++--
>  net/ipv6/ip6_output.c |  53 +++++++-------
>  net/ipv6/raw.c        |   8 +--
>  net/ipv6/udp.c        | 158 ++++++++++++++++++++----------------------
>  net/l2tp/l2tp_ip6.c   |   8 +--
>  7 files changed, 122 insertions(+), 148 deletions(-)

Just a general comment here: IMHO the above diffstat is quite
significant and some patches look far from trivial to me.

I think we need a quite significant performance gain to justify the
above. Could you please share your performance data, including the
testing scenario?

Thanks!

Paolo



* Re: [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init
  2022-04-28 10:56 ` [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init Pavel Begunkov
@ 2022-04-28 14:04   ` Paolo Abeni
  2022-04-28 15:27     ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2022-04-28 14:04 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
> Users of ipcm6_init() have a somewhat complex post initialisation
> of ->dontfrag and ->tclass. Not only it adds additional overhead,
> but also complicates the code.
> 
> First, replace ipcm6_init() with ipcm6_init_sk(). As it might be not an
> equivalent change, let's first look at ->dontfrag. The logic was to set
> it from cmsg if specified and otherwise fallback to np->dontfrag. Now
> it's initialising to np->dontfrag in the beginning and then potentially
> overriding with cmsg, which is absolutely the same behaviour.
> 
> It's a bit more complex with ->tclass as ip6_datagram_send_ctl() might
> set it to -1, which is a default and not valid value. The solution
> here is to skip -1's specified in cmsg, so it'll be left with the socket
> default value getting us to the old behaviour.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/net/ipv6.h    | 9 ---------
>  net/ipv6/datagram.c   | 4 ++--
>  net/ipv6/ip6_output.c | 2 --
>  net/ipv6/raw.c        | 8 +-------
>  net/ipv6/udp.c        | 7 +------
>  net/l2tp/l2tp_ip6.c   | 8 +-------
>  6 files changed, 5 insertions(+), 33 deletions(-)
> 
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 213612f1680c..30a3447e34b4 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -352,15 +352,6 @@ struct ipcm6_cookie {
>  	struct ipv6_txoptions *opt;
>  };
>  
> -static inline void ipcm6_init(struct ipcm6_cookie *ipc6)
> -{
> -	*ipc6 = (struct ipcm6_cookie) {
> -		.hlimit = -1,
> -		.tclass = -1,
> -		.dontfrag = -1,
> -	};
> -}
> -
>  static inline void ipcm6_init_sk(struct ipcm6_cookie *ipc6,
>  				 const struct ipv6_pinfo *np)
>  {
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index 206f66310a88..1b334bc855ae 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -1003,9 +1003,9 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>  			if (tc < -1 || tc > 0xff)
>  				goto exit_f;
>  
> +			if (tc != -1)
> +				ipc6->tclass = tc;
>  			err = 0;
> -			ipc6->tclass = tc;
> -
>  			break;
>  		    }

It looks like the above causes a behavioral change: before this patch
cmsg took precedence over the socket state; after this patch it looks
like it's the opposite.

Am I missing something?

Thanks

Paolo



* Re: [PATCH net-next 02/11] udp/ipv6: refactor udpv6_sendmsg udplite checks
  2022-04-28 10:56 ` [PATCH net-next 02/11] udp/ipv6: refactor udpv6_sendmsg udplite checks Pavel Begunkov
@ 2022-04-28 14:09   ` Paolo Abeni
  2022-04-28 15:11     ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2022-04-28 14:09 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
> Don't save a IS_UDPLITE() result in advance but do when it's really
> needed, so it doesn't store/load it from the stack. Same for resolving
> the getfrag callback pointer.

It's quite unclear to me whether this change really brings any
performance benefit. The end result will depend a lot on the
optimizations performed by the compiler, and IMHO the code looked
better before this modification.

Paolo



* Re: [PATCH net-next 00/11] UDP/IPv6 refactoring
  2022-04-28 14:04 ` [PATCH net-next 00/11] UDP/IPv6 refactoring Paolo Abeni
@ 2022-04-28 15:03   ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 15:03 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On 4/28/22 15:04, Paolo Abeni wrote:
> On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
>> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
>> cleaner than it was before and the series also removes a bunch of instructions
>> and other overhead from the hot path positively affecting performance.
>>
>> It was a part of a larger series, there were some perf numbers for it, see
>> https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
>>
>> Pavel Begunkov (11):
>>    ipv6: optimise ipcm6 cookie init
>>    udp/ipv6: refactor udpv6_sendmsg udplite checks
>>    udp/ipv6: move pending section of udpv6_sendmsg
>>    udp/ipv6: prioritise the ip6 path over ip4 checks
>>    udp/ipv6: optimise udpv6_sendmsg() daddr checks
>>    udp/ipv6: optimise out daddr reassignment
>>    udp/ipv6: clean up udpv6_sendmsg's saddr init
>>    ipv6: partially inline fl6_update_dst()
>>    ipv6: refactor opts push in __ip6_make_skb()
>>    ipv6: improve opt-less __ip6_make_skb()
>>    ipv6: clean up ip6_setup_cork
>>
>>   include/net/ipv6.h    |  24 +++----
>>   net/ipv6/datagram.c   |   4 +-
>>   net/ipv6/exthdrs.c    |  15 ++--
>>   net/ipv6/ip6_output.c |  53 +++++++-------
>>   net/ipv6/raw.c        |   8 +--
>>   net/ipv6/udp.c        | 158 ++++++++++++++++++++----------------------
>>   net/l2tp/l2tp_ip6.c   |   8 +--
>>   7 files changed, 122 insertions(+), 148 deletions(-)
> 
> Just a general comment here: IMHO the above diffstat is quite
> significant and some patches look far from trivial to me.
> 
> I think we need a quite significant performance gain to justify the
> above. Could you please share your performance data, including the
> testing scenario?

As mentioned, I benchmarked it with a max-throughput UDP/IPv6 test, and
only as part of a larger series [1]. The result was "2090K vs
2229K tx/s, +6.6%". Taking into account the +3% from the split-out
sock_wfree optimisations, half if not most of the rest should be
attributable to this series, so, a bit hand-wavingly, +1-3%. I can spend
some extra time retesting this particular series if strongly required...
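
The arithmetic behind those figures can be reproduced with a short sketch.
The helper names below are hypothetical, and treating the sock_wfree
contribution as a multiplicative factor is an assumption made only for
illustration, not something stated in the measurements.

```c
/* Relative gain of `after` over `before`, in percent. */
static double rel_gain_pct(double before, double after)
{
	return (after / before - 1.0) * 100.0;
}

/*
 * Gain left over once a known contribution is factored out.
 * Assumption: the two contributions combine multiplicatively.
 */
static double residual_gain_pct(double total_pct, double known_pct)
{
	return ((1.0 + total_pct / 100.0) /
		(1.0 + known_pct / 100.0) - 1.0) * 100.0;
}
```

With the quoted numbers, rel_gain_pct(2090, 2229) comes out at roughly 6.65,
and factoring out 3% leaves roughly 3.5; taking half to most of that gives a
figure in the neighbourhood of the hand-waved +1-3%.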

I was using [2], which is basically an io_uring copy of the send paths of
selftests/net/msg_zerocopy. The gain should be visible with other tools
as well; this one just reduces context switch and similar overhead by
using io_uring.

./send-zc -6 udp -D <address> -t <time> -s16 -z0

It sends a batch of 16-byte UDP/IPv6 (non-zerocopy) send requests over
io_uring, then waits for them to complete and repeats. There were 8
(the default) requests per iteration (i.e. syscall). I was using a dummy
netdev, so there is no actual receiver, but the results correlate well
with my server setup with mlx cards, which just takes more effort for me
to test. All of it was run with mitigations=off.

There might be some fatter targets to optimise, but udpv6_sendmsg()
and the functions around it take a good chunk of cycles as well, though
without particular hotspots. If we want a better justification than
1-3%, then more work needs to be added on top, adding even more to the
diffstat... a vicious cycle.


[1] https://lore.kernel.org/netdev/cover.1648981570.git.asml.silence@gmail.com/
[2] https://github.com/isilence/liburing/blob/zc_v3/test/send-zc.c

-- 
Pavel Begunkov


* Re: [PATCH net-next 02/11] udp/ipv6: refactor udpv6_sendmsg udplite checks
  2022-04-28 14:09   ` Paolo Abeni
@ 2022-04-28 15:11     ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 15:11 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On 4/28/22 15:09, Paolo Abeni wrote:
> On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
>> Don't save a IS_UDPLITE() result in advance but do when it's really
>> needed, so it doesn't store/load it from the stack. Same for resolving
>> the getfrag callback pointer.
> 
> It's quite unclear to me whether this change really brings any
> performance benefit. The end result will depend a lot on the
> optimizations performed by the compiler, and IMHO the code looked
> better before this modification.

There is a lot of code and many function calls between IS_UDPLITE() and
its use sites; because of alias analysis, the compiler will be forced
to evaluate it early in the function and store the result on the stack.
I don't believe it will be able to keep it in a register. But it's
not a problem to drop it.

-- 
Pavel Begunkov


* Re: [PATCH net-next 01/11] ipv6: optimise ipcm6 cookie init
  2022-04-28 14:04   ` Paolo Abeni
@ 2022-04-28 15:27     ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-04-28 15:27 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On 4/28/22 15:04, Paolo Abeni wrote:
> On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
>> Users of ipcm6_init() have a somewhat complex post initialisation
>> of ->dontfrag and ->tclass. Not only it adds additional overhead,
>> but also complicates the code.
>>
>> First, replace ipcm6_init() with ipcm6_init_sk(). As it might be not an
>> equivalent change, let's first look at ->dontfrag. The logic was to set
>> it from cmsg if specified and otherwise fallback to np->dontfrag. Now
>> it's initialising to np->dontfrag in the beginning and then potentially
>> overriding with cmsg, which is absolutely the same behaviour.
>>
>> It's a bit more complex with ->tclass as ip6_datagram_send_ctl() might
>> set it to -1, which is a default and not valid value. The solution
>> here is to skip -1's specified in cmsg, so it'll be left with the socket
>> default value getting us to the old behaviour.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/net/ipv6.h    | 9 ---------
>>   net/ipv6/datagram.c   | 4 ++--
>>   net/ipv6/ip6_output.c | 2 --
>>   net/ipv6/raw.c        | 8 +-------
>>   net/ipv6/udp.c        | 7 +------
>>   net/l2tp/l2tp_ip6.c   | 8 +-------
>>   6 files changed, 5 insertions(+), 33 deletions(-)
>>
>> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
>> index 213612f1680c..30a3447e34b4 100644
>> --- a/include/net/ipv6.h
>> +++ b/include/net/ipv6.h
>> @@ -352,15 +352,6 @@ struct ipcm6_cookie {
>>   	struct ipv6_txoptions *opt;
>>   };
>>   
>> -static inline void ipcm6_init(struct ipcm6_cookie *ipc6)
>> -{
>> -	*ipc6 = (struct ipcm6_cookie) {
>> -		.hlimit = -1,
>> -		.tclass = -1,
>> -		.dontfrag = -1,
>> -	};
>> -}
>> -
>>   static inline void ipcm6_init_sk(struct ipcm6_cookie *ipc6,
>>   				 const struct ipv6_pinfo *np)
>>   {
>> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
>> index 206f66310a88..1b334bc855ae 100644
>> --- a/net/ipv6/datagram.c
>> +++ b/net/ipv6/datagram.c
>> @@ -1003,9 +1003,9 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
>>   			if (tc < -1 || tc > 0xff)
>>   				goto exit_f;
>>   
>> +			if (tc != -1)
>> +				ipc6->tclass = tc;
>>   			err = 0;
>> -			ipc6->tclass = tc;
>> -
>>   			break;
>>   		    }
> 
> It looks like the above causes a behavioral change: before this patch
> cmsg took precedence over the socket state; after this patch it looks
> like it's the opposite.
> 
> Am I missing something?

before:

ipc6.tclass = -1;
if (cmsg)
	ip6_datagram_send_ctl(&ipc6);
if (ipc6.tclass < 0)
	ipc6.tclass = np->tclass;

after:

ipc6.tclass = np->tclass; // ipcm6_init_sk()
if (cmsg)
	ip6_datagram_send_ctl(&ipc6);


Both should prioritise cmsg. The only catch is when tclass is
specified in cmsg but is -1. The old version would assign
np->tclass in the end; the new one does the same thanks to
the added "if" in ip6_datagram_send_ctl(), in the chunk
you quoted. Unless I missed something as well.
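
To make the comparison concrete, here is a small userspace model of the two
flows. It is only a sketch: the function names are hypothetical, the cmsg
handling is reduced to the single tclass assignment, and the socket default
is assumed to lie in 0..0xff.

```c
#include <stdbool.h>

/* Old cmsg handling: store whatever the cmsg carried, including -1. */
static void send_ctl_old(int *tclass, int cmsg_tc)
{
	*tclass = cmsg_tc;
}

/* New cmsg handling: skip -1 so the earlier default survives. */
static void send_ctl_new(int *tclass, int cmsg_tc)
{
	if (cmsg_tc != -1)
		*tclass = cmsg_tc;
}

/* Old flow: start from -1, apply cmsg, then fall back to the socket. */
static int resolve_tclass_old(int sk_tclass, bool has_cmsg, int cmsg_tc)
{
	int tclass = -1;

	if (has_cmsg)
		send_ctl_old(&tclass, cmsg_tc);
	if (tclass < 0)
		tclass = sk_tclass;
	return tclass;
}

/* New flow: start from the socket value, then let cmsg override it. */
static int resolve_tclass_new(int sk_tclass, bool has_cmsg, int cmsg_tc)
{
	int tclass = sk_tclass;

	if (has_cmsg)
		send_ctl_new(&tclass, cmsg_tc);
	return tclass;
}

/*
 * Returns true when both flows agree for every socket default in 0..0xff
 * and every cmsg value in -1..0xff (the range the range check in the
 * quoted chunk lets through), with and without a cmsg.
 */
static bool tclass_models_agree(void)
{
	for (int sk_tc = 0; sk_tc <= 0xff; sk_tc++)
		for (int cmsg_tc = -1; cmsg_tc <= 0xff; cmsg_tc++)
			for (int has_cmsg = 0; has_cmsg <= 1; has_cmsg++)
				if (resolve_tclass_old(sk_tc, has_cmsg, cmsg_tc) !=
				    resolve_tclass_new(sk_tc, has_cmsg, cmsg_tc))
					return false;
	return true;
}
```

In this model a cmsg value other than -1 wins in both flows, and a cmsg value
of -1 leaves the socket default in place in both.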

-- 
Pavel Begunkov

