linux-kernel.vger.kernel.org archive mirror
* [PATCH net-next v3 00/10] UDP/IPv6 refactoring
@ 2022-05-13 15:26 Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 01/10] ipv6: optimise ipcm6 cookie init Pavel Begunkov
                   ` (10 more replies)
  0 siblings, 11 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Refactor the UDP/IPv6 send paths, udpv6_sendmsg() in particular. The result
is cleaner than before, and the series also removes a number of instructions
and other overhead from the hot path, which improves performance.

Testing over a dummy netdev with 16-byte packets yields 2240481 tx/s,
compared to 2203417 tx/s previously, an improvement of around 1.7%.
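
The relative gain can be rechecked from the two figures quoted above. This
is only a small arithmetic sketch; the helper name speedup_percent() is
made up for illustration and is not part of the series.

```c
#include <assert.h>

/* Relative change from "before" to "after", in percent. */
static double speedup_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Calling speedup_percent(2203417.0, 2240481.0) gives the percentage gain for
the numbers in this cover letter.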

v2: no code changes, just resending properly
v3: remove patch moving getfrag callback assignment
    add benchmark numbers

Pavel Begunkov (10):
  ipv6: optimise ipcm6 cookie init
  udp/ipv6: move pending section of udpv6_sendmsg
  udp/ipv6: prioritise the ip6 path over ip4 checks
  udp/ipv6: optimise udpv6_sendmsg() daddr checks
  udp/ipv6: optimise out daddr reassignment
  udp/ipv6: clean up udpv6_sendmsg's saddr init
  ipv6: partially inline fl6_update_dst()
  ipv6: refactor opts push in __ip6_make_skb()
  ipv6: improve opt-less __ip6_make_skb()
  ipv6: clean up ip6_setup_cork

 include/net/ipv6.h    |  24 +++----
 net/ipv6/datagram.c   |   4 +-
 net/ipv6/exthdrs.c    |  15 ++---
 net/ipv6/ip6_output.c |  53 +++++++--------
 net/ipv6/raw.c        |   8 +--
 net/ipv6/udp.c        | 153 ++++++++++++++++++++----------------------
 net/l2tp/l2tp_ip6.c   |   8 +--
 7 files changed, 120 insertions(+), 145 deletions(-)

-- 
2.36.0



* [PATCH net-next v3 01/10] ipv6: optimise ipcm6 cookie init
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg Pavel Begunkov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

The common pattern for setting up ipcm6 cookies is to call ipcm6_init(),
which initialises the ->dontfrag and ->tclass fields to -1, a special
value, and then, if the fields haven't changed, to set them to some
default value. For instance:

ipcm6_init(&ipc6); // ipc6.tclass = -1;
if (cmsg)
    ip6_datagram_send_ctl(&ipc6);
if (ipc6.tclass < 0)
    ipc6.tclass = np->tclass;

This prioritises cmsg over the socket state. This patch switches the
callers to ipcm6_init_sk(), which initially sets those fields to the
socket default values and then lets cmsg override them:

ipcm6_init_sk(&ipc6); // ipc6.tclass = np->tclass;
if (cmsg)
    ip6_datagram_send_ctl(&ipc6);

Each field takes the cmsg value if one is specified and keeps the socket
default otherwise. One difference with this approach arises when cmsg sets
->tclass to the special value, i.e. -1: the old version would catch it and
fall back to the socket default. Thus, this patch also modifies
ip6_datagram_send_ctl() to ignore a cmsg trying to assign -1 to the
->tclass field.
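
The equivalence of the two patterns can be sketched in user space. The
structures and function names below (sock_defaults, cookie, setup_old,
setup_new) are simplified stand-ins invented for illustration, not the
kernel definitions, and the sketch assumes the cmsg value was already
range-checked to [-1, 0xff] as ip6_datagram_send_ctl() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields discussed above are modelled. */
struct sock_defaults { int tclass; int dontfrag; };
struct cookie { int tclass; int dontfrag; };

/* Old pattern: sentinel init, optional cmsg override, late fixup. */
static struct cookie setup_old(const struct sock_defaults *sk,
			       const int *cmsg_tclass)
{
	struct cookie c = { .tclass = -1, .dontfrag = -1 };

	if (cmsg_tclass)
		c.tclass = *cmsg_tclass;
	if (c.tclass < 0)
		c.tclass = sk->tclass;
	if (c.dontfrag < 0)
		c.dontfrag = sk->dontfrag;
	return c;
}

/* New pattern: start from the socket defaults; cmsg overrides unless -1. */
static struct cookie setup_new(const struct sock_defaults *sk,
			       const int *cmsg_tclass)
{
	struct cookie c = { .tclass = sk->tclass, .dontfrag = sk->dontfrag };

	if (cmsg_tclass && *cmsg_tclass != -1)
		c.tclass = *cmsg_tclass;
	return c;
}
```

Without the "!= -1" guard in setup_new(), a cmsg carrying -1 would leave
the sentinel in the cookie, which is the corner case described above.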

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/ipv6.h    | 9 ---------
 net/ipv6/datagram.c   | 4 ++--
 net/ipv6/ip6_output.c | 2 --
 net/ipv6/raw.c        | 8 +-------
 net/ipv6/udp.c        | 7 +------
 net/l2tp/l2tp_ip6.c   | 8 +-------
 6 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 213612f1680c..30a3447e34b4 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -352,15 +352,6 @@ struct ipcm6_cookie {
 	struct ipv6_txoptions *opt;
 };
 
-static inline void ipcm6_init(struct ipcm6_cookie *ipc6)
-{
-	*ipc6 = (struct ipcm6_cookie) {
-		.hlimit = -1,
-		.tclass = -1,
-		.dontfrag = -1,
-	};
-}
-
 static inline void ipcm6_init_sk(struct ipcm6_cookie *ipc6,
 				 const struct ipv6_pinfo *np)
 {
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 39b2327edc4e..3a2ae188d08b 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -1003,9 +1003,9 @@ int ip6_datagram_send_ctl(struct net *net, struct sock *sk,
 			if (tc < -1 || tc > 0xff)
 				goto exit_f;
 
+			if (tc != -1)
+				ipc6->tclass = tc;
 			err = 0;
-			ipc6->tclass = tc;
-
 			break;
 		    }
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index afa5bd4ad167..53c0e33e3899 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -2002,8 +2002,6 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
 		ip6_cork_release(cork, &v6_cork);
 		return ERR_PTR(err);
 	}
-	if (ipc6->dontfrag < 0)
-		ipc6->dontfrag = inet6_sk(sk)->dontfrag;
 
 	err = __ip6_append_data(sk, &queue, cork, &v6_cork,
 				&current->task_frag, getfrag, from,
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 3b7cbd522b54..402e4d9e3f82 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -808,7 +808,7 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	fl6.flowi6_mark = sk->sk_mark;
 	fl6.flowi6_uid = sk->sk_uid;
 
-	ipcm6_init(&ipc6);
+	ipcm6_init_sk(&ipc6, np);
 	ipc6.sockc.tsflags = sk->sk_tsflags;
 	ipc6.sockc.mark = sk->sk_mark;
 
@@ -920,9 +920,6 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (hdrincl)
 		fl6.flowi6_flags |= FLOWI_FLAG_KNOWN_NH;
 
-	if (ipc6.tclass < 0)
-		ipc6.tclass = np->tclass;
-
 	fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
 
 	dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p);
@@ -933,9 +930,6 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (ipc6.hlimit < 0)
 		ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
 
-	if (ipc6.dontfrag < 0)
-		ipc6.dontfrag = np->dontfrag;
-
 	if (msg->msg_flags&MSG_CONFIRM)
 		goto do_confirm;
 
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3fc97d4621ac..11d44ed46953 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1313,7 +1313,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	int is_udplite = IS_UDPLITE(sk);
 	int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
 
-	ipcm6_init(&ipc6);
+	ipcm6_init_sk(&ipc6, np);
 	ipc6.gso_size = READ_ONCE(up->gso_size);
 	ipc6.sockc.tsflags = sk->sk_tsflags;
 	ipc6.sockc.mark = sk->sk_mark;
@@ -1518,9 +1518,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	security_sk_classify_flow(sk, flowi6_to_flowi_common(fl6));
 
-	if (ipc6.tclass < 0)
-		ipc6.tclass = np->tclass;
-
 	fl6->flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6->flowlabel);
 
 	dst = ip6_sk_dst_lookup_flow(sk, fl6, final_p, connected);
@@ -1566,8 +1563,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	up->pending = AF_INET6;
 
 do_append_data:
-	if (ipc6.dontfrag < 0)
-		ipc6.dontfrag = np->dontfrag;
 	up->len += ulen;
 	err = ip6_append_data(sk, getfrag, msg, ulen, sizeof(struct udphdr),
 			      &ipc6, fl6, (struct rt6_info *)dst,
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 217c7192691e..12406789bb28 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -521,7 +521,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	fl6.flowi6_mark = sk->sk_mark;
 	fl6.flowi6_uid = sk->sk_uid;
 
-	ipcm6_init(&ipc6);
+	ipcm6_init_sk(&ipc6, np);
 
 	if (lsa) {
 		if (addr_len < SIN6_LEN_RFC2133)
@@ -608,9 +608,6 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	security_sk_classify_flow(sk, flowi6_to_flowi_common(&fl6));
 
-	if (ipc6.tclass < 0)
-		ipc6.tclass = np->tclass;
-
 	fl6.flowlabel = ip6_make_flowinfo(ipc6.tclass, fl6.flowlabel);
 
 	dst = ip6_dst_lookup_flow(sock_net(sk), sk, &fl6, final_p);
@@ -622,9 +619,6 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	if (ipc6.hlimit < 0)
 		ipc6.hlimit = ip6_sk_dst_hoplimit(np, &fl6, dst);
 
-	if (ipc6.dontfrag < 0)
-		ipc6.dontfrag = np->dontfrag;
-
 	if (msg->msg_flags & MSG_CONFIRM)
 		goto do_confirm;
 
-- 
2.36.0



* [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 01/10] ipv6: optimise ipcm6 cookie init Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-16 13:11   ` Paolo Abeni
  2022-05-13 15:26 ` [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks Pavel Begunkov
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Move the up->pending section of udpv6_sendmsg() to the beginning of the
function. Even though it requires some code duplication for sin6 parsing,
it localises the pending handling in one place, removes an extra if and,
more importantly, prepares the code for further patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 70 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 42 insertions(+), 28 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 11d44ed46953..85bff1252f5c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1318,6 +1318,46 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	ipc6.sockc.tsflags = sk->sk_tsflags;
 	ipc6.sockc.mark = sk->sk_mark;
 
+	/* Rough check on arithmetic overflow,
+	   better check is made in ip6_append_data().
+	   */
+	if (unlikely(len > INT_MAX - sizeof(struct udphdr)))
+		return -EMSGSIZE;
+
+	getfrag  =  is_udplite ?  udplite_getfrag : ip_generic_getfrag;
+
+	/* There are pending frames. */
+	if (up->pending) {
+		if (up->pending == AF_INET)
+			return udp_sendmsg(sk, msg, len);
+
+		/* Do a quick destination sanity check before corking. */
+		if (sin6) {
+			if (msg->msg_namelen < offsetof(struct sockaddr, sa_data))
+				return -EINVAL;
+			if (sin6->sin6_family == AF_INET6) {
+				if (msg->msg_namelen < SIN6_LEN_RFC2133)
+					return -EINVAL;
+				if (ipv6_addr_any(&sin6->sin6_addr) &&
+				    ipv6_addr_v4mapped(&np->saddr))
+					return -EINVAL;
+			} else if (sin6->sin6_family != AF_UNSPEC) {
+				return -EINVAL;
+			}
+		}
+
+		/* The socket lock must be held while it's corked. */
+		lock_sock(sk);
+		if (unlikely(up->pending != AF_INET6)) {
+			/* Just now it was seen corked, userspace is buggy */
+			err = up->pending ? -EAFNOSUPPORT : -EINVAL;
+			release_sock(sk);
+			return err;
+		}
+		dst = NULL;
+		goto do_append_data;
+	}
+
 	/* destination address check */
 	if (sin6) {
 		if (addr_len < offsetof(struct sockaddr, sa_data))
@@ -1343,12 +1383,11 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		default:
 			return -EINVAL;
 		}
-	} else if (!up->pending) {
+	} else {
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 		daddr = &sk->sk_v6_daddr;
-	} else
-		daddr = NULL;
+	}
 
 	if (daddr) {
 		if (ipv6_addr_v4mapped(daddr)) {
@@ -1365,31 +1404,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		}
 	}
 
-	/* Rough check on arithmetic overflow,
-	   better check is made in ip6_append_data().
-	   */
-	if (len > INT_MAX - sizeof(struct udphdr))
-		return -EMSGSIZE;
-
-	getfrag  =  is_udplite ?  udplite_getfrag : ip_generic_getfrag;
-	if (up->pending) {
-		if (up->pending == AF_INET)
-			return udp_sendmsg(sk, msg, len);
-		/*
-		 * There are pending frames.
-		 * The socket lock must be held while it's corked.
-		 */
-		lock_sock(sk);
-		if (likely(up->pending)) {
-			if (unlikely(up->pending != AF_INET6)) {
-				release_sock(sk);
-				return -EAFNOSUPPORT;
-			}
-			dst = NULL;
-			goto do_append_data;
-		}
-		release_sock(sk);
-	}
 	ulen += sizeof(struct udphdr);
 
 	memset(fl6, 0, sizeof(*fl6));
-- 
2.36.0



* [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 01/10] ipv6: optimise ipcm6 cookie init Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-16 13:14   ` Paolo Abeni
  2022-05-13 15:26 ` [PATCH net-next v3 04/10] udp/ipv6: optimise udpv6_sendmsg() daddr checks Pavel Begunkov
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

For AF_INET6 sockets we care most about native IPv6 destinations rather
than IPv4 mappings, as the latter require some extra hops anyway. Take the
AF_INET6 case out of the address parsing switch and add an explicit path
for it. This removes some extra ifs from the path as well as the switch
overhead.
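
The reordering can be sketched in user space as a single up-front test for
the common case, with everything else pushed into a slow branch. The names
classify_dest(), enum dest_kind and SKETCH_SIN6_LEN_RFC2133 are invented
for this illustration; the constant is assumed to mirror the kernel's
SIN6_LEN_RFC2133 value of 24, and sin6 is assumed to be non-NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define SKETCH_SIN6_LEN_RFC2133 24	/* assumed kernel value */

enum dest_kind { DEST_V6, DEST_V4, DEST_NONE };

/* Returns 0 and fills *kind, or a negative errno for a bad address. */
static int classify_dest(const struct sockaddr_in6 *sin6, size_t addr_len,
			 enum dest_kind *kind)
{
	/* Slow path: anything that is not a full-sized AF_INET6 address. */
	if (addr_len < SKETCH_SIN6_LEN_RFC2133 ||
	    sin6->sin6_family != AF_INET6) {
		if (addr_len < offsetof(struct sockaddr, sa_data))
			return -EINVAL;

		switch (sin6->sin6_family) {
		case AF_INET:
			*kind = DEST_V4;
			return 0;
		case AF_UNSPEC:
			*kind = DEST_NONE;
			return 0;
		default:
			/* Includes a truncated AF_INET6 address. */
			return -EINVAL;
		}
	}

	/* Fast path: one combined test, no switch. */
	*kind = DEST_V6;
	return 0;
}
```

A truncated AF_INET6 address still ends up in the default case, so it is
rejected with -EINVAL just as it was by the old length check inside the
AF_INET6 case.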

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 85bff1252f5c..e0b1bea998ce 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1360,30 +1360,27 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	/* destination address check */
 	if (sin6) {
-		if (addr_len < offsetof(struct sockaddr, sa_data))
-			return -EINVAL;
+		if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
+			if (addr_len < offsetof(struct sockaddr, sa_data))
+				return -EINVAL;
 
-		switch (sin6->sin6_family) {
-		case AF_INET6:
-			if (addr_len < SIN6_LEN_RFC2133)
+			switch (sin6->sin6_family) {
+			case AF_INET:
+				goto do_udp_sendmsg;
+			case AF_UNSPEC:
+				msg->msg_name = sin6 = NULL;
+				msg->msg_namelen = addr_len = 0;
+				goto no_daddr;
+			default:
 				return -EINVAL;
-			daddr = &sin6->sin6_addr;
-			if (ipv6_addr_any(daddr) &&
-			    ipv6_addr_v4mapped(&np->saddr))
-				ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK),
-						       daddr);
-			break;
-		case AF_INET:
-			goto do_udp_sendmsg;
-		case AF_UNSPEC:
-			msg->msg_name = sin6 = NULL;
-			msg->msg_namelen = addr_len = 0;
-			daddr = NULL;
-			break;
-		default:
-			return -EINVAL;
+			}
 		}
+
+		daddr = &sin6->sin6_addr;
+		if (ipv6_addr_any(daddr) && ipv6_addr_v4mapped(&np->saddr))
+			ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK), daddr);
 	} else {
+no_daddr:
 		if (sk->sk_state != TCP_ESTABLISHED)
 			return -EDESTADDRREQ;
 		daddr = &sk->sk_v6_daddr;
-- 
2.36.0



* [PATCH net-next v3 04/10] udp/ipv6: optimise udpv6_sendmsg() daddr checks
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 05/10] udp/ipv6: optimise out daddr reassignment Pavel Begunkov
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

All paths in udpv6_sendmsg() that reach the ipv6_addr_v4mapped() check set
a non-NULL daddr, so we can safely kill the NULL check just before it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 23 +++++++++++------------
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index e0b1bea998ce..8a37e2d7b14b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1386,19 +1386,18 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		daddr = &sk->sk_v6_daddr;
 	}
 
-	if (daddr) {
-		if (ipv6_addr_v4mapped(daddr)) {
-			struct sockaddr_in sin;
-			sin.sin_family = AF_INET;
-			sin.sin_port = sin6 ? sin6->sin6_port : inet->inet_dport;
-			sin.sin_addr.s_addr = daddr->s6_addr32[3];
-			msg->msg_name = &sin;
-			msg->msg_namelen = sizeof(sin);
+	if (ipv6_addr_v4mapped(daddr)) {
+		struct sockaddr_in sin;
+
+		sin.sin_family = AF_INET;
+		sin.sin_port = sin6 ? sin6->sin6_port : inet->inet_dport;
+		sin.sin_addr.s_addr = daddr->s6_addr32[3];
+		msg->msg_name = &sin;
+		msg->msg_namelen = sizeof(sin);
 do_udp_sendmsg:
-			if (ipv6_only_sock(sk))
-				return -ENETUNREACH;
-			return udp_sendmsg(sk, msg, len);
-		}
+		if (ipv6_only_sock(sk))
+			return -ENETUNREACH;
+		return udp_sendmsg(sk, msg, len);
 	}
 
 	ulen += sizeof(struct udphdr);
-- 
2.36.0



* [PATCH net-next v3 05/10] udp/ipv6: optimise out daddr reassignment
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 04/10] udp/ipv6: optimise udpv6_sendmsg() daddr checks Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 06/10] udp/ipv6: clean up udpv6_sendmsg's saddr init Pavel Begunkov
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Nothing in udpv6_sendmsg() checks where daddr points, so the check
reassigning it to ->sk_v6_daddr looks like an artifact from the past that
is no longer needed. Remove it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 8a37e2d7b14b..61dbe2f04675 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1420,14 +1420,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 			}
 		}
 
-		/*
-		 * Otherwise it will be difficult to maintain
-		 * sk->sk_dst_cache.
-		 */
-		if (sk->sk_state == TCP_ESTABLISHED &&
-		    ipv6_addr_equal(daddr, &sk->sk_v6_daddr))
-			daddr = &sk->sk_v6_daddr;
-
 		if (addr_len >= sizeof(struct sockaddr_in6) &&
 		    sin6->sin6_scope_id &&
 		    __ipv6_addr_needs_scope_id(__ipv6_addr_type(daddr)))
-- 
2.36.0



* [PATCH net-next v3 06/10] udp/ipv6: clean up udpv6_sendmsg's saddr init
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 05/10] udp/ipv6: optimise out daddr reassignment Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 07/10] ipv6: partially inline fl6_update_dst() Pavel Begunkov
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

We initialise fl6 in udpv6_sendmsg() to zeroes, which sets saddr to the
"any" address; it might then be changed by cmsg, but only to a non-any
address. Afterwards we check whether it is still set to "any", which is
likely, and if so try to initialise it from the socket saddr.

The result is that fl6->saddr is set to cmsg's saddr if specified and to
inet6_sk(sk)->saddr otherwise. We can achieve the same by pre-setting it
to the socket's saddr and potentially overriding it by cmsg afterwards.

This looks a bit cleaner compared to the conditional init and also removes
extra checks from the path.
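
A user-space sketch of the two orderings, under the assumption stated
above that a cmsg-supplied source address is never the "any" address. The
function names saddr_old() and saddr_new() are invented for illustration,
and a NULL cmsg_saddr stands for "no source address given via cmsg".

```c
#include <assert.h>
#include <string.h>
#include <netinet/in.h>

/* Old: zero-init, optional cmsg override, late fallback to the socket. */
static struct in6_addr saddr_old(const struct in6_addr *sk_saddr,
				 const struct in6_addr *cmsg_saddr)
{
	struct in6_addr saddr;

	memset(&saddr, 0, sizeof(saddr));
	if (cmsg_saddr)
		saddr = *cmsg_saddr;
	if (IN6_IS_ADDR_UNSPECIFIED(&saddr) &&
	    !IN6_IS_ADDR_UNSPECIFIED(sk_saddr))
		saddr = *sk_saddr;
	return saddr;
}

/* New: pre-set from the socket, then let cmsg override it. */
static struct in6_addr saddr_new(const struct in6_addr *sk_saddr,
				 const struct in6_addr *cmsg_saddr)
{
	struct in6_addr saddr = *sk_saddr;

	if (cmsg_saddr)
		saddr = *cmsg_saddr;
	return saddr;
}
```

Both functions return the same address for every input that satisfies the
precondition, including the case where the socket saddr itself is "any".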

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/udp.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 61dbe2f04675..9bd317c2b67f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1434,14 +1434,15 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		connected = true;
 	}
 
+	fl6->flowi6_uid = sk->sk_uid;
+	fl6->saddr = np->saddr;
+	fl6->daddr = *daddr;
+
 	if (!fl6->flowi6_oif)
 		fl6->flowi6_oif = sk->sk_bound_dev_if;
-
 	if (!fl6->flowi6_oif)
 		fl6->flowi6_oif = np->sticky_pktinfo.ipi6_ifindex;
 
-	fl6->flowi6_uid = sk->sk_uid;
-
 	if (msg->msg_controllen) {
 		opt = &opt_space;
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
@@ -1476,9 +1477,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 	fl6->flowi6_proto = sk->sk_protocol;
 	fl6->flowi6_mark = ipc6.sockc.mark;
-	fl6->daddr = *daddr;
-	if (ipv6_addr_any(&fl6->saddr) && !ipv6_addr_any(&np->saddr))
-		fl6->saddr = np->saddr;
 	fl6->fl6_sport = inet->inet_sport;
 
 	if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) {
-- 
2.36.0



* [PATCH net-next v3 07/10] ipv6: partially inline fl6_update_dst()
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 06/10] udp/ipv6: clean up udpv6_sendmsg's saddr init Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 08/10] ipv6: refactor opts push in __ip6_make_skb() Pavel Begunkov
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

fl6_update_dst() doesn't do anything when there are no opts passed.
Inline the null checking part.
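
The pattern is a cheap inline wrapper in front of an out-of-line worker. A
minimal user-space sketch follows; addresses are modelled as plain ints,
and every name in it (tx_opts, route_hdr, update_dst, update_dst_slow,
slow_calls) is hypothetical and only loosely modelled on the kernel code.

```c
#include <assert.h>
#include <stddef.h>

struct route_hdr { int first_hop; };
struct tx_opts { const struct route_hdr *srcrt; };

static int slow_calls;	/* counts how often the worker really runs */

/* Out-of-line part: only reached when a source route is present. */
static int *update_dst_slow(int *daddr, const struct tx_opts *opt, int *orig)
{
	slow_calls++;
	*orig = *daddr;			/* remember the final destination */
	*daddr = opt->srcrt->first_hop;	/* route via the first hop */
	return orig;
}

/* Inline part: callers without options pay only for two pointer tests. */
static inline int *update_dst(int *daddr, const struct tx_opts *opt, int *orig)
{
	if (!opt || !opt->srcrt)
		return NULL;
	return update_dst_slow(daddr, opt, orig);
}
```

In the common option-less case the compiler can fold the wrapper into the
caller, so no function call is made at all.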

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/net/ipv6.h | 15 ++++++++++++---
 net/ipv6/exthdrs.c | 15 ++++++---------
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 30a3447e34b4..b9848fcd6954 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -1094,9 +1094,18 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset, int target,
 
 int ipv6_find_tlv(const struct sk_buff *skb, int offset, int type);
 
-struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
-				const struct ipv6_txoptions *opt,
-				struct in6_addr *orig);
+struct in6_addr *__fl6_update_dst(struct flowi6 *fl6,
+				  const struct ipv6_txoptions *opt,
+				  struct in6_addr *orig);
+
+static inline struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
+					      const struct ipv6_txoptions *opt,
+					      struct in6_addr *orig)
+{
+	if (!opt || !opt->srcrt)
+		return NULL;
+	return __fl6_update_dst(fl6, opt, orig);
+}
 
 /*
  *	socket options (ipv6_sockglue.c)
diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
index a8d961d3a477..d02c27d4f2c2 100644
--- a/net/ipv6/exthdrs.c
+++ b/net/ipv6/exthdrs.c
@@ -1367,8 +1367,8 @@ struct ipv6_txoptions *__ipv6_fixup_options(struct ipv6_txoptions *opt_space,
 EXPORT_SYMBOL_GPL(__ipv6_fixup_options);
 
 /**
- * fl6_update_dst - update flowi destination address with info given
- *                  by srcrt option, if any.
+ * __fl6_update_dst - update flowi destination address with info given
+ *                    by srcrt option.
  *
  * @fl6: flowi6 for which daddr is to be updated
  * @opt: struct ipv6_txoptions in which to look for srcrt opt
@@ -1377,13 +1377,10 @@ EXPORT_SYMBOL_GPL(__ipv6_fixup_options);
  * Returns NULL if no txoptions or no srcrt, otherwise returns orig
  * and initial value of fl6->daddr set in orig
  */
-struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
-				const struct ipv6_txoptions *opt,
-				struct in6_addr *orig)
+struct in6_addr *__fl6_update_dst(struct flowi6 *fl6,
+				  const struct ipv6_txoptions *opt,
+				  struct in6_addr *orig)
 {
-	if (!opt || !opt->srcrt)
-		return NULL;
-
 	*orig = fl6->daddr;
 
 	switch (opt->srcrt->type) {
@@ -1405,4 +1402,4 @@ struct in6_addr *fl6_update_dst(struct flowi6 *fl6,
 
 	return orig;
 }
-EXPORT_SYMBOL_GPL(fl6_update_dst);
+EXPORT_SYMBOL_GPL(__fl6_update_dst);
-- 
2.36.0



* [PATCH net-next v3 08/10] ipv6: refactor opts push in __ip6_make_skb()
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 07/10] ipv6: partially inline fl6_update_dst() Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 09/10] ipv6: improve opt-less __ip6_make_skb() Pavel Begunkov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Don't preload v6_cork->opt before we actually need it; it is likely to be
saved on the stack and read again for no good reason.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 53c0e33e3899..e2a6b9bdf79c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1856,7 +1856,6 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct net *net = sock_net(sk);
 	struct ipv6hdr *hdr;
-	struct ipv6_txoptions *opt = v6_cork->opt;
 	struct rt6_info *rt = (struct rt6_info *)cork->base.dst;
 	struct flowi6 *fl6 = &cork->fl.u.ip6;
 	unsigned char proto = fl6->flowi6_proto;
@@ -1885,10 +1884,14 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	__skb_pull(skb, skb_network_header_len(skb));
 
 	final_dst = &fl6->daddr;
-	if (opt && opt->opt_flen)
-		ipv6_push_frag_opts(skb, opt, &proto);
-	if (opt && opt->opt_nflen)
-		ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+	if (v6_cork->opt) {
+		struct ipv6_txoptions *opt = v6_cork->opt;
+
+		if (opt->opt_flen)
+			ipv6_push_frag_opts(skb, opt, &proto);
+		if (opt->opt_nflen)
+			ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+	}
 
 	skb_push(skb, sizeof(struct ipv6hdr));
 	skb_reset_network_header(skb);
-- 
2.36.0



* [PATCH net-next v3 09/10] ipv6: improve opt-less __ip6_make_skb()
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 08/10] ipv6: refactor opts push in __ip6_make_skb() Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-13 15:26 ` [PATCH net-next v3 10/10] ipv6: clean up ip6_setup_cork Pavel Begunkov
  2022-05-16 13:48 ` [PATCH net-next v3 00/10] UDP/IPv6 refactoring Paolo Abeni
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

We do a bit of network header pointer shuffling in __ip6_make_skb(),
expecting that ipv6_push_*frag_opts() might change the layout. Avoid it,
along with the associated overhead, when there are no opts.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e2a6b9bdf79c..6ee44c509485 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1881,22 +1881,20 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 
 	/* Allow local fragmentation. */
 	skb->ignore_df = ip6_sk_ignore_df(sk);
-	__skb_pull(skb, skb_network_header_len(skb));
-
 	final_dst = &fl6->daddr;
 	if (v6_cork->opt) {
 		struct ipv6_txoptions *opt = v6_cork->opt;
 
+		__skb_pull(skb, skb_network_header_len(skb));
 		if (opt->opt_flen)
 			ipv6_push_frag_opts(skb, opt, &proto);
 		if (opt->opt_nflen)
 			ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+		skb_push(skb, sizeof(struct ipv6hdr));
+		skb_reset_network_header(skb);
 	}
 
-	skb_push(skb, sizeof(struct ipv6hdr));
-	skb_reset_network_header(skb);
 	hdr = ipv6_hdr(skb);
-
 	ip6_flow_hdr(hdr, v6_cork->tclass,
 		     ip6_make_flowlabel(net, skb, fl6->flowlabel,
 					ip6_autoflowlabel(net, np), fl6));
-- 
2.36.0



* [PATCH net-next v3 10/10] ipv6: clean up ip6_setup_cork
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 09/10] ipv6: improve opt-less __ip6_make_skb() Pavel Begunkov
@ 2022-05-13 15:26 ` Pavel Begunkov
  2022-05-16 13:48 ` [PATCH net-next v3 00/10] UDP/IPv6 refactoring Paolo Abeni
  10 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-13 15:26 UTC (permalink / raw)
  To: netdev, David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: David Ahern, Eric Dumazet, linux-kernel, Pavel Begunkov

Do a bit of refactoring for ip6_setup_cork(). Cache the xfrm_dst_path()
result to avoid calling it twice, reshuffle the ifs to avoid repeating
some parts, and so on.
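
The reshuffled MTU selection can be checked against the old form with a
small user-space model. Everything here is a stand-in invented for the
sketch: struct mtu_inputs replaces the route and socket lookups with plain
numbers, and SKETCH_PMTUDISC_PROBE is assumed to match the uapi value of
IPV6_PMTUDISC_PROBE (3).

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PMTUDISC_PROBE 3	/* assumed uapi value */

struct mtu_inputs {
	bool xfrm_tunnel;	/* rt->dst.flags & DST_XFRM_TUNNEL */
	int pmtudisc;		/* np->pmtudisc */
	unsigned int dev_mtu;	/* rt->dst.dev->mtu */
	unsigned int dst_mtu;	/* dst_mtu(&rt->dst) */
	unsigned int path_mtu;	/* dst_mtu(xfrm_dst_path(&rt->dst)) */
	unsigned int frag_size;	/* np->frag_size */
};

/* Shape of the code before the patch. */
static unsigned int mtu_old(const struct mtu_inputs *in)
{
	unsigned int mtu;

	if (in->xfrm_tunnel)
		mtu = in->pmtudisc >= SKETCH_PMTUDISC_PROBE ?
		      in->dev_mtu : in->dst_mtu;
	else
		mtu = in->pmtudisc >= SKETCH_PMTUDISC_PROBE ?
		      in->dev_mtu : in->path_mtu;
	if (in->frag_size < mtu) {
		if (in->frag_size)
			mtu = in->frag_size;
	}
	return mtu;
}

/* Shape of the code after the patch. */
static unsigned int mtu_new(const struct mtu_inputs *in)
{
	unsigned int mtu;

	if (in->pmtudisc < SKETCH_PMTUDISC_PROBE)
		mtu = in->xfrm_tunnel ? in->dst_mtu : in->path_mtu;
	else
		mtu = in->dev_mtu;
	if (in->frag_size < mtu && in->frag_size)
		mtu = in->frag_size;
	return mtu;
}

/* Compare both forms over a small grid of inputs. */
static bool mtu_selection_equivalent(void)
{
	static const unsigned int frag[] = { 0, 600, 1280, 9000 };

	for (int tunnel = 0; tunnel <= 1; tunnel++)
		for (int disc = 0; disc <= 5; disc++)
			for (unsigned int i = 0; i < 4; i++) {
				struct mtu_inputs in = {
					.xfrm_tunnel = tunnel, .pmtudisc = disc,
					.dev_mtu = 1500, .dst_mtu = 1400,
					.path_mtu = 1300, .frag_size = frag[i],
				};
				if (mtu_old(&in) != mtu_new(&in))
					return false;
			}
	return true;
}
```

mtu_selection_equivalent() is a spot check over a handful of values, not a
proof, but it covers both tunnel settings and both sides of the PROBE
threshold.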

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 net/ipv6/ip6_output.c | 30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6ee44c509485..61dfe3eca773 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1359,15 +1359,13 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	unsigned int mtu;
 	struct ipv6_txoptions *nopt, *opt = ipc6->opt;
+	struct dst_entry *xrfm_dst;
 
 	/* callers pass dst together with a reference, set it first so
 	 * ip6_cork_release() can put it down even in case of an error.
 	 */
 	cork->base.dst = &rt->dst;
 
-	/*
-	 * setup for corking
-	 */
 	if (opt) {
 		if (WARN_ON(v6_cork->opt))
 			return -EINVAL;
@@ -1400,28 +1398,26 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	}
 	v6_cork->hop_limit = ipc6->hlimit;
 	v6_cork->tclass = ipc6->tclass;
-	if (rt->dst.flags & DST_XFRM_TUNNEL)
-		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
+
+	xrfm_dst = xfrm_dst_path(&rt->dst);
+	if (dst_allfrag(xrfm_dst))
+		cork->base.flags |= IPCORK_ALLFRAG;
+
+	if (np->pmtudisc < IPV6_PMTUDISC_PROBE)
+		mtu = dst_mtu(rt->dst.flags & DST_XFRM_TUNNEL ? &rt->dst : xrfm_dst);
 	else
-		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-			READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst));
-	if (np->frag_size < mtu) {
-		if (np->frag_size)
-			mtu = np->frag_size;
-	}
+		mtu = READ_ONCE(rt->dst.dev->mtu);
+
+	if (np->frag_size < mtu && np->frag_size)
+		mtu = np->frag_size;
+
 	cork->base.fragsize = mtu;
 	cork->base.gso_size = ipc6->gso_size;
 	cork->base.tx_flags = 0;
 	cork->base.mark = ipc6->sockc.mark;
 	sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
-
-	if (dst_allfrag(xfrm_dst_path(&rt->dst)))
-		cork->base.flags |= IPCORK_ALLFRAG;
 	cork->base.length = 0;
-
 	cork->base.transmit_time = ipc6->sockc.transmit_time;
-
 	return 0;
 }
 
-- 
2.36.0
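
The MTU selection rewritten by this patch can be checked in isolation. The following is a minimal userspace sketch, not the kernel code: `struct mtu_inputs` and the helpers `mtu_before()`, `mtu_after()` and `mtu_paths_agree()` are hypothetical stand-ins that replace the kernel expressions with plain values, under the assumption that each expression is evaluated once and has no side effects. It only illustrates that the old and new branch structures pick the same MTU.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel expressions used in the hunk. */
struct mtu_inputs {
	bool xfrm_tunnel;       /* rt->dst.flags & DST_XFRM_TUNNEL */
	bool probe;             /* np->pmtudisc >= IPV6_PMTUDISC_PROBE */
	unsigned int dev_mtu;   /* READ_ONCE(rt->dst.dev->mtu) */
	unsigned int dst_mtu;   /* dst_mtu(&rt->dst) */
	unsigned int path_mtu;  /* dst_mtu(xfrm_dst_path(&rt->dst)) */
	unsigned int frag_size; /* np->frag_size, 0 when unset */
};

/* Branch structure of the removed lines. */
static unsigned int mtu_before(const struct mtu_inputs *in)
{
	unsigned int mtu;

	if (in->xfrm_tunnel)
		mtu = in->probe ? in->dev_mtu : in->dst_mtu;
	else
		mtu = in->probe ? in->dev_mtu : in->path_mtu;
	if (in->frag_size < mtu) {
		if (in->frag_size)
			mtu = in->frag_size;
	}
	return mtu;
}

/* Branch structure of the added lines. */
static unsigned int mtu_after(const struct mtu_inputs *in)
{
	unsigned int mtu;

	if (!in->probe)
		mtu = in->xfrm_tunnel ? in->dst_mtu : in->path_mtu;
	else
		mtu = in->dev_mtu;
	if (in->frag_size < mtu && in->frag_size)
		mtu = in->frag_size;
	return mtu;
}

/* Compare both versions over every flag combination and a few frag sizes. */
static bool mtu_paths_agree(void)
{
	static const unsigned int frag_sizes[] = { 0, 1280, 9000 };

	for (int tunnel = 0; tunnel < 2; tunnel++)
		for (int probe = 0; probe < 2; probe++)
			for (int f = 0; f < 3; f++) {
				struct mtu_inputs in = {
					tunnel, probe, 1500, 1400, 1300,
					frag_sizes[f]
				};

				if (mtu_before(&in) != mtu_after(&in))
					return false;
			}
	return true;
}
```

The sketch does not model the dst_allfrag() check, which the patch only moves earlier in the function.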



* Re: [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg
  2022-05-13 15:26 ` [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg Pavel Begunkov
@ 2022-05-16 13:11   ` Paolo Abeni
  2022-05-16 20:09     ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2022-05-16 13:11 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
> Move up->pending section of udpv6_sendmsg() to the beginning of the
> function. Even though it requires some code duplication for sin6 parsing,
> it clearly localises the pending handling in one place, removes an extra
> if and more importantly will prepare the code for further patches.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  net/ipv6/udp.c | 70 ++++++++++++++++++++++++++++++--------------------
>  1 file changed, 42 insertions(+), 28 deletions(-)
> 
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 11d44ed46953..85bff1252f5c 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1318,6 +1318,46 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  	ipc6.sockc.tsflags = sk->sk_tsflags;
>  	ipc6.sockc.mark = sk->sk_mark;
>  
> +	/* Rough check on arithmetic overflow,
> +	   better check is made in ip6_append_data().
> +	   */
> +	if (unlikely(len > INT_MAX - sizeof(struct udphdr)))
> +		return -EMSGSIZE;
> +
> +	getfrag  =  is_udplite ?  udplite_getfrag : ip_generic_getfrag;
> +
> +	/* There are pending frames. */
> +	if (up->pending) {
> +		if (up->pending == AF_INET)
> +			return udp_sendmsg(sk, msg, len);
> +
> +		/* Do a quick destination sanity check before corking. */
> +		if (sin6) {
> +			if (msg->msg_namelen < offsetof(struct sockaddr, sa_data))
> +				return -EINVAL;
> +			if (sin6->sin6_family == AF_INET6) {
> +				if (msg->msg_namelen < SIN6_LEN_RFC2133)
> +					return -EINVAL;
> +				if (ipv6_addr_any(&sin6->sin6_addr) &&
> +				    ipv6_addr_v4mapped(&np->saddr))
> +					return -EINVAL;

It looks like an 'any' destination with an ipv4 mapped source is now
rejected, while the existing code accepts it.

/P



* Re: [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks
  2022-05-13 15:26 ` [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks Pavel Begunkov
@ 2022-05-16 13:14   ` Paolo Abeni
  2022-05-16 20:10     ` Pavel Begunkov
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Abeni @ 2022-05-16 13:14 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
> For AF_INET6 sockets we care most about ipv6 rather than ip4 mappings, as
> the latter require some extra hops anyway. Take the AF_INET6 case out of the
> address parsing switch and add an explicit path for it. It removes some extra
> ifs from the path and removes the switch overhead.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  net/ipv6/udp.c | 37 +++++++++++++++++--------------------
>  1 file changed, 17 insertions(+), 20 deletions(-)
> 
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 85bff1252f5c..e0b1bea998ce 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1360,30 +1360,27 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>  
>  	/* destination address check */
>  	if (sin6) {
> -		if (addr_len < offsetof(struct sockaddr, sa_data))
> -			return -EINVAL;
> +		if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
> +			if (addr_len < offsetof(struct sockaddr, sa_data))
> +				return -EINVAL;

I think you can't access 'sin6->sin6_family' before validating the
socket address len, that is before doing:

if (addr_len < offsetof(struct sockaddr, sa_data))

Paolo



* Re: [PATCH net-next v3 00/10] UDP/IPv6 refactoring
  2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-05-13 15:26 ` [PATCH net-next v3 10/10] ipv6: clean up ip6_setup_cork Pavel Begunkov
@ 2022-05-16 13:48 ` Paolo Abeni
  2022-05-16 14:47   ` David Ahern
  2022-05-16 20:48   ` Pavel Begunkov
  10 siblings, 2 replies; 18+ messages in thread
From: Paolo Abeni @ 2022-05-16 13:48 UTC (permalink / raw)
  To: Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

Hello,

On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
> cleaner than it was before and the series also removes a bunch of instructions
> and other overhead from the hot path positively affecting performance.
> 
> Testing over dummy netdev with 16 byte packets yields 2240481 tx/s,
> compared to 2203417 tx/s previously, which is around +1.6%

I personally feel that some patches in this series have a significant
chance of introducing functional regressions, and e.g. syzbot will not
help to catch them. That risk is IMHO relevant considering that the
performance gain here looks quite limited.

There are a few individual changes that IMHO look like nice cleanups,
e.g. patches 5, 6, 8, 9 and possibly even patch 1.

I suggest reducing the patchset scope to them.

Thanks!

Paolo



* Re: [PATCH net-next v3 00/10] UDP/IPv6 refactoring
  2022-05-16 13:48 ` [PATCH net-next v3 00/10] UDP/IPv6 refactoring Paolo Abeni
@ 2022-05-16 14:47   ` David Ahern
  2022-05-16 20:48   ` Pavel Begunkov
  1 sibling, 0 replies; 18+ messages in thread
From: David Ahern @ 2022-05-16 14:47 UTC (permalink / raw)
  To: Paolo Abeni, Pavel Begunkov, netdev, David S . Miller, Jakub Kicinski
  Cc: Eric Dumazet, linux-kernel

On 5/16/22 7:48 AM, Paolo Abeni wrote:
> Hello,
> 
> On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
>> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
>> cleaner than it was before and the series also removes a bunch of instructions
>> and other overhead from the hot path positively affecting performance.
>>
>> Testing over dummy netdev with 16 byte packets yields 2240481 tx/s,
>> compared to 2203417 tx/s previously, which is around +1.6%
> 
> I personally feel that some patches in this series have a significant
> chance of introducing functional regressions, and e.g. syzbot will not
> help to catch them. That risk is IMHO relevant considering that the
> performance gain here looks quite limited.
> 
> There are a few individual changes that IMHO look like nice cleanups,
> e.g. patches 5, 6, 8, 9 and possibly even patch 1.
> 
> I suggest reducing the patchset scope to them.
> 

I agree with that sentiment. The set also needs test cases that capture
the various permutations.


* Re: [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg
  2022-05-16 13:11   ` Paolo Abeni
@ 2022-05-16 20:09     ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-16 20:09 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On 5/16/22 14:11, Paolo Abeni wrote:
> On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
>> Move up->pending section of udpv6_sendmsg() to the beginning of the
>> function. Even though it requires some code duplication for sin6 parsing,
>> it clearly localises the pending handling in one place, removes an extra
>> if and more importantly will prepare the code for further patches.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   net/ipv6/udp.c | 70 ++++++++++++++++++++++++++++++--------------------
>>   1 file changed, 42 insertions(+), 28 deletions(-)
>>
>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>> index 11d44ed46953..85bff1252f5c 100644
>> --- a/net/ipv6/udp.c
>> +++ b/net/ipv6/udp.c
>> @@ -1318,6 +1318,46 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>   	ipc6.sockc.tsflags = sk->sk_tsflags;
>>   	ipc6.sockc.mark = sk->sk_mark;
>>   
>> +	/* Rough check on arithmetic overflow,
>> +	   better check is made in ip6_append_data().
>> +	   */
>> +	if (unlikely(len > INT_MAX - sizeof(struct udphdr)))
>> +		return -EMSGSIZE;
>> +
>> +	getfrag  =  is_udplite ?  udplite_getfrag : ip_generic_getfrag;
>> +
>> +	/* There are pending frames. */
>> +	if (up->pending) {
>> +		if (up->pending == AF_INET)
>> +			return udp_sendmsg(sk, msg, len);
>> +
>> +		/* Do a quick destination sanity check before corking. */
>> +		if (sin6) {
>> +			if (msg->msg_namelen < offsetof(struct sockaddr, sa_data))
>> +				return -EINVAL;
>> +			if (sin6->sin6_family == AF_INET6) {
>> +				if (msg->msg_namelen < SIN6_LEN_RFC2133)
>> +					return -EINVAL;
>> +				if (ipv6_addr_any(&sin6->sin6_addr) &&
>> +				    ipv6_addr_v4mapped(&np->saddr))
>> +					return -EINVAL;
> 
> It looks like an 'any' destination with an ipv4 mapped source is now
> rejected, while the existing code accepts it.

To get there, up->pending must be AF_INET6, and previously that case would
fall into udp_sendmsg() and fail on:

if (unlikely(up->pending != AF_INET))
         return -EINVAL;

I don't see how it rejects any cases that were working before.
Can you elaborate a bit?

-- 
Pavel Begunkov


* Re: [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks
  2022-05-16 13:14   ` Paolo Abeni
@ 2022-05-16 20:10     ` Pavel Begunkov
  0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-16 20:10 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On 5/16/22 14:14, Paolo Abeni wrote:
> On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
>> For AF_INET6 sockets we care most about ipv6 rather than ip4 mappings, as
>> the latter require some extra hops anyway. Take the AF_INET6 case out of the
>> address parsing switch and add an explicit path for it. It removes some extra
>> ifs from the path and removes the switch overhead.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   net/ipv6/udp.c | 37 +++++++++++++++++--------------------
>>   1 file changed, 17 insertions(+), 20 deletions(-)
>>
>> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
>> index 85bff1252f5c..e0b1bea998ce 100644
>> --- a/net/ipv6/udp.c
>> +++ b/net/ipv6/udp.c
>> @@ -1360,30 +1360,27 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>   
>>   	/* destination address check */
>>   	if (sin6) {
>> -		if (addr_len < offsetof(struct sockaddr, sa_data))
>> -			return -EINVAL;
>> +		if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
>> +			if (addr_len < offsetof(struct sockaddr, sa_data))
>> +				return -EINVAL;
> 
> I think you can't access 'sin6->sin6_family' before validating the
> socket address len, that is before doing:

Paolo, thanks for reviewing it!


sin6_family is protected by

if (addr_len < SIN6_LEN_RFC2133 ...)

on the previous line. I can add a BUILD_BUG_ON() if that
would be more reassuring.


> 
> if (addr_len < offsetof(struct sockaddr, sa_data))

-- 
Pavel Begunkov
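
The ordering argument above relies on `||` short-circuiting: sin6_family is only read once addr_len is known to be at least SIN6_LEN_RFC2133. A minimal userspace sketch of that reasoning follows. It is not the kernel code: `classify_dest()` and the `DEST_*` values are hypothetical names, SIN6_LEN_RFC2133 is redefined locally with the kernel's value of 24, and `_Static_assert` stands in for the BUILD_BUG_ON() offered in the reply.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Kernel's minimum sockaddr_in6 length; redefined here because userspace
 * headers do not export it. */
#define SIN6_LEN_RFC2133 24

/* Userspace analogue of a BUILD_BUG_ON(): the family field must lie
 * entirely within the first SIN6_LEN_RFC2133 bytes. */
_Static_assert(offsetof(struct sockaddr_in6, sin6_family) +
	       sizeof(sa_family_t) <= SIN6_LEN_RFC2133,
	       "sin6_family must be covered by the length check");

/* Hypothetical return values for the sketch. */
enum { DEST_FAST_IPV6 = 0, DEST_SLOW_PATH = 1 };

static int classify_dest(const void *name, size_t addr_len)
{
	const struct sockaddr_in6 *sin6 = name;

	/* The right-hand side of || is evaluated only when
	 * addr_len >= SIN6_LEN_RFC2133, so sin6_family is never read
	 * from a buffer that is too short to contain it. */
	if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
		if (addr_len < offsetof(struct sockaddr, sa_data))
			return -EINVAL;
		return DEST_SLOW_PATH;	/* AF_INET, AF_UNSPEC, ... handling */
	}
	return DEST_FAST_IPV6;
}
```

With this shape, a too-short address still ends in -EINVAL, only via the slow branch rather than as the first check.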


* Re: [PATCH net-next v3 00/10] UDP/IPv6 refactoring
  2022-05-16 13:48 ` [PATCH net-next v3 00/10] UDP/IPv6 refactoring Paolo Abeni
  2022-05-16 14:47   ` David Ahern
@ 2022-05-16 20:48   ` Pavel Begunkov
  1 sibling, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2022-05-16 20:48 UTC (permalink / raw)
  To: Paolo Abeni, netdev, David S . Miller, Jakub Kicinski
  Cc: David Ahern, Eric Dumazet, linux-kernel

On 5/16/22 14:48, Paolo Abeni wrote:
> Hello,
> 
> On Fri, 2022-05-13 at 16:26 +0100, Pavel Begunkov wrote:
>> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
>> cleaner than it was before and the series also removes a bunch of instructions
>> and other overhead from the hot path positively affecting performance.
>>
>> Testing over dummy netdev with 16 byte packets yields 2240481 tx/s,
>> compared to 2203417 tx/s previously, which is around +1.6%
> 
> I personally feel that some patches in this series have a significant
> chance of introducing functional regressions, and e.g. syzbot will not
> help to catch them. That risk is IMHO relevant considering that the
> performance gain here looks quite limited.

I can't say I agree with that. First, I do think the code is much
cleaner having just one block checking corking instead of a couple
of random ifs in different places. Same for sin6. Not to mention the
negative line count.

Also, assuming this 1.6% translates to ~0.5-1% with fast NICs, that's
still huge, especially when we get >5GB/s in single-core zerocopy tests
between servers.

If maintainers are not merging it, I think I'll delay the series until
I get another batch of planned optimisations implemented on top.


> There are a few individual changes that IMHO look like nice cleanups,
> e.g. patches 5, 6, 8, 9 and possibly even patch 1.
> 
> I suggest reducing the patchset scope to them.

-- 
Pavel Begunkov
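
For reference, the relative gain behind the benchmark figures quoted in this thread can be recomputed directly. This is a small sketch; `relative_gain_percent()` is a hypothetical helper, and the two throughput values are the ones from the cover letter.

```c
/* Relative change of `after` versus `before`, in percent. */
static double relative_gain_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Calling relative_gain_percent(2203417.0, 2240481.0) gives roughly 1.68, i.e. a difference of 37064 tx/s on the dummy netdev test.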


Thread overview: 18+ messages
2022-05-13 15:26 [PATCH net-next v3 00/10] UDP/IPv6 refactoring Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 01/10] ipv6: optimise ipcm6 cookie init Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 02/10] udp/ipv6: move pending section of udpv6_sendmsg Pavel Begunkov
2022-05-16 13:11   ` Paolo Abeni
2022-05-16 20:09     ` Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 03/10] udp/ipv6: prioritise the ip6 path over ip4 checks Pavel Begunkov
2022-05-16 13:14   ` Paolo Abeni
2022-05-16 20:10     ` Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 04/10] udp/ipv6: optimise udpv6_sendmsg() daddr checks Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 05/10] udp/ipv6: optimise out daddr reassignment Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 06/10] udp/ipv6: clean up udpv6_sendmsg's saddr init Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 07/10] ipv6: partially inline fl6_update_dst() Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 08/10] ipv6: refactor opts push in __ip6_make_skb() Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 09/10] ipv6: improve opt-less __ip6_make_skb() Pavel Begunkov
2022-05-13 15:26 ` [PATCH net-next v3 10/10] ipv6: clean up ip6_setup_cork Pavel Begunkov
2022-05-16 13:48 ` [PATCH net-next v3 00/10] UDP/IPv6 refactoring Paolo Abeni
2022-05-16 14:47   ` David Ahern
2022-05-16 20:48   ` Pavel Begunkov
