All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
To: <jakub@cloudflare.com>
Cc: <ast@kernel.org>, <daniel@iogearbox.net>, <davem@davemloft.net>,
	<kernel-team@cloudflare.com>, <kuniyu@amazon.co.jp>,
	<linux-kernel@vger.kernel.org>, <linux-next@vger.kernel.org>,
	<netdev@vger.kernel.org>, <sfr@canb.auug.org.au>,
	<willemb@google.com>
Subject: Re: linux-next: manual merge of the bpf-next tree with the net tree
Date: Wed, 22 Jul 2020 23:42:12 +0900	[thread overview]
Message-ID: <20200722144212.27106-1-kuniyu@amazon.co.jp> (raw)
In-Reply-To: <87wo2vwxq6.fsf@cloudflare.com>

From:   Jakub Sitnicki <jakub@cloudflare.com>
Date:   Wed, 22 Jul 2020 14:17:05 +0200
> On Wed, Jul 22, 2020 at 05:21 AM CEST, Stephen Rothwell wrote:
> > Hi all,
> >
> > Today's linux-next merge of the bpf-next tree got conflicts in:
> >
> >   net/ipv4/udp.c
> >   net/ipv6/udp.c
> >
> > between commit:
> >
> >   efc6b6f6c311 ("udp: Improve load balancing for SO_REUSEPORT.")
> >
> > from the net tree and commits:
> >
> >   7629c73a1466 ("udp: Extract helper for selecting socket from reuseport group")
> >   2a08748cd384 ("udp6: Extract helper for selecting socket from reuseport group")
> >
> > from the bpf-next tree.
> >
> > I fixed it up (I wasn't sure how to proceed, so I used the latter
> > version) and can carry the fix as necessary. This is now fixed as far
> > as linux-next is concerned, but any non trivial conflicts should be
> > mentioned to your upstream maintainer when your tree is submitted for
> > merging.  You may also want to consider cooperating with the maintainer
> > of the conflicting tree to minimise any particularly complex conflicts.
> 
> This one is a bit tricky.
> 
> Looking at how code in udp[46]_lib_lookup2 evolved, first:
> 
>   acdcecc61285 ("udp: correct reuseport selection with connected sockets")
> 
> 1) exluded connected UDP sockets from reuseport group during lookup, and
> 2) limited fast reuseport return to groups with no connected sockets,
> 
> The second change had an uninteded side-effect of discarding reuseport
> socket selection when reuseport group contained connected sockets.
> 
> Then, recent
> 
>   efc6b6f6c311 ("udp: Improve load balancing for SO_REUSEPORT.")
> 
> rectified it by recording reuseport socket selection as lookup result
> candidate, in case fast reuseport return did not happen because
> reuseport group had connected sockets.
> 
> I belive that changes in commit efc6b6f6c311 can be rewritten as below
> to the same effect, by realizing that we are always setting the 'result'
> if 'score > badness'. Either to what reuseport_select_sock() returned or
> to 'sk' that scored higher than current 'badness' threshold.

Good point!
It looks good to me.


> ---8<---
> static struct sock *udp4_lib_lookup2(struct net *net,
> 				     __be32 saddr, __be16 sport,
> 				     __be32 daddr, unsigned int hnum,
> 				     int dif, int sdif,
> 				     struct udp_hslot *hslot2,
> 				     struct sk_buff *skb)
> {
> 	struct sock *sk, *result;
> 	int score, badness;
> 	u32 hash = 0;
> 
> 	result = NULL;
> 	badness = 0;
> 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> 		score = compute_score(sk, net, saddr, sport,
> 				      daddr, hnum, dif, sdif);
> 		if (score > badness) {
> 			result = NULL;
> 			if (sk->sk_reuseport &&
> 			    sk->sk_state != TCP_ESTABLISHED) {
> 				hash = udp_ehashfn(net, daddr, hnum,
> 						   saddr, sport);
> 				result = reuseport_select_sock(sk, hash, skb,
> 							       sizeof(struct udphdr));
> 				if (result && !reuseport_has_conns(sk, false))
> 					return result;
> 			}
> 			if (!result)
> 				result = sk;
> 			badness = score;
> 		}
> 	}
> 	return result;
> }
> ---8<---
> 
> From there, it is now easier to resolve the conflict with
> 
>   7629c73a1466 ("udp: Extract helper for selecting socket from reuseport group")
>   2a08748cd384 ("udp6: Extract helper for selecting socket from reuseport group")
> 
> which extract the 'if (sk->sk_reuseport && sk->sk_state !=
> TCP_ESTABLISHED)' block into a helper called lookup_reuseport().
> 
> To merge the two, we need to pull the reuseport_has_conns() check up
> from lookup_reuseport() and back into udp[46]_lib_lookup2(), because now
> we want to record reuseport socket selection even if reuseport group has
> connections.
> 
> The only other call site of lookup_reuseport() is in
> udp[46]_lookup_run_bpf(). We don't want to discard the reuseport
> selected socket if group has connections there either, so no changes are
> needed. And, now that I think about it, the current behavior in
> udp[46]_lookup_run_bpf() is not right.
> 
> The end result for udp4 will look like:
> 
> ---8<---
> static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
> 					    struct sk_buff *skb,
> 					    __be32 saddr, __be16 sport,
> 					    __be32 daddr, unsigned short hnum)
> {
> 	struct sock *reuse_sk = NULL;
> 	u32 hash;
> 
> 	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
> 		hash = udp_ehashfn(net, daddr, hnum, saddr, sport);
> 		reuse_sk = reuseport_select_sock(sk, hash, skb,
> 						 sizeof(struct udphdr));
> 	}
> 	return reuse_sk;
> }
> 
> /* called with rcu_read_lock() */
> static struct sock *udp4_lib_lookup2(struct net *net,
> 				     __be32 saddr, __be16 sport,
> 				     __be32 daddr, unsigned int hnum,
> 				     int dif, int sdif,
> 				     struct udp_hslot *hslot2,
> 				     struct sk_buff *skb)
> {
> 	struct sock *sk, *result;
> 	int score, badness;
> 
> 	result = NULL;
> 	badness = 0;
> 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> 		score = compute_score(sk, net, saddr, sport,
> 				      daddr, hnum, dif, sdif);
> 		if (score > badness) {
> 			result = lookup_reuseport(net, sk, skb,
> 						  saddr, sport, daddr, hnum);
> 			if (result && !reuseport_has_conns(sk, false))
> 				return result;
> 			if (!result)
> 				result = sk;
> 			badness = score;
> 		}
> 	}
> 	return result;
> }
> ---8<---
> 
> I will submit a patch that pulls the reuseport_has_conns() check from
> lookup_reuseport() to bpf-next. That should bring the two sides of the
> merge closer. Please let me know if I can help in any other way.
> 
> Also, please take a look at the 3-way diff below from my attempt to
> merge net tree into bpf-next tree taking the described approach.
> 
> Thanks,
> -jkbs

Can I submit a patch to net tree that rewrites udp[46]_lib_lookup2() to
use only 'result' ?

Best Regards,
Kuniyuki

  reply	other threads:[~2020-07-22 14:52 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-22  3:21 linux-next: manual merge of the bpf-next tree with the net tree Stephen Rothwell
2020-07-22 12:17 ` Jakub Sitnicki
2020-07-22 14:42   ` Kuniyuki Iwashima [this message]
2020-07-22 15:02     ` Jakub Sitnicki
2020-07-22 15:05       ` Willem de Bruijn
2020-07-22 15:25         ` Jakub Sitnicki
2020-07-22 15:49           ` Willem de Bruijn
2020-07-22 17:14             ` Alexei Starovoitov
2020-07-23  2:11 ` Stephen Rothwell
  -- strict thread matches above, loose matches on Subject: below --
2024-04-29  1:49 Stephen Rothwell
2024-04-29 18:56 ` Jakub Kicinski
2024-04-29 21:17   ` Daniel Borkmann
2024-03-28  1:55 Stephen Rothwell
2024-03-28  1:57 ` Alexei Starovoitov
2022-11-15 23:10 Stephen Rothwell
2022-09-01  1:11 Stephen Rothwell
2022-08-25  1:00 Stephen Rothwell
2019-12-19 23:23 Stephen Rothwell
2019-10-13 23:32 Stephen Rothwell
2019-10-15 23:30 ` Stephen Rothwell
2019-03-26 22:14 Stephen Rothwell
2019-03-27  1:56 ` Alexei Starovoitov
2019-03-27  9:26   ` Luca Boccassi
2019-03-27 15:15     ` Alexei Starovoitov
2019-03-28  1:53       ` Alexei Starovoitov
2019-03-28 11:37         ` Luca Boccassi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200722144212.27106-1-kuniyu@amazon.co.jp \
    --to=kuniyu@amazon.co.jp \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=jakub@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-next@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=sfr@canb.auug.org.au \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.