All of lore.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Poirier <bpoirier@nvidia.com>
To: Mike Manning <mvrmanning@gmail.com>
Cc: Netdev <netdev@vger.kernel.org>, David Ahern <dsahern@gmail.com>,
	Saikrishna Arcot <sarcot@microsoft.com>,
	Craig Gallek <kraig@google.com>
Subject: Re: [PATCH] net: prefer socket bound to interface when not in VRF
Date: Mon, 27 Jun 2022 09:05:37 +0900	[thread overview]
Message-ID: <Yrj0UWOQM8QNqJqu@d3> (raw)
In-Reply-To: <9b37fcae-214c-3a5b-d976-8c94632184d0@gmail.com>

On 2022-06-26 23:25 +0100, Mike Manning wrote:
> On 13/06/2022 04:14, Benjamin Poirier wrote:
> > On 2021-10-05 14:03 +0100, Mike Manning wrote:
[...]
> > Hi Mike,
> >
> > I was looking at this commit, 8d6c414cd2fb ("net: prefer socket bound to
> > interface when not in VRF"), and I get the feeling that it is only
> > partially effective. It works with UDP connected sockets but it doesn't
> > work for TCP and UDP unconnected sockets.
> >
> > The compute_score() functions are a bit misleading. Because of the
> > reuseport shortcut in their callers (inet_lhash2_lookup() and the like),
> > the first socket with score > 0 may be chosen, not necessarily the
> > socket with highest score. In order to prefer certain sockets, I think
> > an approach like commit d894ba18d4e4 ("soreuseport: fix ordering for
> > mixed v4/v6 sockets") would be needed. What do you think?
> 
> Hi Benjamin,
> 
> We had never observed any issues with any of our configurations. The VRF changes introduced
> 
> in 7e225619e8af result in a failure being returned when there is no device match, which satisfies
> 
> the requirements for VRF handling so unbound vs. bound to an l3mdev - the score is irrelevant.
> 
> However, 8d6c414cd2fb was subsequently needed as unbound and bound sockets were scored
> 
> equally, so that fix reinstated a higher score needed for sockets bound to an interface. Wrt to
> 
> your query, the scoring resolved the issue. I am unaware of any problematic use-cases, but in
> 
> any case, my changes are in line with the current approach.

The problematic use case involves sockets that have SO_REUSEPORT +
SO_BINDTODEVICE. Earlier in the thread I've included a test that
demonstrates the issue.

For the Cumulus kernel I've put in place a workaround that removes the
reuseport optimization (see below). I probably won't have time to work
on a proper upstream solution.

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index b9d995b5ce24..1765ac837358 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -253,16 +253,20 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
 		score = compute_score(sk, net, hnum, daddr, dif, sdif);
 		if (score > hiscore) {
-			result = lookup_reuseport(net, sk, skb, doff,
-						  saddr, sport, daddr, hnum);
-			if (result)
-				return result;
-
 			result = sk;
 			hiscore = score;
 		}
 	}
 
+	if (result) {
+		struct sock *reuse_sk;
+
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+	}
+
 	return result;
 }
 
> > Extra info:
> > 1) fcnal-test.sh results
> >
> > I tried to reproduce the fcnal-test.sh test results quoted above but in
> > my case the test cases already pass at 8d6c414cd2fb^ and 9e9fb7655ed5.
> > Moreover I believe those test cases don't have multiple listening
> > sockets. So that just added to my confusion.
> 
> The fix was not targeting those 2 failed test cases, the output was only to show the before/after
> 
> test results. It is unclear why they failed for me with with the 9e9fb7655ed5 baseline,

Thanks for taking a look, that was also my guess.

      reply	other threads:[~2022-06-27  0:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-05 13:03 [PATCH] net: prefer socket bound to interface when not in VRF Mike Manning
2021-10-07 14:07 ` Jakub Kicinski
2021-10-07 14:10   ` David Ahern
2021-10-07 14:27     ` Jakub Kicinski
2022-06-13  3:14 ` Benjamin Poirier
2022-06-13  3:52   ` David Ahern
2022-06-14  0:39     ` Benjamin Poirier
2022-06-26 22:25   ` Mike Manning
2022-06-27  0:05     ` Benjamin Poirier [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yrj0UWOQM8QNqJqu@d3 \
    --to=bpoirier@nvidia.com \
    --cc=dsahern@gmail.com \
    --cc=kraig@google.com \
    --cc=mvrmanning@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=sarcot@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.