linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Luebbe <jluebbe@lasnet.de>
To: Mike Manning <mvrmanning@gmail.com>,
	David Ahern <dsahern@kernel.org>,
	Robert Shearman <robertshearman@gmail.com>,
	Andy Roulin <aroulin@nvidia.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	netdev@vger.kernel.org, regressions@lists.linux.dev,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [REGRESSION] connection timeout with routes to VRF
Date: Wed, 06 Jul 2022 20:49:27 +0200	[thread overview]
Message-ID: <a32428fa0f3811c25912cd313a6fe1fb4f0a4fac.camel@lasnet.de> (raw)
In-Reply-To: <940fa370-08ce-1d39-d5cc-51de8e853b47@gmail.com>

On Sun, 2022-06-26 at 21:06 +0100, Mike Manning wrote:
...
> Andy Roulin suggested the same fix to the same problem a few weeks back.
> Let's do it along with a test case in fcnl-test.sh which covers all of
> these vrf permutations.
> 
Reverting 3c82a21f4320 would remove isolation between the default and other VRFs
needed when no VRF route leaking has been configured between these: there may be
unintended leaking of packets arriving on a device enslaved to an l3mdev due to
the potential match on an unbound socket.

Thanks for the explanation.

VRF route leaking requires routes to be present for both ingress and egress
VRFs,
the testcase shown only has a route from default to red VRF. The implicit return
path from red to default VRF due to match on unbound socket is no longer
present.


If there is a better configuration that makes this work in the general case
without a change to the kernel, we'd be happy as well.

In our full setup, the outbound TCP connection (from the default VRF) gets a
local IP from the interface enslaved to the VRF. Before 3c82a21f4320, this would
simply work.

How would the return path route from the red VRF to the default VRF look in that
case?

Match on unbound socket in all VRFs and not only in the default VRF should be
possible by setting this option (see
https://www.kernel.org/doc/Documentation/networking/vrf.txt):


Do you mean unbound as in listening socket not bound to an IP with bind()? Or as
in a socket in the default VRF?

sysctl net.ipv4.tcp_l3mdev_accept=1


The sysctl docs sound like this should only apply to listening sockets. In this
case, we have an unconnected outbound socket.

However, for this to work a change similar to the following is needed (I have
shown the change to the macro for consistency with above, it is now an inline
fn):


I can also test on master and only used the macro form only because I wasn't
completely sure how to translate it to the inline function form.

---
 include/net/inet_hashtables.h |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -300,9 +300,8 @@
 #define INET_MATCH(__sk, __net, __cookie, __saddr, __daddr, __ports, __dif,
__sdif) \
        (((__sk)->sk_portpair == (__ports))                     &&      \
         ((__sk)->sk_addrpair == (__cookie))                    &&      \
-        (((__sk)->sk_bound_dev_if == (__dif))                  ||      \
-         ((__sk)->sk_bound_dev_if == (__sdif)))                &&      \
-        net_eq(sock_net(__sk), (__net)))
+        net_eq(sock_net(__sk), (__net))                        &&      \
+        inet_sk_bound_dev_eq((__net), (__sk)->sk_bound_dev_if, (__dif),
(__sdif)))
 #else /* 32-bit arch */
 #define INET_ADDR_COOKIE(__name, __saddr, __daddr) \
        const int __name __deprecated __attribute__((unused))
@@ -311,9 +310,8 @@
        (((__sk)->sk_portpair == (__ports))             &&              \
         ((__sk)->sk_daddr      == (__saddr))           &&              \
         ((__sk)->sk_rcv_saddr  == (__daddr))           &&              \
-        (((__sk)->sk_bound_dev_if == (__dif))          ||              \
-         ((__sk)->sk_bound_dev_if == (__sdif)))        &&              \
-        net_eq(sock_net(__sk), (__net)))
+        net_eq(sock_net(__sk), (__net))                &&              \
+        inet_sk_bound_dev_eq((__net), (__sk)->sk_bound_dev_if, (__dif),
(__sdif)))
 #endif /* 64-bit arch */

 /* Sockets in TCP_CLOSE state are _always_ taken out of the hash, so we need

I can confirm that this gets my testcase working with 
net.ipv4.tcp_l3mdev_accept=1.

This is to get the testcase to pass, I will leave it to others to comment on
the testcase validity in terms of testing forwarding using commands on 1 device.

So a network-namespace-based testcase would be preferred? We used the simple
setup because it seemed easier to understand.

The series that 3c82a21f4320 is part of were introduced into the kernel in 2018
by the Vyatta team, who regularly run an extensive test suite for routing
protocols
for VRF functionality incl. all combinations of route leaking between default
and
other VRFs, so there is no known issue in this regard. I will attempt to reach
out
to them so as to advise them of this thread.

Are these testcases public? Perhaps I could use them find a better configuration
that handles our use-case.

Thanks,

Jan


  reply	other threads:[~2022-07-06 18:48 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-11 11:14 [REGRESSION] connection timeout with routes to VRF Jan Luebbe
2022-06-11 16:44 ` David Ahern
2022-06-15 17:47   ` Jan Luebbe
2022-06-16 13:27     ` David Ahern
2022-06-26 20:06   ` Mike Manning
2022-07-06 18:49     ` Jan Luebbe [this message]
2022-07-17 10:31       ` Mike Manning
2022-07-17 19:27         ` Jan Lübbe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a32428fa0f3811c25912cd313a6fe1fb4f0a4fac.camel@lasnet.de \
    --to=jluebbe@lasnet.de \
    --cc=aroulin@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mvrmanning@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    --cc=robertshearman@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).