netdev.vger.kernel.org archive mirror
* Matching unbound sockets for VRF
@ 2022-03-24 17:19 Stephen Suryaputra
  2022-03-25 14:13 ` David Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Suryaputra @ 2022-03-24 17:19 UTC (permalink / raw)
  To: netdev, rshearma, mmanning, dsahern

Hello,

After upgrading to a kernel version that has commit 3c82a21f4320c ("net:
allow binding socket in a VRF when there's an unbound socket") several
of our applications don't work anymore. We are relying on the previous
behavior, i.e. when packets arrive on an l3mdev enslaved device, the
unbound sockets are matched.

I understand the use case for the commit but given that the previous
behavior has been there for quite some time since the VRF introduction,
should there be a configurable option to restore the previous behavior?
The option could default to the behavior introduced by the commit.

Thanks,

Stephen.


* Re: Matching unbound sockets for VRF
  2022-03-24 17:19 Matching unbound sockets for VRF Stephen Suryaputra
@ 2022-03-25 14:13 ` David Ahern
  2022-03-27 12:57   ` Stephen Suryaputra
  0 siblings, 1 reply; 7+ messages in thread
From: David Ahern @ 2022-03-25 14:13 UTC (permalink / raw)
  To: Stephen Suryaputra, netdev, rshearma, mmanning

On 3/24/22 11:19 AM, Stephen Suryaputra wrote:
> Hello,
> 
> After upgrading to a kernel version that has commit 3c82a21f4320c ("net:
> allow binding socket in a VRF when there's an unbound socket") several
> of our applications don't work anymore. We are relying on the previous
> behavior, i.e. when packets arrive on an l3mdev enslaved device, the
> unbound sockets are matched.
> 
> I understand the use case for the commit but given that the previous
> behavior has been there for quite some time since the VRF introduction,
> should there be a configurable option to get the previous behavior? The
> option could be having the default be the behavior achieved by the
> commit.
> 

I thought the behavior was controlled by the l3mdev sysctl knobs.
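
For anyone following along, the current values of those knobs can be checked with a short script. This is only a minimal sketch: it assumes the knobs in question are tcp_l3mdev_accept, udp_l3mdev_accept and raw_l3mdev_accept under /proc/sys/net/ipv4, which may be absent depending on kernel version and configuration. The function name read_l3mdev_knobs and its base parameter are made up for illustration.

```python
from pathlib import Path

# Assumed knob names (from the kernel's ip-sysctl documentation); some may be
# missing on kernels without l3mdev support or on older releases.
L3MDEV_KNOBS = ("tcp_l3mdev_accept", "udp_l3mdev_accept", "raw_l3mdev_accept")


def read_l3mdev_knobs(base="/proc/sys/net/ipv4"):
    """Return {knob: int or None}; None means the file is absent or unreadable."""
    values = {}
    for knob in L3MDEV_KNOBS:
        try:
            values[knob] = int((Path(base) / knob).read_text().strip())
        except (OSError, ValueError):
            values[knob] = None
    return values


if __name__ == "__main__":
    for knob, value in read_l3mdev_knobs().items():
        print(f"net.ipv4.{knob} = {value}")
```

The base directory is a parameter only so the function can be pointed at a different network namespace mount or a test directory.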


* Re: Matching unbound sockets for VRF
  2022-03-25 14:13 ` David Ahern
@ 2022-03-27 12:57   ` Stephen Suryaputra
  2022-04-03 16:24     ` David Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Suryaputra @ 2022-03-27 12:57 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev

On Fri, Mar 25, 2022 at 08:13:55AM -0600, David Ahern wrote:
> On 3/24/22 11:19 AM, Stephen Suryaputra wrote:
> > Hello,
> > 
> > After upgrading to a kernel version that has commit 3c82a21f4320c ("net:
> > allow binding socket in a VRF when there's an unbound socket") several
> > of our applications don't work anymore. We are relying on the previous
> > behavior, i.e. when packets arrive on an l3mdev enslaved device, the
> > unbound sockets are matched.
> > 
> > I understand the use case for the commit but given that the previous
> > behavior has been there for quite some time since the VRF introduction,
> > should there be a configurable option to get the previous behavior? The
> > option could be having the default be the behavior achieved by the
> > commit.
> > 
> 
> I thought the behavior was controlled by the l3mdev sysctl knobs.

The addresses for Mike and Robert bounced, so I am removing them from
the thread.

The problem is that our system uses a fallback rule to a vrf, e.g.:

1000:   from all lookup [l3mdev-table]
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default
32768:  from all lookup 256

to force traffic to go out of the VRF-enslaved interface. When the host
with the VRF initiates a TCP connection, the received SYN+ACK fails to
find a matching socket after the commit. See the traffic dump:

08:51:28.625806 IP 10.1.1.1.48076 > 10.1.1.2.1499: Flags [S], seq 2060777757, win 64240, options [mss 1460,sackOK,TS val 3307983770 ecr 0,nop,wscale 7], length 0
08:51:28.625831 IP 10.1.1.2.1499 > 10.1.1.1.48076: Flags [S.], seq 4017990855, ack 2060777758, win 65160, options [mss 1460,sackOK,TS val 1658979570 ecr 3307983770,nop,wscale 7], length 0
08:51:28.625837 IP 10.1.1.1.48076 > 10.1.1.2.1499: Flags [R], seq 2060777758, win 0, length 0
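
The RST in the dump follows from the matching change described in this thread. Below is a toy model of that change, not the kernel's lookup code: it deliberately ignores the l3mdev sysctls, address and port matching, and scoring. The class names, the legacy flag and the device name eth0 are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    ingress_dev: str            # device the packet arrived on
    vrf_master: Optional[str]   # VRF the device is enslaved to, None if not enslaved


@dataclass
class Sock:
    bound_dev: Optional[str]    # None models an unbound socket


def socket_matches(sock: Sock, pkt: Packet, legacy: bool = False) -> bool:
    """Toy model: does the socket match the packet's device scope?

    legacy=True models the behavior before the commit, where an unbound
    socket also matched packets arriving on an l3mdev-enslaved device.
    """
    if sock.bound_dev is not None:
        # A bound socket matches its own device or the VRF that device is in.
        return sock.bound_dev in (pkt.ingress_dev, pkt.vrf_master)
    if pkt.vrf_master is None:
        # Packet outside any VRF: an unbound socket matches.
        return True
    # Packet inside a VRF: an unbound socket matches only in the old behavior.
    return legacy


if __name__ == "__main__":
    syn_ack = Packet(ingress_dev="eth0", vrf_master="mgmt")  # eth0 is hypothetical
    print(socket_matches(Sock(None), syn_ack, legacy=True))   # True: old behavior
    print(socket_matches(Sock(None), syn_ack))                # False: no socket found, RST
    print(socket_matches(Sock("mgmt"), syn_ack))              # True: socket bound to the VRF
```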

The reproducer script is attached.

Thanks,

Stephen.

[-- Attachment #2: socket.sh --]
[-- Type: application/x-sh, Size: 1721 bytes --]


* Re: Matching unbound sockets for VRF
  2022-03-27 12:57   ` Stephen Suryaputra
@ 2022-04-03 16:24     ` David Ahern
  2022-04-04 12:41       ` Stephen Suryaputra
  0 siblings, 1 reply; 7+ messages in thread
From: David Ahern @ 2022-04-03 16:24 UTC (permalink / raw)
  To: Stephen Suryaputra; +Cc: netdev

On 3/27/22 6:57 AM, Stephen Suryaputra wrote:
> 
> The reproducer script is attached.
> 

h0 has the mgmt VRF and the l3mdev settings, yet it is running the client
in the *default* VRF. Add 'ip vrf exec mgmt' before the 'nc' and it works.
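
For tooling that launches VRF-unaware commands, the prefix can be added programmatically. A minimal sketch, assuming iproute2 provides 'ip vrf exec' on the target system; the helper name vrf_exec_argv is made up, and the nc arguments are simply the server address and port from the traffic dump, not the contents of the attached script.

```python
import shlex


def vrf_exec_argv(vrf, command):
    """Return the argv for running `command` under `ip vrf exec <vrf>`."""
    if not vrf:
        raise ValueError("VRF name must not be empty")
    if not command:
        raise ValueError("command must not be empty")
    return ["ip", "vrf", "exec", vrf, *command]


if __name__ == "__main__":
    # Hypothetical client invocation; only printed here, not executed.
    argv = vrf_exec_argv("mgmt", ["nc", "10.1.1.2", "1499"])
    print(shlex.join(argv))
    # To actually run it (typically needs root), pass argv to subprocess.run().
```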

Are you saying that before Mike and Robert's changes you could get a
client to run in default VRF and work over mgmt VRF? If so it required
some ugly routing tricks (the last fib rule you installed) and is a bug
relative to the VRF design.


* Re: Matching unbound sockets for VRF
  2022-04-03 16:24     ` David Ahern
@ 2022-04-04 12:41       ` Stephen Suryaputra
  2022-04-05 14:32         ` David Ahern
  0 siblings, 1 reply; 7+ messages in thread
From: Stephen Suryaputra @ 2022-04-04 12:41 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev

On Sun, Apr 03, 2022 at 10:24:36AM -0600, David Ahern wrote:
> On 3/27/22 6:57 AM, Stephen Suryaputra wrote:
> > 
> > The reproducer script is attached.
> > 
> 
> h0 has the mgmt vrf, the l3mdev settings yet is running the client in
> *default* vrf. Add 'ip vrf exec mgmt' before the 'nc' and it works.

Yes. With "ip vrf exec mgmt" nc would work. We know that. See more
below.

> Are you saying that before Mike and Robert's changes you could get a
> client to run in default VRF and work over mgmt VRF? If so it required
> some ugly routing tricks (the last fib rule you installed) and is a bug
> relative to the VRF design.

Yes, before Mike and Robert's changes the client ran fine because of the
last fib rule. We did that because, for some of our applications:
1) They predate "ip vrf exec"
2) The LD_PRELOAD trick from the early days doesn't work

For case (2) above, one concrete example is NFS mounting our images:
applications and kernel modules. We had to run less than full-blown
utilities, and the mount command uses glibc RPC functions
(pmap_getmaps(), clntudp_create(), clnt_call(), etc.). Our analysis back
then was that because these functions are in glibc and call socket()
from within glibc, LD_PRELOAD doesn't work.

From the thread of Mike and Robert's changes, the conclusion is that the
previous behavior is a bug but we have been relying on it for a while,
since the early days of VRFs, and an upgrade that includes the changes
caused some applications to not work anymore.

I'm asking whether Mike and Robert's changes should be controlled by an
option, e.g. a sysctl, that defaults to the new behavior but can be
switched back to the previous behavior.

Thanks,
Stephen.


* Re: Matching unbound sockets for VRF
  2022-04-04 12:41       ` Stephen Suryaputra
@ 2022-04-05 14:32         ` David Ahern
  2022-04-07 21:28           ` Ben Greear
  0 siblings, 1 reply; 7+ messages in thread
From: David Ahern @ 2022-04-05 14:32 UTC (permalink / raw)
  To: Stephen Suryaputra, Ben Greear; +Cc: netdev

On 4/4/22 6:41 AM, Stephen Suryaputra wrote:
> On Sun, Apr 03, 2022 at 10:24:36AM -0600, David Ahern wrote:
>> On 3/27/22 6:57 AM, Stephen Suryaputra wrote:
>>>
>>> The reproducer script is attached.
>>>
>>
>> h0 has the mgmt vrf, the l3mdev settings yet is running the client in
>> *default* vrf. Add 'ip vrf exec mgmt' before the 'nc' and it works.
> 
> Yes. With "ip vrf exec mgmt" nc would work. We know that. See more
> below.
> 
>> Are you saying that before Mike and Robert's changes you could get a
>> client to run in default VRF and work over mgmt VRF? If so it required
>> some ugly routing tricks (the last fib rule you installed) and is a bug
>> relative to the VRF design.
> 
> Yes, before Mike and Robert's changes the client ran fine because of the
> last fib rule. We did that because some of our applications are:
> 1) Pre-dates "ip vrf exec"
> 2) LD_PRELOAD trick from the early days doesn't work
> 
> On the case (2) above, one concrete example is NFS mounting our images:
> applications and kernel modules. We had to run less than full-blown
> utilities and also the mount command uses glibc RPC functions
> (pmap_getmaps(), clntudp_create(), clnt_call(), etc, etc.). We analyzed
> it back then that because these functions are in glibc and call socket()
> from within glibc, the LD_PRELOAD doesn't work.
> 
> From the thread of Mike and Robert's changes, the conclusion is that the
> previous behavior is a bug but we have been relying on it for a while,
> since the early days of VRFs, and an upgrade that includes the changes
> caused some applications to not work anymore.
> 
> I'm asking if Mike and Robert's changes should be controlled by an
> option, e.g. sysctl, and be the default. But can be reverted back to the
> previous behavior.
> 

It has been 3-1/2 years since that patch. Rather than add more checks to
try to manage unintended app behavior, why not work on making your apps
consistent with the intent of the VRF design? If adding `ip vrf exec
VRF` before commands works, that is a very simple solution and the
reason for the command (handle code that is not VRF aware).

I'm guessing that option will not work for all cases (e.g., NFS which I
think Ben has asked about as well, cc'ed), but working towards making
the code align with VRF design is the longer term win.
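
For user-space code that can be changed, one way to make it VRF aware is to bind the socket to the VRF device with SO_BINDTODEVICE before connecting. The following is a rough sketch under stated assumptions, not a recommendation for any particular application: the helper names are made up, the fallback value 25 for SO_BINDTODEVICE is the Linux value on common architectures, and the call typically needs elevated privileges.

```python
import socket

IFNAMSIZ = 16  # Linux interface-name limit, including the trailing NUL

# Use Python's constant when available; 25 is an assumed fallback for Linux.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def device_name_bytes(name):
    """Encode an interface/VRF name as the NUL-terminated bytes setsockopt expects."""
    raw = name.encode("ascii")
    if not raw or len(raw) >= IFNAMSIZ:
        raise ValueError(f"invalid device name: {name!r}")
    return raw + b"\0"


def connect_in_vrf(vrf, host, port, timeout=5.0):
    """Open a TCP connection with the socket bound to the given VRF device."""
    dev = device_name_bytes(vrf)  # validate before creating the socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE, dev)
        sock.settimeout(timeout)
        sock.connect((host, port))
    except OSError:
        sock.close()
        raise
    return sock
```

A call such as connect_in_vrf("mgmt", "10.1.1.2", 1499) would then correspond to running the client under 'ip vrf exec mgmt', with the address and port taken from the earlier dump. This does not help with sockets created inside the kernel.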


* Re: Matching unbound sockets for VRF
  2022-04-05 14:32         ` David Ahern
@ 2022-04-07 21:28           ` Ben Greear
  0 siblings, 0 replies; 7+ messages in thread
From: Ben Greear @ 2022-04-07 21:28 UTC (permalink / raw)
  To: David Ahern, Stephen Suryaputra; +Cc: netdev

On 4/5/22 7:32 AM, David Ahern wrote:
> On 4/4/22 6:41 AM, Stephen Suryaputra wrote:
>> On Sun, Apr 03, 2022 at 10:24:36AM -0600, David Ahern wrote:
>>> On 3/27/22 6:57 AM, Stephen Suryaputra wrote:
>>>>
>>>> The reproducer script is attached.
>>>>
>>>
>>> h0 has the mgmt vrf, the l3mdev settings yet is running the client in
>>> *default* vrf. Add 'ip vrf exec mgmt' before the 'nc' and it works.
>>
>> Yes. With "ip vrf exec mgmt" nc would work. We know that. See more
>> below.
>>
>>> Are you saying that before Mike and Robert's changes you could get a
>>> client to run in default VRF and work over mgmt VRF? If so it required
>>> some ugly routing tricks (the last fib rule you installed) and is a bug
>>> relative to the VRF design.
>>
>> Yes, before Mike and Robert's changes the client ran fine because of the
>> last fib rule. We did that because some of our applications are:
>> 1) Pre-dates "ip vrf exec"
>> 2) LD_PRELOAD trick from the early days doesn't work
>>
>> On the case (2) above, one concrete example is NFS mounting our images:
>> applications and kernel modules. We had to run less than full-blown
>> utilities and also the mount command uses glibc RPC functions
>> (pmap_getmaps(), clntudp_create(), clnt_call(), etc, etc.). We analyzed
>> it back then that because these functions are in glibc and call socket()
>> from within glibc, the LD_PRELOAD doesn't work.
>>
>>  From the thread of Mike and Robert's changes, the conclusion is that the
>> previous behavior is a bug but we have been relying on it for a while,
>> since the early days of VRFs, and an upgrade that includes the changes
>> caused some applications to not work anymore.
>>
>> I'm asking if Mike and Robert's changes should be controlled by an
>> option, e.g. sysctl, and be the default. But can be reverted back to the
>> previous behavior.
>>
> 
> It has been 3-1/2 years since that patch. Rather than add more checks to
> try to manage unintended app behavior, why not work on making your apps
> consistent with the intent of the VRF design? If adding `ip vrf exec
> VRF` before commands works, that is a very simple solution and the
> reason for the command (handle code that is not VRF aware).
> 
> I'm guessing that option will not work for all cases (e.g., NFS which I
> think Ben has asked about as well, cc'ed), but working towards making
> the code align with VRF design is the longer term win.

NFS certainly wouldn't work.  It builds its sockets in the kernel in a convoluted
call path.  I tried to make NFS work with VRF at one time, found it very painful
and not worth the effort, especially since I figured patches would never make it
upstream.

We have out-of-tree patches to at least make NFS work with source based routing,
but those patches were not accepted upstream (2011 timeframe may be the last time
I tried), so stock kernels plus NFS plus interesting routing pretty much don't work
at all as far as I can tell.

If binding NFS to source IPs is interesting in this day and time, I can try reposting
my patches, or you can grab them from one of my trees:

Somewhere around 800 patches down from HEAD, first few patches after the upstream
rebase point:

https://github.com/greearb/linux-ct-5.17

https://github.com/greearb/nfs-utils-ct

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


