* Change in behavior for bound vs unbound sockets
From: Saikrishna Arcot @ 2021-08-30 23:47 UTC (permalink / raw)
  To: netdev

Hi all,

When upgrading from 4.19.152 to 5.10.40, I noticed a change in how incoming UDP packets are distributed between sockets that are bound to an interface and a socket that is not bound to any interface. This affects the dhcrelay program in isc-dhcp when it is compiled to use regular UDP sockets rather than raw sockets.

For each interface it finds on the system (or is given on the command line), dhcrelay opens a UDP socket listening on port 67 and bound to that interface. Then, at the end, it opens one more UDP socket, also listening on port 67 but not bound to any interface (this socket is mainly used for sending). It expects that a packet arriving on an interface for which a bound socket is open will be delivered to that bound socket. This was true on 4.19.152, but on 5.10.40 packets arrive only on the unbound socket and never on the bound socket. dhcrelay discards any packets it sees on the unbound socket, so the application breaks.

I made a test application that creates two UDP sockets, binds one of them to the loopback interface, and has them both listen on 0.0.0.0 with some random port. Then, it waits for a message on those two sockets, and prints out which socket it received a message on. With another application (such as nc) sending some UDP message, I can see that on 4.19.152, the test application gets the message on the bound socket consistently, whereas on 5.10.40, it gets the message on the unbound socket consistently. I have a dev machine running 5.4.0, and it gets the message on the unbound socket consistently as well.
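
A minimal sketch of a test along these lines, for readers who want to reproduce the observation. It is not the program used for the results above: the port number 5555, the helper names, and the use of SO_REUSEADDR to let both sockets share the port are assumptions made for illustration. Setting SO_BINDTODEVICE may require root or CAP_NET_RAW, depending on the kernel version.

```python
import select
import socket


def open_udp(port, ifname=None):
    """Open a UDP socket on 0.0.0.0:port, optionally bound to one interface."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Assumption: SO_REUSEADDR on both sockets so they can share the port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    if ifname is not None:
        # Linux-specific; may fail with PermissionError without privileges.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                     ifname.encode() + b"\0")
    s.bind(("0.0.0.0", port))
    return s


def wait_for_datagram(labelled, timeout=None):
    """Return (label, payload) for the first readable socket, or None on timeout.

    `labelled` maps socket objects to human-readable labels.
    """
    readable, _, _ = select.select(list(labelled), [], [], timeout)
    if not readable:
        return None
    sock = readable[0]
    payload, _addr = sock.recvfrom(2048)
    return labelled[sock], payload


def send_probe(port, payload=b"hello", host="127.0.0.1"):
    """Send one UDP datagram, standing in for a tool such as nc."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(payload, (host, port))


def main(port=5555, ifname="lo"):
    # Same order as described for dhcrelay: bound socket first, unbound last.
    socks = {
        open_udp(port, ifname): "bound to " + ifname,
        open_udp(port): "unbound",
    }
    label, payload = wait_for_datagram(socks)
    print("received", payload, "on the socket that is", label)
```

Calling main() in one terminal and send_probe(5555) (or nc in UDP mode) from another prints which of the two sockets the kernel selected.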

I traced it to one commit (6da5b0f027a8 "net: ensure unbound datagram socket to be chosen when not in a VRF") that makes sure that, when not in a VRF, the unbound socket is chosen over the bound socket if both are available. If I revert this commit and two other commits that made changes on top of it, packets are delivered to the bound socket again. There are similar commits for TCP and raw sockets as well, as part of the same patch series.

Are those commits also intended to affect sockets that are bound to regular interfaces (and not only VRFs)? If so, since this change breaks a userspace application, is it possible to add a config option that restores the old behavior, where bound sockets are preferred over unbound sockets?

--
Saikrishna Arcot



* Re: Change in behavior for bound vs unbound sockets
From: Paul Menzel @ 2021-08-31 10:12 UTC (permalink / raw)
  To: Saikrishna Arcot, Mike Manning; +Cc: netdev, David Ahern, David S. Miller

[cc: +maintainers and commit author and reviewers]

Dear Saikrishna,


On 31.08.21 at 01:47, Saikrishna Arcot wrote:

Thank you for bringing this issue, which you found while upgrading the 
Linux kernel in SONiC [1], to the mailing list.

> When upgrading from 4.19.152 to 5.10.40, I noticed a change in
> behavior in how incoming UDP packets are assigned to sockets that are
> bound to an interface and a socket that is not bound to any
> interface. This affects the dhcrelay program in isc-dhcp, when it is
> compiled to use regular UDP sockets and not raw sockets.
> 
> For each interface it finds on the system (or is passed in via
> command-line), dhcrelay opens a UDP socket listening on port 67 and
> bound to that interface. Then, at the end, it opens a UDP socket also
> listening on port 67, but not bound to any interface (this socket is
> used for sending, mainly). It expects that for packets that arrived
> on an interface for which a bound socket is opened, it will arrive on
> that bound socket. This was true for 4.19.152, but on 5.10.40,
> packets arrive on the unbound socket only, and never on the bound
> socket. dhcrelay discards any packets that it sees on the unbound
> socket. Because of this, this application breaks.
> 
> I made a test application that creates two UDP sockets, binds one of
> them to the loopback interface, and has them both listen on 0.0.0.0
> with some random port. Then, it waits for a message on those two
> sockets, and prints out which socket it received a message on. With
> another application (such as nc) sending some UDP message, I can see
> that on 4.19.152, the test application gets the message on the bound
> socket consistently, whereas on 5.10.40, it gets the message on the
> unbound socket consistently. I have a dev machine running 5.4.0, and
> it gets the message on the unbound socket consistently as well.

It’d be great if you shared your test application.

> I traced it to one commit (6da5b0f027a8 "net: ensure unbound datagram
> socket to be chosen when not in a VRF") that makes sure that when not
> in a VRF, the unbound socket is chosen over the bound socket, if both
> are available. If I revert this commit and two other commits that
> made changes on top of this, I can see that packets get sent to the
> bound socket instead. There's similar commits made for TCP and raw
> sockets as well, as part of that patch series.

Commit 6da5b0f027a8 (net: ensure unbound datagram socket to be chosen 
when not in a VRF) was added to Linux 5.0.

> Is the intention of those commits also meant to affect sockets that
> are bound to just regular interfaces (and not only VRFs)? If so,
> since this change breaks a userspace application, is it possible to
> add a config that reverts to the old behavior, where bound sockets
> are preferred over unbound sockets?

If it breaks user space, the old behavior needs to be restored according 
to Linux’s no-regression policy. Let’s hope that, in the future, there is 
better testing infrastructure and such issues are noticed earlier.


Kind regards,

Paul


PS:

> --
> Saikrishna Arcot

Saikrishna, if you care, the standard signature delimiter has a trailing 
space [2].


[1]: https://github.com/Azure/sonic-linux-kernel/pull/227/
[2]: https://en.wikipedia.org/wiki/Signature_block#Standard_delimiter


* Re: Change in behavior for bound vs unbound sockets
From: David Ahern @ 2021-09-01  2:29 UTC (permalink / raw)
  To: Paul Menzel, Saikrishna Arcot, Mike Manning; +Cc: netdev, David S. Miller

On 8/31/21 3:12 AM, Paul Menzel wrote:
>> I traced it to one commit (6da5b0f027a8 "net: ensure unbound datagram
>> socket to be chosen when not in a VRF") that makes sure that when not
>> in a VRF, the unbound socket is chosen over the bound socket, if both
>> are available. If I revert this commit and two other commits that
>> made changes on top of this, I can see that packets get sent to the
>> bound socket instead. There's similar commits made for TCP and raw
>> sockets as well, as part of that patch series.
> 
> Commit 6da5b0f027a8 (net: ensure unbound datagram socket to be chosen
> when not in a VRF) was added to Linux 5.0.
> 
>> Is the intention of those commits also meant to affect sockets that
>> are bound to just regular interfaces (and not only VRFs)? If so,
>> since this change breaks a userspace application, is it possible to
>> add a config that reverts to the old behavior, where bound sockets
>> are preferred over unbound sockets?
> If it breaks user space, the old behavior needs to be restored according
> to Linux’ no regression policy. Let’s hope, in the future, there is
> better testing infrastructure and such issues are noticed earlier.

5.0 was 2-1/2 years ago.

Feel free to add tests to tools/testing/selftests/net/fcnal-test.sh to
cover any missing permutations, including what you believe is the
problem here. Both IPv4 and IPv6 should be added for consistency across
protocols.

nettest.c covers a lot of the networking APIs and supports UDP, TCP, raw, ...


* RE: [EXTERNAL] Re: Change in behavior for bound vs unbound sockets
From: Saikrishna Arcot @ 2021-09-02  0:16 UTC (permalink / raw)
  To: David Ahern, Paul Menzel, Mike Manning; +Cc: netdev, David S. Miller


>On 8/31/21 7:29 PM, Paul Menzel wrote:
>>> Is the intention of those commits also meant to affect sockets that
>>> are bound to just regular interfaces (and not only VRFs)? If so,
>>> since this change breaks a userspace application, is it possible to
>>> add a config that reverts to the old behavior, where bound sockets
>>> are preferred over unbound sockets?
>> If it breaks user space, the old behavior needs to be restored
>> according to Linux' no regression policy. Let's hope, in the future,
>> there is better testing infrastructure and such issues are noticed earlier.
>
>5.0 was 2-1/2 years ago.

Does that mean that this should be considered the new behavior? Is it
possible to at least add a sysctl config to use the older behavior for
non-VRF socket bindings?

>
>Feel free to add tests to tools/testing/selftests/net/fcnal-test.sh to cover any
>missing permutations, including what you believe is the problem here. Both IPv4
>and IPv6 should be added for consistency across protocols.
>
>nettest.c has a lot of the networking APIs, supports udp, tcp, raw, ...

Let me try to add a test case there. I'm guessing test cases added there
should pass with the current version of the kernel (i.e. should reflect the
current behavior)?

-- 
Saikrishna Arcot


* Re: [EXTERNAL] Re: Change in behavior for bound vs unbound sockets
From: David Ahern @ 2021-09-02  3:41 UTC (permalink / raw)
  To: Saikrishna Arcot, Paul Menzel, Mike Manning; +Cc: netdev, David S. Miller

On 9/1/21 5:16 PM, Saikrishna Arcot wrote:
> 
>> On 8/31/21 7:29 PM, Paul Menzel wrote:
>>>> Is the intention of those commits also meant to affect sockets that
>>>> are bound to just regular interfaces (and not only VRFs)? If so,
>>>> since this change breaks a userspace application, is it possible to
>>>> add a config that reverts to the old behavior, where bound sockets
>>>> are preferred over unbound sockets?
>>> If it breaks user space, the old behavior needs to be restored
>>> according to Linux' no regression policy. Let's hope, in the future,
>>> there is better testing infrastructure and such issues are noticed earlier.
>>
>> 5.0 was 2-1/2 years ago.
> 
> Does that mean that this should be considered the new behavior? Is it
> possible to at least add a sysctl config to use the older behavior for
> non-VRF socket bindings?
> 
>>
>> Feel free to add tests to tools/testing/selftests/net/fcnal-test.sh to cover any
>> missing permutations, including what you believe is the problem here. Both IPv4
>> and IPv6 should be added for consistency across protocols.
>>
>> nettest.c has a lot of the networking APIs, supports udp, tcp, raw, ...
> 
> Let me try to add a test case there. I'm guessing test cases added there
> should pass with the current version of the kernel (i.e. should reflect the
> current behavior)?
> 

Let's start by seeing test cases that demonstrate the problem.

