netdev.vger.kernel.org archive mirror
* Delayed source port allocation for connected UDP sockets
@ 2019-11-27 14:07 Marek Majkowski
  2019-11-27 16:09 ` Maciej Żenczykowski
  2019-12-02 10:14 ` Jakub Sitnicki
  0 siblings, 2 replies; 7+ messages in thread
From: Marek Majkowski @ 2019-11-27 14:07 UTC
  To: Eric Dumazet, ncardwell, maze, network dev; +Cc: kernel-team

Morning,

In my applications I need something like a connectx()[1] syscall. On
Linux I can get quite far with using bind-before-connect and
IP_BIND_ADDRESS_NO_PORT. One corner case is missing though.

For various UDP applications I'm establishing connected sockets from a
specific 2-tuple. This works fine with bind-before-connect, but in UDP
it creates a slight race condition: the socket can receive a packet
from an arbitrary source after bind():

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("192.0.2.1", 1703))
# here be dragons: until connect(), any peer can reach this socket
s.connect(("198.18.0.1", 58910))

For the short time after bind() and before connect(), the socket may
receive packets from any peer. When I don't need to specify the source
port, the IP_BIND_ADDRESS_NO_PORT flag solves the issue. This code is
fine:

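import socket
# Linux value is 24; older Python builds may not export the constant
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
s.bind(("192.0.2.1", 0))
s.connect(("198.18.0.1", 58910))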

But IP_BIND_ADDRESS_NO_PORT doesn't help when a specific source port is
requested. It seems natural to expand the scope of the
IP_BIND_ADDRESS_NO_PORT flag. Perhaps this could be made to work:

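import socket
# Linux value is 24; older Python builds may not export the constant
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
s.bind(("192.0.2.1", 1703))  # today the port is still allocated here
s.connect(("198.18.0.1", 58910))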

I would like such code to delay the binding to port 1703 up until the
connect(). IP_BIND_ADDRESS_NO_PORT only makes sense for connected
sockets anyway. This raises a couple of questions though:

 - The IP_BIND_ADDRESS_NO_PORT name is confusing - we specify the port
number in the bind!

 - Where should __inet_bind store the source port? Neither
inet->inet_sport nor inet->inet_num seems like the correct place to
store the user-passed source port hint. The alternative is to introduce
yet another field in struct inet_sock, but that is wasteful.

Suggestions?

Marek

[1] https://www.unix.com/man-page/mojave/2/connectx/


* Re: Delayed source port allocation for connected UDP sockets
  2019-11-27 14:07 Delayed source port allocation for connected UDP sockets Marek Majkowski
@ 2019-11-27 16:09 ` Maciej Żenczykowski
  2019-11-27 16:18   ` Maciej Żenczykowski
  2019-12-02 10:14 ` Jakub Sitnicki
  1 sibling, 1 reply; 7+ messages in thread
From: Maciej Żenczykowski @ 2019-11-27 16:09 UTC
  To: Marek Majkowski; +Cc: Eric Dumazet, Neal Cardwell, network dev, kernel-team

On Wed, Nov 27, 2019 at 6:08 AM Marek Majkowski <marek@cloudflare.com> wrote:
>
> Morning,
>
> In my applications I need something like a connectx()[1] syscall. On
> Linux I can get quite far with using bind-before-connect and
> IP_BIND_ADDRESS_NO_PORT. One corner case is missing though.
>
> For various UDP applications I'm establishing connected sockets from
> specific 2-tuple. This is working fine with bind-before-connect, but
> in UDP it creates a slight race condition. It's possible the socket
> will receive packet from arbitrary source after bind():
>
> s = socket(SOCK_DGRAM)
> s.bind((192.0.2.1, 1703))
> # here be dragons
> s.connect((198.18.0.1, 58910))
>
> For the short amount of time after bind() and before connect(), the
> socket may receive packets from any peer. For situations when I don't
> need to specify source port, IP_BIND_ADDRESS_NO_PORT flag solves the
> issue. This code is fine:
>
> s = socket(SOCK_DGRAM)
> s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s.bind((192.0.2.1, 0))
> s.connect((198.18.0.1, 58910))
>
> But the IP_BIND_ADDRESS_NO_PORT doesn't work when the source port is
> selected. It seems natural to expand the scope of
> IP_BIND_ADDRESS_NO_PORT flag. Perhaps this could be made to work:
>
> s = socket(SOCK_DGRAM)
> s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s.bind((192.0.2.1, 1703))
> s.connect((198.18.0.1, 58910))
>
> I would like such code to delay the binding to port 1703 up until the
> connect(). IP_BIND_ADDRESS_NO_PORT only makes sense for connected
> sockets anyway. This raises a couple of questions though:
>
>  - IP_BIND_ADDRESS_NO_PORT name is confusing - we specify the port
> number in the bind!
>
>  - Where to store the source port in __inet_bind. Neither
> inet->inet_sport nor inet->inet_num seem like correct places to store
> the user-passed source port hint. The alternative is to introduce
> yet-another field onto inet_sock struct, but that is wasteful.
>
> Suggestions?
>
> Marek
>
> [1] https://www.unix.com/man-page/mojave/2/connectx/

attach a BPF socket filter that drops all packets, then bind, then connect, then replace it.
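
A minimal sketch of that suggestion, under stated assumptions: the option
numbers 26 and 27 are the Linux values of SO_ATTACH_FILTER and
SO_DETACH_FILTER, hard-coded because Python's socket module does not export
them, and bind_then_connect_filtered() is a hypothetical helper name. The
sketch detaches the filter at the end rather than replacing it. Datagrams that
arrive while the filter is attached are discarded, not handed to another
socket.

```python
import ctypes
import socket
import struct

# Linux option numbers (assumption: Linux host; not exported by Python).
SO_ATTACH_FILTER = 26
SO_DETACH_FILTER = 27

# One classic-BPF instruction: BPF_RET | BPF_K with k = 0 ("accept 0 bytes").
# struct sock_filter is { u16 code; u8 jt; u8 jf; u32 k; }.
DROP_ALL_INSN = struct.pack("HBBI", 0x06, 0, 0, 0)


def attach_drop_all(sock):
    """Attach a classic-BPF filter that drops every incoming packet."""
    insns = ctypes.create_string_buffer(DROP_ALL_INSN, len(DROP_ALL_INSN))
    # struct sock_fprog is { unsigned short len; struct sock_filter *filter; }.
    fprog = struct.pack("HP", 1, ctypes.addressof(insns))
    # The kernel copies the program during this call, so insns may be freed after.
    sock.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)


def bind_then_connect_filtered(local, peer):
    """Hypothetical helper: bind and connect while the drop-all filter is attached."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    attach_drop_all(sock)
    sock.bind(local)
    sock.connect(peer)
    # Remove the filter once only the connected peer can reach the socket.
    sock.setsockopt(socket.SOL_SOCKET, SO_DETACH_FILTER, 0)
    return sock
```

Usage would look like bind_then_connect_filtered(("192.0.2.1", 1703),
("198.18.0.1", 58910)), with the addresses taken from the original message.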


* Re: Delayed source port allocation for connected UDP sockets
  2019-11-27 16:09 ` Maciej Żenczykowski
@ 2019-11-27 16:18   ` Maciej Żenczykowski
  2019-11-27 17:15     ` Marek Majkowski
  0 siblings, 1 reply; 7+ messages in thread
From: Maciej Żenczykowski @ 2019-11-27 16:18 UTC
  To: Marek Majkowski; +Cc: Eric Dumazet, Neal Cardwell, network dev, kernel-team

On Wed, Nov 27, 2019 at 8:09 AM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Wed, Nov 27, 2019 at 6:08 AM Marek Majkowski <marek@cloudflare.com> wrote:
> >
> > Morning,
> >
> > In my applications I need something like a connectx()[1] syscall. On
> > Linux I can get quite far with using bind-before-connect and
> > IP_BIND_ADDRESS_NO_PORT. One corner case is missing though.
> >
> > For various UDP applications I'm establishing connected sockets from
> > specific 2-tuple. This is working fine with bind-before-connect, but
> > in UDP it creates a slight race condition. It's possible the socket
> > will receive packet from arbitrary source after bind():
> >
> > s = socket(SOCK_DGRAM)
> > s.bind((192.0.2.1, 1703))
> > # here be dragons
> > s.connect((198.18.0.1, 58910))
> >
> > For the short amount of time after bind() and before connect(), the
> > socket may receive packets from any peer. For situations when I don't
> > need to specify source port, IP_BIND_ADDRESS_NO_PORT flag solves the
> > issue. This code is fine:
> >
> > s = socket(SOCK_DGRAM)
> > s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> > s.bind((192.0.2.1, 0))
> > s.connect((198.18.0.1, 58910))
> >
> > But the IP_BIND_ADDRESS_NO_PORT doesn't work when the source port is
> > selected. It seems natural to expand the scope of
> > IP_BIND_ADDRESS_NO_PORT flag. Perhaps this could be made to work:
> >
> > s = socket(SOCK_DGRAM)
> > s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> > s.bind((192.0.2.1, 1703))
> > s.connect((198.18.0.1, 58910))
> >
> > I would like such code to delay the binding to port 1703 up until the
> > connect(). IP_BIND_ADDRESS_NO_PORT only makes sense for connected
> > sockets anyway. This raises a couple of questions though:
> >
> >  - IP_BIND_ADDRESS_NO_PORT name is confusing - we specify the port
> > number in the bind!
> >
> >  - Where to store the source port in __inet_bind. Neither
> > inet->inet_sport nor inet->inet_num seem like correct places to store
> > the user-passed source port hint. The alternative is to introduce
> > yet-another field onto inet_sock struct, but that is wasteful.
> >
> > Suggestions?
> >
> > Marek
> >
> > [1] https://www.unix.com/man-page/mojave/2/connectx/
>
> attach a BPF socket filter that drops all packets, then bind, then connect, then replace it.

Although I guess perhaps you'd consider dropping the packets to be bad...?
Then I think you might be able to do the same trick with
SO_BINDTODEVICE("dummy0") instead of BPF, followed by SO_BINDTODEVICE("").
That unfortunately requires privileges though.


* Re: Delayed source port allocation for connected UDP sockets
  2019-11-27 16:18   ` Maciej Żenczykowski
@ 2019-11-27 17:15     ` Marek Majkowski
  0 siblings, 0 replies; 7+ messages in thread
From: Marek Majkowski @ 2019-11-27 17:15 UTC
  To: Maciej Żenczykowski
  Cc: Eric Dumazet, Neal Cardwell, network dev, kernel-team

There may be a valid socket underneath. Consider socket() followed by bind():

udp UNCONN *:* 0.0.0.0:1703  -> master
udp UNCONN *:* 192.0.2.1:1703 -> worker

Then, after connect() is done, the socket will move to ESTAB:

udp UNCONN *:* 0.0.0.0:1703  -> master
udp ESTAB 198.18.0.1:58910 192.0.2.1:1703 -> worker

I want to avoid this race. For a brief moment I have two UNCONN
sockets, and I don't want that: packets from other sources should keep
being routed to the socket bound to the wildcard address. I'm thinking
that IP_BIND_ADDRESS_NO_PORT should basically be a request for delayed
binding. For me it makes sense to delay the actual binding until
connect().
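
The two socket states listed above can be reproduced on loopback. This is a
sketch under assumptions: a Linux host, 127.0.0.1 standing in for 192.0.2.1, an
ephemeral port instead of 1703, and SO_REUSEADDR set so the wildcard and
specific binds can coexist. The names udp_socket(), who_receives() and demo()
are made up for illustration. From the description above, demo() would be
expected to report "worker" for the stray datagram sent before connect() and
"master" for the one sent after.

```python
import socket

LOOPBACK = "127.0.0.1"


def udp_socket(addr):
    """UDP socket with SO_REUSEADDR so wildcard and specific binds can coexist."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(addr)
    s.settimeout(0.5)
    return s


def who_receives(sender, port, candidates):
    """Send one datagram to LOOPBACK:port and report which named socket got it."""
    sender.sendto(b"ping", (LOOPBACK, port))
    for name, sock in candidates.items():
        try:
            sock.recv(16)
            return name
        except socket.timeout:
            continue
    return None


def demo():
    master = udp_socket(("0.0.0.0", 0))      # wildcard listener
    port = master.getsockname()[1]
    peer = udp_socket((LOOPBACK, 0))         # the session's real peer
    stranger = udp_socket((LOOPBACK, 0))     # any other source
    worker = udp_socket((LOOPBACK, port))    # bind(): now two UNCONN sockets
    socks = {"worker": worker, "master": master}
    before = who_receives(stranger, port, socks)
    worker.connect(peer.getsockname())       # connect(): worker becomes ESTAB
    after = who_receives(stranger, port, socks)
    for s in (master, peer, stranger, worker):
        s.close()
    return before, after
```
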

Marek


* Re: Delayed source port allocation for connected UDP sockets
  2019-11-27 14:07 Delayed source port allocation for connected UDP sockets Marek Majkowski
  2019-11-27 16:09 ` Maciej Żenczykowski
@ 2019-12-02 10:14 ` Jakub Sitnicki
  2019-12-02 16:03   ` Willem de Bruijn
  1 sibling, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2019-12-02 10:14 UTC
  To: netdev; +Cc: kernel-team, Marek Majkowski

On Wed, Nov 27, 2019 at 03:07 PM CET, Marek Majkowski wrote:
> In my applications I need something like a connectx()[1] syscall. On
> Linux I can get quite far with using bind-before-connect and
> IP_BIND_ADDRESS_NO_PORT. One corner case is missing though.
>
> For various UDP applications I'm establishing connected sockets from
> specific 2-tuple. This is working fine with bind-before-connect, but
> in UDP it creates a slight race condition. It's possible the socket
> will receive packet from arbitrary source after bind():
>
> s = socket(SOCK_DGRAM)
> s.bind((192.0.2.1, 1703))
> # here be dragons
> s.connect((198.18.0.1, 58910))
>
> For the short amount of time after bind() and before connect(), the
> socket may receive packets from any peer. For situations when I don't
> need to specify source port, IP_BIND_ADDRESS_NO_PORT flag solves the
> issue. This code is fine:
>
> s = socket(SOCK_DGRAM)
> s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s.bind((192.0.2.1, 0))
> s.connect((198.18.0.1, 58910))
>
> But the IP_BIND_ADDRESS_NO_PORT doesn't work when the source port is
> selected. It seems natural to expand the scope of
> IP_BIND_ADDRESS_NO_PORT flag. Perhaps this could be made to work:
>
> s = socket(SOCK_DGRAM)
> s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> s.bind((192.0.2.1, 1703))
> s.connect((198.18.0.1, 58910))
>
> I would like such code to delay the binding to port 1703 up until the
> connect(). IP_BIND_ADDRESS_NO_PORT only makes sense for connected
> sockets anyway. This raises a couple of questions though:
>
>  - IP_BIND_ADDRESS_NO_PORT name is confusing - we specify the port
> number in the bind!
>
>  - Where to store the source port in __inet_bind. Neither
> inet->inet_sport nor inet->inet_num seem like correct places to store
> the user-passed source port hint. The alternative is to introduce
> yet-another field onto inet_sock struct, but that is wasteful.

We've been talking with Marek about it some more. I'll summarize for the
sake of keeping the discussion open.

1. inet->inet_sport as storage for port hint

   It seems inet->inet_sport could be used to hold the port passed to
   bind() when we're delaying port allocation with
   IP_BIND_ADDRESS_NO_PORT. As long as local port, inet->inet_num, is
   not set, connect() and sendmsg() will know the socket needs to be
   bound to a port first.

   We didn't do a detailed audit of all access sites to
   inet->inet_sport. Potentially we missed something.

2. Backward compatibility

   Changing the existing behavior to delay port allocation when
   IP_BIND_ADDRESS_NO_PORT is set but port number was passed to bind(),
   could break apps that set the sockopt but never connect() the socket
   for some reason.

3. Extend the sockopt? Add new one? Introduce connectx() syscall?

   Since IP_BIND_ADDRESS_NO_PORT cannot be reused as is, we need a way
   for the user-space to signal its desire to delay binding to a
   specific port.

   We could imagine an extended version of IP_BIND_ADDRESS_NO_PORT
   sockopt that takes an extra value apart from the int flag.

   Then there's the option of adding a new sockopt dedicated for this
   use-case. However, we fear two sockopts having a similar purpose will
   be confusing for the users [0].

   Finally, we could go for the hard-core solution and take a stab at
   adding connectx() syscall [1]. Were there any attempts or discussions
   about this before? Quick search didn't turn up anything but the name
   is kind of a nightmare to google for.

   Question to the maintainers - which approach would be most welcome?

4. Why connected UDP sockets?

   We know that it's better to stick to receiving UDP sockets and
   demultiplex the client requests/sessions in user-space. Being hashed
   just by local address & port, connected UDP sockets don't scale well.

   We think there is one useful application, though. Service draining
   during restarts.

   When a service is being restarted, we would like the dying process to
   handle the ongoing L7 sessions until they come to an end. New UDP
   flows should go to a fresh service instance.

   To achieve that, for each ongoing session we would open a connected
   UDP socket. This way socket lookup logic would deliver just the flows
   we care about to the old process.

5. reuseport BPF with SOCKARRAY to the rescue?

   Since we're talking about opening connected UDP sockets that share
   the local port with other receiving UDP sockets (owned by another
   process), we would need to opt for port sharing with REUSEPORT [2].

   If we don't want the connected UDP sockets to receive any traffic
   during the short window of opportunity when the socket is bound but
   not connected, we could exclude it from the reuseport group by
   controlling the socket set with BPF & SOCKARRAY.
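
Point 1 above can be pictured with a toy model. This is a sketch only, not
kernel code: ToySock, PortInUse and the shared table are invented for
illustration, the conflict check is reduced to an exact (address, port) match,
and only the field names inet_sport and inet_num come from the discussion.
bind() with the flag set stores the port as a hint, and connect() allocates it
when inet_num is still zero.

```python
class PortInUse(Exception):
    """Stand-in for EADDRINUSE in this toy model."""


class ToySock:
    """Toy model only; field names echo inet_sock but this is not kernel code."""

    def __init__(self, table, no_port=False):
        self.table = table        # shared set of (addr, port) pairs in use
        self.no_port = no_port    # models IP_BIND_ADDRESS_NO_PORT
        self.saddr = None
        self.inet_sport = 0       # port hint remembered from bind()
        self.inet_num = 0         # allocated local port; 0 means "not bound"
        self.peer = None

    def _allocate(self, port):
        if port == 0:             # no hint: pick any free port
            port = next(p for p in range(32768, 61000)
                        if (self.saddr, p) not in self.table)
        if (self.saddr, port) in self.table:
            raise PortInUse(port)
        self.table.add((self.saddr, port))
        self.inet_num = port

    def bind(self, addr, port):
        self.saddr = addr
        if self.no_port:
            self.inet_sport = port    # delayed: keep the hint, allocate nothing
        else:
            self._allocate(port)

    def connect(self, peer):
        if self.inet_num == 0:        # still unbound, so allocate now
            self._allocate(self.inet_sport)
        self.peer = peer


table = set()
delayed = ToySock(table, no_port=True)
delayed.bind("192.0.2.1", 1703)       # succeeds without taking the port
rival = ToySock(table)
rival.bind("192.0.2.1", 1703)         # another socket takes it in between
try:
    delayed.connect(("198.18.0.1", 58910))
    outcome = "connected"
except PortInUse:
    outcome = "connect failed"
```

In this model a delayed bind() can succeed and the later connect() can still
fail if another socket takes the port in between, which is a trade-off any
real implementation would have to document.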

Comments and thoughts more than welcome.

-Jakub

[0] Unless we call it IP_BIND_ADDRESS_NO_PORT_FOR_REAL... ;-)
[1] https://www.unix.com/man-page/mojave/2/connectx/
[2] Or REUSEADDR, whose semantics allow it for unicast UDP.


* Re: Delayed source port allocation for connected UDP sockets
  2019-12-02 10:14 ` Jakub Sitnicki
@ 2019-12-02 16:03   ` Willem de Bruijn
  2019-12-03 14:59     ` Marek Majkowski
  0 siblings, 1 reply; 7+ messages in thread
From: Willem de Bruijn @ 2019-12-02 16:03 UTC
  To: Jakub Sitnicki; +Cc: Network Development, kernel-team, Marek Majkowski

On Mon, Dec 2, 2019 at 5:15 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Nov 27, 2019 at 03:07 PM CET, Marek Majkowski wrote:
> > In my applications I need something like a connectx()[1] syscall. On
> > Linux I can get quite far with using bind-before-connect and
> > IP_BIND_ADDRESS_NO_PORT. One corner case is missing though.
> >
> > For various UDP applications I'm establishing connected sockets from
> > specific 2-tuple. This is working fine with bind-before-connect, but
> > in UDP it creates a slight race condition. It's possible the socket
> > will receive packet from arbitrary source after bind():
> >
> > s = socket(SOCK_DGRAM)
> > s.bind((192.0.2.1, 1703))
> > # here be dragons
> > s.connect((198.18.0.1, 58910))
> >
> > For the short amount of time after bind() and before connect(), the
> > socket may receive packets from any peer. For situations when I don't
> > need to specify source port, IP_BIND_ADDRESS_NO_PORT flag solves the
> > issue. This code is fine:
> >
> > s = socket(SOCK_DGRAM)
> > s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> > s.bind((192.0.2.1, 0))
> > s.connect((198.18.0.1, 58910))
> >
> > But the IP_BIND_ADDRESS_NO_PORT doesn't work when the source port is
> > selected. It seems natural to expand the scope of
> > IP_BIND_ADDRESS_NO_PORT flag. Perhaps this could be made to work:
> >
> > s = socket(SOCK_DGRAM)
> > s.setsockopt(IP_BIND_ADDRESS_NO_PORT)
> > s.bind((192.0.2.1, 1703))
> > s.connect((198.18.0.1, 58910))
> >
> > I would like such code to delay the binding to port 1703 up until the
> > connect(). IP_BIND_ADDRESS_NO_PORT only makes sense for connected
> > sockets anyway. This raises a couple of questions though:
> >
> >  - IP_BIND_ADDRESS_NO_PORT name is confusing - we specify the port
> > number in the bind!
> >
> >  - Where to store the source port in __inet_bind. Neither
> > inet->inet_sport nor inet->inet_num seem like correct places to store
> > the user-passed source port hint. The alternative is to introduce
> > yet-another field onto inet_sock struct, but that is wasteful.
>
> We've been talking with Marek about it some more. I'll summarize for the
> sake of keeping the discussion open.
>
> 1. inet->inet_sport as storage for port hint
>
>    It seems inet->inet_sport could be used to hold the port passed to
>    bind() when we're delaying port allocation with
>    IP_BIND_ADDRESS_NO_PORT. As long as local port, inet->inet_num, is
>    not set, connect() and sendmsg() will know the socket needs to be
>    bound to a port first.

So bind might succeed, but connect could fail later if the port gets
bound by another socket in between?

Related, I have toyed with unhashed sockets with inet_sport set in the
past for a different use-case: transmit-only sockets. All receive
processing happens on a small set (say, per cpu) of unconnected
listening sockets, while unhashed transmit-only connected sockets
transmit without a route lookup. But the route caching did not
warrant the cost of maintaining a socket per connection at scale.

>
>    We didn't do a detailed audit of all access sites to
>    inet->inet_sport. Potentially we missed something.
>
>

> 4. Why connected UDP sockets?
>
>    We know that it's better to stick to receiving UDP sockets and
>    demultiplex the client requests/sessions in user-space. Being hashed
>    just by local address & port, connected UDP sockets don't scale well.
>
>    We think there is one useful application, though. Service draining
>    during restarts.
>
>    When a service is being restarted, we would like the dying process to
>    handle the ongoing L7 sessions until they come to an end. New UDP
>    flows should go to a fresh service instance.

Service hand-off is a prime use case of reuseport BPF. With UDP it is
trickier than with TCP. It likely requires a map to store
session-to-process affinity.

>    To achieve that, for each ongoing session we would open a connected
>    UDP socket. This way socket lookup logic would deliver just the flows
>    we care about to the old process.
>
> 5. reuseport BPF with SOCKARRAY to the rescue?
>
>    Since we're talking about opening connected UDP sockets that share
>    the local port with other receiving UDP sockets (owned by another
>    process), we would need to opt for port sharing with REUSEPORT [2].
>
>    If we don't want the connected UDP sockets to receive any traffic
>    during the short window of opportunity when the socket is bound but
>    not connected, we could exclude it from the reuseport group by
>    controlling the socket set with BPF & SOCKARRAY.
>
> Comments and thoughts more than welcome.

If CAP_NET_RAW is no issue, Maciej's suggestion of temporarily binding
to a dummy device (or even lo) might be the simplest approach?

>
> -Jakub
>
> [0] Unless we call it IP_BIND_ADDRESS_NO_PORT_FOR_REAL... ;-)
> [1] https://www.unix.com/man-page/mojave/2/connectx/
> [2] Or REUSEADDR, whose semantics allow it for unicast UDP.


* Re: Delayed source port allocation for connected UDP sockets
  2019-12-02 16:03   ` Willem de Bruijn
@ 2019-12-03 14:59     ` Marek Majkowski
  0 siblings, 0 replies; 7+ messages in thread
From: Marek Majkowski @ 2019-12-03 14:59 UTC
  To: Willem de Bruijn; +Cc: Jakub Sitnicki, Network Development, kernel-team

On Mon, Dec 2, 2019 at 5:03 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> So bind might succeed, but connect could fail later if the port gets
> bound by another socket in between?

Yes, I'm proposing to delay the bind() until connect(). The semantics
should remain the same, just the actual bind work would be done
atomically in the context of connect.

As mentioned - this is basically what the connectx syscall does on some BSDs.

> Related, I have toyed with unhashed sockets with inet_sport set in the
> past for a different use-case: transmit-only sockets. All receive
> processing happens on a small set (say, per cpu) of unconnected
> listening sockets, while unhashed transmit-only connected sockets
> transmit without a route lookup. But the route caching did not
> warrant the cost of maintaining a socket per connection at scale.

This is interesting. We have another use case for that - with TPROXY, we need
to _source_ packets from an arbitrary port number. The port number on a UDP
socket can't be set with the usual IP_PKTINFO. Therefore, to source packets
from an arbitrary port number we are planning to either:

 - use raw sockets
 - open a port on a useless IP but with a specific sport, like
127.0.0.99:1234, and call sendto() on it with an arbitrary target.

Having proper unhashed sockets would make it slightly less hacky.

[...]
> If CAP_NET_RAW is no issue, Maciej's suggestion of temporarily binding
> to a dummy device (or even lo) might be the simplest approach?

Oh boy. I thought I knew enough UDP hacks in Linux, but this brings it
to the next level. Indeed, it works:

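import socket
sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# park the socket on dummy0 so nothing is delivered between bind() and connect()
sd.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
sd.bind(('0.0.0.0', 1234))
sd.connect(("1.1.1.1", 53))
# clear the device binding once the socket is connected
sd.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"")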

The caveat is that dummy0 must be up. But this successfully
eliminates the race.

Thanks for suggestions,
    Marek


Thread overview: 7+ messages
2019-11-27 14:07 Delayed source port allocation for connected UDP sockets Marek Majkowski
2019-11-27 16:09 ` Maciej Żenczykowski
2019-11-27 16:18   ` Maciej Żenczykowski
2019-11-27 17:15     ` Marek Majkowski
2019-12-02 10:14 ` Jakub Sitnicki
2019-12-02 16:03   ` Willem de Bruijn
2019-12-03 14:59     ` Marek Majkowski
