From: Marek Majkowski <marek@cloudflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>,
	Network Development <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: Delayed source port allocation for connected UDP sockets
Date: Tue, 3 Dec 2019 15:59:15 +0100
Message-ID: <CAJPywTLPzBz50LW7awNMAEOUdjLt4spz3vQ6i3BRKOp2qzBq4g@mail.gmail.com>
In-Reply-To: <CA+FuTSfA9o=yQk5EjR2hMuhwRDLXCAwYQ+eGqx2YSh=hx03c8g@mail.gmail.com>

On Mon, Dec 2, 2019 at 5:03 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> So bind might succeed, but connect fail later if the port is already
> bound by another socket inbetween?

Yes, I'm proposing to delay the bind() until connect(). The
semantics should remain the same; the actual bind work would just be
done atomically in the context of connect().

As mentioned, this is basically what the connectx() syscall does on
some BSDs.

> Related, I have toyed with unhashed sockets with inet_sport set in the
> past for a different use-case: transmit-only sockets. If all receive
> processing happens on a small set (say, per cpu) of unconnected
> listening sockets. Then have unhashed transmit-only connected sockets
> to transmit without route lookup. But the route caching did not
> warrant the cost of maintaining a socket per connection at scale.

This is interesting. We have another use case for that: with TPROXY, we
need to _source_ packets from an arbitrary port number. The port number
on a UDP socket can't be set with the usual IP_PKTINFO. Therefore, to
source packets from an arbitrary port number, we are planning to either:

 - use raw sockets, or
 - open a port on a useless IP but with a specific sport, like
   127.0.0.99:1234, and call sendto() on it with an arbitrary target.

Having proper unhashed sockets would make it slightly less hacky.
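
The second workaround in the list above could look roughly like this. It
is only a minimal sketch: the helper name send_from_port and its
placeholder_ip parameter are made up for illustration, and it assumes
Linux, where every 127.0.0.0/8 address is local to lo.

```python
import socket


def send_from_port(src_port, dest, payload, placeholder_ip="127.0.0.99"):
    """Send one UDP datagram to `dest` from a caller-chosen source port.

    Hypothetical helper, not an existing API.
    """
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Claim the wanted source port on an address nothing else uses,
        # so the bind does not collide with real users of that port.
        sd.bind((placeholder_ip, src_port))
        # The socket stays unconnected; sendto() names the target per call.
        return sd.sendto(payload, dest)
    finally:
        sd.close()
```

As written, the datagram leaves with the placeholder address as its
source, so this is only directly useful on loopback. In the TPROXY case
the source address would presumably still have to be overridden per
packet (for example via IP_PKTINFO), which this sketch does not show.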

[...]
> If CAP_NET_RAW is no issue, Maciej's suggestion of temporarily binding
> to a dummy device (or even lo) might be the simplest approach?

Oh boy. I thought I knew enough UDP hacks in Linux, but this takes it
to the next level. Indeed, it works:

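import socket

sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# While bound to dummy0, the socket cannot receive from real interfaces.
sd.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
sd.bind(('0.0.0.0', 1234))
sd.connect(("1.1.1.1", 53))
# Drop the device binding once the socket is connected.
sd.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"")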

The caveat is that dummy0 must be up. But this successfully
eliminates the race.

Thanks for the suggestions,
    Marek
