From: Marek Majkowski <marek@cloudflare.com>
To: "Maciej Żenczykowski" <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	network dev <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>
Subject: Re: Delayed source port allocation for connected UDP sockets
Date: Wed, 27 Nov 2019 18:15:21 +0100
Message-ID: <CAJPywTJv=pFK2dFcHRsZPR89DQVbQX8J6OAcSkZk5MkOP43kvQ@mail.gmail.com>
In-Reply-To: <CANP3RGfLkxodi=SB3KuS+Vhv==Akb0Ep16qNkXd+h4x23PaG=Q@mail.gmail.com>

There may be a valid socket underneath. Consider socket() followed by bind():

udp UNCONN *:* 0.0.0.0:1703  -> master
udp UNCONN *:* 192.0.2.1:1703 -> worker

Then, after connect() is done, the socket will move to ESTAB:

udp UNCONN *:* 0.0.0.0:1703  -> master
udp ESTAB 198.18.0.1:58910 192.0.2.1:1703 -> worker

I want to avoid this race. For a brief moment after bind() there are two
UNCONN sockets, and I don't want that: until connect() completes, packets
from other sources should keep being routed to the wildcard socket. I'm
thinking that IP_BIND_ADDRESS_NO_PORT should basically be a request for
delayed binding. To me it makes sense to delay the actual binding until
connect().
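
The window can be reproduced on loopback. The following is only a
minimal sketch, assuming Linux UDP lookup behaviour (a socket bound to a
specific address is preferred over a wildcard one) and using 127.0.0.1 with
a kernel-chosen port in place of 192.0.2.1:1703. The helper names (udp,
demo_window) and the "stranger" socket are made up for illustration.

```python
import socket


def udp(reuse=False):
    """Create a UDP socket with a 1 s receive timeout."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    if reuse:
        # Both master and worker need this to share the port.
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.settimeout(1.0)
    return s


def demo_window():
    master, worker = udp(reuse=True), udp(reuse=True)
    peer, stranger = udp(), udp()
    try:
        master.bind(("0.0.0.0", 0))            # wildcard "master" socket
        port = master.getsockname()[1]
        peer.bind(("127.0.0.1", 0))
        stranger.bind(("127.0.0.1", 0))

        # bind(): from here on there are two UNCONN sockets on the port.
        worker.bind(("127.0.0.1", port))
        stranger.sendto(b"stray-1", ("127.0.0.1", port))
        in_window = worker.recvfrom(64)[0]      # lands on the worker

        # connect(): the worker now only matches its peer.
        worker.connect(peer.getsockname())
        stranger.sendto(b"stray-2", ("127.0.0.1", port))
        after_connect = master.recvfrom(64)[0]  # back on the master
        peer.sendto(b"hello", ("127.0.0.1", port))
        from_peer = worker.recv(64)
        return in_window, after_connect, from_peer
    finally:
        for s in (master, worker, peer, stranger):
            s.close()


if __name__ == "__main__":
    print(demo_window())
```

Under those assumptions the first stray datagram is delivered to the worker
and the second one to the master, which is exactly the gap that delayed
binding would close.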

Marek

On Wed, Nov 27, 2019 at 5:19 PM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Wed, Nov 27, 2019 at 8:09 AM Maciej Żenczykowski <maze@google.com> wrote:
> >
> > On Wed, Nov 27, 2019 at 6:08 AM Marek Majkowski <marek@cloudflare.com> wrote:
> > >
> > > Morning,
> > >
> > > In my applications I need something like a connectx()[1] syscall. On
> > > Linux I can get quite far with using bind-before-connect and
> > > IP_BIND_ADDRESS_NO_PORT. One corner case is missing though.
> > >
> > > For various UDP applications I'm establishing connected sockets from
> > > a specific 2-tuple. This works fine with bind-before-connect, but
> > > in UDP it creates a slight race condition. It's possible the socket
> > > will receive a packet from an arbitrary source after bind():
> > >
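> > > import socket
> > > s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> > > s.bind(("192.0.2.1", 1703))
> > > # here be dragons: any peer can reach the socket until connect()
> > > s.connect(("198.18.0.1", 58910))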
> > >
> > > For the short amount of time after bind() and before connect(), the
> > > socket may receive packets from any peer. For situations where I don't
> > > need to specify the source port, the IP_BIND_ADDRESS_NO_PORT flag solves
> > > the issue. This code is fine:
> > >
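> > > import socket
> > > s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> > > # IP_BIND_ADDRESS_NO_PORT is 24 in <linux/in.h>
> > > s.setsockopt(socket.IPPROTO_IP, 24, 1)
> > > s.bind(("192.0.2.1", 0))
> > > s.connect(("198.18.0.1", 58910))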
> > >
> > > But the IP_BIND_ADDRESS_NO_PORT doesn't work when the source port is
> > > selected. It seems natural to expand the scope of
> > > IP_BIND_ADDRESS_NO_PORT flag. Perhaps this could be made to work:
> > >
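> > > import socket
> > > s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
> > > # IP_BIND_ADDRESS_NO_PORT is 24 in <linux/in.h>
> > > s.setsockopt(socket.IPPROTO_IP, 24, 1)
> > > # proposed: port 1703 would only be bound at connect()
> > > s.bind(("192.0.2.1", 1703))
> > > s.connect(("198.18.0.1", 58910))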
> > >
> > > I would like such code to delay the binding to port 1703 up until the
> > > connect(). IP_BIND_ADDRESS_NO_PORT only makes sense for connected
> > > sockets anyway. This raises a couple of questions though:
> > >
> > >  - IP_BIND_ADDRESS_NO_PORT name is confusing - we specify the port
> > > number in the bind!
> > >
> > >  - Where to store the source port in __inet_bind. Neither
> > > inet->inet_sport nor inet->inet_num seem like correct places to store
> > > the user-passed source port hint. The alternative is to introduce
> > > yet-another field onto inet_sock struct, but that is wasteful.
> > >
> > > Suggestions?
> > >
> > > Marek
> > >
> > > [1] https://www.unix.com/man-page/mojave/2/connectx/
> >
> > attach a BPF socket filter that drops everything, then bind, then connect, then replace it.
>
> Although I guess perhaps you'd consider dropping the packets to be bad...?
> Then I think you might be able to do the same trick with
> SO_BINDTODEVICE("dummy0") instead of bpf and then SO_BINDTODEVICE("")
> That unfortunately requires privs though.
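
For reference, the drop-all filter workaround suggested above could look
roughly like this. It is a sketch under assumptions, not a tested recipe
for production: it relies on Linux classic BPF (SO_ATTACH_FILTER = 26 and
SO_DETACH_FILTER = 27 from <asm-generic/socket.h>), it detaches the filter
after connect() rather than replacing it, and it runs on loopback with
made-up helper names (drop_all_filter, bind_connect_behind_filter).

```python
import ctypes
import socket
import struct

SO_ATTACH_FILTER = 26   # Linux values, assumed from <asm-generic/socket.h>
SO_DETACH_FILTER = 27
BPF_RET_K = 0x06        # BPF_RET | BPF_K


def drop_all_filter():
    """Build a one-instruction classic BPF program: 'return 0' (drop)."""
    # struct sock_filter { u16 code; u8 jt; u8 jf; u32 k; }
    insn = struct.pack("HBBI", BPF_RET_K, 0, 0, 0)
    buf = ctypes.create_string_buffer(insn, len(insn))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HP", 1, ctypes.addressof(buf))
    return buf, fprog   # buf must stay alive until setsockopt() returns


def bind_connect_behind_filter():
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    stranger = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.settimeout(1.0)
        peer.bind(("127.0.0.1", 0))

        buf, fprog = drop_all_filter()
        s.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
        s.bind(("127.0.0.1", 0))
        # Arrives inside the bind/connect window and is dropped by the filter.
        stranger.sendto(b"stray", s.getsockname())
        s.connect(peer.getsockname())
        s.setsockopt(socket.SOL_SOCKET, SO_DETACH_FILTER, 0)

        peer.sendto(b"from-peer", s.getsockname())
        return s.recv(64)
    finally:
        for sock in (s, peer, stranger):
            sock.close()


if __name__ == "__main__":
    print(bind_connect_behind_filter())
```

As noted in the reply, packets that arrive during the window are dropped
rather than delivered to another socket, so this hides the race instead of
routing the traffic to the wildcard socket.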

Thread overview: 7+ messages
2019-11-27 14:07 Delayed source port allocation for connected UDP sockets Marek Majkowski
2019-11-27 16:09 ` Maciej Żenczykowski
2019-11-27 16:18   ` Maciej Żenczykowski
2019-11-27 17:15     ` Marek Majkowski [this message]
2019-12-02 10:14 ` Jakub Sitnicki
2019-12-02 16:03   ` Willem de Bruijn
2019-12-03 14:59     ` Marek Majkowski