From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Paasch
Subject: Re: [RFC 0/2] Delayed binding of UDP sockets for Quic per-connection sockets
Date: Wed, 31 Oct 2018 20:50:50 -0700
Message-ID: <20181101035050.GO80792@MacBook-Pro-19.local>
References: <20181031232635.33750-1-cpaasch@apple.com> <0ce864f0-38b9-59cc-18ea-e071afca347d@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; CHARSET=US-ASCII
Content-Transfer-Encoding: 7BIT
Cc: netdev@vger.kernel.org, Ian Swett, Leif Hedstrom, Jana Iyengar
To: Eric Dumazet
Return-path:
Received: from nwk-aaemail-lapp02.apple.com ([17.151.62.67]:57030 "EHLO nwk-aaemail-lapp02.apple.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726863AbeKAO0h (ORCPT ); Thu, 1 Nov 2018 10:26:37 -0400
Content-disposition: inline
In-reply-to: <0ce864f0-38b9-59cc-18ea-e071afca347d@gmail.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 31/10/18 - 17:53:22, Eric Dumazet wrote:
> On 10/31/2018 04:26 PM, Christoph Paasch wrote:
> > Implementations of Quic might want to create a separate socket for each
> > Quic-connection by creating a connected UDP-socket.
> >
>
> Nice proposal, but I doubt a QUIC server can afford having one UDP socket per connection ?
>
> It would add a huge overhead in term of memory usage in the kernel,
> and lots of epoll events to manage (say a QUIC server with one million flows, receiving
> very few packets per second per flow)
>
> Maybe you could elaborate on the need of having one UDP socket per connection.

I'll let Leif chime in on that, as the request came from him. Leif & his team
are implementing Quic in the Apache Traffic Server.

One advantage I can see is that it would let us benefit from fq_pacing, since
one could simply set sk_pacing_rate on the socket. That way there is no longer
any need to implement the pacing in user-space.

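As a rough illustration of the pacing point: from user space, the knob I am aware of is the SO_MAX_PACING_RATE socket option, which caps the rate that the fq qdisc then enforces for that socket. The sketch below assumes Linux and an fq qdisc on the egress interface; the helper names are made up for this example and are not part of the patchset.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47   /* asm-generic value; assumption for old headers */
#endif

/*
 * Hypothetical helper: create a UDP socket and cap its pacing rate
 * (in bytes per second). Returns the fd, or -1 if the socket cannot
 * be created or the option is not available.
 */
int udp_socket_with_pacing(unsigned int bytes_per_sec)
{
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                   &bytes_per_sec, sizeof(bytes_per_sec)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: read the cap back; returns 0 if it cannot be read. */
unsigned int pacing_rate_of(int fd)
{
    unsigned int rate = 0;
    socklen_t len = sizeof(rate);

    if (getsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE, &rate, &len) < 0)
        return 0;
    return rate;
}
```

With one socket per Quic connection, each connection would get its own cap this way. With a single shared socket, the cap would apply to all connections together, which is why pacing then has to live in user-space.
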
> > To achieve that on the server-side, a "master-socket" needs to wait for
> > incoming new connections and then creates a new socket that will be a
> > connected UDP-socket. To create that latter one, the server needs to
> > first bind() and then connect(). However, after the bind() the server
> > might already receive traffic on that new socket that is unrelated to the
> > Quic-connection at hand. Only after the connect() a full 4-tuple match
> > is happening. So, one can't really create this kind of a server that has
> > a connected UDP-socket per Quic connection.
> >
> > So, what is needed is an "atomic bind & connect" that basically
> > prevents any incoming traffic until the connect() call has been issued
> > at which point the full 4-tuple is known.
> >
> >
> > This patchset implements this functionality and exposes a socket-option
> > to do this.
> >
> > Usage would be:
> >
> >     int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
> >
> >     int val = 1;
> >     setsockopt(fd, SOL_SOCKET, SO_DELAYED_BIND, &val, sizeof(val));
> >
> >     bind(fd, (struct sockaddr *)&src, sizeof(src));
> >
> >     /* At this point, incoming traffic will never match on this socket */
> >
> >     connect(fd, (struct sockaddr *)&dst, sizeof(dst));
> >
> >     /* Only now incoming traffic will reach the socket */
> >
> >
> >
> > There is literally an infinite number of ways on how to implement it,
> > which is why I first send it out as an RFC. With this approach here I
> > chose the least invasive one, just preventing the match on the incoming
> > path.
> >
> >
> > The reason for choosing a SOL_SOCKET socket-option and not at the
> > SOL_UDP-level is because that functionality actually could be useful for
> > other protocols as well. E.g., TCP wants to better use the full 4-tuple space
> > by binding to the source-IP and the destination-IP at the same time.
>
> Passive TCP flows can not benefit from this idea.
>
> Active TCP flows can already do that, I do not really understand what you are suggesting.

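To make the bind()/connect() window from the cover letter concrete, here is a small loopback sketch using only today's sockets (no SO_DELAYED_BIND). It assumes Linux semantics, where a datagram queued before connect() stays in the receive queue. The function names are invented for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket bound to 127.0.0.1 on a kernel-chosen port. */
int bound_udp_socket(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Returns 1 if a datagram from an unrelated sender is still delivered on
 * a socket that is connect()ed to a different peer afterwards, 0 if it
 * is not, and -1 on a setup error.
 */
int stray_datagram_survives_connect(void)
{
    struct sockaddr_in conn_addr, peer_addr, stranger_addr;
    char buf[16];
    int result = -1;
    int conn = bound_udp_socket(&conn_addr);         /* per-connection socket */
    int peer = bound_udp_socket(&peer_addr);         /* intended Quic peer */
    int stranger = bound_udp_socket(&stranger_addr); /* unrelated sender */
    struct pollfd pfd = { .fd = conn, .events = POLLIN };

    if (conn < 0 || peer < 0 || stranger < 0)
        goto out;

    /* Unrelated traffic arrives between bind() and connect(). */
    if (sendto(stranger, "stray", 5, 0,
               (struct sockaddr *)&conn_addr, sizeof(conn_addr)) != 5)
        goto out;
    if (poll(&pfd, 1, 1000) != 1)   /* wait until it is queued */
        goto out;

    /* Only now is the full 4-tuple known to the kernel. */
    if (connect(conn, (struct sockaddr *)&peer_addr, sizeof(peer_addr)) < 0)
        goto out;

    /* The stray datagram was queued before connect() and is read anyway. */
    result = (recv(conn, buf, sizeof(buf), MSG_DONTWAIT) == 5 &&
              memcmp(buf, "stray", 5) == 0);
out:
    if (conn >= 0)
        close(conn);
    if (peer >= 0)
        close(peer);
    if (stranger >= 0)
        close(stranger);
    return result;
}
```

A return value of 1 means the per-connection socket saw traffic that does not belong to its connection, which is the case the proposed option is meant to rule out.
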
Our case was that we wanted to let a server initiate more than 64K
connections *while* also binding to a source-IP. With TCP, the bind() then
picks a source-port right away, before the destination is known, and we ended
up hitting the 64K limit.

If we could do an atomic "bind + connect", source-port selection could ensure
that the 4-tuple is unique.

Or has something changed in recent times that allows using 4-tuple matching
when doing this with TCP?


Christoph
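
For reference, one existing mechanism that seems to cover the active TCP case is the IP_BIND_ADDRESS_NO_PORT socket option (Linux 4.2 and later, as far as I know). With it, bind() to port 0 fixes only the source-IP, and the port is chosen at connect() time, when the destination is known. The sketch below assumes Linux; the function names are invented for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24  /* value from linux/in.h, for old headers */
#endif

/* Local port currently assigned to fd, host byte order (0 = none yet). */
int local_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);

    if (getsockname(fd, (struct sockaddr *)&a, &len) < 0)
        return -1;
    return ntohs(a.sin_port);
}

/*
 * Hypothetical helper: bind a TCP socket to src_ip without reserving a
 * port, then connect to dst. The local port seen right after bind() is
 * stored in *port_after_bind. Returns the connected fd, or -1 on error.
 */
int connect_from_ip(const char *src_ip, const struct sockaddr_in *dst,
                    int *port_after_bind)
{
    struct sockaddr_in src;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

    if (fd < 0)
        return -1;
    memset(&src, 0, sizeof(src));
    src.sin_family = AF_INET;
    src.sin_port = 0;               /* let the kernel choose, but later */
    if (inet_pton(AF_INET, src_ip, &src.sin_addr) != 1 ||
        setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
        goto fail;

    *port_after_bind = local_port(fd);

    /* The source port is selected here, with the destination known. */
    if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0)
        goto fail;
    return fd;
fail:
    close(fd);
    return -1;
}
```

The port read back right after bind() should still be unset, and a real port should only show up after connect().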