* UDP "accept" proposed
@ 2013-06-18 8:48 James Yonan
2013-06-18 9:52 ` Daniel Borkmann
2013-06-18 9:54 ` David Laight
0 siblings, 2 replies; 5+ messages in thread
From: James Yonan @ 2013-06-18 8:48 UTC (permalink / raw)
To: netdev
One of the frustrations of creating UDP servers using BSD sockets is
that there isn't an easy way for a server to pass off a socket for a
particular client instance to a handler thread or process.
By contrast, with TCP you can "accept" an incoming connection, and pass
the socket representing that connection off to any arbitrary handler.
But UDP servers that want to play well with stateful firewalls and NAT
are forced to aggregate their entire connection pool onto a single
socket, since BSD sockets don't have the equivalent of an "accept"
mechanism to provide a connection-specific socket.
This is a disaster from a performance perspective: a UDP server that
binds to a single port must operate off a single socket, so it cannot
be scaled efficiently across multiple threads or processors.
So why can't I "accept" a UDP socket? The conventional response would
be that UDP is connectionless and that "accept" is meaningless outside
the context of a connection. UDP may be connectionless, but it's not
stateless. The tuple of (local address/port, remote address/port)
concisely defines the state of a UDP session between a client and
server. Netfilter connection tracking recognizes this statefulness, but
unfortunately BSD sockets do not.
I would like to propose that Linux adds a userspace API method to allow
UDP sockets to be "accepted":
    int accept_udp(int sockfd, const struct sockaddr *addr, socklen_t *addrlen, int flags)
accept_udp will return a new UDP socket which is bound to the original
local address/port of sockfd but which is additionally bound to the
source address/port denoted by addr. This socket will only receive
datagrams having a source address of addr, and when used with send(),
will transmit datagrams to addr.
This socket, while open, will have priority in the sense of receiving
any datagrams having a source address of addr that would normally have
been received by sockfd. When closed, datagrams from addr will revert
to being received by sockfd.
This abstraction allows UDP servers to follow the same scalable event
loop as TCP servers, i.e. bind to local socket, then:
1. recvfrom to read a packet
2. call accept_udp on socket, passing the source address of packet read
in (1)
3. pass the return socket of accept_udp to a handler thread
4. repeat
This would require the UDP implementation in the kernel to understand
how to dispatch incoming UDP datagrams to sockets based on the tuple of
(source address/port, local address/port) rather than just the local
address/port as is currently the case.
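The dispatch rule described above can be sketched as a toy userspace model. This is only an illustration, not kernel code: the names toy_sock, toy_score and toy_demux are invented here, addresses are plain host-order integers, and remote_port == 0 is assumed to mean "not bound to any source". A socket matching the full tuple outscores one that matches only the local address/port, and closing it makes datagrams fall back to the original socket.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of one entry in a UDP socket table (hypothetical, not kernel code).
 * remote_port == 0 means the socket accepts datagrams from any source. */
struct toy_sock {
    int      id;          /* stand-in for a file descriptor */
    int      open;        /* 0 once the socket has been closed */
    uint32_t local_addr;
    uint16_t local_port;
    uint32_t remote_addr;
    uint16_t remote_port;
};

/* Score a socket against an incoming datagram; -1 means "no match". */
static int toy_score(const struct toy_sock *s,
                     uint32_t saddr, uint16_t sport,
                     uint32_t daddr, uint16_t dport)
{
    if (!s->open || s->local_addr != daddr || s->local_port != dport)
        return -1;
    if (s->remote_port == 0)
        return 1;                               /* local address/port only */
    if (s->remote_addr == saddr && s->remote_port == sport)
        return 2;                               /* full tuple match wins */
    return -1;                                  /* bound to another source */
}

/* Return the id of the best-scoring socket, or -1 if none matches. */
static int toy_demux(const struct toy_sock *tbl, size_t n,
                     uint32_t saddr, uint16_t sport,
                     uint32_t daddr, uint16_t dport)
{
    int best = -1, best_id = -1;

    for (size_t i = 0; i < n; i++) {
        int sc = toy_score(&tbl[i], saddr, sport, daddr, dport);
        if (sc > best) {
            best = sc;
            best_id = tbl[i].id;
        }
    }
    return best_id;
}
```

With one wildcard entry and one entry bound to a specific source, datagrams from that source go to the bound entry, everything else goes to the wildcard entry, and clearing the bound entry's open flag sends that source back to the wildcard entry.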
But this would be a huge performance win for UDP servers (I'm thinking
about OpenVPN in particular) because making the kernel smarter about
dispatching UDP datagrams would make it much easier to develop scalable
UDP servers on Linux.
James
* Re: UDP "accept" proposed
2013-06-18 8:48 UDP "accept" proposed James Yonan
@ 2013-06-18 9:52 ` Daniel Borkmann
2013-06-18 10:17 ` Eric Dumazet
2013-06-18 9:54 ` David Laight
1 sibling, 1 reply; 5+ messages in thread
From: Daniel Borkmann @ 2013-06-18 9:52 UTC (permalink / raw)
To: James Yonan; +Cc: netdev
On 06/18/2013 10:48 AM, James Yonan wrote:
[...]
> This is a disaster from a performance perspective because you can't take a UDP server that
> binds to a single port and efficiently scale it up across multiple threads or processors
> because you must operate off a single socket.
[...]
> But this would be a huge performance win for UDP servers (I'm thinking about OpenVPN in
> particular) because making the kernel smarter about dispatching UDP datagrams would make it
> much easier to develop scalable UDP servers on Linux.
So SO_REUSEPORT, which was added in 3.9 by Tom Herbert, wouldn't
help in your case (+ e.g. steering flows to CPUs locally)?
https://lwn.net/Articles/542629/
* RE: UDP "accept" proposed
2013-06-18 8:48 UDP "accept" proposed James Yonan
2013-06-18 9:52 ` Daniel Borkmann
@ 2013-06-18 9:54 ` David Laight
1 sibling, 0 replies; 5+ messages in thread
From: David Laight @ 2013-06-18 9:54 UTC (permalink / raw)
To: James Yonan, netdev
> One of the frustrations of creating UDP servers using BSD sockets is
> that there isn't an easy way for a server to pass off a socket for a
> particular client instance to a handler thread or process.
>
> By contrast, with TCP you can "accept" an incoming connection, and pass
> the socket representing that connection off to any arbitrary handler.
>
> But UDP servers that want to play well with stateful firewalls and NAT
> are forced to aggregate their entire connection pool onto a single
> socket, since BSD sockets don't have the equivalent of an "accept"
> mechanism to provide a connection-specific socket.
You should be able to create another UDP socket and use connect().
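A rough sketch of that suggestion, assuming Linux, IPv4, and a listening socket that had SO_REUSEADDR set before its own bind() (otherwise the second bind() is expected to fail with EADDRINUSE). The function name udp_accept_emulated is invented for this example and is not an existing API; it only approximates the proposed accept_udp().

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: approximate the proposed accept_udp() with existing
 * calls. Creates a new UDP socket bound to the same local address/port as
 * listen_fd and connect()s it to peer, so that it only exchanges datagrams
 * with that peer. Returns the new fd, or -1 on error. */
static int udp_accept_emulated(int listen_fd, const struct sockaddr_in *peer)
{
    struct sockaddr_in local;
    socklen_t len = sizeof(local);
    int one = 1;
    int fd;

    /* Find out which local address/port the listening socket is bound to. */
    if (getsockname(listen_fd, (struct sockaddr *)&local, &len) < 0)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Share the local address/port, then pin the socket to one remote peer. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The server would call this with the source address returned by recvfrom() on the listening socket and hand the result to a handler. One caveat to be aware of: between bind() and connect() the new socket is briefly unconnected, so a datagram from some other client may land on it during that window, and a handler should probably be prepared for that.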
> This is a disaster from a performance perspective because you can't take
> a UDP server that binds to a single port and efficiently scale it up
> across multiple threads or processors because you must operate off a
> single socket.
Actually, for really large workloads, having a large number of sockets
creates its own problems.
...
> This would require the UDP implementation in the kernel to understand
> how to dispatch incoming UDP datagrams to sockets based on the tuple of
> (source addr, local addr) rather than just local addr as is currently
> the case.
I believe it already does that if you've called connect().
> But this would be a huge performance win for UDP servers (I'm thinking
> about OpenVPN in particular) because making the kernel smarter about
> dispatching UDP datagrams would make it much easier to develop scalable
> UDP servers on Linux.
By the sound of it what you really want is a small number of sockets
all bound to the same UDP port, and for the kernel to spread the
received traffic between the sockets in a manner that tends to keep
traffic from a specific remote host assigned to the same socket.
Then you can have a multi-threaded daemon that will tend to keep
the data in the correct cpu cache.
David
* Re: UDP "accept" proposed
2013-06-18 9:52 ` Daniel Borkmann
@ 2013-06-18 10:17 ` Eric Dumazet
2013-06-18 10:41 ` Daniel Borkmann
0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2013-06-18 10:17 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: James Yonan, netdev
On Tue, 2013-06-18 at 11:52 +0200, Daniel Borkmann wrote:
> On 06/18/2013 10:48 AM, James Yonan wrote:
> [...]
> > This is a disaster from a performance perspective because you can't take a UDP server that
> > binds to a single port and efficiently scale it up across multiple threads or processors
> > because you must operate off a single socket.
Well not with current linux ;)
> > But this would be a huge performance win for UDP servers (I'm thinking about OpenVPN in
> > particular) because making the kernel smarter about dispatching UDP datagrams would make it
> > much easier to develop scalable UDP servers on Linux.
>
> So SO_REUSEPORT that was added in 3.9 by Tom Herbert wouldn't
> help in your case (+ f.e. steering flows to CPUs locally) ?
>
> https://lwn.net/Articles/542629/
Yes, but no need for particular steering.
Incoming UDP message will match the 'connected socket' and will be
delivered to the socket.
RFS will then automatically do the right thing
( Documentation/networking/scaling.txt )
Note that if you have a lot of sockets bound to the same port,
this fix is needed :
http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=c87a124a5d5e8cf8e21c4363c3372bcaf53ea190
* Re: UDP "accept" proposed
2013-06-18 10:17 ` Eric Dumazet
@ 2013-06-18 10:41 ` Daniel Borkmann
0 siblings, 0 replies; 5+ messages in thread
From: Daniel Borkmann @ 2013-06-18 10:41 UTC (permalink / raw)
To: Eric Dumazet; +Cc: James Yonan, netdev
On 06/18/2013 12:17 PM, Eric Dumazet wrote:
> On Tue, 2013-06-18 at 11:52 +0200, Daniel Borkmann wrote:
>> On 06/18/2013 10:48 AM, James Yonan wrote:
>> [...]
>>> This is a disaster from a performance perspective because you can't take a UDP server that
>>> binds to a single port and efficiently scale it up across multiple threads or processors
>>> because you must operate off a single socket.
>
> Well not with current linux ;)
>
>>> But this would be a huge performance win for UDP servers (I'm thinking about OpenVPN in
>>> particular) because making the kernel smarter about dispatching UDP datagrams would make it
>>> much easier to develop scalable UDP servers on Linux.
>
>>
>> So SO_REUSEPORT that was added in 3.9 by Tom Herbert wouldn't
>> help in your case (+ f.e. steering flows to CPUs locally) ?
>>
>> https://lwn.net/Articles/542629/
>
> Yes, but no need for particular steering.
>
> Incoming UDP message will match the 'connected socket' and will be
> delivered to the socket.
>
> RFS will then automatically do the right thing
> ( Documentation/networking/scaling.txt )
+1, this was what I meant. I should have been more specific. ;-)
> Note that if you have a lot of sockets bound to the same port,
> this fix is needed :
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=c87a124a5d5e8cf8e21c4363c3372bcaf53ea190