netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/4] Faster SO_REUSEPORT
@ 2015-12-22 20:05 Craig Gallek
  2015-12-22 20:05 ` [PATCH net-next 1/4] soreuseport: define reuseport groups Craig Gallek
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Craig Gallek @ 2015-12-22 20:05 UTC
  To: netdev, David Miller

From: Craig Gallek <kraig@google.com>

This series contains two optimizations for the SO_REUSEPORT feature:
Faster lookup when selecting a socket for an incoming packet and
the ability to select the socket from the group using a BPF program.

This series only includes the UDP path.  I plan to submit a follow-up
including the TCP path if the implementation in this series is
acceptable.
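
As a rough user-space illustration of the second optimization, the
sketch below builds a three-instruction classic BPF program and attaches
it with the proposed SO_ATTACH_REUSEPORT_CBPF option.  It assumes the
program's return value is used as an index into the reuseport group.
The helper names build_cpu_steering_prog() and attach_cpu_steering() are
made up for this example, and the fallback option value is only valid on
architectures that use the asm-generic socket.h numbering.

```c
#include <linux/filter.h>
#include <stdint.h>
#include <sys/socket.h>

/* Fallback for older libc headers.  51 is the asm-generic value; this is
 * an assumption, other architectures may use a different number. */
#ifndef SO_ATTACH_REUSEPORT_CBPF
#define SO_ATTACH_REUSEPORT_CBPF 51
#endif

/* Hypothetical helper: fill code[0..2] with a classic BPF program that
 * returns (current CPU id) % group_size.  group_size must be nonzero,
 * since the BPF checker rejects a constant modulus of 0. */
static void build_cpu_steering_prog(struct sock_filter code[3],
				    uint32_t group_size)
{
	/* A = id of the CPU processing the packet */
	code[0] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
					       SKF_AD_OFF + SKF_AD_CPU);
	/* A = A % group_size */
	code[1] = (struct sock_filter)BPF_STMT(BPF_ALU | BPF_MOD | BPF_K,
					       group_size);
	/* return A */
	code[2] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_A, 0);
}

/* Hypothetical helper: attach the program to a socket that already has
 * SO_REUSEPORT set.  Returns the setsockopt() result (0 or -1). */
static int attach_cpu_steering(int fd, uint32_t group_size)
{
	struct sock_filter code[3];
	struct sock_fprog prog = { .len = 3, .filter = code };

	build_cpu_steering_prog(code, group_size);
	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_CBPF,
			  &prog, sizeof(prog));
}
```

With one socket per CPU in the group, a program like this keeps each
packet on the socket whose index matches the receiving CPU.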

Craig Gallek (4):
  soreuseport: define reuseport groups
  soreuseport: fast reuseport UDP socket selection
  soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
  soreuseport: BPF selection functional test

 arch/alpha/include/uapi/asm/socket.h        |   3 +
 arch/avr32/include/uapi/asm/socket.h        |   3 +
 arch/frv/include/uapi/asm/socket.h          |   3 +
 arch/ia64/include/uapi/asm/socket.h         |   3 +
 arch/m32r/include/uapi/asm/socket.h         |   3 +
 arch/mips/include/uapi/asm/socket.h         |   3 +
 arch/mn10300/include/uapi/asm/socket.h      |   3 +
 arch/parisc/include/uapi/asm/socket.h       |   3 +
 arch/powerpc/include/uapi/asm/socket.h      |   3 +
 arch/s390/include/uapi/asm/socket.h         |   3 +
 arch/sparc/include/uapi/asm/socket.h        |   3 +
 arch/xtensa/include/uapi/asm/socket.h       |   3 +
 include/linux/filter.h                      |   2 +
 include/net/addrconf.h                      |   3 +-
 include/net/sock.h                          |   2 +
 include/net/sock_reuseport.h                |  29 ++
 include/net/udp.h                           |   7 +-
 include/uapi/asm-generic/socket.h           |   3 +
 net/core/Makefile                           |   2 +-
 net/core/filter.c                           | 120 +++++--
 net/core/sock.c                             |  29 ++
 net/core/sock_reuseport.c                   | 251 +++++++++++++++
 net/ipv4/udp.c                              | 127 ++++++--
 net/ipv4/udp_diag.c                         |   4 +-
 net/ipv6/inet6_connection_sock.c            |   4 +-
 net/ipv6/udp.c                              |  56 +++-
 tools/testing/selftests/net/.gitignore      |   1 +
 tools/testing/selftests/net/Makefile        |   2 +-
 tools/testing/selftests/net/reuseport_bpf.c | 467 ++++++++++++++++++++++++++++
 29 files changed, 1077 insertions(+), 68 deletions(-)
 create mode 100644 include/net/sock_reuseport.h
 create mode 100644 net/core/sock_reuseport.c
 create mode 100644 tools/testing/selftests/net/reuseport_bpf.c

-- 
2.6.0.rc2.230.g3dd15c0

* Re: [PATCH net-next 3/4] soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF
@ 2015-12-26 19:05 Craig Gallek
  2015-12-29 17:16 ` Craig Gallek
  0 siblings, 1 reply; 13+ messages in thread
From: Craig Gallek @ 2015-12-26 19:05 UTC
  To: Alexei Starovoitov; +Cc: netdev, David Miller

On Thu, Dec 24, 2015 at 11:36 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Dec 22, 2015 at 03:05:09PM -0500, Craig Gallek wrote:
>> From: Craig Gallek <kraig@google.com>
>>
>> Expose socket options for setting a classic or extended BPF program
>> for use when selecting sockets in an SO_REUSEPORT group.  These options
>> can be used on the first socket to belong to a group before bind or
>> on any socket in the group after bind.
>>
>> This change includes refactoring of the existing sk_filter code to
>> allow reuse of the existing BPF filter validation checks.
>>
>> Signed-off-by: Craig Gallek <kraig@google.com>
>
> interesting stuff.
>
>> +static struct sock *run_bpf(struct sock_reuseport *reuse, u16 socks,
>> +                         struct bpf_prog *prog, struct sk_buff *skb,
>> +                         int hdr_len)
>> +{
>> +     struct sk_buff *nskb = NULL;
>> +     u32 index;
>> +
>> +     if (skb_shared(skb)) {
>> +             nskb = skb_clone(skb, GFP_ATOMIC);
>> +             if (!nskb)
>> +                     return NULL;
>> +             skb = nskb;
>> +     }
>
> what is the typical case here, skb_shared or not?
I _think_ most of the code paths that lead here will not be shared.  I
haven't finished examining all of the TCP cases yet, but in the common
UDP case the skb is not yet shared here.

>> +     /* temporarily advance data past protocol header */
>> +     if (skb_headlen(skb) < hdr_len || !skb_pull_inline(skb, hdr_len)) {
>
> though bpf core will read just fine past linear part of the packet,
> here we're limiting this feature only to packets where udp header is
> part of headlen. Will it make it somewhat unreliable?
> Maybe we can avoid doing this pull/push and use negative load
> instructions with SKF_NET_OFF? Something like:
> load_word(skb, SKF_NET_OFF + sizeof(struct udphdr));
This is an excellent point and will be even more relevant for TCP.
I'll try to get this to work for v2.
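
To make the SKF_NET_OFF suggestion concrete, here is a rough sketch of a
classic BPF program that uses such a negative load.  Loads whose offset
falls in the SKF_NET_OFF range are resolved relative to the network
header rather than the current skb data pointer, so the program does not
depend on where the data pointer sits when it runs.  The helper name
build_header_word_steering() and its parameters are made up for this
example; which offset is useful depends on the IP header length and is
left to the caller.

```c
#include <linux/filter.h>
#include <stdint.h>

/* Hypothetical helper: fill code[0..2] with a classic BPF program that
 * loads the 32-bit word located `off` bytes after the start of the
 * network header and returns it modulo group_size (must be nonzero). */
static void build_header_word_steering(struct sock_filter code[3],
					uint32_t off, uint32_t group_size)
{
	/* A = 32-bit word at (network header + off) */
	code[0] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
					       (uint32_t)SKF_NET_OFF + off);
	/* A = A % group_size */
	code[1] = (struct sock_filter)BPF_STMT(BPF_ALU | BPF_MOD | BPF_K,
					       group_size);
	/* return A */
	code[2] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_A, 0);
}
```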

>>  /**
>>   *  reuseport_select_sock - Select a socket from an SO_REUSEPORT group.
>>   *  @sk: First socket in the group.
>> - *  @hash: Use this hash to select.
>> + *  @hash: When no BPF filter is available, use this hash to select.
>> + *  @skb: skb to run through BPF filter.
>> + *  @hdr_len: BPF filter expects skb data pointer at payload data.  If
>> + *    the skb does not yet point at the payload, this parameter represents
>> + *    how far the pointer needs to advance to reach the payload.
>
> what is the use case of this?
> Do you expect programs to be stateful?
I was trying to approximate the semantics of the existing BPF socket
filters.  When those programs are run, the skb data pointer has been
advanced past the network headers.  I assumed that the location of
the existing socket filter execution was chosen because of the state
of the data pointer (to avoid leaking the headers to user space,
though this was just a guess).   These new filters must be run while
choosing the socket, before the protocol headers have been popped.  If
it's safe to send the skb to the program with the protocol header,
then this field is not strictly necessary (it may still be desired to
uniformly present packets to a program in a protocol-agnostic way).

I imagine that some programs may have some state in the form of BPF
maps, but the simpler ones will just do packet steering based on CPU
core id, NUMA node or something similar.  There is currently an
outstanding patch to do controlled shutdown of reuseport sockets
(SO_REUSEPORT_LISTEN_OFF).  I could imagine using this BPF approach to
implement those semantics as well.

>> +                             sk2 = reuseport_select_sock(sk, hash, NULL, 0);
> ...
>> +                             sk2 = reuseport_select_sock(sk, hash, skb,
>> +                                                     sizeof(struct udphdr));
>
> these are the cases that comment is trying to explain?
> Meaning the bpf program needs to understand well enough when udp stack
> is calling it ?
There are some call paths that involve selecting a socket without
having an skb.  These obviously won't work with a BPF program, so the
code falls back to using a hash of some other data.  The real reason
for the hdr_len parameter is to support advancing past either UDP or TCP
headers.  If the data pointer is always moved to the payload, the same
program could be used for any protocol (assuming it doesn't need to
care about the protocol).
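
The selection logic described above can be modeled in user space roughly
as follows.  This is not the kernel code: select_index(), scale_hash(),
selector_fn and first_byte_selector() are made-up names, a plain function
pointer stands in for the BPF program, and returning -1 when the program
picks an out-of-range index is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BPF program: maps payload bytes to an index. */
typedef uint32_t (*selector_fn)(const uint8_t *payload, size_t len);

/* Example selector: use the first payload byte as the socket index. */
static uint32_t first_byte_selector(const uint8_t *payload, size_t len)
{
	return len > 0 ? payload[0] : 0;
}

/* Map a 32-bit hash onto [0, n) without a division: (hash * n) >> 32. */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Pick a socket index in a group of num_socks sockets.
 * With both a program and a packet, the program sees the data that
 * starts hdr_len bytes in (the payload).  Without either, fall back to
 * the hash.  Returns -1 if no valid index can be produced. */
static int select_index(uint32_t num_socks, uint32_t hash, selector_fn prog,
			const uint8_t *pkt, size_t len, size_t hdr_len)
{
	if (num_socks == 0)
		return -1;

	if (prog && pkt) {
		uint32_t idx;

		if (len < hdr_len)
			return -1;	/* header not fully present */
		idx = prog(pkt + hdr_len, len - hdr_len);
		return idx < num_socks ? (int)idx : -1;
	}

	/* no program, or a call path with no packet: use the hash */
	return (int)scale_hash(hash, num_socks);
}
```

Because the selector only ever sees payload bytes, the same selector
works whether hdr_len is the size of a UDP header or of a TCP header.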

> Will do more careful review of bpf bits once I'm back from PTO.
I appreciate the initial comments, enjoy your time off.

Thread overview: 13+ messages
2015-12-22 20:05 [PATCH net-next 0/4] Faster SO_REUSEPORT Craig Gallek
2015-12-22 20:05 ` [PATCH net-next 1/4] soreuseport: define reuseport groups Craig Gallek
2015-12-22 21:40   ` David Miller
2015-12-22 21:58     ` Craig Gallek
2015-12-22 22:03       ` David Miller
2015-12-22 22:11   ` kbuild test robot
2015-12-22 22:39     ` Craig Gallek
2015-12-22 20:05 ` [PATCH net-next 2/4] soreuseport: fast reuseport UDP socket selection Craig Gallek
2015-12-22 20:05 ` [PATCH net-next 3/4] soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF Craig Gallek
2015-12-24 16:36   ` Alexei Starovoitov
2015-12-22 20:05 ` [PATCH net-next 4/4] soreuseport: BPF selection functional test Craig Gallek
2015-12-26 19:05 [PATCH net-next 3/4] soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF Craig Gallek
2015-12-29 17:16 ` Craig Gallek
