[PATCH bpf-next 00/17] Run a BPF program on socket lookup

From: Jakub Sitnicki @ 2020-05-06 12:54 UTC
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Overview
========

This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
or BPF sk_lookup for short.

The BPF sk_lookup program runs when the transport layer is looking up a
socket for a received packet. When called, the sk_lookup program can select
a socket that will receive the packet.

This serves as a mechanism to overcome the limits of what the bind() API
allows to express. Two use cases driving this work are:

 (1) steer packets destined to an IP range, fixed port to a single socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, any port to a single socket

     198.51.100.1, any port -> L7 proxy socket

Through its context, the program receives information about the packet that
triggered the socket lookup, namely the IP version, the L4 protocol
identifier, and the address 4-tuple.

To select a socket, the BPF program fetches it from a map holding socket
references, such as SOCKMAP or SOCKHASH, calls the bpf_sk_assign(ctx, sk, ...)
helper to record the selection, and returns the BPF_REDIRECT code. The
transport layer then uses the selected socket as the result of the socket
lookup.

Alternatively, the program can fail the lookup (BPF_DROP) or let the
lookup continue as usual (BPF_OK).
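
To make this concrete, here is a minimal sketch of such a program. The map
layout, section name, and port check are assumptions made up for this
example; the helper calls and return codes are the ones described above.

	/* Sketch: steer all TCP packets destined to port 80 to the one
	 * socket stored in redir_map. Illustration only, not taken from
	 * the series.
	 */
	#include <linux/bpf.h>
	#include <linux/in.h>
	#include <bpf/bpf_helpers.h>

	struct {
		__uint(type, BPF_MAP_TYPE_SOCKMAP);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, __u64);
	} redir_map SEC(".maps");

	SEC("sk_lookup")
	int select_sock(struct bpf_sk_lookup *ctx)
	{
		const __u32 zero = 0;
		struct bpf_sock *sk;
		int err;

		if (ctx->protocol != IPPROTO_TCP || ctx->dst_port != 80)
			return BPF_OK;	/* not ours, lookup continues */

		sk = bpf_map_lookup_elem(&redir_map, &zero);
		if (!sk)
			return BPF_OK;	/* no socket registered, fall back */

		err = bpf_sk_assign(ctx, sk, 0);
		bpf_sk_release(sk);	/* release map reference either way */

		return err ? BPF_OK : BPF_REDIRECT;
	}

	char _license[] SEC("license") = "GPL";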

This lets the user freely match packets with listening (TCP) or receiving
(UDP) sockets at the last possible point on the receive path, where we know
that packets are destined for local delivery after undergoing policing,
filtering, and routing.

The program is attached to a network namespace, similar to the BPF
flow_dissector. We add a new attach type, BPF_SK_LOOKUP, for this purpose.
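
From user space, attaching could then look as follows. This is a sketch
assuming the libbpf support added in patch 14; the object path and program
name are made up, and (as with flow_dissector) the program attaches to the
caller's current network namespace.

	/* Sketch: load a sk_lookup program and attach it to the current
	 * network namespace. Illustration only.
	 */
	#include <bpf/libbpf.h>
	#include <bpf/bpf.h>

	int attach_sk_lookup(void)
	{
		struct bpf_object *obj;
		struct bpf_program *prog;
		int err;

		obj = bpf_object__open_file("sk_lookup_prog.o", NULL);
		if (libbpf_get_error(obj))
			return -1;

		err = bpf_object__load(obj);
		if (err)
			return err;

		prog = bpf_object__find_program_by_name(obj, "select_sock");
		if (!prog)
			return -1;

		/* Target fd is not used on attach; BPF_SK_LOOKUP binds the
		 * program to the caller's netns (assumption based on the
		 * flow_dissector precedent followed by this series).
		 */
		return bpf_prog_attach(bpf_program__fd(prog), 0,
				       BPF_SK_LOOKUP, 0);
	}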

Patches are organized as follows:

 1: prepares ground for attaching/detaching programs to netns
 2: introduces sk_lookup program type
 3-5: hook up the program to run on ipv4/tcp socket lookup
 6-7: hook up the program to run on ipv6/tcp socket lookup
 8-10: hook up the program to run on ipv4/udp socket lookup
 11-12: hook up the program to run on ipv6/udp socket lookup
 13-14: add libbpf support for sk_lookup
 15-17: verifier and selftests for sk_lookup

Performance considerations
==========================

The patch set adds new code to the receive hot path. This comes at a cost,
especially in a SYN flood or small-packet UDP flood scenario.

Measuring the performance penalty turned out to be harder than expected
because socket lookup is fast. For CPUs to spend >= 1% of their time in
socket lookup, we had to modify our setup by unloading iptables and
reducing the number of routes.

The receiver machine is a Cloudflare Gen 9 server covered in detail at [0].
In short:

 - 24-core Intel custom off-roadmap 1.9 GHz 150 W (Skylake) CPU
 - dual-port 25G Mellanox ConnectX-4 NIC
 - 256 GB DDR4 2666 MHz RAM

Flood traffic pattern:

 - source: 1 IP, 10k ports
 - destination: 1 IP, 1 port
 - TCP - SYN packet
 - UDP - Len=0 packet

Receiver setup:

 - ingress traffic spread over 4 RX queues,
 - RX/TX pause and autoneg disabled,
 - Intel Turbo Boost disabled,
 - TCP SYN cookies always on.

For the TCP tests there is a receiver process with a single listening
socket open. The receiver does not accept() connections.

For the UDP tests the receiver process has a single UDP socket with a
filter installed that drops all packets.
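
The drop filter itself is not part of the series; a minimal sketch of one
way to set it up is a classic BPF filter that truncates every datagram to
0 bytes, so packets are dropped in-kernel without waking the receiver
(an assumption about the test setup):

	/* Sketch: accept 0 bytes of every packet, i.e. drop it. */
	#include <linux/filter.h>
	#include <sys/socket.h>

	static int install_drop_filter(int sock_fd)
	{
		struct sock_filter code[] = {
			{ .code = BPF_RET | BPF_K, .k = 0 },
		};
		struct sock_fprog prog = {
			.len = sizeof(code) / sizeof(code[0]),
			.filter = code,
		};

		return setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_FILTER,
				  &prog, sizeof(prog));
	}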

With this setup in place, we record RX pps and cpu-cycles events under
flood for 60 seconds in 3 configurations:

 1. 5.6.3 kernel w/o this patch series (baseline),
 2. 5.6.3 kernel with patches applied, but no SK_LOOKUP program attached,
 3. 5.6.3 kernel with patches applied, and an SK_LOOKUP program attached;
    the BPF program [1] does a lookup in an LPM_TRIE map with 200 entries
    (sketched below).
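
For reference, the gist of the lookup in configuration 3. This is a sketch
of an LPM_TRIE match on the destination address, not the actual program
(see [1] for that); the map name and value type are assumptions:

	/* LPM_TRIE keys must begin with a __u32 prefix length. */
	struct ip4_lpm_key {
		__u32 prefixlen;	/* prefix length in bits */
		__u32 addr;		/* IPv4 address, network byte order */
	};

	struct {
		__uint(type, BPF_MAP_TYPE_LPM_TRIE);
		__uint(map_flags, BPF_F_NO_PREALLOC);	/* required for LPM_TRIE */
		__uint(max_entries, 200);
		__type(key, struct ip4_lpm_key);
		__type(value, __u32);
	} lpm_map SEC(".maps");

	static __u32 *match_dst_addr(struct bpf_sk_lookup *ctx)
	{
		struct ip4_lpm_key key = {
			.prefixlen = 32,	/* longest-prefix match on full address */
			.addr = ctx->dst_ip4,
		};

		return bpf_map_lookup_elem(&lpm_map, &key);
	}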

RX pps measured with `ifpps -d <dev> -t 1000 --csv --loop` for 60 seconds.

| tcp4 SYN flood               | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     | 939,616 ± 0.5%         |        - |
| no SK_LOOKUP prog attached   | 929,275 ± 1.2%         |    -1.1% |
| with SK_LOOKUP prog attached | 918,582 ± 0.4%         |    -2.2% |

| tcp6 SYN flood               | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     | 875,838 ± 0.5%         |        - |
| no SK_LOOKUP prog attached   | 872,005 ± 0.3%         |    -0.4% |
| with SK_LOOKUP prog attached | 856,250 ± 0.5%         |    -2.2% |

| udp4 0-len flood             | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     | 2,738,662 ± 1.5%       |        - |
| no SK_LOOKUP prog attached   | 2,576,893 ± 1.0%       |    -5.9% |
| with SK_LOOKUP prog attached | 2,530,698 ± 1.0%       |    -7.6% |

| udp6 0-len flood             | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     | 2,867,885 ± 1.4%       |        - |
| no SK_LOOKUP prog attached   | 2,646,875 ± 1.0%       |    -7.7% |
| with SK_LOOKUP prog attached | 2,520,474 ± 0.7%       |   -12.1% |

Also visualized in the bpf-sk-lookup-v1-rx-pps.png chart [2].

cpu-cycles measured with `perf record -F 999 --cpu 1-4 -g -- sleep 60`.

|                              |      cpu-cycles events |          |
| tcp4 SYN flood               | __inet_lookup_listener | Δ events |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     |                  1.12% |        - |
| no SK_LOOKUP prog attached   |                  1.31% |    0.19% |
| with SK_LOOKUP prog attached |                  3.05% |    1.93% |

|                              |      cpu-cycles events |          |
| tcp6 SYN flood               |  inet6_lookup_listener | Δ events |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     |                  1.05% |        - |
| no SK_LOOKUP prog attached   |                  1.68% |    0.63% |
| with SK_LOOKUP prog attached |                  3.15% |    2.10% |

|                              |      cpu-cycles events |          |
| udp4 0-len flood             |      __udp4_lib_lookup | Δ events |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     |                  3.81% |        - |
| no SK_LOOKUP prog attached   |                  5.22% |    1.41% |
| with SK_LOOKUP prog attached |                  8.20% |    4.39% |

|                              |      cpu-cycles events |          |
| udp6 0-len flood             |      __udp6_lib_lookup | Δ events |
|------------------------------+------------------------+----------|
| 5.6.3 vanilla (baseline)     |                  5.51% |        - |
| no SK_LOOKUP prog attached   |                  6.51% |    1.00% |
| with SK_LOOKUP prog attached |                 10.14% |    4.63% |

Also visualized in the bpf-sk-lookup-v1-cpu-cycles.png chart [3].

Further work
============

To be done, either in the next iteration or as a follow-up:

 - document the new program type under Documentation/bpf/,
 - timeout on accept() in tests once accept_timeout is in a common place.

Changelog
=========

RFCv2 -> v1:

- Switch to fetching a socket from a map and selecting a socket with
  bpf_sk_assign, instead of having a dedicated helper that does both.

- Run reuseport logic on sockets selected by BPF sk_lookup.

- Allow BPF sk_lookup to fail the lookup with no match.

- Go back to having just 2 hash table lookups in UDP.

RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.

- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP sockets work as expected in the presence of an
  inet_lookup prog.

- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.

[0] https://blog.cloudflare.com/a-tour-inside-cloudflares-g9-servers/
[1] https://github.com/majek/inet-tool/blob/master/ebpf/inet-kern.c
[2] https://drive.google.com/file/d/1HrrjWhQoVlqiqT73_eLtWMPhuGPKhGFX/
[3] https://drive.google.com/file/d/1cYPPOlGg7M-bkzI4RW1SOm49goI4LYbb/
[RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
[RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/

Jakub Sitnicki (17):
  flow_dissector: Extract attach/detach/query helpers
  bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  inet: Store layer 4 protocol in inet_hashinfo
  inet: Extract helper for selecting socket from reuseport group
  inet: Run SK_LOOKUP BPF program on socket lookup
  inet6: Extract helper for selecting socket from reuseport group
  inet6: Run SK_LOOKUP BPF program on socket lookup
  udp: Store layer 4 protocol in udp_table
  udp: Extract helper for selecting socket from reuseport group
  udp: Run SK_LOOKUP BPF program on socket lookup
  udp6: Extract helper for selecting socket from reuseport group
  udp6: Run SK_LOOKUP BPF program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for SK_LOOKUP program type
  selftests/bpf: Add verifier tests for bpf_sk_lookup context access
  selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
  selftests/bpf: Tests for BPF_SK_LOOKUP attach point

 include/linux/bpf.h                           |   8 +
 include/linux/bpf_types.h                     |   2 +
 include/linux/filter.h                        |  23 +
 include/net/inet6_hashtables.h                |  20 +
 include/net/inet_hashtables.h                 |  39 +
 include/net/net_namespace.h                   |   1 +
 include/net/udp.h                             |  10 +-
 include/uapi/linux/bpf.h                      |  53 +
 kernel/bpf/syscall.c                          |   9 +
 net/core/filter.c                             | 315 ++++++
 net/core/flow_dissector.c                     |  61 +-
 net/dccp/proto.c                              |   2 +-
 net/ipv4/inet_hashtables.c                    |  44 +-
 net/ipv4/tcp_ipv4.c                           |   2 +-
 net/ipv4/udp.c                                |  85 +-
 net/ipv4/udp_impl.h                           |   2 +-
 net/ipv4/udplite.c                            |   4 +-
 net/ipv6/inet6_hashtables.c                   |  46 +-
 net/ipv6/udp.c                                |  86 +-
 net/ipv6/udp_impl.h                           |   2 +-
 net/ipv6/udplite.c                            |   2 +-
 scripts/bpf_helpers_doc.py                    |   9 +-
 tools/include/uapi/linux/bpf.h                |  53 +
 tools/lib/bpf/libbpf.c                        |   3 +
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 .../bpf/prog_tests/reference_tracking.c       |   2 +-
 .../selftests/bpf/prog_tests/sk_lookup.c      | 999 ++++++++++++++++++
 .../selftests/bpf/progs/test_ref_track_kern.c | 180 ++++
 .../selftests/bpf/progs/test_sk_lookup_kern.c | 258 +++--
 .../selftests/bpf/verifier/ctx_sk_lookup.c    | 696 ++++++++++++
 32 files changed, 2749 insertions(+), 272 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ref_track_kern.c
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

-- 
2.25.3



[PATCH bpf-next 01/17] flow_dissector: Extract attach/detach/query helpers

From: Jakub Sitnicki @ 2020-05-06 12:54 UTC
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Lorenz Bauer

Move the generic parts of the callbacks for querying, attaching, and
detaching a single BPF program, for reuse by other BPF program types.

A subsequent patch makes use of the extracted routines.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf.h       |  8 +++++
 net/core/filter.c         | 68 +++++++++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c | 61 +++++++----------------------------
 3 files changed, 88 insertions(+), 49 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1262ec460ab3..716c47ac1e75 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -31,6 +31,7 @@ struct seq_file;
 struct btf;
 struct btf_type;
 struct exception_table_entry;
+struct mutex;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1660,4 +1661,11 @@ enum bpf_text_poke_type {
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *addr1, void *addr2);
 
+int bpf_prog_query_one(struct bpf_prog __rcu **pprog,
+		       const union bpf_attr *attr,
+		       union bpf_attr __user *uattr);
+int bpf_prog_attach_one(struct bpf_prog __rcu **pprog, struct mutex *lock,
+			struct bpf_prog *prog, u32 flags);
+int bpf_prog_detach_one(struct bpf_prog __rcu **pprog, struct mutex *lock);
+
 #endif /* _LINUX_BPF_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..bc25bb1085b1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8740,6 +8740,74 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 	return ret;
 }
 
+int bpf_prog_query_one(struct bpf_prog __rcu **pprog,
+		       const union bpf_attr *attr,
+		       union bpf_attr __user *uattr)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	u32 prog_id, prog_cnt = 0, flags = 0;
+	struct bpf_prog *attached;
+
+	if (attr->query.query_flags)
+		return -EINVAL;
+
+	rcu_read_lock();
+	attached = rcu_dereference(*pprog);
+	if (attached) {
+		prog_cnt = 1;
+		prog_id = attached->aux->id;
+	}
+	rcu_read_unlock();
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int bpf_prog_attach_one(struct bpf_prog __rcu **pprog, struct mutex *lock,
+			struct bpf_prog *prog, u32 flags)
+{
+	struct bpf_prog *attached;
+
+	if (flags)
+		return -EINVAL;
+
+	attached = rcu_dereference_protected(*pprog,
+					     lockdep_is_held(lock));
+	if (attached == prog) {
+		/* The same program cannot be attached twice */
+		return -EINVAL;
+	}
+	rcu_assign_pointer(*pprog, prog);
+	if (attached)
+		bpf_prog_put(attached);
+
+	return 0;
+}
+
+int bpf_prog_detach_one(struct bpf_prog __rcu **pprog, struct mutex *lock)
+{
+	struct bpf_prog *attached;
+
+	attached = rcu_dereference_protected(*pprog,
+					     lockdep_is_held(lock));
+	if (!attached)
+		return -ENOENT;
+	RCU_INIT_POINTER(*pprog, NULL);
+	bpf_prog_put(attached);
+
+	return 0;
+}
+
 #ifdef CONFIG_INET
 static void bpf_init_reuseport_kern(struct sk_reuseport_kern *reuse_kern,
 				    struct sock_reuseport *reuse,
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 3eff84824c8b..5ff99ed175bd 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -73,46 +73,22 @@ EXPORT_SYMBOL(skb_flow_dissector_init);
 int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
-	struct bpf_prog *attached;
 	struct net *net;
-
-	if (attr->query.query_flags)
-		return -EINVAL;
+	int ret;
 
 	net = get_net_ns_by_fd(attr->query.target_fd);
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	rcu_read_lock();
-	attached = rcu_dereference(net->flow_dissector_prog);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
+	ret = bpf_prog_query_one(&net->flow_dissector_prog, attr, uattr);
 
 	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
+	return ret;
 }
 
 int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 				       struct bpf_prog *prog)
 {
-	struct bpf_prog *attached;
 	struct net *net;
 	int ret = 0;
 
@@ -145,16 +121,9 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 		}
 	}
 
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (attached == prog) {
-		/* The same program cannot be attached twice */
-		ret = -EINVAL;
-		goto out;
-	}
-	rcu_assign_pointer(net->flow_dissector_prog, prog);
-	if (attached)
-		bpf_prog_put(attached);
+	ret = bpf_prog_attach_one(&net->flow_dissector_prog,
+				  &flow_dissector_mutex, prog,
+				  attr->attach_flags);
 out:
 	mutex_unlock(&flow_dissector_mutex);
 	return ret;
@@ -162,21 +131,15 @@ int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 
 int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 {
-	struct bpf_prog *attached;
-	struct net *net;
+	struct net *net = current->nsproxy->net_ns;
+	int ret;
 
-	net = current->nsproxy->net_ns;
 	mutex_lock(&flow_dissector_mutex);
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (!attached) {
-		mutex_unlock(&flow_dissector_mutex);
-		return -ENOENT;
-	}
-	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
-	bpf_prog_put(attached);
+	ret =  bpf_prog_detach_one(&net->flow_dissector_prog,
+				   &flow_dissector_mutex);
 	mutex_unlock(&flow_dissector_mutex);
-	return 0;
+
+	return ret;
 }
 
 /**
-- 
2.25.3



[PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point

From: Jakub Sitnicki @ 2020-05-06 12:54 UTC
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Marek Majkowski, Lorenz Bauer

Add a new program type BPF_PROG_TYPE_SK_LOOKUP and a dedicated attach type
called BPF_SK_LOOKUP. The new program kind is to be invoked by the
transport layer when looking up a socket for a received packet.

When called, the SK_LOOKUP program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what the
bind() API allows to express. Two use cases driving this work are:

 (1) steer packets destined to an IP range, fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context, the program receives information about the packet
that triggered the socket lookup, namely the IP version, the L4 protocol
identifier, and the address 4-tuple. The context can be further extended to
include the ingress interface identifier.

To select a socket, the BPF program fetches it from a map holding socket
references, such as SOCKMAP or SOCKHASH, and calls the
bpf_sk_assign(ctx, sk, ...) helper to record the selection. The transport
layer then uses the selected socket as the result of the socket lookup.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on the local
delivery path in the ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf_types.h   |   2 +
 include/linux/filter.h      |  23 ++++
 include/net/net_namespace.h |   1 +
 include/uapi/linux/bpf.h    |  53 ++++++++
 kernel/bpf/syscall.c        |   9 ++
 net/core/filter.c           | 247 ++++++++++++++++++++++++++++++++++++
 scripts/bpf_helpers_doc.py  |   9 +-
 7 files changed, 343 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 8345cdf553b8..08c2aef674ac 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -64,6 +64,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
 	      struct sk_reuseport_md, struct sk_reuseport_kern)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SK_LOOKUP, sk_lookup,
+	      struct bpf_sk_lookup, struct bpf_sk_lookup_kern)
 #endif
 #if defined(CONFIG_BPF_JIT)
 BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index af37318bb1c5..33254e840c8d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1280,4 +1280,27 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+struct bpf_sk_lookup_kern {
+	unsigned short	family;
+	u16		protocol;
+	union {
+		struct {
+			__be32 saddr;
+			__be32 daddr;
+		} v4;
+		struct {
+			struct in6_addr saddr;
+			struct in6_addr daddr;
+		} v6;
+	};
+	__be16		sport;
+	u16		dport;
+	struct sock	*selected_sk;
+};
+
+int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int sk_lookup_prog_detach(const union bpf_attr *attr);
+int sk_lookup_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index ab96fb59131c..70bf4888c94d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -163,6 +163,7 @@ struct net {
 	struct net_generic __rcu	*gen;
 
 	struct bpf_prog __rcu	*flow_dissector_prog;
+	struct bpf_prog __rcu	*sk_lookup_prog;
 
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b3643e27e264..e4c61b63d4bc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -187,6 +187,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -218,6 +219,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3041,6 +3043,10 @@ union bpf_attr {
  *
  * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3061,6 +3067,39 @@ union bpf_attr {
  *					call from outside of TC ingress.
  *		* **-ESOCKTNOSUPPORT**	Socket type not supported (reuseport).
  *
+ * int bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
+ *		For the operation to succeed, the passed socket must be
+ *		compatible with the packet description in the *ctx* object.
+ *
+ *		The L4 protocol (*IPPROTO_TCP* or *IPPROTO_UDP*) must be an
+ *		exact match, while the IP family (*AF_INET* or *AF_INET6*) must
+ *		be compatible; that is, IPv6 sockets that are not v6-only can
+ *		be selected for IPv4 packets.
+ *
+ *		Only full sockets can be selected. However, there is no need to
+ *		call bpf_sk_fullsock() before passing a socket as an argument to
+ *		this helper.
+ *
+ *		The *flags* argument must be zero.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		**-EAFNOSUPPORT** if socket family (*sk->family*) is not
+ *		compatible with packet family (*ctx->family*).
+ *
+ *		**-EINVAL** if unsupported flags were specified.
+ *
+ *		**-EPROTOTYPE** if socket L4 protocol (*sk->protocol*) doesn't
+ *		match packet protocol (*ctx->protocol*).
+ *
+ *		**-ESOCKTNOSUPPORT** if socket is not a full socket.
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPPROTO_TCP, IPPROTO_UDP */
+	/* IP addresses allow 1, 2, and 4 byte access */
+	__u32 src_ip4;
+	__u32 src_ip6[4];
+	__u32 src_port;		/* network byte order */
+	__u32 dst_ip4;
+	__u32 dst_ip6[4];
+	__u32 dst_port;		/* host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bb1ab7da6103..26d643c171fd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2729,6 +2729,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
+	case BPF_SK_LOOKUP:
+		return BPF_PROG_TYPE_SK_LOOKUP;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -2778,6 +2780,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		ret = sk_lookup_prog_attach(attr, prog);
+		break;
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2818,6 +2823,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		return lirc_prog_detach(attr);
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		return skb_flow_dissector_bpf_prog_detach(attr);
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		return sk_lookup_prog_detach(attr);
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2867,6 +2874,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
 		return skb_flow_dissector_prog_query(attr, uattr);
+	case BPF_SK_LOOKUP:
+		return sk_lookup_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index bc25bb1085b1..a00bdc70041c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9054,6 +9054,253 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
+static DEFINE_MUTEX(sk_lookup_prog_mutex);
+
+int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct net *net = current->nsproxy->net_ns;
+	int ret;
+
+	if (unlikely(attr->attach_flags))
+		return -EINVAL;
+
+	mutex_lock(&sk_lookup_prog_mutex);
+	ret = bpf_prog_attach_one(&net->sk_lookup_prog,
+				  &sk_lookup_prog_mutex, prog,
+				  attr->attach_flags);
+	mutex_unlock(&sk_lookup_prog_mutex);
+
+	return ret;
+}
+
+int sk_lookup_prog_detach(const union bpf_attr *attr)
+{
+	struct net *net = current->nsproxy->net_ns;
+	int ret;
+
+	if (unlikely(attr->attach_flags))
+		return -EINVAL;
+
+	mutex_lock(&sk_lookup_prog_mutex);
+	ret = bpf_prog_detach_one(&net->sk_lookup_prog,
+				  &sk_lookup_prog_mutex);
+	mutex_unlock(&sk_lookup_prog_mutex);
+
+	return ret;
+}
+
+int sk_lookup_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr)
+{
+	struct net *net;
+	int ret;
+
+	net = get_net_ns_by_fd(attr->query.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	ret = bpf_prog_query_one(&net->sk_lookup_prog, attr, uattr);
+
+	put_net(net);
+	return ret;
+}
+
+BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
+	   struct sock *, sk, u64, flags)
+{
+	if (unlikely(flags != 0))
+		return -EINVAL;
+	if (unlikely(!sk_fullsock(sk)))
+		return -ESOCKTNOSUPPORT;
+
+	/* Check if socket is suitable for packet L3/L4 protocol */
+	if (sk->sk_protocol != ctx->protocol)
+		return -EPROTOTYPE;
+	if (sk->sk_family != ctx->family &&
+	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
+		return -EAFNOSUPPORT;
+
+	/* Select socket as lookup result */
+	ctx->selected_sk = sk;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sk_lookup_assign_proto = {
+	.func		= bpf_sk_lookup_assign,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_SOCK_COMMON,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_sk_assign:
+		return &bpf_sk_lookup_assign_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static bool sk_lookup_is_valid_access(int off, int size,
+				      enum bpf_access_type type,
+				      const struct bpf_prog *prog,
+				      struct bpf_insn_access_aux *info)
+{
+	const int size_default = sizeof(__u32);
+
+	if (off < 0 || off >= sizeof(struct bpf_sk_lookup))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct bpf_sk_lookup, src_ip4):
+	case bpf_ctx_range(struct bpf_sk_lookup, dst_ip4):
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				src_ip6[0], src_ip6[3]):
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				dst_ip6[0], dst_ip6[3]):
+		if (!bpf_ctx_narrow_access_ok(off, size, size_default))
+			return false;
+		bpf_ctx_record_field_size(info, size_default);
+		break;
+
+	case bpf_ctx_range(struct bpf_sk_lookup, family):
+	case bpf_ctx_range(struct bpf_sk_lookup, protocol):
+	case bpf_ctx_range(struct bpf_sk_lookup, src_port):
+	case bpf_ctx_range(struct bpf_sk_lookup, dst_port):
+		if (size != size_default)
+			return false;
+		break;
+
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+#define CHECK_FIELD_SIZE(BPF_TYPE, BPF_FIELD, KERN_TYPE, KERN_FIELD)	\
+	BUILD_BUG_ON(sizeof_field(BPF_TYPE, BPF_FIELD) <		\
+		     sizeof_field(KERN_TYPE, KERN_FIELD))
+
+#define LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, OFF)			\
+	BPF_LDX_MEM(SIZE, si->dst_reg, si->src_reg,			\
+		    bpf_target_off(TYPE, FIELD,				\
+				   sizeof_field(TYPE, FIELD),		\
+				   target_size) + (OFF))
+
+#define LOAD_FIELD_SIZE(TYPE, FIELD, SIZE) \
+	LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, 0)
+
+#define LOAD_FIELD(TYPE, FIELD) \
+	LOAD_FIELD_SIZE(TYPE, FIELD, BPF_FIELD_SIZEOF(TYPE, FIELD))
+
+static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
+					const struct bpf_insn *si,
+					struct bpf_insn *insn_buf,
+					struct bpf_prog *prog,
+					u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_sk_lookup, family):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, family,
+				 struct bpf_sk_lookup_kern, family);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, family);
+		break;
+
+	case offsetof(struct bpf_sk_lookup, protocol):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, protocol,
+				 struct bpf_sk_lookup_kern, protocol);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, protocol);
+		break;
+
+	case offsetof(struct bpf_sk_lookup, src_ip4):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, src_ip4,
+				 struct bpf_sk_lookup_kern, v4.saddr);
+		*insn++ = LOAD_FIELD_SIZE(struct bpf_sk_lookup_kern, v4.saddr,
+					  BPF_SIZE(si->code));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, dst_ip4):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, dst_ip4,
+				 struct bpf_sk_lookup_kern, v4.daddr);
+		*insn++ = LOAD_FIELD_SIZE(struct bpf_sk_lookup_kern, v4.daddr,
+					  BPF_SIZE(si->code));
+
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				src_ip6[0], src_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, src_ip6[0],
+				 struct bpf_sk_lookup_kern,
+				 v6.saddr.s6_addr32[0]);
+		off = si->off;
+		off -= offsetof(struct bpf_sk_lookup, src_ip6[0]);
+		*insn++ = LOAD_FIELD_SIZE_OFF(struct bpf_sk_lookup_kern,
+					      v6.saddr.s6_addr32[0],
+					      BPF_SIZE(si->code), off);
+#else
+		(void)off;
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				dst_ip6[0], dst_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, dst_ip6[0],
+				 struct bpf_sk_lookup_kern,
+				 v6.daddr.s6_addr32[0]);
+		off = si->off;
+		off -= offsetof(struct bpf_sk_lookup, dst_ip6[0]);
+		*insn++ = LOAD_FIELD_SIZE_OFF(struct bpf_sk_lookup_kern,
+					      v6.daddr.s6_addr32[0],
+					      BPF_SIZE(si->code), off);
+#else
+		(void)off;
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_sk_lookup, src_port):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, src_port,
+				 struct bpf_sk_lookup_kern, sport);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, sport);
+		break;
+
+	case offsetof(struct bpf_sk_lookup, dst_port):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, dst_port,
+				 struct bpf_sk_lookup_kern, dport);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, dport);
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_prog_ops sk_lookup_prog_ops = {
+};
+
+const struct bpf_verifier_ops sk_lookup_verifier_ops = {
+	.get_func_proto		= sk_lookup_func_proto,
+	.is_valid_access	= sk_lookup_is_valid_access,
+	.convert_ctx_access	= sk_lookup_convert_ctx_access,
+};
+
 #endif /* CONFIG_INET */
 
 DEFINE_BPF_DISPATCHER(xdp)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index f43d193aff3a..70b1b033c721 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -398,6 +398,7 @@ class PrinterHelpers(Printer):
 
     type_fwds = [
             'struct bpf_fib_lookup',
+            'struct bpf_sk_lookup',
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
@@ -437,6 +438,7 @@ class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
             'struct bpf_sock_ops',
@@ -467,6 +469,11 @@ class PrinterHelpers(Printer):
             'struct sk_msg_buff': 'struct sk_msg_md',
             'struct xdp_buff': 'struct xdp_md',
     }
+    # Helpers overloaded for different context types.
+    overloaded_helpers = [
+        'bpf_get_socket_cookie',
+        'bpf_sk_assign',
+    ]
 
     def print_header(self):
         header = '''\
@@ -523,7 +530,7 @@ class PrinterHelpers(Printer):
         for i, a in enumerate(proto['args']):
             t = a['type']
             n = a['name']
-            if proto['name'] == 'bpf_get_socket_cookie' and i == 0:
+            if proto['name'] in self.overloaded_helpers and i == 0:
                     t = 'void'
                     n = 'ctx'
             one_arg = '{}{}'.format(comma, self.map_type(t))
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
@ 2020-05-06 12:54   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:54 UTC (permalink / raw)
  To: dccp

Add a new program type BPF_PROG_TYPE_SK_LOOKUP and a dedicated attach type
called BPF_SK_LOOKUP. The new program kind is to be invoked by the
transport layer when looking up a socket for a received packet.

When called, SK_LOOKUP program can select a socket that will receive the
packet. This serves as a mechanism to overcome the limits of what bind()
API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context, program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf_types.h   |   2 +
 include/linux/filter.h      |  23 ++++
 include/net/net_namespace.h |   1 +
 include/uapi/linux/bpf.h    |  53 ++++++++
 kernel/bpf/syscall.c        |   9 ++
 net/core/filter.c           | 247 ++++++++++++++++++++++++++++++++++++
 scripts/bpf_helpers_doc.py  |   9 +-
 7 files changed, 343 insertions(+), 1 deletion(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 8345cdf553b8..08c2aef674ac 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -64,6 +64,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
 	      struct sk_reuseport_md, struct sk_reuseport_kern)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SK_LOOKUP, sk_lookup,
+	      struct bpf_sk_lookup, struct bpf_sk_lookup_kern)
 #endif
 #if defined(CONFIG_BPF_JIT)
 BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index af37318bb1c5..33254e840c8d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1280,4 +1280,27 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+struct bpf_sk_lookup_kern {
+	unsigned short	family;
+	u16		protocol;
+	union {
+		struct {
+			__be32 saddr;
+			__be32 daddr;
+		} v4;
+		struct {
+			struct in6_addr saddr;
+			struct in6_addr daddr;
+		} v6;
+	};
+	__be16		sport;
+	u16		dport;
+	struct sock	*selected_sk;
+};
+
+int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
+int sk_lookup_prog_detach(const union bpf_attr *attr);
+int sk_lookup_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index ab96fb59131c..70bf4888c94d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -163,6 +163,7 @@ struct net {
 	struct net_generic __rcu	*gen;
 
 	struct bpf_prog __rcu	*flow_dissector_prog;
+	struct bpf_prog __rcu	*sk_lookup_prog;
 
 	/* Note : following structs are cache line aligned */
 #ifdef CONFIG_XFRM
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b3643e27e264..e4c61b63d4bc 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -187,6 +187,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -218,6 +219,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3041,6 +3043,10 @@ union bpf_attr {
  *
  * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3061,6 +3067,39 @@ union bpf_attr {
  *					call from outside of TC ingress.
  *		* **-ESOCKTNOSUPPORT**	Socket type not supported (reuseport).
  *
+ * int bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
 + *		For the operation to succeed, the passed socket must be
 + *		compatible with the packet description provided by *ctx*.
+ *
 + *		L4 protocol (*IPPROTO_TCP* or *IPPROTO_UDP*) must be an exact
 + *		match, while IP family (*AF_INET* or *AF_INET6*) only needs to
 + *		be compatible; that is, IPv6 sockets that are not v6-only can
 + *		be selected for IPv4 packets.
+ *
+ *		Only full sockets can be selected. However, there is no need to
 + *		call bpf_sk_fullsock() before passing a socket as an argument to
+ *		this helper.
+ *
+ *		The *flags* argument must be zero.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
 + *		**-EAFNOSUPPORT** if socket family (*sk->family*) is not
+ *		compatible with packet family (*ctx->family*).
+ *
+ *		**-EINVAL** if unsupported flags were specified.
+ *
+ *		**-EPROTOTYPE** if socket L4 protocol (*sk->protocol*) doesn't
+ *		match packet protocol (*ctx->protocol*).
+ *
+ *		**-ESOCKTNOSUPPORT** if socket is not a full socket.
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPPROTO_TCP, IPPROTO_UDP */
+	/* IP addresses allow 1, 2, and 4 byte access */
+	__u32 src_ip4;
+	__u32 src_ip6[4];
+	__u32 src_port;		/* network byte order */
+	__u32 dst_ip4;
+	__u32 dst_ip6[4];
+	__u32 dst_port;		/* host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bb1ab7da6103..26d643c171fd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2729,6 +2729,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 	case BPF_CGROUP_GETSOCKOPT:
 	case BPF_CGROUP_SETSOCKOPT:
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
+	case BPF_SK_LOOKUP:
+		return BPF_PROG_TYPE_SK_LOOKUP;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -2778,6 +2780,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		ret = sk_lookup_prog_attach(attr, prog);
+		break;
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2818,6 +2823,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 		return lirc_prog_detach(attr);
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		return skb_flow_dissector_bpf_prog_detach(attr);
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		return sk_lookup_prog_detach(attr);
 	case BPF_PROG_TYPE_CGROUP_DEVICE:
 	case BPF_PROG_TYPE_CGROUP_SKB:
 	case BPF_PROG_TYPE_CGROUP_SOCK:
@@ -2867,6 +2874,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
 		return skb_flow_dissector_prog_query(attr, uattr);
+	case BPF_SK_LOOKUP:
+		return sk_lookup_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index bc25bb1085b1..a00bdc70041c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9054,6 +9054,253 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
+static DEFINE_MUTEX(sk_lookup_prog_mutex);
+
+int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+	struct net *net = current->nsproxy->net_ns;
+	int ret;
+
+	if (unlikely(attr->attach_flags))
+		return -EINVAL;
+
+	mutex_lock(&sk_lookup_prog_mutex);
+	ret = bpf_prog_attach_one(&net->sk_lookup_prog,
+				  &sk_lookup_prog_mutex, prog,
+				  attr->attach_flags);
+	mutex_unlock(&sk_lookup_prog_mutex);
+
+	return ret;
+}
+
+int sk_lookup_prog_detach(const union bpf_attr *attr)
+{
+	struct net *net = current->nsproxy->net_ns;
+	int ret;
+
+	if (unlikely(attr->attach_flags))
+		return -EINVAL;
+
+	mutex_lock(&sk_lookup_prog_mutex);
+	ret = bpf_prog_detach_one(&net->sk_lookup_prog,
+				  &sk_lookup_prog_mutex);
+	mutex_unlock(&sk_lookup_prog_mutex);
+
+	return ret;
+}
+
+int sk_lookup_prog_query(const union bpf_attr *attr,
+			 union bpf_attr __user *uattr)
+{
+	struct net *net;
+	int ret;
+
+	net = get_net_ns_by_fd(attr->query.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	ret = bpf_prog_query_one(&net->sk_lookup_prog, attr, uattr);
+
+	put_net(net);
+	return ret;
+}
+
+BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
+	   struct sock *, sk, u64, flags)
+{
+	if (unlikely(flags != 0))
+		return -EINVAL;
+	if (unlikely(!sk_fullsock(sk)))
+		return -ESOCKTNOSUPPORT;
+
+	/* Check if socket is suitable for packet L3/L4 protocol */
+	if (sk->sk_protocol != ctx->protocol)
+		return -EPROTOTYPE;
+	if (sk->sk_family != ctx->family &&
+	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
+		return -EAFNOSUPPORT;
+
+	/* Select socket as lookup result */
+	ctx->selected_sk = sk;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sk_lookup_assign_proto = {
+	.func		= bpf_sk_lookup_assign,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_SOCK_COMMON,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_sk_assign:
+		return &bpf_sk_lookup_assign_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static bool sk_lookup_is_valid_access(int off, int size,
+				      enum bpf_access_type type,
+				      const struct bpf_prog *prog,
+				      struct bpf_insn_access_aux *info)
+{
+	const int size_default = sizeof(__u32);
+
+	if (off < 0 || off >= sizeof(struct bpf_sk_lookup))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct bpf_sk_lookup, src_ip4):
+	case bpf_ctx_range(struct bpf_sk_lookup, dst_ip4):
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				src_ip6[0], src_ip6[3]):
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				dst_ip6[0], dst_ip6[3]):
+		if (!bpf_ctx_narrow_access_ok(off, size, size_default))
+			return false;
+		bpf_ctx_record_field_size(info, size_default);
+		break;
+
+	case bpf_ctx_range(struct bpf_sk_lookup, family):
+	case bpf_ctx_range(struct bpf_sk_lookup, protocol):
+	case bpf_ctx_range(struct bpf_sk_lookup, src_port):
+	case bpf_ctx_range(struct bpf_sk_lookup, dst_port):
+		if (size != size_default)
+			return false;
+		break;
+
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+#define CHECK_FIELD_SIZE(BPF_TYPE, BPF_FIELD, KERN_TYPE, KERN_FIELD)	\
+	BUILD_BUG_ON(sizeof_field(BPF_TYPE, BPF_FIELD) <		\
+		     sizeof_field(KERN_TYPE, KERN_FIELD))
+
+#define LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, OFF)			\
+	BPF_LDX_MEM(SIZE, si->dst_reg, si->src_reg,			\
+		    bpf_target_off(TYPE, FIELD,				\
+				   sizeof_field(TYPE, FIELD),		\
+				   target_size) + (OFF))
+
+#define LOAD_FIELD_SIZE(TYPE, FIELD, SIZE) \
+	LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, 0)
+
+#define LOAD_FIELD(TYPE, FIELD) \
+	LOAD_FIELD_SIZE(TYPE, FIELD, BPF_FIELD_SIZEOF(TYPE, FIELD))
+
+static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
+					const struct bpf_insn *si,
+					struct bpf_insn *insn_buf,
+					struct bpf_prog *prog,
+					u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_sk_lookup, family):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, family,
+				 struct bpf_sk_lookup_kern, family);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, family);
+		break;
+
+	case offsetof(struct bpf_sk_lookup, protocol):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, protocol,
+				 struct bpf_sk_lookup_kern, protocol);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, protocol);
+		break;
+
+	case offsetof(struct bpf_sk_lookup, src_ip4):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, src_ip4,
+				 struct bpf_sk_lookup_kern, v4.saddr);
+		*insn++ = LOAD_FIELD_SIZE(struct bpf_sk_lookup_kern, v4.saddr,
+					  BPF_SIZE(si->code));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, dst_ip4):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, dst_ip4,
+				 struct bpf_sk_lookup_kern, v4.daddr);
+		*insn++ = LOAD_FIELD_SIZE(struct bpf_sk_lookup_kern, v4.daddr,
+					  BPF_SIZE(si->code));
+
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				src_ip6[0], src_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, src_ip6[0],
+				 struct bpf_sk_lookup_kern,
+				 v6.saddr.s6_addr32[0]);
+		off = si->off;
+		off -= offsetof(struct bpf_sk_lookup, src_ip6[0]);
+		*insn++ = LOAD_FIELD_SIZE_OFF(struct bpf_sk_lookup_kern,
+					      v6.saddr.s6_addr32[0],
+					      BPF_SIZE(si->code), off);
+#else
+		(void)off;
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				dst_ip6[0], dst_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, dst_ip6[0],
+				 struct bpf_sk_lookup_kern,
+				 v6.daddr.s6_addr32[0]);
+		off = si->off;
+		off -= offsetof(struct bpf_sk_lookup, dst_ip6[0]);
+		*insn++ = LOAD_FIELD_SIZE_OFF(struct bpf_sk_lookup_kern,
+					      v6.daddr.s6_addr32[0],
+					      BPF_SIZE(si->code), off);
+#else
+		(void)off;
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_sk_lookup, src_port):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, src_port,
+				 struct bpf_sk_lookup_kern, sport);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, sport);
+		break;
+
+	case offsetof(struct bpf_sk_lookup, dst_port):
+		CHECK_FIELD_SIZE(struct bpf_sk_lookup, dst_port,
+				 struct bpf_sk_lookup_kern, dport);
+		*insn++ = LOAD_FIELD(struct bpf_sk_lookup_kern, dport);
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_prog_ops sk_lookup_prog_ops = {
+};
+
+const struct bpf_verifier_ops sk_lookup_verifier_ops = {
+	.get_func_proto		= sk_lookup_func_proto,
+	.is_valid_access	= sk_lookup_is_valid_access,
+	.convert_ctx_access	= sk_lookup_convert_ctx_access,
+};
+
 #endif /* CONFIG_INET */
 
 DEFINE_BPF_DISPATCHER(xdp)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index f43d193aff3a..70b1b033c721 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -398,6 +398,7 @@ class PrinterHelpers(Printer):
 
     type_fwds = [
             'struct bpf_fib_lookup',
+            'struct bpf_sk_lookup',
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
@@ -437,6 +438,7 @@ class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
             'struct bpf_sock_ops',
@@ -467,6 +469,11 @@ class PrinterHelpers(Printer):
             'struct sk_msg_buff': 'struct sk_msg_md',
             'struct xdp_buff': 'struct xdp_md',
     }
+    # Helpers overloaded for different context types.
+    overloaded_helpers = [
+        'bpf_get_socket_cookie',
+        'bpf_sk_assign',
+    ]
 
     def print_header(self):
         header = '''\
@@ -523,7 +530,7 @@ class PrinterHelpers(Printer):
         for i, a in enumerate(proto['args']):
             t = a['type']
             n = a['name']
-            if proto['name'] == 'bpf_get_socket_cookie' and i == 0:
+            if proto['name'] in self.overloaded_helpers and i == 0:
                     t = 'void'
                     n = 'ctx'
             one_arg = '{}{}'.format(comma, self.map_type(t))
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 03/17] inet: Store layer 4 protocol in inet_hashinfo
@ 2020-05-06 12:54   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:54 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Lorenz Bauer

Make it possible to identify the protocol of sockets stored in hashinfo
without looking up a socket.

Subsequent patches make use of the new field at socket lookup time to
ensure that the BPF program selects only sockets with a matching protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 3 +++
 net/dccp/proto.c              | 2 +-
 net/ipv4/tcp_ipv4.c           | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index ad64ba6a057f..6072dfbd1078 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -144,6 +144,9 @@ struct inet_hashinfo {
 	unsigned int			lhash2_mask;
 	struct inet_listen_hashbucket	*lhash2;
 
+	/* Layer 4 protocol of the stored sockets */
+	int				protocol;
+
 	/* All the above members are written once at bootup and
 	 * never written again _or_ are predominantly read-access.
 	 *
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 4af8a98fe784..c826419e68e6 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL_GPL(dccp_statistics);
 struct percpu_counter dccp_orphan_count;
 EXPORT_SYMBOL_GPL(dccp_orphan_count);
 
-struct inet_hashinfo dccp_hashinfo;
+struct inet_hashinfo dccp_hashinfo = { .protocol = IPPROTO_DCCP };
 EXPORT_SYMBOL_GPL(dccp_hashinfo);
 
 /* the maximum queue length for tx in packets. 0 is no limit */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6c05f1ceb538..77e4f4e4c73c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -87,7 +87,7 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 			       __be32 daddr, __be32 saddr, const struct tcphdr *th);
 #endif
 
-struct inet_hashinfo tcp_hashinfo;
+struct inet_hashinfo tcp_hashinfo = { .protocol = IPPROTO_TCP };
 EXPORT_SYMBOL(tcp_hashinfo);
 
 static u32 tcp_v4_init_seq(const struct sk_buff *skb)
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 03/17] inet: Store layer 4 protocol in inet_hashinfo
@ 2020-05-06 12:54   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:54 UTC (permalink / raw)
  To: dccp

Make it possible to identify the protocol of sockets stored in hashinfo
without looking up a socket.

Subsequent patches make use of the new field at socket lookup time to
ensure that the BPF program selects only sockets with a matching protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 3 +++
 net/dccp/proto.c              | 2 +-
 net/ipv4/tcp_ipv4.c           | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index ad64ba6a057f..6072dfbd1078 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -144,6 +144,9 @@ struct inet_hashinfo {
 	unsigned int			lhash2_mask;
 	struct inet_listen_hashbucket	*lhash2;
 
+	/* Layer 4 protocol of the stored sockets */
+	int				protocol;
+
 	/* All the above members are written once at bootup and
 	 * never written again _or_ are predominantly read-access.
 	 *
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 4af8a98fe784..c826419e68e6 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL_GPL(dccp_statistics);
 struct percpu_counter dccp_orphan_count;
 EXPORT_SYMBOL_GPL(dccp_orphan_count);
 
-struct inet_hashinfo dccp_hashinfo;
+struct inet_hashinfo dccp_hashinfo = { .protocol = IPPROTO_DCCP };
 EXPORT_SYMBOL_GPL(dccp_hashinfo);
 
 /* the maximum queue length for tx in packets. 0 is no limit */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6c05f1ceb538..77e4f4e4c73c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -87,7 +87,7 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 			       __be32 daddr, __be32 saddr, const struct tcphdr *th);
 #endif
 
-struct inet_hashinfo tcp_hashinfo;
+struct inet_hashinfo tcp_hashinfo = { .protocol = IPPROTO_TCP };
 EXPORT_SYMBOL(tcp_hashinfo);
 
 static u32 tcp_v4_init_seq(const struct sk_buff *skb)
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 04/17] inet: Extract helper for selecting socket from reuseport group
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Prepare for calling into reuseport from __inet_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/inet_hashtables.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2bbaaf0c7176..ab64834837c8 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -246,6 +246,21 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /*
  * Here are some nice properties to exploit here. The BSD API
  * does not allow a listening sock to specify the remote port nor the
@@ -265,21 +280,17 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr,
 				      dif, sdif, exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet_ehashfn(net, daddr, hnum,
-						     saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 04/17] inet: Extract helper for selecting socket from reuseport group
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Prepare for calling into reuseport from __inet_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/inet_hashtables.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2bbaaf0c7176..ab64834837c8 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -246,6 +246,21 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /*
  * Here are some nice properties to exploit here. The BSD API
  * does not allow a listening sock to specify the remote port nor the
@@ -265,21 +280,17 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr,
 				      dif, sdif, exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet_ehashfn(net, daddr, hnum,
-						     saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 05/17] inet: Run SK_LOOKUP BPF program on socket lookup
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Marek Majkowski, Lorenz Bauer

Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning BPF_REDIRECT code.

Alternatively, program can also fail the lookup by returning with BPF_DROP,
or let the lookup continue as usual with BPF_OK on return.

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.
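
For instance, failing the lookup makes it possible to guard a port. A
sketch, not part of this patch, that refuses TCP connections to port
1234 unless they come from 198.51.100.0/24:

  #include <linux/bpf.h>
  #include <linux/in.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("sk_lookup")
  int guard_port(struct bpf_sk_lookup *ctx)
  {
  	if (ctx->protocol != IPPROTO_TCP || ctx->dst_port != 1234)
  		return BPF_OK;		/* not ours, regular lookup */

  	/* src_ip4 is in network byte order. */
  	if ((bpf_ntohl(ctx->src_ip4) & 0xffffff00) == 0xc6336400)
  		return BPF_OK;		/* 198.51.100.0/24 is allowed */

  	return BPF_DROP;		/* lookup fails, packet refused */
  }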

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 36 +++++++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c    | 15 ++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 6072dfbd1078..3fcbc8f66f88 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -422,4 +422,40 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk);
+
+static inline struct sock *bpf_sk_lookup_run(struct net *net,
+					     struct bpf_sk_lookup_kern *ctx)
+{
+	struct bpf_prog *prog;
+	int ret = BPF_OK;
+
+	rcu_read_lock();
+	prog = rcu_dereference(net->sk_lookup_prog);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, ctx);
+	rcu_read_unlock();
+
+	if (ret == BPF_DROP)
+		return ERR_PTR(-ECONNREFUSED);
+	if (ret == BPF_REDIRECT)
+		return ctx->selected_sk;
+	return NULL;
+}
+
+static inline struct sock *inet_lookup_run_bpf(struct net *net, u8 protocol,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr, u16 dport)
+{
+	struct bpf_sk_lookup_kern ctx = {
+		.family		= AF_INET,
+		.protocol	= protocol,
+		.v4.saddr	= saddr,
+		.v4.daddr	= daddr,
+		.sport		= sport,
+		.dport		= dport,
+	};
+
+	return bpf_sk_lookup_run(net, &ctx);
+}
+
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ab64834837c8..f4d07285591a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -307,9 +307,22 @@ struct sock *__inet_lookup_listener(struct net *net,
 				    const int dif, const int sdif)
 {
 	struct inet_listen_hashbucket *ilb2;
-	struct sock *result = NULL;
+	struct sock *result, *reuse_sk;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	result = inet_lookup_run_bpf(net, hashinfo->protocol,
+				     saddr, sport, daddr, hnum);
+	if (IS_ERR(result))
+		return NULL;
+	if (result) {
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+		goto done;
+	}
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 05/17] inet: Run SK_LOOKUP BPF program on socket lookup
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning BPF_REDIRECT code.

Alternatively, program can also fail the lookup by returning with BPF_DROP,
or let the lookup continue as usual with BPF_OK on return.

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 36 +++++++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c    | 15 ++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index 6072dfbd1078..3fcbc8f66f88 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -422,4 +422,40 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk);
+
+static inline struct sock *bpf_sk_lookup_run(struct net *net,
+					     struct bpf_sk_lookup_kern *ctx)
+{
+	struct bpf_prog *prog;
+	int ret = BPF_OK;
+
+	rcu_read_lock();
+	prog = rcu_dereference(net->sk_lookup_prog);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, ctx);
+	rcu_read_unlock();
+
+	if (ret == BPF_DROP)
+		return ERR_PTR(-ECONNREFUSED);
+	if (ret == BPF_REDIRECT)
+		return ctx->selected_sk;
+	return NULL;
+}
+
+static inline struct sock *inet_lookup_run_bpf(struct net *net, u8 protocol,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr, u16 dport)
+{
+	struct bpf_sk_lookup_kern ctx = {
+		.family		= AF_INET,
+		.protocol	= protocol,
+		.v4.saddr	= saddr,
+		.v4.daddr	= daddr,
+		.sport		= sport,
+		.dport		= dport,
+	};
+
+	return bpf_sk_lookup_run(net, &ctx);
+}
+
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ab64834837c8..f4d07285591a 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -307,9 +307,22 @@ struct sock *__inet_lookup_listener(struct net *net,
 				    const int dif, const int sdif)
 {
 	struct inet_listen_hashbucket *ilb2;
-	struct sock *result = NULL;
+	struct sock *result, *reuse_sk;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	result = inet_lookup_run_bpf(net, hashinfo->protocol,
+				     saddr, sport, daddr, hnum);
+	if (IS_ERR(result))
+		return NULL;
+	if (result) {
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+		goto done;
+	}
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 06/17] inet6: Extract helper for selecting socket from reuseport group
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Prepare for calling into reuseport from inet6_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/inet6_hashtables.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index fbe9d4295eac..03942eef8ab6 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -111,6 +111,23 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *inet6_lhash2_lookup(struct net *net,
 		struct inet_listen_hashbucket *ilb2,
@@ -123,21 +140,17 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr, dif, sdif,
 				      exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet6_ehashfn(net, daddr, hnum,
-						      saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 06/17] inet6: Extract helper for selecting socket from reuseport group
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Prepare for calling into reuseport from inet6_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/inet6_hashtables.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index fbe9d4295eac..03942eef8ab6 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -111,6 +111,23 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *inet6_lhash2_lookup(struct net *net,
 		struct inet_listen_hashbucket *ilb2,
@@ -123,21 +140,17 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr, dif, sdif,
 				      exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet6_ehashfn(net, daddr, hnum,
-						      saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 07/17] inet6: Run SK_LOOKUP BPF program on socket lookup
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Marek Majkowski, Lorenz Bauer

Following ipv4 stack changes, run a BPF program attached to netns before
looking up a listening socket. Program can return a listening socket to use
as result of socket lookup, fail the lookup, or take no action.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet6_hashtables.h | 20 ++++++++++++++++++++
 net/ipv6/inet6_hashtables.c    | 15 ++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 81b965953036..8b8c0cb92ea8 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -21,6 +21,7 @@
 
 #include <net/ipv6.h>
 #include <net/netns/hash.h>
+#include <net/inet_hashtables.h>
 
 struct inet_hashinfo;
 
@@ -103,6 +104,25 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
 			  const int dif);
 
 int inet6_hash(struct sock *sk);
+
+static inline struct sock *inet6_lookup_run_bpf(struct net *net, u8 protocol,
+						const struct in6_addr *saddr,
+						__be16 sport,
+						const struct in6_addr *daddr,
+						u16 dport)
+{
+	struct bpf_sk_lookup_kern ctx = {
+		.family		= AF_INET6,
+		.protocol	= protocol,
+		.v6.saddr	= *saddr,
+		.v6.daddr	= *daddr,
+		.sport		= sport,
+		.dport		= dport,
+	};
+
+	return bpf_sk_lookup_run(net, &ctx);
+}
+
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
 #define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif, __sdif) \
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 03942eef8ab6..6d91de89fd2b 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -167,9 +167,22 @@ struct sock *inet6_lookup_listener(struct net *net,
 		const unsigned short hnum, const int dif, const int sdif)
 {
 	struct inet_listen_hashbucket *ilb2;
-	struct sock *result = NULL;
+	struct sock *result, *reuse_sk;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	result = inet6_lookup_run_bpf(net, hashinfo->protocol,
+				      saddr, sport, daddr, hnum);
+	if (IS_ERR(result))
+		return NULL;
+	if (result) {
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+		goto done;
+	}
+
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 07/17] inet6: Run SK_LOOKUP BPF program on socket lookup
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Following ipv4 stack changes, run a BPF program attached to netns before
looking up a listening socket. Program can return a listening socket to use
as result of socket lookup, fail the lookup, or take no action.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet6_hashtables.h | 20 ++++++++++++++++++++
 net/ipv6/inet6_hashtables.c    | 15 ++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index 81b965953036..8b8c0cb92ea8 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -21,6 +21,7 @@
 
 #include <net/ipv6.h>
 #include <net/netns/hash.h>
+#include <net/inet_hashtables.h>
 
 struct inet_hashinfo;
 
@@ -103,6 +104,25 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
 			  const int dif);
 
 int inet6_hash(struct sock *sk);
+
+static inline struct sock *inet6_lookup_run_bpf(struct net *net, u8 protocol,
+						const struct in6_addr *saddr,
+						__be16 sport,
+						const struct in6_addr *daddr,
+						u16 dport)
+{
+	struct bpf_sk_lookup_kern ctx = {
+		.family		= AF_INET6,
+		.protocol	= protocol,
+		.v6.saddr	= *saddr,
+		.v6.daddr	= *daddr,
+		.sport		= sport,
+		.dport		= dport,
+	};
+
+	return bpf_sk_lookup_run(net, &ctx);
+}
+
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
 #define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif, __sdif) \
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 03942eef8ab6..6d91de89fd2b 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -167,9 +167,22 @@ struct sock *inet6_lookup_listener(struct net *net,
 		const unsigned short hnum, const int dif, const int sdif)
 {
 	struct inet_listen_hashbucket *ilb2;
-	struct sock *result = NULL;
+	struct sock *result, *reuse_sk;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	result = inet6_lookup_run_bpf(net, hashinfo->protocol,
+				      saddr, sport, daddr, hnum);
+	if (IS_ERR(result))
+		return NULL;
+	if (result) {
+		reuse_sk = lookup_reuseport(net, result, skb, doff,
+					    saddr, sport, daddr, hnum);
+		if (reuse_sk)
+			result = reuse_sk;
+		goto done;
+	}
+
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 08/17] udp: Store layer 4 protocol in udp_table
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Lorenz Bauer

Because UDP and UDP-Lite share code, we pass the L4 protocol identifier
alongside the UDP socket table to functions which need to distinguish
between the two protocols.

Put the protocol identifier in the UDP table itself, so that the protocol
is known to any function in the call chain that operates on the socket table.

Subsequent patches make use of the new udp_table field at socket lookup
time to ensure that the BPF program selects only sockets with a matching
protocol.
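
For illustration, a function handed only the table can now tell the
two protocols apart without an extra argument (hypothetical helper,
not part of this patch):

  static bool udp_table_is_udplite(const struct udp_table *udptable)
  {
  	return udptable->protocol == IPPROTO_UDPLITE;
  }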

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/udp.h   | 10 ++++++----
 net/ipv4/udp.c      | 15 +++++++--------
 net/ipv4/udp_impl.h |  2 +-
 net/ipv4/udplite.c  |  4 ++--
 net/ipv6/udp.c      | 12 ++++++------
 net/ipv6/udp_impl.h |  2 +-
 net/ipv6/udplite.c  |  2 +-
 7 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index a8fa6c0c6ded..f81c46c71fee 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -63,16 +63,18 @@ struct udp_hslot {
 /**
  *	struct udp_table - UDP table
  *
- *	@hash:	hash table, sockets are hashed on (local port)
- *	@hash2:	hash table, sockets are hashed on (local port, local address)
- *	@mask:	number of slots in hash tables, minus 1
- *	@log:	log2(number of slots in hash table)
+ *	@hash:		hash table, sockets are hashed on (local port)
+ *	@hash2:		hash table, sockets are hashed on local (port, address)
+ *	@mask:		number of slots in hash tables, minus 1
+ *	@log:		log2(number of slots in hash table)
+ *	@protocol:	layer 4 protocol of the stored sockets
  */
 struct udp_table {
 	struct udp_hslot	*hash;
 	struct udp_hslot	*hash2;
 	unsigned int		mask;
 	unsigned int		log;
+	int			protocol;
 };
 extern struct udp_table udp_table;
 void udp_table_init(struct udp_table *, const char *);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 32564b350823..ce96b1746ddf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,7 +113,7 @@
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
 
-struct udp_table udp_table __read_mostly;
+struct udp_table udp_table __read_mostly = { .protocol = IPPROTO_UDP };
 EXPORT_SYMBOL(udp_table);
 
 long sysctl_udp_mem[3] __read_mostly;
@@ -2145,8 +2145,7 @@ EXPORT_SYMBOL(udp_sk_rx_dst_set);
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
-				    struct udp_table *udptable,
-				    int proto)
+				    struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	unsigned short hnum = ntohs(uh->dest);
@@ -2202,7 +2201,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				proto == IPPROTO_UDPLITE);
+				udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -2279,8 +2278,7 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
  *	All we need to do is get the socket, and then do a checksum.
  */
 
-int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	struct sock *sk;
 	struct udphdr *uh;
@@ -2288,6 +2286,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	struct rtable *rt = skb_rtable(skb);
 	__be32 saddr, daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 	bool refcounted;
 
 	/*
@@ -2330,7 +2329,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 		return __udp4_lib_mcast_deliver(net, skb, uh,
-						saddr, daddr, udptable, proto);
+						saddr, daddr, udptable);
 
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	if (sk)
@@ -2504,7 +2503,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
 
 int udp_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp4_lib_rcv(skb, &udp_table);
 }
 
 void udp_destroy_sock(struct sock *sk)
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index 6b2fa77eeb1c..7013535f9084 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -6,7 +6,7 @@
 #include <net/protocol.h>
 #include <net/inet_common.h>
 
-int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp4_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp4_lib_err(struct sk_buff *, u32, struct udp_table *);
 
 int udp_v4_get_port(struct sock *sk, unsigned short snum);
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 5936d66d1ce2..4e4e85de95b2 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -14,12 +14,12 @@
 #include <linux/proc_fs.h>
 #include "udp_impl.h"
 
-struct udp_table 	udplite_table __read_mostly;
+struct udp_table udplite_table __read_mostly = { .protocol = IPPROTO_UDPLITE };
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp4_lib_rcv(skb, &udplite_table);
 }
 
 static int udplite_err(struct sk_buff *skb, u32 info)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7d4151747340..f7866fded418 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -741,7 +741,7 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
  */
 static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		struct udp_table *udptable, int proto)
+		struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	const struct udphdr *uh = udp_hdr(skb);
@@ -803,7 +803,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				 proto == IPPROTO_UDPLITE);
+				 udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -836,11 +836,11 @@ static int udp6_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 	return 0;
 }
 
-int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	const struct in6_addr *saddr, *daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 	struct udphdr *uh;
 	struct sock *sk;
 	bool refcounted;
@@ -905,7 +905,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	if (ipv6_addr_is_multicast(daddr))
 		return __udp6_lib_mcast_deliver(net, skb,
-				saddr, daddr, udptable, proto);
+				saddr, daddr, udptable);
 
 	/* Unicast */
 	sk = __udp6_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
@@ -1014,7 +1014,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
 
 INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp6_lib_rcv(skb, &udp_table);
 }
 
 /*
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 20e324b6f358..acd5a942c633 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -8,7 +8,7 @@
 #include <net/inet_common.h>
 #include <net/transp_v6.h>
 
-int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp6_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int,
 		   __be32, struct udp_table *);
 
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index bf7a7acd39b1..f442ed595e6f 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -14,7 +14,7 @@
 
 static int udplitev6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp6_lib_rcv(skb, &udplite_table);
 }
 
 static int udplitev6_err(struct sk_buff *skb,
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 08/17] udp: Store layer 4 protocol in udp_table
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Because UDP and UDP-Lite share code, we pass the L4 protocol identifier
alongside the UDP socket table to functions which need to distinguish
between the two protocols.

Put the protocol identifier in the UDP table itself, so that the protocol
is known to any function in the call chain that operates on the socket table.

Subsequent patches make use of the new udp_table field at socket lookup
time to ensure that the BPF program selects only sockets with a matching
protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/udp.h   | 10 ++++++----
 net/ipv4/udp.c      | 15 +++++++--------
 net/ipv4/udp_impl.h |  2 +-
 net/ipv4/udplite.c  |  4 ++--
 net/ipv6/udp.c      | 12 ++++++------
 net/ipv6/udp_impl.h |  2 +-
 net/ipv6/udplite.c  |  2 +-
 7 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index a8fa6c0c6ded..f81c46c71fee 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -63,16 +63,18 @@ struct udp_hslot {
 /**
  *	struct udp_table - UDP table
  *
- *	@hash:	hash table, sockets are hashed on (local port)
- *	@hash2:	hash table, sockets are hashed on (local port, local address)
- *	@mask:	number of slots in hash tables, minus 1
- *	@log:	log2(number of slots in hash table)
+ *	@hash:		hash table, sockets are hashed on (local port)
+ *	@hash2:		hash table, sockets are hashed on local (port, address)
+ *	@mask:		number of slots in hash tables, minus 1
+ *	@log:		log2(number of slots in hash table)
+ *	@protocol:	layer 4 protocol of the stored sockets
  */
 struct udp_table {
 	struct udp_hslot	*hash;
 	struct udp_hslot	*hash2;
 	unsigned int		mask;
 	unsigned int		log;
+	int			protocol;
 };
 extern struct udp_table udp_table;
 void udp_table_init(struct udp_table *, const char *);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 32564b350823..ce96b1746ddf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,7 +113,7 @@
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
 
-struct udp_table udp_table __read_mostly;
+struct udp_table udp_table __read_mostly = { .protocol = IPPROTO_UDP };
 EXPORT_SYMBOL(udp_table);
 
 long sysctl_udp_mem[3] __read_mostly;
@@ -2145,8 +2145,7 @@ EXPORT_SYMBOL(udp_sk_rx_dst_set);
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
-				    struct udp_table *udptable,
-				    int proto)
+				    struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	unsigned short hnum = ntohs(uh->dest);
@@ -2202,7 +2201,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				proto == IPPROTO_UDPLITE);
+				udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -2279,8 +2278,7 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
  *	All we need to do is get the socket, and then do a checksum.
  */
 
-int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	struct sock *sk;
 	struct udphdr *uh;
@@ -2288,6 +2286,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	struct rtable *rt = skb_rtable(skb);
 	__be32 saddr, daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 	bool refcounted;
 
 	/*
@@ -2330,7 +2329,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 		return __udp4_lib_mcast_deliver(net, skb, uh,
-						saddr, daddr, udptable, proto);
+						saddr, daddr, udptable);
 
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	if (sk)
@@ -2504,7 +2503,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
 
 int udp_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp4_lib_rcv(skb, &udp_table);
 }
 
 void udp_destroy_sock(struct sock *sk)
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index 6b2fa77eeb1c..7013535f9084 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -6,7 +6,7 @@
 #include <net/protocol.h>
 #include <net/inet_common.h>
 
-int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp4_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp4_lib_err(struct sk_buff *, u32, struct udp_table *);
 
 int udp_v4_get_port(struct sock *sk, unsigned short snum);
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 5936d66d1ce2..4e4e85de95b2 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -14,12 +14,12 @@
 #include <linux/proc_fs.h>
 #include "udp_impl.h"
 
-struct udp_table 	udplite_table __read_mostly;
+struct udp_table udplite_table __read_mostly = { .protocol = IPPROTO_UDPLITE };
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp4_lib_rcv(skb, &udplite_table);
 }
 
 static int udplite_err(struct sk_buff *skb, u32 info)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7d4151747340..f7866fded418 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -741,7 +741,7 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
  */
 static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		struct udp_table *udptable, int proto)
+		struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	const struct udphdr *uh = udp_hdr(skb);
@@ -803,7 +803,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				 proto == IPPROTO_UDPLITE);
+				 udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -836,11 +836,11 @@ static int udp6_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 	return 0;
 }
 
-int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	const struct in6_addr *saddr, *daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 	struct udphdr *uh;
 	struct sock *sk;
 	bool refcounted;
@@ -905,7 +905,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	if (ipv6_addr_is_multicast(daddr))
 		return __udp6_lib_mcast_deliver(net, skb,
-				saddr, daddr, udptable, proto);
+				saddr, daddr, udptable);
 
 	/* Unicast */
 	sk = __udp6_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
@@ -1014,7 +1014,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
 
 INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp6_lib_rcv(skb, &udp_table);
 }
 
 /*
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 20e324b6f358..acd5a942c633 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -8,7 +8,7 @@
 #include <net/inet_common.h>
 #include <net/transp_v6.h>
 
-int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp6_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int,
 		   __be32, struct udp_table *);
 
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index bf7a7acd39b1..f442ed595e6f 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -14,7 +14,7 @@
 
 static int udplitev6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp6_lib_rcv(skb, &udplite_table);
 }
 
 static int udplitev6_err(struct sk_buff *skb,
-- 
2.25.3

* [PATCH bpf-next 09/17] udp: Extract helper for selecting socket from reuseport group
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Prepare for calling into reuseport from __udp4_lib_lookup as well.
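
The extracted lookup_reuseport() keeps the selection logic unchanged; the
follow-up patch then calls it from the slow path as
"reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum)"
and falls back to the scored candidate when the helper returns NULL.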

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ce96b1746ddf..d4842f29294a 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -405,6 +405,25 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
 			      udp_ehash_secret + net_hash_mix(net));
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 hash;
+
+	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
+		hash = udp_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, hash, skb,
+						 sizeof(struct udphdr));
+		/* Fall back to scoring if group has connections */
+		if (reuseport_has_conns(sk, false))
+			return NULL;
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 saddr, __be16 sport,
@@ -415,7 +434,6 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
-	u32 hash = 0;
 
 	result = NULL;
 	badness = 0;
@@ -423,15 +441,11 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport &&
-			    sk->sk_state != TCP_ESTABLISHED) {
-				hash = udp_ehashfn(net, daddr, hnum,
-						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			badness = score;
 			result = sk;
 		}
-- 
2.25.3


* [PATCH bpf-next 10/17] udp: Run SK_LOOKUP BPF program on socket lookup
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Marek Majkowski, Lorenz Bauer

Following the INET/TCP socket lookup changes, modify UDP socket lookup to
let a BPF program select a receiving socket before searching for a socket
by destination address and port as usual.

Lookup of connected sockets that match the packet 4-tuple is unaffected by
this change. The BPF program runs, and potentially overrides the lookup
result, only if a 4-tuple match was not found.
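
To make the hook concrete, here is a minimal sketch of a steering program.
It is not part of this series: the map name, the key, and the reference
handling are assumptions, and the return codes follow the BPF_OK /
BPF_REDIRECT semantics described in the cover letter.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("sk_lookup")
int steer_udp(struct bpf_sk_lookup *ctx)
{
	struct bpf_sock *sk;
	__u32 key = 0;
	long err;

	sk = bpf_map_lookup_elem(&redir_map, &key);
	if (!sk)
		return BPF_OK;	/* no socket in map, continue lookup as usual */

	err = bpf_sk_assign(ctx, sk, 0);
	bpf_sk_release(sk);	/* assumed: the map lookup took a reference */
	return err ? BPF_OK : BPF_REDIRECT;
}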

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d4842f29294a..18d8432f6551 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -460,7 +460,7 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		__be16 sport, __be32 daddr, __be16 dport, int dif,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
-	struct sock *result;
+	struct sock *result, *sk, *reuse_sk;
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
@@ -469,18 +469,38 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
 				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-		slot2 = hash2 & udptable->mask;
-		hslot2 = &udptable->hash2[slot2];
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
 
-		result = udp4_lib_lookup2(net, saddr, sport,
-					  htonl(INADDR_ANY), hnum, dif, sdif,
-					  hslot2, skb);
+	/* Lookup redirect from BPF */
+	sk = inet_lookup_run_bpf(net, udptable->protocol,
+				 saddr, sport, daddr, hnum);
+	if (IS_ERR(sk))
+		return NULL;
+	if (sk) {
+		reuse_sk = lookup_reuseport(net, sk, skb,
+					    saddr, sport, daddr, hnum);
+		result = reuse_sk ? : sk;
+		goto done;
 	}
+
+	/* Got non-wildcard socket or error on first lookup */
+	if (result)
+		goto done;
+
+	/* Lookup wildcard sockets */
+	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  htonl(INADDR_ANY), hnum, dif, sdif,
+				  hslot2, skb);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.25.3


* [PATCH bpf-next 11/17] udp6: Extract helper for selecting socket from reuseport group
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Prepare for calling into reuseport from __udp6_lib_lookup as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/udp.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f7866fded418..ee2073329d25 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -141,6 +141,27 @@ static int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned int hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 hash;
+
+	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
+		hash = udp6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, hash, skb,
+						 sizeof(struct udphdr));
+		/* Fall back to scoring if group has connections */
+		if (reuseport_has_conns(sk, false))
+			return NULL;
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
@@ -150,7 +171,6 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
-	u32 hash = 0;
 
 	result = NULL;
 	badness = -1;
@@ -158,16 +178,11 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport &&
-			    sk->sk_state != TCP_ESTABLISHED) {
-				hash = udp6_ehashfn(net, daddr, hnum,
-						    saddr, sport);
-
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			badness = score;
 		}
-- 
2.25.3


* [PATCH bpf-next 12/17] udp6: Run SK_LOOKUP BPF program on socket lookup
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Marek Majkowski, Lorenz Bauer

Same as for udp4, let the BPF program override the socket lookup result by
selecting a receiving socket of its choice, or fail the lookup if no
connected UDP socket matched the packet 4-tuple.
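
Failing the lookup can be as simple as returning BPF_DROP. A hypothetical
port-blocking program (the function name and blocked port are made up;
return codes per the cover letter):

#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>

SEC("sk_lookup")
int drop_blocked(struct bpf_sk_lookup *ctx)
{
	/* dst_port is host byte order in the sk_lookup context */
	if (ctx->protocol == IPPROTO_UDP && ctx->dst_port == 19)
		return BPF_DROP;	/* fail the lookup for this packet */
	return BPF_OK;			/* otherwise continue as usual */
}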

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/udp.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index ee2073329d25..934f41a5e6ca 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -197,28 +197,47 @@ struct sock *__udp6_lib_lookup(struct net *net,
 			       int dif, int sdif, struct udp_table *udptable,
 			       struct sk_buff *skb)
 {
+	struct sock *result, *sk, *reuse_sk;
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
-	struct sock *result;
 
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected or non-wildcard sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
 				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-		slot2 = hash2 & udptable->mask;
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
 
-		hslot2 = &udptable->hash2[slot2];
-
-		result = udp6_lib_lookup2(net, saddr, sport,
-					  &in6addr_any, hnum, dif, sdif,
-					  hslot2, skb);
+	/* Lookup redirect from BPF */
+	sk = inet6_lookup_run_bpf(net, udptable->protocol,
+				  saddr, sport, daddr, hnum);
+	if (IS_ERR(sk))
+		return NULL;
+	if (sk) {
+		reuse_sk = lookup_reuseport(net, sk, skb,
+					    saddr, sport, daddr, hnum);
+		result = reuse_sk ? : sk;
+		goto done;
 	}
+
+	/* Got non-wildcard socket or error on first lookup */
+	if (result)
+		goto done;
+
+	/* Lookup wildcard sockets */
+	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  &in6addr_any, hnum, dif, sdif,
+				  hslot2, skb);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.25.3


* [PATCH bpf-next 13/17] bpf: Sync linux/bpf.h to tools/
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski,
	Lorenz Bauer

The newly added program type, context type, and helper are used by tests
in a subsequent patch. Synchronize the header file.
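
For reference, a minimal sketch of reading the context fields defined
below; the program is illustrative only, not from this series. Note the
byte-order asymmetry: src_port is network byte order, dst_port is host
byte order.

#include <linux/bpf.h>
#include <linux/in.h>
#include <sys/socket.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("sk_lookup")
int inspect_ctx(struct bpf_sk_lookup *ctx)
{
	if (ctx->family == AF_INET &&
	    ctx->protocol == IPPROTO_UDP &&
	    ctx->dst_ip4 == bpf_htonl(0xc0000201) &&	/* 192.0.2.1 */
	    ctx->dst_port == 53)			/* host byte order */
		bpf_printk("hit from 0x%x\n", bpf_ntohl(ctx->src_ip4));
	return BPF_OK;	/* don't interfere with the lookup */
}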

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/include/uapi/linux/bpf.h | 53 ++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b3643e27e264..e4c61b63d4bc 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -187,6 +187,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -218,6 +219,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3041,6 +3043,10 @@ union bpf_attr {
  *
  * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3061,6 +3067,39 @@ union bpf_attr {
  *					call from outside of TC ingress.
  *		* **-ESOCKTNOSUPPORT**	Socket type not supported (reuseport).
  *
+ * int bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
+ *		For the operation to succeed, the passed socket must be
+ *		compatible with the packet description provided by the *ctx*
+ *		object.
+ *
+ *		The L4 protocol (*IPPROTO_TCP* or *IPPROTO_UDP*) must be an
+ *		exact match, while the IP family (*AF_INET* or *AF_INET6*)
+ *		must be compatible; that is, IPv6 sockets that are not
+ *		v6-only can be selected for IPv4 packets.
+ *
+ *		Only full sockets can be selected. However, there is no need
+ *		to call bpf_sk_fullsock() before passing a socket as an
+ *		argument to this helper.
+ *
+ *		The *flags* argument must be zero.
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		**-EAFNOSUPPORT** if socket family (*sk->family*) is not
+ *		compatible with packet family (*ctx->family*).
+ *
+ *		**-EINVAL** if unsupported flags were specified.
+ *
+ *		**-EPROTOTYPE** if socket L4 protocol (*sk->protocol*) doesn't
+ *		match packet protocol (*ctx->protocol*).
+ *
+ *		**-ESOCKTNOSUPPORT** if socket is not a full socket.
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPPROTO_TCP, IPPROTO_UDP */
+	/* IP addresses allow 1, 2, and 4-byte access */
+	__u32 src_ip4;
+	__u32 src_ip6[4];
+	__u32 src_port;		/* network byte order */
+	__u32 dst_ip4;
+	__u32 dst_ip6[4];
+	__u32 dst_port;		/* host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.25.3


* [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Make libbpf aware of the newly added program type, and assign it a
section name.
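
With the section mapping in place, a program placed under SEC("sk_lookup")
gets the right type when the object is opened. A hedged usage sketch
(error handling abbreviated; the function name and object path are made
up):

#include <bpf/libbpf.h>

int load_sk_lookup_obj(const char *path)
{
	struct bpf_program *prog;
	struct bpf_object *obj;

	obj = bpf_object__open(path);
	if (libbpf_get_error(obj))
		return -1;

	bpf_object__for_each_program(prog, obj) {
		/* Force the type when the section name is unconventional */
		if (!bpf_program__is_sk_lookup(prog))
			bpf_program__set_sk_lookup(prog);
	}
	return bpf_object__load(obj);
}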

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/lib/bpf/libbpf.c        | 3 +++
 tools/lib/bpf/libbpf.h        | 2 ++
 tools/lib/bpf/libbpf.map      | 2 ++
 tools/lib/bpf/libbpf_probes.c | 1 +
 4 files changed, 8 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 977add1b73e2..74f4a15dc19e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6524,6 +6524,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
 BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
 BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
+BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
 
 enum bpf_attach_type
 bpf_program__get_expected_attach_type(struct bpf_program *prog)
@@ -6684,6 +6685,8 @@ static const struct bpf_sec_def section_defs[] = {
 	BPF_EAPROG_SEC("cgroup/setsockopt",	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 						BPF_CGROUP_SETSOCKOPT),
 	BPF_PROG_SEC("struct_ops",		BPF_PROG_TYPE_STRUCT_OPS),
+	BPF_EAPROG_SEC("sk_lookup",		BPF_PROG_TYPE_SK_LOOKUP,
+						BPF_SK_LOOKUP),
 };
 
 #undef BPF_PROG_SEC_IMPL
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f1dacecb1619..8373fbacbba3 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -337,6 +337,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
+LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
 
 LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
 LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
@@ -364,6 +365,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
+LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);
 
 /*
  * No need for __attribute__((packed)), all members of 'bpf_map_def'
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index e03bd4db827e..113ac0a669c2 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -253,6 +253,8 @@ LIBBPF_0.0.8 {
 		bpf_program__set_attach_target;
 		bpf_program__set_lsm;
 		bpf_set_link_xdp_fd_opts;
+		bpf_program__is_sk_lookup;
+		bpf_program__set_sk_lookup;
 } LIBBPF_0.0.7;
 
 LIBBPF_0.0.9 {
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 2c92059c0c90..5c6d3e49f254 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -109,6 +109,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_STRUCT_OPS:
 	case BPF_PROG_TYPE_EXT:
 	case BPF_PROG_TYPE_LSM:
+	case BPF_PROG_TYPE_SK_LOOKUP:
 	default:
 		break;
 	}
-- 
2.25.3


* [PATCH bpf-next 15/17] selftests/bpf: Add verifier tests for bpf_sk_lookup context access
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Exercise verifier access checks for bpf_sk_lookup context fields.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/verifier/ctx_sk_lookup.c    | 696 ++++++++++++++++++
 1 file changed, 696 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

diff --git a/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
new file mode 100644
index 000000000000..167cc3da6502
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
@@ -0,0 +1,696 @@
+{
+	"valid 1,2,4-byte read bpf_sk_lookup src_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup src_ip4",
+	.insns = {
+		/* 8-byte read */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 8-byte write */
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 4-byte write */
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 2-byte write */
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 1-byte write */
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_sk_lookup dst_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4) + 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_sk_lookup src_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup,
+				     src_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup,
+				     src_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_sk_lookup dst_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 15/17] selftests/bpf: Add verifier tests for bpf_sk_lookup context access
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Exercise verifier access checks for bpf_sk_lookup context fields.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/verifier/ctx_sk_lookup.c    | 696 ++++++++++++++++++
 1 file changed, 696 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

diff --git a/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
new file mode 100644
index 000000000000..167cc3da6502
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
@@ -0,0 +1,696 @@
+{
+	"valid 1,2,4-byte read bpf_sk_lookup src_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup src_ip4",
+	.insns = {
+		/* 8-byte read */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 8-byte write */
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 4-byte write */
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 2-byte write */
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup src_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 1-byte write */
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_sk_lookup dst_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4) + 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup dst_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_sk_lookup src_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup,
+				     src_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup,
+				     src_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup src_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_sk_lookup dst_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup dst_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup src_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup src_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, src_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup dst_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, dst_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_sk_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_sk_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+},
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 16/17] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Name the BPF C file after the test case that uses it.

This frees up "test_sk_lookup" namespace for BPF sk_lookup program tests
introduced by the following patch.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/reference_tracking.c     | 2 +-
 .../bpf/progs/{test_sk_lookup_kern.c => test_ref_track_kern.c}  | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename tools/testing/selftests/bpf/progs/{test_sk_lookup_kern.c => test_ref_track_kern.c} (100%)

diff --git a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
index fc0d7f4f02cf..106ca8bb2a8f 100644
--- a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
+++ b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
@@ -3,7 +3,7 @@
 
 void test_reference_tracking(void)
 {
-	const char *file = "test_sk_lookup_kern.o";
+	const char *file = "test_ref_track_kern.o";
 	const char *obj_name = "ref_track";
 	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts,
 		.object_name = obj_name,
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/progs/test_ref_track_kern.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
rename to tools/testing/selftests/bpf/progs/test_ref_track_kern.c
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 16/17] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Name the BPF C file after the test case that uses it.

This frees up "test_sk_lookup" namespace for BPF sk_lookup program tests
introduced by the following patch.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/reference_tracking.c     | 2 +-
 .../bpf/progs/{test_sk_lookup_kern.c => test_ref_track_kern.c}  | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename tools/testing/selftests/bpf/progs/{test_sk_lookup_kern.c => test_ref_track_kern.c} (100%)

diff --git a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
index fc0d7f4f02cf..106ca8bb2a8f 100644
--- a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
+++ b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
@@ -3,7 +3,7 @@
 
 void test_reference_tracking(void)
 {
-	const char *file = "test_sk_lookup_kern.o";
+	const char *file = "test_ref_track_kern.o";
 	const char *obj_name = "ref_track";
 	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts,
 		.object_name = obj_name,
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/progs/test_ref_track_kern.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
rename to tools/testing/selftests/bpf/progs/test_ref_track_kern.c
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 17/17] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: netdev, bpf
  Cc: dccp, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Eric Dumazet, Gerrit Renker, Jakub Kicinski

Add tests to test_progs that exercise:

 - attaching/detaching/querying sk_lookup program,
 - overriding socket lookup result for TCP/UDP with BPF sk_lookup by
   a) selecting a socket fetched from a SOCKMAP, or
   b) failing the lookup with no match.

Tests cover two special cases:

 - selecting an IPv6 socket (non v6-only) to receive an IPv4 packet,
 - using BPF sk_lookup together with BPF sk_reuseport program.
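
For reference, a minimal sketch of the attach/query/detach flow the
first group of tests exercises (assuming an sk_lookup program from the
test skeleton and net_fd open on /proc/self/ns/net; the target fd is -1
because the program attaches to the caller's network namespace):

	int prog_fd = bpf_program__fd(skel->progs.lookup_pass);
	__u32 prog_ids[1] = { 0 }, prog_cnt = 1, attach_flags = 0;
	int err;

	err = bpf_prog_attach(prog_fd, -1 /* target fd */, BPF_SK_LOOKUP, 0);
	if (!err)
		err = bpf_prog_query(net_fd, BPF_SK_LOOKUP, 0, &attach_flags,
				     prog_ids, &prog_cnt);
	if (!err)
		err = bpf_prog_detach2(prog_fd, -1, BPF_SK_LOOKUP);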

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/prog_tests/sk_lookup.c      | 999 ++++++++++++++++++
 .../selftests/bpf/progs/test_sk_lookup_kern.c | 162 +++
 2 files changed, 1161 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sk_lookup.c b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
new file mode 100644
index 000000000000..96765b156f6f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
@@ -0,0 +1,999 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+/*
+ * Test BPF attach point for INET socket lookup (BPF_SK_LOOKUP).
+ *
+ * Tests exercise:
+ *
+ * 1. attaching/detaching/querying BPF sk_lookup program,
+ * 2. overriding socket lookup result by:
+ *    a) selecting a listening (TCP) or receiving (UDP) socket,
+ *    b) failing the lookup with no match.
+ *
+ * Special cases covered are:
+ * - selecting an IPv6 socket (non v6-only) to receive an IPv4 packet,
+ * - using BPF sk_lookup together with BPF sk_reuseport program.
+ *
+ * Tests run in a dedicated network namespace.
+ */
+
+#define _GNU_SOURCE
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+#include "test_sk_lookup_kern.skel.h"
+#include "test_progs.h"
+
+/* External (address, port) pairs the client sends packets to. */
+#define EXT_IP4		"127.0.0.1"
+#define EXT_IP6		"fd00::1"
+#define EXT_PORT	7007
+
+/* Internal (address, port) pairs the server listens/receives at. */
+#define INT_IP4		"127.0.0.2"
+#define INT_IP4_V6	"::ffff:127.0.0.2"
+#define INT_IP6		"fd00::2"
+#define INT_PORT	8008
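+
+/* Lookup programs under test redirect packets arriving at an external
+ * (address, port) pair so that a server bound to an internal pair
+ * receives them.
+ */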
+
+#define IO_TIMEOUT_SEC	3
+
+enum {
+	SERVER_A = 0,
+	SERVER_B = 1,
+	MAX_SERVERS,
+};
+
+struct inet_addr {
+	const char *ip;
+	unsigned short port;
+};
+
+struct test {
+	const char *desc;
+	struct bpf_program *lookup_prog;
+	struct bpf_program *reuseport_prog;
+	struct bpf_map *sock_map;
+	int sotype;
+	struct inet_addr send_to;
+	struct inet_addr recv_at;
+};
+
+static bool is_ipv6(const char *ip)
+{
+	return !!strchr(ip, ':');
+}
+
+static int make_addr(const char *ip, int port, struct sockaddr_storage *addr)
+{
+	struct sockaddr_in6 *addr6 = (void *)addr;
+	struct sockaddr_in *addr4 = (void *)addr;
+	int ret;
+
+	errno = 0;
+	if (is_ipv6(ip)) {
+		ret = inet_pton(AF_INET6, ip, &addr6->sin6_addr);
+		if (CHECK_FAIL(ret <= 0)) {
+			log_err("failed to convert IPv6 address '%s'", ip);
+			return -1;
+		}
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+	} else {
+		ret = inet_pton(AF_INET, ip, &addr4->sin_addr);
+		if (CHECK_FAIL(ret <= 0)) {
+			log_err("failed to convert IPv4 address '%s'", ip);
+			return -1;
+		}
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+	}
+	return 0;
+}
+
+static int setup_reuseport_prog(int sock_fd, struct bpf_program *reuseport_prog)
+{
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(reuseport_prog);
+	if (prog_fd < 0) {
+		errno = -prog_fd;
+		log_err("failed to get fd for program '%s'",
+			bpf_program__name(reuseport_prog));
+		return -1;
+	}
+
+	err = setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
+			 &prog_fd, sizeof(prog_fd));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to ATTACH_REUSEPORT_EBPF");
+		return -1;
+	}
+
+	return 0;
+}
+
+static socklen_t inetaddr_len(const struct sockaddr_storage *addr)
+{
+	return (addr->ss_family == AF_INET ? sizeof(struct sockaddr_in) :
+		addr->ss_family == AF_INET6 ? sizeof(struct sockaddr_in6) : 0);
+}
+
+static int make_socket_with_addr(int sotype, const char *ip, int port,
+				 struct sockaddr_storage *addr)
+{
+	struct timeval timeo = { .tv_sec = IO_TIMEOUT_SEC };
+	int err, fd;
+
+	err = make_addr(ip, port, addr);
+	if (err)
+		return -1;
+
+	fd = socket(addr->ss_family, sotype, 0);
+	if (CHECK_FAIL(fd < 0)) {
+		log_err("failed to create listen socket");
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeo, sizeof(timeo));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to set SO_SNDTIMEO");
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeo, sizeof(timeo));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to set SO_RCVTIMEO");
+		return -1;
+	}
+
+	return fd;
+}
+
+static int make_server(int sotype, const char *ip, int port,
+		       struct bpf_program *reuseport_prog)
+{
+	struct sockaddr_storage addr = {0};
+	const int one = 1;
+	int err, fd = -1;
+
+	fd = make_socket_with_addr(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	/* Enabled for UDPv6 sockets for IPv4-mapped IPv6 to work. */
+	if (sotype == SOCK_DGRAM) {
+		err = setsockopt(fd, SOL_IP, IP_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable IP_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_DGRAM && addr.ss_family == AF_INET6) {
+		err = setsockopt(fd, SOL_IPV6, IPV6_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable IPV6_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable SO_REUSEADDR");
+			goto fail;
+		}
+	}
+
+	if (reuseport_prog) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable SO_REUSEPORT");
+			goto fail;
+		}
+	}
+
+	err = bind(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to bind listen socket");
+		goto fail;
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = listen(fd, SOMAXCONN);
+		if (CHECK_FAIL(err)) {
+			log_err("failed to listen on port %d", port);
+			goto fail;
+		}
+	}
+
+	/* Late attach reuseport prog so we can have one init path */
+	if (reuseport_prog) {
+		err = setup_reuseport_prog(fd, reuseport_prog);
+		if (err)
+			goto fail;
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int make_client(int sotype, const char *ip, int port)
+{
+	struct sockaddr_storage addr = {0};
+	int err, fd;
+
+	fd = make_socket_with_addr(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	err = connect(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to connect client socket");
+		goto fail;
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int send_byte(int fd)
+{
+	ssize_t n;
+
+	errno = 0;
+	n = send(fd, "a", 1, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial send");
+		return -1;
+	}
+	return 0;
+}
+
+static int recv_byte(int fd)
+{
+	char buf[1];
+	ssize_t n;
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial recv");
+		return -1;
+	}
+	return 0;
+}
+
+static int tcp_recv_send(int server_fd)
+{
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	fd = accept(server_fd, NULL, NULL);
+	if (CHECK_FAIL(fd < 0)) {
+		log_err("failed to accept");
+		return -1;
+	}
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial recv");
+		ret = -1;
+		goto close;
+	}
+
+	n = send(fd, buf, n, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial send");
+		ret = -1;
+		goto close;
+	}
+
+	ret = 0;
+close:
+	close(fd);
+	return ret;
+}
+
+static void v4_to_v6(struct sockaddr_storage *ss)
+{
+	struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;
+	struct sockaddr_in v4 = *(struct sockaddr_in *)ss;
+
+	v6->sin6_family = AF_INET6;
+	v6->sin6_port = v4.sin_port;
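+	/* 0xff in bytes 10-11 forms the ::ffff:0:0/96 prefix that
+	 * marks an IPv4-mapped IPv6 address */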
+	v6->sin6_addr.s6_addr[10] = 0xff;
+	v6->sin6_addr.s6_addr[11] = 0xff;
+	memcpy(&v6->sin6_addr.s6_addr[12], &v4.sin_addr.s_addr, 4);
+}
+
+static int udp_recv_send(int server_fd)
+{
+	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
+	struct sockaddr_storage _src_addr = { 0 };
+	struct sockaddr_storage *src_addr = &_src_addr;
+	struct sockaddr_storage *dst_addr = NULL;
+	struct msghdr msg = { 0 };
+	struct iovec iov = { 0 };
+	struct cmsghdr *cm;
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	iov.iov_base = buf;
+	iov.iov_len = sizeof(buf);
+
+	msg.msg_name = src_addr;
+	msg.msg_namelen = sizeof(*src_addr);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	errno = 0;
+	n = recvmsg(server_fd, &msg, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed to receive");
+		return -1;
+	}
+	if (CHECK_FAIL(msg.msg_flags & MSG_CTRUNC)) {
+		log_err("truncated cmsg");
+		return -1;
+	}
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if ((cm->cmsg_level == SOL_IP &&
+		     cm->cmsg_type == IP_ORIGDSTADDR) ||
+		    (cm->cmsg_level == SOL_IPV6 &&
+		     cm->cmsg_type == IPV6_ORIGDSTADDR)) {
+			dst_addr = (struct sockaddr_storage *)CMSG_DATA(cm);
+			break;
+		}
+		log_err("warning: ignored cmsg at level %d type %d",
+			cm->cmsg_level, cm->cmsg_type);
+	}
+	if (CHECK_FAIL(!dst_addr)) {
+		log_err("failed to get destination address");
+		return -1;
+	}
+
+	/* Server socket bound to IPv4-mapped IPv6 address */
+	if (src_addr->ss_family == AF_INET6 &&
+	    dst_addr->ss_family == AF_INET) {
+		v4_to_v6(dst_addr);
+	}
+
+	/* Reply from original destination address. */
+	fd = socket(dst_addr->ss_family, SOCK_DGRAM, 0);
+	if (CHECK_FAIL(fd < 0)) {
+		log_err("failed to create tx socket");
+		return -1;
+	}
+
+	ret = bind(fd, (struct sockaddr *)dst_addr, sizeof(*dst_addr));
+	if (CHECK_FAIL(ret)) {
+		log_err("failed to bind tx socket");
+		goto out;
+	}
+
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	n = sendmsg(fd, &msg, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed to send echo reply");
+		ret = -1;
+		goto out;
+	}
+
+	ret = 0;
+out:
+	close(fd);
+	return ret;
+}
+
+static int tcp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = tcp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static int udp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = udp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static int attach_lookup_prog(struct bpf_program *prog)
+{
+	const char *prog_name = bpf_program__name(prog);
+	enum bpf_attach_type attach_type;
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (CHECK_FAIL(prog_fd < 0)) {
+		errno = -prog_fd;
+		log_err("failed to get fd for program '%s'", prog_name);
+		return -1;
+	}
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	err = bpf_prog_attach(prog_fd, -1 /* target fd */, attach_type, 0);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to attach program '%s'", prog_name);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int detach_lookup_prog(struct bpf_program *prog)
+{
+	const char *prog_name = bpf_program__name(prog);
+	enum bpf_attach_type attach_type;
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (CHECK_FAIL(prog_fd < 0)) {
+		errno = -prog_fd;
+		log_err("failed to get fd for program '%s'", prog_name);
+		return -1;
+	}
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	err = bpf_prog_detach2(prog_fd, -1 /* attachable fd */, attach_type);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to detach program '%s'", prog_name);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int update_lookup_map(struct bpf_map *map, int index, int sock_fd)
+{
+	int err, map_fd;
+	uint64_t value;
+
+	map_fd = bpf_map__fd(map);
+	if (CHECK_FAIL(map_fd < 0)) {
+		errno = -map_fd;
+		log_err("failed to get map FD");
+		return -1;
+	}
+
+	value = (uint64_t)sock_fd;
+	err = bpf_map_update_elem(map_fd, &index, &value, BPF_NOEXIST);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to update redir_map @ %d", index);
+		return -1;
+	}
+
+	return 0;
+}
+
+static void query_lookup_prog(struct test_sk_lookup_kern *skel)
+{
+	struct bpf_program *lookup_prog = skel->progs.lookup_pass;
+	enum bpf_attach_type attach_type;
+	__u32 attach_flags = 0;
+	__u32 prog_ids[1] = { 0 };
+	__u32 prog_cnt = 1;
+	int net_fd = -1;
+	int err;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(net_fd < 0)) {
+		log_err("failed to open /proc/self/ns/net");
+		return;
+	}
+
+	err = attach_lookup_prog(lookup_prog);
+	if (err)
+		goto close;
+
+	attach_type = bpf_program__get_expected_attach_type(lookup_prog);
+	err = bpf_prog_query(net_fd, attach_type, 0 /* query flags */,
+			     &attach_flags, prog_ids, &prog_cnt);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to query lookup prog");
+		goto detach;
+	}
+
+	errno = 0;
+	if (CHECK_FAIL(attach_flags != 0)) {
+		log_err("wrong attach_flags on query: %u", attach_flags);
+		goto detach;
+	}
+	if (CHECK_FAIL(prog_cnt != 1)) {
+		log_err("wrong program count on query: %u", prog_cnt);
+		goto detach;
+	}
+	if (CHECK_FAIL(prog_ids[0] == 0)) {
+		log_err("invalid program id on query: %u", prog_ids[0]);
+		goto detach;
+	}
+
+detach:
+	detach_lookup_prog(lookup_prog);
+close:
+	close(net_fd);
+}
+
+static void run_lookup_prog(const struct test *t)
+{
+	int client_fd, server_fds[MAX_SERVERS] = { [0 ... MAX_SERVERS - 1] = -1 };
+	int i, err, server_idx;
+
+	err = attach_lookup_prog(t->lookup_prog);
+	if (err)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		server_fds[i] = make_server(t->sotype, t->recv_at.ip,
+					    t->recv_at.port, t->reuseport_prog);
+		if (server_fds[i] < 0)
+			goto close;
+
+		err = update_lookup_map(t->sock_map, i, server_fds[i]);
+		if (err)
+			goto close;
+
+		/* want just one server for non-reuseport test */
+		if (!t->reuseport_prog)
+			break;
+	}
+
+	client_fd = make_client(t->sotype, t->send_to.ip, t->send_to.port);
+	if (client_fd < 0)
+		goto close;
+
+	/* reuseport prog always selects server B */
+	server_idx = t->reuseport_prog ? SERVER_B : SERVER_A;
+
+	if (t->sotype == SOCK_STREAM)
+		tcp_echo_test(client_fd, server_fds[server_idx]);
+	else
+		udp_echo_test(client_fd, server_fds[server_idx]);
+
+	close(client_fd);
+close:
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++)
+		close(server_fds[i]);
+detach:
+	detach_lookup_prog(t->lookup_prog);
+}
+
+static void test_override_lookup(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+		{
+			.desc		= "TCP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4_V6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+		{
+			.desc		= "UDP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+		{
+			.desc		= "UDP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4_V6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			run_lookup_prog(t);
+	}
+}
+
+static void drop_on_lookup(const struct test *t)
+{
+	struct sockaddr_storage dst = { 0 };
+	int client_fd, server_fd, err;
+	ssize_t n;
+
+	if (attach_lookup_prog(t->lookup_prog))
+		return;
+
+	server_fd = make_server(t->sotype, t->recv_at.ip, t->recv_at.port,
+				t->reuseport_prog);
+	if (server_fd < 0)
+		goto detach;
+
+	client_fd = make_socket_with_addr(t->sotype, t->send_to.ip,
+					  t->send_to.port, &dst);
+	if (client_fd < 0)
+		goto close_srv;
+
+	err = connect(client_fd, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client_fd);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client_fd, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK_FAIL(!err || errno != ECONNREFUSED))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client_fd);
+close_srv:
+	close(server_fd);
+detach:
+	detach_lookup_prog(t->lookup_prog);
+}
+
+static void test_drop_on_lookup(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, INT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_lookup(t);
+	}
+}
+
+static void drop_on_reuseport(const struct test *t)
+{
+	struct sockaddr_storage dst = { 0 };
+	int client, server1, server2, err;
+	ssize_t n;
+
+	if (attach_lookup_prog(t->lookup_prog))
+		return;
+
+	server1 = make_server(t->sotype, t->recv_at.ip, t->recv_at.port,
+			      t->reuseport_prog);
+	if (server1 < 0)
+		goto detach;
+
+	err = update_lookup_map(t->sock_map, SERVER_A, server1);
+	if (err)
+		goto detach;
+
+	/* second server on the destination address; we should never reach it */
+	server2 = make_server(t->sotype, t->send_to.ip, t->send_to.port,
+			      NULL /* reuseport prog */);
+	if (server2 < 0)
+		goto close_srv1;
+
+	client = make_socket_with_addr(t->sotype, t->send_to.ip,
+				       t->send_to.port, &dst);
+	if (client < 0)
+		goto close_srv2;
+
+	err = connect(client, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK_FAIL(!err || errno != ECONNREFUSED))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client);
+close_srv2:
+	close(server2);
+close_srv1:
+	close(server1);
+detach:
+	detach_lookup_prog(t->lookup_prog);
+}
+
+static void test_drop_on_reuseport(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_reuseport(t);
+	}
+}
+
+static void run_tests(struct test_sk_lookup_kern *skel)
+{
+	if (test__start_subtest("query lookup prog"))
+		query_lookup_prog(skel);
+	test_override_lookup(skel);
+	test_drop_on_lookup(skel);
+	test_drop_on_reuseport(skel);
+}
+
+static int switch_netns(int *saved_net)
+{
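+	/* Loopback covers all of 127.0.0.0/8 by default; only the IPv6
+	 * test addresses need to be added explicitly. */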
+	static const char * const setup_script[] = {
+		"ip -6 addr add dev lo " EXT_IP6 "/128 nodad",
+		"ip -6 addr add dev lo " INT_IP6 "/128 nodad",
+		"ip link set dev lo up",
+		NULL,
+	};
+	const char * const *cmd;
+	int net_fd, err;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(net_fd < 0)) {
+		log_err("open(/proc/self/ns/net)");
+		return -1;
+	}
+
+	err = unshare(CLONE_NEWNET);
+	if (CHECK_FAIL(err)) {
+		log_err("unshare(CLONE_NEWNET)");
+		goto close;
+	}
+
+	for (cmd = setup_script; *cmd; cmd++) {
+		err = system(*cmd);
+		if (CHECK_FAIL(err)) {
+			log_err("system(%s)", *cmd);
+			goto close;
+		}
+	}
+
+	*saved_net = net_fd;
+	return 0;
+
+close:
+	close(net_fd);
+	return -1;
+}
+
+static void restore_netns(int saved_net)
+{
+	int err;
+
+	err = setns(saved_net, CLONE_NEWNET);
+	if (CHECK_FAIL(err))
+		log_err("setns(CLONE_NEWNET)");
+
+	close(saved_net);
+}
+
+void test_sk_lookup(void)
+{
+	struct test_sk_lookup_kern *skel;
+	int err, saved_net;
+
+	err = switch_netns(&saved_net);
+	if (err)
+		return;
+
+	skel = test_sk_lookup_kern__open_and_load();
+	if (CHECK_FAIL(!skel)) {
+		errno = 0;
+		log_err("failed to open and load BPF skeleton");
+		goto restore_netns;
+	}
+
+	run_tests(skel);
+
+	test_sk_lookup_kern__destroy(skel);
+restore_netns:
+	restore_netns(saved_net);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
new file mode 100644
index 000000000000..4ad7c6842487
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+
+#define IP4(a, b, c, d)					\
+	bpf_htonl((((__u32)(a) & 0xffU) << 24) |	\
+		  (((__u32)(b) & 0xffU) << 16) |	\
+		  (((__u32)(c) & 0xffU) <<  8) |	\
+		  (((__u32)(d) & 0xffU) <<  0))
+#define IP6(aaaa, bbbb, cccc, dddd)			\
+	{ bpf_htonl(aaaa), bpf_htonl(bbbb), bpf_htonl(cccc), bpf_htonl(dddd) }
+
+#define MAX_SOCKS 32
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, MAX_SOCKS);
+	__type(key, __u32);
+	__type(value, __u64);
+} redir_map SEC(".maps");
+
+enum {
+	SERVER_A = 0,
+	SERVER_B = 1,
+};
+
+enum {
+	NO_FLAGS = 0,
+};
+
+static const __u32 DST_PORT = 7007;
+static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
+static const __u32 DST_IP6[] = IP6(0xfd000000, 0x0, 0x0, 0x00000001);
+
+SEC("sk_lookup/lookup_pass")
+int lookup_pass(struct bpf_sk_lookup *ctx)
+{
+	return BPF_OK;
+}
+
+SEC("sk_lookup/lookup_drop")
+int lookup_drop(struct bpf_sk_lookup *ctx)
+{
+	return BPF_DROP;
+}
+
+SEC("sk_reuseport/reuse_pass")
+int reuseport_pass(struct sk_reuseport_md *ctx)
+{
+	return SK_PASS;
+}
+
+SEC("sk_reuseport/reuse_drop")
+int reuseport_drop(struct sk_reuseport_md *ctx)
+{
+	return SK_DROP;
+}
+
+/* Redirect packets destined for port DST_PORT to socket at redir_map[0]. */
+SEC("sk_lookup/redir_port")
+int redir_port(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->dst_port != DST_PORT)
+		return BPF_OK;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
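+	/* Select sk as the lookup result; on error fail the lookup
+	 * instead of falling back to the default behavior. */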
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip4")
+int redir_ip4(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET)
+		return BPF_OK;
+	if (ctx->dst_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->dst_ip4 != DST_IP4)
+		return BPF_OK;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+/* Redirect packets destined for DST_IP6 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip6")
+int redir_ip6(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET6)
+		return BPF_OK;
+	if (ctx->dst_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->dst_ip6[0] != DST_IP6[0] ||
+	    ctx->dst_ip6[1] != DST_IP6[1] ||
+	    ctx->dst_ip6[2] != DST_IP6[2] ||
+	    ctx->dst_ip6[3] != DST_IP6[3])
+		return BPF_OK;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+SEC("sk_lookup/select_sock_a")
+int select_sock_a(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+SEC("sk_reuseport/select_sock_b")
+int select_sock_b(struct sk_reuseport_md *ctx)
+{
+	__u32 key = SERVER_B;
+	int err;
+
+	err = bpf_sk_select_reuseport(ctx, &redir_map, &key, NO_FLAGS);
+	return err ? SK_DROP : SK_PASS;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
+__u32 _version SEC("version") = 1;
-- 
2.25.3


^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH bpf-next 17/17] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
@ 2020-05-06 12:55   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 12:55 UTC (permalink / raw)
  To: dccp

Add tests to test_progs that exercise:

 - attaching/detaching/querying sk_lookup program,
 - overriding socket lookup result for TCP/UDP with BPF sk_lookup by
   a) selecting a socket fetched from a SOCKMAP, or
   b) failing the lookup with no match.

Tests cover two special cases:

 - selecting an IPv6 socket (non v6-only) to receive an IPv4 packet,
 - using BPF sk_lookup together with BPF sk_reuseport program.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/prog_tests/sk_lookup.c      | 999 ++++++++++++++++++
 .../selftests/bpf/progs/test_sk_lookup_kern.c | 162 +++
 2 files changed, 1161 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c

diff --git a/tools/testing/selftests/bpf/prog_tests/sk_lookup.c b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
new file mode 100644
index 000000000000..96765b156f6f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
@@ -0,0 +1,999 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+/*
+ * Test BPF attach point for INET socket lookup (BPF_SK_LOOKUP).
+ *
+ * Tests exercise:
+ *
+ * 1. attaching/detaching/querying BPF sk_lookup program,
+ * 2. overriding socket lookup result by:
+ *    a) selecting a listening (TCP) or receiving (UDP) socket,
+ *    b) failing the lookup with no match.
+ *
+ * Special cases covered are:
+ * - selecting an IPv6 socket (non v6-only) to receive an IPv4 packet,
+ * - using BPF sk_lookup together with BPF sk_reuseport program.
+ *
+ * Tests run in a dedicated network namespace.
+ */
+
+#define _GNU_SOURCE
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+#include "test_sk_lookup_kern.skel.h"
+#include "test_progs.h"
+
+/* External (address, port) pairs the client sends packets to. */
+#define EXT_IP4		"127.0.0.1"
+#define EXT_IP6		"fd00::1"
+#define EXT_PORT	7007
+
+/* Internal (address, port) pairs the server listens/receives at. */
+#define INT_IP4		"127.0.0.2"
+#define INT_IP4_V6	"::ffff:127.0.0.2"
+#define INT_IP6		"fd00::2"
+#define INT_PORT	8008
+
+#define IO_TIMEOUT_SEC	3
+
+enum {
+	SERVER_A = 0,
+	SERVER_B = 1,
+	MAX_SERVERS,
+};
+
+struct inet_addr {
+	const char *ip;
+	unsigned short port;
+};
+
+struct test {
+	const char *desc;
+	struct bpf_program *lookup_prog;
+	struct bpf_program *reuseport_prog;
+	struct bpf_map *sock_map;
+	int sotype;
+	struct inet_addr send_to;
+	struct inet_addr recv_at;
+};
+
+static bool is_ipv6(const char *ip)
+{
+	return !!strchr(ip, ':');
+}
+
+static int make_addr(const char *ip, int port, struct sockaddr_storage *addr)
+{
+	struct sockaddr_in6 *addr6 = (void *)addr;
+	struct sockaddr_in *addr4 = (void *)addr;
+	int ret;
+
+	errno = 0;
+	if (is_ipv6(ip)) {
+		ret = inet_pton(AF_INET6, ip, &addr6->sin6_addr);
+		if (CHECK_FAIL(ret <= 0)) {
+			log_err("failed to convert IPv6 address '%s'", ip);
+			return -1;
+		}
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+	} else {
+		ret = inet_pton(AF_INET, ip, &addr4->sin_addr);
+		if (CHECK_FAIL(ret <= 0)) {
+			log_err("failed to convert IPv4 address '%s'", ip);
+			return -1;
+		}
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+	}
+	return 0;
+}
+
+static int setup_reuseport_prog(int sock_fd, struct bpf_program *reuseport_prog)
+{
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(reuseport_prog);
+	if (prog_fd < 0) {
+		errno = -prog_fd;
+		log_err("failed to get fd for program '%s'",
+			bpf_program__name(reuseport_prog));
+		return -1;
+	}
+
+	err = setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
+			 &prog_fd, sizeof(prog_fd));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to ATTACH_REUSEPORT_EBPF");
+		return -1;
+	}
+
+	return 0;
+}
+
+static socklen_t inetaddr_len(const struct sockaddr_storage *addr)
+{
+	return (addr->ss_family == AF_INET ? sizeof(struct sockaddr_in) :
+		addr->ss_family == AF_INET6 ? sizeof(struct sockaddr_in6) : 0);
+}
+
+static int make_socket_with_addr(int sotype, const char *ip, int port,
+				 struct sockaddr_storage *addr)
+{
+	struct timeval timeo = { .tv_sec = IO_TIMEOUT_SEC };
+	int err, fd;
+
+	err = make_addr(ip, port, addr);
+	if (err)
+		return -1;
+
+	fd = socket(addr->ss_family, sotype, 0);
+	if (CHECK_FAIL(fd < 0)) {
+		log_err("failed to create listen socket");
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeo, sizeof(timeo));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to set SO_SNDTIMEO");
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeo, sizeof(timeo));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to set SO_RCVTIMEO");
+		return -1;
+	}
+
+	return fd;
+}
+
+static int make_server(int sotype, const char *ip, int port,
+		       struct bpf_program *reuseport_prog)
+{
+	struct sockaddr_storage addr = {0};
+	const int one = 1;
+	int err, fd = -1;
+
+	fd = make_socket_with_addr(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	/* Enabled for UDPv6 sockets for IPv4-mapped IPv6 to work. */
+	if (sotype == SOCK_DGRAM) {
+		err = setsockopt(fd, SOL_IP, IP_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable IP_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_DGRAM && addr.ss_family == AF_INET6) {
+		err = setsockopt(fd, SOL_IPV6, IPV6_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable IPV6_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable SO_REUSEADDR");
+			goto fail;
+		}
+	}
+
+	if (reuseport_prog) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one,
+				 sizeof(one));
+		if (CHECK_FAIL(err)) {
+			log_err("failed to enable SO_REUSEPORT");
+			goto fail;
+		}
+	}
+
+	err = bind(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to bind listen socket");
+		goto fail;
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = listen(fd, SOMAXCONN);
+		if (CHECK_FAIL(err)) {
+			log_err("failed to listen on port %d", port);
+			goto fail;
+		}
+	}
+
+	/* Late attach reuseport prog so we can have one init path */
+	if (reuseport_prog) {
+		err = setup_reuseport_prog(fd, reuseport_prog);
+		if (err)
+			goto fail;
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int make_client(int sotype, const char *ip, int port)
+{
+	struct sockaddr_storage addr = {0};
+	int err, fd;
+
+	fd = make_socket_with_addr(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	err = connect(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK_FAIL(err)) {
+		log_err("failed to connect client socket");
+		goto fail;
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int send_byte(int fd)
+{
+	ssize_t n;
+
+	errno = 0;
+	n = send(fd, "a", 1, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial send");
+		return -1;
+	}
+	return 0;
+}
+
+static int recv_byte(int fd)
+{
+	char buf[1];
+	ssize_t n;
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial recv");
+		return -1;
+	}
+	return 0;
+}
+
+static int tcp_recv_send(int server_fd)
+{
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	fd = accept(server_fd, NULL, NULL);
+	if (CHECK_FAIL(fd < 0)) {
+		log_err("failed to accept");
+		return -1;
+	}
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial recv");
+		ret = -1;
+		goto close;
+	}
+
+	n = send(fd, buf, n, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed/partial send");
+		ret = -1;
+		goto close;
+	}
+
+	ret = 0;
+close:
+	close(fd);
+	return ret;
+}
+
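+/* Rewrite a struct sockaddr_in held in @ss into the corresponding
+ * IPv4-mapped IPv6 address (::ffff:a.b.c.d), in place.
+ */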
+static void v4_to_v6(struct sockaddr_storage *ss)
+{
+	struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;
+	struct sockaddr_in v4 = *(struct sockaddr_in *)ss;
+
+	v6->sin6_family = AF_INET6;
+	v6->sin6_port = v4.sin_port;
+	v6->sin6_addr.s6_addr[10] = 0xff;
+	v6->sin6_addr.s6_addr[11] = 0xff;
+	memcpy(&v6->sin6_addr.s6_addr[12], &v4.sin_addr.s_addr, 4);
+}
+
+static int udp_recv_send(int server_fd)
+{
+	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
+	struct sockaddr_storage _src_addr = { 0 };
+	struct sockaddr_storage *src_addr = &_src_addr;
+	struct sockaddr_storage *dst_addr = NULL;
+	struct msghdr msg = { 0 };
+	struct iovec iov = { 0 };
+	struct cmsghdr *cm;
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	iov.iov_base = buf;
+	iov.iov_len = sizeof(buf);
+
+	msg.msg_name = src_addr;
+	msg.msg_namelen = sizeof(*src_addr);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	errno = 0;
+	n = recvmsg(server_fd, &msg, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed to receive");
+		return -1;
+	}
+	if (CHECK_FAIL(msg.msg_flags & MSG_CTRUNC)) {
+		log_err("truncated cmsg");
+		return -1;
+	}
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if ((cm->cmsg_level == SOL_IP &&
+		     cm->cmsg_type == IP_ORIGDSTADDR) ||
+		    (cm->cmsg_level == SOL_IPV6 &&
+		     cm->cmsg_type == IPV6_ORIGDSTADDR)) {
+			dst_addr = (struct sockaddr_storage *)CMSG_DATA(cm);
+			break;
+		}
+		log_err("warning: ignored cmsg at level %d type %d",
+			cm->cmsg_level, cm->cmsg_type);
+	}
+	if (CHECK_FAIL(!dst_addr)) {
+		log_err("failed to get destination address");
+		return -1;
+	}
+
+	/* Server socket bound to IPv4-mapped IPv6 address */
+	if (src_addr->ss_family == AF_INET6 &&
+	    dst_addr->ss_family == AF_INET) {
+		v4_to_v6(dst_addr);
+	}
+
+	/* Reply from original destination address. */
+	fd = socket(dst_addr->ss_family, SOCK_DGRAM, 0);
+	if (CHECK_FAIL(fd < 0)) {
+		log_err("failed to create tx socket");
+		return -1;
+	}
+
+	ret = bind(fd, (struct sockaddr *)dst_addr, sizeof(*dst_addr));
+	if (CHECK_FAIL(ret)) {
+		log_err("failed to bind tx socket");
+		goto out;
+	}
+
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	n = sendmsg(fd, &msg, 0);
+	if (CHECK_FAIL(n <= 0)) {
+		log_err("failed to send echo reply");
+		ret = -1;
+		goto out;
+	}
+
+	ret = 0;
+out:
+	close(fd);
+	return ret;
+}
+
+static int tcp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = tcp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static int udp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = udp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static int attach_lookup_prog(struct bpf_program *prog)
+{
+	const char *prog_name = bpf_program__name(prog);
+	enum bpf_attach_type attach_type;
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (CHECK_FAIL(prog_fd < 0)) {
+		errno = -prog_fd;
+		log_err("failed to get fd for program '%s'", prog_name);
+		return -1;
+	}
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	err = bpf_prog_attach(prog_fd, -1 /* target fd */, attach_type, 0);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to attach program '%s'", prog_name);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int detach_lookup_prog(struct bpf_program *prog)
+{
+	const char *prog_name = bpf_program__name(prog);
+	enum bpf_attach_type attach_type;
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(prog);
+	if (CHECK_FAIL(prog_fd < 0)) {
+		errno = -prog_fd;
+		log_err("failed to get fd for program '%s'", prog_name);
+		return -1;
+	}
+
+	attach_type = bpf_program__get_expected_attach_type(prog);
+	err = bpf_prog_detach2(prog_fd, -1 /* attachable fd */, attach_type);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to detach program '%s'", prog_name);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int update_lookup_map(struct bpf_map *map, int index, int sock_fd)
+{
+	int err, map_fd;
+	uint64_t value;
+
+	map_fd = bpf_map__fd(map);
+	if (CHECK_FAIL(map_fd < 0)) {
+		errno = -map_fd;
+		log_err("failed to get map FD");
+		return -1;
+	}
+
+	value = (uint64_t)sock_fd;
+	err = bpf_map_update_elem(map_fd, &index, &value, BPF_NOEXIST);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to update redir_map @ %d", index);
+		return -1;
+	}
+
+	return 0;
+}
+
+static void query_lookup_prog(struct test_sk_lookup_kern *skel)
+{
+	struct bpf_program *lookup_prog = skel->progs.lookup_pass;
+	enum bpf_attach_type attach_type;
+	__u32 attach_flags = 0;
+	__u32 prog_ids[1] = { 0 };
+	__u32 prog_cnt = 1;
+	int net_fd = -1;
+	int err;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(net_fd < 0)) {
+		log_err("failed to open /proc/self/ns/net");
+		return;
+	}
+
+	err = attach_lookup_prog(lookup_prog);
+	if (err)
+		goto close;
+
+	attach_type = bpf_program__get_expected_attach_type(lookup_prog);
+	err = bpf_prog_query(net_fd, attach_type, 0 /* query flags */,
+			     &attach_flags, prog_ids, &prog_cnt);
+	if (CHECK_FAIL(err)) {
+		log_err("failed to query lookup prog");
+		goto detach;
+	}
+
+	errno = 0;
+	if (CHECK_FAIL(attach_flags != 0)) {
+		log_err("wrong attach_flags on query: %u", attach_flags);
+		goto detach;
+	}
+	if (CHECK_FAIL(prog_cnt != 1)) {
+		log_err("wrong program count on query: %u", prog_cnt);
+		goto detach;
+	}
+	if (CHECK_FAIL(prog_ids[0] == 0)) {
+		log_err("invalid program id on query: %u", prog_ids[0]);
+		goto detach;
+	}
+
+detach:
+	detach_lookup_prog(lookup_prog);
+close:
+	close(net_fd);
+}
+
+static void run_lookup_prog(const struct test *t)
+{
+	int client_fd, server_fds[MAX_SERVERS] = { -1 };
+	int i, err, server_idx;
+
+	err = attach_lookup_prog(t->lookup_prog);
+	if (err)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		server_fds[i] = make_server(t->sotype, t->recv_at.ip,
+					    t->recv_at.port, t->reuseport_prog);
+		if (server_fds[i] < 0)
+			goto close;
+
+		err = update_lookup_map(t->sock_map, i, server_fds[i]);
+		if (err)
+			goto close;
+
+		/* want just one server for non-reuseport test */
+		if (!t->reuseport_prog)
+			break;
+	}
+
+	client_fd = make_client(t->sotype, t->send_to.ip, t->send_to.port);
+	if (client_fd < 0)
+		goto close;
+
+	/* reuseport prog always selects server B */
+	server_idx = t->reuseport_prog ? SERVER_B : SERVER_A;
+
+	if (t->sotype == SOCK_STREAM)
+		tcp_echo_test(client_fd, server_fds[server_idx]);
+	else
+		udp_echo_test(client_fd, server_fds[server_idx]);
+
+	close(client_fd);
+close:
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++)
+		close(server_fds[i]);
+detach:
+	detach_lookup_prog(t->lookup_prog);
+}
+
+static void test_override_lookup(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+		{
+			.desc		= "TCP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.recv_at	= { INT_IP4_V6, INT_PORT },
+			.send_to	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+		{
+			.desc		= "UDP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+		{
+			.desc		= "UDP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.recv_at	= { INT_IP4_V6, INT_PORT },
+			.send_to	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+			.reuseport_prog	= skel->progs.select_sock_b,
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			run_lookup_prog(t);
+	}
+}
+
+static void drop_on_lookup(const struct test *t)
+{
+	struct sockaddr_storage dst = { 0 };
+	int client_fd, server_fd, err;
+	ssize_t n;
+
+	if (attach_lookup_prog(t->lookup_prog))
+		return;
+
+	server_fd = make_server(t->sotype, t->recv_at.ip, t->recv_at.port,
+				t->reuseport_prog);
+	if (server_fd < 0)
+		goto detach;
+
+	client_fd = make_socket_with_addr(t->sotype, t->send_to.ip,
+					  t->send_to.port, &dst);
+	if (client_fd < 0)
+		goto close_srv;
+
+	err = connect(client_fd, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client_fd);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client_fd, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK_FAIL(!err || errno != ECONNREFUSED))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client_fd);
+close_srv:
+	close(server_fd);
+detach:
+	detach_lookup_prog(t->lookup_prog);
+}
+
+static void test_drop_on_lookup(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { EXT_IP6, EXT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_lookup(t);
+	}
+}
+
+static void drop_on_reuseport(const struct test *t)
+{
+	struct sockaddr_storage dst = { 0 };
+	int client, server1, server2, err;
+	ssize_t n;
+
+	if (attach_lookup_prog(t->lookup_prog))
+		return;
+
+	server1 = make_server(t->sotype, t->recv_at.ip, t->recv_at.port,
+			      t->reuseport_prog);
+	if (server1 < 0)
+		goto detach;
+
+	err = update_lookup_map(t->sock_map, SERVER_A, server1);
+	if (err)
+		goto detach;
+
+	/* second server at the destination address, which we should never reach */
+	server2 = make_server(t->sotype, t->send_to.ip, t->send_to.port,
+			      NULL /* reuseport prog */);
+	if (server2 < 0)
+		goto close_srv1;
+
+	client = make_socket_with_addr(t->sotype, t->send_to.ip,
+				       t->send_to.port, &dst);
+	if (client < 0)
+		goto close_srv2;
+
+	err = connect(client, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK_FAIL(!err || errno != ECONNREFUSED))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client);
+close_srv2:
+	close(server2);
+close_srv1:
+	close(server1);
+detach:
+	detach_lookup_prog(t->lookup_prog);
+}
+
+static void test_drop_on_reuseport(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP4, EXT_PORT },
+			.recv_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.send_to	= { EXT_IP6, EXT_PORT },
+			.recv_at	= { INT_IP6, INT_PORT },
+		},
+
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_reuseport(t);
+	}
+}
+
+static void run_tests(struct test_sk_lookup_kern *skel)
+{
+	if (test__start_subtest("query lookup prog"))
+		query_lookup_prog(skel);
+	test_override_lookup(skel);
+	test_drop_on_lookup(skel);
+	test_drop_on_reuseport(skel);
+}
+
+static int switch_netns(int *saved_net)
+{
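+	/* In a fresh netns the loopback device starts down and carries
+	 * none of the test's IPv6 addresses; set it up before running.
+	 */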
+	static const char * const setup_script[] = {
+		"ip -6 addr add dev lo " EXT_IP6 "/128 nodad",
+		"ip -6 addr add dev lo " INT_IP6 "/128 nodad",
+		"ip link set dev lo up",
+		NULL,
+	};
+	const char * const *cmd;
+	int net_fd, err;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK_FAIL(net_fd < 0)) {
+		log_err("open(/proc/self/ns/net)");
+		return -1;
+	}
+
+	err = unshare(CLONE_NEWNET);
+	if (CHECK_FAIL(err)) {
+		log_err("unshare(CLONE_NEWNET)");
+		goto close;
+	}
+
+	for (cmd = setup_script; *cmd; cmd++) {
+		err = system(*cmd);
+		if (CHECK_FAIL(err)) {
+			log_err("system(%s)", *cmd);
+			goto close;
+		}
+	}
+
+	*saved_net = net_fd;
+	return 0;
+
+close:
+	close(net_fd);
+	return -1;
+}
+
+static void restore_netns(int saved_net)
+{
+	int err;
+
+	err = setns(saved_net, CLONE_NEWNET);
+	if (CHECK_FAIL(err))
+		log_err("setns(CLONE_NEWNET)");
+
+	close(saved_net);
+}
+
+void test_sk_lookup(void)
+{
+	struct test_sk_lookup_kern *skel;
+	int err, saved_net;
+
+	err = switch_netns(&saved_net);
+	if (err)
+		return;
+
+	skel = test_sk_lookup_kern__open_and_load();
+	if (CHECK_FAIL(!skel)) {
+		errno = 0;
+		log_err("failed to open and load BPF skeleton");
+		goto restore_netns;
+	}
+
+	run_tests(skel);
+
+	test_sk_lookup_kern__destroy(skel);
+restore_netns:
+	restore_netns(saved_net);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
new file mode 100644
index 000000000000..4ad7c6842487
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
@@ -0,0 +1,162 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+
+#define IP4(a, b, c, d)					\
+	bpf_htonl((((__u32)(a) & 0xffU) << 24) |	\
+		  (((__u32)(b) & 0xffU) << 16) |	\
+		  (((__u32)(c) & 0xffU) <<  8) |	\
+		  (((__u32)(d) & 0xffU) <<  0))
+#define IP6(aaaa, bbbb, cccc, dddd)			\
+	{ bpf_htonl(aaaa), bpf_htonl(bbbb), bpf_htonl(cccc), bpf_htonl(dddd) }
+
+#define MAX_SOCKS 32
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, MAX_SOCKS);
+	__type(key, __u32);
+	__type(value, __u64);
+} redir_map SEC(".maps");
+
+enum {
+	SERVER_A = 0,
+	SERVER_B = 1,
+};
+
+enum {
+	NO_FLAGS = 0,
+};
+
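+/* ctx->dst_port is in host byte order, so DST_PORT is a plain
+ * host-order constant; no bpf_htons() is needed when comparing.
+ */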
+static const __u32 DST_PORT = 7007;
+static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
+static const __u32 DST_IP6[] = IP6(0xfd000000, 0x0, 0x0, 0x00000001);
+
+SEC("sk_lookup/lookup_pass")
+int lookup_pass(struct bpf_sk_lookup *ctx)
+{
+	return BPF_OK;
+}
+
+SEC("sk_lookup/lookup_drop")
+int lookup_drop(struct bpf_sk_lookup *ctx)
+{
+	return BPF_DROP;
+}
+
+SEC("sk_reuseport/reuse_pass")
+int reuseport_pass(struct sk_reuseport_md *ctx)
+{
+	return SK_PASS;
+}
+
+SEC("sk_reuseport/reuse_drop")
+int reuseport_drop(struct sk_reuseport_md *ctx)
+{
+	return SK_DROP;
+}
+
+/* Redirect packets destined for port DST_PORT to socket at redir_map[0]. */
+SEC("sk_lookup/redir_port")
+int redir_port(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->dst_port != DST_PORT)
+		return BPF_OK;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip4")
+int redir_ip4(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET)
+		return BPF_OK;
+	if (ctx->dst_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->dst_ip4 != DST_IP4)
+		return BPF_OK;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+/* Redirect packets destined for DST_IP6 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip6")
+int redir_ip6(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET6)
+		return BPF_OK;
+	if (ctx->dst_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->dst_ip6[0] != DST_IP6[0] ||
+	    ctx->dst_ip6[1] != DST_IP6[1] ||
+	    ctx->dst_ip6[2] != DST_IP6[2] ||
+	    ctx->dst_ip6[3] != DST_IP6[3])
+		return BPF_OK;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+SEC("sk_lookup/select_sock_a")
+int select_sock_a(struct bpf_sk_lookup *ctx)
+{
+	__u32 key = SERVER_A;
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &key);
+	if (!sk)
+		return BPF_OK;
+
+	err = bpf_sk_assign(ctx, sk, NO_FLAGS);
+	bpf_sk_release(sk);
+	return err ? BPF_DROP : BPF_REDIRECT;
+}
+
+SEC("sk_reuseport/select_sock_b")
+int select_sock_b(struct sk_reuseport_md *ctx)
+{
+	__u32 key = SERVER_B;
+	int err;
+
+	err = bpf_sk_select_reuseport(ctx, &redir_map, &key, NO_FLAGS);
+	return err ? SK_DROP : SK_PASS;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
+__u32 _version SEC("version") = 1;
-- 
2.25.3

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-06 13:16     ` Lorenz Bauer
  -1 siblings, 0 replies; 68+ messages in thread
From: Lorenz Bauer @ 2020-05-06 13:16 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Networking, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski

On Wed, 6 May 2020 at 13:55, Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add a new program type BPF_PROG_TYPE_SK_LOOKUP and a dedicated attach type
> called BPF_SK_LOOKUP. The new program kind is to be invoked by the
> transport layer when looking up a socket for a received packet.
>
> When called, SK_LOOKUP program can select a socket that will receive the
> packet. This serves as a mechanism to overcome the limits of what bind()
> API allows to express. Two use-cases driving this work are:
>
>  (1) steer packets destined to an IP range, fixed port to a socket
>
>      192.0.2.0/24, port 80 -> NGINX socket
>
>  (2) steer packets destined to an IP address, any port to a socket
>
>      198.51.100.1, any port -> L7 proxy socket
>
> In its run-time context, program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple. Context can be further extended to include ingress
> interface identifier.
>
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection. Transport layer then uses the selected
> socket as a result of socket lookup.
>
> This patch only enables the user to attach an SK_LOOKUP program to a
> network namespace. Subsequent patches hook it up to run on local delivery
> path in ipv4 and ipv6 stacks.
>
> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/linux/bpf_types.h   |   2 +
>  include/linux/filter.h      |  23 ++++
>  include/net/net_namespace.h |   1 +
>  include/uapi/linux/bpf.h    |  53 ++++++++
>  kernel/bpf/syscall.c        |   9 ++
>  net/core/filter.c           | 247 ++++++++++++++++++++++++++++++++++++
>  scripts/bpf_helpers_doc.py  |   9 +-
>  7 files changed, 343 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 8345cdf553b8..08c2aef674ac 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -64,6 +64,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
>  #ifdef CONFIG_INET
>  BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
>               struct sk_reuseport_md, struct sk_reuseport_kern)
> +BPF_PROG_TYPE(BPF_PROG_TYPE_SK_LOOKUP, sk_lookup,
> +             struct bpf_sk_lookup, struct bpf_sk_lookup_kern)
>  #endif
>  #if defined(CONFIG_BPF_JIT)
>  BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index af37318bb1c5..33254e840c8d 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1280,4 +1280,27 @@ struct bpf_sockopt_kern {
>         s32             retval;
>  };
>
> +struct bpf_sk_lookup_kern {
> +       unsigned short  family;
> +       u16             protocol;
> +       union {
> +               struct {
> +                       __be32 saddr;
> +                       __be32 daddr;
> +               } v4;
> +               struct {
> +                       struct in6_addr saddr;
> +                       struct in6_addr daddr;
> +               } v6;
> +       };
> +       __be16          sport;
> +       u16             dport;
> +       struct sock     *selected_sk;
> +};
> +
> +int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
> +int sk_lookup_prog_detach(const union bpf_attr *attr);
> +int sk_lookup_prog_query(const union bpf_attr *attr,
> +                        union bpf_attr __user *uattr);
> +
>  #endif /* __LINUX_FILTER_H__ */
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index ab96fb59131c..70bf4888c94d 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -163,6 +163,7 @@ struct net {
>         struct net_generic __rcu        *gen;
>
>         struct bpf_prog __rcu   *flow_dissector_prog;
> +       struct bpf_prog __rcu   *sk_lookup_prog;
>
>         /* Note : following structs are cache line aligned */
>  #ifdef CONFIG_XFRM
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b3643e27e264..e4c61b63d4bc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -187,6 +187,7 @@ enum bpf_prog_type {
>         BPF_PROG_TYPE_STRUCT_OPS,
>         BPF_PROG_TYPE_EXT,
>         BPF_PROG_TYPE_LSM,
> +       BPF_PROG_TYPE_SK_LOOKUP,
>  };
>
>  enum bpf_attach_type {
> @@ -218,6 +219,7 @@ enum bpf_attach_type {
>         BPF_TRACE_FEXIT,
>         BPF_MODIFY_RETURN,
>         BPF_LSM_MAC,
> +       BPF_SK_LOOKUP,
>         __MAX_BPF_ATTACH_TYPE
>  };
>
> @@ -3041,6 +3043,10 @@ union bpf_attr {
>   *
>   * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
>   *     Description
> + *             Helper is overloaded depending on BPF program type. This
> + *             description applies to **BPF_PROG_TYPE_SCHED_CLS** and
> + *             **BPF_PROG_TYPE_SCHED_ACT** programs.
> + *
>   *             Assign the *sk* to the *skb*. When combined with appropriate
>   *             routing configuration to receive the packet towards the socket,
>   *             will cause *skb* to be delivered to the specified socket.
> @@ -3061,6 +3067,39 @@ union bpf_attr {
>   *                                     call from outside of TC ingress.
>   *             * **-ESOCKTNOSUPPORT**  Socket type not supported (reuseport).
>   *
> + * int bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
> + *     Description
> + *             Helper is overloaded depending on BPF program type. This
> + *             description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
> + *
> + *             Select the *sk* as a result of a socket lookup.
> + *
> + *             For the operation to succeed passed socket must be compatible
> + *             with the packet description provided by the *ctx* object.
> + *
> + *             L4 protocol (*IPPROTO_TCP* or *IPPROTO_UDP*) must be an exact
> + *             match. While IP family (*AF_INET* or *AF_INET6*) must be
> + *             compatible, that is IPv6 sockets that are not v6-only can be
> + *             selected for IPv4 packets.
> + *
> + *             Only full sockets can be selected. However, there is no need to
> + *             call bpf_fullsock() before passing a socket as an argument to
> + *             this helper.
> + *
> + *             The *flags* argument must be zero.
> + *     Return
> + *             0 on success, or a negative errno in case of failure.
> + *
> + *             **-EAFNOSUPPORT** if socket family (*sk->family*) is not
> + *             compatible with packet family (*ctx->family*).
> + *
> + *             **-EINVAL** if unsupported flags were specified.
> + *
> + *             **-EPROTOTYPE** if socket L4 protocol (*sk->protocol*) doesn't
> + *             match packet protocol (*ctx->protocol*).
> + *
> + *             **-ESOCKTNOSUPPORT** if socket is not a full socket.
> + *
>   * u64 bpf_ktime_get_boot_ns(void)
>   *     Description
>   *             Return the time elapsed since system boot, in nanoseconds.
> @@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
>         __u32 pid;
>         __u32 tgid;
>  };
> +
> +/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
> +struct bpf_sk_lookup {
> +       __u32 family;           /* AF_INET, AF_INET6 */
> +       __u32 protocol;         /* IPPROTO_TCP, IPPROTO_UDP */
> +       /* IP addresses allow 1, 2, and 4 byte access */
> +       __u32 src_ip4;
> +       __u32 src_ip6[4];
> +       __u32 src_port;         /* network byte order */
> +       __u32 dst_ip4;
> +       __u32 dst_ip6[4];
> +       __u32 dst_port;         /* host byte order */

Jakub and I have discussed this off-list, but we couldn't come to an
agreement and decided to invite
your opinion.

I think that dst_port should be in network byte order, since it's one
less exception to the
rule to think about when writing BPF programs.

Jakub's argument is that this follows __sk_buff->local_port precedent,
which is also in host
byte order.
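
To make the trade-off concrete, here is a minimal sketch (not part of
the patch) of matching port 443 in a sk_lookup program under each
convention:

	/* host byte order, as the patch proposes */
	if (ctx->dst_port == 443)
		return BPF_OK;

	/* network byte order, as suggested above */
	if (ctx->dst_port == bpf_htons(443))
		return BPF_OK;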

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-06 13:53       ` Jakub Sitnicki
  -1 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-06 13:53 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Networking, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski

On Wed, May 06, 2020 at 03:16 PM CEST, Lorenz Bauer wrote:
> On Wed, 6 May 2020 at 13:55, Jakub Sitnicki <jakub@cloudflare.com> wrote:

[...]

>> @@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
>>         __u32 pid;
>>         __u32 tgid;
>>  };
>> +
>> +/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
>> +struct bpf_sk_lookup {
>> +       __u32 family;           /* AF_INET, AF_INET6 */
>> +       __u32 protocol;         /* IPPROTO_TCP, IPPROTO_UDP */
>> +       /* IP addresses allow 1, 2, and 4 byte access */
>> +       __u32 src_ip4;
>> +       __u32 src_ip6[4];
>> +       __u32 src_port;         /* network byte order */
>> +       __u32 dst_ip4;
>> +       __u32 dst_ip6[4];
>> +       __u32 dst_port;         /* host byte order */
>
> Jakub and I have discussed this off-list, but we couldn't come to an
> agreement and decided to invite
> your opinion.
>
> I think that dst_port should be in network byte order, since it's one
> less exception to the
> rule to think about when writing BPF programs.
>
> Jakub's argument is that this follows __sk_buff->local_port precedent,
> which is also in host
> byte order.

Yes, would be great to hear if there is a preference here.

Small correction, proposed sk_lookup program doesn't have access to
__sk_buff, so perhaps that case matters less.

bpf_sk_lookup->dst_port, the packet destination port, is in host byte
order so that it can be compared against bpf_sock->src_port, socket
local port, without conversion.

But I also see how it can be a surprise for a BPF user that one field has
a different byte order.
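
For example, in a hypothetical fragment that fetches a socket from the
series' SOCKMAP, host byte order lets the two fields be compared
directly:

	__u32 key = SERVER_A;
	struct bpf_sock *sk;

	sk = bpf_map_lookup_elem(&redir_map, &key);
	if (sk) {
		if (sk->src_port == ctx->dst_port) {
			/* socket's local port equals the packet's
			 * destination port; no byte-order conversion
			 * on either side
			 */
		}
		bpf_sk_release(sk);
	}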

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-07 20:55         ` Martin KaFai Lau
  -1 siblings, 0 replies; 68+ messages in thread
From: Martin KaFai Lau @ 2020-05-07 20:55 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Lorenz Bauer, Networking, bpf, dccp, kernel-team,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Eric Dumazet, Gerrit Renker, Jakub Kicinski, Marek Majkowski

On Wed, May 06, 2020 at 03:53:35PM +0200, Jakub Sitnicki wrote:
> On Wed, May 06, 2020 at 03:16 PM CEST, Lorenz Bauer wrote:
> > On Wed, 6 May 2020 at 13:55, Jakub Sitnicki <jakub@cloudflare.com> wrote:
> 
> [...]
> 
> >> @@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
> >>         __u32 pid;
> >>         __u32 tgid;
> >>  };
> >> +
> >> +/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
> >> +struct bpf_sk_lookup {
> >> +       __u32 family;           /* AF_INET, AF_INET6 */
> >> +       __u32 protocol;         /* IPPROTO_TCP, IPPROTO_UDP */
> >> +       /* IP addresses allow 1, 2, and 4 byte access */
> >> +       __u32 src_ip4;
> >> +       __u32 src_ip6[4];
> >> +       __u32 src_port;         /* network byte order */
> >> +       __u32 dst_ip4;
> >> +       __u32 dst_ip6[4];
> >> +       __u32 dst_port;         /* host byte order */
> >
> > Jakub and I have discussed this off-list, but we couldn't come to an
> > agreement and decided to invite
> > your opinion.
> >
> > I think that dst_port should be in network byte order, since it's one
> > less exception to the
> > rule to think about when writing BPF programs.
> >
> > Jakub's argument is that this follows __sk_buff->local_port precedent,
> > which is also in host
> > byte order.
> 
> Yes, would be great to hear if there is a preference here.
> 
> Small correction, proposed sk_lookup program doesn't have access to
> __sk_buff, so perhaps that case matters less.
> 
> bpf_sk_lookup->dst_port, the packet destination port, is in host byte
> order so that it can be compared against bpf_sock->src_port, socket
> local port, without conversion.
> 
> But I also see how it can be a surprise for a BPF user that one field has
> a different byte order.
I would also prefer port and addr were all in the same byte order.
However, that is not the case for the other prog_type ctxs.
People have stumbled over it from time to time.  Maybe something
can be done at the libbpf level to hide this difference.

I think uapi consistency with the other existing ctxs is more important
here (i.e. keep the "local" port in host order).  Otherwise, the user
will be slapped left and right when writing bpf_prog in different
prog_types.

Going by skc_num, the "local" port is in host byte order in the
existing prog ctxs.  It is unfortunate that the "dst"_port in this
patch is the "local" port, while the "local" port in "struct bpf_sock"
is actually the "src"_port. :/
Would "local"/"remote" be clearer than "src"/"dst" in this patch?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-08  7:06     ` Martin KaFai Lau
  -1 siblings, 0 replies; 68+ messages in thread
From: Martin KaFai Lau @ 2020-05-08  7:06 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
> Add a new program type BPF_PROG_TYPE_SK_LOOKUP and a dedicated attach type
> called BPF_SK_LOOKUP. The new program kind is to be invoked by the
> transport layer when looking up a socket for a received packet.
> 
> When called, SK_LOOKUP program can select a socket that will receive the
> packet. This serves as a mechanism to overcome the limits of what bind()
> API allows to express. Two use-cases driving this work are:
> 
>  (1) steer packets destined to an IP range, fixed port to a socket
> 
>      192.0.2.0/24, port 80 -> NGINX socket
> 
>  (2) steer packets destined to an IP address, any port to a socket
> 
>      198.51.100.1, any port -> L7 proxy socket
> 
> In its run-time context, program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple. Context can be further extended to include ingress
> interface identifier.
> 
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection. Transport layer then uses the selected
> socket as a result of socket lookup.
> 
> This patch only enables the user to attach an SK_LOOKUP program to a
> network namespace. Subsequent patches hook it up to run on local delivery
> path in ipv4 and ipv6 stacks.
> 
> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/linux/bpf_types.h   |   2 +
>  include/linux/filter.h      |  23 ++++
>  include/net/net_namespace.h |   1 +
>  include/uapi/linux/bpf.h    |  53 ++++++++
>  kernel/bpf/syscall.c        |   9 ++
>  net/core/filter.c           | 247 ++++++++++++++++++++++++++++++++++++
>  scripts/bpf_helpers_doc.py  |   9 +-
>  7 files changed, 343 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
> index 8345cdf553b8..08c2aef674ac 100644
> --- a/include/linux/bpf_types.h
> +++ b/include/linux/bpf_types.h
> @@ -64,6 +64,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
>  #ifdef CONFIG_INET
>  BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
>  	      struct sk_reuseport_md, struct sk_reuseport_kern)
> +BPF_PROG_TYPE(BPF_PROG_TYPE_SK_LOOKUP, sk_lookup,
> +	      struct bpf_sk_lookup, struct bpf_sk_lookup_kern)
>  #endif
>  #if defined(CONFIG_BPF_JIT)
>  BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index af37318bb1c5..33254e840c8d 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1280,4 +1280,27 @@ struct bpf_sockopt_kern {
>  	s32		retval;
>  };
>  
> +struct bpf_sk_lookup_kern {
> +	unsigned short	family;
> +	u16		protocol;
> +	union {
> +		struct {
> +			__be32 saddr;
> +			__be32 daddr;
> +		} v4;
> +		struct {
> +			struct in6_addr saddr;
> +			struct in6_addr daddr;
> +		} v6;
> +	};
> +	__be16		sport;
> +	u16		dport;
> +	struct sock	*selected_sk;
> +};
> +
> +int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
> +int sk_lookup_prog_detach(const union bpf_attr *attr);
> +int sk_lookup_prog_query(const union bpf_attr *attr,
> +			 union bpf_attr __user *uattr);
> +
>  #endif /* __LINUX_FILTER_H__ */
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index ab96fb59131c..70bf4888c94d 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -163,6 +163,7 @@ struct net {
>  	struct net_generic __rcu	*gen;
>  
>  	struct bpf_prog __rcu	*flow_dissector_prog;
> +	struct bpf_prog __rcu	*sk_lookup_prog;
>  
>  	/* Note : following structs are cache line aligned */
>  #ifdef CONFIG_XFRM
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b3643e27e264..e4c61b63d4bc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -187,6 +187,7 @@ enum bpf_prog_type {
>  	BPF_PROG_TYPE_STRUCT_OPS,
>  	BPF_PROG_TYPE_EXT,
>  	BPF_PROG_TYPE_LSM,
> +	BPF_PROG_TYPE_SK_LOOKUP,
>  };
>  
>  enum bpf_attach_type {
> @@ -218,6 +219,7 @@ enum bpf_attach_type {
>  	BPF_TRACE_FEXIT,
>  	BPF_MODIFY_RETURN,
>  	BPF_LSM_MAC,
> +	BPF_SK_LOOKUP,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> @@ -3041,6 +3043,10 @@ union bpf_attr {
>   *
>   * int bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
>   *	Description
> + *		Helper is overloaded depending on BPF program type. This
> + *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
> + *		**BPF_PROG_TYPE_SCHED_ACT** programs.
> + *
>   *		Assign the *sk* to the *skb*. When combined with appropriate
>   *		routing configuration to receive the packet towards the socket,
>   *		will cause *skb* to be delivered to the specified socket.
> @@ -3061,6 +3067,39 @@ union bpf_attr {
>   *					call from outside of TC ingress.
>   *		* **-ESOCKTNOSUPPORT**	Socket type not supported (reuseport).
>   *
> + * int bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
> + *	Description
> + *		Helper is overloaded depending on BPF program type. This
> + *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
> + *
> + *		Select the *sk* as a result of a socket lookup.
> + *
> + *		For the operation to succeed passed socket must be compatible
> + *		with the packet description provided by the *ctx* object.
> + *
> + *		L4 protocol (*IPPROTO_TCP* or *IPPROTO_UDP*) must be an exact
> + *		match. While IP family (*AF_INET* or *AF_INET6*) must be
> + *		compatible, that is IPv6 sockets that are not v6-only can be
> + *		selected for IPv4 packets.
> + *
> + *		Only full sockets can be selected. However, there is no need to
> + *		call bpf_fullsock() before passing a socket as an argument to
> + *		this helper.
> + *
> + *		The *flags* argument must be zero.
> + *	Return
> + *		0 on success, or a negative errno in case of failure.
> + *
> + *		**-EAFNOSUPPORT** if socket family (*sk->family*) is not
> + *		compatible with packet family (*ctx->family*).
> + *
> + *		**-EINVAL** if unsupported flags were specified.
> + *
> + *		**-EPROTOTYPE** if socket L4 protocol (*sk->protocol*) doesn't
> + *		match packet protocol (*ctx->protocol*).
> + *
> + *		**-ESOCKTNOSUPPORT** if socket is not a full socket.
> + *
>   * u64 bpf_ktime_get_boot_ns(void)
>   * 	Description
>   * 		Return the time elapsed since system boot, in nanoseconds.
> @@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
>  	__u32 pid;
>  	__u32 tgid;
>  };
> +
> +/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
> +struct bpf_sk_lookup {
> +	__u32 family;		/* AF_INET, AF_INET6 */
> +	__u32 protocol;		/* IPPROTO_TCP, IPPROTO_UDP */
> +	/* IP addresses allow 1, 2, and 4 byte access */
> +	__u32 src_ip4;
> +	__u32 src_ip6[4];
> +	__u32 src_port;		/* network byte order */
> +	__u32 dst_ip4;
> +	__u32 dst_ip6[4];
> +	__u32 dst_port;		/* host byte order */
> +};
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bb1ab7da6103..26d643c171fd 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2729,6 +2729,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
>  	case BPF_CGROUP_GETSOCKOPT:
>  	case BPF_CGROUP_SETSOCKOPT:
>  		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
> +	case BPF_SK_LOOKUP:
It may be a good idea to enforce the "expected_attach_type ==
BPF_SK_LOOKUP" during prog load time in bpf_prog_load_check_attach().
The attr->expected_attach_type could be anything right now if I read
it correctly.
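
For example, bpf_prog_load_check_attach() could gain a case along these
lines (a sketch of the suggested check, not code from this patch):

  	case BPF_PROG_TYPE_SK_LOOKUP:
  		if (expected_attach_type == BPF_SK_LOOKUP)
  			return 0;
  		return -EINVAL;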

> +		return BPF_PROG_TYPE_SK_LOOKUP;
>  	default:
>  		return BPF_PROG_TYPE_UNSPEC;
>  	}
> @@ -2778,6 +2780,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  	case BPF_PROG_TYPE_FLOW_DISSECTOR:
>  		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
>  		break;
> +	case BPF_PROG_TYPE_SK_LOOKUP:
> +		ret = sk_lookup_prog_attach(attr, prog);
> +		break;
>  	case BPF_PROG_TYPE_CGROUP_DEVICE:
>  	case BPF_PROG_TYPE_CGROUP_SKB:
>  	case BPF_PROG_TYPE_CGROUP_SOCK:
> @@ -2818,6 +2823,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>  		return lirc_prog_detach(attr);
>  	case BPF_PROG_TYPE_FLOW_DISSECTOR:
>  		return skb_flow_dissector_bpf_prog_detach(attr);
> +	case BPF_PROG_TYPE_SK_LOOKUP:
> +		return sk_lookup_prog_detach(attr);
>  	case BPF_PROG_TYPE_CGROUP_DEVICE:
>  	case BPF_PROG_TYPE_CGROUP_SKB:
>  	case BPF_PROG_TYPE_CGROUP_SOCK:
> @@ -2867,6 +2874,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
>  		return lirc_prog_query(attr, uattr);
>  	case BPF_FLOW_DISSECTOR:
>  		return skb_flow_dissector_prog_query(attr, uattr);
> +	case BPF_SK_LOOKUP:
> +		return sk_lookup_prog_query(attr, uattr);
"# CONFIG_NET is not set" needs to be taken care of.

>  	default:
>  		return -EINVAL;
>  	}
> diff --git a/net/core/filter.c b/net/core/filter.c
> index bc25bb1085b1..a00bdc70041c 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -9054,6 +9054,253 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
>  
>  const struct bpf_prog_ops sk_reuseport_prog_ops = {
>  };
> +
> +static DEFINE_MUTEX(sk_lookup_prog_mutex);
> +
> +int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> +{
> +	struct net *net = current->nsproxy->net_ns;
> +	int ret;
> +
> +	if (unlikely(attr->attach_flags))
> +		return -EINVAL;
> +
> +	mutex_lock(&sk_lookup_prog_mutex);
> +	ret = bpf_prog_attach_one(&net->sk_lookup_prog,
> +				  &sk_lookup_prog_mutex, prog,
> +				  attr->attach_flags);
> +	mutex_unlock(&sk_lookup_prog_mutex);
> +
> +	return ret;
> +}
> +
> +int sk_lookup_prog_detach(const union bpf_attr *attr)
> +{
> +	struct net *net = current->nsproxy->net_ns;
> +	int ret;
> +
> +	if (unlikely(attr->attach_flags))
> +		return -EINVAL;
> +
> +	mutex_lock(&sk_lookup_prog_mutex);
> +	ret = bpf_prog_detach_one(&net->sk_lookup_prog,
> +				  &sk_lookup_prog_mutex);
> +	mutex_unlock(&sk_lookup_prog_mutex);
> +
> +	return ret;
> +}
> +
> +int sk_lookup_prog_query(const union bpf_attr *attr,
> +			 union bpf_attr __user *uattr)
> +{
> +	struct net *net;
> +	int ret;
> +
> +	net = get_net_ns_by_fd(attr->query.target_fd);
> +	if (IS_ERR(net))
> +		return PTR_ERR(net);
> +
> +	ret = bpf_prog_query_one(&net->sk_lookup_prog, attr, uattr);
> +
> +	put_net(net);
> +	return ret;
> +}
> +
> +BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
> +	   struct sock *, sk, u64, flags)
> +{
> +	if (unlikely(flags != 0))
> +		return -EINVAL;
> +	if (unlikely(!sk_fullsock(sk)))
Maybe ARG_PTR_TO_SOCKET instead?
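
That is, the helper proto could let the verifier enforce a full socket,
making the run-time sk_fullsock() check unnecessary. A sketch, under the
assumption that the proto otherwise mirrors other socket helpers:

  static const struct bpf_func_proto bpf_sk_lookup_assign_proto = {
  	.func		= bpf_sk_lookup_assign,
  	.gpl_only	= false,
  	.ret_type	= RET_INTEGER,
  	.arg1_type	= ARG_PTR_TO_CTX,
  	.arg2_type	= ARG_PTR_TO_SOCKET,	/* full sockets only */
  	.arg3_type	= ARG_ANYTHING,
  };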

> +		return -ESOCKTNOSUPPORT;
> +
> +	/* Check if socket is suitable for packet L3/L4 protocol */
> +	if (sk->sk_protocol != ctx->protocol)
> +		return -EPROTOTYPE;
> +	if (sk->sk_family != ctx->family &&
> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
> +		return -EAFNOSUPPORT;
> +
> +	/* Select socket as lookup result */
> +	ctx->selected_sk = sk;
Could sk be a TCP_ESTABLISHED sk?

> +	return 0;
> +}
> +

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-08  8:54           ` Jakub Sitnicki
  -1 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-08  8:54 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: Lorenz Bauer, Networking, bpf, dccp, kernel-team,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Eric Dumazet, Gerrit Renker, Jakub Kicinski, Marek Majkowski

On Thu, May 07, 2020 at 10:55 PM CEST, Martin KaFai Lau wrote:
> On Wed, May 06, 2020 at 03:53:35PM +0200, Jakub Sitnicki wrote:
>> On Wed, May 06, 2020 at 03:16 PM CEST, Lorenz Bauer wrote:
>> > On Wed, 6 May 2020 at 13:55, Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> [...]
>>
>> >> @@ -4012,4 +4051,18 @@ struct bpf_pidns_info {
>> >>         __u32 pid;
>> >>         __u32 tgid;
>> >>  };
>> >> +
>> >> +/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
>> >> +struct bpf_sk_lookup {
>> >> +       __u32 family;           /* AF_INET, AF_INET6 */
>> >> +       __u32 protocol;         /* IPPROTO_TCP, IPPROTO_UDP */
>> >> +       /* IP addresses allow 1, 2, and 4 byte access */
>> >> +       __u32 src_ip4;
>> >> +       __u32 src_ip6[4];
>> >> +       __u32 src_port;         /* network byte order */
>> >> +       __u32 dst_ip4;
>> >> +       __u32 dst_ip6[4];
>> >> +       __u32 dst_port;         /* host byte order */
>> >
>> > Jakub and I have discussed this off-list, but we couldn't come to an
>> > agreement and decided to invite
>> > your opinion.
>> >
>> > I think that dst_port should be in network byte order, since it's one
>> > less exception to the
>> > rule to think about when writing BPF programs.
>> >
>> > Jakub's argument is that this follows __sk_buff->local_port precedent,
>> > which is also in host
>> > byte order.
>>
>> Yes, would be great to hear if there is a preference here.
>>
>> Small correction, proposed sk_lookup program doesn't have access to
>> __sk_buff, so perhaps that case matters less.
>>
>> bpf_sk_lookup->dst_port, the packet destination port, is in host byte
>> order so that it can be compared against bpf_sock->src_port, socket
>> local port, without conversion.
>>
>> But I also see how it can be a surprise for a BPF user that one field has
>> a different byte order.
> I would also prefer port and addr were all in the same byte order.
> However, it is not the case for the other prog_type ctx.
> People have stumbled on it from time to time.  Maybe something
> can be done at the libbpf level to hide this difference.
>
> I think uapi consistency with other existing ctx is more important here.
> (i.e. keep the "local" port in host order).  Otherwise, the user will
> be slapped left and right when writing bpf_prog in different prog_types.
>
> Armed with knowledge of skc_num, the "local" port is
> in host byte order in the existing prog ctx.  It is
> unfortunate that the "dst"_port in this patch is the "local" port.
> The "local" port in "struct bpf_sock" is actually the "src"_port. :/
> Would "local"/"remote" be clearer than "src"/"dst" in this patch?

I went and compared the field naming and byte order in existing structs:

  | struct         | field      | byte order |
  |----------------+------------+------------|
  | __sk_buff      | local_port | host       |
  | sk_msg_md      | local_port | host       |
  | bpf_sock_ops   | local_port | host       |
  | bpf_sock       | src_port   | host       |
  | bpf_fib_lookup | dport      | network    |
  | bpf_flow_keys  | dport      | network    |
  | bpf_sock_tuple | dport      | network    |
  | bpf_sock_addr  | user_port  | network    |

It does look like "local"/"remote" prefix is the sensible choice.

I got carried away trying to match the field names with bpf_sock, which
actually doesn't follow the naming convention.

Will rename fields to local_*, remote_* in v2.
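
To make the byte order difference concrete, with the local_*/remote_*
names planned for v2 a program would compare the local port directly,
but convert constants for the remote port first. A sketch of the two
checks inside an sk_lookup program body (port numbers are made up,
bpf_htons() comes from bpf/bpf_endian.h):

  	/* local_port is in host byte order, compare directly */
  	if (ctx->local_port != 80)
  		return BPF_OK;

  	/* remote_port is in network byte order, convert first */
  	if (ctx->remote_port == bpf_htons(12345))
  		return BPF_DROP;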

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-08 10:45       ` Jakub Sitnicki
  -1 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-08 10:45 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
> On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
>> Add a new program type BPF_PROG_TYPE_SK_LOOKUP and a dedicated attach type
>> called BPF_SK_LOOKUP. The new program kind is to be invoked by the
>> transport layer when looking up a socket for a received packet.
>>
>> When called, SK_LOOKUP program can select a socket that will receive the
>> packet. This serves as a mechanism to overcome the limits of what bind()
>> API allows to express. Two use-cases driving this work are:
>>
>>  (1) steer packets destined to an IP range, fixed port to a socket
>>
>>      192.0.2.0/24, port 80 -> NGINX socket
>>
>>  (2) steer packets destined to an IP address, any port to a socket
>>
>>      198.51.100.1, any port -> L7 proxy socket
>>
>> In its run-time context, program receives information about the packet that
>> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
>> address 4-tuple. Context can be further extended to include ingress
>> interface identifier.
>>
>> To select a socket BPF program fetches it from a map holding socket
>> references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
>> helper to record the selection. Transport layer then uses the selected
>> socket as a result of socket lookup.
>>
>> This patch only enables the user to attach an SK_LOOKUP program to a
>> network namespace. Subsequent patches hook it up to run on local delivery
>> path in ipv4 and ipv6 stacks.
>>
>> Suggested-by: Marek Majkowski <marek@cloudflare.com>
>> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---

[...]

>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index bb1ab7da6103..26d643c171fd 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -2729,6 +2729,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
>>  	case BPF_CGROUP_GETSOCKOPT:
>>  	case BPF_CGROUP_SETSOCKOPT:
>>  		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
>> +	case BPF_SK_LOOKUP:
> It may be a good idea to enforce the "expected_attach_type ==
> BPF_SK_LOOKUP" during prog load time in bpf_prog_load_check_attach().
> The attr->expected_attach_type could be anything right now if I read
> it correctly.

I'll extend bpf_prog_attach_check_attach_type to enforce it for SK_LOOKUP.

>
>> +		return BPF_PROG_TYPE_SK_LOOKUP;
>>  	default:
>>  		return BPF_PROG_TYPE_UNSPEC;
>>  	}
>> @@ -2778,6 +2780,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>>  	case BPF_PROG_TYPE_FLOW_DISSECTOR:
>>  		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
>>  		break;
>> +	case BPF_PROG_TYPE_SK_LOOKUP:
>> +		ret = sk_lookup_prog_attach(attr, prog);
>> +		break;
>>  	case BPF_PROG_TYPE_CGROUP_DEVICE:
>>  	case BPF_PROG_TYPE_CGROUP_SKB:
>>  	case BPF_PROG_TYPE_CGROUP_SOCK:
>> @@ -2818,6 +2823,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>>  		return lirc_prog_detach(attr);
>>  	case BPF_PROG_TYPE_FLOW_DISSECTOR:
>>  		return skb_flow_dissector_bpf_prog_detach(attr);
>> +	case BPF_PROG_TYPE_SK_LOOKUP:
>> +		return sk_lookup_prog_detach(attr);
>>  	case BPF_PROG_TYPE_CGROUP_DEVICE:
>>  	case BPF_PROG_TYPE_CGROUP_SKB:
>>  	case BPF_PROG_TYPE_CGROUP_SOCK:
>> @@ -2867,6 +2874,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
>>  		return lirc_prog_query(attr, uattr);
>>  	case BPF_FLOW_DISSECTOR:
>>  		return skb_flow_dissector_prog_query(attr, uattr);
>> +	case BPF_SK_LOOKUP:
>> +		return sk_lookup_prog_query(attr, uattr);
> "# CONFIG_NET is not set" needs to be taken care of.

Sorry, embarrassing mistake. Will add stubs returning -EINVAL like
flow_dissector and cgroup_bpf progs have.
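
For example, the declarations in include/linux/filter.h could follow the
same stub pattern (a sketch, assuming -EINVAL like the existing stubs):

  #ifdef CONFIG_NET
  int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog);
  int sk_lookup_prog_detach(const union bpf_attr *attr);
  int sk_lookup_prog_query(const union bpf_attr *attr,
  			 union bpf_attr __user *uattr);
  #else
  static inline int sk_lookup_prog_attach(const union bpf_attr *attr,
  					struct bpf_prog *prog)
  {
  	return -EINVAL;
  }

  static inline int sk_lookup_prog_detach(const union bpf_attr *attr)
  {
  	return -EINVAL;
  }

  static inline int sk_lookup_prog_query(const union bpf_attr *attr,
  				       union bpf_attr __user *uattr)
  {
  	return -EINVAL;
  }
  #endif /* CONFIG_NET */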

>
>>  	default:
>>  		return -EINVAL;
>>  	}
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index bc25bb1085b1..a00bdc70041c 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -9054,6 +9054,253 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
>>
>>  const struct bpf_prog_ops sk_reuseport_prog_ops = {
>>  };
>> +
>> +static DEFINE_MUTEX(sk_lookup_prog_mutex);
>> +
>> +int sk_lookup_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>> +{
>> +	struct net *net = current->nsproxy->net_ns;
>> +	int ret;
>> +
>> +	if (unlikely(attr->attach_flags))
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&sk_lookup_prog_mutex);
>> +	ret = bpf_prog_attach_one(&net->sk_lookup_prog,
>> +				  &sk_lookup_prog_mutex, prog,
>> +				  attr->attach_flags);
>> +	mutex_unlock(&sk_lookup_prog_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +int sk_lookup_prog_detach(const union bpf_attr *attr)
>> +{
>> +	struct net *net = current->nsproxy->net_ns;
>> +	int ret;
>> +
>> +	if (unlikely(attr->attach_flags))
>> +		return -EINVAL;
>> +
>> +	mutex_lock(&sk_lookup_prog_mutex);
>> +	ret = bpf_prog_detach_one(&net->sk_lookup_prog,
>> +				  &sk_lookup_prog_mutex);
>> +	mutex_unlock(&sk_lookup_prog_mutex);
>> +
>> +	return ret;
>> +}
>> +
>> +int sk_lookup_prog_query(const union bpf_attr *attr,
>> +			 union bpf_attr __user *uattr)
>> +{
>> +	struct net *net;
>> +	int ret;
>> +
>> +	net = get_net_ns_by_fd(attr->query.target_fd);
>> +	if (IS_ERR(net))
>> +		return PTR_ERR(net);
>> +
>> +	ret = bpf_prog_query_one(&net->sk_lookup_prog, attr, uattr);
>> +
>> +	put_net(net);
>> +	return ret;
>> +}
>> +
>> +BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
>> +	   struct sock *, sk, u64, flags)
>> +{
>> +	if (unlikely(flags != 0))
>> +		return -EINVAL;
>> +	if (unlikely(!sk_fullsock(sk)))
> Maybe ARG_PTR_TO_SOCKET instead?

I had ARG_PTR_TO_SOCKET initially, then switched to SOCK_COMMON to match
the TC bpf_sk_assign proto. Now that you point it out, it makes more
sense to be more specific in the helper proto.

>
>> +		return -ESOCKTNOSUPPORT;
>> +
>> +	/* Check if socket is suitable for packet L3/L4 protocol */
>> +	if (sk->sk_protocol != ctx->protocol)
>> +		return -EPROTOTYPE;
>> +	if (sk->sk_family != ctx->family &&
>> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
>> +		return -EAFNOSUPPORT;
>> +
>> +	/* Select socket as lookup result */
>> +	ctx->selected_sk = sk;
> Could sk be a TCP_ESTABLISHED sk?

Yes, and what's worse, it could be ref-counted. This is a bug. I should
be rejecting ref-counted sockets here.

Callers of __inet_lookup_listener() and inet6_lookup_listener() expect
an RCU-freed socket on return.

For UDP lookup, returning a TCP_ESTABLISHED (connected) socket is okay.
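
A fix could reject any socket the program holds a reference on before
recording the selection, e.g. (a sketch; sk_is_refcounted() is the
predicate added for the TC bpf_sk_assign helper):

  	/* Lookup callers expect an RCU-freed socket on return,
  	 * so reject sockets we hold a reference on.
  	 */
  	if (sk_is_refcounted(sk))
  		return -ESOCKTNOSUPPORT;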


Thank you for valuable comments. Will fix all of the above in v2.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type
  2020-05-06 12:55   ` Jakub Sitnicki
@ 2020-05-08 17:41     ` Andrii Nakryiko
  -1 siblings, 0 replies; 68+ messages in thread
From: Andrii Nakryiko @ 2020-05-08 17:41 UTC (permalink / raw)
  To: Jakub Sitnicki, Yonghong Song
  Cc: Networking, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski

On Wed, May 6, 2020 at 5:58 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Make libbpf aware of the newly added program type, and assign it a
> section name.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  tools/lib/bpf/libbpf.c        | 3 +++
>  tools/lib/bpf/libbpf.h        | 2 ++
>  tools/lib/bpf/libbpf.map      | 2 ++
>  tools/lib/bpf/libbpf_probes.c | 1 +
>  4 files changed, 8 insertions(+)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 977add1b73e2..74f4a15dc19e 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -6524,6 +6524,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
>  BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
>  BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
>  BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
> +BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
>
>  enum bpf_attach_type
>  bpf_program__get_expected_attach_type(struct bpf_program *prog)
> @@ -6684,6 +6685,8 @@ static const struct bpf_sec_def section_defs[] = {
>         BPF_EAPROG_SEC("cgroup/setsockopt",     BPF_PROG_TYPE_CGROUP_SOCKOPT,
>                                                 BPF_CGROUP_SETSOCKOPT),
>         BPF_PROG_SEC("struct_ops",              BPF_PROG_TYPE_STRUCT_OPS),
> +       BPF_EAPROG_SEC("sk_lookup",             BPF_PROG_TYPE_SK_LOOKUP,
> +                                               BPF_SK_LOOKUP),
>  };
>
>  #undef BPF_PROG_SEC_IMPL
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index f1dacecb1619..8373fbacbba3 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -337,6 +337,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
>  LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
>  LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
>  LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
> +LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
>
>  LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
>  LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
> @@ -364,6 +365,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
>  LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
>  LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
>  LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
> +LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);

cc Yonghong, bpf_iter programs should probably have similar
is_xxx/set_xxx functions?..

>
>  /*
>   * No need for __attribute__((packed)), all members of 'bpf_map_def'
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index e03bd4db827e..113ac0a669c2 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -253,6 +253,8 @@ LIBBPF_0.0.8 {
>                 bpf_program__set_attach_target;
>                 bpf_program__set_lsm;
>                 bpf_set_link_xdp_fd_opts;
> +               bpf_program__is_sk_lookup;
> +               bpf_program__set_sk_lookup;
>  } LIBBPF_0.0.7;
>

0.0.8 is sealed, please add them into 0.0.9 map below
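
That is, v2 would carry the two symbols in the 0.0.9 section instead
(a sketch; other entries already in LIBBPF_0.0.9 are omitted):

  LIBBPF_0.0.9 {
  	global:
  		bpf_program__is_sk_lookup;
  		bpf_program__set_sk_lookup;
  } LIBBPF_0.0.8;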

>  LIBBPF_0.0.9 {
> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> index 2c92059c0c90..5c6d3e49f254 100644
> --- a/tools/lib/bpf/libbpf_probes.c
> +++ b/tools/lib/bpf/libbpf_probes.c
> @@ -109,6 +109,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
>         case BPF_PROG_TYPE_STRUCT_OPS:
>         case BPF_PROG_TYPE_EXT:
>         case BPF_PROG_TYPE_LSM:
> +       case BPF_PROG_TYPE_SK_LOOKUP:
>         default:
>                 break;
>         }
> --
> 2.25.3
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type
  2020-05-06 12:55   ` Jakub Sitnicki
@ 2020-05-08 17:52       ` Yonghong Song
  -1 siblings, 0 replies; 68+ messages in thread
From: Yonghong Song @ 2020-05-08 17:52 UTC (permalink / raw)
  To: Andrii Nakryiko, Jakub Sitnicki
  Cc: Networking, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski



On 5/8/20 10:41 AM, Andrii Nakryiko wrote:
> On Wed, May 6, 2020 at 5:58 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Make libbpf aware of the newly added program type, and assign it a
>> section name.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>   tools/lib/bpf/libbpf.c        | 3 +++
>>   tools/lib/bpf/libbpf.h        | 2 ++
>>   tools/lib/bpf/libbpf.map      | 2 ++
>>   tools/lib/bpf/libbpf_probes.c | 1 +
>>   4 files changed, 8 insertions(+)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 977add1b73e2..74f4a15dc19e 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -6524,6 +6524,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
>>   BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
>>   BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
>>   BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
>> +BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
>>
>>   enum bpf_attach_type
>>   bpf_program__get_expected_attach_type(struct bpf_program *prog)
>> @@ -6684,6 +6685,8 @@ static const struct bpf_sec_def section_defs[] = {
>>          BPF_EAPROG_SEC("cgroup/setsockopt",     BPF_PROG_TYPE_CGROUP_SOCKOPT,
>>                                                  BPF_CGROUP_SETSOCKOPT),
>>          BPF_PROG_SEC("struct_ops",              BPF_PROG_TYPE_STRUCT_OPS),
>> +       BPF_EAPROG_SEC("sk_lookup",             BPF_PROG_TYPE_SK_LOOKUP,
>> +                                               BPF_SK_LOOKUP),
>>   };
>>
>>   #undef BPF_PROG_SEC_IMPL
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index f1dacecb1619..8373fbacbba3 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -337,6 +337,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
>>   LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
>>   LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
>>   LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
>> +LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
>>
>>   LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
>>   LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
>> @@ -364,6 +365,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
>>   LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
>>   LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
>>   LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
>> +LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);
> 
> cc Yonghong, bpf_iter programs should probably have similar
> is_xxx/set_xxx functions?..

Not sure about this. bpf_iter programs have prog type TRACING
which is covered by the above bpf_program__is_tracing.

> 
>>
>>   /*
>>    * No need for __attribute__((packed)), all members of 'bpf_map_def'
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index e03bd4db827e..113ac0a669c2 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -253,6 +253,8 @@ LIBBPF_0.0.8 {
>>                  bpf_program__set_attach_target;
>>                  bpf_program__set_lsm;
>>                  bpf_set_link_xdp_fd_opts;
>> +               bpf_program__is_sk_lookup;
>> +               bpf_program__set_sk_lookup;
>>   } LIBBPF_0.0.7;
>>
> 
> 0.0.8 is sealed, please add them into 0.0.9 map below
> 
>>   LIBBPF_0.0.9 {
>> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
>> index 2c92059c0c90..5c6d3e49f254 100644
>> --- a/tools/lib/bpf/libbpf_probes.c
>> +++ b/tools/lib/bpf/libbpf_probes.c
>> @@ -109,6 +109,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
>>          case BPF_PROG_TYPE_STRUCT_OPS:
>>          case BPF_PROG_TYPE_EXT:
>>          case BPF_PROG_TYPE_LSM:
>> +       case BPF_PROG_TYPE_SK_LOOKUP:
>>          default:
>>                  break;
>>          }
>> --
>> 2.25.3
>>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type
@ 2020-05-08 17:52       ` Yonghong Song
  0 siblings, 0 replies; 68+ messages in thread
From: Yonghong Song @ 2020-05-08 17:52 UTC (permalink / raw)
  To: dccp



On 5/8/20 10:41 AM, Andrii Nakryiko wrote:
> On Wed, May 6, 2020 at 5:58 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Make libbpf aware of the newly added program type, and assign it a
>> section name.
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>   tools/lib/bpf/libbpf.c        | 3 +++
>>   tools/lib/bpf/libbpf.h        | 2 ++
>>   tools/lib/bpf/libbpf.map      | 2 ++
>>   tools/lib/bpf/libbpf_probes.c | 1 +
>>   4 files changed, 8 insertions(+)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 977add1b73e2..74f4a15dc19e 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -6524,6 +6524,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
>>   BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
>>   BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
>>   BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
>> +BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
>>
>>   enum bpf_attach_type
>>   bpf_program__get_expected_attach_type(struct bpf_program *prog)
>> @@ -6684,6 +6685,8 @@ static const struct bpf_sec_def section_defs[] = {
>>          BPF_EAPROG_SEC("cgroup/setsockopt",     BPF_PROG_TYPE_CGROUP_SOCKOPT,
>>                                                  BPF_CGROUP_SETSOCKOPT),
>>          BPF_PROG_SEC("struct_ops",              BPF_PROG_TYPE_STRUCT_OPS),
>> +       BPF_EAPROG_SEC("sk_lookup",             BPF_PROG_TYPE_SK_LOOKUP,
>> +                                               BPF_SK_LOOKUP),
>>   };
>>
>>   #undef BPF_PROG_SEC_IMPL
>> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
>> index f1dacecb1619..8373fbacbba3 100644
>> --- a/tools/lib/bpf/libbpf.h
>> +++ b/tools/lib/bpf/libbpf.h
>> @@ -337,6 +337,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
>>   LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
>>   LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
>>   LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
>> +LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
>>
>>   LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
>>   LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
>> @@ -364,6 +365,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
>>   LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
>>   LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
>>   LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
>> +LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);
> 
> cc Yonghong, bpf_iter programs should probably have similar
> is_xxx/set_xxx functions?..

Not sure about this. bpf_iter programs have prog type TRACING
which is covered by the above bpf_program__is_tracing.

> 
>>
>>   /*
>>    * No need for __attribute__((packed)), all members of 'bpf_map_def'
>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index e03bd4db827e..113ac0a669c2 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -253,6 +253,8 @@ LIBBPF_0.0.8 {
>>                  bpf_program__set_attach_target;
>>                  bpf_program__set_lsm;
>>                  bpf_set_link_xdp_fd_opts;
>> +               bpf_program__is_sk_lookup;
>> +               bpf_program__set_sk_lookup;
>>   } LIBBPF_0.0.7;
>>
> 
> 0.0.8 is sealed, please add them into 0.0.9 map below
> 
>>   LIBBPF_0.0.9 {
>> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
>> index 2c92059c0c90..5c6d3e49f254 100644
>> --- a/tools/lib/bpf/libbpf_probes.c
>> +++ b/tools/lib/bpf/libbpf_probes.c
>> @@ -109,6 +109,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
>>          case BPF_PROG_TYPE_STRUCT_OPS:
>>          case BPF_PROG_TYPE_EXT:
>>          case BPF_PROG_TYPE_LSM:
>> +       case BPF_PROG_TYPE_SK_LOOKUP:
>>          default:
>>                  break;
>>          }
>> --
>> 2.25.3
>>

* Re: [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type
  2020-05-06 12:55   ` Jakub Sitnicki
@ 2020-05-08 17:59         ` Andrii Nakryiko
  0 siblings, 0 replies; 68+ messages in thread
From: Andrii Nakryiko @ 2020-05-08 17:59 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Jakub Sitnicki, Networking, bpf, dccp, kernel-team,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Eric Dumazet, Gerrit Renker, Jakub Kicinski

On Fri, May 8, 2020 at 10:52 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 5/8/20 10:41 AM, Andrii Nakryiko wrote:
> > On Wed, May 6, 2020 at 5:58 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> Make libbpf aware of the newly added program type, and assign it a
> >> section name.
> >>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> >>   tools/lib/bpf/libbpf.c        | 3 +++
> >>   tools/lib/bpf/libbpf.h        | 2 ++
> >>   tools/lib/bpf/libbpf.map      | 2 ++
> >>   tools/lib/bpf/libbpf_probes.c | 1 +
> >>   4 files changed, 8 insertions(+)
> >>
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index 977add1b73e2..74f4a15dc19e 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -6524,6 +6524,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
> >>   BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
> >>   BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
> >>   BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
> >> +BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
> >>
> >>   enum bpf_attach_type
> >>   bpf_program__get_expected_attach_type(struct bpf_program *prog)
> >> @@ -6684,6 +6685,8 @@ static const struct bpf_sec_def section_defs[] = {
> >>          BPF_EAPROG_SEC("cgroup/setsockopt",     BPF_PROG_TYPE_CGROUP_SOCKOPT,
> >>                                                  BPF_CGROUP_SETSOCKOPT),
> >>          BPF_PROG_SEC("struct_ops",              BPF_PROG_TYPE_STRUCT_OPS),
> >> +       BPF_EAPROG_SEC("sk_lookup",             BPF_PROG_TYPE_SK_LOOKUP,
> >> +                                               BPF_SK_LOOKUP),
> >>   };
> >>
> >>   #undef BPF_PROG_SEC_IMPL
> >> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> >> index f1dacecb1619..8373fbacbba3 100644
> >> --- a/tools/lib/bpf/libbpf.h
> >> +++ b/tools/lib/bpf/libbpf.h
> >> @@ -337,6 +337,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
> >>   LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
> >>   LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
> >>   LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
> >> +LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
> >>
> >>   LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
> >>   LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
> >> @@ -364,6 +365,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
> >>   LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
> >>   LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
> >>   LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
> >> +LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);
> >
> > cc Yonghong, bpf_iter programs should probably have similar
> > is_xxx/set_xxx functions?..
>
> Not sure about this. bpf_iter programs have prog type TRACING
> which is covered by the above bpf_program__is_tracing.

Ah, right, never mind then, sorry.

>
> >
> >>
> >>   /*
> >>    * No need for __attribute__((packed)), all members of 'bpf_map_def'
> >> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> >> index e03bd4db827e..113ac0a669c2 100644
> >> --- a/tools/lib/bpf/libbpf.map
> >> +++ b/tools/lib/bpf/libbpf.map
> >> @@ -253,6 +253,8 @@ LIBBPF_0.0.8 {
> >>                  bpf_program__set_attach_target;
> >>                  bpf_program__set_lsm;
> >>                  bpf_set_link_xdp_fd_opts;
> >> +               bpf_program__is_sk_lookup;
> >> +               bpf_program__set_sk_lookup;
> >>   } LIBBPF_0.0.7;
> >>
> >
> > 0.0.8 is sealed, please add them into 0.0.9 map below
> >
> >>   LIBBPF_0.0.9 {
> >> diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
> >> index 2c92059c0c90..5c6d3e49f254 100644
> >> --- a/tools/lib/bpf/libbpf_probes.c
> >> +++ b/tools/lib/bpf/libbpf_probes.c
> >> @@ -109,6 +109,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
> >>          case BPF_PROG_TYPE_STRUCT_OPS:
> >>          case BPF_PROG_TYPE_EXT:
> >>          case BPF_PROG_TYPE_LSM:
> >> +       case BPF_PROG_TYPE_SK_LOOKUP:
> >>          default:
> >>                  break;
> >>          }
> >> --
> >> 2.25.3
> >>

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-08 18:39         ` Martin KaFai Lau
  0 siblings, 0 replies; 68+ messages in thread
From: Martin KaFai Lau @ 2020-05-08 18:39 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote:
> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
> >> Add a new program type BPF_PROG_TYPE_SK_LOOKUP and a dedicated attach type
> >> called BPF_SK_LOOKUP. The new program kind is to be invoked by the
> >> transport layer when looking up a socket for a received packet.
> >>
> >> When called, SK_LOOKUP program can select a socket that will receive the
> >> packet. This serves as a mechanism to overcome the limits of what bind()
> >> API allows to express. Two use-cases driving this work are:
> >>
> >>  (1) steer packets destined to an IP range, fixed port to a socket
> >>
> >>      192.0.2.0/24, port 80 -> NGINX socket
> >>
> >>  (2) steer packets destined to an IP address, any port to a socket
> >>
> >>      198.51.100.1, any port -> L7 proxy socket
> >>
> >> In its run-time context, program receives information about the packet that
> >> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> >> address 4-tuple. Context can be further extended to include ingress
> >> interface identifier.
> >>
> >> To select a socket BPF program fetches it from a map holding socket
> >> references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
> >> helper to record the selection. Transport layer then uses the selected
> >> socket as a result of socket lookup.
> >>
> >> This patch only enables the user to attach an SK_LOOKUP program to a
> >> network namespace. Subsequent patches hook it up to run on local delivery
> >> path in ipv4 and ipv6 stacks.
> >>
> >> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> >> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> ---
> 

[...]

> >> +BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
> >> +	   struct sock *, sk, u64, flags)
> >> +{
> >> +	if (unlikely(flags != 0))
> >> +		return -EINVAL;
> >> +	if (unlikely(!sk_fullsock(sk)))
> > May be ARG_PTR_TO_SOCKET instead?
> 
> I had ARG_PTR_TO_SOCKET initially, then switched to SOCK_COMMON to match
> the TC bpf_sk_assign proto. Now that you point it out, it makes more
> sense to be more specific in the helper proto.
> 
> >
> >> +		return -ESOCKTNOSUPPORT;
> >> +
> >> +	/* Check if socket is suitable for packet L3/L4 protocol */
> >> +	if (sk->sk_protocol != ctx->protocol)
> >> +		return -EPROTOTYPE;
> >> +	if (sk->sk_family != ctx->family &&
> >> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
> >> +		return -EAFNOSUPPORT;
> >> +
> >> +	/* Select socket as lookup result */
> >> +	ctx->selected_sk = sk;
> > Could sk be a TCP_ESTABLISHED sk?
> 
> Yes, and what's worse, it could be ref-counted. This is a bug. I should
> be rejecting ref counted sockets here.
Agree. Ref-counted (i.e. checking whether the socket is RCU protected or
not) is the right check here.
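
Something along these lines, as an untested sketch on top of the current
helper (SOCK_RCU_FREE being the flag that marks RCU-freed sockets):

	/* Reject sockets we would have to take a reference on.
	 * Only RCU-freed sockets are safe to return as a lookup
	 * result here.
	 */
	if (unlikely(!sock_flag(sk, SOCK_RCU_FREE)))
		return -ESOCKTNOSUPPORT;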

An unrelated quick thought: it may still be fine for a TCP_ESTABLISHED
tcp_sk returned from sock_map, because of the
"call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop().
I was thinking more about the future, when this helper may be able to
take a sk that does not come from sock_map.

> 
> Callers of __inet_lookup_listener() and inet6_lookup_listener() expect
> an RCU-freed socket on return.
> 
> For UDP lookup, returning a TCP_ESTABLISHED (connected) socket is okay.
> 
> 
> Thank you for valuable comments. Will fix all of the above in v2.

* Re: [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type
  2020-05-06 12:55   ` Jakub Sitnicki
@ 2020-05-11  8:12       ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-11  8:12 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, Networking, bpf, dccp, kernel-team,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Eric Dumazet, Gerrit Renker, Jakub Kicinski

On Fri, May 08, 2020 at 07:41 PM CEST, Andrii Nakryiko wrote:
> On Wed, May 6, 2020 at 5:58 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:

[...]

>> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
>> index e03bd4db827e..113ac0a669c2 100644
>> --- a/tools/lib/bpf/libbpf.map
>> +++ b/tools/lib/bpf/libbpf.map
>> @@ -253,6 +253,8 @@ LIBBPF_0.0.8 {
>>                 bpf_program__set_attach_target;
>>                 bpf_program__set_lsm;
>>                 bpf_set_link_xdp_fd_opts;
>> +               bpf_program__is_sk_lookup;
>> +               bpf_program__set_sk_lookup;
>>  } LIBBPF_0.0.7;
>>
>
> 0.0.8 is sealed, please add them into 0.0.9 map below
>

Ah, thanks. I missed that between rebases. Will fix in v2.
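
That is, moving them into the 0.0.9 block, along these lines (assuming
the 0.0.9 block inherits from 0.0.8 by the time this lands):

LIBBPF_0.0.9 {
	global:
		bpf_program__is_sk_lookup;
		bpf_program__set_sk_lookup;
} LIBBPF_0.0.8;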

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-11  9:08           ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-11  9:08 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Fri, May 08, 2020 at 08:39 PM CEST, Martin KaFai Lau wrote:
> On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote:
>> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
>> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:

[...]

>> >> +		return -ESOCKTNOSUPPORT;
>> >> +
>> >> +	/* Check if socket is suitable for packet L3/L4 protocol */
>> >> +	if (sk->sk_protocol != ctx->protocol)
>> >> +		return -EPROTOTYPE;
>> >> +	if (sk->sk_family != ctx->family &&
>> >> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
>> >> +		return -EAFNOSUPPORT;
>> >> +
>> >> +	/* Select socket as lookup result */
>> >> +	ctx->selected_sk = sk;
>> > Could sk be a TCP_ESTABLISHED sk?
>>
>> Yes, and what's worse, it could be ref-counted. This is a bug. I should
>> be rejecting ref counted sockets here.
> Agree. ref-counted (i.e. checking rcu protected or not) is the right check
> here.
>
> An unrelated quick thought, it may still be fine for the
> TCP_ESTABLISHED tcp_sk returned from sock_map because of the
> "call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop().
> I was more thinking about in the future, what if this helper can take
> other sk not coming from sock_map.

I see, psock holds a sock reference and will not release it until a full
grace period has elapsed.

Even if holding a ref wasn't a problem, I'm not sure if returning a
TCP_ESTABLISHED socket wouldn't trip up callers of inet_lookup_listener
(tcp_v4_rcv and nf_tproxy_handle_time_wait4), which look for a listener
when processing a SYN directed at a TIME_WAIT socket.
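
If we go that way, it seems safer for the helper to reject anything on
the TCP side that is not a listener. A rough, untested sketch:

	/* TCP: only listening sockets can be selected */
	if (sk->sk_protocol == IPPROTO_TCP && sk->sk_state != TCP_LISTEN)
		return -ESOCKTNOSUPPORT;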

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-11 18:59             ` Martin KaFai Lau
  0 siblings, 0 replies; 68+ messages in thread
From: Martin KaFai Lau @ 2020-05-11 18:59 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Mon, May 11, 2020 at 11:08:15AM +0200, Jakub Sitnicki wrote:
> On Fri, May 08, 2020 at 08:39 PM CEST, Martin KaFai Lau wrote:
> > On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote:
> >> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
> >> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
> 
> [...]
> 
> >> >> +		return -ESOCKTNOSUPPORT;
> >> >> +
> >> >> +	/* Check if socket is suitable for packet L3/L4 protocol */
> >> >> +	if (sk->sk_protocol != ctx->protocol)
> >> >> +		return -EPROTOTYPE;
> >> >> +	if (sk->sk_family != ctx->family &&
> >> >> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
> >> >> +		return -EAFNOSUPPORT;
> >> >> +
> >> >> +	/* Select socket as lookup result */
> >> >> +	ctx->selected_sk = sk;
> >> > Could sk be a TCP_ESTABLISHED sk?
> >>
> >> Yes, and what's worse, it could be ref-counted. This is a bug. I should
> >> be rejecting ref counted sockets here.
> > Agree. ref-counted (i.e. checking rcu protected or not) is the right check
> > here.
> >
> > An unrelated quick thought, it may still be fine for the
> > TCP_ESTABLISHED tcp_sk returned from sock_map because of the
> > "call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop().
> > I was more thinking about in the future, what if this helper can take
> > other sk not coming from sock_map.
> 
> I see, psock holds a sock reference and will not release it until a full
> grace period has elapsed.
> 
> Even if holding a ref wasn't a problem, I'm not sure if returning a
> TCP_ESTABLISHED socket wouldn't trip up callers of inet_lookup_listener
> (tcp_v4_rcv and nf_tproxy_handle_time_wait4), that look for a listener
> when processing a SYN to TIME_WAIT socket.
Not suggesting the sk_assign helper has to support TCP_ESTABLISHED in TCP
if there is no use case for it.

Do you have a use case for supporting a TCP_ESTABLISHED sk in UDP?
From the cover letter use cases, it is not clear to me that it is
required.

Or should both only support an unconnected sk?

Regardless, these details will be useful in the helper's doc.

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-11 19:26               ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-11 19:26 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Mon, May 11, 2020 at 08:59 PM CEST, Martin KaFai Lau wrote:
> On Mon, May 11, 2020 at 11:08:15AM +0200, Jakub Sitnicki wrote:
>> On Fri, May 08, 2020 at 08:39 PM CEST, Martin KaFai Lau wrote:
>> > On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote:
>> >> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
>> >> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
>>
>> [...]
>>
>> >> >> +		return -ESOCKTNOSUPPORT;
>> >> >> +
>> >> >> +	/* Check if socket is suitable for packet L3/L4 protocol */
>> >> >> +	if (sk->sk_protocol != ctx->protocol)
>> >> >> +		return -EPROTOTYPE;
>> >> >> +	if (sk->sk_family != ctx->family &&
>> >> >> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
>> >> >> +		return -EAFNOSUPPORT;
>> >> >> +
>> >> >> +	/* Select socket as lookup result */
>> >> >> +	ctx->selected_sk = sk;
>> >> > Could sk be a TCP_ESTABLISHED sk?
>> >>
>> >> Yes, and what's worse, it could be ref-counted. This is a bug. I should
>> >> be rejecting ref counted sockets here.
>> > Agree. ref-counted (i.e. checking rcu protected or not) is the right check
>> > here.
>> >
>> > An unrelated quick thought, it may still be fine for the
>> > TCP_ESTABLISHED tcp_sk returned from sock_map because of the
>> > "call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop().
>> > I was more thinking about in the future, what if this helper can take
>> > other sk not coming from sock_map.
>>
>> I see, psock holds a sock reference and will not release it until a full
>> grace period has elapsed.
>>
>> Even if holding a ref wasn't a problem, I'm not sure if returning a
>> TCP_ESTABLISHED socket wouldn't trip up callers of inet_lookup_listener
>> (tcp_v4_rcv and nf_tproxy_handle_time_wait4), that look for a listener
>> when processing a SYN to TIME_WAIT socket.
> Not suggesting the sk_assign helper has to support TCP_ESTABLISHED in TCP
> if there is no use case for it.

Ack, I didn't think you were. Just explored the consequences.

> Do you have a use case on supporting TCP_ESTABLISHED sk in UDP?
> From the cover letter use cases, it is not clear to me it is
> required.
>
> or both should only support unconnected sk?

No, we don't have a use case for selecting a connected UDP socket.

I left it as a possibility because __udp[46]_lib_lookup, where BPF
sk_lookup is invoked from, can return one.

Perhaps the user would like to connect the selected receiving socket
(for instance to itself) to ensure it's not used for TX?

I've pulled this scenario out of the hat. Happy to limit bpf_sk_assign
to select only unconnected UDP sockets, if returning a connected one
doesn't make sense.

> Regardless, this details will be useful in the helper's doc.

I've reworded the helper doc in v2 to say:

        Description
                ...

                Only TCP listeners and UDP sockets, that is sockets
                which have *SOCK_RCU_FREE* flag set, can be selected.

                ...
        Return
                ...

                **-ESOCKTNOSUPPORT** if socket does not use RCU freeing.
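
FWIW, to show how I picture the helper being used from the BPF side,
here is a minimal sketch. The SOCKMAP map type, the context struct name,
and the release of the map-derived socket reference are assumptions at
this point, not final API:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("sk_lookup")
int select_sock(struct bpf_sk_lookup *ctx)
{
	struct bpf_sock *sk;
	__u32 key = 0;
	long err;

	sk = bpf_map_lookup_elem(&redir_map, &key);
	if (!sk)
		return BPF_OK;	/* no socket in map, regular lookup proceeds */

	/* Fails with -ESOCKTNOSUPPORT unless sk uses RCU freeing */
	err = bpf_sk_assign(ctx, sk, 0);
	bpf_sk_release(sk);

	return err ? BPF_OK : BPF_REDIRECT;
}

char _license[] SEC("license") = "GPL";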

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-11 20:54                 ` Martin KaFai Lau
  0 siblings, 0 replies; 68+ messages in thread
From: Martin KaFai Lau @ 2020-05-11 20:54 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Mon, May 11, 2020 at 09:26:02PM +0200, Jakub Sitnicki wrote:
> On Mon, May 11, 2020 at 08:59 PM CEST, Martin KaFai Lau wrote:
> > On Mon, May 11, 2020 at 11:08:15AM +0200, Jakub Sitnicki wrote:
> >> On Fri, May 08, 2020 at 08:39 PM CEST, Martin KaFai Lau wrote:
> >> > On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote:
> >> >> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
> >> >> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
> >>
> >> [...]
> >>
> >> >> >> +		return -ESOCKTNOSUPPORT;
> >> >> >> +
> >> >> >> +	/* Check if socket is suitable for packet L3/L4 protocol */
> >> >> >> +	if (sk->sk_protocol != ctx->protocol)
> >> >> >> +		return -EPROTOTYPE;
> >> >> >> +	if (sk->sk_family != ctx->family &&
> >> >> >> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
> >> >> >> +		return -EAFNOSUPPORT;
> >> >> >> +
> >> >> >> +	/* Select socket as lookup result */
> >> >> >> +	ctx->selected_sk = sk;
> >> >> > Could sk be a TCP_ESTABLISHED sk?
> >> >>
> >> >> Yes, and what's worse, it could be ref-counted. This is a bug. I should
> >> >> be rejecting ref counted sockets here.
> >> > Agree. ref-counted (i.e. checking rcu protected or not) is the right check
> >> > here.
> >> >
> >> > An unrelated quick thought, it may still be fine for the
> >> > TCP_ESTABLISHED tcp_sk returned from sock_map because of the
> >> > "call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop().
> >> > I was more thinking about in the future, what if this helper can take
> >> > other sk not coming from sock_map.
> >>
> >> I see, psock holds a sock reference and will not release it until a full
> >> grace period has elapsed.
> >>
> >> Even if holding a ref wasn't a problem, I'm not sure if returning a
> >> TCP_ESTABLISHED socket wouldn't trip up callers of inet_lookup_listener
> >> (tcp_v4_rcv and nf_tproxy_handle_time_wait4), that look for a listener
> >> when processing a SYN to TIME_WAIT socket.
> > Not suggesting the sk_assign helper has to support TCP_ESTABLISHED in TCP
> > if there is no use case for it.
> 
> Ack, I didn't think you were. Just explored the consequences.
> 
> > Do you have a use case on supporting TCP_ESTABLISHED sk in UDP?
> > From the cover letter use cases, it is not clear to me it is
> > required.
> >
> > or both should only support unconnected sk?
> 
> No, we don't have a use case for selecting a connected UDP socket.
> 
> I left it as a possiblity because __udp[46]_lib_lookup, where BPF
> sk_lookup is invoked from, can return one.
> 
> Perhaps the user would like to connect the selected receiving socket
> (for instance to itself) to ensure its not used for TX?
> 
> I've pulled this scenario out of the hat. Happy to limit bpf_sk_assign
> to select only unconnected UDP sockets, if returning a connected one
> doesn't make sense.
OTOH, my concern is:
TCP's SK_LOOKUP can override the kernel choice on TCP_LISTEN sk.
UDP's SK_LOOKUP can override the kernel choice on unconnected sk but
not the connected sk.

It could be quite confusing to a bpf user if a bpf_prog was written to
return both connected and unconnected UDP sk, logically expecting both
to take effect before the kernel's choice.

> 
> > Regardless, this details will be useful in the helper's doc.
> 
> I've reworded the helper doc in v2 to say:
> 
>         Description
>                 ...
> 
>                 Only TCP listeners and UDP sockets, that is sockets
>                 which have *SOCK_RCU_FREE* flag set, can be selected.
> 
>                 ...
>         Return
>                 ...
> 
>                 **-ESOCKTNOSUPPORT** if socket does not use RCU freeing.

* Re: [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-05-06 12:54   ` Jakub Sitnicki
@ 2020-05-12 14:16                   ` Jakub Sitnicki
  0 siblings, 0 replies; 68+ messages in thread
From: Jakub Sitnicki @ 2020-05-12 14:16 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: netdev, bpf, dccp, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Eric Dumazet, Gerrit Renker,
	Jakub Kicinski, Marek Majkowski, Lorenz Bauer

On Mon, May 11, 2020 at 10:54 PM CEST, Martin KaFai Lau wrote:
> On Mon, May 11, 2020 at 09:26:02PM +0200, Jakub Sitnicki wrote:
>> On Mon, May 11, 2020 at 08:59 PM CEST, Martin KaFai Lau wrote:
>> > On Mon, May 11, 2020 at 11:08:15AM +0200, Jakub Sitnicki wrote:
>> >> On Fri, May 08, 2020 at 08:39 PM CEST, Martin KaFai Lau wrote:
>> >> > On Fri, May 08, 2020 at 12:45:14PM +0200, Jakub Sitnicki wrote:
>> >> >> On Fri, May 08, 2020 at 09:06 AM CEST, Martin KaFai Lau wrote:
>> >> >> > On Wed, May 06, 2020 at 02:54:58PM +0200, Jakub Sitnicki wrote:
>> >>
>> >> [...]
>> >>
>> >> >> >> +		return -ESOCKTNOSUPPORT;
>> >> >> >> +
>> >> >> >> +	/* Check if socket is suitable for packet L3/L4 protocol */
>> >> >> >> +	if (sk->sk_protocol != ctx->protocol)
>> >> >> >> +		return -EPROTOTYPE;
>> >> >> >> +	if (sk->sk_family != ctx->family &&
>> >> >> >> +	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
>> >> >> >> +		return -EAFNOSUPPORT;
>> >> >> >> +
>> >> >> >> +	/* Select socket as lookup result */
>> >> >> >> +	ctx->selected_sk = sk;
>> >> >> > Could sk be a TCP_ESTABLISHED sk?
>> >> >>
>> >> >> Yes, and what's worse, it could be ref-counted. This is a bug. I should
>> >> >> be rejecting ref counted sockets here.
>> >> > Agree. ref-counted (i.e. checking rcu protected or not) is the right check
>> >> > here.
>> >> >
>> >> > An unrelated quick thought, it may still be fine for the
>> >> > TCP_ESTABLISHED tcp_sk returned from sock_map because of the
>> >> > "call_rcu(&psock->rcu, sk_psock_destroy);" in sk_psock_drop().
>> >> > I was more thinking about in the future, what if this helper can take
>> >> > other sk not coming from sock_map.
>> >>
>> >> I see, psock holds a sock reference and will not release it until a full
>> >> grace period has elapsed.
>> >>
>> >> Even if holding a ref wasn't a problem, I'm not sure if returning a
>> >> TCP_ESTABLISHED socket wouldn't trip up callers of inet_lookup_listener
>> >> (tcp_v4_rcv and nf_tproxy_handle_time_wait4), that look for a listener
>> >> when processing a SYN to TIME_WAIT socket.
>> > Not suggesting the sk_assign helper has to support TCP_ESTABLISHED in TCP
>> > if there is no use case for it.
>>
>> Ack, I didn't think you were. Just explored the consequences.
>>
>> > Do you have a use case on supporting TCP_ESTABLISHED sk in UDP?
>> > From the cover letter use cases, it is not clear to me it is
>> > required.
>> >
>> > or both should only support unconnected sk?
>>
>> No, we don't have a use case for selecting a connected UDP socket.
>>
>> I left it as a possiblity because __udp[46]_lib_lookup, where BPF
>> sk_lookup is invoked from, can return one.
>>
>> Perhaps the user would like to connect the selected receiving socket
>> (for instance to itself) to ensure its not used for TX?
>>
>> I've pulled this scenario out of the hat. Happy to limit bpf_sk_assign
>> to select only unconnected UDP sockets, if returning a connected one
>> doesn't make sense.
> OTOH, my concern is:
> TCP's SK_LOOKUP can override the kernel choice on TCP_LISTEN sk.
> UDP's SK_LOOKUP can override the kernel choice on unconnected sk but
> not the connected sk.
>
> It could be quite confusing to bpf user if a bpf_prog was written to return
> both connected and unconnected UDP sk and logically expect both
> will be done before the kernel's choice.
>

That's a fair point. I've been looking at this from the PoV of in-kernel
callers of udp socket lookup, which now seems wrong.

I agree it would be a surprising, if not confusing, UAPI. Will limit it to
just unconnected UDP in v3.
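
Concretely, I am thinking of extending the helper checks along these
lines (an untested sketch for v3; a connected UDP socket is not in
TCP_CLOSE state):

	/* UDP: only unconnected sockets can be selected */
	if (sk->sk_protocol == IPPROTO_UDP && sk->sk_state != TCP_CLOSE)
		return -ESOCKTNOSUPPORT;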

Thanks for raising the concern,
Jakub

Thread overview: 68+ messages (thread ended 2020-05-12 14:16 UTC)

2020-05-06 12:54 [PATCH bpf-next 00/17] Run a BPF program on socket lookup Jakub Sitnicki
2020-05-06 12:54 ` [PATCH bpf-next 01/17] flow_dissector: Extract attach/detach/query helpers Jakub Sitnicki
2020-05-06 12:54 ` [PATCH bpf-next 02/17] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-05-06 13:16   ` Lorenz Bauer
2020-05-06 13:53     ` Jakub Sitnicki
2020-05-07 20:55       ` Martin KaFai Lau
2020-05-08  8:54         ` Jakub Sitnicki
2020-05-08  7:06   ` Martin KaFai Lau
2020-05-08 10:45     ` Jakub Sitnicki
2020-05-08 18:39       ` Martin KaFai Lau
2020-05-11  9:08         ` Jakub Sitnicki
2020-05-11 18:59           ` Martin KaFai Lau
2020-05-11 19:26             ` Jakub Sitnicki
2020-05-11 20:54               ` Martin KaFai Lau
2020-05-12 14:16                 ` Jakub Sitnicki
2020-05-06 12:54 ` [PATCH bpf-next 03/17] inet: Store layer 4 protocol in inet_hashinfo Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 04/17] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 05/17] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 06/17] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 07/17] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 08/17] udp: Store layer 4 protocol in udp_table Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 09/17] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 10/17] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 11/17] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 12/17] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 13/17] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 14/17] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
2020-05-08 17:41   ` Andrii Nakryiko
2020-05-08 17:52     ` Yonghong Song
2020-05-08 17:59       ` Andrii Nakryiko
2020-05-11  8:12     ` Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 15/17] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 16/17] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
2020-05-06 12:55 ` [PATCH bpf-next 17/17] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
