bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v3 00/16] bpfilter
@ 2022-12-24  0:03 Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 01/16] bpfilter: add types for usermode helper Quentin Deslandes
                   ` (17 more replies)
  0 siblings, 18 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

The patchset is based on the patches from David S. Miller [1],
Daniel Borkmann [2], and Dmitrii Banshchikov [3].

Note: I've partially sent this patchset earlier due to a
mistake on my side, sorry for the noise.

The main goal of the patchset is to prepare bpfilter for
iptables' configuration blob parsing and code generation.

The patchset introduces data structures and code for matches,
targets, rules and tables. Besides that, code generation
is introduced.

The first version of the code generation supports only "inline"
mode - all chains and their rules emit instructions in a linear
fashion.

Things that are not implemented yet:
  1) The process of switching from the previous BPF programs to the
     new set isn't atomic.
  2) No support for device ifindex - it's hardcoded
  3) No helper subprog for counters update

Another problem is the use of iptables' blobs for tests and filter
table initialization. While it saves lines, something more
maintainable should be done here.

The plan for the next iteration:
  1) Add a helper program for counters update
  2) Handle ifindex

Patches 1/2 add definitions of the used types.
Patch 3 adds logging to bpfilter.
Patch 4 adds an associative map.
Patch 5 adds a runtime context structure.
Patches 6/7 add code generation infrastructure and TC code generator.
Patches 8/9/10/11/12 add code for matches, targets, rules and table.
Patch 13 adds code generation for table.
Patch 14 handles hooked setsockopt(2) calls.
Patch 15 adds the filter table.
Patch 16 uses prepared code in main().

Due to poor hardware availability on my side, I've not been able to
benchmark those changes. I plan to get some numbers for the next iteration.

The FORWARD filter chain is now supported; however, it's attached to
TC INGRESS along with the INPUT filter chain. This is due to XDP not
supporting multiple attached programs. I could generate a single program
out of both INPUT and FORWARD chains, but that would prevent another
BPF program from being attached to the interface anyway. If a solution
exists to attach both those programs to XDP while allowing for other
programs to be attached, it requires more investigation. In the meantime,
INPUT and FORWARD filtering is supported using TC.

Most of the code in this series was written by Dmitrii Banshchikov;
my changes are limited to v3. I've tried to reflect this fact in the
commits by adding 'Co-developed-by:' and 'Signed-off-by:' for Dmitrii,
please tell me if this was done the wrong way.

v2 -> v3
Chains:
  * Add support for FORWARD filter chain.
  * Add generation of BPF bytecode to assess whether a packet should be
    forwarded or not, using bpf_fib_lookup().
  * Allow for multiple programs to be attached to TC.
  * Allow for multiple TC hooks to be used.
Code generation:
  * Remove duplicated BPF bytecode generation.
  * Fix a bug regarding jump offset during generation.
  * Remove support for XDP from the series, as it's not currently
    used.
Table:
  * Add new filter_table_update_counters() virtual call. It updates
    the table's counter stored in the ipt_entry structure. This way,
    when iptables tries to fetch the values of the counters, bpfilter only
    has to copy the ipt_entry cached in the table structure.
Logging:
  * Refactor logging primitives.
Sockopts:
  * Add support for userspace counters querying.
Rule:
  * Store the rule's index inside struct rule, to ease counters'
    map usage.

v1 -> v2
Maps:
  * Use map_upsert instead of separate map_insert and map_update
Matches:
  * Add a new virtual call - gen_inline. The call is used for
    inline generating of a rule's match.
Targets:
  * Add a new virtual call - gen_inline. The call is used for inline
    generating of a rule's target.
Rules:
  * Add code generation for rules
Table:
  * Add struct table_ops
  * Add map for table_ops
  * Add filter table
  * Reorganize the way filter table is initialized
Sockopts:
  * Install/uninstall BPF programs while handling
    IPT_SO_SET_REPLACE
Code generation:
  * Add first version of the code generation
Dependencies:
  * Add libbpf

v0 -> v1
IO:
  * Use ssize_t in pvm_read, pvm_write for total_bytes
  * Move IO functions into sockopt.c and main.c
Logging:
  * Use LOGLEVEL_EMERG, LOGLEVEL_NOTICE, LOGLEVEL_DEBUG
    while logging to /dev/kmsg
  * Prepend log message with <n> where n is log level
  * Conditionally enable BFLOG_DEBUG messages
  * Merge bflog.{h,c} into context.h
Matches:
  * Reorder fields in struct match_ops for tight packing
  * Get rid of struct match_ops_map
  * Rename udp_match_ops to xt_udp
  * Use XT_ALIGN macro
  * Store payload size in match size
  * Move udp match routines into a separate file
Targets:
  * Reorder fields in struct target_ops for tight packing
  * Get rid of struct target_ops_map
  * Add comments for convert_verdict function
Rules:
  * Add validation
Tables:
  * Combine table_map and table_list into table_index
  * Add validation
Sockopts:
  * Handle IPT_SO_GET_REVISION_TARGET

1. https://lore.kernel.org/patchwork/patch/902785/
2. https://lore.kernel.org/patchwork/patch/902783/
3. https://kernel.ubuntu.com/~cking/stress-ng/stress-ng.pdf

Quentin Deslandes (16):
  bpfilter: add types for usermode helper
  tools: add bpfilter usermode helper header
  bpfilter: add logging facility
  bpfilter: add map container
  bpfilter: add runtime context
  bpfilter: add BPF bytecode generation infrastructure
  bpfilter: add support for TC bytecode generation
  bpfilter: add match structure
  bpfilter: add support for src/dst addr and ports
  bpfilter: add target structure
  bpfilter: add rule structure
  bpfilter: add table structure
  bpfilter: add table code generation
  bpfilter: add setsockopt() support
  bpfilter: add filter table
  bpfilter: handle setsockopt() calls

 include/uapi/linux/bpfilter.h                 |  154 +++
 net/bpfilter/Makefile                         |   16 +-
 net/bpfilter/codegen.c                        | 1040 +++++++++++++++++
 net/bpfilter/codegen.h                        |  183 +++
 net/bpfilter/context.c                        |  168 +++
 net/bpfilter/context.h                        |   24 +
 net/bpfilter/filter-table.c                   |  344 ++++++
 net/bpfilter/filter-table.h                   |   18 +
 net/bpfilter/logger.c                         |   52 +
 net/bpfilter/logger.h                         |   80 ++
 net/bpfilter/main.c                           |  132 ++-
 net/bpfilter/map-common.c                     |   51 +
 net/bpfilter/map-common.h                     |   19 +
 net/bpfilter/match.c                          |   55 +
 net/bpfilter/match.h                          |   37 +
 net/bpfilter/rule.c                           |  286 +++++
 net/bpfilter/rule.h                           |   37 +
 net/bpfilter/sockopt.c                        |  533 +++++++++
 net/bpfilter/sockopt.h                        |   15 +
 net/bpfilter/table.c                          |  391 +++++++
 net/bpfilter/table.h                          |   59 +
 net/bpfilter/target.c                         |  203 ++++
 net/bpfilter/target.h                         |   57 +
 net/bpfilter/xt_udp.c                         |  111 ++
 tools/include/uapi/linux/bpfilter.h           |  175 +++
 .../testing/selftests/bpf/bpfilter/.gitignore |    8 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   57 +
 .../selftests/bpf/bpfilter/bpfilter_util.h    |   80 ++
 .../selftests/bpf/bpfilter/test_codegen.c     |  338 ++++++
 .../testing/selftests/bpf/bpfilter/test_map.c |   63 +
 .../selftests/bpf/bpfilter/test_match.c       |   69 ++
 .../selftests/bpf/bpfilter/test_rule.c        |   56 +
 .../selftests/bpf/bpfilter/test_target.c      |   83 ++
 .../selftests/bpf/bpfilter/test_xt_udp.c      |   48 +
 34 files changed, 4999 insertions(+), 43 deletions(-)
 create mode 100644 net/bpfilter/codegen.c
 create mode 100644 net/bpfilter/codegen.h
 create mode 100644 net/bpfilter/context.c
 create mode 100644 net/bpfilter/context.h
 create mode 100644 net/bpfilter/filter-table.c
 create mode 100644 net/bpfilter/filter-table.h
 create mode 100644 net/bpfilter/logger.c
 create mode 100644 net/bpfilter/logger.h
 create mode 100644 net/bpfilter/map-common.c
 create mode 100644 net/bpfilter/map-common.h
 create mode 100644 net/bpfilter/match.c
 create mode 100644 net/bpfilter/match.h
 create mode 100644 net/bpfilter/rule.c
 create mode 100644 net/bpfilter/rule.h
 create mode 100644 net/bpfilter/sockopt.c
 create mode 100644 net/bpfilter/sockopt.h
 create mode 100644 net/bpfilter/table.c
 create mode 100644 net/bpfilter/table.h
 create mode 100644 net/bpfilter/target.c
 create mode 100644 net/bpfilter/target.h
 create mode 100644 net/bpfilter/xt_udp.c
 create mode 100644 tools/include/uapi/linux/bpfilter.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/.gitignore
 create mode 100644 tools/testing/selftests/bpf/bpfilter/Makefile
 create mode 100644 tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_codegen.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_map.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_match.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_rule.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_target.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_xt_udp.c

--
2.38.1


* [PATCH bpf-next v3 01/16] bpfilter: add types for usermode helper
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 02/16] tools: add bpfilter usermode helper header Quentin Deslandes
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)

Add the required definitions, which mirror the existing iptables ABI.
Those definitions are needed by the usermode helper.
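
As an illustration of how such ABI structures are typically consumed, the sketch below walks a blob of rule entries stored back to back, using the `next_offset` field to step from one entry to the next. It is only a sketch under stated assumptions: `struct demo_entry_hdr` and `demo_count_entries()` are hypothetical names, and the header is a reduced stand-in for `struct bpfilter_ipt_entry` that keeps just the two offset fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, reduced stand-in for struct bpfilter_ipt_entry: only the
 * two offset fields needed to walk a blob are kept.
 */
struct demo_entry_hdr {
	uint16_t target_offset;	/* offset of the target from the entry start */
	uint16_t next_offset;	/* size of the whole entry, i.e. offset of the next one */
};

/*
 * Count the entries stored back to back in @blob.
 * Returns the number of entries, or -1 if the blob is malformed.
 */
static int demo_count_entries(const unsigned char *blob, size_t size)
{
	size_t off = 0;
	int count = 0;

	while (off < size) {
		struct demo_entry_hdr hdr;

		/* The remaining bytes must hold at least a header. */
		if (size - off < sizeof(hdr))
			return -1;

		/* memcpy() avoids unaligned accesses into the byte buffer. */
		memcpy(&hdr, blob + off, sizeof(hdr));

		/* Reject offsets that are too small or run past the blob. */
		if (hdr.next_offset < sizeof(hdr) ||
		    hdr.target_offset < sizeof(hdr) ||
		    hdr.target_offset > hdr.next_offset ||
		    hdr.next_offset > size - off)
			return -1;

		off += hdr.next_offset;
		count++;
	}

	return count;
}
```

The bounds checks matter because the blob comes from userspace: every offset has to be validated against the remaining size before it is followed.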

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 include/uapi/linux/bpfilter.h | 154 ++++++++++++++++++++++++++++++++++
 1 file changed, 154 insertions(+)

diff --git a/include/uapi/linux/bpfilter.h b/include/uapi/linux/bpfilter.h
index cbc1f5813f50..295fd9caa3c8 100644
--- a/include/uapi/linux/bpfilter.h
+++ b/include/uapi/linux/bpfilter.h
@@ -3,6 +3,10 @@
 #define _UAPI_LINUX_BPFILTER_H
 
 #include <linux/if.h>
+#include <linux/const.h>
+
+#define BPFILTER_STANDARD_TARGET        ""
+#define BPFILTER_ERROR_TARGET           "ERROR"
 
 enum {
 	BPFILTER_IPT_SO_SET_REPLACE = 64,
@@ -18,4 +22,154 @@ enum {
 	BPFILTER_IPT_GET_MAX,
 };
 
+enum {
+	BPFILTER_XT_TABLE_MAXNAMELEN = 32,
+	BPFILTER_FUNCTION_MAXNAMELEN = 30,
+	BPFILTER_EXTENSION_MAXNAMELEN = 29,
+};
+
+enum {
+	BPFILTER_NF_DROP = 0,
+	BPFILTER_NF_ACCEPT = 1,
+	BPFILTER_NF_STOLEN = 2,
+	BPFILTER_NF_QUEUE = 3,
+	BPFILTER_NF_REPEAT = 4,
+	BPFILTER_NF_STOP = 5,
+	BPFILTER_NF_MAX_VERDICT = BPFILTER_NF_STOP,
+	BPFILTER_RETURN = (-BPFILTER_NF_REPEAT - 1),
+};
+
+enum {
+	BPFILTER_INET_HOOK_PRE_ROUTING = 0,
+	BPFILTER_INET_HOOK_LOCAL_IN = 1,
+	BPFILTER_INET_HOOK_FORWARD = 2,
+	BPFILTER_INET_HOOK_LOCAL_OUT = 3,
+	BPFILTER_INET_HOOK_POST_ROUTING = 4,
+	BPFILTER_INET_HOOK_MAX,
+};
+
+enum {
+	BPFILTER_IPT_F_MASK = 0x03,
+	BPFILTER_IPT_INV_MASK = 0x7f
+};
+
+struct bpfilter_ipt_match {
+	union {
+		struct {
+			__u16 match_size;
+			char name[BPFILTER_EXTENSION_MAXNAMELEN];
+			__u8 revision;
+		} user;
+		struct {
+			__u16 match_size;
+			void *match;
+		} kernel;
+		__u16 match_size;
+	} u;
+	unsigned char data[];
+};
+
+struct bpfilter_ipt_target {
+	union {
+		struct {
+			__u16 target_size;
+			char name[BPFILTER_EXTENSION_MAXNAMELEN];
+			__u8 revision;
+		} user;
+		struct {
+			__u16 target_size;
+			void *target;
+		} kernel;
+		__u16 target_size;
+	} u;
+	unsigned char data[];
+};
+
+struct bpfilter_ipt_standard_target {
+	struct bpfilter_ipt_target target;
+	int verdict;
+};
+
+struct bpfilter_ipt_error_target {
+	struct bpfilter_ipt_target target;
+	char error_name[BPFILTER_FUNCTION_MAXNAMELEN];
+};
+
+struct bpfilter_ipt_get_info {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 valid_hooks;
+	__u32 hook_entry[BPFILTER_INET_HOOK_MAX];
+	__u32 underflow[BPFILTER_INET_HOOK_MAX];
+	__u32 num_entries;
+	__u32 size;
+};
+
+struct bpfilter_ipt_counters {
+	__u64 packet_cnt;
+	__u64 byte_cnt;
+};
+
+struct bpfilter_ipt_counters_info {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 num_counters;
+	struct bpfilter_ipt_counters counters[];
+};
+
+struct bpfilter_ipt_get_revision {
+	char name[BPFILTER_EXTENSION_MAXNAMELEN];
+	__u8 revision;
+};
+
+struct bpfilter_ipt_ip {
+	__u32 src;
+	__u32 dst;
+	__u32 src_mask;
+	__u32 dst_mask;
+	char in_iface[IFNAMSIZ];
+	char out_iface[IFNAMSIZ];
+	__u8 in_iface_mask[IFNAMSIZ];
+	__u8 out_iface_mask[IFNAMSIZ];
+	__u16 protocol;
+	__u8 flags;
+	__u8 invflags;
+};
+
+struct bpfilter_ipt_entry {
+	struct bpfilter_ipt_ip ip;
+	__u32 bfcache;
+	__u16 target_offset;
+	__u16 next_offset;
+	__u32 comefrom;
+	struct bpfilter_ipt_counters counters;
+	__u8 elems[];
+};
+
+struct bpfilter_ipt_standard_entry {
+	struct bpfilter_ipt_entry entry;
+	struct bpfilter_ipt_standard_target target;
+};
+
+struct bpfilter_ipt_error_entry {
+	struct bpfilter_ipt_entry entry;
+	struct bpfilter_ipt_error_target target;
+};
+
+struct bpfilter_ipt_get_entries {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 size;
+	struct bpfilter_ipt_entry entries[];
+};
+
+struct bpfilter_ipt_replace {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 valid_hooks;
+	__u32 num_entries;
+	__u32 size;
+	__u32 hook_entry[BPFILTER_INET_HOOK_MAX];
+	__u32 underflow[BPFILTER_INET_HOOK_MAX];
+	__u32 num_counters;
+	struct bpfilter_ipt_counters *cntrs;
+	struct bpfilter_ipt_entry entries[];
+};
+
 #endif /* _UAPI_LINUX_BPFILTER_H */
-- 
2.38.1



* [PATCH bpf-next v3 02/16] tools: add bpfilter usermode helper header
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 01/16] bpfilter: add types for usermode helper Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 03/16] bpfilter: add logging facility Quentin Deslandes
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)

Add a header containing bpfilter structure definitions, for test
purposes.
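
For readers unfamiliar with the verdict constants in this header, the sketch below shows how a standard target's `verdict` field is conventionally interpreted in the iptables ABI: non-negative values are jump offsets, `-NF_REPEAT - 1` means RETURN, and other negative values encode a netfilter verdict as `-verdict - 1`. The `DEMO_*` names and `demo_classify_verdict()` are hypothetical and exist only for this illustration; the numeric values are copied from the enum in the header.

```c
#include <stddef.h>

/* Values copied from the BPFILTER_NF_* enum; the DEMO_ names are hypothetical. */
enum {
	DEMO_NF_DROP = 0,
	DEMO_NF_ACCEPT = 1,
	DEMO_NF_REPEAT = 4,
	DEMO_NF_MAX_VERDICT = 5,
	DEMO_RETURN = (-DEMO_NF_REPEAT - 1),
};

enum demo_verdict_kind {
	DEMO_KIND_JUMP,		/* verdict is an offset to jump to */
	DEMO_KIND_RETURN,	/* return to the calling chain */
	DEMO_KIND_FINAL,	/* verdict encodes a netfilter verdict */
	DEMO_KIND_INVALID,
};

/*
 * Classify a standard target verdict. For DEMO_KIND_FINAL, the decoded
 * netfilter verdict is stored in @nf_verdict (if not NULL).
 */
static enum demo_verdict_kind demo_classify_verdict(int verdict, int *nf_verdict)
{
	int decoded;

	if (verdict >= 0)
		return DEMO_KIND_JUMP;

	if (verdict == DEMO_RETURN)
		return DEMO_KIND_RETURN;

	/* -(verdict + 1) equals -verdict - 1 without overflowing on INT_MIN. */
	decoded = -(verdict + 1);
	if (decoded > DEMO_NF_MAX_VERDICT)
		return DEMO_KIND_INVALID;

	if (nf_verdict)
		*nf_verdict = decoded;

	return DEMO_KIND_FINAL;
}
```

With this encoding, a verdict of -1 decodes to DROP and -2 decodes to ACCEPT.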

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 tools/include/uapi/linux/bpfilter.h | 175 ++++++++++++++++++++++++++++
 1 file changed, 175 insertions(+)
 create mode 100644 tools/include/uapi/linux/bpfilter.h

diff --git a/tools/include/uapi/linux/bpfilter.h b/tools/include/uapi/linux/bpfilter.h
new file mode 100644
index 000000000000..295fd9caa3c8
--- /dev/null
+++ b/tools/include/uapi/linux/bpfilter.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_BPFILTER_H
+#define _UAPI_LINUX_BPFILTER_H
+
+#include <linux/if.h>
+#include <linux/const.h>
+
+#define BPFILTER_STANDARD_TARGET        ""
+#define BPFILTER_ERROR_TARGET           "ERROR"
+
+enum {
+	BPFILTER_IPT_SO_SET_REPLACE = 64,
+	BPFILTER_IPT_SO_SET_ADD_COUNTERS = 65,
+	BPFILTER_IPT_SET_MAX,
+};
+
+enum {
+	BPFILTER_IPT_SO_GET_INFO = 64,
+	BPFILTER_IPT_SO_GET_ENTRIES = 65,
+	BPFILTER_IPT_SO_GET_REVISION_MATCH = 66,
+	BPFILTER_IPT_SO_GET_REVISION_TARGET = 67,
+	BPFILTER_IPT_GET_MAX,
+};
+
+enum {
+	BPFILTER_XT_TABLE_MAXNAMELEN = 32,
+	BPFILTER_FUNCTION_MAXNAMELEN = 30,
+	BPFILTER_EXTENSION_MAXNAMELEN = 29,
+};
+
+enum {
+	BPFILTER_NF_DROP = 0,
+	BPFILTER_NF_ACCEPT = 1,
+	BPFILTER_NF_STOLEN = 2,
+	BPFILTER_NF_QUEUE = 3,
+	BPFILTER_NF_REPEAT = 4,
+	BPFILTER_NF_STOP = 5,
+	BPFILTER_NF_MAX_VERDICT = BPFILTER_NF_STOP,
+	BPFILTER_RETURN = (-BPFILTER_NF_REPEAT - 1),
+};
+
+enum {
+	BPFILTER_INET_HOOK_PRE_ROUTING = 0,
+	BPFILTER_INET_HOOK_LOCAL_IN = 1,
+	BPFILTER_INET_HOOK_FORWARD = 2,
+	BPFILTER_INET_HOOK_LOCAL_OUT = 3,
+	BPFILTER_INET_HOOK_POST_ROUTING = 4,
+	BPFILTER_INET_HOOK_MAX,
+};
+
+enum {
+	BPFILTER_IPT_F_MASK = 0x03,
+	BPFILTER_IPT_INV_MASK = 0x7f
+};
+
+struct bpfilter_ipt_match {
+	union {
+		struct {
+			__u16 match_size;
+			char name[BPFILTER_EXTENSION_MAXNAMELEN];
+			__u8 revision;
+		} user;
+		struct {
+			__u16 match_size;
+			void *match;
+		} kernel;
+		__u16 match_size;
+	} u;
+	unsigned char data[];
+};
+
+struct bpfilter_ipt_target {
+	union {
+		struct {
+			__u16 target_size;
+			char name[BPFILTER_EXTENSION_MAXNAMELEN];
+			__u8 revision;
+		} user;
+		struct {
+			__u16 target_size;
+			void *target;
+		} kernel;
+		__u16 target_size;
+	} u;
+	unsigned char data[];
+};
+
+struct bpfilter_ipt_standard_target {
+	struct bpfilter_ipt_target target;
+	int verdict;
+};
+
+struct bpfilter_ipt_error_target {
+	struct bpfilter_ipt_target target;
+	char error_name[BPFILTER_FUNCTION_MAXNAMELEN];
+};
+
+struct bpfilter_ipt_get_info {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 valid_hooks;
+	__u32 hook_entry[BPFILTER_INET_HOOK_MAX];
+	__u32 underflow[BPFILTER_INET_HOOK_MAX];
+	__u32 num_entries;
+	__u32 size;
+};
+
+struct bpfilter_ipt_counters {
+	__u64 packet_cnt;
+	__u64 byte_cnt;
+};
+
+struct bpfilter_ipt_counters_info {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 num_counters;
+	struct bpfilter_ipt_counters counters[];
+};
+
+struct bpfilter_ipt_get_revision {
+	char name[BPFILTER_EXTENSION_MAXNAMELEN];
+	__u8 revision;
+};
+
+struct bpfilter_ipt_ip {
+	__u32 src;
+	__u32 dst;
+	__u32 src_mask;
+	__u32 dst_mask;
+	char in_iface[IFNAMSIZ];
+	char out_iface[IFNAMSIZ];
+	__u8 in_iface_mask[IFNAMSIZ];
+	__u8 out_iface_mask[IFNAMSIZ];
+	__u16 protocol;
+	__u8 flags;
+	__u8 invflags;
+};
+
+struct bpfilter_ipt_entry {
+	struct bpfilter_ipt_ip ip;
+	__u32 bfcache;
+	__u16 target_offset;
+	__u16 next_offset;
+	__u32 comefrom;
+	struct bpfilter_ipt_counters counters;
+	__u8 elems[];
+};
+
+struct bpfilter_ipt_standard_entry {
+	struct bpfilter_ipt_entry entry;
+	struct bpfilter_ipt_standard_target target;
+};
+
+struct bpfilter_ipt_error_entry {
+	struct bpfilter_ipt_entry entry;
+	struct bpfilter_ipt_error_target target;
+};
+
+struct bpfilter_ipt_get_entries {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 size;
+	struct bpfilter_ipt_entry entries[];
+};
+
+struct bpfilter_ipt_replace {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	__u32 valid_hooks;
+	__u32 num_entries;
+	__u32 size;
+	__u32 hook_entry[BPFILTER_INET_HOOK_MAX];
+	__u32 underflow[BPFILTER_INET_HOOK_MAX];
+	__u32 num_counters;
+	struct bpfilter_ipt_counters *cntrs;
+	struct bpfilter_ipt_entry entries[];
+};
+
+#endif /* _UAPI_LINUX_BPFILTER_H */
-- 
2.38.1



* [PATCH bpf-next v3 03/16] bpfilter: add logging facility
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 01/16] bpfilter: add types for usermode helper Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 02/16] tools: add bpfilter usermode helper header Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 04/16] bpfilter: add map container Quentin Deslandes
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)

bpfilter will log to /dev/kmsg by default. Four different log levels are
available. BFLOG_EMERG() will exit the usermode helper after logging.
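
To make the message format concrete, here is a minimal sketch of a function that writes one line prefixed with `<n>`, where n is the syslog priority, which is the prefix /dev/kmsg uses to pick the log level. `demo_log()` is a hypothetical helper written for this illustration; unlike the macros in this patch, it never calls exit().

```c
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/*
 * Hypothetical helper: write "<level>bpfilter: message\n" to @out.
 * Returns the number of bytes written, 0 if @out is NULL, or a negative
 * value on error.
 */
static int demo_log(FILE *out, int level, const char *fmt, ...)
{
	va_list args;
	int prefix_len;
	int msg_len;

	/* Like the patch, silently discard messages when no file is set. */
	if (!out)
		return 0;

	prefix_len = fprintf(out, "<%d>bpfilter: ", level);
	if (prefix_len < 0)
		return prefix_len;

	va_start(args, fmt);
	msg_len = vfprintf(out, fmt, args);
	va_end(args);
	if (msg_len < 0)
		return msg_len;

	if (fputc('\n', out) == EOF)
		return -1;

	return prefix_len + msg_len + 1;
}
```

A call such as `demo_log(file, LOG_KERN | LOG_ERR, "code %d", 42)` produces a line starting with `<3>`, since LOG_KERN is 0 and LOG_ERR is 3.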

Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile |  2 +-
 net/bpfilter/logger.c | 52 ++++++++++++++++++++++++++++
 net/bpfilter/logger.h | 80 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 net/bpfilter/logger.c
 create mode 100644 net/bpfilter/logger.h

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index cdac82b8c53a..8d9c726ba1a5 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -4,7 +4,7 @@
 #
 
 userprogs := bpfilter_umh
-bpfilter_umh-objs := main.o
+bpfilter_umh-objs := main.o logger.o
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
 ifeq ($(CONFIG_BPFILTER_UMH), y)
diff --git a/net/bpfilter/logger.c b/net/bpfilter/logger.c
new file mode 100644
index 000000000000..c256bfef7e6c
--- /dev/null
+++ b/net/bpfilter/logger.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#include "logger.h"
+
+#include <errno.h>
+
+static const char *log_file_path = "/dev/kmsg";
+static FILE *log_file;
+
+int logger_init(void)
+{
+	if (log_file)
+		return 0;
+
+	log_file = fopen(log_file_path, "w");
+	if (!log_file)
+		return -errno;
+
+	if (setvbuf(log_file, 0, _IOLBF, 0))
+		return -errno;
+
+	return 0;
+}
+
+void logger_set_file(FILE *file)
+{
+	log_file = file;
+}
+
+FILE *logger_get_file(void)
+{
+	return log_file;
+}
+
+int logger_clean(void)
+{
+	int r;
+
+	if (!log_file)
+		return 0;
+
+	r = fclose(log_file);
+	if (r == EOF)
+		return -errno;
+
+	log_file = NULL;
+
+	return 0;
+}
diff --git a/net/bpfilter/logger.h b/net/bpfilter/logger.h
new file mode 100644
index 000000000000..c44739ec0069
--- /dev/null
+++ b/net/bpfilter/logger.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_LOGGER_H
+#define NET_BPFILTER_LOGGER_H
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syslog.h>
+
+#define _BFLOG_IMPL(level, fmt, ...)					      \
+	do {								      \
+		typeof(level) __level = level;				      \
+		if (logger_get_file()) {				      \
+			fprintf(logger_get_file(), "<%d>bpfilter: " fmt "\n", \
+				(__level), ##__VA_ARGS__);		      \
+		}							      \
+		if ((__level) == LOG_EMERG)				      \
+			exit(EXIT_FAILURE);				      \
+	} while (0)
+
+#define BFLOG_EMERG(fmt, ...) \
+	_BFLOG_IMPL(LOG_KERN | LOG_EMERG, fmt, ##__VA_ARGS__)
+#define BFLOG_ERR(fmt, ...) \
+	_BFLOG_IMPL(LOG_KERN | LOG_ERR, fmt, ##__VA_ARGS__)
+#define BFLOG_NOTICE(fmt, ...) \
+	_BFLOG_IMPL(LOG_KERN | LOG_NOTICE, fmt, ##__VA_ARGS__)
+
+#ifdef DEBUG
+#define BFLOG_DBG(fmt, ...) _BFLOG_IMPL(LOG_KERN | LOG_DEBUG, fmt, ##__VA_ARGS__)
+#else
+#define BFLOG_DBG(fmt, ...)
+#endif
+
+#define STRERR(v) strerror(abs(v))
+
+/**
+ * logger_init() - Initialise logging facility.
+ *
+ * This function is used to open a file to write logs to (see @log_file_path).
+ * It must be called before using any logging macro, otherwise log messages
+ * will be discarded.
+ *
+ * Return: 0 on success, negative errno value on error.
+ */
+int logger_init(void);
+
+/**
+ * logger_set_file() - Set the FILE pointer to use to log messages.
+ * @file: new FILE * to the log file.
+ *
+ * This function won't check whether the FILE pointer is valid, nor whether
+ * a file is already opened, this is the responsibility of the caller. Once
+ * logger_set_file() returns, all new log messages will be printed to the
+ * FILE * provided.
+ */
+void logger_set_file(FILE *file);
+
+/**
+ * logger_get_file() - Returns a FILE * pointer to the log file.
+ *
+ * Return: pointer to the file to log to (as a FILE *), or NULL if the file
+ *	is not valid.
+ */
+FILE *logger_get_file(void);
+
+/**
+ * logger_clean() - Close the log file.
+ *
+ * On success, the log file pointer will be NULL. If the function fails,
+ * the log file pointer remains unchanged and the file should be considered open.
+ *
+ * Return: 0 on success, negative errno value on error.
+ */
+int logger_clean(void);
+
+#endif // NET_BPFILTER_LOGGER_H
-- 
2.38.1



* [PATCH bpf-next v3 04/16] bpfilter: add map container
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (2 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 03/16] bpfilter: add logging facility Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 05/16] bpfilter: add runtime context Quentin Deslandes
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)

Introduce common code for an associative container. This common code
will be used for maps of matches, targets, and tables. Hash search
tables from libc are used as an index.

The supported set of operations is: create, find, upsert, free.
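
The sketch below illustrates why an upsert on top of glibc's reentrant hash table needs an explicit overwrite: hsearch_r() with ENTER returns the existing entry untouched when the key is already present. The `demo_map_*` names are hypothetical, and the find helper returns NULL on a miss, which is an assumption made to keep the example free of kernel headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for hcreate_r(), hsearch_r() and hdestroy_r() */
#endif

#include <search.h>
#include <stddef.h>
#include <string.h>

/* Create a table able to hold about @nelem entries; 0 on success, -1 on error. */
static int demo_map_create(struct hsearch_data *htab, size_t nelem)
{
	/* hcreate_r() requires the structure to be zeroed first. */
	memset(htab, 0, sizeof(*htab));
	return hcreate_r(nelem, htab) ? 0 : -1;
}

/* Return the value stored for @key, or NULL if the key is absent. */
static void *demo_map_find(struct hsearch_data *htab, const char *key)
{
	ENTRY needle = { .key = (char *)key };
	ENTRY *found;

	if (!hsearch_r(needle, FIND, &found, htab))
		return NULL;

	return found->data;
}

/* Insert @key or replace its value; 0 on success, -1 on error. */
static int demo_map_upsert(struct hsearch_data *htab, const char *key, void *value)
{
	ENTRY needle = { .key = (char *)key, .data = value };
	ENTRY *found;

	if (!hsearch_r(needle, ENTER, &found, htab))
		return -1;

	/* ENTER keeps an existing entry as is, so overwrite it explicitly. */
	found->data = value;

	return 0;
}
```

Note that the table stores the key pointer and does not copy the string, so keys must outlive the map.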

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile                         |  2 +-
 net/bpfilter/map-common.c                     | 51 +++++++++++++++
 net/bpfilter/map-common.h                     | 19 ++++++
 .../testing/selftests/bpf/bpfilter/.gitignore |  2 +
 tools/testing/selftests/bpf/bpfilter/Makefile | 19 ++++++
 .../testing/selftests/bpf/bpfilter/test_map.c | 63 +++++++++++++++++++
 6 files changed, 155 insertions(+), 1 deletion(-)
 create mode 100644 net/bpfilter/map-common.c
 create mode 100644 net/bpfilter/map-common.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/.gitignore
 create mode 100644 tools/testing/selftests/bpf/bpfilter/Makefile
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_map.c

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 8d9c726ba1a5..1b0c399c19df 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -4,7 +4,7 @@
 #
 
 userprogs := bpfilter_umh
-bpfilter_umh-objs := main.o logger.o
+bpfilter_umh-objs := main.o logger.o map-common.o
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
 ifeq ($(CONFIG_BPFILTER_UMH), y)
diff --git a/net/bpfilter/map-common.c b/net/bpfilter/map-common.c
new file mode 100644
index 000000000000..cc6c3a59b315
--- /dev/null
+++ b/net/bpfilter/map-common.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#include "map-common.h"
+
+#include <linux/err.h>
+
+#include <errno.h>
+#include <string.h>
+
+int create_map(struct hsearch_data *htab, size_t nelem)
+{
+	memset(htab, 0, sizeof(*htab));
+	if (!hcreate_r(nelem, htab))
+		return -errno;
+
+	return 0;
+}
+
+void *map_find(struct hsearch_data *htab, const char *key)
+{
+	const ENTRY needle = { .key = (char *)key };
+	ENTRY *found;
+
+	if (!hsearch_r(needle, FIND, &found, htab))
+		return ERR_PTR(-ENOENT);
+
+	return found->data;
+}
+
+int map_upsert(struct hsearch_data *htab, const char *key, void *value)
+{
+	const ENTRY needle = { .key = (char *)key, .data = value };
+	ENTRY *found;
+
+	if (!hsearch_r(needle, ENTER, &found, htab))
+		return -errno;
+
+	found->key = (char *)key;
+	found->data = value;
+
+	return 0;
+}
+
+void free_map(struct hsearch_data *htab)
+{
+	hdestroy_r(htab);
+}
diff --git a/net/bpfilter/map-common.h b/net/bpfilter/map-common.h
new file mode 100644
index 000000000000..666a4ffe9b29
--- /dev/null
+++ b/net/bpfilter/map-common.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_MAP_COMMON_H
+#define NET_BPFILTER_MAP_COMMON_H
+
+#define _GNU_SOURCE
+
+#include <search.h>
+
+int create_map(struct hsearch_data *htab, size_t nelem);
+void *map_find(struct hsearch_data *htab, const char *key);
+int map_upsert(struct hsearch_data *htab, const char *key, void *value);
+void free_map(struct hsearch_data *htab);
+
+#endif // NET_BPFILTER_MAP_COMMON_H
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
new file mode 100644
index 000000000000..983fd06cbefa
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+test_map
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
new file mode 100644
index 000000000000..c262aad8c2a4
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -0,0 +1,19 @@
+# SPDX-License-Identifier: GPL-2.0
+
+top_srcdir = ../../../../..
+TOOLSDIR := $(abspath ../../../../)
+TOOLSINCDIR := $(TOOLSDIR)/include
+APIDIR := $(TOOLSINCDIR)/uapi
+BPFILTERSRCDIR := $(top_srcdir)/net/bpfilter
+
+CFLAGS += -Wall -g -pthread -I$(TOOLSINCDIR) -I$(APIDIR) -I$(BPFILTERSRCDIR)
+
+TEST_GEN_PROGS += test_map
+
+KSFT_KHDR_INSTALL := 1
+
+include ../../lib.mk
+
+BPFILTER_MAP_SRCS := $(BPFILTERSRCDIR)/map-common.c
+
+$(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
diff --git a/tools/testing/selftests/bpf/bpfilter/test_map.c b/tools/testing/selftests/bpf/bpfilter/test_map.c
new file mode 100644
index 000000000000..7ed737b78816
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/test_map.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "map-common.h"
+
+#include <linux/err.h>
+
+#include "../../kselftest_harness.h"
+
+FIXTURE(test_map)
+{
+	struct hsearch_data map;
+	const char *key;
+	void *expected;
+	void *actual;
+};
+
+FIXTURE_SETUP(test_map)
+{
+	const int max_nelements = 100;
+
+	create_map(&self->map, max_nelements);
+	self->key = "key";
+	self->expected = "expected";
+	self->actual = "actual";
+}
+
+FIXTURE_TEARDOWN(test_map)
+{
+	free_map(&self->map);
+}
+
+TEST_F(test_map, upsert_and_find)
+{
+	void *found;
+
+	found = map_find(&self->map, self->key);
+	ASSERT_TRUE(IS_ERR(found));
+	ASSERT_EQ(-ENOENT, PTR_ERR(found));
+
+	ASSERT_EQ(0, map_upsert(&self->map, self->key, self->expected));
+	ASSERT_EQ(0, map_upsert(&self->map, self->key, self->expected));
+	ASSERT_EQ(0, map_upsert(&self->map, self->key, self->actual));
+
+	found = map_find(&self->map, self->key);
+
+	ASSERT_FALSE(IS_ERR(found));
+	ASSERT_STREQ(self->actual, found);
+}
+
+TEST_F(test_map, update)
+{
+	void *found;
+
+	ASSERT_EQ(0, map_upsert(&self->map, self->key, self->actual));
+	ASSERT_EQ(0, map_upsert(&self->map, self->key, self->expected));
+
+	found = map_find(&self->map, self->key);
+
+	ASSERT_FALSE(IS_ERR(found));
+	ASSERT_STREQ(self->expected, found);
+}
+
+TEST_HARNESS_MAIN
-- 
2.38.1



* [PATCH bpf-next v3 05/16] bpfilter: add runtime context
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (3 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 04/16] bpfilter: add map container Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 06/16] bpfilter: add BPF bytecode generation infrastructure Quentin Deslandes
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Create struct context to store bpfilter's runtime context. Eventually,
this structure will contain the maps/tables containing ops structures
for matches, targets, tables...

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile  |  1 +
 net/bpfilter/context.c | 18 ++++++++++++++++++
 net/bpfilter/context.h | 16 ++++++++++++++++
 3 files changed, 35 insertions(+)
 create mode 100644 net/bpfilter/context.c
 create mode 100644 net/bpfilter/context.h

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 1b0c399c19df..9878f5fd8152 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -5,6 +5,7 @@
 
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
+bpfilter_umh-objs += context.o
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
 ifeq ($(CONFIG_BPFILTER_UMH), y)
diff --git a/net/bpfilter/context.c b/net/bpfilter/context.c
new file mode 100644
index 000000000000..fdfd5fe78424
--- /dev/null
+++ b/net/bpfilter/context.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "context.h"
+
+int create_context(struct context *ctx)
+{
+	return 0;
+}
+
+void free_context(struct context *ctx)
+{
+}
diff --git a/net/bpfilter/context.h b/net/bpfilter/context.h
new file mode 100644
index 000000000000..df41b9707a81
--- /dev/null
+++ b/net/bpfilter/context.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_CONTEXT_H
+#define NET_BPFILTER_CONTEXT_H
+
+struct context {
+};
+
+int create_context(struct context *ctx);
+void free_context(struct context *ctx);
+
+#endif // NET_BPFILTER_CONTEXT_H
-- 
2.38.1



* [PATCH bpf-next v3 06/16] bpfilter: add BPF bytecode generation infrastructure
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (4 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 05/16] bpfilter: add runtime context Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 07/16] bpfilter: add support for TC bytecode generation Quentin Deslandes
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Prepare codegen infrastructure to be used by matches, targets, rules,
and tables.

struct codegen contains an array of struct bpf_insn representing the
generated BPF program.
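
As a rough illustration of the idea (not the code from this patch), a
bounded instruction buffer with an append helper could look like the
sketch below. The sketch_* names and the simplified instruction layout
are assumptions made for the example; the real code uses struct
bpf_insn and the EMIT() macro.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t regs;
	int16_t off;
	int32_t imm;
};

/* Hypothetical, reduced version of struct codegen: image plus cursors. */
struct sketch_codegen {
	struct sketch_insn *img;
	uint32_t len_cur;	/* number of instructions emitted so far */
	uint32_t len_max;	/* capacity of img, in instructions */
};

static int sketch_init(struct sketch_codegen *cg, uint32_t len_max)
{
	cg->img = calloc(len_max, sizeof(cg->img[0]));
	if (!cg->img)
		return -ENOMEM;

	cg->len_cur = 0;
	cg->len_max = len_max;

	return 0;
}

/* Append one instruction, refusing to write past the end of the image. */
static int sketch_emit(struct sketch_codegen *cg, struct sketch_insn insn)
{
	if (cg->len_cur >= cg->len_max)
		return -ENOMEM;

	cg->img[cg->len_cur++] = insn;

	return 0;
}
```

A caller would initialise the buffer once, call sketch_emit() for each
instruction, and treat -ENOMEM as "program too large".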

The current infrastructure allows for multiple BPF program flavours to
be supported (TC, XDP...). Most of the logic will be shared, but each
flavour will be able to define its own prologue and epilogue bytecode,
as well as packet data access. Loading and unloading flow is also
flavour-dependent.

Not all required information is known during generation. This commit
introduces two bpfilter concepts to resolve this issue:
- Fixup: placeholder to replace once code generation is complete. For
  example, fixup is used to jump to the next rule. The next rule's
  offset is only known once it has been generated.
- Relocation: placeholder to replace before loading the BPF program. BPF
  maps are an example of features using relocation. Maps are created
  before the programs are loaded, so their FD is only known at that
  point in time.
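
The fixup mechanism can be sketched as follows. This is a minimal
example under assumed names (sketch_*), with fixed-size arrays instead
of the linked lists used in the patch. It only covers the "jump to the
next rule" case: the jump is emitted with a zero offset, its index is
recorded, and the offset is patched once the target position is known.
Relocations follow the same record-then-patch pattern, only later, at
load time.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_INSNS	16
#define SKETCH_MAX_FIXUPS	4

/* Hypothetical, simplified instruction: only the fields used here. */
struct sketch_insn {
	uint8_t code;
	int16_t off;
	int32_t imm;
};

struct sketch_prog {
	struct sketch_insn img[SKETCH_MAX_INSNS];
	uint32_t len_cur;
	uint32_t fixups[SKETCH_MAX_FIXUPS];	/* indices of pending jumps */
	uint32_t nr_fixups;
};

static int sketch_emit(struct sketch_prog *p, struct sketch_insn insn)
{
	if (p->len_cur >= SKETCH_MAX_INSNS)
		return -ENOMEM;

	p->img[p->len_cur++] = insn;

	return 0;
}

/* Emit a jump with an unknown target and remember where it lives. */
static int sketch_emit_jump_fixup(struct sketch_prog *p,
				  struct sketch_insn jmp)
{
	int r;

	if (p->nr_fixups >= SKETCH_MAX_FIXUPS)
		return -ENOMEM;
	if (jmp.off)
		return -EINVAL;	/* the placeholder must start at zero */

	p->fixups[p->nr_fixups] = p->len_cur;
	r = sketch_emit(p, jmp);
	if (r)
		return r;

	p->nr_fixups++;

	return 0;
}

/* Point every pending jump at the next instruction to be emitted. */
static void sketch_resolve_fixups(struct sketch_prog *p)
{
	uint32_t i;

	for (i = 0; i < p->nr_fixups; ++i) {
		uint32_t idx = p->fixups[i];

		/* BPF jump offsets are relative to the following insn. */
		p->img[idx].off = (int16_t)(p->len_cur - idx - 1);
	}

	p->nr_fixups = 0;
}
```

With a jump emitted at index 0 followed by three more instructions,
resolving the fixups sets the jump offset to 3, i.e. the jump lands on
index 4, where the next rule would start.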

Subprogs are required to support user-defined chains and helper
subprograms. Subprogs that have already been generated are stored in
the subprogs array, which is kept sorted and acts as an index. Subprogs
awaiting generation are stored in the awaiting_subprogs list.
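
For illustration, looking up a subprog in a sorted array of pointers
with bsearch() could be sketched as below. The sketch_* names and the
reduced descriptor are assumptions made for the example. Note that the
comparator receives pointers to array elements, so it has to
dereference twice, and the search key must be passed as a pointer to a
pointer.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, reduced subprog descriptor. */
struct sketch_subprog {
	uint32_t offset;	/* offset of the chain in the iptables blob */
	uint32_t insn;		/* index of its first generated instruction */
};

/* Compare two array elements, each being a pointer to a descriptor. */
static int sketch_cmp(const void *x, const void *y)
{
	const struct sketch_subprog *a = *(const struct sketch_subprog *const *)x;
	const struct sketch_subprog *b = *(const struct sketch_subprog *const *)y;

	/* Avoids the wrap-around of an unsigned subtraction. */
	return (a->offset > b->offset) - (a->offset < b->offset);
}

/* Find the descriptor for a given chain offset, or NULL if none. */
static const struct sketch_subprog *
sketch_find(const struct sketch_subprog **subprogs, size_t nr, uint32_t offset)
{
	const struct sketch_subprog key = { .offset = offset };
	const struct sketch_subprog *key_ptr = &key;
	const struct sketch_subprog **found;

	found = bsearch(&key_ptr, subprogs, nr, sizeof(subprogs[0]),
			sketch_cmp);

	return found ? *found : NULL;
}
```

The array has to be sorted with the same comparator (for example with
qsort()) every time a subprog is added, otherwise bsearch() results are
undefined.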

struct shared_codegen is used to share data between the various BPF
programs created by bpfilter. The only shared data currently supported
is the map containing the per-rule counters: a single map, shared
between all the bpfilter programs, stores the counters for every rule.
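
The sharing scheme boils down to reference counting: the first program
to be loaded creates the map, the last one to be unloaded releases it.
A minimal sketch, assuming hypothetical sketch_* names and a fake FD
allocator in place of the BPF_MAP_CREATE syscall:

```c
/* Hypothetical, reduced version of struct shared_codegen. */
struct sketch_shared {
	int refcnt;
	int map_fd;	/* -1 while the map does not exist */
};

/* Stand-in for BPF_MAP_CREATE: hands out increasing fake FDs. */
static int sketch_next_fd = 3;

static int sketch_create_map(void)
{
	return sketch_next_fd++;
}

/* Called once per program being loaded. */
static int sketch_acquire(struct sketch_shared *s)
{
	if (s->refcnt == 0) {
		int fd = sketch_create_map();

		if (fd < 0)
			return fd;

		s->map_fd = fd;
	}

	/* Only count the user once the map is known to exist. */
	s->refcnt++;

	return 0;
}

/* Called once per program being unloaded. */
static void sketch_release(struct sketch_shared *s)
{
	if (s->refcnt == 0 || --s->refcnt)
		return;

	/* Real code would close() the FD here. */
	s->map_fd = -1;
}
```

In this sketch the counter is only incremented after the map has been
created successfully, so a failed load leaves the shared state
untouched. That is a design choice of the example, not necessarily what
the patch does.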

Besides that, there is a runtime_context struct that can be used to
store frequently required data, such as the size of the packet and
pointers to the L3/L4 headers. This context is stored on the stack, and
there are macros to access individual fields of this struct. Immediately
after runtime_context on the stack, there is a scratchpad area.
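
To make the stack layout concrete, the offset macros from codegen.h
are reproduced below together with one hypothetical helper
(sketch_counter_key_offset(), not part of the patch). Each field offset
is the negated end of the field within the struct, so it can be used
directly as a negative displacement from the frame pointer. The
scratchpad starts right below the whole struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields as struct runtime_context in codegen.h. */
struct runtime_context {
	uint32_t data_size;
	void *l3;
	void *l4;
};

/* Negative displacement of a field, relative to the frame pointer. */
#define STACK_RUNTIME_CONTEXT_OFFSET(field)		    \
	(-(short)(offsetof(struct runtime_context, field) + \
		  sizeof(((struct runtime_context *)NULL)->field)))

/* First byte below the runtime context: start of the scratchpad. */
#define STACK_SCRATCHPAD_OFFSET (-(short)sizeof(struct runtime_context))

/*
 * Hypothetical helper: where emit_add_counter() keeps its 4-byte map
 * key, i.e. the first 4 bytes of the scratchpad.
 */
static inline short sketch_counter_key_offset(void)
{
	return STACK_SCRATCHPAD_OFFSET - 4;
}
```

The exact values depend on the pointer size and padding of the target
ABI, which is why the offsets are derived from offsetof() and sizeof()
rather than hardcoded.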

The calling convention follows the BPF calling convention with a couple
of additions:
* CODEGEN_REG_CTX (BPF_REG_9) is a pointer to the program context
* CODEGEN_REG_RUNTIME_CTX (BPF_REG_8) is a pointer to the runtime context

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile                         |  12 +-
 net/bpfilter/codegen.c                        | 530 ++++++++++++++++++
 net/bpfilter/codegen.h                        | 181 ++++++
 .../testing/selftests/bpf/bpfilter/.gitignore |   1 +
 tools/testing/selftests/bpf/bpfilter/Makefile |  19 +
 5 files changed, 742 insertions(+), 1 deletion(-)
 create mode 100644 net/bpfilter/codegen.c
 create mode 100644 net/bpfilter/codegen.h

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 9878f5fd8152..ac039f1fac34 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -3,11 +3,21 @@
 # Makefile for the Linux BPFILTER layer.
 #
 
+LIBBPF_SRCS = $(srctree)/tools/lib/bpf/
+LIBBPF_A = $(obj)/libbpf.a
+LIBBPF_OUT = $(abspath $(obj))
+
+$(LIBBPF_A):
+	$(Q)$(MAKE) -C $(LIBBPF_SRCS) O=$(LIBBPF_OUT)/ OUTPUT=$(LIBBPF_OUT)/ $(LIBBPF_OUT)/libbpf.a
+
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
-bpfilter_umh-objs += context.o
+bpfilter_umh-objs += context.o codegen.o
+bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
+$(obj)/bpfilter_umh: $(LIBBPF_A)
+
 ifeq ($(CONFIG_BPFILTER_UMH), y)
 # builtin bpfilter_umh should be linked with -static
 # since rootfs isn't mounted at the time of __init
diff --git a/net/bpfilter/codegen.c b/net/bpfilter/codegen.c
new file mode 100644
index 000000000000..545bc7aeb77c
--- /dev/null
+++ b/net/bpfilter/codegen.c
@@ -0,0 +1,530 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#include "codegen.h"
+
+#include "../../include/uapi/linux/bpfilter.h"
+
+#include <unistd.h>
+#include <sys/syscall.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "logger.h"
+
+enum fixup_insn_type {
+	FIXUP_INSN_OFF,
+	FIXUP_INSN_IMM,
+	__MAX_FIXUP_INSN_TYPE
+};
+
+static int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
+{
+	return syscall(SYS_bpf, cmd, attr, size);
+}
+
+static __u64 bpf_ptr_to_u64(const void *ptr)
+{
+	return (__u64)(unsigned long)ptr;
+}
+
+static int subprog_desc_comparator(const void *x, const void *y)
+{
+	const struct codegen_subprog_desc *subprog_x = *(const struct codegen_subprog_desc **)x;
+	const struct codegen_subprog_desc *subprog_y = *(const struct codegen_subprog_desc **)y;
+
+	if (subprog_x->type != subprog_y->type)
+		return subprog_x->type - subprog_y->type;
+
+	if (subprog_x->type == CODEGEN_SUBPROG_USER_CHAIN)
+		return subprog_x->offset - subprog_y->offset;
+
+	BUG_ON(1);
+
+	return -1;
+}
+
+static const struct codegen_subprog_desc *codegen_find_subprog(struct codegen *codegen,
+							       const struct codegen_subprog_desc **subprog)
+{
+	const struct codegen_subprog_desc **found;
+
+	found = bsearch(subprog, codegen->subprogs, codegen->subprogs_cur,
+			sizeof(codegen->subprogs[0]), subprog_desc_comparator);
+
+	return found ? *found : NULL;
+}
+
+static const struct codegen_subprog_desc *codegen_find_user_chain_subprog(struct codegen *codegen,
+									  uint32_t offset)
+{
+	const struct codegen_subprog_desc subprog = {
+		.type = CODEGEN_SUBPROG_USER_CHAIN,
+		.offset = offset
+	};
+	const struct codegen_subprog_desc *subprog_ptr = &subprog;
+
+	return codegen_find_subprog(codegen, &subprog_ptr);
+}
+
+int codegen_push_awaiting_subprog(struct codegen *codegen,
+				  struct codegen_subprog_desc *subprog)
+{
+	struct list_head *t, *n;
+
+	if (codegen_find_subprog(codegen, (const struct codegen_subprog_desc **)&subprog)) {
+		free(subprog);
+		return 0;
+	}
+
+	list_for_each_safe(t, n, &codegen->awaiting_subprogs) {
+		struct codegen_subprog_desc *awaiting_subprog;
+
+		awaiting_subprog = list_entry(t, struct codegen_subprog_desc, list);
+		if (!subprog_desc_comparator(&awaiting_subprog, &subprog)) {
+			free(subprog);
+			return 0;
+		}
+	}
+
+	list_add_tail(&subprog->list, &codegen->awaiting_subprogs);
+
+	return 0;
+}
+
+static int codegen_fixup_insn(struct bpf_insn *insn, enum fixup_insn_type type,
+			      __s32 v)
+{
+	switch (type) {
+	case FIXUP_INSN_OFF:
+		if (insn->off) {
+			BFLOG_ERR("instruction offset is already set");
+			return -EINVAL;
+		}
+
+		insn->off = v;
+
+		return 0;
+	case FIXUP_INSN_IMM:
+		if (insn->imm) {
+			BFLOG_ERR("instruction immediate value is already set");
+			return -EINVAL;
+		}
+
+		insn->imm = v;
+
+		return 0;
+	default:
+		BFLOG_ERR("invalid fixup instruction type");
+		return -EINVAL;
+	}
+}
+
+int codegen_fixup(struct codegen *codegen, enum codegen_fixup_type fixup_type)
+{
+	struct list_head *t, *n;
+
+	list_for_each_safe(t, n, &codegen->fixup) {
+		enum fixup_insn_type type = __MAX_FIXUP_INSN_TYPE;
+		struct codegen_fixup_desc *fixup;
+		struct bpf_insn *insn;
+		__s32 v;
+		int r;
+
+		fixup = list_entry(t, struct codegen_fixup_desc, list);
+		if (fixup->type != fixup_type)
+			continue;
+
+		if (fixup->type >= __MAX_CODEGEN_FIXUP_TYPE) {
+			BFLOG_ERR("invalid instruction fixup type: %d",
+				  fixup->type);
+			return -EINVAL;
+		}
+
+		if (fixup->insn >= codegen->len_cur) {
+			BFLOG_ERR("invalid instruction fixup offset");
+			return -EINVAL;
+		}
+
+		insn = &codegen->img[fixup->insn];
+
+		if (fixup_type == CODEGEN_FIXUP_NEXT_RULE ||
+		    fixup_type == CODEGEN_FIXUP_END_OF_CHAIN) {
+			type = FIXUP_INSN_OFF;
+			v = codegen->len_cur - fixup->insn - 1;
+		}
+
+		if (fixup_type == CODEGEN_FIXUP_JUMP_TO_CHAIN) {
+			const struct codegen_subprog_desc *subprog;
+
+			subprog = codegen_find_user_chain_subprog(codegen,
+								  fixup->offset);
+			if (!subprog) {
+				BFLOG_ERR("subprogram not found for offset %d",
+					  fixup->offset);
+				return -EINVAL;
+			}
+
+			type = FIXUP_INSN_OFF;
+			v = subprog->insn - fixup->insn - 1;
+		}
+
+		if (fixup_type == CODEGEN_FIXUP_COUNTERS_INDEX) {
+			type = FIXUP_INSN_IMM;
+			BFLOG_DBG("fixup counter for rule %d", codegen->rule_index);
+			v = codegen->rule_index;
+		}
+
+		r = codegen_fixup_insn(insn, type, v);
+		if (r) {
+			BFLOG_ERR("failed to fixup codegen instruction: %s",
+				  STRERR(r));
+			return r;
+		}
+
+		list_del(t);
+		free(fixup);
+	}
+
+	return 0;
+}
+
+int emit_fixup(struct codegen *codegen, enum codegen_fixup_type fixup_type,
+	       struct bpf_insn insn)
+{
+	struct codegen_fixup_desc *fixup;
+
+	fixup = malloc(sizeof(*fixup));
+	if (!fixup) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&fixup->list);
+	fixup->type = fixup_type;
+	fixup->insn = codegen->len_cur;
+	list_add_tail(&fixup->list, &codegen->fixup);
+
+	EMIT(codegen, insn);
+
+	return 0;
+}
+
+int emit_add_counter(struct codegen *codegen)
+{
+	struct bpf_insn insns[2] = { BPF_LD_MAP_FD(BPF_REG_ARG1, 0) };
+	struct codegen_reloc_desc *reloc;
+
+	reloc = malloc(sizeof(*reloc));
+	if (!reloc) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&reloc->list);
+	reloc->type = CODEGEN_RELOC_MAP;
+	reloc->map = CODEGEN_MAP_COUNTERS;
+	reloc->insn = codegen->len_cur;
+	list_add_tail(&reloc->list, &codegen->relocs);
+
+	EMIT(codegen, insns[0]);
+	EMIT(codegen, insns[1]);
+
+	EMIT_FIXUP(codegen, CODEGEN_FIXUP_COUNTERS_INDEX,
+		   BPF_ST_MEM(BPF_W, BPF_REG_10, STACK_SCRATCHPAD_OFFSET - 4, 0));
+	EMIT(codegen, BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_10));
+	EMIT(codegen,
+	     BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, STACK_SCRATCHPAD_OFFSET - 4));
+	EMIT(codegen, BPF_EMIT_CALL(BPF_FUNC_map_lookup_elem));
+	EMIT(codegen, BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 14));
+
+	reloc = malloc(sizeof(*reloc));
+	if (!reloc) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&reloc->list);
+	reloc->type = CODEGEN_RELOC_MAP;
+	reloc->map = CODEGEN_MAP_COUNTERS;
+	reloc->insn = codegen->len_cur;
+	list_add_tail(&reloc->list, &codegen->relocs);
+
+	EMIT(codegen, insns[0]);
+	EMIT(codegen, insns[1]);
+
+	EMIT(codegen, BPF_LDX_MEM(BPF_DW, CODEGEN_REG_SCRATCH5, BPF_REG_0, 0));
+	EMIT(codegen, BPF_LDX_MEM(BPF_DW, CODEGEN_REG_SCRATCH4, BPF_REG_0, 8));
+	EMIT(codegen, BPF_LDX_MEM(BPF_W, CODEGEN_REG_SCRATCH3, CODEGEN_REG_RUNTIME_CTX,
+				  STACK_RUNTIME_CONTEXT_OFFSET(data_size)));
+	EMIT(codegen, BPF_ALU64_IMM(BPF_ADD, CODEGEN_REG_SCRATCH5, 1));
+	EMIT(codegen,
+	     BPF_ALU64_REG(BPF_ADD, CODEGEN_REG_SCRATCH4, CODEGEN_REG_SCRATCH3));
+	EMIT(codegen, BPF_STX_MEM(BPF_DW, BPF_REG_0, CODEGEN_REG_SCRATCH5, 0));
+	EMIT(codegen, BPF_STX_MEM(BPF_DW, BPF_REG_0, CODEGEN_REG_SCRATCH4, 8));
+	EMIT(codegen, BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_10));
+	EMIT(codegen,
+	     BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2, STACK_SCRATCHPAD_OFFSET - 4));
+	EMIT(codegen, BPF_MOV64_REG(BPF_REG_ARG3, BPF_REG_0));
+	EMIT(codegen, BPF_MOV32_IMM(BPF_REG_ARG4, BPF_EXIST));
+	EMIT(codegen, BPF_EMIT_CALL(BPF_FUNC_map_update_elem));
+
+	return 0;
+}
+
+static int codegen_reloc(struct codegen *codegen)
+{
+	struct shared_codegen *shared_codegen;
+	struct list_head *t;
+
+	shared_codegen = codegen->shared_codegen;
+
+	list_for_each(t, &codegen->relocs) {
+		struct codegen_reloc_desc *reloc;
+		struct bpf_insn *insn;
+
+		reloc = list_entry(t, struct codegen_reloc_desc, list);
+
+		if (reloc->insn >= codegen->len_cur) {
+			BFLOG_ERR("invalid instruction relocation offset");
+			return -EINVAL;
+		}
+
+		insn = &codegen->img[reloc->insn];
+
+		if (reloc->type == CODEGEN_RELOC_MAP) {
+			enum codegen_map_type map_type;
+
+			if (codegen->len_cur <= reloc->insn + 1) {
+				BFLOG_ERR("invalid instruction relocation map offset");
+				return -EINVAL;
+			}
+
+			if (insn->code != (BPF_LD | BPF_DW | BPF_IMM)) {
+				BFLOG_ERR("invalid instruction relocation code %d",
+					  insn->code);
+				return -EINVAL;
+			}
+
+			map_type = insn->imm;
+			if (map_type < 0 || map_type >= __MAX_CODEGEN_MAP_TYPE) {
+				BFLOG_ERR("invalid instruction relocation map type: %d",
+					  map_type);
+				return -EINVAL;
+			}
+
+			BUG_ON(shared_codegen->maps_fd[map_type] < 0);
+			insn->imm = shared_codegen->maps_fd[map_type];
+
+			continue;
+		}
+
+		BFLOG_ERR("invalid instruction relocation type %d", reloc->type);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int load_maps(struct codegen *codegen)
+{
+	struct shared_codegen *shared_codegen;
+	int i;
+
+	shared_codegen = codegen->shared_codegen;
+
+	if (shared_codegen->maps_refcnt++)
+		return 0;
+
+	for (i = 0; i < __MAX_CODEGEN_MAP_TYPE; ++i) {
+		int j;
+		int fd;
+		int saved_errno;
+		union bpf_attr *map;
+
+		BUG_ON(shared_codegen->maps_fd[i] > -1);
+
+		map = &shared_codegen->maps[i];
+		fd = sys_bpf(BPF_MAP_CREATE, map, sizeof(*map));
+		if (fd > -1) {
+			BFLOG_DBG("opened BPF map with FD %d", fd);
+			shared_codegen->maps_fd[i] = fd;
+			continue;
+		}
+
+		saved_errno = -errno; /* sys_bpf() returns -1, not -errno */
+		BFLOG_ERR("bpf syscall failed during map creation: %s",
+			  STRERR(saved_errno));
+
+		for (j = 0; j < i; ++j) {
+			close(shared_codegen->maps_fd[j]);
+			shared_codegen->maps_fd[j] = -1;
+		}
+
+		return saved_errno;
+	}
+
+	return 0;
+}
+
+static void unload_maps(struct codegen *codegen)
+{
+	struct shared_codegen *shared_codegen;
+	int i;
+
+	shared_codegen = codegen->shared_codegen;
+
+	if (--shared_codegen->maps_refcnt)
+		return;
+
+	for (i = 0; i < __MAX_CODEGEN_MAP_TYPE; ++i) {
+		if (shared_codegen->maps_fd[i] > -1) {
+			close(shared_codegen->maps_fd[i]);
+			shared_codegen->maps_fd[i] = -1;
+		}
+	}
+}
+
+void create_shared_codegen(struct shared_codegen *shared_codegen)
+{
+	shared_codegen->maps_refcnt = 0;
+
+	shared_codegen->maps[CODEGEN_MAP_COUNTERS].map_type =
+		BPF_MAP_TYPE_PERCPU_ARRAY;
+	shared_codegen->maps[CODEGEN_MAP_COUNTERS].key_size = 4;
+	shared_codegen->maps[CODEGEN_MAP_COUNTERS].value_size =
+		sizeof(struct bpfilter_ipt_counters);
+	shared_codegen->maps[CODEGEN_MAP_COUNTERS].max_entries = 0;
+	snprintf(shared_codegen->maps[CODEGEN_MAP_COUNTERS].map_name,
+		 sizeof(shared_codegen->maps[CODEGEN_MAP_COUNTERS].map_name),
+			"bpfilter_cntrs");
+	shared_codegen->maps_fd[CODEGEN_MAP_COUNTERS] = -1;
+}
+
+int create_codegen(struct codegen *codegen, enum bpf_prog_type type)
+{
+	int r;
+
+	memset(codegen, 0, sizeof(*codegen));
+
+	switch (type) {
+	default:
+		BFLOG_ERR("unsupported BPF program type %d", type);
+		return -EINVAL;
+	}
+
+	codegen->prog_type = type;
+
+	codegen->log_buf_size = 1 << 20;
+	codegen->log_buf = malloc(codegen->log_buf_size);
+	if (!codegen->log_buf) {
+		BFLOG_ERR("out of memory");
+		r = -ENOMEM;
+		goto err_free;
+	}
+
+	codegen->len_max = BPF_MAXINSNS;
+	codegen->img = malloc(codegen->len_max * sizeof(codegen->img[0]));
+	if (!codegen->img) {
+		BFLOG_ERR("out of memory");
+		r = -ENOMEM;
+		goto err_free;
+	}
+
+	codegen->shared_codegen = NULL;
+
+	INIT_LIST_HEAD(&codegen->fixup);
+	INIT_LIST_HEAD(&codegen->relocs);
+	INIT_LIST_HEAD(&codegen->awaiting_subprogs);
+
+	return 0;
+
+err_free:
+	free(codegen->img);
+
+	return r;
+}
+
+int load_img(struct codegen *codegen)
+{
+	union bpf_attr attr = {};
+	int fd;
+	int r;
+
+	r = load_maps(codegen);
+	if (r) {
+		BFLOG_ERR("failed to load maps: %s", STRERR(r));
+		return r;
+	}
+
+	r = codegen_reloc(codegen);
+	if (r) {
+		BFLOG_ERR("failed to generate relocations: %s", STRERR(r));
+		return r;
+	}
+
+	attr.prog_type = codegen->prog_type;
+	attr.insns = bpf_ptr_to_u64(codegen->img);
+	attr.insn_cnt = codegen->len_cur;
+	attr.license = bpf_ptr_to_u64("GPL");
+	attr.prog_ifindex = 0;
+	snprintf(attr.prog_name, sizeof(attr.prog_name), "bpfilter");
+
+	if (codegen->log_buf && codegen->log_buf_size) {
+		attr.log_buf = bpf_ptr_to_u64(codegen->log_buf);
+		attr.log_size = codegen->log_buf_size;
+		attr.log_level = 1;
+	}
+
+	fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+	if (fd == -1) {
+		BFLOG_ERR("failed to load BPF program: %s", codegen->log_buf);
+		return -errno;
+	}
+
+	return fd;
+}
+
+void unload_img(struct codegen *codegen)
+{
+	unload_maps(codegen);
+}
+
+void free_codegen(struct codegen *codegen)
+{
+	struct list_head *t, *n;
+	int i;
+
+	list_for_each_safe(t, n, &codegen->fixup) {
+		struct codegen_fixup_desc *fixup;
+
+		fixup = list_entry(t, struct codegen_fixup_desc, list);
+		free(fixup);
+	}
+
+	list_for_each_safe(t, n, &codegen->relocs) {
+		struct codegen_reloc_desc *reloc;
+
+		reloc = list_entry(t, struct codegen_reloc_desc, list);
+		free(reloc);
+	}
+
+	list_for_each_safe(t, n, &codegen->awaiting_subprogs) {
+		struct codegen_subprog_desc *subprog;
+
+		subprog = list_entry(t, struct codegen_subprog_desc, list);
+		free(subprog);
+	}
+
+	for (i = 0; i < codegen->subprogs_cur; ++i)
+		free(codegen->subprogs[i]);
+	free(codegen->subprogs);
+
+	free(codegen->log_buf);
+	free(codegen->img);
+}
diff --git a/net/bpfilter/codegen.h b/net/bpfilter/codegen.h
new file mode 100644
index 000000000000..cca45a13c4aa
--- /dev/null
+++ b/net/bpfilter/codegen.h
@@ -0,0 +1,181 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_CODEGEN_H
+#define NET_BPFILTER_CODEGEN_H
+
+#include <linux/bpf.h>
+#include <linux/filter.h>
+#include <linux/list.h>
+
+#include <bpf/libbpf.h>
+
+#include <errno.h>
+#include <stddef.h>
+#include <stdint.h>
+
+struct context;
+
+#define CODEGEN_REG_RETVAL	BPF_REG_0
+#define CODEGEN_REG_SCRATCH1	BPF_REG_1
+#define CODEGEN_REG_SCRATCH2	BPF_REG_2
+#define CODEGEN_REG_SCRATCH3	BPF_REG_3
+#define CODEGEN_REG_SCRATCH4	BPF_REG_4
+#define CODEGEN_REG_SCRATCH5	BPF_REG_5
+#define CODEGEN_REG_DATA_END	CODEGEN_REG_SCRATCH5
+#define CODEGEN_REG_L3		BPF_REG_6
+#define CODEGEN_REG_L4		BPF_REG_7
+#define CODEGEN_REG_RUNTIME_CTX BPF_REG_8
+#define CODEGEN_REG_CTX		BPF_REG_9
+
+#define EMIT(codegen, x)					     \
+	do {							     \
+		typeof(codegen) __codegen = (codegen);		     \
+		if ((__codegen)->len_cur + 1 > (__codegen)->len_max) \
+			return -ENOMEM;				     \
+		(__codegen)->img[(__codegen)->len_cur++] = (x);	     \
+	} while (0)
+
+#define EMIT_FIXUP(codegen, fixup_type, insn)				       \
+	do {								       \
+		const int __err = emit_fixup((codegen), (fixup_type), (insn)); \
+		if (__err)						       \
+			return __err;					       \
+	} while (0)
+
+#define EMIT_ADD_COUNTER(codegen)			     \
+	do {						     \
+		const int __err = emit_add_counter(codegen); \
+		if (__err)				     \
+			return __err;			     \
+	} while (0)
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+#define EMIT_LITTLE_ENDIAN(codegen, x) EMIT(codegen, x)
+#else
+#define EMIT_LITTLE_ENDIAN(codegen, x)
+#endif
+
+struct runtime_context {
+	uint32_t data_size;
+	void *l3;
+	void *l4;
+};
+
+#define STACK_RUNTIME_CONTEXT_OFFSET(field)		    \
+	(-(short)(offsetof(struct runtime_context, field) + \
+		  sizeof(((struct runtime_context *)NULL)->field)))
+
+#define STACK_SCRATCHPAD_OFFSET (-(short)sizeof(struct runtime_context))
+
+enum codegen_map_type {
+	CODEGEN_MAP_COUNTERS,
+	__MAX_CODEGEN_MAP_TYPE
+};
+
+enum codegen_fixup_type {
+	CODEGEN_FIXUP_NEXT_RULE,
+	CODEGEN_FIXUP_END_OF_CHAIN,
+	CODEGEN_FIXUP_JUMP_TO_CHAIN,
+	CODEGEN_FIXUP_COUNTERS_INDEX,
+	__MAX_CODEGEN_FIXUP_TYPE
+};
+
+struct codegen_fixup_desc {
+	struct list_head list;
+	enum codegen_fixup_type type;
+	uint32_t insn;
+	union {
+		uint32_t offset;
+	};
+};
+
+enum codegen_reloc_type {
+	CODEGEN_RELOC_MAP,
+	__MAX_CODEGEN_RELOC_TYPE
+};
+
+struct codegen_reloc_desc {
+	struct list_head list;
+	enum codegen_reloc_type type;
+	uint32_t insn;
+	union {
+		struct {
+			enum codegen_map_type map;
+			// TODO: add BTF
+		};
+	};
+};
+
+enum codegen_subprog_type {
+	CODEGEN_SUBPROG_USER_CHAIN,
+};
+
+struct codegen_subprog_desc {
+	struct list_head list;
+	enum codegen_subprog_type type;
+	uint32_t insn;
+	union {
+		uint32_t offset;
+	};
+};
+
+struct codegen_ops;
+struct shared_codegen;
+
+struct codegen {
+	struct context *ctx;
+	struct bpf_insn *img;
+	char *log_buf;
+	size_t log_buf_size;
+	int iptables_hook;
+	union {
+		enum bpf_tc_attach_point bpf_tc_hook;
+	};
+	enum bpf_prog_type prog_type;
+	uint32_t len_cur;
+	uint32_t len_max;
+	uint32_t rule_index;
+	const struct codegen_ops *codegen_ops;
+	struct shared_codegen *shared_codegen;
+	struct list_head fixup;
+	struct list_head relocs;
+	struct list_head awaiting_subprogs;
+	uint16_t subprogs_cur;
+	uint16_t subprogs_max;
+	struct codegen_subprog_desc **subprogs;
+	void *img_ctx;
+};
+
+struct shared_codegen {
+	int maps_refcnt;
+	union bpf_attr maps[__MAX_CODEGEN_MAP_TYPE];
+	int maps_fd[__MAX_CODEGEN_MAP_TYPE];
+};
+
+struct codegen_ops {
+	int (*gen_inline_prologue)(struct codegen *codegen);
+	int (*load_packet_data)(struct codegen *codegen, int dst_reg);
+	int (*load_packet_data_end)(struct codegen *codegen, int dst_reg);
+	int (*emit_ret_code)(struct codegen *codegen, int ret_code);
+	int (*gen_inline_epilogue)(struct codegen *codegen);
+	int (*load_img)(struct codegen *codegen);
+	void (*unload_img)(struct codegen *codegen);
+};
+
+void create_shared_codegen(struct shared_codegen *shared_codegen);
+int create_codegen(struct codegen *codegen, enum bpf_prog_type type);
+int codegen_push_awaiting_subprog(struct codegen *codegen,
+				  struct codegen_subprog_desc *subprog);
+int codegen_fixup(struct codegen *codegen, enum codegen_fixup_type fixup_type);
+int emit_fixup(struct codegen *codegen, enum codegen_fixup_type fixup_type,
+	       struct bpf_insn insn);
+int emit_add_counter(struct codegen *codegen);
+int load_img(struct codegen *codegen);
+void unload_img(struct codegen *codegen);
+void free_codegen(struct codegen *codegen);
+
+#endif // NET_BPFILTER_CODEGEN_H
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
index 983fd06cbefa..39ec0c09dff4 100644
--- a/tools/testing/selftests/bpf/bpfilter/.gitignore
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
+tools/**
 test_map
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index c262aad8c2a4..e3b8bf76a10c 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -5,6 +5,8 @@ TOOLSDIR := $(abspath ../../../../)
 TOOLSINCDIR := $(TOOLSDIR)/include
 APIDIR := $(TOOLSINCDIR)/uapi
 BPFILTERSRCDIR := $(top_srcdir)/net/bpfilter
+LIBDIR := $(TOOLSDIR)/lib
+BPFDIR := $(LIBDIR)/bpf
 
 CFLAGS += -Wall -g -pthread -I$(TOOLSINCDIR) -I$(APIDIR) -I$(BPFILTERSRCDIR)
 
@@ -14,6 +16,23 @@ KSFT_KHDR_INSTALL := 1
 
 include ../../lib.mk
 
+SCRATCH_DIR := $(OUTPUT)/tools
+BUILD_DIR := $(SCRATCH_DIR)/build
+BPFOBJ_DIR := $(BUILD_DIR)/libbpf
+BPFOBJ := $(BPFOBJ_DIR)/libbpf.a
+
+MAKE_DIRS := $(BPFOBJ_DIR)
+$(MAKE_DIRS):
+	$(call msg,MKDIR,,$@)
+	$(Q)mkdir -p $@
+
+$(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)			\
+	   ../../../../include/uapi/linux/bpf.h					\
+	   | $(INCLUDE_DIR) $(BUILD_DIR)/libbpf
+	$(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) OUTPUT=$(BUILD_DIR)/libbpf/ 	\
+		    DESTDIR=$(SCRATCH_DIR) prefix= all install_headers
+
 BPFILTER_MAP_SRCS := $(BPFILTERSRCDIR)/map-common.c
+BPFILTER_CODEGEN_SRCS := $(BPFILTERSRCDIR)/codegen.c $(BPFOBJ) -lelf -lz
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
-- 
2.38.1



* [PATCH bpf-next v3 07/16] bpfilter: add support for TC bytecode generation
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (5 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 06/16] bpfilter: add BPF bytecode generation infrastructure Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 08/16] bpfilter: add match structure Quentin Deslandes
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Add code generation support for TC hooks.
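
The verdict translation performed by the TC flavour can be sketched in
isolation as below. The SKETCH_* constants mirror the values of
NF_DROP/NF_ACCEPT and TC_ACT_UNSPEC/TC_ACT_SHOT from the kernel UAPI
headers so that the example builds standalone; that
BPFILTER_NF_DROP/BPFILTER_NF_ACCEPT share the netfilter values is an
assumption. sketch_nf_to_tc() is a hypothetical name.

```c
#include <errno.h>

/* Netfilter verdicts (assumed equal to the BPFILTER_NF_* values). */
#define SKETCH_NF_DROP		0
#define SKETCH_NF_ACCEPT	1

/* TC action codes, as defined in linux/pkt_cls.h. */
#define SKETCH_TC_ACT_UNSPEC	(-1)
#define SKETCH_TC_ACT_SHOT	2

/*
 * Translate an iptables verdict into a TC return code. ACCEPT maps to
 * TC_ACT_UNSPEC so that the default action of the classifier applies,
 * DROP maps to TC_ACT_SHOT. Any other verdict is rejected.
 */
static int sketch_nf_to_tc(int verdict, int *tc_ret_code)
{
	switch (verdict) {
	case SKETCH_NF_ACCEPT:
		*tc_ret_code = SKETCH_TC_ACT_UNSPEC;
		return 0;
	case SKETCH_NF_DROP:
		*tc_ret_code = SKETCH_TC_ACT_SHOT;
		return 0;
	default:
		return -EINVAL;
	}
}
```

The TC code generator then only has to emit a single move of the
translated value into BPF_REG_0 before the program exits.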

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/codegen.c | 151 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/net/bpfilter/codegen.c b/net/bpfilter/codegen.c
index 545bc7aeb77c..e7ae7dfa5118 100644
--- a/net/bpfilter/codegen.c
+++ b/net/bpfilter/codegen.c
@@ -8,6 +8,8 @@
 
 #include "../../include/uapi/linux/bpfilter.h"
 
+#include <linux/pkt_cls.h>
+
 #include <unistd.h>
 #include <sys/syscall.h>
 
@@ -15,6 +17,8 @@
 #include <stdlib.h>
 #include <string.h>
 
+#include <bpf/libbpf.h>
+
 #include "logger.h"
 
 enum fixup_insn_type {
@@ -390,6 +394,150 @@ static void unload_maps(struct codegen *codegen)
 	}
 }
 
+static int tc_gen_inline_prologue(struct codegen *codegen)
+{
+	EMIT(codegen, BPF_MOV64_REG(CODEGEN_REG_CTX, BPF_REG_ARG1));
+	EMIT(codegen, BPF_MOV64_REG(CODEGEN_REG_RUNTIME_CTX, BPF_REG_FP));
+	EMIT(codegen, BPF_MOV32_IMM(CODEGEN_REG_RETVAL, TC_ACT_OK));
+
+	return 0;
+}
+
+static int tc_load_packet_data(struct codegen *codegen, int dst_reg)
+{
+	EMIT(codegen, BPF_LDX_MEM(BPF_W, dst_reg, CODEGEN_REG_CTX,
+				  offsetof(struct __sk_buff, data)));
+
+	return 0;
+}
+
+static int tc_load_packet_data_end(struct codegen *codegen, int dst_reg)
+{
+	EMIT(codegen, BPF_LDX_MEM(BPF_W, dst_reg, CODEGEN_REG_CTX,
+				  offsetof(struct __sk_buff, data_end)));
+
+	return 0;
+}
+
+static int tc_emit_ret_code(struct codegen *codegen, int ret_code)
+{
+	int tc_ret_code;
+
+	if (ret_code == BPFILTER_NF_ACCEPT)
+		tc_ret_code = TC_ACT_UNSPEC;
+	else if (ret_code == BPFILTER_NF_DROP)
+		tc_ret_code = TC_ACT_SHOT;
+	else
+		return -EINVAL;
+
+	EMIT(codegen, BPF_MOV32_IMM(BPF_REG_0, tc_ret_code));
+
+	return 0;
+}
+
+static int tc_gen_inline_epilogue(struct codegen *codegen)
+{
+	EMIT(codegen, BPF_EXIT_INSN());
+
+	return 0;
+}
+
+struct tc_img_ctx {
+	int fd;
+	struct bpf_tc_hook hook;
+	struct bpf_tc_opts opts;
+};
+
+static int tc_load_img(struct codegen *codegen)
+{
+	struct tc_img_ctx *img_ctx;
+	int fd;
+	int r;
+
+	if (codegen->img_ctx) {
+		BFLOG_ERR("TC context is already set in codegen");
+		return -EINVAL;
+	}
+
+	img_ctx = calloc(1, sizeof(*img_ctx));
+	if (!img_ctx) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	img_ctx->hook.sz = sizeof(img_ctx->hook);
+	img_ctx->hook.ifindex = 2;
+	img_ctx->hook.attach_point = codegen->bpf_tc_hook;
+
+	fd = load_img(codegen);
+	if (fd < 0) {
+		BFLOG_ERR("failed to load TC codegen image: %s", STRERR(fd));
+		r = fd;
+		goto err_free;
+	}
+
+	r = bpf_tc_hook_create(&img_ctx->hook);
+	if (r && r != -EEXIST) {
+		BFLOG_ERR("failed to create TC hook: %s", STRERR(r));
+		goto err_free;
+	}
+
+	img_ctx->opts.sz = sizeof(img_ctx->opts);
+	img_ctx->opts.handle = codegen->iptables_hook;
+	img_ctx->opts.priority = 0;
+	img_ctx->opts.prog_fd = fd;
+	r = bpf_tc_attach(&img_ctx->hook, &img_ctx->opts);
+	if (r) {
+		BFLOG_ERR("failed to attach TC program: %s", STRERR(r));
+		goto err_free;
+	}
+
+	img_ctx->fd = fd;
+	codegen->img_ctx = img_ctx;
+
+	return fd;
+
+err_free:
+	if (fd > -1)
+		close(fd);
+	free(img_ctx);
+	return r;
+}
+
+static void tc_unload_img(struct codegen *codegen)
+{
+	struct tc_img_ctx *img_ctx;
+	int r;
+
+	BUG_ON(!codegen->img_ctx);
+
+	img_ctx = (struct tc_img_ctx *)codegen->img_ctx;
+	img_ctx->opts.flags = 0;
+	img_ctx->opts.prog_fd = 0;
+	img_ctx->opts.prog_id = 0;
+	r = bpf_tc_detach(&img_ctx->hook, &img_ctx->opts);
+	if (r)
+		BFLOG_EMERG("failed to detach TC program: %s", STRERR(r));
+
+	BUG_ON(img_ctx->fd < 0);
+	close(img_ctx->fd);
+	free(img_ctx);
+
+	codegen->img_ctx = NULL;
+
+	unload_img(codegen);
+}
+
+static const struct codegen_ops tc_codegen_ops = {
+	.gen_inline_prologue = tc_gen_inline_prologue,
+	.load_packet_data = tc_load_packet_data,
+	.load_packet_data_end = tc_load_packet_data_end,
+	.emit_ret_code = tc_emit_ret_code,
+	.gen_inline_epilogue = tc_gen_inline_epilogue,
+	.load_img = tc_load_img,
+	.unload_img = tc_unload_img,
+};
+
 void create_shared_codegen(struct shared_codegen *shared_codegen)
 {
 	shared_codegen->maps_refcnt = 0;
@@ -413,6 +561,9 @@ int create_codegen(struct codegen *codegen, enum bpf_prog_type type)
 	memset(codegen, 0, sizeof(*codegen));
 
 	switch (type) {
+	case BPF_PROG_TYPE_SCHED_CLS:
+		codegen->codegen_ops = &tc_codegen_ops;
+		break;
 	default:
 		BFLOG_ERR("unsupported BPF program type %d", type);
 		return -EINVAL;
-- 
2.38.1



* [PATCH bpf-next v3 08/16] bpfilter: add match structure
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (6 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 07/16] bpfilter: add support for TC bytecode generation Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 09/16] bpfilter: add support for src/dst addr and ports Quentin Deslandes
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

struct match_ops defines a polymorphic interface for matches. A match
consists of a pointer to a struct match_ops and a pointer to a struct
xt_entry_match, which holds the payload for the match's type.

The match interface supports the following operations:
- check: validate a rule's match.
- gen_inline: generate eBPF bytecode for the match.

All match_ops structures are kept in a map by their name.

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
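Note: the lookup-and-validate sequence described above can be sketched in
plain C as follows. This is only an illustration under stated assumptions,
not the code from this patch: every demo_* name is hypothetical, a linear
array stands in for the hsearch-based map, DEMO_MAXNAMELEN is assumed to
mirror BPFILTER_EXTENSION_MAXNAMELEN, and the sizes are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical; assumed to mirror BPFILTER_EXTENSION_MAXNAMELEN. */
#define DEMO_MAXNAMELEN 29

/* Hypothetical, trimmed-down counterpart of struct match_ops. */
struct demo_match_ops {
	char name[DEMO_MAXNAMELEN];
	uint8_t revision;
	uint16_t size; /* payload size, excluding the entry header */
};

/* One registered match; the payload size of 16 is illustrative. */
static const struct demo_match_ops demo_ops[] = {
	{ .name = "udp", .revision = 0, .size = 16 },
};

/* A linear search stands in for the name-keyed map used by the patch. */
static const struct demo_match_ops *demo_find(const char *name)
{
	for (size_t i = 0; i < sizeof(demo_ops) / sizeof(demo_ops[0]); ++i) {
		if (!strcmp(demo_ops[i].name, name))
			return &demo_ops[i];
	}

	return NULL;
}

/*
 * Same order of checks as init_match(): reject an unterminated name,
 * look the ops up by name, then compare size and revision.
 * name must point to a buffer of at least name_buflen bytes.
 */
static int demo_init_match(const char *name, size_t name_buflen,
			   size_t hdr_size, size_t total_size,
			   uint8_t revision)
{
	const struct demo_match_ops *ops;

	if (!memchr(name, '\0', name_buflen))
		return -EINVAL;

	ops = demo_find(name);
	if (!ops)
		return -ENOENT;

	if (ops->size + hdr_size != total_size || ops->revision != revision)
		return -EINVAL;

	return 0;
}
```

With a 32-byte header, demo_init_match() accepts "udp" only when the total
size is 48 and the revision is 0; an unknown name yields -ENOENT and an
unterminated name yields -EINVAL, matching the two selftests in this patch.
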
 net/bpfilter/Makefile                         |  1 +
 net/bpfilter/context.c                        | 43 ++++++++++++
 net/bpfilter/context.h                        |  3 +
 net/bpfilter/match.c                          | 55 +++++++++++++++
 net/bpfilter/match.h                          | 35 ++++++++++
 .../testing/selftests/bpf/bpfilter/.gitignore |  1 +
 tools/testing/selftests/bpf/bpfilter/Makefile |  7 ++
 .../selftests/bpf/bpfilter/bpfilter_util.h    | 22 ++++++
 .../selftests/bpf/bpfilter/test_match.c       | 69 +++++++++++++++++++
 9 files changed, 236 insertions(+)
 create mode 100644 net/bpfilter/match.c
 create mode 100644 net/bpfilter/match.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_match.c

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index ac039f1fac34..2f8d867a6038 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -13,6 +13,7 @@ $(LIBBPF_A):
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
+bpfilter_umh-objs += match.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/context.c b/net/bpfilter/context.c
index fdfd5fe78424..b5e172412fab 100644
--- a/net/bpfilter/context.c
+++ b/net/bpfilter/context.c
@@ -8,11 +8,54 @@
 
 #include "context.h"
 
+#include <linux/kernel.h>
+
+#include <string.h>
+
+#include "logger.h"
+#include "map-common.h"
+#include "match.h"
+
+static const struct match_ops *match_ops[] = { };
+
+static int init_match_ops_map(struct context *ctx)
+{
+	int r;
+
+	r = create_map(&ctx->match_ops_map, ARRAY_SIZE(match_ops));
+	if (r) {
+		BFLOG_ERR("failed to create matches map: %s", STRERR(r));
+		return r;
+	}
+
+	for (int i = 0; i < ARRAY_SIZE(match_ops); ++i) {
+		const struct match_ops *m = match_ops[i];
+
+		r = map_upsert(&ctx->match_ops_map, m->name, (void *)m);
+		if (r) {
+			BFLOG_ERR("failed to upsert in matches map: %s",
+				  STRERR(r));
+			return r;
+		}
+	}
+
+	return 0;
+}
+
 int create_context(struct context *ctx)
 {
+	int r;
+
+	r = init_match_ops_map(ctx);
+	if (r) {
+		BFLOG_ERR("failed to initialize matches map: %s", STRERR(r));
+		return r;
+	}
+
 	return 0;
 }
 
 void free_context(struct context *ctx)
 {
+	free_map(&ctx->match_ops_map);
 }
diff --git a/net/bpfilter/context.h b/net/bpfilter/context.h
index df41b9707a81..e36aa8ebf57e 100644
--- a/net/bpfilter/context.h
+++ b/net/bpfilter/context.h
@@ -7,7 +7,10 @@
 #ifndef NET_BPFILTER_CONTEXT_H
 #define NET_BPFILTER_CONTEXT_H
 
+#include <search.h>
+
 struct context {
+	struct hsearch_data match_ops_map;
 };
 
 int create_context(struct context *ctx);
diff --git a/net/bpfilter/match.c b/net/bpfilter/match.c
new file mode 100644
index 000000000000..fdb0926442a8
--- /dev/null
+++ b/net/bpfilter/match.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "match.h"
+
+#include <linux/err.h>
+
+#include <errno.h>
+#include <string.h>
+
+#include "context.h"
+#include "logger.h"
+#include "map-common.h"
+
+int init_match(struct context *ctx, const struct bpfilter_ipt_match *ipt_match,
+	       struct match *match)
+{
+	const size_t maxlen = sizeof(ipt_match->u.user.name);
+	const struct match_ops *found;
+	int r;
+
+	if (strnlen(ipt_match->u.user.name, maxlen) == maxlen) {
+		BFLOG_ERR("failed to init match: name too long");
+		return -EINVAL;
+	}
+
+	found = map_find(&ctx->match_ops_map, ipt_match->u.user.name);
+	if (IS_ERR(found)) {
+		BFLOG_ERR("failed to find match by name: '%s'",
+			  ipt_match->u.user.name);
+		return PTR_ERR(found);
+	}
+
+	if (found->size + sizeof(*ipt_match) != ipt_match->u.match_size ||
+	    found->revision != ipt_match->u.user.revision) {
+		BFLOG_ERR("invalid match: '%s'", ipt_match->u.user.name);
+		return -EINVAL;
+	}
+
+	r = found->check(ctx, ipt_match);
+	if (r) {
+		BFLOG_ERR("match check failed: %s", STRERR(r));
+		return r;
+	}
+
+	match->match_ops = found;
+	match->ipt_match = ipt_match;
+
+	return 0;
+}
diff --git a/net/bpfilter/match.h b/net/bpfilter/match.h
new file mode 100644
index 000000000000..c6541e6a6567
--- /dev/null
+++ b/net/bpfilter/match.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_MATCH_H
+#define NET_BPFILTER_MATCH_H
+
+#include "../../include/uapi/linux/bpfilter.h"
+
+#include <stdint.h>
+
+struct bpfilter_ipt_match;
+struct codegen;
+struct context;
+struct match;
+
+struct match_ops {
+	char name[BPFILTER_EXTENSION_MAXNAMELEN];
+	uint8_t revision;
+	uint16_t size;
+	int (*check)(struct context *ctx, const struct bpfilter_ipt_match *ipt_match);
+	int (*gen_inline)(struct codegen *ctx, const struct match *match);
+};
+
+struct match {
+	const struct match_ops *match_ops;
+	const struct bpfilter_ipt_match *ipt_match;
+};
+
+int init_match(struct context *ctx, const struct bpfilter_ipt_match *ipt_match,
+	       struct match *match);
+
+#endif // NET_BPFILTER_MATCH_H
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
index 39ec0c09dff4..9ac1b3caf246 100644
--- a/tools/testing/selftests/bpf/bpfilter/.gitignore
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 tools/**
 test_map
+test_match
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index e3b8bf76a10c..10642c1d6a87 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -11,6 +11,7 @@ BPFDIR := $(LIBDIR)/bpf
 CFLAGS += -Wall -g -pthread -I$(TOOLSINCDIR) -I$(APIDIR) -I$(BPFILTERSRCDIR)
 
 TEST_GEN_PROGS += test_map
+TEST_GEN_PROGS += test_match
 
 KSFT_KHDR_INSTALL := 1
 
@@ -34,5 +35,11 @@ $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)			\
 
 BPFILTER_MAP_SRCS := $(BPFILTERSRCDIR)/map-common.c
 BPFILTER_CODEGEN_SRCS := $(BPFILTERSRCDIR)/codegen.c $(BPFOBJ) -lelf -lz
+BPFILTER_MATCH_SRCS := $(BPFILTERSRCDIR)/match.c
+
+BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS)
+BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/context.c $(BPFILTERSRCDIR)/logger.c
+BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS)
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
+$(OUTPUT)/test_match: test_match.c $(BPFILTER_COMMON_SRCS)
diff --git a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
new file mode 100644
index 000000000000..705fd1777a67
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BPFILTER_UTIL_H
+#define BPFILTER_UTIL_H
+
+#include <linux/netfilter/x_tables.h>
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+
+static inline void init_entry_match(struct xt_entry_match *match,
+				    uint16_t size, uint8_t revision,
+				    const char *name)
+{
+	memset(match, 0, sizeof(*match));
+	sprintf(match->u.user.name, "%s", name);
+	match->u.user.match_size = size;
+	match->u.user.revision = revision;
+}
+
+#endif // BPFILTER_UTIL_H
diff --git a/tools/testing/selftests/bpf/bpfilter/test_match.c b/tools/testing/selftests/bpf/bpfilter/test_match.c
new file mode 100644
index 000000000000..4a0dc1b14e4d
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/test_match.c
@@ -0,0 +1,69 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_tcpudp.h>
+
+#include "../../kselftest_harness.h"
+
+#include "context.h"
+#include "logger.h"
+#include "match.h"
+
+#include "bpfilter_util.h"
+
+/**
+ * struct udp_match - Dummy test structure.
+ *
+ * The placeholder leaves room for an overlong match name, so writing it
+ * doesn't overwrite adjacent memory.
+ */
+struct udp_match {
+	struct xt_entry_match ipt_match;
+	char placeholder[32];
+};
+
+FIXTURE(test_match_init)
+{
+	struct context ctx;
+	struct udp_match udp_match;
+	struct match match;
+};
+
+FIXTURE_SETUP(test_match_init)
+{
+	logger_set_file(stderr);
+	ASSERT_EQ(0, create_context(&self->ctx));
+};
+
+FIXTURE_TEARDOWN(test_match_init)
+{
+	free_context(&self->ctx);
+}
+
+TEST_F(test_match_init, name_too_long)
+{
+	init_entry_match(&self->udp_match.ipt_match, sizeof(self->udp_match), 0,
+			 "this match name is supposed to be way too long...");
+
+	ASSERT_EQ(init_match(&self->ctx,
+			     (const struct bpfilter_ipt_match *)&self->udp_match
+				     .ipt_match,
+			     &self->match),
+		  -EINVAL);
+}
+
+TEST_F(test_match_init, not_found)
+{
+	init_entry_match(&self->udp_match.ipt_match, sizeof(self->udp_match), 0,
+			 "doesn't exist");
+
+	ASSERT_EQ(init_match(&self->ctx,
+			     (const struct bpfilter_ipt_match *)&self->udp_match
+				     .ipt_match,
+			     &self->match),
+		  -ENOENT);
+}
+
+TEST_HARNESS_MAIN
-- 
2.38.1



* [PATCH bpf-next v3 09/16] bpfilter: add support for src/dst addr and ports
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (7 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 08/16] bpfilter: add match structure Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 10/16] bpfilter: add target structure Quentin Deslandes
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Implement support for matching on source and destination addresses and
ports.

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
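Note: as a reference for the intended port-range semantics, the decision a
port match has to make can be modelled on the host as below. This is a
sketch of the usual xtables rule (inside the inclusive range, XOR the
inversion flag), not the eBPF generator from this patch; demo_port_matches
is a hypothetical name and the port is assumed to be in host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical reference model: return true when `port` (host byte
 * order) satisfies the range [range[0], range[1]], with the result
 * flipped when `inv` is set.
 */
static bool demo_port_matches(uint16_t port, const uint16_t range[2],
			      bool inv)
{
	bool in_range = port >= range[0] && port <= range[1];

	return in_range != inv;
}
```

The range {0, 65535} matches every port unless inverted, and a range with
equal bounds degenerates to a single-port comparison, which are the two
special cases the generator handles separately.
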
 net/bpfilter/Makefile                         |   2 +-
 net/bpfilter/context.c                        |   2 +-
 net/bpfilter/match.h                          |   2 +
 net/bpfilter/xt_udp.c                         | 111 ++++++++++++++++++
 .../testing/selftests/bpf/bpfilter/.gitignore |   1 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   6 +-
 .../selftests/bpf/bpfilter/test_xt_udp.c      |  48 ++++++++
 7 files changed, 168 insertions(+), 4 deletions(-)
 create mode 100644 net/bpfilter/xt_udp.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_xt_udp.c

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 2f8d867a6038..345341a9ee30 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -13,7 +13,7 @@ $(LIBBPF_A):
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
-bpfilter_umh-objs += match.o
+bpfilter_umh-objs += match.o xt_udp.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/context.c b/net/bpfilter/context.c
index b5e172412fab..f420fb8b6507 100644
--- a/net/bpfilter/context.c
+++ b/net/bpfilter/context.c
@@ -16,7 +16,7 @@
 #include "map-common.h"
 #include "match.h"
 
-static const struct match_ops *match_ops[] = { };
+static const struct match_ops *match_ops[] = { &xt_udp };
 
 static int init_match_ops_map(struct context *ctx)
 {
diff --git a/net/bpfilter/match.h b/net/bpfilter/match.h
index c6541e6a6567..7de3d2a07dc5 100644
--- a/net/bpfilter/match.h
+++ b/net/bpfilter/match.h
@@ -29,6 +29,8 @@ struct match {
 	const struct bpfilter_ipt_match *ipt_match;
 };
 
+extern const struct match_ops xt_udp;
+
 int init_match(struct context *ctx, const struct bpfilter_ipt_match *ipt_match,
 	       struct match *match);
 
diff --git a/net/bpfilter/xt_udp.c b/net/bpfilter/xt_udp.c
new file mode 100644
index 000000000000..c78cd4341f81
--- /dev/null
+++ b/net/bpfilter/xt_udp.c
@@ -0,0 +1,111 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include <linux/filter.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_tcpudp.h>
+#include <linux/udp.h>
+
+#include <arpa/inet.h>
+#include <errno.h>
+
+#include "codegen.h"
+#include "context.h"
+#include "logger.h"
+#include "match.h"
+
+static int xt_udp_check(struct context *ctx,
+			const struct bpfilter_ipt_match *ipt_match)
+{
+	const struct xt_udp *udp;
+
+	udp = (const struct xt_udp *)&ipt_match->data;
+
+	if (udp->invflags & ~XT_UDP_INV_MASK) {
+		BFLOG_ERR("cannot check match 'udp': invalid flags");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int xt_udp_gen_inline_ports(struct codegen *ctx, int regno, bool inv,
+				   const u16 (*ports)[2])
+{
+	if ((*ports)[0] == 0 && (*ports)[1] == 65535) {
+		if (inv)
+			EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+				   BPF_JMP_IMM(BPF_JA, 0, 0, 0));
+	} else if ((*ports)[0] == (*ports)[1]) {
+		const u16 port = htons((*ports)[0]);
+
+		EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+			   BPF_JMP_IMM((inv ? BPF_JEQ : BPF_JNE), regno, port, 0));
+	} else {
+		EMIT_LITTLE_ENDIAN(ctx, BPF_ENDIAN(BPF_TO_BE, regno, 16));
+		EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+			   BPF_JMP_IMM(inv ? BPF_JGT : BPF_JLT, regno, (*ports)[0], 0));
+		EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+			   BPF_JMP_IMM(inv ? BPF_JLT : BPF_JGT, regno, (*ports)[1], 0));
+	}
+
+	return 0;
+}
+
+static int xt_udp_gen_inline(struct codegen *ctx, const struct match *match)
+{
+	const struct xt_udp *udp;
+	int r;
+
+	udp = (const struct xt_udp *)&match->ipt_match->data;
+
+	EMIT(ctx, BPF_MOV64_REG(CODEGEN_REG_SCRATCH1, CODEGEN_REG_L4));
+	EMIT(ctx, BPF_ALU64_IMM(BPF_ADD, CODEGEN_REG_SCRATCH1, sizeof(struct udphdr)));
+	r = ctx->codegen_ops->load_packet_data_end(ctx, CODEGEN_REG_DATA_END);
+	if (r) {
+		BFLOG_ERR("failed to generate code to load packet data end: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+		   BPF_JMP_REG(BPF_JGT, CODEGEN_REG_SCRATCH1, CODEGEN_REG_DATA_END, 0));
+
+	EMIT(ctx, BPF_LDX_MEM(BPF_H, CODEGEN_REG_SCRATCH4, CODEGEN_REG_L4,
+			      offsetof(struct udphdr, source)));
+	EMIT(ctx, BPF_LDX_MEM(BPF_H, CODEGEN_REG_SCRATCH5, CODEGEN_REG_L4,
+			      offsetof(struct udphdr, dest)));
+
+	r = xt_udp_gen_inline_ports(ctx, CODEGEN_REG_SCRATCH4,
+				    udp->invflags & XT_UDP_INV_SRCPT,
+				    &udp->spts);
+	if (r) {
+		BFLOG_ERR("failed to generate code to match source ports: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	r = xt_udp_gen_inline_ports(ctx, CODEGEN_REG_SCRATCH5,
+				    udp->invflags & XT_UDP_INV_DSTPT,
+				    &udp->dpts);
+	if (r) {
+		BFLOG_ERR("failed to generate code to match destination ports: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	return 0;
+}
+
+const struct match_ops xt_udp = {
+	.name = "udp",
+	.size = XT_ALIGN(sizeof(struct xt_udp)),
+	.revision = 0,
+	.check = xt_udp_check,
+	.gen_inline = xt_udp_gen_inline
+};
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
index 9ac1b3caf246..f84cc86493df 100644
--- a/tools/testing/selftests/bpf/bpfilter/.gitignore
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -2,3 +2,4 @@
 tools/**
 test_map
 test_match
+test_xt_udp
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index 10642c1d6a87..97f8d596de36 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -12,6 +12,7 @@ CFLAGS += -Wall -g -pthread -I$(TOOLSINCDIR) -I$(APIDIR) -I$(BPFILTERSRCDIR)
 
 TEST_GEN_PROGS += test_map
 TEST_GEN_PROGS += test_match
+TEST_GEN_PROGS += test_xt_udp
 
 KSFT_KHDR_INSTALL := 1
 
@@ -35,11 +36,12 @@ $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)			\
 
 BPFILTER_MAP_SRCS := $(BPFILTERSRCDIR)/map-common.c
 BPFILTER_CODEGEN_SRCS := $(BPFILTERSRCDIR)/codegen.c $(BPFOBJ) -lelf -lz
-BPFILTER_MATCH_SRCS := $(BPFILTERSRCDIR)/match.c
+BPFILTER_MATCH_SRCS := $(BPFILTERSRCDIR)/match.c $(BPFILTERSRCDIR)/xt_udp.c
 
-BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS)
+BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS) $(BPFILTER_CODEGEN_SRCS)
 BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/context.c $(BPFILTERSRCDIR)/logger.c
 BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS)
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
 $(OUTPUT)/test_match: test_match.c $(BPFILTER_COMMON_SRCS)
+$(OUTPUT)/test_xt_udp: test_xt_udp.c $(BPFILTER_COMMON_SRCS)
diff --git a/tools/testing/selftests/bpf/bpfilter/test_xt_udp.c b/tools/testing/selftests/bpf/bpfilter/test_xt_udp.c
new file mode 100644
index 000000000000..c0898b0eca30
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/test_xt_udp.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_tcpudp.h>
+
+#include "../../kselftest_harness.h"
+
+#include "context.h"
+#include "logger.h"
+#include "match.h"
+
+#include "bpfilter_util.h"
+
+FIXTURE(test_xt_udp)
+{
+	struct context ctx;
+	struct {
+		struct xt_entry_match match;
+		struct xt_udp udp;
+
+	} ipt_match;
+	struct match match;
+};
+
+FIXTURE_SETUP(test_xt_udp)
+{
+	logger_set_file(stderr);
+	ASSERT_EQ(0, create_context(&self->ctx));
+};
+
+FIXTURE_TEARDOWN(test_xt_udp)
+{
+	free_context(&self->ctx);
+};
+
+TEST_F(test_xt_udp, init)
+{
+	init_entry_match((struct xt_entry_match *)&self->ipt_match,
+			 sizeof(self->ipt_match), 0, "udp");
+	ASSERT_EQ(init_match(&self->ctx,
+			     (const struct bpfilter_ipt_match *)&self->ipt_match,
+			     &self->match),
+		 0);
+}
+
+TEST_HARNESS_MAIN
-- 
2.38.1



* [PATCH bpf-next v3 10/16] bpfilter: add target structure
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (8 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 09/16] bpfilter: add support for src/dst addr and ports Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 11/16] bpfilter: add rule structure Quentin Deslandes
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Add a target structure containing a pointer to a target_ops structure and
a pointer to an xt_entry_target structure. The latter contains the
payload for the target's type.

The target_ops structure provides two operations:
- check: validate the target.
- gen_inline: generate the eBPF bytecode for the target.

All target_ops are kept in a map by their name.

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
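Note: the verdict handling of the standard target can be sketched as below.
This is an illustration under stated assumptions, not the code from this
patch: the demo_* names are hypothetical, and the constants are assumed to
mirror netfilter's NF_DROP (0), NF_ACCEPT (1), NF_QUEUE (3) and XT_RETURN
(-NF_REPEAT - 1, i.e. -5).

```c
#include <assert.h>

/* Assumed to mirror the netfilter verdict values. */
#define DEMO_NF_DROP	0
#define DEMO_NF_ACCEPT	1
#define DEMO_NF_QUEUE	3
#define DEMO_XT_RETURN	(-4 - 1) /* -NF_REPEAT - 1 */

enum demo_kind {
	DEMO_KIND_JUMP,		/* positive: offset of a user chain */
	DEMO_KIND_RETURN,	/* return to the calling chain */
	DEMO_KIND_VERDICT,	/* ACCEPT, DROP or QUEUE */
	DEMO_KIND_INVALID,
};

/* Special verdicts are stored as -verdict - 1; this undoes the encoding. */
static int demo_convert_verdict(int verdict)
{
	return -verdict - 1;
}

/* Same classification order as standard_target_check(). */
static enum demo_kind demo_classify(int verdict)
{
	if (verdict > 0)
		return DEMO_KIND_JUMP;

	if (verdict == DEMO_XT_RETURN)
		return DEMO_KIND_RETURN;

	if (verdict < 0) {
		switch (demo_convert_verdict(verdict)) {
		case DEMO_NF_ACCEPT:
		case DEMO_NF_DROP:
		case DEMO_NF_QUEUE:
			return DEMO_KIND_VERDICT;
		}
	}

	return DEMO_KIND_INVALID;
}
```

Under these assumptions DROP is stored as -1 and ACCEPT as -2, so
demo_convert_verdict(-2) gives back 1, while a verdict of 0 or an encoded
value that decodes to anything else is rejected.
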
 net/bpfilter/Makefile                         |   2 +-
 net/bpfilter/context.c                        |  42 ++++
 net/bpfilter/context.h                        |   1 +
 net/bpfilter/target.c                         | 203 ++++++++++++++++++
 net/bpfilter/target.h                         |  57 +++++
 .../testing/selftests/bpf/bpfilter/.gitignore |   1 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   5 +-
 .../selftests/bpf/bpfilter/bpfilter_util.h    |  23 ++
 .../selftests/bpf/bpfilter/test_target.c      |  83 +++++++
 9 files changed, 415 insertions(+), 2 deletions(-)
 create mode 100644 net/bpfilter/target.c
 create mode 100644 net/bpfilter/target.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_target.c

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 345341a9ee30..7e642e0ae932 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -13,7 +13,7 @@ $(LIBBPF_A):
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
-bpfilter_umh-objs += match.o xt_udp.o
+bpfilter_umh-objs += match.o xt_udp.o target.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/context.c b/net/bpfilter/context.c
index f420fb8b6507..ac07b678baa7 100644
--- a/net/bpfilter/context.c
+++ b/net/bpfilter/context.c
@@ -15,6 +15,7 @@
 #include "logger.h"
 #include "map-common.h"
 #include "match.h"
+#include "target.h"
 
 static const struct match_ops *match_ops[] = { &xt_udp };
 
@@ -42,6 +43,35 @@ static int init_match_ops_map(struct context *ctx)
 	return 0;
 }
 
+static const struct target_ops *target_ops[] = {
+	&standard_target_ops,
+	&error_target_ops
+};
+
+static int init_target_ops_map(struct context *ctx)
+{
+	int r;
+
+	r = create_map(&ctx->target_ops_map, ARRAY_SIZE(target_ops));
+	if (r) {
+		BFLOG_ERR("failed to create targets map: %s", STRERR(r));
+		return r;
+	}
+
+	for (int i = 0; i < ARRAY_SIZE(target_ops); ++i) {
+		const struct target_ops *t = target_ops[i];
+
+		r = map_upsert(&ctx->target_ops_map, t->name, (void *)t);
+		if (r) {
+			BFLOG_ERR("failed to upsert in targets map: %s",
+				  STRERR(r));
+			return r;
+		}
+	}
+
+	return 0;
+}
+
 int create_context(struct context *ctx)
 {
 	int r;
@@ -52,10 +82,22 @@ int create_context(struct context *ctx)
 		return r;
 	}
 
+	r = init_target_ops_map(ctx);
+	if (r) {
+		BFLOG_ERR("failed to initialize targets map: %s", STRERR(r));
+		goto err_free_match_ops_map;
+	}
+
 	return 0;
+
+err_free_match_ops_map:
+	free_map(&ctx->match_ops_map);
+
+	return r;
 }
 
 void free_context(struct context *ctx)
 {
+	free_map(&ctx->target_ops_map);
 	free_map(&ctx->match_ops_map);
 }
diff --git a/net/bpfilter/context.h b/net/bpfilter/context.h
index e36aa8ebf57e..f9c34a9968b8 100644
--- a/net/bpfilter/context.h
+++ b/net/bpfilter/context.h
@@ -11,6 +11,7 @@
 
 struct context {
 	struct hsearch_data match_ops_map;
+	struct hsearch_data target_ops_map;
 };
 
 int create_context(struct context *ctx);
diff --git a/net/bpfilter/target.c b/net/bpfilter/target.c
new file mode 100644
index 000000000000..a96ec7735c0e
--- /dev/null
+++ b/net/bpfilter/target.c
@@ -0,0 +1,203 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "target.h"
+
+#include <linux/err.h>
+#include <linux/filter.h>
+#include <linux/list.h>
+#include <linux/netfilter/x_tables.h>
+
+#include <errno.h>
+#include <string.h>
+
+#include "codegen.h"
+#include "context.h"
+#include "logger.h"
+#include "map-common.h"
+
+static const struct target_ops *target_ops_map_find(struct hsearch_data *map,
+						    const char *name)
+{
+	const size_t len = strnlen(name, BPFILTER_EXTENSION_MAXNAMELEN);
+
+	if (len < BPFILTER_EXTENSION_MAXNAMELEN)
+		return map_find(map, name);
+
+	return ERR_PTR(-EINVAL);
+}
+
+static int standard_target_check(struct context *ctx,
+				 const struct bpfilter_ipt_target *ipt_target)
+{
+	const struct bpfilter_ipt_standard_target *standard_target;
+
+	standard_target = (const struct bpfilter_ipt_standard_target *)ipt_target;
+
+	// Positive values of verdict denote a jump offset into a blob.
+	if (standard_target->verdict > 0)
+		return 0;
+
+	// Special values like ACCEPT, DROP, RETURN are encoded as negative values.
+	if (standard_target->verdict < 0) {
+		if (standard_target->verdict == BPFILTER_RETURN)
+			return 0;
+
+		switch (convert_verdict(standard_target->verdict)) {
+		case BPFILTER_NF_ACCEPT:
+		case BPFILTER_NF_DROP:
+		case BPFILTER_NF_QUEUE:
+			return 0;
+		}
+	}
+
+	BFLOG_ERR("unsupported verdict: %d", standard_target->verdict);
+
+	return -EINVAL;
+}
+
+static int standard_target_gen_inline(struct codegen *ctx,
+				      const struct target *target)
+{
+	const struct bpfilter_ipt_standard_target *standard_target;
+	int r;
+
+	standard_target = (const struct bpfilter_ipt_standard_target *)target->ipt_target;
+
+	if (standard_target->verdict >= 0) {
+		struct codegen_subprog_desc *subprog;
+		struct codegen_fixup_desc *fixup;
+
+		subprog = malloc(sizeof(*subprog));
+		if (!subprog) {
+			BFLOG_ERR("out of memory");
+			return -ENOMEM;
+		}
+
+		INIT_LIST_HEAD(&subprog->list);
+		subprog->type = CODEGEN_SUBPROG_USER_CHAIN;
+		subprog->insn = 0;
+		subprog->offset = standard_target->verdict;
+
+		fixup = malloc(sizeof(*fixup));
+		if (!fixup) {
+			BFLOG_ERR("out of memory");
+			free(subprog);
+			return -ENOMEM;
+		}
+
+		INIT_LIST_HEAD(&fixup->list);
+		fixup->type = CODEGEN_FIXUP_JUMP_TO_CHAIN;
+		fixup->insn = ctx->len_cur;
+		fixup->offset = standard_target->verdict;
+
+		list_add_tail(&fixup->list, &ctx->fixup);
+
+		r = codegen_push_awaiting_subprog(ctx, subprog);
+		if (r) {
+			BFLOG_ERR("failed to push awaiting subprog: %s",
+				  STRERR(r));
+			return r;
+		}
+
+		EMIT(ctx, BPF_JMP_IMM(BPF_JA, 0, 0, 0));
+
+		return 0;
+	}
+
+	if (standard_target->verdict == BPFILTER_RETURN) {
+		EMIT(ctx, BPF_EXIT_INSN());
+		return 0;
+	}
+
+	r = ctx->codegen_ops->emit_ret_code(ctx, convert_verdict(standard_target->verdict));
+	if (r) {
+		BFLOG_ERR("failed to emit return code: %s", STRERR(r));
+		return r;
+	}
+
+	EMIT(ctx, BPF_EXIT_INSN());
+
+	return 0;
+}
+
+const struct target_ops standard_target_ops = {
+	.name = "",
+	.revision = 0,
+	.size = sizeof(struct xt_standard_target),
+	.check = standard_target_check,
+	.gen_inline = standard_target_gen_inline,
+};
+
+static int error_target_check(struct context *ctx,
+			      const struct bpfilter_ipt_target *ipt_target)
+{
+	const struct bpfilter_ipt_error_target *error_target;
+	size_t maxlen;
+
+	error_target = (const struct bpfilter_ipt_error_target *)ipt_target;
+	maxlen = sizeof(error_target->error_name);
+	if (strnlen(error_target->error_name, maxlen) == maxlen) {
+		BFLOG_ERR("failed to check error target: error name too long");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int error_target_gen_inline(struct codegen *ctx,
+				   const struct target *target)
+{
+	return -EINVAL;
+}
+
+const struct target_ops error_target_ops = {
+	.name = "ERROR",
+	.revision = 0,
+	.size = sizeof(struct xt_error_target),
+	.check = error_target_check,
+	.gen_inline = error_target_gen_inline,
+};
+
+int init_target(struct context *ctx,
+		const struct bpfilter_ipt_target *ipt_target,
+		struct target *target)
+{
+	const size_t maxlen = sizeof(ipt_target->u.user.name);
+	const struct target_ops *found;
+	int r;
+
+	if (strnlen(ipt_target->u.user.name, maxlen) == maxlen) {
+		BFLOG_ERR("cannot init target: too long target name '%s'",
+			  ipt_target->u.user.name);
+		return -EINVAL;
+	}
+
+	found = target_ops_map_find(&ctx->target_ops_map,
+				    ipt_target->u.user.name);
+	if (IS_ERR(found)) {
+		BFLOG_ERR("cannot find target by name '%s' in map",
+			  ipt_target->u.user.name);
+		return PTR_ERR(found);
+	}
+
+	if (found->size != ipt_target->u.target_size ||
+	    found->revision != ipt_target->u.user.revision) {
+		BFLOG_ERR("invalid target size or revision: '%s'", ipt_target->u.user.name);
+		return -EINVAL;
+	}
+
+	r = found->check(ctx, ipt_target);
+	if (r)
+		return r;
+
+	target->target_ops = found;
+	target->ipt_target = ipt_target;
+
+	return 0;
+}
diff --git a/net/bpfilter/target.h b/net/bpfilter/target.h
new file mode 100644
index 000000000000..57bae658b6a2
--- /dev/null
+++ b/net/bpfilter/target.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_TARGET_H
+#define NET_BPFILTER_TARGET_H
+
+#include "../../include/uapi/linux/bpfilter.h"
+
+#include <stdint.h>
+
+struct codegen;
+struct context;
+struct target;
+struct target_ops_map;
+
+struct target_ops {
+	char name[BPFILTER_EXTENSION_MAXNAMELEN];
+	uint8_t revision;
+	uint16_t size;
+	int (*check)(struct context *ctx,
+		     const struct bpfilter_ipt_target *ipt_target);
+	int (*gen_inline)(struct codegen *ctx, const struct target *target);
+};
+
+struct target {
+	const struct target_ops *target_ops;
+	const struct bpfilter_ipt_target *ipt_target;
+};
+
+extern const struct target_ops standard_target_ops;
+extern const struct target_ops error_target_ops;
+
+/* Restore verdict's special value (ACCEPT, DROP, etc.) from its negative
+ * representation.
+ */
+static inline int convert_verdict(int verdict)
+{
+	return -verdict - 1;
+}
+
+static inline int standard_target_verdict(const struct bpfilter_ipt_target *ipt_target)
+{
+	const struct bpfilter_ipt_standard_target *standard_target;
+
+	standard_target = (const struct bpfilter_ipt_standard_target *)ipt_target;
+
+	return standard_target->verdict;
+}
+
+int init_target(struct context *ctx,
+		const struct bpfilter_ipt_target *ipt_target,
+		struct target *target);
+
+#endif // NET_BPFILTER_TARGET_H
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
index f84cc86493df..89912a44109f 100644
--- a/tools/testing/selftests/bpf/bpfilter/.gitignore
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -3,3 +3,4 @@ tools/**
 test_map
 test_match
 test_xt_udp
+test_target
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index 97f8d596de36..587951d14c0c 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -13,6 +13,7 @@ CFLAGS += -Wall -g -pthread -I$(TOOLSINCDIR) -I$(APIDIR) -I$(BPFILTERSRCDIR)
 TEST_GEN_PROGS += test_map
 TEST_GEN_PROGS += test_match
 TEST_GEN_PROGS += test_xt_udp
+TEST_GEN_PROGS += test_target
 
 KSFT_KHDR_INSTALL := 1
 
@@ -37,11 +38,13 @@ $(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)			\
 BPFILTER_MAP_SRCS := $(BPFILTERSRCDIR)/map-common.c
 BPFILTER_CODEGEN_SRCS := $(BPFILTERSRCDIR)/codegen.c $(BPFOBJ) -lelf -lz
 BPFILTER_MATCH_SRCS := $(BPFILTERSRCDIR)/match.c $(BPFILTERSRCDIR)/xt_udp.c
+BPFILTER_TARGET_SRCS := $(BPFILTERSRCDIR)/target.c
 
 BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS) $(BPFILTER_CODEGEN_SRCS)
 BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/context.c $(BPFILTERSRCDIR)/logger.c
-BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS)
+BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS) $(BPFILTER_TARGET_SRCS)
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
 $(OUTPUT)/test_match: test_match.c $(BPFILTER_COMMON_SRCS)
 $(OUTPUT)/test_xt_udp: test_xt_udp.c $(BPFILTER_COMMON_SRCS)
+$(OUTPUT)/test_target: test_target.c $(BPFILTER_COMMON_SRCS)
diff --git a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
index 705fd1777a67..0d6a6bee5514 100644
--- a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
+++ b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
@@ -19,4 +19,27 @@ static inline void init_entry_match(struct xt_entry_match *match,
 	match->u.user.revision = revision;
 }
 
+static inline void init_standard_target(struct xt_standard_target *ipt_target,
+					int revision, int verdict)
+{
+	snprintf(ipt_target->target.u.user.name,
+		 sizeof(ipt_target->target.u.user.name), "%s",
+		 BPFILTER_STANDARD_TARGET);
+	ipt_target->target.u.user.revision = revision;
+	ipt_target->target.u.user.target_size = sizeof(*ipt_target);
+	ipt_target->verdict = verdict;
+}
+
+static inline void init_error_target(struct xt_error_target *ipt_target,
+				     int revision, const char *error_name)
+{
+	snprintf(ipt_target->target.u.user.name,
+		 sizeof(ipt_target->target.u.user.name), "%s",
+		 BPFILTER_ERROR_TARGET);
+	ipt_target->target.u.user.revision = revision;
+	ipt_target->target.u.user.target_size = sizeof(*ipt_target);
+	snprintf(ipt_target->errorname, sizeof(ipt_target->errorname), "%s",
+		 error_name);
+}
+
 #endif // BPFILTER_UTIL_H
diff --git a/tools/testing/selftests/bpf/bpfilter/test_target.c b/tools/testing/selftests/bpf/bpfilter/test_target.c
new file mode 100644
index 000000000000..0ebe4b052a9b
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/test_target.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <linux/bpfilter.h>
+#include <linux/netfilter/x_tables.h>
+
+#include "../../kselftest_harness.h"
+
+#include "context.h"
+#include "logger.h"
+#include "target.h"
+
+#include "bpfilter_util.h"
+
+FIXTURE(test_standard_target)
+{
+	struct context ctx;
+	struct xt_standard_target ipt_target;
+	struct target target;
+};
+
+FIXTURE_VARIANT(test_standard_target)
+{
+	int verdict;
+};
+
+FIXTURE_VARIANT_ADD(test_standard_target, accept) {
+	.verdict = -BPFILTER_NF_ACCEPT - 1,
+};
+
+FIXTURE_VARIANT_ADD(test_standard_target, drop) {
+	.verdict = -BPFILTER_NF_DROP - 1,
+};
+
+FIXTURE_SETUP(test_standard_target)
+{
+	logger_set_file(stderr);
+	ASSERT_EQ(0, create_context(&self->ctx));
+
+	memset(&self->ipt_target, 0, sizeof(self->ipt_target));
+	init_standard_target(&self->ipt_target, 0, variant->verdict);
+}
+
+FIXTURE_TEARDOWN(test_standard_target)
+{
+	free_context(&self->ctx);
+}
+
+TEST_F(test_standard_target, init)
+{
+	ASSERT_EQ(0, init_target(&self->ctx, (const struct bpfilter_ipt_target *)&self->ipt_target,
+				 &self->target));
+}
+
+FIXTURE(test_error_target)
+{
+	struct context ctx;
+	struct xt_error_target ipt_target;
+	struct target target;
+};
+
+FIXTURE_SETUP(test_error_target)
+{
+	logger_set_file(stderr);
+	ASSERT_EQ(0, create_context(&self->ctx));
+
+	memset(&self->ipt_target, 0, sizeof(self->ipt_target));
+	init_error_target(&self->ipt_target, 0, "x");
+}
+
+FIXTURE_TEARDOWN(test_error_target)
+{
+	free_context(&self->ctx);
+}
+
+TEST_F(test_error_target, init)
+{
+	ASSERT_EQ(0, init_target(&self->ctx, (const struct bpfilter_ipt_target *)&self->ipt_target,
+				 &self->target));
+}
+
+TEST_HARNESS_MAIN
-- 
2.38.1


* [PATCH bpf-next v3 11/16] bpfilter: add rule structure
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (9 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 10/16] bpfilter: add target structure Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 12/16] bpfilter: add table structure Quentin Deslandes
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

The rule structure is the equivalent of the ipt_entry structure. A rule
consists of zero or more matches and a target. A rule has a pointer to
its ipt_entry structure in the entries blob. This structure is defined to
ease iteration over the various rules of a given chain. The original
ipt_entry blob is kept to simplify interaction with the iptables binary.

Inline bytecode generation is performed by gen_inline_rule(), and
consists of the following steps:
1. Emit instructions for rule's L3 src/dst addresses and protocol.
2. Emit instructions for each rule's match by calling match's interface.
3. Emit instructions for rule's target by calling target's interface.
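
As an illustration of step 1 only, the test performed by the emitted
instructions can be modelled in plain C. This is a rough sketch, not the
generator itself: l3_addr_matches() and l4_offset() are hypothetical
names, and both addresses are assumed to be in the same byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of the address test: mask the packet address, compare it with
 * the rule address, and flip the result when the rule carries the
 * corresponding inversion flag. A false result corresponds to the
 * generated jump to the next rule.
 */
static bool l3_addr_matches(uint32_t pkt_addr, uint32_t rule_addr,
			    uint32_t rule_mask, bool inverted)
{
	bool equal = (pkt_addr & rule_mask) == rule_addr;

	return inverted ? !equal : equal;
}

/*
 * Model of the L4 header lookup: the low nibble of the first IPv4
 * header byte is the header length in 32-bit words, so the L4 header
 * starts (ihl * 4) bytes after the start of the L3 header.
 */
static uint32_t l4_offset(uint8_t first_ip_byte)
{
	/* This sketch only handles IPv4 (version nibble == 4). */
	assert((first_ip_byte >> 4) == 4);

	return (uint32_t)(first_ip_byte & 0x0f) << 2;
}
```

For example, a packet address of 0x0a000001 matches a rule for
0x0a000000 with mask 0xffffff00, and stops matching once the rule is
inverted.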

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile                         |   2 +-
 net/bpfilter/rule.c                           | 286 ++++++++++++++++++
 net/bpfilter/rule.h                           |  37 +++
 .../testing/selftests/bpf/bpfilter/.gitignore |   1 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   4 +
 .../selftests/bpf/bpfilter/bpfilter_util.h    |   8 +
 .../selftests/bpf/bpfilter/test_rule.c        |  56 ++++
 7 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 net/bpfilter/rule.c
 create mode 100644 net/bpfilter/rule.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_rule.c

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 7e642e0ae932..759fb6c847d1 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -13,7 +13,7 @@ $(LIBBPF_A):
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
-bpfilter_umh-objs += match.o xt_udp.o target.o
+bpfilter_umh-objs += match.o xt_udp.o target.o rule.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/rule.c b/net/bpfilter/rule.c
new file mode 100644
index 000000000000..0f5217f6ab16
--- /dev/null
+++ b/net/bpfilter/rule.c
@@ -0,0 +1,286 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "rule.h"
+
+#include "../../include/uapi/linux/bpfilter.h"
+
+#include <linux/filter.h>
+#include <linux/ip.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#include <errno.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "codegen.h"
+#include "context.h"
+#include "logger.h"
+#include "match.h"
+
+static const struct bpfilter_ipt_target *ipt_entry_target(const struct bpfilter_ipt_entry *ipt_entry)
+{
+	return (const void *)ipt_entry + ipt_entry->target_offset;
+}
+
+static const struct bpfilter_ipt_match *ipt_entry_match(const struct bpfilter_ipt_entry *entry,
+							size_t offset)
+{
+	return (const void *)entry + offset;
+}
+
+static int ipt_entry_num_matches(const struct bpfilter_ipt_entry *ipt_entry)
+{
+	const struct bpfilter_ipt_match *ipt_match;
+	uint32_t offset = sizeof(*ipt_entry);
+	int num_matches = 0;
+
+	while (offset < ipt_entry->target_offset) {
+		ipt_match = ipt_entry_match(ipt_entry, offset);
+
+		if ((uintptr_t)ipt_match % __alignof__(struct bpfilter_ipt_match)) {
+			BFLOG_ERR("match must be aligned on struct bpfilter_ipt_match size");
+			return -EINVAL;
+		}
+
+		if (ipt_entry->target_offset < offset + sizeof(*ipt_match)) {
+			BFLOG_ERR("invalid target offset for struct ipt_entry");
+			return -EINVAL;
+		}
+
+		if (ipt_match->u.match_size < sizeof(*ipt_match)) {
+			BFLOG_ERR("invalid match size for struct ipt_match");
+			return -EINVAL;
+		}
+
+		if (ipt_entry->target_offset < offset + ipt_match->u.match_size) {
+			BFLOG_ERR("invalid target offset for struct ipt_entry");
+			return -EINVAL;
+		}
+
+		++num_matches;
+		offset += ipt_match->u.match_size;
+	}
+
+	if (offset != ipt_entry->target_offset) {
+		BFLOG_ERR("invalid offset");
+		return -EINVAL;
+	}
+
+	return num_matches;
+}
+
+static int init_rule_matches(struct context *ctx,
+			     const struct bpfilter_ipt_entry *ipt_entry,
+			     struct rule *rule)
+{
+	const struct bpfilter_ipt_match *ipt_match;
+	uint32_t offset = sizeof(*ipt_entry);
+	struct match *match;
+	int r;
+
+	rule->matches = calloc(rule->num_matches, sizeof(rule->matches[0]));
+	if (!rule->matches) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	match = rule->matches;
+	while (offset < ipt_entry->target_offset) {
+		ipt_match = ipt_entry_match(ipt_entry, offset);
+		r = init_match(ctx, ipt_match, match);
+		if (r) {
+			free(rule->matches);
+			rule->matches = NULL;
+			BFLOG_ERR("failed to initialize match: %s", STRERR(r));
+			return r;
+		}
+
+		++match;
+		offset += ipt_match->u.match_size;
+	}
+
+	return 0;
+}
+
+static int check_ipt_entry_ip(const struct bpfilter_ipt_ip *ip)
+{
+	if (ip->flags & ~BPFILTER_IPT_F_MASK) {
+		BFLOG_ERR("invalid flags: %d", ip->flags);
+		return -EINVAL;
+	}
+
+	if (ip->invflags & ~BPFILTER_IPT_INV_MASK) {
+		BFLOG_ERR("invalid inverse flags: %d", ip->invflags);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+bool rule_has_standard_target(const struct rule *rule)
+{
+	return rule->target.target_ops == &standard_target_ops;
+}
+
+bool rule_is_unconditional(const struct rule *rule)
+{
+	static const struct bpfilter_ipt_ip unconditional;
+
+	if (rule->num_matches)
+		return false;
+
+	return !memcmp(&rule->ipt_entry->ip, &unconditional,
+		       sizeof(unconditional));
+}
+
+int init_rule(struct context *ctx, const struct bpfilter_ipt_entry *ipt_entry,
+	      struct rule *rule)
+{
+	const struct bpfilter_ipt_target *ipt_target;
+	int r;
+
+	r = check_ipt_entry_ip(&ipt_entry->ip);
+	if (r) {
+		BFLOG_ERR("failed to check IPT entry IP: %s", STRERR(r));
+		return r;
+	}
+
+	if (ipt_entry->target_offset < sizeof(*ipt_entry)) {
+		BFLOG_ERR("invalid struct ipt_entry target offset: %d",
+			  ipt_entry->target_offset);
+		return -EINVAL;
+	}
+
+	if (ipt_entry->next_offset <
+	    ipt_entry->target_offset + sizeof(*ipt_target)) {
+		BFLOG_ERR("invalid struct ipt_entry next offset: %d",
+			  ipt_entry->next_offset);
+		return -EINVAL;
+	}
+
+	ipt_target = ipt_entry_target(ipt_entry);
+
+	if (ipt_target->u.target_size < sizeof(*ipt_target)) {
+		BFLOG_ERR("invalid struct ipt_target target size: %d",
+			  ipt_target->u.target_size);
+		return -EINVAL;
+	}
+
+	if (ipt_entry->next_offset <
+	    ipt_entry->target_offset + ipt_target->u.target_size) {
+		BFLOG_ERR("invalid struct ipt_entry next offset: %d",
+			  ipt_entry->next_offset);
+		return -EINVAL;
+	}
+
+	rule->ipt_entry = ipt_entry;
+
+	r = init_target(ctx, ipt_target, &rule->target);
+	if (r) {
+		BFLOG_ERR("failed to initialize target: %s", STRERR(r));
+		return r;
+	}
+
+	if (rule_has_standard_target(rule)) {
+		if (XT_ALIGN(ipt_entry->target_offset + sizeof(struct bpfilter_ipt_standard_target)) !=
+		    ipt_entry->next_offset) {
+			BFLOG_ERR("invalid struct ipt_entry target offset alignment");
+			return -EINVAL;
+		}
+	}
+
+	rule->num_matches = ipt_entry_num_matches(ipt_entry);
+	if (rule->num_matches < 0)
+		return rule->num_matches;
+
+	return init_rule_matches(ctx, ipt_entry, rule);
+}
+
+int gen_inline_rule(struct codegen *ctx, const struct rule *rule)
+{
+	int r;
+
+	const struct bpfilter_ipt_ip *ipt_ip = &rule->ipt_entry->ip;
+
+	if (!ipt_ip->src_mask && !ipt_ip->src) {
+		if (ipt_ip->invflags & IPT_INV_SRCIP)
+			return 0;
+	}
+
+	if (!ipt_ip->dst_mask && !ipt_ip->dst) {
+		if (ipt_ip->invflags & IPT_INV_DSTIP)
+			return 0;
+	}
+
+	if (ipt_ip->src_mask || ipt_ip->src) {
+		const int op = ipt_ip->invflags & IPT_INV_SRCIP ? BPF_JEQ : BPF_JNE;
+
+		EMIT(ctx, BPF_LDX_MEM(BPF_W, CODEGEN_REG_SCRATCH1, CODEGEN_REG_L3,
+				      offsetof(struct iphdr, saddr)));
+		EMIT(ctx, BPF_ALU32_IMM(BPF_AND, CODEGEN_REG_SCRATCH1, ipt_ip->src_mask));
+		EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+			   BPF_JMP_IMM(op, CODEGEN_REG_SCRATCH1, ipt_ip->src, 0));
+	}
+
+	if (ipt_ip->dst_mask || ipt_ip->dst) {
+		const int op = ipt_ip->invflags & IPT_INV_DSTIP ? BPF_JEQ : BPF_JNE;
+
+		EMIT(ctx, BPF_LDX_MEM(BPF_W, CODEGEN_REG_SCRATCH2, CODEGEN_REG_L3,
+				      offsetof(struct iphdr, daddr)));
+		EMIT(ctx, BPF_ALU32_IMM(BPF_AND, CODEGEN_REG_SCRATCH2, ipt_ip->dst_mask));
+		EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+			   BPF_JMP_IMM(op, CODEGEN_REG_SCRATCH2, ipt_ip->dst, 0));
+	}
+
+	if (ipt_ip->protocol) {
+		EMIT(ctx, BPF_LDX_MEM(BPF_B, CODEGEN_REG_SCRATCH4, CODEGEN_REG_L3,
+				      offsetof(struct iphdr, protocol)));
+		EMIT_FIXUP(ctx, CODEGEN_FIXUP_NEXT_RULE,
+			   BPF_JMP_IMM(BPF_JNE, CODEGEN_REG_SCRATCH4, ipt_ip->protocol, 0));
+
+		EMIT(ctx, BPF_LDX_MEM(BPF_B, CODEGEN_REG_SCRATCH4, CODEGEN_REG_L3,
+				      offsetof(struct iphdr, protocol)));
+		EMIT(ctx, BPF_MOV64_REG(CODEGEN_REG_L4, CODEGEN_REG_L3));
+		EMIT(ctx, BPF_LDX_MEM(BPF_B, CODEGEN_REG_SCRATCH1, CODEGEN_REG_L3, 0));
+		EMIT(ctx, BPF_ALU32_IMM(BPF_AND, CODEGEN_REG_SCRATCH1, 0x0f));
+		EMIT(ctx, BPF_ALU32_IMM(BPF_LSH, CODEGEN_REG_SCRATCH1, 2));
+		EMIT(ctx, BPF_ALU64_REG(BPF_ADD, CODEGEN_REG_L4, CODEGEN_REG_SCRATCH1));
+	}
+
+	for (int i = 0; i < rule->num_matches; ++i) {
+		const struct match *match;
+
+		match = &rule->matches[i];
+		r = match->match_ops->gen_inline(ctx, match);
+		if (r) {
+			BFLOG_ERR("failed to generate inline code match: %s",
+				  STRERR(r));
+			return r;
+		}
+	}
+
+	EMIT_ADD_COUNTER(ctx);
+
+	r = rule->target.target_ops->gen_inline(ctx, &rule->target);
+	if (r) {
+		BFLOG_ERR("failed to generate inline code for target: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	codegen_fixup(ctx, CODEGEN_FIXUP_NEXT_RULE);
+
+	return 0;
+}
+
+void free_rule(struct rule *rule)
+{
+	free(rule->matches);
+}
diff --git a/net/bpfilter/rule.h b/net/bpfilter/rule.h
new file mode 100644
index 000000000000..3a50c6112d3b
--- /dev/null
+++ b/net/bpfilter/rule.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_RULE_H
+#define NET_BPFILTER_RULE_H
+
+#include <stdint.h>
+#include <stdbool.h>
+
+#include "target.h"
+
+struct bpfilter_ipt_entry;
+struct codegen;
+struct context;
+struct match;
+
+struct rule {
+	const struct bpfilter_ipt_entry *ipt_entry;
+	uint32_t came_from;
+	uint32_t hook_mask;
+	uint16_t num_matches;
+	struct match *matches;
+	struct target target;
+	uint32_t index;
+};
+
+bool rule_has_standard_target(const struct rule *rule);
+bool rule_is_unconditional(const struct rule *rule);
+int init_rule(struct context *ctx, const struct bpfilter_ipt_entry *ipt_entry,
+	      struct rule *rule);
+int gen_inline_rule(struct codegen *ctx, const struct rule *rule);
+void free_rule(struct rule *rule);
+
+#endif // NET_BPFILTER_RULE_H
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
index 89912a44109f..a934ddef58d2 100644
--- a/tools/testing/selftests/bpf/bpfilter/.gitignore
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -4,3 +4,4 @@ test_map
 test_match
 test_xt_udp
 test_target
+test_rule
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index 587951d14c0c..4ef52bfe2d21 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -14,6 +14,7 @@ TEST_GEN_PROGS += test_map
 TEST_GEN_PROGS += test_match
 TEST_GEN_PROGS += test_xt_udp
 TEST_GEN_PROGS += test_target
+TEST_GEN_PROGS += test_rule
 
 KSFT_KHDR_INSTALL := 1
 
@@ -39,12 +40,15 @@ BPFILTER_MAP_SRCS := $(BPFILTERSRCDIR)/map-common.c
 BPFILTER_CODEGEN_SRCS := $(BPFILTERSRCDIR)/codegen.c $(BPFOBJ) -lelf -lz
 BPFILTER_MATCH_SRCS := $(BPFILTERSRCDIR)/match.c $(BPFILTERSRCDIR)/xt_udp.c
 BPFILTER_TARGET_SRCS := $(BPFILTERSRCDIR)/target.c
+BPFILTER_RULE_SRCS := $(BPFILTERSRCDIR)/rule.c
 
 BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS) $(BPFILTER_CODEGEN_SRCS)
 BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/context.c $(BPFILTERSRCDIR)/logger.c
 BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS) $(BPFILTER_TARGET_SRCS)
+BPFILTER_COMMON_SRCS += $(BPFILTER_RULE_SRCS)
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
 $(OUTPUT)/test_match: test_match.c $(BPFILTER_COMMON_SRCS)
 $(OUTPUT)/test_xt_udp: test_xt_udp.c $(BPFILTER_COMMON_SRCS)
 $(OUTPUT)/test_target: test_target.c $(BPFILTER_COMMON_SRCS)
+$(OUTPUT)/test_rule: test_rule.c $(BPFILTER_COMMON_SRCS)
diff --git a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
index 0d6a6bee5514..8dd7911fa06f 100644
--- a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
+++ b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
@@ -4,6 +4,7 @@
 #define BPFILTER_UTIL_H
 
 #include <linux/netfilter/x_tables.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
 
 #include <stdio.h>
 #include <stdint.h>
@@ -42,4 +43,11 @@ static inline void init_error_target(struct xt_error_target *ipt_target,
 		 error_name);
 }
 
+static inline void init_standard_entry(struct ipt_entry *entry, __u16 matches_size)
+{
+	memset(entry, 0, sizeof(*entry));
+	entry->target_offset = sizeof(*entry) + matches_size;
+	entry->next_offset = sizeof(*entry) + matches_size + sizeof(struct xt_standard_target);
+}
+
 #endif // BPFILTER_UTIL_H
diff --git a/tools/testing/selftests/bpf/bpfilter/test_rule.c b/tools/testing/selftests/bpf/bpfilter/test_rule.c
new file mode 100644
index 000000000000..db2cc7c5586a
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/test_rule.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include "rule.h"
+
+#include <linux/bpfilter.h>
+#include <linux/err.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include "../../kselftest_harness.h"
+
+#include "context.h"
+#include "logger.h"
+#include "rule.h"
+
+#include "bpfilter_util.h"
+
+FIXTURE(test_standard_rule)
+{
+	struct context ctx;
+	struct {
+		struct ipt_entry entry;
+		struct xt_standard_target target;
+	} entry;
+	struct rule rule;
+};
+
+FIXTURE_SETUP(test_standard_rule)
+{
+	const int verdict = BPFILTER_NF_ACCEPT;
+
+	logger_set_file(stderr);
+	ASSERT_EQ(create_context(&self->ctx), 0);
+
+	init_standard_entry(&self->entry.entry, 0);
+	init_standard_target(&self->entry.target, 0, -verdict - 1);
+}
+
+FIXTURE_TEARDOWN(test_standard_rule)
+{
+	free_rule(&self->rule);
+	free_context(&self->ctx);
+}
+
+TEST_F(test_standard_rule, init)
+{
+	ASSERT_EQ(0, init_rule(&self->ctx, (const struct bpfilter_ipt_entry *)&self->entry.entry,
+			       &self->rule));
+}
+
+TEST_HARNESS_MAIN
-- 
2.38.1


* [PATCH bpf-next v3 12/16] bpfilter: add table structure
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (10 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 11/16] bpfilter: add rule structure Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:03 ` [PATCH bpf-next v3 13/16] bpfilter: add table code generation Quentin Deslandes
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

The table_ops structure describes a set of operations for an individual
table type.

Tables support the following set of operations:
- create: create an instance of a table from an ipt_replace blob.
- codegen: generate eBPF bytecode for a table.
- install: load BPF maps, progs, and attach them.
- uninstall: detach loaded BPF maps and progs, and unload them.
- free: free all resources used by a table.
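
To make the intended call order concrete, here is a minimal sketch of
such an operations structure and of a driver that walks a table through
its lifecycle. Every demo_* name and every signature below is an
assumption made for illustration; the real definitions live in table.h
and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_table;

/* Hypothetical per-table-type operations, one member per bullet above. */
struct demo_table_ops {
	const char *name;
	struct demo_table *(*create)(const void *blob, size_t size);
	int (*codegen)(struct demo_table *table);
	int (*install)(struct demo_table *table);
	void (*uninstall)(struct demo_table *table);
	void (*free)(struct demo_table *table);
};

struct demo_table {
	const struct demo_table_ops *ops;
	int installed;
};

/* Stub implementation: it only tracks whether the table is installed. */
static struct demo_table *demo_create(const void *blob, size_t size)
{
	if (!blob || !size)
		return NULL;

	return calloc(1, sizeof(struct demo_table));
}

static int demo_codegen(struct demo_table *table) { (void)table; return 0; }
static int demo_install(struct demo_table *table) { table->installed = 1; return 0; }
static void demo_uninstall(struct demo_table *table) { table->installed = 0; }
static void demo_free(struct demo_table *table) { free(table); }

static const struct demo_table_ops demo_ops = {
	.name = "demo",
	.create = demo_create,
	.codegen = demo_codegen,
	.install = demo_install,
	.uninstall = demo_uninstall,
	.free = demo_free,
};

/* create -> codegen -> install; returns NULL if any step fails. */
static struct demo_table *demo_table_load(const struct demo_table_ops *ops,
					  const void *blob, size_t size)
{
	struct demo_table *table = ops->create(blob, size);

	if (!table)
		return NULL;

	table->ops = ops;
	if (ops->codegen(table) || ops->install(table)) {
		ops->free(table);
		return NULL;
	}

	return table;
}

/* Teardown runs in the reverse order: uninstall, then free. */
static void demo_table_unload(struct demo_table *table)
{
	assert(table && table->ops);

	table->ops->uninstall(table);
	table->ops->free(table);
}
```

Keeping uninstall and free separate mirrors the list above: a table can
be detached without being destroyed, and cleanup code can call both in
sequence.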

Each table keeps an instance of iptables' table blob and an array of
rules for this blob. The array of rules provides a more convenient way
to interact with the blob's entries, while having a copy of the blob
will ease communication with iptables.

All tables created are stored in a map, used for lookups. Also, all
tables are linked into a list to ease cleanup.
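
The double bookkeeping described above can be sketched as follows. This
is a simplified stand-in, assuming a tiny hand-rolled hash table and a
singly linked list instead of the hsearch_data map and list_head used
by the patch; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUCKETS 8

struct demo_entry {
	const char *name;
	struct demo_entry *bucket_next;	/* chain within one hash bucket */
	struct demo_entry *list_next;	/* chain across all entries */
};

struct demo_index {
	struct demo_entry *buckets[DEMO_BUCKETS];	/* lookups by name */
	struct demo_entry *all;				/* cleanup walk */
};

static size_t demo_hash(const char *s)
{
	size_t h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;

	return h % DEMO_BUCKETS;
}

/* Register an entry in both the lookup map and the cleanup list. */
static void demo_index_add(struct demo_index *idx, struct demo_entry *e)
{
	size_t b;

	assert(idx && e && e->name);

	b = demo_hash(e->name);
	e->bucket_next = idx->buckets[b];
	idx->buckets[b] = e;
	e->list_next = idx->all;
	idx->all = e;
}

static struct demo_entry *demo_index_find(const struct demo_index *idx,
					  const char *name)
{
	struct demo_entry *e;

	for (e = idx->buckets[demo_hash(name)]; e; e = e->bucket_next) {
		if (!strcmp(e->name, name))
			return e;
	}

	return NULL;
}

/* Cleanup only walks the list; returns the number of entries visited. */
static size_t demo_index_clear(struct demo_index *idx,
			       void (*release)(struct demo_entry *))
{
	struct demo_entry *e = idx->all;
	size_t n = 0;

	while (e) {
		struct demo_entry *next = e->list_next;

		if (release)
			release(e);
		e = next;
		++n;
	}

	memset(idx, 0, sizeof(*idx));

	return n;
}
```

The list makes teardown independent of the map's iteration support,
which matters when the map implementation offers no way to enumerate
its entries.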

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile                         |   2 +-
 net/bpfilter/context.c                        |  64 +++
 net/bpfilter/context.h                        |   4 +
 net/bpfilter/table.c                          | 391 ++++++++++++++++++
 net/bpfilter/table.h                          |  59 +++
 tools/testing/selftests/bpf/bpfilter/Makefile |   2 +-
 6 files changed, 520 insertions(+), 2 deletions(-)
 create mode 100644 net/bpfilter/table.c
 create mode 100644 net/bpfilter/table.h

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 759fb6c847d1..9f5b46c70a41 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -13,7 +13,7 @@ $(LIBBPF_A):
 userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
-bpfilter_umh-objs += match.o xt_udp.o target.o rule.o
+bpfilter_umh-objs += match.o xt_udp.o target.o rule.o table.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/context.c b/net/bpfilter/context.c
index ac07b678baa7..81c9751a2a2d 100644
--- a/net/bpfilter/context.c
+++ b/net/bpfilter/context.c
@@ -9,6 +9,7 @@
 #include "context.h"
 
 #include <linux/kernel.h>
+#include <linux/list.h>
 
 #include <string.h>
 
@@ -72,6 +73,39 @@ static int init_target_ops_map(struct context *ctx)
 	return 0;
 }
 
+static const struct table_ops *table_ops[] = {};
+
+static int init_table_ops_map(struct context *ctx)
+{
+	int r;
+
+	r = create_map(&ctx->table_ops_map, ARRAY_SIZE(table_ops));
+	if (r) {
+		BFLOG_ERR("failed to create tables map: %s", STRERR(r));
+		return r;
+	}
+
+	for (int i = 0; i < ARRAY_SIZE(table_ops); ++i) {
+		const struct table_ops *t = table_ops[i];
+
+		r = map_upsert(&ctx->table_ops_map, t->name, (void *)t);
+		if (r) {
+			BFLOG_ERR("failed to upsert in tables map: %s",
+				  STRERR(r));
+			return r;
+		}
+	}
+
+	return 0;
+}
+
+static int init_table_index(struct context *ctx)
+{
+	INIT_LIST_HEAD(&ctx->table_index.list);
+
+	return create_map(&ctx->table_index.map, ARRAY_SIZE(table_ops));
+}
+
 int create_context(struct context *ctx)
 {
 	int r;
@@ -88,8 +122,26 @@ int create_context(struct context *ctx)
 		goto err_free_match_ops_map;
 	}
 
+	r = init_table_ops_map(ctx);
+	if (r) {
+		BFLOG_ERR("failed to initialize tables map: %s", STRERR(r));
+		goto err_free_target_ops_map;
+	}
+
+	r = init_table_index(ctx);
+	if (r) {
+		BFLOG_ERR("failed to initialize tables index: %s", STRERR(r));
+		goto err_free_table_ops_map;
+	}
+
 	return 0;
 
+err_free_table_ops_map:
+	free_map(&ctx->table_ops_map);
+
+err_free_target_ops_map:
+	free_map(&ctx->target_ops_map);
+
 err_free_match_ops_map:
 	free_map(&ctx->match_ops_map);
 
@@ -98,6 +150,18 @@ int create_context(struct context *ctx)
 
 void free_context(struct context *ctx)
 {
+	struct list_head *t;
+	struct list_head *n;
+
+	list_for_each_safe(t, n, &ctx->table_index.list) {
+		struct table *table;
+
+		table = list_entry(t, struct table, list);
+		table->table_ops->uninstall(ctx, table);
+		table->table_ops->free(table);
+	}
+	free_map(&ctx->table_index.map);
+	free_map(&ctx->table_ops_map);
 	free_map(&ctx->target_ops_map);
 	free_map(&ctx->match_ops_map);
 }
diff --git a/net/bpfilter/context.h b/net/bpfilter/context.h
index f9c34a9968b8..b0e91e37d057 100644
--- a/net/bpfilter/context.h
+++ b/net/bpfilter/context.h
@@ -9,9 +9,13 @@
 
 #include <search.h>
 
+#include "table.h"
+
 struct context {
 	struct hsearch_data match_ops_map;
 	struct hsearch_data target_ops_map;
+	struct hsearch_data table_ops_map;
+	struct table_index table_index;
 };
 
 int create_context(struct context *ctx);
diff --git a/net/bpfilter/table.c b/net/bpfilter/table.c
new file mode 100644
index 000000000000..4094c82c31de
--- /dev/null
+++ b/net/bpfilter/table.c
@@ -0,0 +1,391 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "table.h"
+
+#include <linux/err.h>
+#include <linux/list.h>
+
+#include <errno.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "context.h"
+#include "logger.h"
+#include "rule.h"
+
+static int rule_offset_comparator(const void *x, const void *y)
+{
+	const struct rule *rule = y;
+
+	return x - (const void *)rule->ipt_entry;
+}
+
+static bool table_has_hook(const struct table *table, uint32_t hook)
+{
+	BUG_ON(hook >= BPFILTER_INET_HOOK_MAX);
+
+	return table->valid_hooks & (1 << hook);
+}
+
+static int table_init_rules(struct context *ctx, struct table *table,
+			    const struct bpfilter_ipt_replace *ipt_replace)
+{
+	uint32_t offset;
+
+	table->entries = malloc(table->size);
+	if (!table->entries) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	memcpy(table->entries, ipt_replace->entries, table->size);
+
+	table->rules = calloc(table->num_rules, sizeof(table->rules[0]));
+	if (!table->rules) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	offset = 0;
+	for (int i = 0; i < table->num_rules; ++i) {
+		const struct bpfilter_ipt_entry *ipt_entry;
+		int r;
+
+		if (table->size < offset + sizeof(*ipt_entry)) {
+			BFLOG_ERR("invalid table size: %d", table->size);
+			return -EINVAL;
+		}
+
+		ipt_entry = table->entries + offset;
+
+		if ((uintptr_t)ipt_entry % __alignof__(struct bpfilter_ipt_entry)) {
+			BFLOG_ERR("invalid alignment for struct ipt_entry");
+			return -EINVAL;
+		}
+
+		if (table->size < offset + ipt_entry->next_offset) {
+			BFLOG_ERR("invalid table size: %d", table->size);
+			return -EINVAL;
+		}
+
+		r = init_rule(ctx, ipt_entry, &table->rules[i]);
+		if (r) {
+			BFLOG_ERR("failed to initialize rule: %s",
+				  STRERR(r));
+			return r;
+		}
+
+		table->rules[i].ipt_entry = ipt_entry;
+		offset += ipt_entry->next_offset;
+	}
+
+	if (offset != ipt_replace->size) {
+		BFLOG_ERR("invalid final offset: %d", offset);
+		return -EINVAL;
+	}
+
+	if (table->num_rules != ipt_replace->num_entries) {
+		BFLOG_ERR("mismatch in number of rules: got %d, expected %d",
+			  table->num_rules, ipt_replace->num_entries);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int table_check_hooks(const struct table *table)
+{
+	uint32_t max_rule_front, max_rule_last;
+	bool check = false;
+
+	for (int i = 0; i < BPFILTER_INET_HOOK_MAX; ++i) {
+		if (!table_has_hook(table, i))
+			continue;
+
+		if (check) {
+			if (table->hook_entry[i] <= max_rule_front) {
+				BFLOG_ERR("invalid hook entry");
+				return -EINVAL;
+			}
+
+			if (table->underflow[i] <= max_rule_last) {
+				BFLOG_ERR("invalid underflow entry");
+				return -EINVAL;
+			}
+		}
+
+		max_rule_front = table->hook_entry[i];
+		max_rule_last = table->underflow[i];
+		check = true;
+	}
+
+	return 0;
+}
+
+static int table_init_hooks(struct table *table,
+			    const struct bpfilter_ipt_replace *ipt_replace)
+{
+	for (int i = 0; i < BPFILTER_INET_HOOK_MAX; ++i) {
+		struct rule *rule_front;
+		struct rule *rule_last;
+		int verdict;
+
+		if (!table_has_hook(table, i))
+			continue;
+
+		rule_front = table_find_rule_by_offset(table, ipt_replace->hook_entry[i]);
+		rule_last = table_find_rule_by_offset(table, ipt_replace->underflow[i]);
+
+		if (!rule_front || !rule_last) {
+			BFLOG_ERR("expected a first and last rule");
+			return -EINVAL;
+		}
+
+		if (!rule_is_unconditional(rule_last)) {
+			BFLOG_ERR("expected unconditional rule");
+			return -EINVAL;
+		}
+
+		if (!rule_has_standard_target(rule_last)) {
+			BFLOG_ERR("expected rule for a standard target");
+			return -EINVAL;
+		}
+
+		verdict = standard_target_verdict(rule_last->target.ipt_target);
+		if (verdict >= 0) {
+			BFLOG_ERR("expected a valid standard target verdict: %d",
+				  verdict);
+			return -EINVAL;
+		}
+
+		verdict = convert_verdict(verdict);
+
+		if (verdict != BPFILTER_NF_DROP && verdict != BPFILTER_NF_ACCEPT) {
+			BFLOG_ERR("verdict must be either NF_DROP or NF_ACCEPT");
+			return -EINVAL;
+		}
+
+		table->hook_entry[i] = rule_front - table->rules;
+		table->underflow[i] = rule_last - table->rules;
+	}
+
+	return table_check_hooks(table);
+}
+
+static struct rule *next_rule(const struct table *table, struct rule *rule)
+{
+	const uint32_t i = rule - table->rules;
+
+	if (table->num_rules <= i + 1) {
+		BFLOG_ERR("rule index is out of range");
+		return ERR_PTR(-EINVAL);
+	}
+
+	++rule;
+	rule->came_from = i;
+
+	return rule;
+}
+
+static struct rule *backtrack_rule(const struct table *table, struct rule *rule)
+{
+	uint32_t i = rule - table->rules;
+	int prev_i;
+
+	do {
+		rule->hook_mask ^= (1 << BPFILTER_INET_HOOK_MAX);
+		prev_i = i;
+		i = rule->came_from;
+		rule->came_from = 0;
+
+		if (i == prev_i)
+			return NULL;
+
+		rule = &table->rules[i];
+	} while (prev_i == i + 1);
+
+	return next_rule(table, rule);
+}
+
+static int table_check_chain(struct table *table, uint32_t hook,
+			     struct rule *rule)
+{
+	uint32_t i = rule - table->rules;
+
+	rule->came_from = i;
+
+	for (;;) {
+		bool visited;
+		int verdict;
+
+		if (!rule)
+			return 0;
+
+		if (IS_ERR(rule))
+			return PTR_ERR(rule);
+
+		i = rule - table->rules;
+
+		if (table->num_rules <= i) {
+			BFLOG_ERR("rule index is out of range: %u", i);
+			return -EINVAL;
+		}
+
+		if (rule->hook_mask & (1 << BPFILTER_INET_HOOK_MAX)) {
+			BFLOG_ERR("hook index out of range");
+			return -EINVAL;
+		}
+
+		// already visited
+		visited = rule->hook_mask & (1 << hook);
+		rule->hook_mask |= (1 << hook) | (1 << BPFILTER_INET_HOOK_MAX);
+
+		if (visited) {
+			rule = backtrack_rule(table, rule);
+			continue;
+		}
+
+		if (!rule_has_standard_target(rule)) {
+			rule = next_rule(table, rule);
+			continue;
+		}
+
+		verdict = standard_target_verdict(rule->target.ipt_target);
+		if (verdict > 0) {
+			rule = table_find_rule_by_offset(table, verdict);
+			if (!rule) {
+				BFLOG_ERR("failed to find rule by offset");
+				return -EINVAL;
+			}
+
+			rule->came_from = i;
+			continue;
+		}
+
+		if (!rule_is_unconditional(rule)) {
+			rule = next_rule(table, rule);
+			continue;
+		}
+
+		rule = backtrack_rule(table, rule);
+	}
+
+	return 0;
+}
+
+static int table_check_chains(struct table *table)
+{
+	int r = 0;
+
+	for (int i = 0; !r && i < BPFILTER_INET_HOOK_MAX; ++i) {
+		if (table_has_hook(table, i))
+			r = table_check_chain(table, i, &table->rules[table->hook_entry[i]]);
+	}
+
+	return r;
+}
+
+struct table *create_table(struct context *ctx,
+			   const struct bpfilter_ipt_replace *ipt_replace)
+{
+	struct table *table;
+	int r;
+
+	table = calloc(1, sizeof(*table));
+	if (!table) {
+		BFLOG_ERR("out of memory");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	INIT_LIST_HEAD(&table->list);
+	table->valid_hooks = ipt_replace->valid_hooks;
+	table->num_rules = ipt_replace->num_entries;
+	table->num_counters = ipt_replace->num_counters;
+	table->size = ipt_replace->size;
+
+	r = table_init_rules(ctx, table, ipt_replace);
+	if (r) {
+		BFLOG_ERR("failed to initialise table rules: %s", STRERR(r));
+		goto err_free;
+	}
+
+	r = table_init_hooks(table, ipt_replace);
+	if (r) {
+		BFLOG_ERR("failed to initialise table hooks: %s", STRERR(r));
+		goto err_free;
+	}
+
+	r = table_check_chains(table);
+	if (r) {
+		BFLOG_ERR("failed to check table chains: %s", STRERR(r));
+		goto err_free;
+	}
+
+	return table;
+
+err_free:
+	free_table(table);
+
+	return ERR_PTR(r);
+}
+
+struct rule *table_find_rule_by_offset(const struct table *table,
+				       uint32_t offset)
+{
+	const struct bpfilter_ipt_entry *key;
+
+	key = table->entries + offset;
+
+	return bsearch(key, table->rules, table->num_rules,
+		       sizeof(table->rules[0]), rule_offset_comparator);
+}
+
+void table_get_info(const struct table *table,
+		    struct bpfilter_ipt_get_info *info)
+{
+	snprintf(info->name, sizeof(info->name), "%s", table->table_ops->name);
+	info->valid_hooks = table->valid_hooks;
+
+	for (int i = 0; i < BPFILTER_INET_HOOK_MAX; ++i) {
+		const struct rule *rule_front, *rule_last;
+
+		if (!table_has_hook(table, i)) {
+			info->hook_entry[i] = 0;
+			info->underflow[i] = 0;
+			continue;
+		}
+
+		rule_front = &table->rules[table->hook_entry[i]];
+		rule_last = &table->rules[table->underflow[i]];
+		info->hook_entry[i] = (const void *)rule_front->ipt_entry - table->entries;
+		info->underflow[i] = (const void *)rule_last->ipt_entry - table->entries;
+	}
+
+	info->num_entries = table->num_rules;
+	info->size = table->size;
+}
+
+void free_table(struct table *table)
+{
+	if (!table)
+		return;
+
+	list_del(&table->list);
+
+	if (table->rules) {
+		for (int i = 0; i < table->num_rules; ++i)
+			free_rule(&table->rules[i]);
+		free(table->rules);
+	}
+
+	free(table->entries);
+	free(table);
+}
diff --git a/net/bpfilter/table.h b/net/bpfilter/table.h
new file mode 100644
index 000000000000..d683005e1755
--- /dev/null
+++ b/net/bpfilter/table.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_TABLE_H
+#define NET_BPFILTER_TABLE_H
+
+#include "../../include/uapi/linux/bpfilter.h"
+
+#include <linux/types.h>
+
+#include <search.h>
+#include <stdint.h>
+
+struct context;
+struct rule;
+struct table;
+
+struct table_ops {
+	char name[BPFILTER_XT_TABLE_MAXNAMELEN];
+	struct table *(*create)(struct context *ctx,
+				const struct bpfilter_ipt_replace *ipt_replace);
+	int (*codegen)(struct context *ctx, struct table *table);
+	int (*install)(struct context *ctx, struct table *table);
+	void (*uninstall)(struct context *ctx, struct table *table);
+	void (*free)(struct table *table);
+	void (*update_counters)(struct table *table);
+};
+
+struct table {
+	const struct table_ops *table_ops;
+	uint32_t valid_hooks;
+	uint32_t num_rules;
+	uint32_t num_counters;
+	uint32_t size;
+	uint32_t hook_entry[BPFILTER_INET_HOOK_MAX];
+	uint32_t underflow[BPFILTER_INET_HOOK_MAX];
+	struct rule *rules;
+	void *entries;
+	void *ctx;
+	struct list_head list;
+};
+
+struct table_index {
+	struct hsearch_data map;
+	struct list_head list;
+};
+
+struct table *create_table(struct context *ctx,
+			   const struct bpfilter_ipt_replace *ipt_replace);
+struct rule *table_find_rule_by_offset(const struct table *table,
+				       uint32_t offset);
+void table_get_info(const struct table *table,
+		    struct bpfilter_ipt_get_info *info);
+void free_table(struct table *table);
+
+#endif // NET_BPFILTER_TABLE_H
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index 4ef52bfe2d21..53634699d427 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -45,7 +45,7 @@ BPFILTER_RULE_SRCS := $(BPFILTERSRCDIR)/rule.c
 BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS) $(BPFILTER_CODEGEN_SRCS)
 BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/context.c $(BPFILTERSRCDIR)/logger.c
 BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS) $(BPFILTER_TARGET_SRCS)
-BPFILTER_COMMON_SRCS += $(BPFILTER_RULE_SRCS)
+BPFILTER_COMMON_SRCS += $(BPFILTER_RULE_SRCS) $(BPFILTERSRCDIR)/table.c
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
 $(OUTPUT)/test_match: test_match.c $(BPFILTER_COMMON_SRCS)
-- 
2.38.1



* [PATCH bpf-next v3 13/16] bpfilter: add table code generation
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (11 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 12/16] bpfilter: add table structure Quentin Deslandes
@ 2022-12-24  0:03 ` Quentin Deslandes
  2022-12-24  0:04 ` [PATCH bpf-next v3 14/16] bpfilter: add setsockopt() support Quentin Deslandes
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:03 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Table code generation consists of multiple steps:
  1) Find front and last rules for the supplied table and hook.
  2) Try to generate code for each rule in [front rule; last rule].
  3) Try to generate each remaining subprog by its type.

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/codegen.c | 359 +++++++++++++++++++++++++++++++++++++++++
 net/bpfilter/codegen.h |   2 +
 2 files changed, 361 insertions(+)
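
Note for reviewers: the three steps listed in the commit message can be
sketched in isolation as below. This is a simplified illustration and not
the code from this patch: struct toy_rule, struct toy_codegen and the toy_*()
helpers are hypothetical names, a non-negative verdict is assumed to be the
index of the first rule of a user chain (the real code uses byte offsets),
and "generating" a rule only increments a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_RULES 16

/* Hypothetical, simplified stand-in for bpfilter's struct rule. */
struct toy_rule {
	bool unconditional;   /* the rule has no match criteria */
	bool standard_target; /* the rule uses the standard target */
	int verdict;          /* < 0: final verdict, >= 0: index of a user chain */
};

/* Hypothetical codegen state: a worklist of user chains still to generate. */
struct toy_codegen {
	size_t pending[TOY_MAX_RULES]; /* first rule of each awaiting chain */
	size_t num_pending;
	bool queued[TOY_MAX_RULES];    /* chain already queued once */
	size_t emitted;                /* number of rules "generated" */
};

/* Step 1: a chain ends at the first unconditional rule with a standard
 * target and a negative verdict. Return its index, or -1 if none exists.
 */
static int toy_find_last_rule(const struct toy_rule *rules, size_t num_rules,
			      size_t front)
{
	for (size_t i = front; i < num_rules; ++i) {
		if (rules[i].unconditional && rules[i].standard_target &&
		    rules[i].verdict < 0)
			return (int)i;
	}

	return -1;
}

/* Step 2: generate every rule in [front; last], queueing jump targets. */
static int toy_gen_rules(struct toy_codegen *cg, const struct toy_rule *rules,
			 size_t front, size_t last)
{
	for (size_t i = front; i <= last; ++i) {
		int target = rules[i].verdict;

		++cg->emitted;

		if (!rules[i].standard_target || target < 0)
			continue;

		if (target >= TOY_MAX_RULES)
			return -1;

		/* Queue each user chain only once. */
		if (!cg->queued[target]) {
			cg->queued[target] = true;
			cg->pending[cg->num_pending++] = (size_t)target;
		}
	}

	return 0;
}

/* Steps 1 to 3: generate the hook's chain, then drain the worklist. */
static int toy_try_codegen(struct toy_codegen *cg, const struct toy_rule *rules,
			   size_t num_rules, size_t front, size_t last)
{
	if (front > last || last >= num_rules)
		return -1;

	if (toy_gen_rules(cg, rules, front, last))
		return -1;

	while (cg->num_pending) {
		size_t chain_front = cg->pending[--cg->num_pending];
		int chain_last = toy_find_last_rule(rules, num_rules, chain_front);

		if (chain_last < 0)
			return -1;

		if (toy_gen_rules(cg, rules, chain_front, (size_t)chain_last))
			return -1;
	}

	return 0;
}
```

In this sketch a caller would zero-initialise a struct toy_codegen and pass
the indices of the hook's first and underflow rules as front and last. Any
user chain reached through a jump is generated after the hook's own chain.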

diff --git a/net/bpfilter/codegen.c b/net/bpfilter/codegen.c
index e7ae7dfa5118..db0a20b378b5 100644
--- a/net/bpfilter/codegen.c
+++ b/net/bpfilter/codegen.c
@@ -4,15 +4,22 @@
  * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
  */
 
+#define _GNU_SOURCE
+
 #include "codegen.h"
 
 #include "../../include/uapi/linux/bpfilter.h"
 
+#include <linux/if_ether.h>
+#include <linux/ip.h>
 #include <linux/pkt_cls.h>
 
+#include <bpf/bpf_endian.h>
+
 #include <unistd.h>
 #include <sys/syscall.h>
 
+#include <search.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
@@ -20,6 +27,8 @@
 #include <bpf/libbpf.h>
 
 #include "logger.h"
+#include "rule.h"
+#include "table.h"
 
 enum fixup_insn_type {
 	FIXUP_INSN_OFF,
@@ -53,6 +62,65 @@ static int subprog_desc_comparator(const void *x, const void *y)
 	return -1;
 }
 
+/**
+ * codegen_inline_memset_zero_64() - Generate eBPF bytecode to initialise a
+ *	contiguous memory area with 0.
+ * @ctx: codegen context.
+ * @reg: register containing the address of the memory area to initialise.
+ *	The caller needs to initialise the register properly before calling this
+ *	function.
+ * @size: size of the memory area, in bytes. As this function initialises
+ *	64 bits at a time, @size needs to be a multiple of 8 bytes. If it isn't,
+ *	the function returns without initialising the memory and an error
+ *	message is printed out.
+ *
+ * Return: 0 on success, negative errno value on error.
+ */
+static int codegen_inline_memset_zero_64(struct codegen *ctx, int reg,
+					 size_t size)
+{
+	if (size % 8) {
+		BFLOG_ERR("codegen_inline_memset_zero_64() called with size %zu, size must be a multiple of 8",
+			  size);
+		return -EINVAL;
+	}
+
+	for (size_t i = 0; i * 8 < size; ++i)
+		EMIT(ctx, BPF_ST_MEM(BPF_DW, reg, i * 8, 0));
+
+	return 0;
+}
+
+static int codegen_push_subprog(struct codegen *codegen,
+				struct codegen_subprog_desc *subprog)
+{
+	// TODO: merge this with codegen_fixup_push.
+
+	if (codegen->subprogs_cur == codegen->subprogs_max) {
+		struct codegen_subprog_desc **subprogs;
+		uint16_t subprogs_max;
+
+		subprogs_max = codegen->subprogs_cur ?
+			       2 * codegen->subprogs_cur : 1;
+		subprogs = reallocarray(codegen->subprogs, subprogs_max,
+					sizeof(codegen->subprogs[0]));
+		if (!subprogs) {
+			BFLOG_ERR("out of memory");
+			return -ENOMEM;
+		}
+
+		codegen->subprogs_max = subprogs_max;
+		codegen->subprogs = subprogs;
+	}
+
+	codegen->subprogs[codegen->subprogs_cur++] = subprog;
+
+	qsort(codegen->subprogs, codegen->subprogs_cur,
+	      sizeof(codegen->subprogs[0]), subprog_desc_comparator);
+
+	return 0;
+}
+
 static const struct codegen_subprog_desc *codegen_find_subprog(struct codegen *codegen,
 							       const struct codegen_subprog_desc **subprog)
 {
@@ -601,6 +669,297 @@ int create_codegen(struct codegen *codegen, enum bpf_prog_type type)
 	return r;
 }
 
+static int try_codegen_rules(struct codegen *codegen, struct rule *rule_front,
+			     struct rule *rule_last)
+{
+	int r;
+
+	for (; rule_front <= rule_last; ++rule_front, ++codegen->rule_index) {
+		rule_front->index = codegen->rule_index;
+		r = gen_inline_rule(codegen, rule_front);
+		if (r) {
+			BFLOG_ERR("failed to generate inline rule: %s",
+				  STRERR(r));
+			return r;
+		}
+
+		r = codegen_fixup(codegen, CODEGEN_FIXUP_NEXT_RULE);
+		if (r) {
+			BFLOG_ERR("failed to generate next rule fixups: %s",
+				  STRERR(r));
+			return r;
+		}
+
+		r = codegen_fixup(codegen, CODEGEN_FIXUP_COUNTERS_INDEX);
+		if (r) {
+			BFLOG_ERR("failed to generate counters fixups: %s",
+				  STRERR(r));
+			return r;
+		}
+	}
+
+	r = codegen_fixup(codegen, CODEGEN_FIXUP_END_OF_CHAIN);
+	if (r) {
+		BFLOG_ERR("failed to generate end of chain fixups: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	return 0;
+}
+
+static struct rule *table_find_last_rule(const struct table *table,
+					 struct rule *rule_front)
+{
+	for (; rule_front < table->rules + table->num_rules; ++rule_front) {
+		if (!rule_is_unconditional(rule_front))
+			continue;
+
+		if (!rule_has_standard_target(rule_front))
+			continue;
+
+		if (standard_target_verdict(rule_front->target.ipt_target) >= 0)
+			continue;
+
+		return rule_front;
+	}
+
+	return NULL;
+}
+
+static int try_codegen_user_chain_subprog(struct codegen *codegen,
+					  const struct table *table,
+					  struct codegen_subprog_desc *subprog)
+{
+	struct rule *rule_front;
+	struct rule *rule_last;
+	int r;
+
+	rule_front = table_find_rule_by_offset(table, subprog->offset);
+	if (!rule_front) {
+		BFLOG_ERR("failed to get rule at offset %d", subprog->offset);
+		return -EINVAL;
+	}
+
+	rule_last = table_find_last_rule(table, rule_front);
+	if (!rule_last) {
+		BFLOG_ERR("failed to find last rule");
+		return -EINVAL;
+	}
+
+	subprog->insn = codegen->len_cur;
+	codegen->rule_index = rule_front - table->rules;
+	r = try_codegen_rules(codegen, rule_front, rule_last);
+	if (r) {
+		BFLOG_ERR("failed to generate rules");
+		return r;
+	}
+
+	return codegen_push_subprog(codegen, subprog);
+}
+
+static int try_codegen_subprogs(struct codegen *codegen, const struct table *table)
+{
+	while (!list_empty(&codegen->awaiting_subprogs)) {
+		struct codegen_subprog_desc *subprog;
+		int r = -EINVAL;
+
+		subprog = list_entry(codegen->awaiting_subprogs.next,
+				     struct codegen_subprog_desc,
+				     list);
+
+		if (subprog->type == CODEGEN_SUBPROG_USER_CHAIN) {
+			r = try_codegen_user_chain_subprog(codegen, table,
+							   subprog);
+			if (r < 0) {
+				BFLOG_ERR("failed to generate code for user defined chain: %s",
+					  STRERR(r));
+				return r;
+			}
+		} else {
+			BFLOG_ERR("code generation for subprogram of type %d is not supported",
+				  subprog->type);
+			return -EINVAL;
+		}
+
+		list_del(&subprog->list);
+	}
+
+	return 0;
+}
+
+/**
+ * generate_inline_forward_packet_assessment() - Add eBPF bytecode to assess
+ *	whether the current packet is to be forwarded or not.
+ * @ctx: context to add the bytecode to.
+ *
+ * Use bpf_fib_lookup() to find out whether the current packet is to be
+ * forwarded or not. bpf_fib_lookup() requires a struct bpf_fib_lookup to be
+ * filled with data from the packet.
+ * The outcome of the bytecode will depend on the actual iptables hook used:
+ * BPFILTER_INET_HOOK_FORWARD's chain will be processed if the packet is to be
+ * forwarded, or will be skipped otherwise and jump to the next chain. The
+ * opposite behaviour applies if the hook is BPFILTER_INET_HOOK_LOCAL_IN.
+ *
+ * Return: 0 on success, negative errno value on error.
+ */
+static int generate_inline_forward_packet_assessment(struct codegen *ctx)
+{
+	// Set ARG2 to contain the address of the struct bpf_fib_lookup.
+	EMIT(ctx, BPF_MOV64_REG(BPF_REG_ARG2, BPF_REG_10));
+	EMIT(ctx, BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG2,
+				(-(unsigned int)sizeof(struct bpf_fib_lookup)) - sizeof(struct runtime_context)));
+
+	codegen_inline_memset_zero_64(ctx, BPF_REG_ARG2,
+				      sizeof(struct bpf_fib_lookup));
+
+	// Store h_proto for further decision
+	EMIT(ctx, BPF_LDX_MEM(BPF_H, CODEGEN_REG_SCRATCH3, CODEGEN_REG_L3,
+			      -(short int)(ETH_HLEN) + (short int)offsetof(struct ethhdr, h_proto)));
+	EMIT(ctx, BPF_JMP_IMM(BPF_JEQ, CODEGEN_REG_SCRATCH3,
+			      bpf_htons(ETH_P_IP), 2));
+
+	// Not IPv4? Then do not process any FORWARD rule and move to the next chain (program).
+	EMIT(ctx, BPF_MOV64_IMM(CODEGEN_REG_RETVAL, TC_ACT_UNSPEC));
+	EMIT_FIXUP(ctx, CODEGEN_FIXUP_END_OF_CHAIN, BPF_JMP_A(0));
+
+	// bpf_fib_lookup.family field
+	EMIT(ctx, BPF_ST_MEM(BPF_B, BPF_REG_ARG2,
+			     offsetof(struct bpf_fib_lookup, family), AF_INET));
+
+	// bpf_fib_lookup.l4_protocol field
+	EMIT(ctx, BPF_LDX_MEM(BPF_B, CODEGEN_REG_SCRATCH3, CODEGEN_REG_L3,
+			      offsetof(struct iphdr, protocol)));
+	EMIT(ctx, BPF_STX_MEM(BPF_B, BPF_REG_ARG2, CODEGEN_REG_SCRATCH3,
+			      offsetof(struct bpf_fib_lookup, l4_protocol)));
+
+	// bpf_fib_lookup.tot_len field
+	EMIT(ctx, BPF_LDX_MEM(BPF_H, CODEGEN_REG_SCRATCH3, CODEGEN_REG_L3,
+			      offsetof(struct iphdr, tot_len)));
+	EMIT(ctx, BPF_STX_MEM(BPF_H, BPF_REG_ARG2, CODEGEN_REG_SCRATCH3,
+			      offsetof(struct bpf_fib_lookup, tot_len)));
+
+	// bpf_fib_lookup.ifindex field
+	EMIT(ctx, BPF_LDX_MEM(BPF_W, CODEGEN_REG_SCRATCH3, CODEGEN_REG_CTX,
+			      offsetof(struct __sk_buff, ingress_ifindex)));
+	EMIT(ctx, BPF_STX_MEM(BPF_W, BPF_REG_ARG2, CODEGEN_REG_SCRATCH3,
+			      offsetof(struct bpf_fib_lookup, ifindex)));
+
+	// bpf_fib_lookup.tos field
+	EMIT(ctx, BPF_LDX_MEM(BPF_B, CODEGEN_REG_SCRATCH3, CODEGEN_REG_L3,
+			      offsetof(struct iphdr, tos)));
+	EMIT(ctx, BPF_STX_MEM(BPF_B, BPF_REG_ARG2, CODEGEN_REG_SCRATCH3,
+			      offsetof(struct bpf_fib_lookup, tos)));
+
+	// bpf_fib_lookup.ipv4_src and bpf_fib_lookup.ipv4_dst fields
+	EMIT(ctx, BPF_LDX_MEM(BPF_W, CODEGEN_REG_SCRATCH3, CODEGEN_REG_L3,
+			      offsetof(struct iphdr, saddr)));
+	EMIT(ctx, BPF_STX_MEM(BPF_W, BPF_REG_ARG2, CODEGEN_REG_SCRATCH3,
+			      offsetof(struct bpf_fib_lookup, ipv4_src)));
+	EMIT(ctx, BPF_LDX_MEM(BPF_W, CODEGEN_REG_SCRATCH3, CODEGEN_REG_L3,
+			      offsetof(struct iphdr, daddr)));
+	EMIT(ctx, BPF_STX_MEM(BPF_W, BPF_REG_ARG2, CODEGEN_REG_SCRATCH3,
+			      offsetof(struct bpf_fib_lookup, ipv4_dst)));
+
+	EMIT(ctx, BPF_MOV64_REG(BPF_REG_ARG1, CODEGEN_REG_CTX));
+	EMIT(ctx, BPF_MOV64_IMM(BPF_REG_ARG3, sizeof(struct bpf_fib_lookup)));
+	EMIT(ctx, BPF_MOV64_IMM(BPF_REG_ARG4, 0));
+
+	EMIT(ctx, BPF_EMIT_CALL(BPF_FUNC_fib_lookup));
+	EMIT(ctx, BPF_MOV64_REG(CODEGEN_REG_SCRATCH3, CODEGEN_REG_RETVAL));
+	EMIT(ctx, BPF_MOV64_IMM(CODEGEN_REG_RETVAL, TC_ACT_UNSPEC));
+	EMIT_FIXUP(ctx, CODEGEN_FIXUP_END_OF_CHAIN,
+		   BPF_JMP_IMM(ctx->iptables_hook == BPFILTER_INET_HOOK_FORWARD ? BPF_JNE : BPF_JEQ,
+			       CODEGEN_REG_SCRATCH3, BPF_FIB_LKUP_RET_SUCCESS, 0));
+
+	return 0;
+}
+
+int try_codegen(struct codegen *codegen, const struct table *table)
+{
+	struct rule *rule_front;
+	struct rule *rule_last;
+	int r;
+
+	r = codegen->codegen_ops->gen_inline_prologue(codegen);
+	if (r) {
+		BFLOG_ERR("failed to generate inline prologue: %s", STRERR(r));
+		return r;
+	}
+
+	r = codegen->codegen_ops->load_packet_data(codegen, CODEGEN_REG_L3);
+	if (r) {
+		BFLOG_ERR("failed to generate code to load packet data: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	r = codegen->codegen_ops->load_packet_data_end(codegen,
+						       CODEGEN_REG_DATA_END);
+	if (r) {
+		BFLOG_ERR("failed to generate code to load packet data end: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	// save packet size once
+	EMIT(codegen, BPF_MOV64_REG(CODEGEN_REG_SCRATCH2, CODEGEN_REG_DATA_END));
+	EMIT(codegen, BPF_ALU64_REG(BPF_SUB, CODEGEN_REG_SCRATCH2, CODEGEN_REG_L3));
+	EMIT(codegen, BPF_STX_MEM(BPF_W, CODEGEN_REG_RUNTIME_CTX, CODEGEN_REG_SCRATCH2,
+				  STACK_RUNTIME_CONTEXT_OFFSET(data_size)));
+
+	EMIT(codegen, BPF_ALU64_IMM(BPF_ADD, CODEGEN_REG_L3, ETH_HLEN));
+	EMIT_FIXUP(codegen, CODEGEN_FIXUP_END_OF_CHAIN,
+		   BPF_JMP_REG(BPF_JGT, CODEGEN_REG_L3, CODEGEN_REG_DATA_END, 0));
+	EMIT(codegen, BPF_MOV64_REG(CODEGEN_REG_SCRATCH1, CODEGEN_REG_L3));
+	EMIT(codegen, BPF_ALU64_IMM(BPF_ADD, CODEGEN_REG_SCRATCH1, sizeof(struct iphdr)));
+	EMIT_FIXUP(codegen, CODEGEN_FIXUP_END_OF_CHAIN,
+		   BPF_JMP_REG(BPF_JGT, CODEGEN_REG_SCRATCH1, CODEGEN_REG_DATA_END, 0));
+
+	if (codegen->iptables_hook == BPFILTER_INET_HOOK_LOCAL_IN ||
+	    codegen->iptables_hook == BPFILTER_INET_HOOK_FORWARD) {
+		/* There is neither an XDP nor a TC forward hook to attach
+		 * to, so we need to add code to assess whether an incoming
+		 * packet is to be forwarded or not.
+		 */
+		BFLOG_NOTICE("generate forward packet assessment");
+		generate_inline_forward_packet_assessment(codegen);
+	}
+
+	rule_front = &table->rules[table->hook_entry[codegen->iptables_hook]];
+	rule_last = &table->rules[table->underflow[codegen->iptables_hook]];
+
+	codegen->rule_index = rule_front - table->rules;
+	r = try_codegen_rules(codegen, rule_front, rule_last);
+	if (r) {
+		BFLOG_ERR("failed to generate rules: %s", STRERR(r));
+		return r;
+	}
+
+	r = codegen->codegen_ops->gen_inline_epilogue(codegen);
+	if (r) {
+		BFLOG_ERR("failed to generate inline epilogue: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	r = try_codegen_subprogs(codegen, table);
+	if (r) {
+		BFLOG_ERR("failed to generate subprograms: %s", STRERR(r));
+		return r;
+	}
+
+	r = codegen_fixup(codegen, CODEGEN_FIXUP_JUMP_TO_CHAIN);
+	if (r) {
+		BFLOG_ERR("failed to generate fixups: %s", STRERR(r));
+		return r;
+	}
+
+	codegen->shared_codegen->maps[CODEGEN_RELOC_MAP].max_entries = table->num_rules;
+
+	return 0;
+}
+
 int load_img(struct codegen *codegen)
 {
 	union bpf_attr attr = {};
diff --git a/net/bpfilter/codegen.h b/net/bpfilter/codegen.h
index cca45a13c4aa..6cfd8e7a3692 100644
--- a/net/bpfilter/codegen.h
+++ b/net/bpfilter/codegen.h
@@ -18,6 +18,7 @@
 #include <stdint.h>
 
 struct context;
+struct table;
 
 #define CODEGEN_REG_RETVAL	BPF_REG_0
 #define CODEGEN_REG_SCRATCH1	BPF_REG_1
@@ -174,6 +175,7 @@ int codegen_fixup(struct codegen *codegen, enum codegen_fixup_type fixup_type);
 int emit_fixup(struct codegen *codegen, enum codegen_fixup_type fixup_type,
 	       struct bpf_insn insn);
 int emit_add_counter(struct codegen *codegen);
+int try_codegen(struct codegen *codegen, const struct table *table);
 int load_img(struct codegen *codegen);
 void unload_img(struct codegen *codegen);
 void free_codegen(struct codegen *codegen);
-- 
2.38.1



* [PATCH bpf-next v3 14/16] bpfilter: add setsockopt() support
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (12 preceding siblings ...)
  2022-12-24  0:03 ` [PATCH bpf-next v3 13/16] bpfilter: add table code generation Quentin Deslandes
@ 2022-12-24  0:04 ` Quentin Deslandes
  2022-12-24  0:04 ` [PATCH bpf-next v3 15/16] bpfilter: add filter table Quentin Deslandes
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:04 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Add support for iptables' setsockopt(2) calls.
The parameters of a setsockopt(2) call are passed in a struct mbox_request,
which contains the type of the setsockopt(2) call and a description of its
memory buffer. The supplied memory buffer is read and written with
process_vm_readv(2)/process_vm_writev(2).

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile  |   1 +
 net/bpfilter/sockopt.c | 533 +++++++++++++++++++++++++++++++++++++++++
 net/bpfilter/sockopt.h |  15 ++
 3 files changed, 549 insertions(+)
 create mode 100644 net/bpfilter/sockopt.c
 create mode 100644 net/bpfilter/sockopt.h
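
Note for reviewers: the buffer transfer described in the commit message
relies on process_vm_readv(2). A minimal standalone sketch of that call is
shown below. copy_from_pid() is a hypothetical helper written for this note,
not a function from this patch, and it assumes the caller is allowed to read
the target process's memory.

```c
#define _GNU_SOURCE

#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy @count bytes located at address @from in
 * process @pid into the local buffer @to.
 * Return 0 on success, negative errno value on error.
 */
static int copy_from_pid(pid_t pid, void *to, const void *from, size_t count)
{
	const struct iovec local = {
		.iov_base = to,
		.iov_len = count
	};
	const struct iovec remote = {
		.iov_base = (void *)from,
		.iov_len = count
	};
	ssize_t n;

	n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
	if (n == -1)
		return -errno;

	/* A partial transfer is treated as a fault. */
	if ((size_t)n != count)
		return -EFAULT;

	return 0;
}
```

Calling copy_from_pid(getpid(), dst, src, len) copies within the calling
process, which is a convenient way to try the helper without a second
process. The write direction is symmetric with process_vm_writev(2).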

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 9f5b46c70a41..4a78a665b3f1 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -14,6 +14,7 @@ userprogs := bpfilter_umh
 bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
 bpfilter_umh-objs += match.o xt_udp.o target.o rule.o table.o
+bpfilter_umh-objs += sockopt.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/sockopt.c b/net/bpfilter/sockopt.c
new file mode 100644
index 000000000000..15de8e6ee31c
--- /dev/null
+++ b/net/bpfilter/sockopt.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "sockopt.h"
+
+#include <linux/err.h>
+#include <linux/list.h>
+#include <linux/netfilter/x_tables.h>
+
+#include <sys/types.h>
+#include <sys/uio.h>
+
+#include <errno.h>
+#include <limits.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "context.h"
+#include "logger.h"
+#include "map-common.h"
+#include "match.h"
+#include "msgfmt.h"
+#include "table.h"
+
+static int pvm_read(pid_t pid, void *to, const void *from, size_t count)
+{
+	ssize_t total_bytes;
+	const struct iovec l_iov = {
+		.iov_base = to,
+		.iov_len = count
+	};
+	const struct iovec r_iov = {
+		.iov_base = (void *)from,
+		.iov_len = count
+	};
+
+	total_bytes = process_vm_readv(pid, &l_iov, 1, &r_iov, 1, 0);
+	if (total_bytes == -1) {
+		BFLOG_ERR("failed to read from PID %d: %s", pid, STRERR(errno));
+		return -errno;
+	}
+
+	if (total_bytes != count) {
+		BFLOG_ERR("invalid amount of data transferred: %zd bytes, %zu expected",
+			  total_bytes, count);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int pvm_read_from_offset(pid_t pid, void *to, const void *from,
+				size_t offset, size_t count)
+{
+	return pvm_read(pid, to + offset, from + offset, count);
+}
+
+static int pvm_write(pid_t pid, void *to, const void *from, size_t count)
+{
+	ssize_t total_bytes;
+	const struct iovec l_iov = {
+		.iov_base = (void *)from,
+		.iov_len = count
+	};
+	const struct iovec r_iov = {
+		.iov_base = to,
+		.iov_len = count
+	};
+
+	total_bytes = process_vm_writev(pid, &l_iov, 1, &r_iov, 1, 0);
+	if (total_bytes == -1) {
+		BFLOG_ERR("failed to write to PID %d: %s", pid, STRERR(errno));
+		return -errno;
+	}
+
+	if (total_bytes != count) {
+		BFLOG_ERR("invalid amount of data transferred: %zd bytes, %zu expected",
+			  total_bytes, count);
+		return -EFAULT;
+	}
+
+	return 0;
+}
+
+static int read_ipt_get_info(const struct mbox_request *req,
+			     struct bpfilter_ipt_get_info *info)
+{
+	int r;
+
+	if (req->len != sizeof(*info)) {
+		BFLOG_ERR("invalid request size: %d", req->len);
+		return -EINVAL;
+	}
+
+	r = pvm_read(req->pid, info, (const void *)req->addr, sizeof(*info));
+	if (r) {
+		BFLOG_ERR("failed to read from PID %d", req->pid);
+		return r;
+	}
+
+	info->name[sizeof(info->name) - 1] = '\0';
+
+	return 0;
+}
+
+static int sockopt_get_info(struct context *ctx, const struct mbox_request *req)
+{
+	struct bpfilter_ipt_get_info info;
+	struct table *table;
+	int r;
+
+	if (req->len != sizeof(info)) {
+		BFLOG_ERR("invalid request size: %d", req->len);
+		return -EINVAL;
+	}
+
+	r = read_ipt_get_info(req, &info);
+	if (r) {
+		BFLOG_ERR("failed to read struct ipt_get_info: %s", STRERR(r));
+		return r;
+	}
+
+	table = map_find(&ctx->table_index.map, info.name);
+	if (IS_ERR(table)) {
+		BFLOG_ERR("cannot find table '%s' in map", info.name);
+		return -ENOENT;
+	}
+
+	table_get_info(table, &info);
+
+	return pvm_write(req->pid, (void *)req->addr, &info, sizeof(info));
+}
+
+static int read_ipt_get_entries(const struct mbox_request *req,
+				struct bpfilter_ipt_get_entries *entries)
+{
+	int r;
+
+	if (req->len < sizeof(*entries)) {
+		BFLOG_ERR("invalid request size: %d", req->len);
+		return -EINVAL;
+	}
+
+	r = pvm_read(req->pid, entries, (const void *)req->addr,
+		     sizeof(*entries));
+	if (r) {
+		BFLOG_ERR("failed to read from PID %d", req->pid);
+		return r;
+	}
+
+	entries->name[sizeof(entries->name) - 1] = '\0';
+
+	return 0;
+}
+
+static int sockopt_get_entries(struct context *ctx,
+			       const struct mbox_request *req)
+{
+	struct bpfilter_ipt_get_entries get_entries;
+	struct bpfilter_ipt_get_entries *entries;
+	struct table *table;
+	int r;
+
+	r = read_ipt_get_entries(req, &get_entries);
+	if (r) {
+		BFLOG_ERR("failed to read struct ipt_get_entries: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	table = map_find(&ctx->table_index.map, get_entries.name);
+	if (IS_ERR(table)) {
+		BFLOG_ERR("cannot find table '%s' in map", get_entries.name);
+		return -ENOENT;
+	}
+
+	if (get_entries.size != table->size) {
+		BFLOG_ERR("table '%s' get entries size mismatch",
+			  get_entries.name);
+		return -EINVAL;
+	}
+
+	entries = (struct bpfilter_ipt_get_entries *)req->addr;
+
+	table->table_ops->update_counters(table);
+
+	r = pvm_write(req->pid, entries->name, table->table_ops->name,
+		      sizeof(entries->name));
+	if (r) {
+		BFLOG_ERR("failed to write to PID %d", req->pid);
+		return r;
+	}
+
+	r = pvm_write(req->pid, &entries->size, &table->size,
+		      sizeof(table->size));
+	if (r) {
+		BFLOG_ERR("failed to write to PID %d", req->pid);
+		return r;
+	}
+
+	return pvm_write(req->pid, entries->entries, table->entries, table->size);
+}
+
+static int read_ipt_get_revision(const struct mbox_request *req,
+				 struct bpfilter_ipt_get_revision *revision)
+{
+	int r;
+
+	if (req->len != sizeof(*revision)) {
+		BFLOG_ERR("invalid request size: %d", req->len);
+		return -EINVAL;
+	}
+
+	r = pvm_read(req->pid, revision, (const void *)req->addr,
+		     sizeof(*revision));
+	if (r) {
+		BFLOG_ERR("failed to read from PID %d", req->pid);
+		return r;
+	}
+
+	revision->name[sizeof(revision->name) - 1] = '\0';
+
+	return 0;
+}
+
+static int sockopt_get_revision_match(struct context *ctx,
+				      const struct mbox_request *req)
+{
+	struct bpfilter_ipt_get_revision get_revision;
+	const struct match_ops *found;
+	int r;
+
+	r = read_ipt_get_revision(req, &get_revision);
+	if (r) {
+		BFLOG_ERR("failed to read struct ipt_get_revision: %s", STRERR(r));
+		return r;
+	}
+
+	found = map_find(&ctx->match_ops_map, get_revision.name);
+	if (IS_ERR(found)) {
+		BFLOG_ERR("cannot find match '%s' in map", get_revision.name);
+		return PTR_ERR(found);
+	}
+
+	return found->revision;
+}
+
+static int sockopt_get_revision_target(struct context *ctx,
+				       const struct mbox_request *req)
+{
+	struct bpfilter_ipt_get_revision get_revision;
+	const struct match_ops *found;
+	int r;
+
+	r = read_ipt_get_revision(req, &get_revision);
+	if (r) {
+		BFLOG_ERR("failed to read struct ipt_get_revision: %s",
+			  STRERR(r));
+		return r;
+	}
+
+	found = map_find(&ctx->target_ops_map, get_revision.name);
+	if (IS_ERR(found)) {
+		BFLOG_ERR("cannot find target '%s' in map", get_revision.name);
+		return PTR_ERR(found);
+	}
+
+	return found->revision;
+}
+
+static struct bpfilter_ipt_replace *read_ipt_replace(struct context *ctx,
+						     const struct mbox_request *req)
+{
+	struct bpfilter_ipt_replace ipt_header;
+	struct bpfilter_ipt_replace *ipt_replace;
+	int r;
+
+	if (req->len < sizeof(ipt_header)) {
+		BFLOG_ERR("invalid request size: %d", req->len);
+		return ERR_PTR(-EINVAL);
+	}
+
+	r = pvm_read(req->pid, &ipt_header, (const void *)req->addr,
+		     sizeof(ipt_header));
+	if (r) {
+		BFLOG_ERR("failed to read from PID %d: %s", req->pid,
+			  STRERR(r));
+		return ERR_PTR(r);
+	}
+
+	if (ipt_header.num_counters == 0) {
+		BFLOG_ERR("no counter defined in struct ipt_header");
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (ipt_header.num_counters >= INT_MAX / sizeof(struct bpfilter_ipt_counters)) {
+		BFLOG_ERR("too many counters defined: %u",
+			  ipt_header.num_counters);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ipt_header.name[sizeof(ipt_header.name) - 1] = '\0';
+
+	ipt_replace = malloc(sizeof(ipt_header) + ipt_header.size);
+	if (!ipt_replace) {
+		BFLOG_ERR("out of memory");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	memcpy(ipt_replace, &ipt_header, sizeof(ipt_header));
+
+	r = pvm_read_from_offset(req->pid, ipt_replace, (const void *)req->addr,
+				 sizeof(ipt_header), ipt_header.size);
+	if (r) {
+		free(ipt_replace);
+		BFLOG_ERR("failed to read from PID %u at offset %zu: %s",
+			  req->pid, sizeof(ipt_header), STRERR(r));
+		return ERR_PTR(r);
+	}
+
+	return ipt_replace;
+}
+
+static int sockopt_set_replace(struct context *ctx,
+			       const struct mbox_request *req)
+{
+	struct bpfilter_ipt_replace *ipt_replace;
+	struct table *table;
+	struct table *new_table = NULL;
+	struct table_ops *table_ops;
+	int r;
+
+	ipt_replace = read_ipt_replace(ctx, req);
+	if (IS_ERR(ipt_replace)) {
+		BFLOG_ERR("failed to read struct ipt_replace: %s",
+			  STRERR(PTR_ERR(ipt_replace)));
+		return PTR_ERR(ipt_replace);
+	}
+
+	table_ops = map_find(&ctx->table_ops_map, ipt_replace->name);
+	if (IS_ERR(table_ops)) {
+		r = PTR_ERR(table_ops);
+		BFLOG_ERR("cannot find table_ops '%s' in map", ipt_replace->name);
+		goto cleanup;
+	}
+
+	new_table = table_ops->create(ctx, ipt_replace);
+	if (IS_ERR(new_table)) {
+		r = PTR_ERR(new_table);
+		BFLOG_ERR("failed to create table '%s'", ipt_replace->name);
+		goto cleanup;
+	}
+
+	r = new_table->table_ops->codegen(ctx, new_table);
+	if (r) {
+		BFLOG_ERR("failed to generate code for table '%s'",
+			  ipt_replace->name);
+		goto cleanup;
+	}
+
+	table = map_find(&ctx->table_index.map, ipt_replace->name);
+	if (IS_ERR(table) && PTR_ERR(table) == -ENOENT)
+		table = NULL;
+
+	if (IS_ERR(table)) {
+		r = PTR_ERR(table);
+		BFLOG_ERR("cannot find table '%s' in map", ipt_replace->name);
+		goto cleanup;
+	}
+
+	if (table)
+		table->table_ops->uninstall(ctx, table);
+
+	r = new_table->table_ops->install(ctx, new_table);
+	if (r) {
+		BFLOG_ERR("failed to install new table '%s': %s",
+			  ipt_replace->name, STRERR(r));
+		if (table) {
+			int r2 = table->table_ops->install(ctx, table);
+
+			if (r2)
+				BFLOG_EMERG("failed to restore old table '%s': %s",
+					    table->table_ops->name,
+					    STRERR(r2));
+		}
+
+		goto cleanup;
+	}
+
+	r = map_upsert(&ctx->table_index.map, new_table->table_ops->name,
+		       new_table);
+	if (r) {
+		BFLOG_ERR("failed to upsert table map for '%s': %s",
+			  new_table->table_ops->name, STRERR(r));
+		goto cleanup;
+	}
+
+	list_add_tail(&new_table->list, &ctx->table_index.list);
+
+	new_table = table;
+
+cleanup:
+	if (!IS_ERR_OR_NULL(new_table))
+		new_table->table_ops->free(new_table);
+
+	free(ipt_replace);
+
+	return r;
+}
+
+static struct bpfilter_ipt_counters_info *read_ipt_counters_info(const struct mbox_request *req)
+{
+	struct bpfilter_ipt_counters_info *info;
+	size_t size;
+	int r;
+
+	if (req->len < sizeof(*info)) {
+		BFLOG_ERR("invalid request size: %d", req->len);
+		return ERR_PTR(-EINVAL);
+	}
+
+	info = malloc(req->len);
+	if (!info) {
+		BFLOG_ERR("out of memory");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	r = pvm_read(req->pid, info, (const void *)req->addr, sizeof(*info));
+	if (r) {
+		BFLOG_ERR("failed to read from PID %d", req->pid);
+		goto err_free;
+	}
+
+	size = info->num_counters * sizeof(info->counters[0]);
+	if (req->len != sizeof(*info) + size) {
+		BFLOG_ERR("not enough space to return counters");
+		r = -EINVAL;
+		goto err_free;
+	}
+
+	info->name[sizeof(info->name) - 1] = '\0';
+
+	r = pvm_read_from_offset(req->pid, info, (const void *)req->addr,
+				 sizeof(*info), size);
+	if (r) {
+		BFLOG_ERR("failed to read from PID %u at offset %zu: %s",
+			  req->pid, sizeof(*info), STRERR(r));
+		goto err_free;
+	}
+
+	return info;
+
+err_free:
+	free(info);
+
+	return ERR_PTR(r);
+}
+
+static int sockopt_set_add_counters(struct context *ctx,
+				    const struct mbox_request *req)
+{
+	struct bpfilter_ipt_counters_info *info;
+	struct table *table;
+	int r = 0;
+
+	info = read_ipt_counters_info(req);
+	if (IS_ERR(info)) {
+		r = PTR_ERR(info);
+		BFLOG_ERR("failed to read struct ipt_counters_info: %s",
+			  STRERR(r));
+		goto err_free;
+	}
+
+	table = map_find(&ctx->table_index.map, info->name);
+	if (IS_ERR(table)) {
+		r = PTR_ERR(table);
+		BFLOG_ERR("cannot find table '%s' in map", info->name);
+		goto err_free;
+	}
+
+	// TODO handle counters
+
+err_free:
+	free(info);
+
+	return r;
+}
+
+static int handle_get_request(struct context *ctx,
+			      const struct mbox_request *req)
+{
+	switch (req->cmd) {
+	case 0:
+		return 0;
+	case BPFILTER_IPT_SO_GET_INFO:
+		return sockopt_get_info(ctx, req);
+	case BPFILTER_IPT_SO_GET_ENTRIES:
+		return sockopt_get_entries(ctx, req);
+	case BPFILTER_IPT_SO_GET_REVISION_MATCH:
+		return sockopt_get_revision_match(ctx, req);
+	case BPFILTER_IPT_SO_GET_REVISION_TARGET:
+		return sockopt_get_revision_target(ctx, req);
+	}
+
+	BFLOG_ERR("unsupported SO_GET command: %d", req->cmd);
+	return -ENOPROTOOPT;
+}
+
+static int handle_set_request(struct context *ctx,
+			      const struct mbox_request *req)
+{
+	switch (req->cmd) {
+	case BPFILTER_IPT_SO_SET_REPLACE:
+		return sockopt_set_replace(ctx, req);
+	case BPFILTER_IPT_SO_SET_ADD_COUNTERS:
+		return sockopt_set_add_counters(ctx, req);
+	}
+
+	BFLOG_ERR("unsupported SO_SET command: %d", req->cmd);
+	return -ENOPROTOOPT;
+}
+
+int handle_sockopt_request(struct context *ctx,
+			   const struct mbox_request *req)
+{
+	return req->is_set ? handle_set_request(ctx, req) :
+			     handle_get_request(ctx, req);
+}
diff --git a/net/bpfilter/sockopt.h b/net/bpfilter/sockopt.h
new file mode 100644
index 000000000000..faf0502959b3
--- /dev/null
+++ b/net/bpfilter/sockopt.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_SOCKOPT_H
+#define NET_BPFILTER_SOCKOPT_H
+
+struct context;
+struct mbox_request;
+
+int handle_sockopt_request(struct context *ctx, const struct mbox_request *req);
+
+#endif // NET_BPFILTER_SOCKOPT_H
-- 
2.38.1



* [PATCH bpf-next v3 15/16] bpfilter: add filter table
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (13 preceding siblings ...)
  2022-12-24  0:04 ` [PATCH bpf-next v3 14/16] bpfilter: add setsockopt() support Quentin Deslandes
@ 2022-12-24  0:04 ` Quentin Deslandes
  2022-12-24  0:04 ` [PATCH bpf-next v3 16/16] bpfilter: handle setsockopt() calls Quentin Deslandes
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:04 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Introduce filter table support for INPUT, FORWARD, and OUTPUT chains.
All three chains are implemented as BPF_PROG_TYPE_SCHED_CLS programs.

INPUT and FORWARD programs are attached to the TC_INGRESS hook and use
generate_inline_forward_packet_assessment() to check whether or not
they should process the incoming packet.

The OUTPUT program is attached to the TC_EGRESS hook.
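
As a rough illustration of the attachment scheme described above (not
code from this series), the chain-to-hook mapping can be sketched as
below. The enum and function names are hypothetical stand-ins for the
BPFILTER_INET_HOOK_* values and for BPF_TC_INGRESS/BPF_TC_EGRESS.

```c
#include <assert.h>

/* Hypothetical stand-ins for the three filter-table chains. */
enum ipt_chain { CHAIN_INPUT, CHAIN_FORWARD, CHAIN_OUTPUT };

/* Hypothetical stand-ins for the two TC attach points. */
enum tc_attach { TC_ATTACH_INGRESS, TC_ATTACH_EGRESS };

/*
 * INPUT and FORWARD both see the packet on its way in, so they share
 * the ingress attach point; OUTPUT is the only chain on egress.
 */
static enum tc_attach chain_to_tc_attach(enum ipt_chain chain)
{
	assert(chain == CHAIN_INPUT || chain == CHAIN_FORWARD ||
	       chain == CHAIN_OUTPUT);

	return chain == CHAIN_OUTPUT ? TC_ATTACH_EGRESS : TC_ATTACH_INGRESS;
}
```

Because INPUT and FORWARD share the ingress attach point, each program
still has to decide at run time whether the packet is for it, which is
the role of generate_inline_forward_packet_assessment().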

create_filter_table() is used to create a default filter table
(statically stored in filter_table_replace_blob). This table doesn't
contain any rule and defaults to ACCEPTing packets on each chain.
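
For reference, the start of filter_table_replace_blob can be decoded by
hand. The sketch below assumes that struct bpfilter_ipt_replace mirrors
the layout of struct ipt_replace (a 32-byte name followed by
valid_hooks, num_entries and size as 32-bit fields) and that the blob
was captured on a little-endian host. struct replace_head and the helper
names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 44 bytes of filter_table_replace_blob; unlisted bytes are 0. */
static const uint8_t blob_head[44] = {
	'f', 'i', 'l', 't', 'e', 'r',	/* name[32], NUL padded */
	[32] = 0x0e,			/* valid_hooks */
	[36] = 0x04,			/* num_entries */
	[40] = 0x78, 0x02,		/* size = 0x278 */
};

/* Hypothetical view of the leading fields, for illustration only. */
struct replace_head {
	char name[32];
	uint32_t valid_hooks;
	uint32_t num_entries;
	uint32_t size;
};

/* Read a 32-bit little-endian value regardless of host endianness. */
static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static struct replace_head decode_replace_head(const uint8_t *buf, size_t len)
{
	struct replace_head h;

	assert(len >= 44);

	memcpy(h.name, buf, sizeof(h.name));
	h.name[sizeof(h.name) - 1] = '\0';
	h.valid_hooks = read_le32(buf + 32);
	h.num_entries = read_le32(buf + 36);
	h.size = read_le32(buf + 40);

	return h;
}
```

With the bytes above this yields the name "filter", valid_hooks 0x0e
(LOCAL_IN, FORWARD and LOCAL_OUT), 4 entries (one policy rule per chain
plus the trailing ERROR entry) and a 0x278-byte entry area.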

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/Makefile                         |   1 +
 net/bpfilter/context.c                        |   3 +-
 net/bpfilter/filter-table.c                   | 344 ++++++++++++++++++
 net/bpfilter/filter-table.h                   |  18 +
 .../testing/selftests/bpf/bpfilter/.gitignore |   1 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   3 +
 .../selftests/bpf/bpfilter/bpfilter_util.h    |  27 ++
 .../selftests/bpf/bpfilter/test_codegen.c     | 338 +++++++++++++++++
 8 files changed, 734 insertions(+), 1 deletion(-)
 create mode 100644 net/bpfilter/filter-table.c
 create mode 100644 net/bpfilter/filter-table.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_codegen.c

diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile
index 4a78a665b3f1..7def305f0af3 100644
--- a/net/bpfilter/Makefile
+++ b/net/bpfilter/Makefile
@@ -15,6 +15,7 @@ bpfilter_umh-objs := main.o logger.o map-common.o
 bpfilter_umh-objs += context.o codegen.o
 bpfilter_umh-objs += match.o xt_udp.o target.o rule.o table.o
 bpfilter_umh-objs += sockopt.o
+bpfilter_umh-objs += filter-table.o
 bpfilter_umh-userldlibs := $(LIBBPF_A) -lelf -lz
 userccflags += -I $(srctree)/tools/include/ -I $(srctree)/tools/include/uapi
 
diff --git a/net/bpfilter/context.c b/net/bpfilter/context.c
index 81c9751a2a2d..6bb0362a79c8 100644
--- a/net/bpfilter/context.c
+++ b/net/bpfilter/context.c
@@ -13,6 +13,7 @@
 
 #include <string.h>
 
+#include "filter-table.h"
 #include "logger.h"
 #include "map-common.h"
 #include "match.h"
@@ -73,7 +74,7 @@ static int init_target_ops_map(struct context *ctx)
 	return 0;
 }
 
-static const struct table_ops *table_ops[] = {};
+static const struct table_ops *table_ops[] = { &filter_table_ops };
 
 static int init_table_ops_map(struct context *ctx)
 {
diff --git a/net/bpfilter/filter-table.c b/net/bpfilter/filter-table.c
new file mode 100644
index 000000000000..452a6d5b2fd0
--- /dev/null
+++ b/net/bpfilter/filter-table.c
@@ -0,0 +1,344 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#define _GNU_SOURCE
+
+#include "filter-table.h"
+
+#include "../../include/uapi/linux/bpfilter.h"
+
+#include <linux/kernel.h>
+#include <linux/err.h>
+
+#include <errno.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include <bpf/bpf.h>
+
+#include "codegen.h"
+#include "context.h"
+#include "logger.h"
+#include "msgfmt.h"
+#include "rule.h"
+#include "sockopt.h"
+
+struct filter_table_context {
+	struct shared_codegen shared;
+	struct codegen local_in;
+	struct codegen forward;
+	struct codegen local_out;
+};
+
+static struct table *filter_table_create(struct context *ctx,
+					 const struct bpfilter_ipt_replace *ipt_replace)
+{
+	struct table *t = create_table(ctx, ipt_replace);
+
+	if (!IS_ERR_OR_NULL(t))
+		t->table_ops = &filter_table_ops;
+
+	return t;
+}
+
+static int filter_table_codegen(struct context *ctx, struct table *table)
+{
+	struct filter_table_context *table_ctx;
+	int r;
+
+	BUG_ON(table->table_ops != &filter_table_ops);
+
+	if (table->ctx) {
+		BFLOG_ERR("filter table already has a context");
+		return -EINVAL;
+	}
+
+	table_ctx = calloc(1, sizeof(*table_ctx));
+	if (!table_ctx) {
+		BFLOG_ERR("out of memory");
+		return -ENOMEM;
+	}
+
+	create_shared_codegen(&table_ctx->shared);
+
+	// INPUT chain
+	r = create_codegen(&table_ctx->local_in, BPF_PROG_TYPE_SCHED_CLS);
+	if (r) {
+		BFLOG_ERR("TC codegen for INPUT chain creation failed");
+		goto err_free_table_ctx;
+	}
+
+	table_ctx->local_in.ctx = ctx;
+	table_ctx->local_in.shared_codegen = &table_ctx->shared;
+	table_ctx->local_in.iptables_hook = BPFILTER_INET_HOOK_LOCAL_IN;
+	table_ctx->local_in.bpf_tc_hook = BPF_TC_INGRESS;
+
+	r = try_codegen(&table_ctx->local_in, table);
+	if (r) {
+		BFLOG_ERR("failed to generate LOCAL_IN code");
+		goto err_free_local_in;
+	}
+
+	// FORWARD chain
+	r = create_codegen(&table_ctx->forward, BPF_PROG_TYPE_SCHED_CLS);
+	if (r) {
+		BFLOG_ERR("TC codegen for FORWARD chain creation failed");
+		goto err_free_local_in;
+	}
+
+	table_ctx->forward.ctx = ctx;
+	table_ctx->forward.shared_codegen = &table_ctx->shared;
+	table_ctx->forward.iptables_hook = BPFILTER_INET_HOOK_FORWARD;
+	table_ctx->forward.bpf_tc_hook = BPF_TC_INGRESS;
+
+	r = try_codegen(&table_ctx->forward, table);
+	if (r) {
+		BFLOG_ERR("failed to generate FORWARD code");
+		goto err_free_local_fwd;
+	}
+
+	// OUTPUT chain
+	r = create_codegen(&table_ctx->local_out, BPF_PROG_TYPE_SCHED_CLS);
+	if (r) {
+		BFLOG_ERR("TC codegen for OUTPUT chain creation failed");
+		goto err_free_local_fwd;
+	}
+
+	table_ctx->local_out.ctx = ctx;
+	table_ctx->local_out.shared_codegen = &table_ctx->shared;
+	table_ctx->local_out.iptables_hook = BPFILTER_INET_HOOK_LOCAL_OUT;
+	table_ctx->local_out.bpf_tc_hook = BPF_TC_EGRESS;
+
+	r = try_codegen(&table_ctx->local_out, table);
+	if (r) {
+		BFLOG_ERR("failed to generate LOCAL_OUT code");
+		goto err_free_local_out;
+	}
+
+	table->ctx = table_ctx;
+
+	return 0;
+
+err_free_local_out:
+	free_codegen(&table_ctx->local_out);
+err_free_local_fwd:
+	free_codegen(&table_ctx->forward);
+err_free_local_in:
+	free_codegen(&table_ctx->local_in);
+err_free_table_ctx:
+	free(table_ctx);
+
+	return r;
+}
+
+static int filter_table_install(struct context *ctx, struct table *table)
+{
+	struct filter_table_context *table_ctx;
+	int r;
+
+	if (!table->ctx)
+		return -EINVAL;
+
+	table_ctx = (struct filter_table_context *)table->ctx;
+
+	r = table_ctx->local_in.codegen_ops->load_img(&table_ctx->local_in);
+	if (r < 0) {
+		BFLOG_ERR("failed to load chain INPUT in table filter: %s",
+			  table_ctx->local_in.log_buf);
+		return r;
+	}
+
+	r = table_ctx->forward.codegen_ops->load_img(&table_ctx->forward);
+	if (r < 0) {
+		BFLOG_ERR("failed to load chain FORWARD in table filter: %s",
+			  table_ctx->forward.log_buf);
+		goto err_unload_local_in;
+	}
+
+	r = table_ctx->local_out.codegen_ops->load_img(&table_ctx->local_out);
+	if (r < 0) {
+		BFLOG_ERR("failed to load chain OUTPUT in table filter: %s",
+			  table_ctx->local_out.log_buf);
+		goto err_unload_forward;
+	}
+
+	BFLOG_DBG("installed filter table");
+
+	return 0;
+
+err_unload_forward:
+	table_ctx->forward.codegen_ops->unload_img(&table_ctx->forward);
+err_unload_local_in:
+	table_ctx->local_in.codegen_ops->unload_img(&table_ctx->local_in);
+
+	return r;
+}
+
+static void filter_table_uninstall(struct context *ctx, struct table *table)
+{
+	struct filter_table_context *table_ctx;
+
+	BUG_ON(!table->ctx);
+
+	table_ctx = (struct filter_table_context *)table->ctx;
+
+	table_ctx->local_in.codegen_ops->unload_img(&table_ctx->local_in);
+	table_ctx->forward.codegen_ops->unload_img(&table_ctx->forward);
+	table_ctx->local_out.codegen_ops->unload_img(&table_ctx->local_out);
+}
+
+static void filter_table_free(struct table *table)
+{
+	if (table->ctx) {
+		struct filter_table_context *table_ctx;
+
+		table_ctx = (struct filter_table_context *)table->ctx;
+
+		free_codegen(&table_ctx->local_in);
+		free_codegen(&table_ctx->forward);
+		free_codegen(&table_ctx->local_out);
+		free(table_ctx);
+	}
+
+	free_table(table);
+}
+
+static void filter_table_update_counters(struct table *table)
+{
+	int r;
+	struct rule *rule;
+	struct filter_table_context *ctx = table->ctx;
+	struct shared_codegen *shared = &ctx->shared;
+	int map_fd = shared->maps_fd[CODEGEN_MAP_COUNTERS];
+
+	for (uint32_t i = 0; i < table->num_rules; ++i) {
+		rule = &table->rules[i];
+
+		r = bpf_map_lookup_elem(map_fd, &rule->index,
+					(void *)&rule->ipt_entry->counters);
+		if (r < 0) {
+			BFLOG_DBG("couldn't fetch counter for rule at %p",
+				  rule);
+		}
+	}
+}
+
+const struct table_ops filter_table_ops = {
+	.name = "filter",
+	.create = filter_table_create,
+	.codegen = filter_table_codegen,
+	.install = filter_table_install,
+	.uninstall = filter_table_uninstall,
+	.free = filter_table_free,
+	.update_counters = filter_table_update_counters
+};
+
+static uint8_t filter_table_replace_blob[] = {
+	0x66, 0x69, 0x6c, 0x74, 0x65, 0x72, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x0e, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
+	0x78, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x98, 0x00, 0x00, 0x00,
+	0x30, 0x01, 0x00, 0x00,	0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x98, 0x00, 0x00, 0x00, 0x30, 0x01, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00,
+	0x10, 0x32, 0x40, 0x36, 0x43, 0x56, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0xb0, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x40, 0x00, 0x45, 0x52, 0x52, 0x4f, 0x52, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x45, 0x52, 0x52, 0x4f, 0x52, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+int create_filter_table(struct context *ctx)
+{
+	struct mbox_request req = {};
+
+	req.addr = (__u64)filter_table_replace_blob;
+	req.len = ARRAY_SIZE(filter_table_replace_blob);
+	req.is_set = 1;
+	req.cmd = BPFILTER_IPT_SO_SET_REPLACE;
+	req.pid = getpid();
+
+	return handle_sockopt_request(ctx, &req);
+}
diff --git a/net/bpfilter/filter-table.h b/net/bpfilter/filter-table.h
new file mode 100644
index 000000000000..7d5a8464456f
--- /dev/null
+++ b/net/bpfilter/filter-table.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#ifndef NET_BPFILTER_FILTER_TABLE_H
+#define NET_BPFILTER_FILTER_TABLE_H
+
+#include "table.h"
+
+struct context;
+
+extern const struct table_ops filter_table_ops;
+
+int create_filter_table(struct context *ctx);
+
+#endif // NET_BPFILTER_FILTER_TABLE_H
diff --git a/tools/testing/selftests/bpf/bpfilter/.gitignore b/tools/testing/selftests/bpf/bpfilter/.gitignore
index a934ddef58d2..926cbb0cfb59 100644
--- a/tools/testing/selftests/bpf/bpfilter/.gitignore
+++ b/tools/testing/selftests/bpf/bpfilter/.gitignore
@@ -5,3 +5,4 @@ test_match
 test_xt_udp
 test_target
 test_rule
+test_codegen
diff --git a/tools/testing/selftests/bpf/bpfilter/Makefile b/tools/testing/selftests/bpf/bpfilter/Makefile
index 53634699d427..9eb558dfc99a 100644
--- a/tools/testing/selftests/bpf/bpfilter/Makefile
+++ b/tools/testing/selftests/bpf/bpfilter/Makefile
@@ -15,6 +15,7 @@ TEST_GEN_PROGS += test_match
 TEST_GEN_PROGS += test_xt_udp
 TEST_GEN_PROGS += test_target
 TEST_GEN_PROGS += test_rule
+TEST_GEN_PROGS += test_codegen
 
 KSFT_KHDR_INSTALL := 1
 
@@ -46,9 +47,11 @@ BPFILTER_COMMON_SRCS := $(BPFILTER_MAP_SRCS) $(BPFILTER_CODEGEN_SRCS)
 BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/context.c $(BPFILTERSRCDIR)/logger.c
 BPFILTER_COMMON_SRCS += $(BPFILTER_MATCH_SRCS) $(BPFILTER_TARGET_SRCS)
 BPFILTER_COMMON_SRCS += $(BPFILTER_RULE_SRCS) $(BPFILTERSRCDIR)/table.c
+BPFILTER_COMMON_SRCS += $(BPFILTERSRCDIR)/filter-table.c $(BPFILTERSRCDIR)/sockopt.c
 
 $(OUTPUT)/test_map: test_map.c $(BPFILTER_MAP_SRCS)
 $(OUTPUT)/test_match: test_match.c $(BPFILTER_COMMON_SRCS)
 $(OUTPUT)/test_xt_udp: test_xt_udp.c $(BPFILTER_COMMON_SRCS)
 $(OUTPUT)/test_target: test_target.c $(BPFILTER_COMMON_SRCS)
 $(OUTPUT)/test_rule: test_rule.c $(BPFILTER_COMMON_SRCS)
+$(OUTPUT)/test_codegen: test_codegen.c $(BPFILTER_COMMON_SRCS)
diff --git a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
index 8dd7911fa06f..846b50bdab07 100644
--- a/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
+++ b/tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
@@ -3,12 +3,39 @@
 #ifndef BPFILTER_UTIL_H
 #define BPFILTER_UTIL_H
 
+#include <linux/bpf.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter_ipv4/ip_tables.h>
 
 #include <stdio.h>
 #include <stdint.h>
 #include <string.h>
+#include <sys/syscall.h>
+#include <unistd.h>
+
+static inline int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
+{
+	return syscall(SYS_bpf, cmd, attr, size);
+}
+
+static inline int bpf_prog_test_run(int fd, const void *data,
+				    unsigned int data_size, uint32_t *retval)
+{
+	union bpf_attr attr = {};
+	int r;
+
+	attr.test.prog_fd = fd;
+	attr.test.data_in = (uintptr_t)data;
+	attr.test.data_size_in = data_size;
+	attr.test.repeat = 1000000;
+
+	r = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
+
+	if (retval)
+		*retval = attr.test.retval;
+
+	return r;
+}
 
 static inline void init_entry_match(struct xt_entry_match *match,
 				    uint16_t size, uint8_t revision,
diff --git a/tools/testing/selftests/bpf/bpfilter/test_codegen.c b/tools/testing/selftests/bpf/bpfilter/test_codegen.c
new file mode 100644
index 000000000000..9f11b7b8a126
--- /dev/null
+++ b/tools/testing/selftests/bpf/bpfilter/test_codegen.c
@@ -0,0 +1,338 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+
+#include <linux/bpf.h>
+#include <linux/bpfilter.h>
+
+#include <linux/err.h>
+#include <linux/pkt_cls.h>
+
+#include "../../kselftest_harness.h"
+
+#include <stdint.h>
+
+#include "codegen.h"
+#include "context.h"
+#include "filter-table.h"
+#include "logger.h"
+#include "table.h"
+
+#include "bpfilter_util.h"
+
+FIXTURE(test_codegen)
+{
+	struct context ctx;
+	struct shared_codegen shared_codegen;
+	struct codegen codegen;
+	struct table *table;
+	int prog_fd;
+	uint32_t retval;
+	union bpf_attr attr;
+};
+
+FIXTURE_VARIANT(test_codegen)
+{
+	const struct bpfilter_ipt_replace *replace_blob;
+	size_t replace_blob_size;
+	const uint8_t *packet;
+	size_t packet_size;
+	enum bpf_prog_type prog_type;
+	int hook;
+	int expected_retval;
+};
+
+/*
+ *  Generated by iptables-save v1.8.2 on Sat May  8 05:22:41 2021
+ * *filter
+ * :INPUT ACCEPT [0:0]
+ * :FORWARD ACCEPT [0:0]
+ * :OUTPUT ACCEPT [0:0]
+ * -A INPUT -s 1.1.1.1/32 -d 2.2.2.2/32 -j DROP
+ * -A INPUT -s 2.2.0.0/16 -d 3.0.0.0/8 -j DROP
+ * -A INPUT -p udp -m udp --sport 100 --dport 500 -j DROP
+ * COMMIT
+ */
+
+static const uint8_t user_defined_chain_blob[] = {
+	0x66, 0x69, 0x6c, 0x74, 0x65, 0x72, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x0e, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00,
+	0x70, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x90, 0x02, 0x00, 0x00,
+	0x28, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0xf8, 0x01, 0x00, 0x00,
+	0x90, 0x02, 0x00, 0x00, 0x28, 0x03, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00,
+	0xe0, 0x26, 0x99, 0xca, 0x67, 0x55, 0x00, 0x00,
+	0x01, 0x01, 0x01, 0x01, 0x02, 0x02, 0x02, 0x02,
+	0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x02, 0x02, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00,
+	0xff, 0xff, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xa0, 0x00, 0xc8, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x30, 0x00, 0x75, 0x64, 0x70, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x64, 0x00, 0x64, 0x00, 0xf4, 0x01, 0xf4, 0x01,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0x98, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x28, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0xfe, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x70, 0x00, 0xb0, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x40, 0x00, 0x45, 0x52, 0x52, 0x4f, 0x52, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x45, 0x52, 0x52, 0x4f, 0x52, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
+};
+
+// Generated by scapy
+// Ether(src='00:11:22:33:44:55',dst='66:77:88:99:aa:bb')/IP(src='1.1.1.1',dst='2.2.2.2')/UDP(sport=100,dport=200)
+static const uint8_t udp_packet_1[] = {
+	0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0x00, 0x11,
+	0x22, 0x33, 0x44, 0x55, 0x08, 0x00, 0x45, 0x00,
+	0x00, 0x1c, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+	0x74, 0xcb, 0x01, 0x01, 0x01, 0x01, 0x02, 0x02,
+	0x02, 0x02, 0x00, 0x64, 0x00, 0xc8, 0x00, 0x08,
+	0xf8, 0xac,
+};
+
+// Ether(src='00:11:22:33:44:55',dst='66:77:88:99:aa:bb')/IP(src='2.2.2.2',dst='3.1.4.1')/UDP(sport=100,dport=200)
+static const uint8_t udp_packet_2[] = {
+	0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0x00, 0x11,
+	0x22, 0x33, 0x44, 0x55, 0x08, 0x00, 0x45, 0x00,
+	0x00, 0x1c, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+	0x6f, 0xcb, 0x02, 0x02, 0x02, 0x02, 0x03, 0x01,
+	0x04, 0x01, 0x00, 0x64, 0x00, 0xc8, 0x00, 0x08,
+	0xf3, 0xac,
+};
+
+// Ether(src='00:11:22:33:44:55',dst='66:77:88:99:aa:bb')/IP(src='2.7.1.8',dst='3.1.4.1')/UDP(sport=100,dport=500)
+static const uint8_t udp_packet_3[] = {
+	0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0x00, 0x11,
+	0x22, 0x33, 0x44, 0x55, 0x08, 0x00, 0x45, 0x00,
+	0x00, 0x1c, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+	0x70, 0xc0, 0x02, 0x07, 0x01, 0x08, 0x03, 0x01,
+	0x04, 0x01, 0x00, 0x64, 0x01, 0xf4, 0x00, 0x08,
+	0xf3, 0x75,
+};
+
+// Ether(src='00:11:22:33:44:55',dst='66:77:88:99:aa:bb')/IP(src='5.5.5.5',dst='5.5.5.5')/UDP(sport=300,dport=300)
+static const uint8_t udp_packet_4[] = {
+	0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0x00, 0x11,
+	0x22, 0x33, 0x44, 0x55, 0x08, 0x00, 0x45, 0x00,
+	0x00, 0x1c, 0x00, 0x01, 0x00, 0x00, 0x40, 0x11,
+	0x66, 0xbd, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
+	0x05, 0x05, 0x01, 0x2c, 0x01, 0x2c, 0x00, 0x08,
+	0xe9, 0x72,
+};
+
+FIXTURE_VARIANT_ADD(test_codegen, drop_by_ip_tc) {
+	.replace_blob = (const struct bpfilter_ipt_replace *)user_defined_chain_blob,
+	.replace_blob_size = ARRAY_SIZE(user_defined_chain_blob),
+	.packet = udp_packet_1,
+	.packet_size = ARRAY_SIZE(udp_packet_1),
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.hook = BPFILTER_INET_HOOK_LOCAL_IN,
+	.expected_retval = TC_ACT_SHOT,
+};
+
+FIXTURE_VARIANT_ADD(test_codegen, drop_by_net_tc) {
+	.replace_blob = (const struct bpfilter_ipt_replace *)user_defined_chain_blob,
+	.replace_blob_size = ARRAY_SIZE(user_defined_chain_blob),
+	.packet = udp_packet_2,
+	.packet_size = ARRAY_SIZE(udp_packet_2),
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.hook = BPFILTER_INET_HOOK_LOCAL_IN,
+	.expected_retval = TC_ACT_SHOT,
+};
+
+FIXTURE_VARIANT_ADD(test_codegen, drop_by_udp_port_tc) {
+	.replace_blob = (const struct bpfilter_ipt_replace *)user_defined_chain_blob,
+	.replace_blob_size = ARRAY_SIZE(user_defined_chain_blob),
+	.packet = udp_packet_3,
+	.packet_size = ARRAY_SIZE(udp_packet_3),
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.hook = BPFILTER_INET_HOOK_LOCAL_IN,
+	.expected_retval = TC_ACT_SHOT,
+};
+
+FIXTURE_VARIANT_ADD(test_codegen, accept_tc) {
+	.replace_blob = (const struct bpfilter_ipt_replace *)user_defined_chain_blob,
+	.replace_blob_size = ARRAY_SIZE(user_defined_chain_blob),
+	.packet = udp_packet_4,
+	.packet_size = ARRAY_SIZE(udp_packet_4),
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.hook = BPFILTER_INET_HOOK_LOCAL_IN,
+	.expected_retval = TC_ACT_UNSPEC,
+};
+
+FIXTURE_SETUP(test_codegen)
+{
+	logger_set_file(stderr);
+	ASSERT_EQ(create_context(&self->ctx), 0);
+
+	create_shared_codegen(&self->shared_codegen);
+	ASSERT_EQ(0, create_codegen(&self->codegen, variant->prog_type));
+
+	self->codegen.ctx = &self->ctx;
+	self->codegen.shared_codegen = &self->shared_codegen;
+	self->codegen.iptables_hook = variant->hook;
+
+	self->table = filter_table_ops.create(&self->ctx, variant->replace_blob);
+	ASSERT_FALSE(IS_ERR_OR_NULL(self->table));
+
+	ASSERT_EQ(0, try_codegen(&self->codegen, self->table));
+
+	self->prog_fd = load_img(&self->codegen);
+	ASSERT_GT(self->prog_fd, -1)
+	TH_LOG("load_img(): '%s': %s", STRERR(self->prog_fd),
+	       self->codegen.log_buf);
+};
+
+FIXTURE_TEARDOWN(test_codegen)
+{
+	filter_table_ops.free(self->table);
+	unload_img(&self->codegen);
+	free_codegen(&self->codegen);
+	free_context(&self->ctx);
+	if (self->prog_fd > -1)
+		close(self->prog_fd);
+};
+
+TEST_F(test_codegen, test_run)
+{
+	EXPECT_EQ(0, bpf_prog_test_run(self->prog_fd, variant->packet,
+				       variant->packet_size, &self->retval))
+	TH_LOG("cannot bpf_prog_test_run(): '%s'", STRERR(errno));
+	EXPECT_EQ(self->retval, variant->expected_retval)
+	TH_LOG("expected: %d, actual: %d\n", variant->expected_retval,
+	       self->retval);
+}
+
+TEST_HARNESS_MAIN
-- 
2.38.1



* [PATCH bpf-next v3 16/16] bpfilter: handle setsockopt() calls
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (14 preceding siblings ...)
  2022-12-24  0:04 ` [PATCH bpf-next v3 15/16] bpfilter: add filter table Quentin Deslandes
@ 2022-12-24  0:04 ` Quentin Deslandes
  2022-12-27 18:22 ` [PATCH bpf-next v3 00/16] bpfilter Alexei Starovoitov
  2023-01-03 11:45 ` Florian Westphal
  17 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-24  0:04 UTC (permalink / raw)
  To: qde
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Use earlier introduced infrastructure and handle setsockopt(2) calls.

Co-developed-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Quentin Deslandes <qde@naccy.de>
---
 net/bpfilter/main.c | 132 ++++++++++++++++++++++++++++++--------------
 1 file changed, 90 insertions(+), 42 deletions(-)
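
For illustration only, and not part of this patch: the do_exact() macro
below keeps calling read()/write() until the whole buffer has been
transferred, retrying on EINTR and treating a premature EOF as -EIO. A
plain-function sketch of the read side could look like the following.
The name read_exact_sketch() is hypothetical and only used here.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the patch: read exactly 'count'
 * bytes from 'fd', looping over short reads and retrying on EINTR.
 * Returns 0 on success, -EIO on premature EOF, -errno on other errors.
 */
static int read_exact_sketch(int fd, void *buffer, size_t count)
{
	size_t total = 0;

	while (total < count) {
		const ssize_t part = read(fd, (char *)buffer + total,
					  count - total);

		if (part > 0) {
			total += (size_t)part;
		} else if (part == 0) {
			/* EOF before 'count' bytes arrived. */
			return -EIO;
		} else if (errno != EINTR) {
			return -errno;
		}
		/* On EINTR, fall through and retry the read. */
	}

	return 0;
}
```

A caller would use it the same way loop() uses read_exact(): a non-zero
return means the request could not be read in full.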

diff --git a/net/bpfilter/main.c b/net/bpfilter/main.c
index 291a92546246..c157277c48b5 100644
--- a/net/bpfilter/main.c
+++ b/net/bpfilter/main.c
@@ -1,64 +1,112 @@
 // SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Telegram FZ-LLC
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
 #define _GNU_SOURCE
-#include <sys/uio.h>
+
 #include <errno.h>
 #include <stdio.h>
-#include <sys/socket.h>
-#include <fcntl.h>
+#include <stdlib.h>
+#include <sys/types.h>
 #include <unistd.h>
-#include "../../include/uapi/linux/bpf.h"
-#include <asm/unistd.h>
+
+#include "context.h"
+#include "filter-table.h"
+#include "logger.h"
 #include "msgfmt.h"
+#include "sockopt.h"
 
-FILE *debug_f;
+#define do_exact(fd, op, buffer, count)							  \
+	({										  \
+		typeof(count) __count = count;						  \
+		size_t total = 0;							  \
+		int r = 0;								  \
+											  \
+		do {									  \
+			const ssize_t part = op(fd, (buffer) + total, (__count) - total); \
+			if (part > 0) {							  \
+				total += part;						  \
+			} else if (part == 0 && (__count) > 0) {			  \
+				r = -EIO;						  \
+				break;							  \
+			} else if (part == -1) {					  \
+				if (errno == EINTR)					  \
+					continue;					  \
+				r = -errno;						  \
+				break;							  \
+			}								  \
+		} while (total < (__count));						  \
+											  \
+		r;									  \
+	})
 
-static int handle_get_cmd(struct mbox_request *cmd)
+static int read_exact(int fd, void *buffer, size_t count)
 {
-	switch (cmd->cmd) {
-	case 0:
-		return 0;
-	default:
-		break;
-	}
-	return -ENOPROTOOPT;
+	return do_exact(fd, read, buffer, count);
+}
+
+static int write_exact(int fd, const void *buffer, size_t count)
+{
+	return do_exact(fd, write, buffer, count);
 }
 
-static int handle_set_cmd(struct mbox_request *cmd)
+static int setup_context(struct context *ctx)
 {
-	return -ENOPROTOOPT;
+	int r;
+
+	r = logger_init();
+	if (r < 0)
+		return r;
+
+	BFLOG_DBG("log file opened and ready to use");
+
+	r = create_filter_table(ctx);
+	if (r < 0)
+		BFLOG_ERR("failed to create filter table: %s", STRERR(r));
+
+	return r;
 }
 
-static void loop(void)
+static void loop(struct context *ctx)
 {
-	while (1) {
-		struct mbox_request req;
-		struct mbox_reply reply;
-		int n;
-
-		n = read(0, &req, sizeof(req));
-		if (n != sizeof(req)) {
-			fprintf(debug_f, "invalid request %d\n", n);
-			return;
-		}
-
-		reply.status = req.is_set ?
-			handle_set_cmd(&req) :
-			handle_get_cmd(&req);
-
-		n = write(1, &reply, sizeof(reply));
-		if (n != sizeof(reply)) {
-			fprintf(debug_f, "reply failed %d\n", n);
-			return;
-		}
+	struct mbox_request req;
+	struct mbox_reply reply;
+	int r;
+
+	for (;;) {
+		r = read_exact(STDIN_FILENO, &req, sizeof(req));
+		if (r)
+			BFLOG_EMERG("cannot read request: %s", STRERR(r));
+
+		reply.status = handle_sockopt_request(ctx, &req);
+
+		r = write_exact(STDOUT_FILENO, &reply, sizeof(reply));
+		if (r)
+			BFLOG_EMERG("cannot write reply: %s", STRERR(r));
 	}
 }
 
 int main(void)
 {
-	debug_f = fopen("/dev/kmsg", "w");
-	setvbuf(debug_f, 0, _IOLBF, 0);
-	fprintf(debug_f, "<5>Started bpfilter\n");
-	loop();
-	fclose(debug_f);
+	struct context ctx;
+	int r;
+
+	r = create_context(&ctx);
+	if (r)
+		return r;
+
+	r = setup_context(&ctx);
+	if (r) {
+		free_context(&ctx);
+		return r;
+	}
+
+	loop(&ctx);
+
+	// Disregard return value, the application is closed anyway.
+	(void)logger_clean();
+
 	return 0;
 }
-- 
2.38.1



* Re: [PATCH bpf-next v3 00/16] bpfilter
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (15 preceding siblings ...)
  2022-12-24  0:04 ` [PATCH bpf-next v3 16/16] bpfilter: handle setsockopt() calls Quentin Deslandes
@ 2022-12-27 18:22 ` Alexei Starovoitov
  2023-01-03 11:38   ` Florian Westphal
  2023-01-06 14:15   ` Quentin Deslandes
  2023-01-03 11:45 ` Florian Westphal
  17 siblings, 2 replies; 26+ messages in thread
From: Alexei Starovoitov @ 2022-12-27 18:22 UTC (permalink / raw)
  To: Quentin Deslandes
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team, fw

On Sat, Dec 24, 2022 at 01:03:46AM +0100, Quentin Deslandes wrote:
> 
> Due to poor hardware availability on my side, I've not been able to
> benchmark those changes. I plan to get some numbers for the next iteration.

Yeah. Performance numbers would be my main question :)

> FORWARD filter chain is now supported, however, it's attached to
> TC INGRESS along with INPUT filter chain. This is due to XDP not supporting
> multiple programs to be attached. I could generate a single program
> out of both INPUT and FORWARD chains, but that would prevent another
> BPF program to be attached to the interface anyway. If a solution
> exists to attach both those programs to XDP while allowing for other
> programs to be attached, it requires more investigation. In the meantime,
> INPUT and FORWARD filtering is supported using TC.

I think we can ignore XDP chaining for now assuming that Daniel's bpf_link-tc work
will be applicable to XDP as well, so we'll have a simple chaining
for XDP eventually.

As far as attaching to TC... I think it would be great to combine bpfilter
codegen and attach to Florian's bpf hooks exactly at netfilter.
See
https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?h=nf_hook_jit_bpf_29&id=0c1ec06503cb8a142d3ad9f760b72d94ea0091fa
With nf_hook_ingress() calling either into classic iptable or into bpf_prog_run_nf
which is either generated by Florian's optimizer of nf chains or into
bpfilter generated code would be ideal.


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2022-12-27 18:22 ` [PATCH bpf-next v3 00/16] bpfilter Alexei Starovoitov
@ 2023-01-03 11:38   ` Florian Westphal
  2023-01-06 14:15   ` Quentin Deslandes
  1 sibling, 0 replies; 26+ messages in thread
From: Florian Westphal @ 2023-01-03 11:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Quentin Deslandes, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team, fw

Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> codegen and attach to Florian's bpf hooks exactly at netfilter.
> See
> https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?h=nf_hook_jit_bpf_29&id=0c1ec06503cb8a142d3ad9f760b72d94ea0091fa

FWIW I plan to submit this patchset for 6.2.


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
                   ` (16 preceding siblings ...)
  2022-12-27 18:22 ` [PATCH bpf-next v3 00/16] bpfilter Alexei Starovoitov
@ 2023-01-03 11:45 ` Florian Westphal
  2023-01-06 14:43   ` Quentin Deslandes
  17 siblings, 1 reply; 26+ messages in thread
From: Florian Westphal @ 2023-01-03 11:45 UTC (permalink / raw)
  To: Quentin Deslandes
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Quentin Deslandes <qde@naccy.de> wrote:
> The patchset is based on the patches from David S. Miller [1],
> Daniel Borkmann [2], and Dmitrii Banshchikov [3].
> 
> Note: I've partially sent this patchset earlier due to a
> mistake on my side, sorry for the noise.
> 
> The main goal of the patchset is to prepare bpfilter for
> iptables' configuration blob parsing and code generation.
> 
> The patchset introduces data structures and code for matches,
> targets, rules and tables. Beside that the code generation
> is introduced.
> 
> The first version of the code generation supports only "inline"
> mode - all chains and their rules emit instructions in linear
> approach.
> 
> Things that are not implemented yet:
>   1) The process of switching from the previous BPF programs to the
>      new set isn't atomic.

You can't make this atomic from userspace perspective, the
get/setsockopt API of iptables uses a read-modify-write model.

Tentatively I'd try to extend libnftnl and generate bpf code there,
since it's used by both iptables(-nft) and nftables we'd automatically
get support for both.

I was planning to look into "attach bpf progs to raw netfilter hooks"
in Q1 2023, once the initial nf-bpf-codegen is merged.


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2022-12-27 18:22 ` [PATCH bpf-next v3 00/16] bpfilter Alexei Starovoitov
  2023-01-03 11:38   ` Florian Westphal
@ 2023-01-06 14:15   ` Quentin Deslandes
  2023-01-12  3:03     ` Florian Westphal
  1 sibling, 1 reply; 26+ messages in thread
From: Quentin Deslandes @ 2023-01-06 14:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team, fw

Le 27/12/2022 à 19:22, Alexei Starovoitov a écrit :
> On Sat, Dec 24, 2022 at 01:03:46AM +0100, Quentin Deslandes wrote:
>>
>> Due to poor hardware availability on my side, I've not been able to
>> benchmark those changes. I plan to get some numbers for the next iteration.
> 
> Yeah. Performance numbers would be my main question :)

Hardware is on the way! :)

>> FORWARD filter chain is now supported, however, it's attached to
>> TC INGRESS along with INPUT filter chain. This is due to XDP not supporting
>> multiple programs to be attached. I could generate a single program
>> out of both INPUT and FORWARD chains, but that would prevent another
>> BPF program to be attached to the interface anyway. If a solution
>> exists to attach both those programs to XDP while allowing for other
>> programs to be attached, it requires more investigation. In the meantime,
>> INPUT and FORWARD filtering is supported using TC.
> 
> I think we can ignore XDP chaining for now assuming that Daniel's bpf_link-tc work
> will be applicable to XDP as well, so we'll have a simple chaining
> for XDP eventually.
> 
> As far as attaching to TC... I think it would be great to combine bpfilter
> codegen and attach to Florian's bpf hooks exactly at netfilter.
> See
> https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?h=nf_hook_jit_bpf_29&id=0c1ec06503cb8a142d3ad9f760b72d94ea0091fa
> With nf_hook_ingress() calling either into classic iptable or into bpf_prog_run_nf
> which is either generated by Florian's optimizer of nf chains or into
> bpfilter generated code would be ideal.

That sounds interesting. If my understanding is correct, Florian's
work doesn't yet allow for userspace-generated programs to be attached,
which will be required for bpfilter.


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2023-01-03 11:45 ` Florian Westphal
@ 2023-01-06 14:43   ` Quentin Deslandes
  2023-01-12  3:17     ` Florian Westphal
  0 siblings, 1 reply; 26+ messages in thread
From: Quentin Deslandes @ 2023-01-06 14:43 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Le 03/01/2023 à 12:45, Florian Westphal a écrit :
> Quentin Deslandes <qde@naccy.de> wrote:
>> The patchset is based on the patches from David S. Miller [1],
>> Daniel Borkmann [2], and Dmitrii Banshchikov [3].
>>
>> Note: I've partially sent this patchset earlier due to a
>> mistake on my side, sorry for the noise.
>>
>> The main goal of the patchset is to prepare bpfilter for
>> iptables' configuration blob parsing and code generation.
>>
>> The patchset introduces data structures and code for matches,
>> targets, rules and tables. Beside that the code generation
>> is introduced.
>>
>> The first version of the code generation supports only "inline"
>> mode - all chains and their rules emit instructions in linear
>> approach.
>>
>> Things that are not implemented yet:
>>    1) The process of switching from the previous BPF programs to the
>>       new set isn't atomic.
> 
> You can't make this atomic from userspace perspective, the
> get/setsockopt API of iptables uses a read-modify-write model.

This refers to updating the programs from bpfilter's side. It won't
be atomic from iptables point of view, but currently bpfilter will
remove the program associated to a table, before installing the new
one. This means packets received in between those operations are
not filtered. I assume a better solution is possible.
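
To make the gap concrete, here is a toy userspace model. It is not
bpfilter code and every name in it is made up for illustration. The
"attachment point" is a single slot: detaching first and attaching
later leaves the slot empty for a while, whereas a single exchange
never does. Whether bpfilter could rely on a comparable kernel
primitive (for example, replacing the program behind a BPF link) is an
assumption, not something this series implements.

```c
#include <stdatomic.h>
#include <stddef.h>

enum verdict { VERDICT_ACCEPT, VERDICT_DROP };

typedef enum verdict (*filter_fn)(int dport);

/* Stand-in for the attachment point; NULL means "no program attached". */
static _Atomic(filter_fn) attached_prog;

/* Run the attached program, if any. A detached slot filters nothing. */
static enum verdict run_hook(int dport)
{
	filter_fn prog = atomic_load(&attached_prog);

	return prog ? prog(dport) : VERDICT_ACCEPT;
}

/*
 * Replace the program in one step and return the previous one.
 * The slot never holds NULL between the old and the new program.
 */
static filter_fn replace_prog(filter_fn new_prog)
{
	return atomic_exchange(&attached_prog, new_prog);
}

/* Two example rule sets, standing in for two generations of a table. */
static enum verdict drop_200(int dport)
{
	return dport == 200 ? VERDICT_DROP : VERDICT_ACCEPT;
}

static enum verdict drop_200_and_500(int dport)
{
	return (dport == 200 || dport == 500) ? VERDICT_DROP : VERDICT_ACCEPT;
}
```

With remove-then-install, a packet for port 200 arriving between the
two steps would see an empty slot and be accepted; with replace_prog()
it is matched by either the old or the new rule set.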

> Tentatively I'd try to extend libnftnl and generate bpf code there,
> since it's used by both iptables(-nft) and nftables we'd automatically
> get support for both.

That's one of the options, this could also remain in the kernel
tree or in a dedicated git repository. I don't know which one would
be the best, I'm open to suggestions.

> I was planning to look into "attach bpf progs to raw netfilter hooks"
> in Q1 2023, once the initial nf-bpf-codegen is merged.

Is there any plan to support non raw hooks? That's mainly out
of curiosity, I don't even know whether that would be a good thing
or not.


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2023-01-06 14:15   ` Quentin Deslandes
@ 2023-01-12  3:03     ` Florian Westphal
  0 siblings, 0 replies; 26+ messages in thread
From: Florian Westphal @ 2023-01-12  3:03 UTC (permalink / raw)
  To: Quentin Deslandes
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team, fw

Quentin Deslandes <qde@naccy.de> wrote:
> That sounds interesting. If my understanding is correct, Florian's
> work doesn't yet allow for userspace-generated programs to be attached,
> which will be required for bpfilter.

Yes, but I started working on the attachment side.  It doesn't depend
on the nf-bpf generator patch set.

I think I can share PoC/RFC draft next week.


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2023-01-06 14:43   ` Quentin Deslandes
@ 2023-01-12  3:17     ` Florian Westphal
  2023-01-25 10:25       ` Quentin Deslandes
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Westphal @ 2023-01-12  3:17 UTC (permalink / raw)
  To: Quentin Deslandes
  Cc: Florian Westphal, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

Quentin Deslandes <qde@naccy.de> wrote:
> Le 03/01/2023 à 12:45, Florian Westphal a écrit :
> > You can't make this atomic from userspace perspective, the
> > get/setsockopt API of iptables uses a read-modify-write model.
> 
> This refers to updating the programs from bpfilter's side. It won't
> be atomic from iptables point of view, but currently bpfilter will
> remove the program associated to a table, before installing the new
> one. This means packets received in between those operations are
> not filtered. I assume a better solution is possible.

Ah, I see, thanks.

> > Tentatively I'd try to extend libnftnl and generate bpf code there,
> > since it's used by both iptables(-nft) and nftables we'd automatically
> > get support for both.
> 
> > That's one of the options, this could also remain in the kernel
> tree or in a dedicated git repository. I don't know which one would
> be the best, I'm open to suggestions.

I can imagine that this will see a flurry of activity in the early
phase so I think a 'semi test repo' makes sense.

Provided the license allows this, usable bits and pieces can then
be grafted on to libnftnl (or iptables or whatever).

> > I was planning to look into "attach bpf progs to raw netfilter hooks"
> > in Q1 2023, once the initial nf-bpf-codegen is merged.
> 
> Is there any plan to support non raw hooks? That's mainly out
> of curiosity, I don't even know whether that would be a good thing
> or not.

Not sure what 'non raw hook' is.  Idea was to expose

1. protocol family
2. hook number (prerouting, input etc)
3. priority

to userspace via bpf syscall/bpf link.

userspace would then provide the above info to kernel via
bpf(... BPF_LINK_CREATE )

which would then end up doing:
--------------
h.hook = nf_hook_run_bpf; // wrapper to call BPF_PROG_RUN
h.priv = prog; // the bpf program to run
h.pf = attr->netfilter.pf;
h.priority = attr->netfilter.priority;
h.hooknum = attr->netfilter.hooknum;

nf_register_net_hook(net, &h);
--------------

After that nf_hook_slow() calls the bpf program just like any
other of the netfilter hooks.
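
As a rough userspace model of that flow (hypothetical names throughout,
not kernel code): hooks are kept sorted by priority at registration
time, and the runner calls them in order until one returns something
other than accept. The two verdict values mirror NF_DROP and NF_ACCEPT;
everything else is simplified.

```c
#include <stddef.h>

enum { SKETCH_DROP = 0, SKETCH_ACCEPT = 1 };	/* mirrors NF_DROP / NF_ACCEPT */

typedef int (*hook_fn)(void *priv, const void *pkt);

struct hook_entry {
	hook_fn hook;	/* function to call, e.g. a wrapper running a BPF prog */
	void *priv;	/* opaque data handed back to the hook */
	int priority;	/* lower values run first */
};

#define MAX_HOOKS 8

struct hook_list {
	struct hook_entry entries[MAX_HOOKS];
	size_t count;
};

/* Insert 'h' keeping ascending priority order. Returns 0, or -1 if full. */
static int register_hook(struct hook_list *list, const struct hook_entry *h)
{
	size_t i;

	if (list->count == MAX_HOOKS)
		return -1;

	for (i = list->count;
	     i > 0 && list->entries[i - 1].priority > h->priority; i--)
		list->entries[i] = list->entries[i - 1];

	list->entries[i] = *h;
	list->count++;
	return 0;
}

/* Call hooks in priority order; stop at the first non-accept verdict. */
static int run_hooks(const struct hook_list *list, const void *pkt)
{
	for (size_t i = 0; i < list->count; i++) {
		const struct hook_entry *e = &list->entries[i];
		const int verdict = e->hook(e->priv, pkt);

		if (verdict != SKETCH_ACCEPT)
			return verdict;
	}

	return SKETCH_ACCEPT;
}

/* Example hooks: one counts packets and accepts, one drops everything. */
static int count_and_accept(void *priv, const void *pkt)
{
	(void)pkt;
	(*(int *)priv)++;
	return SKETCH_ACCEPT;
}

static int drop_all(void *priv, const void *pkt)
{
	(void)priv;
	(void)pkt;
	return SKETCH_DROP;
}
```

In this model a BPF-backed hook is just another entry in the list: it
is ordered by the priority supplied at attach time and sees the packet
only if earlier hooks accepted it.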

Does that make sense or did you have something else in mind?


* Re: [PATCH bpf-next v3 00/16] bpfilter
  2023-01-12  3:17     ` Florian Westphal
@ 2023-01-25 10:25       ` Quentin Deslandes
  0 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2023-01-25 10:25 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, linux-kselftest, netdev, Kernel Team

On Thu, Jan 12, 2023 at 04:17:28AM +0100, Florian Westphal wrote:
> Quentin Deslandes <qde@naccy.de> wrote:
> > Le 03/01/2023 à 12:45, Florian Westphal a écrit :
> > > You can't make this atomic from userspace perspective, the
> > > get/setsockopt API of iptables uses a read-modify-write model.
> > 
> > This refers to updating the programs from bpfilter's side. It won't
> > be atomic from iptables point of view, but currently bpfilter will
> > remove the program associated to a table, before installing the new
> > one. This means packets received in between those operations are
> > not filtered. I assume a better solution is possible.
> 
> Ah, I see, thanks.
> 
> > > Tentatively I'd try to extend libnftnl and generate bpf code there,
> > > since it's used by both iptables(-nft) and nftables we'd automatically
> > > get support for both.
> > 
> > > That's one of the options, this could also remain in the kernel
> > tree or in a dedicated git repository. I don't know which one would
> > be the best, I'm open to suggestions.
> 
> I can imagine that this will see a flurry of activity in the early
> phase so I think a 'semi test repo' makes sense.
> 
> Provided the license allows this, usable bits and pieces can then
> be grafted on to libnftnl (or iptables or whatever).
> 
> > > I was planning to look into "attach bpf progs to raw netfilter hooks"
> > > in Q1 2023, once the initial nf-bpf-codegen is merged.
> > 
> > Is there any plan to support non raw hooks? That's mainly out
> > of curiosity, I don't even know whether that would be a good thing
> > or not.
> 
> Not sure what 'non raw hook' is.  Idea was to expose
> 
> 1. protocol family
> 2. hook number (prerouting, input etc)
> 3. priority
> 
> to userspace via bpf syscall/bpf link.
> 
> userspace would then provide the above info to kernel via
> bpf(... BPF_LINK_CREATE )
> 
> which would then end up doing:
> --------------
> h.hook = nf_hook_run_bpf; // wrapper to call BPF_PROG_RUN
> h.priv = prog; // the bpf program to run
> h.pf = attr->netfilter.pf;
> h.priority = attr->netfilter.priority;
> h.hooknum = attr->netfilter.hooknum;
> 
> nf_register_net_hook(net, &h);
> --------------
> 
> After that nf_hook_slow() calls the bpf program just like any
> other of the netfilter hooks.
> 
> Does that make sense or did you have something else in mind?

Sounds good to me. I thought you were referring to hooks available for
the RAW table (as in `iptables --table raw...`).

Thanks,
Quentin



* [PATCH bpf-next v3 00/16] bpfilter
@ 2022-12-23 23:40 Quentin Deslandes
  0 siblings, 0 replies; 26+ messages in thread
From: Quentin Deslandes @ 2022-12-23 23:40 UTC (permalink / raw)
  To: qde
  Cc: kernel-team, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Mykola Lysenko, Shuah Khan, Dmitrii Banshchikov, linux-kernel,
	bpf, netdev, linux-kselftest

The patchset is based on the patches from David S. Miller [1],
Daniel Borkmann [2], and Dmitrii Banshchikov [3].

The main goal of the patchset is to prepare bpfilter for
iptables' configuration blob parsing and code generation.

The patchset introduces data structures and code for matches,
targets, rules and tables. Besides that, code generation
is introduced.

The first version of the code generation supports only "inline"
mode: all chains and their rules emit instructions in a linear
fashion.

Things that are not implemented yet:
  1) The process of switching from the previous BPF programs to the
     new set isn't atomic.
  2) No support of device ifindex - it's hardcoded
  3) No helper subprog for counters update

Another problem is using iptables' blobs for tests and filter
table initialization. While it saves lines, something more
maintainable should be done here.

The plan for the next iteration:
  1) Add a helper program for counters update
  2) Handle ifindex

Patches 1/2 add definitions of the used types.
Patch 3 adds logging to bpfilter.
Patch 4 adds an associative map.
Patch 5 adds runtime context structure.
Patches 6/7 add code generation infrastructure and TC code generator.
Patches 8/9/10/11/12 add code for matches, targets, rules and table.
Patch 13 adds code generation for table.
Patch 14 handles hooked setsockopt(2) calls.
Patch 15 adds filter table
Patch 16 uses prepared code in main().

Due to poor hardware availability on my side, I've not been able to
benchmark those changes. I plan to get some numbers for the next iteration.

FORWARD filter chain is now supported, however, it's attached to
TC INGRESS along with INPUT filter chain. This is due to XDP not supporting
multiple programs to be attached. I could generate a single program
out of both INPUT and FORWARD chains, but that would prevent another
BPF program to be attached to the interface anyway. If a solution
exists to attach both those programs to XDP while allowing for other
programs to be attached, it requires more investigation. In the meantime,
INPUT and FORWARD filtering is supported using TC.

Most of the code in this series was written by Dmitrii Banshchikov,
my changes are limited to v3. I've tried to reflect this fact in the
commits by adding 'Co-developed-by:' and 'Signed-off-by:' for Dmitrii,
please tell me if this was done the wrong way.

v2 -> v3
Chains:
  * Add support for FORWARD filter chain.
  * Add generation of BPF bytecode to assess whether a packet should be
    forwarded or not, using bpf_fib_lookup().
  * Allow for multiple programs to be attached to TC.
  * Allow for multiple TC hooks to be used.
Code generation:
  * Remove duplicated BPF bytecode generation.
  * Fix a bug regarding jump offset during generation.
  * Remove support for XDP from the series, as it's not currently
    used.
Table:
  * Add new filter_table_update_counters() virtual call. It updates
    the table's counter stored in the ipt_entry structure. This way,
    when iptables tries to fetch the values of the counters, bpfilter only
    has to copy the ipt_entry cached in the table structure.
Logging:
  * Refactor logging primitives.
Sockopts:
  * Add support for userspace counters querying.
Rule:
  * Store the rule's index inside struct rule, to ease counters'
    map usage.

v1 -> v2
Maps:
  * Use map_upsert instead of separate map_insert and map_update
Matches:
  * Add a new virtual call - gen_inline. The call is used for
    inline generating of a rule's match.
Targets:
  * Add a new virtual call - gen_inline. The call is used for inline
    generating of a rule's target.
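
To make "virtual call" concrete, here is a minimal sketch of an ops
structure carrying a gen_inline function pointer, and a rule generator
that calls it for each match in turn so the code is emitted back to
back. Only the name gen_inline comes from the changelog; the structs,
the signature and the instruction counting are assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical code buffer: only counts the instructions emitted. */
struct sketch_codegen {
	size_t n_insns;
};

struct sketch_match;

/* Per-type operations; gen_inline emits the match's code in place. */
struct sketch_match_ops {
	const char *name;
	int (*gen_inline)(struct sketch_codegen *ctx,
			  const struct sketch_match *match);
};

struct sketch_match {
	const struct sketch_match_ops *ops;
	unsigned int port;
};

static int sketch_udp_gen_inline(struct sketch_codegen *ctx,
				 const struct sketch_match *match)
{
	(void)match;
	ctx->n_insns += 2;	/* pretend: one load, one compare-and-jump */
	return 0;
}

static const struct sketch_match_ops sketch_udp_ops = {
	.name = "udp",
	.gen_inline = sketch_udp_gen_inline,
};

/* Emit all matches of a rule one after the other, stopping on error. */
static int sketch_gen_rule(struct sketch_codegen *ctx,
			   const struct sketch_match *matches, size_t n)
{
	assert(ctx && matches);

	for (size_t i = 0; i < n; i++) {
		int r = matches[i].ops->gen_inline(ctx, &matches[i]);

		if (r)
			return r;
	}

	return 0;
}
```
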
Rules:
  * Add code generation for rules
Table:
  * Add struct table_ops
  * Add map for table_ops
  * Add filter table
  * Reorganize the way filter table is initialized
Sockopts:
  * Install/uninstall BPF programs while handling
    IPT_SO_SET_REPLACE
Code generation:
  * Add first version of the code generation
Dependencies:
  * Add libbpf

v0 -> v1
IO:
  * Use ssize_t in pvm_read, pvm_write for total_bytes
  * Move IO functions into sockopt.c and main.c
Logging:
  * Use LOGLEVEL_EMERG, LOGLEVEL_NOTICE, LOGLEVEL_DEBUG
    while logging to /dev/kmsg
  * Prepend log message with <n> where n is log level
  * Conditionally enable BFLOG_DEBUG messages
  * Merge bflog.{h,c} into context.h
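
The "<n>" prefix mentioned above is what lets a line written to
/dev/kmsg carry its log level. A minimal sketch of the formatting step
follows; sketch_format_kmsg() and the SKETCH_* names are hypothetical,
with values chosen to mirror the kernel's LOGLEVEL_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Values mirror the kernel's LOGLEVEL_EMERG/NOTICE/DEBUG. */
enum {
	SKETCH_LOGLEVEL_EMERG = 0,
	SKETCH_LOGLEVEL_NOTICE = 5,
	SKETCH_LOGLEVEL_DEBUG = 7,
};

/* Format "<level>msg\n" into `buf`. Returns what snprintf() returns:
 * the length the full line needs, excluding the terminating NUL. */
static int sketch_format_kmsg(char *buf, size_t size, int level,
			      const char *msg)
{
	assert(buf && msg);

	return snprintf(buf, size, "<%d>%s\n", level, msg);
}
```

The resulting buffer is what would then be written to /dev/kmsg in a
single write() call.
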
Matches:
  * Reorder fields in struct match_ops for tight packing
  * Get rid of struct match_ops_map
  * Rename udp_match_ops to xt_udp
  * Use XT_ALIGN macro
  * Store payload size in match size
  * Move udp match routines into a separate file
Targets:
  * Reorder fields in struct target_ops for tight packing
  * Get rid of struct target_ops_map
  * Add comments for convert_verdict function
Rules:
  * Add validation
Tables:
  * Combine table_map and table_list into table_index
  * Add validation
Sockopts:
  * Handle IPT_SO_GET_REVISION_TARGET

1. https://lore.kernel.org/patchwork/patch/902785/
2. https://lore.kernel.org/patchwork/patch/902783/
3. https://kernel.ubuntu.com/~cking/stress-ng/stress-ng.pdf

Quentin Deslandes (16):
  bpfilter: add types for usermode helper
  tools: add bpfilter usermode helper header
  bpfilter: add logging facility
  bpfilter: add map container
  bpfilter: add runtime context
  bpfilter: add BPF bytecode generation infrastructure
  bpfilter: add support for TC bytecode generation
  bpfilter: add match structure
  bpfilter: add support for src/dst addr and ports
  bpfilter: add target structure
  bpfilter: add rule structure
  bpfilter: add table structure
  bpfilter: add table code generation
  bpfilter: add setsockopt() support
  bpfilter: add filter table
  bpfilter: handle setsockopt() calls

 include/uapi/linux/bpfilter.h                 |  154 +++
 net/bpfilter/Makefile                         |   16 +-
 net/bpfilter/codegen.c                        | 1040 +++++++++++++++++
 net/bpfilter/codegen.h                        |  183 +++
 net/bpfilter/context.c                        |  168 +++
 net/bpfilter/context.h                        |   24 +
 net/bpfilter/filter-table.c                   |  344 ++++++
 net/bpfilter/filter-table.h                   |   18 +
 net/bpfilter/logger.c                         |   52 +
 net/bpfilter/logger.h                         |   80 ++
 net/bpfilter/main.c                           |  132 ++-
 net/bpfilter/map-common.c                     |   51 +
 net/bpfilter/map-common.h                     |   19 +
 net/bpfilter/match.c                          |   55 +
 net/bpfilter/match.h                          |   37 +
 net/bpfilter/rule.c                           |  286 +++++
 net/bpfilter/rule.h                           |   37 +
 net/bpfilter/sockopt.c                        |  533 +++++++++
 net/bpfilter/sockopt.h                        |   15 +
 net/bpfilter/table.c                          |  391 +++++++
 net/bpfilter/table.h                          |   59 +
 net/bpfilter/target.c                         |  203 ++++
 net/bpfilter/target.h                         |   57 +
 net/bpfilter/xt_udp.c                         |  111 ++
 tools/include/uapi/linux/bpfilter.h           |  175 +++
 .../testing/selftests/bpf/bpfilter/.gitignore |    8 +
 tools/testing/selftests/bpf/bpfilter/Makefile |   57 +
 .../selftests/bpf/bpfilter/bpfilter_util.h    |   80 ++
 .../selftests/bpf/bpfilter/test_codegen.c     |  338 ++++++
 .../testing/selftests/bpf/bpfilter/test_map.c |   63 +
 .../selftests/bpf/bpfilter/test_match.c       |   69 ++
 .../selftests/bpf/bpfilter/test_rule.c        |   56 +
 .../selftests/bpf/bpfilter/test_target.c      |   83 ++
 .../selftests/bpf/bpfilter/test_xt_udp.c      |   48 +
 34 files changed, 4999 insertions(+), 43 deletions(-)
 create mode 100644 net/bpfilter/codegen.c
 create mode 100644 net/bpfilter/codegen.h
 create mode 100644 net/bpfilter/context.c
 create mode 100644 net/bpfilter/context.h
 create mode 100644 net/bpfilter/filter-table.c
 create mode 100644 net/bpfilter/filter-table.h
 create mode 100644 net/bpfilter/logger.c
 create mode 100644 net/bpfilter/logger.h
 create mode 100644 net/bpfilter/map-common.c
 create mode 100644 net/bpfilter/map-common.h
 create mode 100644 net/bpfilter/match.c
 create mode 100644 net/bpfilter/match.h
 create mode 100644 net/bpfilter/rule.c
 create mode 100644 net/bpfilter/rule.h
 create mode 100644 net/bpfilter/sockopt.c
 create mode 100644 net/bpfilter/sockopt.h
 create mode 100644 net/bpfilter/table.c
 create mode 100644 net/bpfilter/table.h
 create mode 100644 net/bpfilter/target.c
 create mode 100644 net/bpfilter/target.h
 create mode 100644 net/bpfilter/xt_udp.c
 create mode 100644 tools/include/uapi/linux/bpfilter.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/.gitignore
 create mode 100644 tools/testing/selftests/bpf/bpfilter/Makefile
 create mode 100644 tools/testing/selftests/bpf/bpfilter/bpfilter_util.h
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_codegen.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_map.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_match.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_rule.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_target.c
 create mode 100644 tools/testing/selftests/bpf/bpfilter/test_xt_udp.c

--
2.38.1

Thread overview: 26+ messages
2022-12-24  0:03 [PATCH bpf-next v3 00/16] bpfilter Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 01/16] bpfilter: add types for usermode helper Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 02/16] tools: add bpfilter usermode helper header Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 03/16] bpfilter: add logging facility Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 04/16] bpfilter: add map container Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 05/16] bpfilter: add runtime context Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 06/16] bpfilter: add BPF bytecode generation infrastructure Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 07/16] bpfilter: add support for TC bytecode generation Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 08/16] bpfilter: add match structure Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 09/16] bpfilter: add support for src/dst addr and ports Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 10/16] bpfilter: add target structure Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 11/16] bpfilter: add rule structure Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 12/16] bpfilter: add table structure Quentin Deslandes
2022-12-24  0:03 ` [PATCH bpf-next v3 13/16] bpfilter: add table code generation Quentin Deslandes
2022-12-24  0:04 ` [PATCH bpf-next v3 14/16] bpfilter: add setsockopt() support Quentin Deslandes
2022-12-24  0:04 ` [PATCH bpf-next v3 15/16] bpfilter: add filter table Quentin Deslandes
2022-12-24  0:04 ` [PATCH bpf-next v3 16/16] bpfilter: handle setsockopt() calls Quentin Deslandes
2022-12-27 18:22 ` [PATCH bpf-next v3 00/16] bpfilter Alexei Starovoitov
2023-01-03 11:38   ` Florian Westphal
2023-01-06 14:15   ` Quentin Deslandes
2023-01-12  3:03     ` Florian Westphal
2023-01-03 11:45 ` Florian Westphal
2023-01-06 14:43   ` Quentin Deslandes
2023-01-12  3:17     ` Florian Westphal
2023-01-25 10:25       ` Quentin Deslandes
  -- strict thread matches above, loose matches on Subject: below --
2022-12-23 23:40 Quentin Deslandes
