bpf.vger.kernel.org archive mirror
* [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator
@ 2021-10-14 12:10 Florian Westphal
  2021-10-14 12:10 ` [PATCH 1/1] netfilter: add " Florian Westphal
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

This series adds a bpf program generator for netfilter base hooks.

Currently netfilter hooks are invoked via nf_hook_slow, which walks
an array of function_addr:arg pairs:

for i in hooks[]; do
  verdict = hooks[i].addr(hooks[i].arg, skb, state);
  switch (verdict) { ....

The autogenerator unrolls this loop and builds a bpf program
that does:

state->priv = hooks[0].hook_arg;
v = firstfunction(state);
if (v != ACCEPT) goto out;
state->priv = hooks[1].hook_arg;
v = secondfunction(state);
if (v != ACCEPT) goto out;

... and so on.

Indirections are converted to direct calls. Invocation of the
autogenerated programs is done via bpf dispatcher from nf_hook().
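
To illustrate the transformation, here is a small user-space sketch (not
kernel code). The names 'struct state', 'struct hook_entry', 'len_check',
'run_loop' and 'run_unrolled' are made up for this example; the real
generator emits bpf instructions rather than C. The sketch only shows that
the unrolled sequence yields the same verdict as the loop:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative verdicts; the values mirror NF_DROP and NF_ACCEPT. */
enum { DROP = 0, ACCEPT = 1 };

/* Hypothetical, simplified stand-ins for nf_hook_state / nf_hook_entry. */
struct state {
	void *priv;	/* per-hook argument, set before each call */
	int pkt_len;	/* stand-in for the skb */
};

typedef int (*hook_fn)(struct state *);

struct hook_entry {
	hook_fn hook;
	void *priv;
};

static int limit_big = 1500;
static int limit_small = 64;

/* Example hook: accept packets up to the limit passed via state->priv. */
static int len_check(struct state *s)
{
	assert(s->priv != NULL);
	return s->pkt_len <= *(int *)s->priv ? ACCEPT : DROP;
}

/* Loop in the style of nf_hook_slow: one indirect call per hook. */
static int run_loop(const struct hook_entry *hooks, size_t n, struct state *s)
{
	for (size_t i = 0; i < n; i++) {
		int v;

		s->priv = hooks[i].priv;
		v = hooks[i].hook(s);
		if (v != ACCEPT)
			return v;
	}
	return ACCEPT;
}

/* Hand-unrolled equivalent for two registered hooks: direct calls only. */
static int run_unrolled(struct state *s)
{
	int v;

	s->priv = &limit_big;
	v = len_check(s);
	if (v != ACCEPT)
		goto out;
	s->priv = &limit_small;
	v = len_check(s);
out:
	return v;
}
```

With a hook array of { len_check, &limit_big } and { len_check, &limit_small },
run_loop() and run_unrolled() return the same verdict for any packet length.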

As long as NF_QUEUE is not used, the normal data path no longer calls
the nf_hook_slow "interpreter".

The purpose is to eventually add a 'netfilter prog type' to bpf and
permit attachment of (userspace generated) bpf programs to the netfilter
machinery, e.g. 'attach bpf prog id 1234 to ipv6 PREROUTING at prio -300'.
The autogenerator would be adjusted so that these userspace bpf programs
are invoked just like native C functions.

This will require exposing the context structure (the program argument,
'struct __nf_hook_state *'), rewriting read accesses to it to match the
internal nf_hook_state layout, and adding new verifier checks on permitted
return values (e.g. a plain 'return NF_STOLEN' results in an skb leak).

Known problems:
- I did not convert all hooks to the new scheme, e.g. ILA won't compile at the moment.
- checkpatch complains about a few indentation issues, line lengths etc.

Future work:
Add support for NAT hooks. They still use indirect calls, but those
are less of a problem because they are called only once per connection.

Could annotate the ops struct with the verdicts the C function can
return.  This would allow eliding the retval check when a hook can
only return NF_ACCEPT.

Could add extra support for INGRESS hook to move more code from
inline functions to the autogenerated program.

Initial tests show roughly 8% performance improvement in a netns-to-netns
UDP forward test with conntrack enabled in the 'forward' net namespace.

I'm looking for feedback on the chosen approach.

Thanks,
Florian

Florian Westphal (9):
  netfilter: nf_queue: carry index in hook state
  netfilter: nat: split nat hook iteration into a helper
  netfilter: remove hook index from nf_hook_slow arguments
  netfilter: make hook functions accept only one argument
  netfilter: reduce allowed hook count to 32
  netfilter: add bpf base hook program generator
  netfilter: core: do not rebuild bpf program on dying netns
  netfilter: ingress: switch to invocation via bpf
  netfilter: hook_jit: add prog cache

 drivers/net/ipvlan/ipvlan_l3s.c            |   4 +-
 include/linux/netfilter.h                  |  72 ++-
 include/linux/netfilter_ingress.h          |  17 +-
 include/net/netfilter/br_netfilter.h       |   7 +-
 include/net/netfilter/nf_flow_table.h      |   6 +-
 include/net/netfilter/nf_hook_bpf.h        |  14 +
 include/net/netfilter/nf_queue.h           |   3 +-
 include/net/netfilter/nf_synproxy.h        |   6 +-
 net/bridge/br_input.c                      |   3 +-
 net/bridge/br_netfilter_hooks.c            |  30 +-
 net/bridge/br_netfilter_ipv6.c             |   5 +-
 net/bridge/netfilter/ebtable_broute.c      |   8 +-
 net/bridge/netfilter/ebtable_filter.c      |   5 +-
 net/bridge/netfilter/ebtable_nat.c         |   5 +-
 net/bridge/netfilter/nf_conntrack_bridge.c |   8 +-
 net/ipv4/netfilter/arptable_filter.c       |   5 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c         |   6 +-
 net/ipv4/netfilter/iptable_filter.c        |   5 +-
 net/ipv4/netfilter/iptable_mangle.c        |   7 +-
 net/ipv4/netfilter/iptable_nat.c           |   6 +-
 net/ipv4/netfilter/iptable_raw.c           |   5 +-
 net/ipv4/netfilter/iptable_security.c      |   5 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c        |   5 +-
 net/ipv6/netfilter/ip6table_filter.c       |   5 +-
 net/ipv6/netfilter/ip6table_mangle.c       |   6 +-
 net/ipv6/netfilter/ip6table_nat.c          |   6 +-
 net/ipv6/netfilter/ip6table_raw.c          |   5 +-
 net/ipv6/netfilter/ip6table_security.c     |   5 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |   5 +-
 net/netfilter/Kconfig                      |  10 +
 net/netfilter/Makefile                     |   1 +
 net/netfilter/core.c                       | 103 +++-
 net/netfilter/ipvs/ip_vs_core.c            |  48 +-
 net/netfilter/nf_conntrack_proto.c         |  34 +-
 net/netfilter/nf_flow_table_inet.c         |   9 +-
 net/netfilter/nf_flow_table_ip.c           |  12 +-
 net/netfilter/nf_hook_bpf.c                | 569 +++++++++++++++++++++
 net/netfilter/nf_nat_core.c                |  50 +-
 net/netfilter/nf_nat_proto.c               |  56 +-
 net/netfilter/nf_queue.c                   |  12 +-
 net/netfilter/nf_synproxy_core.c           |   8 +-
 net/netfilter/nft_chain_filter.c           |  48 +-
 net/netfilter/nft_chain_nat.c              |   7 +-
 net/netfilter/nft_chain_route.c            |  22 +-
 security/selinux/hooks.c                   |  58 +--
 45 files changed, 1001 insertions(+), 315 deletions(-)
 create mode 100644 include/net/netfilter/nf_hook_bpf.h
 create mode 100644 net/netfilter/nf_hook_bpf.c

-- 
2.32.0



* [PATCH 1/1] netfilter: add bpf base hook program generator
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 1/9] netfilter: nf_queue: carry index in hook state Florian Westphal
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

Add a kernel bpf program generator for netfilter base hooks.

Currently netfilter hooks are invoked by nf_hook_slow:

for i in hooks; do
  verdict = hooks[i].indirect_func(hooks[i].hook_arg, skb, state);

  switch (verdict) { ....

The autogenerator unrolls the loop, so we get:

state->priv = hooks[0].hook_arg;
v = first_hook_function(state);
if (v != ACCEPT) goto done;
state->priv = hooks[1].hook_arg;
v = second_hook_function(state); ...

Indirections are replaced by direct calls. Invocation of the
autogenerated programs is done via bpf dispatcher from nf_hook().
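
The 'goto done' jumps cannot know their target while the hook calls are
still being emitted, so the generator writes a placeholder offset and fixes
it up once the tail position is known (patch_hook_jumps() in the diff).
The following user-space sketch shows only the offset arithmetic; the
'struct insn' layout, the opcode names and 'jump_target' are simplified
assumptions, not the real bpf instruction encoding:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mini instruction set, just enough to show jump patching. */
enum op { OP_CALL, OP_JNE_ACCEPT, OP_MOV, OP_EXIT };

struct insn {
	enum op op;
	short off;	/* jump offset, relative to the next instruction */
};

#define JMP_UNRESOLVED 0

/* Point every unresolved conditional jump in prog[0..tail) at prog[tail]. */
static void patch_jumps(struct insn *prog, size_t tail)
{
	for (size_t i = 0; i < tail; i++) {
		if (prog[i].op != OP_JNE_ACCEPT)
			continue;
		if (prog[i].off != JMP_UNRESOLVED)
			continue;
		/* Same arithmetic as the patch: pos - i - 1. */
		prog[i].off = (short)(tail - i - 1);
	}
}

/* Index of the instruction a jump at position i lands on. */
static size_t jump_target(const struct insn *prog, size_t i)
{
	assert(prog[i].op == OP_JNE_ACCEPT);
	return i + 1 + (size_t)prog[i].off;
}
```

A placeholder of 0 is only safe because no hook jump ever needs a real
offset of 0: at least the 'return 1' sequence sits between the last hook
and the tail.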

The autogenerated program has the same return value scheme as
nf_hook_slow(). NF_HOOK() points are converted to call the
autogenerated bpf program instead of nf_hook_slow().
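
As a rough user-space illustration of that return value scheme (a sketch,
not the generated program): ACCEPT from all hooks yields 1, STOLEN yields 0,
and DROP yields a negative errno taken from the upper verdict bits, with
-EPERM as the default. The constants are local copies of the kernel values;
'verdict_to_ret', 'drop_with_errno' and 'RET_UNHANDLED' are names invented
for this example, and NF_QUEUE handling is deliberately left out:

```c
#include <assert.h>
#include <errno.h>

/* Local copies of the kernel's verdict constants. */
#define NF_DROP			0
#define NF_ACCEPT		1
#define NF_STOLEN		2
#define NF_VERDICT_MASK		0x000000ff
#define NF_VERDICT_QBITS	16

/* Hypothetical sentinel for verdicts this sketch does not model. */
#define RET_UNHANDLED		(-1000000)

/* Build a DROP verdict that carries a positive errno in the upper bits. */
static unsigned int drop_with_errno(int err)
{
	assert(err > 0 && err <= 0xffff);
	return ((unsigned int)err << NF_VERDICT_QBITS) | NF_DROP;
}

/* Map a hook verdict to the value the caller of nf_hook() sees. */
static int verdict_to_ret(unsigned int verdict)
{
	int err;

	switch (verdict & NF_VERDICT_MASK) {
	case NF_ACCEPT:
		return 1;	/* skb continues through the stack */
	case NF_STOLEN:
		return 0;	/* hook took ownership of the skb */
	case NF_DROP:
		err = (int)(verdict >> NF_VERDICT_QBITS);
		if (err == 0)
			err = EPERM;
		return -err;	/* the real code also frees the skb */
	default:
		return RET_UNHANDLED;
	}
}
```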

The purpose is to eventually add a 'netfilter prog type' to bpf and
permit attachment of (userspace generated) bpf programs to the netfilter
machinery, e.g. 'attach bpf prog id 1234 to ipv6 PREROUTING at prio -300'.

This will require exposing the context structure (the program argument,
'__nf_hook_state'), with accesses rewritten to match the nf_hook_state layout.

TODO:
1. Test !x86_64.
2. Test bridge family.
3. Fix checkpatch errors.

Future work:
Add support for NAT hooks. They still use indirect calls, but those
are less of a problem because they are called only once per
connection.

Could annotate the ops struct with the verdicts the C function can
return.  This would allow eliding the retval check when a hook can
only return NF_ACCEPT.

Could add extra support for INGRESS hook to move more code from
inline functions to the autogenerated program.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter.h           |  56 ++++
 include/net/netfilter/nf_hook_bpf.h |  14 +
 net/netfilter/Kconfig               |  10 +
 net/netfilter/Makefile              |   1 +
 net/netfilter/core.c                |  74 ++++-
 net/netfilter/nf_hook_bpf.c         | 425 ++++++++++++++++++++++++++++
 6 files changed, 577 insertions(+), 3 deletions(-)
 create mode 100644 include/net/netfilter/nf_hook_bpf.h
 create mode 100644 net/netfilter/nf_hook_bpf.c

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index c5de525218c2..9d22e672710c 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_NETFILTER_H
 #define __LINUX_NETFILTER_H
 
+#include <linux/filter.h>
 #include <linux/init.h>
 #include <linux/skbuff.h>
 #include <linux/net.h>
@@ -106,6 +107,9 @@ struct nf_hook_entries_rcu_head {
 };
 
 struct nf_hook_entries {
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	struct bpf_prog			*hook_prog;
+#endif
 	u16				num_hook_entries;
 	/* padding */
 	struct nf_hook_entry		hooks[];
@@ -205,6 +209,17 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 
 void nf_hook_slow_list(struct list_head *head, struct nf_hook_state *state,
 		       const struct nf_hook_entries *e);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+DECLARE_BPF_DISPATCHER(nf_hook_base);
+
+static __always_inline int bpf_prog_run_nf(const struct bpf_prog *prog,
+					   struct nf_hook_state *state)
+{
+	return __bpf_prog_run(prog, state, BPF_DISPATCHER_FUNC(nf_hook_base));
+}
+#endif
+
 /**
  *	nf_hook - call a netfilter hook
  *
@@ -259,11 +274,24 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
 
 	if (hook_head) {
 		struct nf_hook_state state;
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+		const struct bpf_prog *p = READ_ONCE(hook_head->hook_prog);
+
+		nf_hook_state_init(&state, hook, pf, indev, outdev,
+				   sk, net, okfn);
+
+		state.priv = (void *)hook_head;
+		state.skb = skb;
 
+		migrate_disable();
+		ret = bpf_prog_run_nf(p, &state);
+		migrate_enable();
+#else
 		nf_hook_state_init(&state, hook, pf, indev, outdev,
 				   sk, net, okfn);
 
 		ret = nf_hook_slow(skb, &state, hook_head);
+#endif
 	}
 	rcu_read_unlock();
 
@@ -341,10 +369,38 @@ NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 
 	if (hook_head) {
 		struct nf_hook_state state;
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+		const struct bpf_prog *p = hook_head->hook_prog;
+		struct sk_buff *skb, *next;
+		struct list_head sublist;
+		int ret;
 
 		nf_hook_state_init(&state, hook, pf, in, out, sk, net, okfn);
 
+		INIT_LIST_HEAD(&sublist);
+
+		migrate_disable();
+
+		list_for_each_entry_safe(skb, next, head, list) {
+			skb_list_del_init(skb);
+
+			state.priv = (void *)hook_head;
+			state.skb = skb;
+
+			ret = bpf_prog_run_nf(p, &state);
+			if (ret == 1)
+				list_add_tail(&skb->list, &sublist);
+		}
+
+		migrate_enable();
+
+		/* Put passed packets back on main list */
+		list_splice(&sublist, head);
+#else
+		nf_hook_state_init(&state, hook, pf, in, out, sk, net, okfn);
+
 		nf_hook_slow_list(head, &state, hook_head);
+#endif
 	}
 	rcu_read_unlock();
 }
diff --git a/include/net/netfilter/nf_hook_bpf.h b/include/net/netfilter/nf_hook_bpf.h
new file mode 100644
index 000000000000..12304e9f3d25
--- /dev/null
+++ b/include/net/netfilter/nf_hook_bpf.h
@@ -0,0 +1,14 @@
+struct bpf_dispatcher;
+struct bpf_prog;
+
+struct bpf_prog *nf_hook_bpf_create(const struct nf_hook_entries *n);
+struct bpf_prog *nf_hook_bpf_create_fb(void);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+void nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to);
+#else
+static inline void
+nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *f, struct bpf_prog *t)
+{
+}
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 54395266339d..6eec1720ff3d 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -19,6 +19,16 @@ config NETFILTER_FAMILY_BRIDGE
 config NETFILTER_FAMILY_ARP
 	bool
 
+config HAVE_NF_HOOK_BPF
+	bool
+
+config NF_HOOK_BPF
+	bool "netfilter base hook bpf translator"
+	depends on BPF_JIT
+	help
+	  This partially unrolls nf_hook_slow interpreter loop with
+	  auto-generated BPF programs.
+
 config NETFILTER_NETLINK_HOOK
 	tristate "Netfilter base hook dump support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index aab20e575ecd..13f1b95a7809 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -16,6 +16,7 @@ nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o
 
 obj-$(CONFIG_NETFILTER) = netfilter.o
+obj-$(CONFIG_NF_HOOK_BPF) += nf_hook_bpf.o
 
 obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o
 obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index f4359179eba9..56d82822cab7 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <net/net_namespace.h>
 #include <net/netfilter/nf_queue.h>
+#include <net/netfilter/nf_hook_bpf.h>
 #include <net/sock.h>
 
 #include "nf_internals.h"
@@ -47,6 +48,12 @@ static DEFINE_MUTEX(nf_hook_mutex);
 #define nf_entry_dereference(e) \
 	rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
 
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+DEFINE_BPF_DISPATCHER(nf_hook_base);
+
+static struct bpf_prog *fallback_nf_hook_slow;
+#endif
+
 static struct nf_hook_entries *allocate_hook_entries_size(u16 num)
 {
 	struct nf_hook_entries *e;
@@ -58,9 +65,25 @@ static struct nf_hook_entries *allocate_hook_entries_size(u16 num)
 	if (num == 0)
 		return NULL;
 
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	if (!fallback_nf_hook_slow) {
+		/* never free'd */
+		fallback_nf_hook_slow = nf_hook_bpf_create_fb();
+
+		if (!fallback_nf_hook_slow)
+			return NULL;
+	}
+#endif
+
 	e = kvzalloc(alloc, GFP_KERNEL);
-	if (e)
-		e->num_hook_entries = num;
+	if (!e)
+		return NULL;
+
+	e->num_hook_entries = num;
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	e->hook_prog = fallback_nf_hook_slow;
+#endif
+
 	return e;
 }
 
@@ -104,6 +127,7 @@ nf_hook_entries_grow(const struct nf_hook_entries *old,
 {
 	unsigned int i, alloc_entries, nhooks, old_entries;
 	struct nf_hook_ops **orig_ops = NULL;
+	struct bpf_prog *hook_bpf_prog;
 	struct nf_hook_ops **new_ops;
 	struct nf_hook_entries *new;
 	bool inserted = false;
@@ -156,6 +180,27 @@ nf_hook_entries_grow(const struct nf_hook_entries *old,
 		new->hooks[nhooks].priv = reg->priv;
 	}
 
+	hook_bpf_prog = nf_hook_bpf_create(new);
+
+	/* XXX: jit failure handling?
+	 * We could refuse hook registration.
+	 *
+	 * For now, allocate_hook_entries_size() sets
+	 * ->hook_prog to a small fallback program that
+	 *  calls nf_hook_slow().
+	 */
+	if (hook_bpf_prog) {
+		struct bpf_prog *old_prog = NULL;
+
+		new->hook_prog = hook_bpf_prog;
+
+		if (old)
+			old_prog = old->hook_prog;
+
+		nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base),
+					old_prog, hook_bpf_prog);
+	}
+
 	return new;
 }
 
@@ -221,6 +266,7 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 					  struct nf_hook_entries __rcu **pp)
 {
 	unsigned int i, j, skip = 0, hook_entries;
+	struct bpf_prog *hook_bpf_prog = NULL;
 	struct nf_hook_entries *new = NULL;
 	struct nf_hook_ops **orig_ops;
 	struct nf_hook_ops **new_ops;
@@ -244,8 +290,15 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 
 	hook_entries -= skip;
 	new = allocate_hook_entries_size(hook_entries);
-	if (!new)
+	if (!new) {
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+		struct bpf_prog *old_prog = old->hook_prog;
+
+		WRITE_ONCE(old->hook_prog, fallback_nf_hook_slow);
+		nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base), old_prog, NULL);
+#endif
 		return NULL;
+	}
 
 	new_ops = nf_hook_entries_get_hook_ops(new);
 	for (i = 0, j = 0; i < old->num_hook_entries; i++) {
@@ -256,7 +309,16 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 		j++;
 	}
 	hooks_validate(new);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	/* if this fails fallback prog calls nf_hook_slow. */
+	hook_bpf_prog = nf_hook_bpf_create(new);
+	if (hook_bpf_prog)
+		new->hook_prog = hook_bpf_prog;
+#endif
 out_assign:
+	nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base),
+				old ? old->hook_prog : NULL, hook_bpf_prog);
 	rcu_assign_pointer(*pp, new);
 	return old;
 }
@@ -584,6 +646,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 	int ret;
 
 	state->skb = skb;
+
 	for (; s < e->num_hook_entries; s++) {
 		verdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);
 		switch (verdict & NF_VERDICT_MASK) {
@@ -764,6 +827,11 @@ int __init netfilter_init(void)
 	if (ret < 0)
 		goto err_pernet;
 
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	fallback_nf_hook_slow = nf_hook_bpf_create_fb();
+	WARN_ON_ONCE(!fallback_nf_hook_slow);
+#endif
+
 	return 0;
 err_pernet:
 	unregister_pernet_subsys(&netfilter_net_ops);
diff --git a/net/netfilter/nf_hook_bpf.c b/net/netfilter/nf_hook_bpf.c
new file mode 100644
index 000000000000..cd8aba6da53b
--- /dev/null
+++ b/net/netfilter/nf_hook_bpf.c
@@ -0,0 +1,425 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/string.h>
+#include <linux/hashtable.h>
+#include <linux/jhash.h>
+#include <linux/netfilter.h>
+
+#include <net/netfilter/nf_queue.h>
+
+/* BPF translator for netfilter hooks.
+ *
+ * Copyright (c) 2021 Red Hat GmbH
+ *
+ * Author: Florian Westphal <fw@strlen.de>
+ *
+ * Unroll nf_hook_slow interpreter loop into an equivalent bpf
+ * program that can be called *instead* of nf_hook_slow().
+ * This program thus has same return value as nf_hook_slow and
+ * handles nfqueue and packet drops internally.
+ *
+ * These bpf programs are called/run from nf_hook() inline function.
+ *
+ * Register usage is:
+ *
+ * BPF_REG_0: verdict.
+ * BPF_REG_1: struct nf_hook_state *
+ * BPF_REG_2: reserved as arg to nf_queue()
+ * BPF_REG_3: reserved as arg to nf_queue()
+ *
+ * Prologue storage:
+ * BPF_REG_6: copy of REG_1 (original struct nf_hook_state *)
+ * BPF_REG_7: copy of original state->priv value
+ * BPF_REG_8: hook_index.  Inited to 0, increments on each hook call.
+ */
+
+#define JMP_INVALID 0
+#define JIT_SIZE_MAX 0xffff
+
+struct nf_hook_prog {
+	struct bpf_insn *insns;
+	unsigned int pos;
+};
+
+static bool emit(struct nf_hook_prog *p, struct bpf_insn insn)
+{
+	if (WARN_ON_ONCE(p->pos >= BPF_MAXINSNS))
+		return false;
+
+	p->insns[p->pos] = insn;
+	p->pos++;
+	return true;
+}
+
+static bool xlate_one_hook(struct nf_hook_prog *p,
+			   const struct nf_hook_entries *e,
+			   const struct nf_hook_entry *h)
+{
+	int width = bytes_to_bpf_size(sizeof(h->priv));
+
+	/* if priv is NULL, the called hookfn does not use the priv member. */
+	if (!h->priv)
+		goto emit_hook_call;
+
+	if (WARN_ON_ONCE(width < 0))
+		return false;
+
+	/* x = entries[s]->priv; */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_2, BPF_REG_7,
+				 (unsigned long)&h->priv - (unsigned long)e)))
+		return false;
+
+	/* state->priv = x */
+	if (!emit(p, BPF_STX_MEM(width, BPF_REG_6, BPF_REG_2,
+				 offsetof(struct nf_hook_state, priv))))
+		return false;
+
+emit_hook_call:
+	if (!emit(p, BPF_EMIT_CALL(h->hook)))
+		return false;
+
+	/* Only advance to next hook on ACCEPT verdict.
+	 * Else, skip rest and move to tail.
+	 *
+	 * Postprocessing patches the jump offset to the
+	 * correct position, after last hook.
+	 */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, NF_ACCEPT, JMP_INVALID)))
+		return false;
+
+	return true;
+}
+
+static bool emit_mov_ptr_reg(struct nf_hook_prog *p, u8 dreg, u8 sreg)
+{
+	if (sizeof(void *) == sizeof(u64))
+		return emit(p, BPF_MOV64_REG(dreg, sreg));
+	if (sizeof(void *) == sizeof(u32))
+		return emit(p, BPF_MOV32_REG(dreg, sreg));
+
+	return false;
+}
+
+static bool do_prologue(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	if (WARN_ON_ONCE(width < 0))
+		return false;
+
+	/* argument to program is a pointer to struct nf_hook_state, in BPF_REG_1. */
+	if (!emit_mov_ptr_reg(p, BPF_REG_6, BPF_REG_1))
+		return false;
+
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_7, BPF_REG_1,
+				 offsetof(struct nf_hook_state, priv))))
+		return false;
+
+	/* Could load state->hook_index here, but we don't support index > 0 for bpf call. */
+	if (!emit(p, BPF_MOV32_IMM(BPF_REG_8, 0)))
+		return false;
+
+	return true;
+}
+
+static void patch_hook_jumps(struct nf_hook_prog *p)
+{
+	unsigned int i;
+
+	if (!p->insns)
+		return;
+
+	for (i = 0; i < p->pos; i++) {
+		if (BPF_CLASS(p->insns[i].code) != BPF_JMP)
+			continue;
+
+		if (p->insns[i].code == (BPF_EXIT | BPF_JMP))
+			continue;
+		if (p->insns[i].code == (BPF_CALL | BPF_JMP))
+			continue;
+
+		if (p->insns[i].off != JMP_INVALID)
+			continue;
+		p->insns[i].off = p->pos - i - 1;
+	}
+}
+
+static bool emit_retval(struct nf_hook_prog *p, int retval)
+{
+	if (!emit(p, BPF_MOV32_IMM(BPF_REG_0, retval)))
+		return false;
+
+	return emit(p, BPF_EXIT_INSN());
+}
+
+static bool emit_nf_hook_slow(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	/* restore the original state->priv. */
+	if (!emit(p, BPF_STX_MEM(width, BPF_REG_6, BPF_REG_7,
+				 offsetof(struct nf_hook_state, priv))))
+		return false;
+
+	/* arg1 is state->skb */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_1, BPF_REG_6,
+				 offsetof(struct nf_hook_state, skb))))
+		return false;
+
+	/* arg2 is "struct nf_hook_state *" */
+	if (!emit(p, BPF_MOV64_REG(BPF_REG_2, BPF_REG_6)))
+		return false;
+
+	/* arg3 is nf_hook_entries (original state->priv) */
+	if (!emit(p, BPF_MOV64_REG(BPF_REG_3, BPF_REG_7)))
+		return false;
+
+	if (!emit(p, BPF_EMIT_CALL(nf_hook_slow)))
+		return false;
+
+	/* No further action needed, return retval provided by nf_hook_slow */
+	return emit(p, BPF_EXIT_INSN());
+}
+
+static bool emit_nf_queue(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	if (width < 0) {
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/* int nf_queue(struct sk_buff *skb, struct nf_hook_state *state, unsigned int verdict) */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_1, BPF_REG_6, offsetof(struct nf_hook_state, skb))))
+		return false;
+	if (!emit(p, BPF_STX_MEM(BPF_H, BPF_REG_6, BPF_REG_8,
+				 offsetof(struct nf_hook_state, hook_index))))
+		return false;
+	/* arg2: struct nf_hook_state * */
+	if (!emit(p, BPF_MOV64_REG(BPF_REG_2, BPF_REG_6)))
+		return false;
+	/* arg3: original hook return value: (NUM << NF_VERDICT_QBITS | NF_QUEUE) */
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_3, BPF_REG_0)))
+		return false;
+	if (!emit(p, BPF_EMIT_CALL(nf_queue)))
+		return false;
+
+	/* Check nf_queue return value.  Abnormal case: nf_queue returned != 0.
+	 *
+	 * Fall back to nf_hook_slow().
+	 */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2)))
+		return false;
+
+	/* Normal case: skb was stolen. Return 0. */
+	return emit_retval(p, 0);
+}
+
+static bool do_epilogue_base_hooks(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	if (WARN_ON_ONCE(width < 0))
+		return false;
+
+	/* last 'hook'. We arrive here if previous hook returned ACCEPT,
+	 * i.e. all hooks passed -- we are done.
+	 *
+	 * Return 1, skb can continue traversing network stack.
+	 */
+	if (!emit_retval(p, 1))
+		return false;
+
+	/* Patch all hook jumps, in case any of these are taken
+	 * we need to jump to this location.
+	 *
+	 * This happens when verdict is != ACCEPT.
+	 */
+	patch_hook_jumps(p);
+
+	/* need to ignore upper 24 bits, might contain errno or queue number */
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_3, BPF_REG_0)))
+		return false;
+	if (!emit(p, BPF_ALU32_IMM(BPF_AND, BPF_REG_3, 0xff)))
+		return false;
+
+	/* ACCEPT handled, check STOLEN. */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_3, NF_STOLEN, 2)))
+		return false;
+
+	if (!emit_retval(p, 0))
+		return false;
+
+	/* ACCEPT and STOLEN handled.  Check DROP next */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_3, NF_DROP, 1 + 2 + 2 + 2 + 2)))
+		return false;
+
+	/* First step. Extract the errno number. 1 insn. */
+	if (!emit(p, BPF_ALU32_IMM(BPF_RSH, BPF_REG_0, NF_VERDICT_QBITS)))
+		return false;
+
+	/* Second step: replace errno with EPERM if it was 0. 2 insns. */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1)))
+		return false;
+	if (!emit(p, BPF_MOV32_IMM(BPF_REG_0, EPERM)))
+		return false;
+
+	/* Third step: negate reg0: Caller expects -EFOO and stash the result.  2 insns. */
+	if (!emit(p, BPF_ALU32_IMM(BPF_NEG, BPF_REG_0, 0)))
+		return false;
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_8, BPF_REG_0)))
+		return false;
+
+	/* Fourth step: free the skb. 2 insns. */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_1, BPF_REG_6, offsetof(struct nf_hook_state, skb))))
+		return false;
+	if (!emit(p, BPF_EMIT_CALL(kfree_skb)))
+		return false;
+
+	/* Last step: return. 2 insns. */
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_0, BPF_REG_8)))
+		return false;
+	if (!emit(p, BPF_EXIT_INSN()))
+		return false;
+
+	/* ACCEPT, STOLEN and DROP have been handled.
+	 * REPEAT and STOP are not allowed anymore for individual hook functions.
+	 * This leaves NF_QUEUE as the only remaining return value.
+	 *
+	 * In this case BPF_REG_0 still contains the original verdict of
+	 * '(NUM << NF_VERDICT_QBITS | NF_QUEUE)', so pass it to nf_queue() as-is.
+	 */
+	if (!emit_nf_queue(p))
+		return false;
+
+	/* Increment hook index and store it in nf_hook_state so nf_hook_slow will
+	 * start at the next hook, if any.
+	 */
+	if (!emit(p, BPF_ALU32_IMM(BPF_ADD, BPF_REG_8, 1)))
+		return false;
+	if (!emit(p, BPF_STX_MEM(BPF_H, BPF_REG_6, BPF_REG_8,
+				 offsetof(struct nf_hook_state, hook_index))))
+		return false;
+
+	return emit_nf_hook_slow(p);
+}
+
+static int nf_hook_prog_init(struct nf_hook_prog *p)
+{
+	memset(p, 0, sizeof(*p));
+
+	p->insns = kcalloc(BPF_MAXINSNS, sizeof(*p->insns), GFP_KERNEL);
+	if (!p->insns)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void nf_hook_prog_free(struct nf_hook_prog *p)
+{
+	kfree(p->insns);
+}
+
+static int xlate_base_hooks(struct nf_hook_prog *p, const struct nf_hook_entries *e)
+{
+	unsigned int i, len;
+
+	len = e->num_hook_entries;
+
+	if (!do_prologue(p))
+		goto out;
+
+	for (i = 0; i < len; i++) {
+		if (!xlate_one_hook(p, e, &e->hooks[i]))
+			goto out;
+
+		if (i + 1 < len) {
+			if (!emit(p, BPF_MOV64_REG(BPF_REG_1, BPF_REG_6)))
+				goto out;
+
+			if (!emit(p, BPF_ALU32_IMM(BPF_ADD, BPF_REG_8, 1)))
+				goto out;
+		}
+	}
+
+	if (!do_epilogue_base_hooks(p))
+		goto out;
+
+	return 0;
+out:
+	return -EINVAL;
+}
+
+static struct bpf_prog *nf_hook_jit_compile(struct bpf_insn *insns, unsigned int len)
+{
+	struct bpf_prog *prog;
+	int err = 0;
+
+	prog = bpf_prog_alloc(bpf_prog_size(len), 0);
+	if (!prog)
+		return NULL;
+
+	prog->len = len;
+	prog->type = BPF_PROG_TYPE_SOCKET_FILTER;
+	memcpy(prog->insnsi, insns, prog->len * sizeof(struct bpf_insn));
+
+	prog = bpf_prog_select_runtime(prog, &err);
+	if (err) {
+		bpf_prog_free(prog);
+		return NULL;
+	}
+
+	return prog;
+}
+
+/* fallback program, invokes nf_hook_slow interpreter.
+ *
+ * Used when a hook is unregistered and a new program cannot
+ * be compiled for some reason.
+ */
+struct bpf_prog *nf_hook_bpf_create_fb(void)
+{
+	struct bpf_prog *prog = NULL;
+	struct nf_hook_prog p;
+	int err;
+
+	err = nf_hook_prog_init(&p);
+	if (err)
+		return NULL;
+
+	if (!do_prologue(&p))
+		goto err;
+
+	if (!emit_nf_hook_slow(&p))
+		goto err;
+
+	prog = nf_hook_jit_compile(p.insns, p.pos);
+err:
+	nf_hook_prog_free(&p);
+	return prog;
+}
+
+struct bpf_prog *nf_hook_bpf_create(const struct nf_hook_entries *new)
+{
+	struct bpf_prog *prog = NULL;
+	struct nf_hook_prog p;
+	int err;
+
+	err = nf_hook_prog_init(&p);
+	if (err)
+		return NULL;
+
+	err = xlate_base_hooks(&p, new);
+	if (err)
+		goto err;
+
+	prog = nf_hook_jit_compile(p.insns, p.pos);
+err:
+	nf_hook_prog_free(&p);
+	return prog;
+}
+
+void nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to)
+{
+	bpf_dispatcher_change_prog(d, from, to);
+}
-- 
2.32.0



* [PATCH RFC nf-next 1/9] netfilter: nf_queue: carry index in hook state
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
  2021-10-14 12:10 ` [PATCH 1/1] netfilter: add " Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 2/9] netfilter: nat: split nat hook iteration into a helper Florian Westphal
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

Rather than passing the index (hook function to call next)
as a function argument, store it in the hook state.

This is a prerequisite for passing all nf hook arguments in a single
structure.
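
A minimal user-space sketch of the idea follows. The structs are trimmed,
hypothetical stand-ins for nf_hook_state and nf_queue_entry, and
'queue_packet' is an invented name. Because the queue entry already copies
the whole state, the index travels with it and needs no separate field or
argument:

```c
#include <assert.h>
#include <stddef.h>

#define NF_VERDICT_QBITS 16	/* local copy of the kernel constant */

/* Hypothetical, trimmed-down stand-in for struct nf_hook_state. */
struct hook_state {
	unsigned char hook;
	unsigned char pf;
	unsigned short hook_index;	/* hook function to call next */
};

/* Hypothetical stand-in for struct nf_queue_entry. */
struct queue_entry {
	struct hook_state state;	/* snapshot, includes hook_index */
	unsigned int queuenum;
};

/* Snapshot the state for a queued packet; no separate index argument. */
static struct queue_entry queue_packet(const struct hook_state *state,
				       unsigned int verdict)
{
	struct queue_entry e;

	assert(state != NULL);
	e.state = *state;
	e.queuenum = verdict >> NF_VERDICT_QBITS;
	return e;
}
```

On reinjection the saved position is then read back from
entry.state.hook_index, as nf_reinject() does after this patch.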

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter.h        |  1 +
 include/net/netfilter/nf_queue.h |  3 +--
 net/bridge/br_input.c            |  3 ++-
 net/netfilter/core.c             |  6 +++++-
 net/netfilter/nf_queue.c         | 12 ++++++------
 5 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 3fda1a508733..1d8b87abd54c 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -67,6 +67,7 @@ struct sock;
 struct nf_hook_state {
 	u8 hook;
 	u8 pf;
+	u16 hook_index; /* index in hook_entries->hook[] */
 	struct net_device *in;
 	struct net_device *out;
 	struct sock *sk;
diff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h
index 9eed51e920e8..bc245b96143a 100644
--- a/include/net/netfilter/nf_queue.h
+++ b/include/net/netfilter/nf_queue.h
@@ -13,7 +13,6 @@ struct nf_queue_entry {
 	struct list_head	list;
 	struct sk_buff		*skb;
 	unsigned int		id;
-	unsigned int		hook_index;	/* index in hook_entries->hook[] */
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
 	struct net_device	*physin;
 	struct net_device	*physout;
@@ -125,6 +124,6 @@ nfqueue_hash(const struct sk_buff *skb, u16 queue, u16 queues_total, u8 family,
 }
 
 int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
-	     unsigned int index, unsigned int verdict);
+	     unsigned int verdict);
 
 #endif /* _NF_QUEUE_H */
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index b50382f957c1..f0a025db263f 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -239,7 +239,8 @@ static int nf_hook_bridge_pre(struct sk_buff *skb, struct sk_buff **pskb)
 			kfree_skb(skb);
 			return RX_HANDLER_CONSUMED;
 		case NF_QUEUE:
-			ret = nf_queue(skb, &state, i, verdict);
+			state.hook_index = i;
+			ret = nf_queue(skb, &state, verdict);
 			if (ret == 1)
 				continue;
 			return RX_HANDLER_CONSUMED;
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 63d032191e62..57685334d32b 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -597,7 +597,8 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 				ret = -EPERM;
 			return ret;
 		case NF_QUEUE:
-			ret = nf_queue(skb, state, s, verdict);
+			state->hook_index = s;
+			ret = nf_queue(skb, state, verdict);
 			if (ret == 1)
 				continue;
 			return ret;
@@ -753,6 +754,9 @@ int __init netfilter_init(void)
 {
 	int ret;
 
+	/* state->hook_index is a u16 */
+	BUILD_BUG_ON(MAX_HOOK_COUNT > USHRT_MAX);
+
 	ret = register_pernet_subsys(&netfilter_net_ops);
 	if (ret < 0)
 		goto err;
diff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c
index 6d12afabfe8a..a869ae3b9665 100644
--- a/net/netfilter/nf_queue.c
+++ b/net/netfilter/nf_queue.c
@@ -145,7 +145,7 @@ static void nf_ip6_saveroute(const struct sk_buff *skb,
 }
 
 static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
-		      unsigned int index, unsigned int queuenum)
+		      unsigned int queuenum)
 {
 	struct nf_queue_entry *entry = NULL;
 	const struct nf_queue_handler *qh;
@@ -181,7 +181,6 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 	*entry = (struct nf_queue_entry) {
 		.skb	= skb,
 		.state	= *state,
-		.hook_index = index,
 		.size	= sizeof(*entry) + route_key_size,
 	};
 
@@ -209,11 +208,11 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,
 
 /* Packets leaving via this function must come back through nf_reinject(). */
 int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,
-	     unsigned int index, unsigned int verdict)
+	     unsigned int verdict)
 {
 	int ret;
 
-	ret = __nf_queue(skb, state, index, verdict >> NF_VERDICT_QBITS);
+	ret = __nf_queue(skb, state, verdict >> NF_VERDICT_QBITS);
 	if (ret < 0) {
 		if (ret == -ESRCH &&
 		    (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))
@@ -285,7 +284,7 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 
 	hooks = nf_hook_entries_head(net, pf, entry->state.hook);
 
-	i = entry->hook_index;
+	i = entry->state.hook_index;
 	if (WARN_ON_ONCE(!hooks || i >= hooks->num_hook_entries)) {
 		kfree_skb(skb);
 		nf_queue_entry_free(entry);
@@ -317,7 +316,8 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 		local_bh_enable();
 		break;
 	case NF_QUEUE:
-		err = nf_queue(skb, &entry->state, i, verdict);
+		entry->state.hook_index = i;
+		err = nf_queue(skb, &entry->state, verdict);
 		if (err == 1)
 			goto next_hook;
 		break;
-- 
2.32.0



* [PATCH RFC nf-next 2/9] netfilter: nat: split nat hook iteration into a helper
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
  2021-10-14 12:10 ` [PATCH 1/1] netfilter: add " Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 1/9] netfilter: nf_queue: carry index in hook state Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 3/9] netfilter: remove hook index from nf_hook_slow arguments Florian Westphal
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

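Move the hook iteration loop from nf_nat_inet_fn() into a new helper,
nf_nat_inet_run_hooks().  This makes the conversion in a followup patch
simpler.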
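
For readers who want the control flow of the new helper without the
kernel types, here is a minimal userspace sketch.  It is not the kernel
code: struct conn, struct lookup_entries, alloc_null_binding(),
run_lookup_hooks() and the VERDICT_* values are hypothetical stand-ins
for struct nf_conn, struct nf_hook_entries, nf_nat_alloc_null_binding(),
nf_nat_inet_run_hooks() and the NF_* verdicts.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in verdicts; the values are chosen for this sketch only. */
enum { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

/* Stand-in for struct nf_conn: nonzero nat_done means "NAT initialized". */
struct conn {
	int nat_done;
};

typedef unsigned int (*lookup_fn)(struct conn *ct);

/* Stand-in for struct nf_hook_entries. */
struct lookup_entries {
	size_t num;
	lookup_fn fn[4];
};

/* Stand-in for nf_nat_alloc_null_binding(). */
unsigned int alloc_null_binding(struct conn *ct)
{
	ct->nat_done = 1;
	return VERDICT_ACCEPT;
}

/* Same shape as the helper: stop at the first non-ACCEPT verdict, stop
 * once a hook has set up a binding, else fall back to a null binding.
 */
unsigned int run_lookup_hooks(const struct lookup_entries *e, struct conn *ct)
{
	size_t i;

	assert(ct != NULL);
	if (!e)
		return alloc_null_binding(ct);

	for (i = 0; i < e->num; i++) {
		unsigned int ret = e->fn[i](ct);

		if (ret != VERDICT_ACCEPT)
			return ret;
		if (ct->nat_done)
			return VERDICT_ACCEPT;
	}
	return alloc_null_binding(ct);
}

/* Example hooks, used only to exercise the sketch. */
unsigned int hook_no_match(struct conn *ct) { (void)ct; return VERDICT_ACCEPT; }
unsigned int hook_bind(struct conn *ct) { ct->nat_done = 2; return VERDICT_ACCEPT; }
unsigned int hook_drop(struct conn *ct) { (void)ct; return VERDICT_DROP; }
```

In the sketch, a hook that sets up a binding ends the walk early, so any
hook registered after it is never called.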

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_nat_core.c | 46 +++++++++++++++++++++++--------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 273117683922..a6a273fff3f6 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -699,6 +699,32 @@ unsigned int nf_nat_packet(struct nf_conn *ct,
 }
 EXPORT_SYMBOL_GPL(nf_nat_packet);
 
+static unsigned int nf_nat_inet_run_hooks(const struct nf_hook_state *state,
+					  struct sk_buff *skb,
+					  struct nf_conn *ct,
+					  struct nf_nat_lookup_hook_priv *lpriv)
+{
+	enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
+	struct nf_hook_entries *e = rcu_dereference(lpriv->entries);
+	unsigned int ret;
+	int i;
+
+	if (!e)
+		goto null_bind;
+
+	for (i = 0; i < e->num_hook_entries; i++) {
+		ret = e->hooks[i].hook(e->hooks[i].priv, skb, state);
+		if (ret != NF_ACCEPT)
+			return ret;
+
+		if (nf_nat_initialized(ct, maniptype))
+			return NF_ACCEPT;
+	}
+
+null_bind:
+	return nf_nat_alloc_null_binding(ct, state->hook);
+}
+
 unsigned int
 nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 	       const struct nf_hook_state *state)
@@ -730,23 +756,9 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 		 */
 		if (!nf_nat_initialized(ct, maniptype)) {
 			struct nf_nat_lookup_hook_priv *lpriv = priv;
-			struct nf_hook_entries *e = rcu_dereference(lpriv->entries);
 			unsigned int ret;
-			int i;
-
-			if (!e)
-				goto null_bind;
-
-			for (i = 0; i < e->num_hook_entries; i++) {
-				ret = e->hooks[i].hook(e->hooks[i].priv, skb,
-						       state);
-				if (ret != NF_ACCEPT)
-					return ret;
-				if (nf_nat_initialized(ct, maniptype))
-					goto do_nat;
-			}
-null_bind:
-			ret = nf_nat_alloc_null_binding(ct, state->hook);
+
+			ret = nf_nat_inet_run_hooks(state, skb, ct, lpriv);
 			if (ret != NF_ACCEPT)
 				return ret;
 		} else {
@@ -765,7 +777,7 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 		if (nf_nat_oif_changed(state->hook, ctinfo, nat, state->out))
 			goto oif_changed;
 	}
-do_nat:
+
 	return nf_nat_packet(ct, ctinfo, state->hook, skb);
 
 oif_changed:
-- 
2.32.0



* [PATCH RFC nf-next 3/9] netfilter: remove hook index from nf_hook_slow arguments
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
                   ` (2 preceding siblings ...)
  2021-10-14 12:10 ` [PATCH RFC nf-next 2/9] netfilter: nat: split nat hook iteration into a helper Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 4/9] netfilter: make hook functions accept only one argument Florian Westphal
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

An earlier patch in this series added the hook_index member to
struct nf_hook_state, so use that to pass the start index to
nf_hook_slow() instead of a separate argument.
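
The idea can be illustrated with a small userspace sketch.  It is a
simplified model, not the kernel function: struct hook_state,
struct hook_entries, walk_hooks() and the VERDICT_* values are
hypothetical names, and the return value is reduced to "1 if every
remaining hook accepted, 0 otherwise".

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in verdicts; the values are chosen for this sketch only. */
enum { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

struct hook_state {
	unsigned short hook_index;	/* where the walk starts or resumes */
	unsigned int calls;		/* bookkeeping for this sketch only */
};

typedef unsigned int (*hook_fn)(struct hook_state *state);

struct hook_entries {
	size_t num;
	hook_fn fn[4];
};

/* The start index comes from the state, not from an extra argument. */
int walk_hooks(struct hook_state *state, const struct hook_entries *e)
{
	size_t s;

	assert(state != NULL && e != NULL);
	for (s = state->hook_index; s < e->num; s++) {
		if (e->fn[s](state) != VERDICT_ACCEPT)
			return 0;
	}
	return 1;
}

/* Example hook: counts how often it ran and accepts. */
unsigned int count_and_accept(struct hook_state *state)
{
	state->calls++;
	return VERDICT_ACCEPT;
}
```

A caller that wants to resume a walk (as the bridge code does with its
index 'i') sets hook_index before the call; callers that start at the
first hook leave it at 0.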

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter.h         | 5 +++--
 include/linux/netfilter_ingress.h | 2 +-
 net/bridge/br_netfilter_hooks.c   | 3 ++-
 net/netfilter/core.c              | 6 +++---
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 1d8b87abd54c..61a8c8ded57b 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -154,6 +154,7 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
 {
 	p->hook = hook;
 	p->pf = pf;
+	p->hook_index = 0;
 	p->in = indev;
 	p->out = outdev;
 	p->sk = sk;
@@ -198,7 +199,7 @@ extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];
 #endif
 
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
-		 const struct nf_hook_entries *e, unsigned int i);
+		 const struct nf_hook_entries *e);
 
 void nf_hook_slow_list(struct list_head *head, struct nf_hook_state *state,
 		       const struct nf_hook_entries *e);
@@ -260,7 +261,7 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
 		nf_hook_state_init(&state, hook, pf, indev, outdev,
 				   sk, net, okfn);
 
-		ret = nf_hook_slow(skb, &state, hook_head, 0);
+		ret = nf_hook_slow(skb, &state, hook_head);
 	}
 	rcu_read_unlock();
 
diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index a13774be2eb5..c95f84a5badc 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -31,7 +31,7 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 	nf_hook_state_init(&state, NF_NETDEV_INGRESS,
 			   NFPROTO_NETDEV, skb->dev, NULL, NULL,
 			   dev_net(skb->dev), NULL);
-	ret = nf_hook_slow(skb, &state, e, 0);
+	ret = nf_hook_slow(skb, &state, e);
 	if (ret == 0)
 		return -1;
 
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 8edfb98ae1d5..5ed8b698ce11 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -1020,7 +1020,8 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net,
 	nf_hook_state_init(&state, hook, NFPROTO_BRIDGE, indev, outdev,
 			   sk, net, okfn);
 
-	ret = nf_hook_slow(skb, &state, e, i);
+	state.hook_index = i;
+	ret = nf_hook_slow(skb, &state, e);
 	if (ret == 1)
 		ret = okfn(net, sk, skb);
 
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 57685334d32b..129d48304821 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -580,9 +580,9 @@ EXPORT_SYMBOL(nf_unregister_net_hooks);
 /* Returns 1 if okfn() needs to be executed by the caller,
  * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */
 int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
-		 const struct nf_hook_entries *e, unsigned int s)
+		 const struct nf_hook_entries *e)
 {
-	unsigned int verdict;
+	unsigned int verdict, s = state->hook_index;
 	int ret;
 
 	for (; s < e->num_hook_entries; s++) {
@@ -625,7 +625,7 @@ void nf_hook_slow_list(struct list_head *head, struct nf_hook_state *state,
 
 	list_for_each_entry_safe(skb, next, head, list) {
 		skb_list_del_init(skb);
-		ret = nf_hook_slow(skb, state, e, 0);
+		ret = nf_hook_slow(skb, state, e);
 		if (ret == 1)
 			list_add_tail(&skb->list, &sublist);
 	}
-- 
2.32.0



* [PATCH RFC nf-next 4/9] netfilter: make hook functions accept only one argument
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
                   ` (3 preceding siblings ...)
  2021-10-14 12:10 ` [PATCH RFC nf-next 3/9] netfilter: remove hook index from nf_hook_slow arguments Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 5/9] netfilter: reduce allowed hook count to 32 Florian Westphal
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

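BPF conversion requirement: hook functions must take exactly one
argument, a pointer to a structure.  Add skb and priv members to
struct nf_hook_state and make all hook functions fetch them from there.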
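
A minimal userspace sketch of the new calling convention follows.  The
names are hypothetical stand-ins (struct packet for struct sk_buff,
struct hook_state for struct nf_hook_state, call_hook() for
nf_hook_entry_hookfn()), and len_limit_hook() is an invented example
hook, not something in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in verdicts; the values are chosen for this sketch only. */
enum { VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

struct packet {			/* stand-in for struct sk_buff */
	int len;
};

struct hook_state {		/* stand-in for struct nf_hook_state */
	struct packet *skb;
	void *priv;
};

/* Single-argument hook signature. */
typedef unsigned int (*hook_fn)(const struct hook_state *state);

struct hook_entry {
	hook_fn hook;
	void *priv;
};

/* The caller stores skb and priv in the state before each call. */
unsigned int call_hook(const struct hook_entry *entry, struct packet *skb,
		       struct hook_state *state)
{
	assert(entry != NULL && entry->hook != NULL);
	state->skb = skb;
	state->priv = entry->priv;
	return entry->hook(state);
}

/* Invented example hook: priv points to a maximum packet length. */
unsigned int len_limit_hook(const struct hook_state *state)
{
	const int *max_len = state->priv;

	return state->skb->len <= *max_len ? VERDICT_ACCEPT : VERDICT_DROP;
}
```

The hook body reads everything it needs through the one pointer, which
is the shape the conversion above gives to every hook function.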

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 drivers/net/ipvlan/ipvlan_l3s.c            |  4 +-
 include/linux/netfilter.h                  | 10 ++--
 include/net/netfilter/br_netfilter.h       |  7 +--
 include/net/netfilter/nf_flow_table.h      |  6 +--
 include/net/netfilter/nf_synproxy.h        |  6 +--
 net/bridge/br_netfilter_hooks.c            | 27 ++++------
 net/bridge/br_netfilter_ipv6.c             |  5 +-
 net/bridge/netfilter/ebtable_broute.c      |  8 +--
 net/bridge/netfilter/ebtable_filter.c      |  5 +-
 net/bridge/netfilter/ebtable_nat.c         |  5 +-
 net/bridge/netfilter/nf_conntrack_bridge.c |  8 +--
 net/ipv4/netfilter/arptable_filter.c       |  5 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c         |  6 +--
 net/ipv4/netfilter/iptable_filter.c        |  5 +-
 net/ipv4/netfilter/iptable_mangle.c        |  7 +--
 net/ipv4/netfilter/iptable_nat.c           |  6 +--
 net/ipv4/netfilter/iptable_raw.c           |  5 +-
 net/ipv4/netfilter/iptable_security.c      |  5 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c        |  5 +-
 net/ipv6/netfilter/ip6table_filter.c       |  5 +-
 net/ipv6/netfilter/ip6table_mangle.c       |  6 ++-
 net/ipv6/netfilter/ip6table_nat.c          |  6 +--
 net/ipv6/netfilter/ip6table_raw.c          |  5 +-
 net/ipv6/netfilter/ip6table_security.c     |  5 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |  5 +-
 net/netfilter/core.c                       |  5 +-
 net/netfilter/ipvs/ip_vs_core.c            | 48 ++++++++----------
 net/netfilter/nf_conntrack_proto.c         | 34 +++++--------
 net/netfilter/nf_flow_table_inet.c         |  9 ++--
 net/netfilter/nf_flow_table_ip.c           | 12 ++---
 net/netfilter/nf_nat_core.c                | 10 ++--
 net/netfilter/nf_nat_proto.c               | 56 +++++++++++----------
 net/netfilter/nf_synproxy_core.c           |  8 +--
 net/netfilter/nft_chain_filter.c           | 48 ++++++++----------
 net/netfilter/nft_chain_nat.c              |  7 ++-
 net/netfilter/nft_chain_route.c            | 22 ++++----
 security/selinux/hooks.c                   | 58 +++++-----------------
 37 files changed, 207 insertions(+), 277 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_l3s.c b/drivers/net/ipvlan/ipvlan_l3s.c
index 943d26cbf39f..a6af569fcc27 100644
--- a/drivers/net/ipvlan/ipvlan_l3s.c
+++ b/drivers/net/ipvlan/ipvlan_l3s.c
@@ -90,9 +90,9 @@ static const struct l3mdev_ops ipvl_l3mdev_ops = {
 	.l3mdev_l3_rcv = ipvlan_l3_rcv,
 };
 
-static unsigned int ipvlan_nf_input(void *priv, struct sk_buff *skb,
-				    const struct nf_hook_state *state)
+static unsigned int ipvlan_nf_input(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct ipvl_addr *addr;
 	unsigned int len;
 
diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 61a8c8ded57b..c5de525218c2 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -65,6 +65,8 @@ struct nf_hook_ops;
 struct sock;
 
 struct nf_hook_state {
+	struct sk_buff *skb;
+	void *priv;
 	u8 hook;
 	u8 pf;
 	u16 hook_index; /* index in hook_entries->hook[] */
@@ -75,9 +77,7 @@ struct nf_hook_state {
 	int (*okfn)(struct net *, struct sock *, struct sk_buff *);
 };
 
-typedef unsigned int nf_hookfn(void *priv,
-			       struct sk_buff *skb,
-			       const struct nf_hook_state *state);
+typedef unsigned int nf_hookfn(const struct nf_hook_state *state);
 enum nf_hook_ops_type {
 	NF_HOOK_OP_UNDEFINED,
 	NF_HOOK_OP_NF_TABLES,
@@ -140,7 +140,9 @@ static inline int
 nf_hook_entry_hookfn(const struct nf_hook_entry *entry, struct sk_buff *skb,
 		     struct nf_hook_state *state)
 {
-	return entry->hook(entry->priv, skb, state);
+	state->skb = skb;
+	state->priv = entry->priv;
+	return entry->hook(state);
 }
 
 static inline void nf_hook_state_init(struct nf_hook_state *p,
diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
index 371696ec11b2..9c37bf316077 100644
--- a/include/net/netfilter/br_netfilter.h
+++ b/include/net/netfilter/br_netfilter.h
@@ -57,9 +57,7 @@ struct net_device *setup_pre_routing(struct sk_buff *skb,
 
 #if IS_ENABLED(CONFIG_IPV6)
 int br_validate_ipv6(struct net *net, struct sk_buff *skb);
-unsigned int br_nf_pre_routing_ipv6(void *priv,
-				    struct sk_buff *skb,
-				    const struct nf_hook_state *state);
+unsigned int br_nf_pre_routing_ipv6(const struct nf_hook_state *state);
 #else
 static inline int br_validate_ipv6(struct net *net, struct sk_buff *skb)
 {
@@ -67,8 +65,7 @@ static inline int br_validate_ipv6(struct net *net, struct sk_buff *skb)
 }
 
 static inline unsigned int
-br_nf_pre_routing_ipv6(void *priv, struct sk_buff *skb,
-		       const struct nf_hook_state *state)
+br_nf_pre_routing_ipv6(const struct nf_hook_state *state)
 {
 	return NF_ACCEPT;
 }
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index a3647fadf1cc..50947d52f7c2 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -284,10 +284,8 @@ struct flow_ports {
 	__be16 source, dest;
 };
 
-unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
-				     const struct nf_hook_state *state);
-unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
-				       const struct nf_hook_state *state);
+unsigned int nf_flow_offload_ip_hook(const struct nf_hook_state *state);
+unsigned int nf_flow_offload_ipv6_hook(const struct nf_hook_state *state);
 
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
diff --git a/include/net/netfilter/nf_synproxy.h b/include/net/netfilter/nf_synproxy.h
index a336f9434e73..9cf8db712e88 100644
--- a/include/net/netfilter/nf_synproxy.h
+++ b/include/net/netfilter/nf_synproxy.h
@@ -60,8 +60,7 @@ bool synproxy_recv_client_ack(struct net *net,
 
 struct nf_hook_state;
 
-unsigned int ipv4_synproxy_hook(void *priv, struct sk_buff *skb,
-				const struct nf_hook_state *nhs);
+unsigned int ipv4_synproxy_hook(const struct nf_hook_state *nhs);
 int nf_synproxy_ipv4_init(struct synproxy_net *snet, struct net *net);
 void nf_synproxy_ipv4_fini(struct synproxy_net *snet, struct net *net);
 
@@ -75,8 +74,7 @@ bool synproxy_recv_client_ack_ipv6(struct net *net, const struct sk_buff *skb,
 				   const struct tcphdr *th,
 				   struct synproxy_options *opts, u32 recv_seq);
 
-unsigned int ipv6_synproxy_hook(void *priv, struct sk_buff *skb,
-				const struct nf_hook_state *nhs);
+unsigned int ipv6_synproxy_hook(const struct nf_hook_state *nhs);
 int nf_synproxy_ipv6_init(struct synproxy_net *snet, struct net *net);
 void nf_synproxy_ipv6_fini(struct synproxy_net *snet, struct net *net);
 #else
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 5ed8b698ce11..1c297db5e7cf 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -472,10 +472,9 @@ struct net_device *setup_pre_routing(struct sk_buff *skb, const struct net *net)
  * receiving device) to make netfilter happy, the REDIRECT
  * target in particular.  Save the original destination IP
  * address to be able to detect DNAT afterwards. */
-static unsigned int br_nf_pre_routing(void *priv,
-				      struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int br_nf_pre_routing(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nf_bridge_info *nf_bridge;
 	struct net_bridge_port *p;
 	struct net_bridge *br;
@@ -502,7 +501,7 @@ static unsigned int br_nf_pre_routing(void *priv,
 		}
 
 		nf_bridge_pull_encap_header_rcsum(skb);
-		return br_nf_pre_routing_ipv6(priv, skb, state);
+		return br_nf_pre_routing_ipv6(state);
 	}
 
 	if (!brnet->call_iptables && !br_opt_get(br, BROPT_NF_CALL_IPTABLES))
@@ -572,10 +571,9 @@ static int br_nf_forward_finish(struct net *net, struct sock *sk, struct sk_buff
  * but we are still able to filter on the 'real' indev/outdev
  * because of the physdev module. For ARP, indev and outdev are the
  * bridge ports. */
-static unsigned int br_nf_forward_ip(void *priv,
-				     struct sk_buff *skb,
-				     const struct nf_hook_state *state)
+static unsigned int br_nf_forward_ip(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nf_bridge_info *nf_bridge;
 	struct net_device *parent;
 	u_int8_t pf;
@@ -638,10 +636,9 @@ static unsigned int br_nf_forward_ip(void *priv,
 	return NF_STOLEN;
 }
 
-static unsigned int br_nf_forward_arp(void *priv,
-				      struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int br_nf_forward_arp(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct net_bridge_port *p;
 	struct net_bridge *br;
 	struct net_device **d = (struct net_device **)(skb->cb);
@@ -812,10 +809,9 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
 }
 
 /* PF_BRIDGE/POST_ROUTING ********************************************/
-static unsigned int br_nf_post_routing(void *priv,
-				       struct sk_buff *skb,
-				       const struct nf_hook_state *state)
+static unsigned int br_nf_post_routing(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 	struct net_device *realoutdev = bridge_parent(skb->dev);
 	u_int8_t pf;
@@ -861,10 +857,9 @@ static unsigned int br_nf_post_routing(void *priv,
 /* IP/SABOTAGE *****************************************************/
 /* Don't hand locally destined packets to PF_INET(6)/PRE_ROUTING
  * for the second time. */
-static unsigned int ip_sabotage_in(void *priv,
-				   struct sk_buff *skb,
-				   const struct nf_hook_state *state)
+static unsigned int ip_sabotage_in(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
 
 	if (nf_bridge && !nf_bridge->in_prerouting &&
diff --git a/net/bridge/br_netfilter_ipv6.c b/net/bridge/br_netfilter_ipv6.c
index e4e0c836c3f5..e558f8b2175a 100644
--- a/net/bridge/br_netfilter_ipv6.c
+++ b/net/bridge/br_netfilter_ipv6.c
@@ -212,11 +212,10 @@ static int br_nf_pre_routing_finish_ipv6(struct net *net, struct sock *sk, struc
 /* Replicate the checks that IPv6 does on packet reception and pass the packet
  * to ip6tables.
  */
-unsigned int br_nf_pre_routing_ipv6(void *priv,
-				    struct sk_buff *skb,
-				    const struct nf_hook_state *state)
+unsigned int br_nf_pre_routing_ipv6(const struct nf_hook_state *state)
 {
 	struct nf_bridge_info *nf_bridge;
+	struct sk_buff *skb = state->skb;
 
 	if (br_validate_ipv6(state->net, skb))
 		return NF_DROP;
diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c
index a7af4eaff17d..54616a888f3f 100644
--- a/net/bridge/netfilter/ebtable_broute.c
+++ b/net/bridge/netfilter/ebtable_broute.c
@@ -51,9 +51,9 @@ static const struct ebt_table broute_table = {
 	.me		= THIS_MODULE,
 };
 
-static unsigned int ebt_broute(void *priv, struct sk_buff *skb,
-			       const struct nf_hook_state *s)
+static unsigned int ebt_broute(const struct nf_hook_state *s)
 {
+	struct sk_buff *skb = s->skb;
 	struct net_bridge_port *p = br_port_get_rcu(skb->dev);
 	struct nf_hook_state state;
 	unsigned char *dest;
@@ -66,7 +66,9 @@ static unsigned int ebt_broute(void *priv, struct sk_buff *skb,
 			   NFPROTO_BRIDGE, s->in, NULL, NULL,
 			   s->net, NULL);
 
-	ret = ebt_do_table(skb, &state, priv);
+	state.skb = skb;
+	state.priv = s->priv;
+	ret = ebt_do_table(skb, &state, s->priv);
 	if (ret != NF_DROP)
 		return ret;
 
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c
index c0b121df4a9a..aa36541a4f92 100644
--- a/net/bridge/netfilter/ebtable_filter.c
+++ b/net/bridge/netfilter/ebtable_filter.c
@@ -59,10 +59,9 @@ static const struct ebt_table frame_filter = {
 };
 
 static unsigned int
-ebt_filter_hook(void *priv, struct sk_buff *skb,
-		const struct nf_hook_state *state)
+ebt_filter_hook(const struct nf_hook_state *state)
 {
-	return ebt_do_table(skb, state, priv);
+	return ebt_do_table(state->skb, state, state->priv);
 }
 
 static const struct nf_hook_ops ebt_ops_filter[] = {
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c
index 4078151c224f..901029d1e34c 100644
--- a/net/bridge/netfilter/ebtable_nat.c
+++ b/net/bridge/netfilter/ebtable_nat.c
@@ -58,10 +58,9 @@ static const struct ebt_table frame_nat = {
 	.me		= THIS_MODULE,
 };
 
-static unsigned int ebt_nat_hook(void *priv, struct sk_buff *skb,
-				 const struct nf_hook_state *state)
+static unsigned int ebt_nat_hook(const struct nf_hook_state *state)
 {
-	return ebt_do_table(skb, state, priv);
+	return ebt_do_table(state->skb, state, state->priv);
 }
 
 static const struct nf_hook_ops ebt_ops_nat[] = {
diff --git a/net/bridge/netfilter/nf_conntrack_bridge.c b/net/bridge/netfilter/nf_conntrack_bridge.c
index fdbed3158555..7c9e533fec0d 100644
--- a/net/bridge/netfilter/nf_conntrack_bridge.c
+++ b/net/bridge/netfilter/nf_conntrack_bridge.c
@@ -236,10 +236,10 @@ static int nf_ct_br_ipv6_check(const struct sk_buff *skb)
 	return 0;
 }
 
-static unsigned int nf_ct_bridge_pre(void *priv, struct sk_buff *skb,
-				     const struct nf_hook_state *state)
+static unsigned int nf_ct_bridge_pre(const struct nf_hook_state *state)
 {
 	struct nf_hook_state bridge_state = *state;
+	struct sk_buff *skb = state->skb;
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct;
 	u32 len;
@@ -395,9 +395,9 @@ static unsigned int nf_ct_bridge_confirm(struct sk_buff *skb)
 	return nf_confirm(skb, protoff, ct, ctinfo);
 }
 
-static unsigned int nf_ct_bridge_post(void *priv, struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int nf_ct_bridge_post(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	int ret;
 
 	ret = nf_ct_bridge_confirm(skb);
diff --git a/net/ipv4/netfilter/arptable_filter.c b/net/ipv4/netfilter/arptable_filter.c
index 3de78416ec76..14f7316c0563 100644
--- a/net/ipv4/netfilter/arptable_filter.c
+++ b/net/ipv4/netfilter/arptable_filter.c
@@ -28,10 +28,9 @@ static const struct xt_table packet_filter = {
 
 /* The work comes in here from netfilter.c */
 static unsigned int
-arptable_filter_hook(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+arptable_filter_hook(const struct nf_hook_state *state)
 {
-	return arpt_do_table(skb, state, priv);
+	return arpt_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *arpfilter_ops __read_mostly;
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 8fd1aba8af31..0610933c503a 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -75,7 +75,7 @@ struct clusterip_net {
 	unsigned int hook_users;
 };
 
-static unsigned int clusterip_arp_mangle(void *priv, struct sk_buff *skb, const struct nf_hook_state *state);
+static unsigned int clusterip_arp_mangle(const struct nf_hook_state *state);
 
 static const struct nf_hook_ops cip_arp_ops = {
 	.hook = clusterip_arp_mangle,
@@ -635,9 +635,9 @@ static void arp_print(struct arp_payload *payload)
 #endif
 
 static unsigned int
-clusterip_arp_mangle(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+clusterip_arp_mangle(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct arphdr *arp = arp_hdr(skb);
 	struct arp_payload *payload;
 	struct clusterip_config *c;
diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c
index 0eb0e2ab9bfc..d67577320d05 100644
--- a/net/ipv4/netfilter/iptable_filter.c
+++ b/net/ipv4/netfilter/iptable_filter.c
@@ -29,10 +29,9 @@ static const struct xt_table packet_filter = {
 };
 
 static unsigned int
-iptable_filter_hook(void *priv, struct sk_buff *skb,
-		    const struct nf_hook_state *state)
+iptable_filter_hook(const struct nf_hook_state *state)
 {
-	return ipt_do_table(skb, state, priv);
+	return ipt_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *filter_ops __read_mostly;
diff --git a/net/ipv4/netfilter/iptable_mangle.c b/net/ipv4/netfilter/iptable_mangle.c
index 40417a3f930b..b1585a9dd128 100644
--- a/net/ipv4/netfilter/iptable_mangle.c
+++ b/net/ipv4/netfilter/iptable_mangle.c
@@ -70,10 +70,11 @@ ipt_mangle_out(struct sk_buff *skb, const struct nf_hook_state *state, void *pri
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
-iptable_mangle_hook(void *priv,
-		     struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+iptable_mangle_hook(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
+
 	if (state->hook == NF_INET_LOCAL_OUT)
 		return ipt_mangle_out(skb, state, priv);
 	return ipt_do_table(skb, state, priv);
diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c
index 45d7e072e6a5..d51901b367e4 100644
--- a/net/ipv4/netfilter/iptable_nat.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -29,11 +29,9 @@ static const struct xt_table nf_nat_ipv4_table = {
 	.af		= NFPROTO_IPV4,
 };
 
-static unsigned int iptable_nat_do_chain(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
+static unsigned int iptable_nat_do_chain(const struct nf_hook_state *state)
 {
-	return ipt_do_table(skb, state, priv);
+	return ipt_do_table(state->skb, state, state->priv);
 }
 
 static const struct nf_hook_ops nf_nat_ipv4_ops[] = {
diff --git a/net/ipv4/netfilter/iptable_raw.c b/net/ipv4/netfilter/iptable_raw.c
index 8265c6765705..88dc4b8ab2ac 100644
--- a/net/ipv4/netfilter/iptable_raw.c
+++ b/net/ipv4/netfilter/iptable_raw.c
@@ -34,10 +34,9 @@ static const struct xt_table packet_raw_before_defrag = {
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
-iptable_raw_hook(void *priv, struct sk_buff *skb,
-		 const struct nf_hook_state *state)
+iptable_raw_hook(const struct nf_hook_state *state)
 {
-	return ipt_do_table(skb, state, priv);
+	return ipt_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *rawtable_ops __read_mostly;
diff --git a/net/ipv4/netfilter/iptable_security.c b/net/ipv4/netfilter/iptable_security.c
index f519162a2fa5..8ab59e0d04ae 100644
--- a/net/ipv4/netfilter/iptable_security.c
+++ b/net/ipv4/netfilter/iptable_security.c
@@ -34,10 +34,9 @@ static const struct xt_table security_table = {
 };
 
 static unsigned int
-iptable_security_hook(void *priv, struct sk_buff *skb,
-		      const struct nf_hook_state *state)
+iptable_security_hook(const struct nf_hook_state *state)
 {
-	return ipt_do_table(skb, state, priv);
+	return ipt_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *sectbl_ops __read_mostly;
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 613432a36f0a..82fa58b9276a 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -63,10 +63,9 @@ static enum ip_defrag_users nf_ct_defrag_user(unsigned int hooknum,
 		return IP_DEFRAG_CONNTRACK_OUT + zone_id;
 }
 
-static unsigned int ipv4_conntrack_defrag(void *priv,
-					  struct sk_buff *skb,
-					  const struct nf_hook_state *state)
+static unsigned int ipv4_conntrack_defrag(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct sock *sk = skb->sk;
 
 	if (sk && sk_fullsock(sk) && (sk->sk_family == PF_INET) &&
diff --git a/net/ipv6/netfilter/ip6table_filter.c b/net/ipv6/netfilter/ip6table_filter.c
index 727ee8097012..90c475ac13d6 100644
--- a/net/ipv6/netfilter/ip6table_filter.c
+++ b/net/ipv6/netfilter/ip6table_filter.c
@@ -29,10 +29,9 @@ static const struct xt_table packet_filter = {
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
-ip6table_filter_hook(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+ip6table_filter_hook(const struct nf_hook_state *state)
 {
-	return ip6t_do_table(skb, state, priv);
+	return ip6t_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *filter_ops __read_mostly;
diff --git a/net/ipv6/netfilter/ip6table_mangle.c b/net/ipv6/netfilter/ip6table_mangle.c
index 9b518ce37d6a..fc1f7a4b9d59 100644
--- a/net/ipv6/netfilter/ip6table_mangle.c
+++ b/net/ipv6/netfilter/ip6table_mangle.c
@@ -64,9 +64,11 @@ ip6t_mangle_out(struct sk_buff *skb, const struct nf_hook_state *state, void *pr
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
-ip6table_mangle_hook(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+ip6table_mangle_hook(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
+
 	if (state->hook == NF_INET_LOCAL_OUT)
 		return ip6t_mangle_out(skb, state, priv);
 	return ip6t_do_table(skb, state, priv);
diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c
index 921c1723a01e..d4c9f5d0dc37 100644
--- a/net/ipv6/netfilter/ip6table_nat.c
+++ b/net/ipv6/netfilter/ip6table_nat.c
@@ -31,11 +31,9 @@ static const struct xt_table nf_nat_ipv6_table = {
 	.af		= NFPROTO_IPV6,
 };
 
-static unsigned int ip6table_nat_do_chain(void *priv,
-					  struct sk_buff *skb,
-					  const struct nf_hook_state *state)
+static unsigned int ip6table_nat_do_chain(const struct nf_hook_state *state)
 {
-	return ip6t_do_table(skb, state, priv);
+	return ip6t_do_table(state->skb, state, state->priv);
 }
 
 static const struct nf_hook_ops nf_nat_ipv6_ops[] = {
diff --git a/net/ipv6/netfilter/ip6table_raw.c b/net/ipv6/netfilter/ip6table_raw.c
index 4f2a04af71d3..9655f93927ce 100644
--- a/net/ipv6/netfilter/ip6table_raw.c
+++ b/net/ipv6/netfilter/ip6table_raw.c
@@ -33,10 +33,9 @@ static const struct xt_table packet_raw_before_defrag = {
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
-ip6table_raw_hook(void *priv, struct sk_buff *skb,
-		  const struct nf_hook_state *state)
+ip6table_raw_hook(const struct nf_hook_state *state)
 {
-	return ip6t_do_table(skb, state, priv);
+	return ip6t_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *rawtable_ops __read_mostly;
diff --git a/net/ipv6/netfilter/ip6table_security.c b/net/ipv6/netfilter/ip6table_security.c
index 931674034d8b..ff2c244488ec 100644
--- a/net/ipv6/netfilter/ip6table_security.c
+++ b/net/ipv6/netfilter/ip6table_security.c
@@ -33,10 +33,9 @@ static const struct xt_table security_table = {
 };
 
 static unsigned int
-ip6table_security_hook(void *priv, struct sk_buff *skb,
-		       const struct nf_hook_state *state)
+ip6table_security_hook(const struct nf_hook_state *state)
 {
-	return ip6t_do_table(skb, state, priv);
+	return ip6t_do_table(state->skb, state, state->priv);
 }
 
 static struct nf_hook_ops *sectbl_ops __read_mostly;
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index e8a59d8bf2ad..e02f798702d4 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -50,10 +50,9 @@ static enum ip6_defrag_users nf_ct6_defrag_user(unsigned int hooknum,
 		return IP6_DEFRAG_CONNTRACK_OUT + zone_id;
 }
 
-static unsigned int ipv6_defrag(void *priv,
-				struct sk_buff *skb,
-				const struct nf_hook_state *state)
+static unsigned int ipv6_defrag(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	int err;
 
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 129d48304821..3fd268afc13e 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -88,9 +88,7 @@ static void nf_hook_entries_free(struct nf_hook_entries *e)
 	call_rcu(&head->head, __nf_hook_entries_free);
 }
 
-static unsigned int accept_all(void *priv,
-			       struct sk_buff *skb,
-			       const struct nf_hook_state *state)
+static unsigned int accept_all(const struct nf_hook_state *state)
 {
 	return NF_ACCEPT; /* ACCEPT makes nf_hook_slow call next hook */
 }
@@ -585,6 +583,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 	unsigned int verdict, s = state->hook_index;
 	int ret;
 
+	state->skb = skb;
 	for (; s < e->num_hook_entries; s++) {
 		verdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);
 		switch (verdict & NF_VERDICT_MASK) {
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 128690c512df..f69ed7648c71 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1474,10 +1474,9 @@ ip_vs_out(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, in
  *	Check if packet is reply for established ip_vs_conn.
  */
 static unsigned int
-ip_vs_reply4(void *priv, struct sk_buff *skb,
-	     const struct nf_hook_state *state)
+ip_vs_reply4(const struct nf_hook_state *state)
 {
-	return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET);
+	return ip_vs_out(net_ipvs(state->net), state->hook, state->skb, AF_INET);
 }
 
 /*
@@ -1485,10 +1484,9 @@ ip_vs_reply4(void *priv, struct sk_buff *skb,
  *	Check if packet is reply for established ip_vs_conn.
  */
 static unsigned int
-ip_vs_local_reply4(void *priv, struct sk_buff *skb,
-		   const struct nf_hook_state *state)
+ip_vs_local_reply4(const struct nf_hook_state *state)
 {
-	return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET);
+	return ip_vs_out(net_ipvs(state->net), state->hook, state->skb, AF_INET);
 }
 
 #ifdef CONFIG_IP_VS_IPV6
@@ -1499,10 +1497,9 @@ ip_vs_local_reply4(void *priv, struct sk_buff *skb,
  *	Check if packet is reply for established ip_vs_conn.
  */
 static unsigned int
-ip_vs_reply6(void *priv, struct sk_buff *skb,
-	     const struct nf_hook_state *state)
+ip_vs_reply6(const struct nf_hook_state *state)
 {
-	return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET6);
+	return ip_vs_out(net_ipvs(state->net), state->hook, state->skb, AF_INET6);
 }
 
 /*
@@ -1510,10 +1507,9 @@ ip_vs_reply6(void *priv, struct sk_buff *skb,
  *	Check if packet is reply for established ip_vs_conn.
  */
 static unsigned int
-ip_vs_local_reply6(void *priv, struct sk_buff *skb,
-		   const struct nf_hook_state *state)
+ip_vs_local_reply6(const struct nf_hook_state *state)
 {
-	return ip_vs_out(net_ipvs(state->net), state->hook, skb, AF_INET6);
+	return ip_vs_out(net_ipvs(state->net), state->hook, state->skb, AF_INET6);
 }
 
 #endif
@@ -2142,10 +2138,9 @@ ip_vs_in(struct netns_ipvs *ipvs, unsigned int hooknum, struct sk_buff *skb, int
  *	Schedule and forward packets from remote clients
  */
 static unsigned int
-ip_vs_remote_request4(void *priv, struct sk_buff *skb,
-		      const struct nf_hook_state *state)
+ip_vs_remote_request4(const struct nf_hook_state *state)
 {
-	return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET);
+	return ip_vs_in(net_ipvs(state->net), state->hook, state->skb, AF_INET);
 }
 
 /*
@@ -2153,10 +2148,9 @@ ip_vs_remote_request4(void *priv, struct sk_buff *skb,
  *	Schedule and forward packets from local clients
  */
 static unsigned int
-ip_vs_local_request4(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+ip_vs_local_request4(const struct nf_hook_state *state)
 {
-	return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET);
+	return ip_vs_in(net_ipvs(state->net), state->hook, state->skb, AF_INET);
 }
 
 #ifdef CONFIG_IP_VS_IPV6
@@ -2166,10 +2160,9 @@ ip_vs_local_request4(void *priv, struct sk_buff *skb,
  *	Schedule and forward packets from remote clients
  */
 static unsigned int
-ip_vs_remote_request6(void *priv, struct sk_buff *skb,
-		      const struct nf_hook_state *state)
+ip_vs_remote_request6(const struct nf_hook_state *state)
 {
-	return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET6);
+	return ip_vs_in(net_ipvs(state->net), state->hook, state->skb, AF_INET6);
 }
 
 /*
@@ -2177,10 +2170,9 @@ ip_vs_remote_request6(void *priv, struct sk_buff *skb,
  *	Schedule and forward packets from local clients
  */
 static unsigned int
-ip_vs_local_request6(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+ip_vs_local_request6(const struct nf_hook_state *state)
 {
-	return ip_vs_in(net_ipvs(state->net), state->hook, skb, AF_INET6);
+	return ip_vs_in(net_ipvs(state->net), state->hook, state->skb, AF_INET6);
 }
 
 #endif
@@ -2196,11 +2188,11 @@ ip_vs_local_request6(void *priv, struct sk_buff *skb,
  *      and send them to ip_vs_in_icmp.
  */
 static unsigned int
-ip_vs_forward_icmp(void *priv, struct sk_buff *skb,
-		   const struct nf_hook_state *state)
+ip_vs_forward_icmp(const struct nf_hook_state *state)
 {
 	int r;
 	struct netns_ipvs *ipvs = net_ipvs(state->net);
+	struct sk_buff *skb = state->skb;
 
 	if (ip_hdr(skb)->protocol != IPPROTO_ICMP)
 		return NF_ACCEPT;
@@ -2214,11 +2206,11 @@ ip_vs_forward_icmp(void *priv, struct sk_buff *skb,
 
 #ifdef CONFIG_IP_VS_IPV6
 static unsigned int
-ip_vs_forward_icmp_v6(void *priv, struct sk_buff *skb,
-		      const struct nf_hook_state *state)
+ip_vs_forward_icmp_v6(const struct nf_hook_state *state)
 {
 	int r;
 	struct netns_ipvs *ipvs = net_ipvs(state->net);
+	struct sk_buff *skb = state->skb;
 	struct ip_vs_iphdr iphdr;
 
 	ip_vs_fill_iph_skb(AF_INET6, skb, false, &iphdr);
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index 8f7a9837349c..3207bb64e4ca 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -155,10 +155,9 @@ unsigned int nf_confirm(struct sk_buff *skb, unsigned int protoff,
 }
 EXPORT_SYMBOL_GPL(nf_confirm);
 
-static unsigned int ipv4_confirm(void *priv,
-				 struct sk_buff *skb,
-				 const struct nf_hook_state *state)
+static unsigned int ipv4_confirm(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct;
 
@@ -171,17 +170,15 @@ static unsigned int ipv4_confirm(void *priv,
 			  ct, ctinfo);
 }
 
-static unsigned int ipv4_conntrack_in(void *priv,
-				      struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int ipv4_conntrack_in(const struct nf_hook_state *state)
 {
-	return nf_conntrack_in(skb, state);
+	return nf_conntrack_in(state->skb, state);
 }
 
-static unsigned int ipv4_conntrack_local(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
+static unsigned int ipv4_conntrack_local(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+
 	if (ip_is_fragment(ip_hdr(skb))) { /* IP_NODEFRAG setsockopt set */
 		enum ip_conntrack_info ctinfo;
 		struct nf_conn *tmpl;
@@ -360,10 +357,9 @@ static struct nf_sockopt_ops so_getorigdst6 = {
 	.owner		= THIS_MODULE,
 };
 
-static unsigned int ipv6_confirm(void *priv,
-				 struct sk_buff *skb,
-				 const struct nf_hook_state *state)
+static unsigned int ipv6_confirm(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
 	unsigned char pnum = ipv6_hdr(skb)->nexthdr;
@@ -384,18 +380,14 @@ static unsigned int ipv6_confirm(void *priv,
 	return nf_confirm(skb, protoff, ct, ctinfo);
 }
 
-static unsigned int ipv6_conntrack_in(void *priv,
-				      struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int ipv6_conntrack_in(const struct nf_hook_state *state)
 {
-	return nf_conntrack_in(skb, state);
+	return nf_conntrack_in(state->skb, state);
 }
 
-static unsigned int ipv6_conntrack_local(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
+static unsigned int ipv6_conntrack_local(const struct nf_hook_state *state)
 {
-	return nf_conntrack_in(skb, state);
+	return nf_conntrack_in(state->skb, state);
 }
 
 static const struct nf_hook_ops ipv6_conntrack_ops[] = {
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c
index bc4126d8ef65..8091d79d76cc 100644
--- a/net/netfilter/nf_flow_table_inet.c
+++ b/net/netfilter/nf_flow_table_inet.c
@@ -8,14 +8,15 @@
 #include <net/netfilter/nf_tables.h>
 
 static unsigned int
-nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
-			  const struct nf_hook_state *state)
+nf_flow_offload_inet_hook(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+
 	switch (skb->protocol) {
 	case htons(ETH_P_IP):
-		return nf_flow_offload_ip_hook(priv, skb, state);
+		return nf_flow_offload_ip_hook(state);
 	case htons(ETH_P_IPV6):
-		return nf_flow_offload_ipv6_hook(priv, skb, state);
+		return nf_flow_offload_ipv6_hook(state);
 	}
 
 	return NF_ACCEPT;
diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 889cf88d3dba..80dd90ee4ffc 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -325,12 +325,12 @@ static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 }
 
 unsigned int
-nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
-			const struct nf_hook_state *state)
+nf_flow_offload_ip_hook(const struct nf_hook_state *state)
 {
+	struct nf_flowtable *flow_table = state->priv;
 	struct flow_offload_tuple_rhash *tuplehash;
-	struct nf_flowtable *flow_table = priv;
 	struct flow_offload_tuple tuple = {};
+	struct sk_buff *skb = state->skb;
 	enum flow_offload_tuple_dir dir;
 	struct flow_offload *flow;
 	struct net_device *outdev;
@@ -561,12 +561,12 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
 }
 
 unsigned int
-nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
-			  const struct nf_hook_state *state)
+nf_flow_offload_ipv6_hook(const struct nf_hook_state *state)
 {
+	struct nf_flowtable *flow_table = state->priv;
 	struct flow_offload_tuple_rhash *tuplehash;
-	struct nf_flowtable *flow_table = priv;
 	struct flow_offload_tuple tuple = {};
+	struct sk_buff *skb = state->skb;
 	enum flow_offload_tuple_dir dir;
 	const struct in6_addr *nexthop;
 	struct flow_offload *flow;
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index a6a273fff3f6..9105764d52a4 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -700,20 +700,24 @@ unsigned int nf_nat_packet(struct nf_conn *ct,
 EXPORT_SYMBOL_GPL(nf_nat_packet);
 
 static unsigned int nf_nat_inet_run_hooks(const struct nf_hook_state *state,
-					  struct sk_buff *skb,
 					  struct nf_conn *ct,
 					  struct nf_nat_lookup_hook_priv *lpriv)
 {
 	enum nf_nat_manip_type maniptype = HOOK2MANIP(state->hook);
 	struct nf_hook_entries *e = rcu_dereference(lpriv->entries);
+	struct nf_hook_state __state;
 	unsigned int ret;
 	int i;
 
 	if (!e)
 		goto null_bind;
 
+	__state = *state;
+
 	for (i = 0; i < e->num_hook_entries; i++) {
-		ret = e->hooks[i].hook(e->hooks[i].priv, skb, state);
+		__state.priv = e->hooks[i].priv;
+
+		ret = e->hooks[i].hook(&__state);
 		if (ret != NF_ACCEPT)
 			return ret;
 
@@ -758,7 +762,7 @@ nf_nat_inet_fn(void *priv, struct sk_buff *skb,
 			struct nf_nat_lookup_hook_priv *lpriv = priv;
 			unsigned int ret;
 
-			ret = nf_nat_inet_run_hooks(state, skb, ct, lpriv);
+			ret = nf_nat_inet_run_hooks(state, ct, lpriv);
 			if (ret != NF_ACCEPT)
 				return ret;
 		} else {
diff --git a/net/netfilter/nf_nat_proto.c b/net/netfilter/nf_nat_proto.c
index 48cc60084d28..187d3e59fb53 100644
--- a/net/netfilter/nf_nat_proto.c
+++ b/net/netfilter/nf_nat_proto.c
@@ -622,11 +622,12 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
 EXPORT_SYMBOL_GPL(nf_nat_icmp_reply_translation);
 
 static unsigned int
-nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
-	       const struct nf_hook_state *state)
+nf_nat_ipv4_fn(const struct nf_hook_state *state)
 {
-	struct nf_conn *ct;
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
 	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
 
 	ct = nf_ct_get(skb, &ctinfo);
 	if (!ct)
@@ -646,13 +647,13 @@ nf_nat_ipv4_fn(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv4_pre_routing(void *priv, struct sk_buff *skb,
-			const struct nf_hook_state *state)
+nf_nat_ipv4_pre_routing(const struct nf_hook_state *state)
 {
-	unsigned int ret;
+	struct sk_buff *skb = state->skb;
 	__be32 daddr = ip_hdr(skb)->daddr;
+	unsigned int ret;
 
-	ret = nf_nat_ipv4_fn(priv, skb, state);
+	ret = nf_nat_ipv4_fn(state);
 	if (ret == NF_ACCEPT && daddr != ip_hdr(skb)->daddr)
 		skb_dst_drop(skb);
 
@@ -698,14 +699,14 @@ static int nf_xfrm_me_harder(struct net *net, struct sk_buff *skb, unsigned int
 #endif
 
 static unsigned int
-nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+nf_nat_ipv4_local_in(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	__be32 saddr = ip_hdr(skb)->saddr;
 	struct sock *sk = skb->sk;
 	unsigned int ret;
 
-	ret = nf_nat_ipv4_fn(priv, skb, state);
+	ret = nf_nat_ipv4_fn(state);
 
 	if (ret == NF_ACCEPT && sk && saddr != ip_hdr(skb)->saddr &&
 	    !inet_sk_transparent(sk))
@@ -715,9 +716,9 @@ nf_nat_ipv4_local_in(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
-		const struct nf_hook_state *state)
+nf_nat_ipv4_out(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 #ifdef CONFIG_XFRM
 	const struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
@@ -725,7 +726,7 @@ nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
 #endif
 	unsigned int ret;
 
-	ret = nf_nat_ipv4_fn(priv, skb, state);
+	ret = nf_nat_ipv4_fn(state);
 #ifdef CONFIG_XFRM
 	if (ret != NF_ACCEPT)
 		return ret;
@@ -752,15 +753,15 @@ nf_nat_ipv4_out(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv4_local_fn(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+nf_nat_ipv4_local_fn(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	const struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
 	unsigned int ret;
 	int err;
 
-	ret = nf_nat_ipv4_fn(priv, skb, state);
+	ret = nf_nat_ipv4_fn(state);
 	if (ret != NF_ACCEPT)
 		return ret;
 
@@ -901,9 +902,10 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
 EXPORT_SYMBOL_GPL(nf_nat_icmpv6_reply_translation);
 
 static unsigned int
-nf_nat_ipv6_fn(void *priv, struct sk_buff *skb,
-	       const struct nf_hook_state *state)
+nf_nat_ipv6_fn(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
 	struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
 	__be16 frag_off;
@@ -938,13 +940,13 @@ nf_nat_ipv6_fn(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv6_in(void *priv, struct sk_buff *skb,
-	       const struct nf_hook_state *state)
+nf_nat_ipv6_in(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	unsigned int ret;
 	struct in6_addr daddr = ipv6_hdr(skb)->daddr;
 
-	ret = nf_nat_ipv6_fn(priv, skb, state);
+	ret = nf_nat_ipv6_fn(state);
 	if (ret != NF_DROP && ret != NF_STOLEN &&
 	    ipv6_addr_cmp(&daddr, &ipv6_hdr(skb)->daddr))
 		skb_dst_drop(skb);
@@ -953,9 +955,9 @@ nf_nat_ipv6_in(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv6_out(void *priv, struct sk_buff *skb,
-		const struct nf_hook_state *state)
+nf_nat_ipv6_out(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 #ifdef CONFIG_XFRM
 	const struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
@@ -963,7 +965,7 @@ nf_nat_ipv6_out(void *priv, struct sk_buff *skb,
 #endif
 	unsigned int ret;
 
-	ret = nf_nat_ipv6_fn(priv, skb, state);
+	ret = nf_nat_ipv6_fn(state);
 #ifdef CONFIG_XFRM
 	if (ret != NF_ACCEPT)
 		return ret;
@@ -990,15 +992,15 @@ nf_nat_ipv6_out(void *priv, struct sk_buff *skb,
 }
 
 static unsigned int
-nf_nat_ipv6_local_fn(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
+nf_nat_ipv6_local_fn(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	const struct nf_conn *ct;
 	enum ip_conntrack_info ctinfo;
 	unsigned int ret;
 	int err;
 
-	ret = nf_nat_ipv6_fn(priv, skb, state);
+	ret = nf_nat_ipv6_fn(state);
 	if (ret != NF_ACCEPT)
 		return ret;
 
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 3d6d49420db8..3da1b13ccd8e 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -659,10 +659,10 @@ synproxy_recv_client_ack(struct net *net,
 EXPORT_SYMBOL_GPL(synproxy_recv_client_ack);
 
 unsigned int
-ipv4_synproxy_hook(void *priv, struct sk_buff *skb,
-		   const struct nf_hook_state *nhs)
+ipv4_synproxy_hook(const struct nf_hook_state *nhs)
 {
 	struct net *net = nhs->net;
+	struct sk_buff *skb = nhs->skb;
 	struct synproxy_net *snet = synproxy_pernet(net);
 	enum ip_conntrack_info ctinfo;
 	struct nf_conn *ct;
@@ -1076,9 +1076,9 @@ synproxy_recv_client_ack_ipv6(struct net *net,
 EXPORT_SYMBOL_GPL(synproxy_recv_client_ack_ipv6);
 
 unsigned int
-ipv6_synproxy_hook(void *priv, struct sk_buff *skb,
-		   const struct nf_hook_state *nhs)
+ipv6_synproxy_hook(const struct nf_hook_state *nhs)
 {
+	struct sk_buff *skb = nhs->skb;
 	struct net *net = nhs->net;
 	struct synproxy_net *snet = synproxy_pernet(net);
 	enum ip_conntrack_info ctinfo;
diff --git a/net/netfilter/nft_chain_filter.c b/net/netfilter/nft_chain_filter.c
index 5b02408a920b..df5a84996baa 100644
--- a/net/netfilter/nft_chain_filter.c
+++ b/net/netfilter/nft_chain_filter.c
@@ -11,16 +11,15 @@
 #include <net/netfilter/nf_tables_ipv6.h>
 
 #ifdef CONFIG_NF_TABLES_IPV4
-static unsigned int nft_do_chain_ipv4(void *priv,
-				      struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int nft_do_chain_ipv4(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nft_pktinfo pkt;
 
 	nft_set_pktinfo(&pkt, skb, state);
 	nft_set_pktinfo_ipv4(&pkt);
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 static const struct nft_chain_type nft_chain_filter_ipv4 = {
@@ -56,15 +55,15 @@ static inline void nft_chain_filter_ipv4_fini(void) {}
 #endif /* CONFIG_NF_TABLES_IPV4 */
 
 #ifdef CONFIG_NF_TABLES_ARP
-static unsigned int nft_do_chain_arp(void *priv, struct sk_buff *skb,
-				     const struct nf_hook_state *state)
+static unsigned int nft_do_chain_arp(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nft_pktinfo pkt;
 
 	nft_set_pktinfo(&pkt, skb, state);
 	nft_set_pktinfo_unspec(&pkt);
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 static const struct nft_chain_type nft_chain_filter_arp = {
@@ -95,16 +94,15 @@ static inline void nft_chain_filter_arp_fini(void) {}
 #endif /* CONFIG_NF_TABLES_ARP */
 
 #ifdef CONFIG_NF_TABLES_IPV6
-static unsigned int nft_do_chain_ipv6(void *priv,
-				      struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int nft_do_chain_ipv6(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nft_pktinfo pkt;
 
 	nft_set_pktinfo(&pkt, skb, state);
 	nft_set_pktinfo_ipv6(&pkt);
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 static const struct nft_chain_type nft_chain_filter_ipv6 = {
@@ -140,9 +138,9 @@ static inline void nft_chain_filter_ipv6_fini(void) {}
 #endif /* CONFIG_NF_TABLES_IPV6 */
 
 #ifdef CONFIG_NF_TABLES_INET
-static unsigned int nft_do_chain_inet(void *priv, struct sk_buff *skb,
-				      const struct nf_hook_state *state)
+static unsigned int nft_do_chain_inet(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nft_pktinfo pkt;
 
 	nft_set_pktinfo(&pkt, skb, state);
@@ -158,13 +156,13 @@ static unsigned int nft_do_chain_inet(void *priv, struct sk_buff *skb,
 		break;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
-static unsigned int nft_do_chain_inet_ingress(void *priv, struct sk_buff *skb,
-					      const struct nf_hook_state *state)
+static unsigned int nft_do_chain_inet_ingress(const struct nf_hook_state *state)
 {
 	struct nf_hook_state ingress_state = *state;
+	struct sk_buff *skb = state->skb;
 	struct nft_pktinfo pkt;
 
 	switch (skb->protocol) {
@@ -189,7 +187,7 @@ static unsigned int nft_do_chain_inet_ingress(void *priv, struct sk_buff *skb,
 		return NF_ACCEPT;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 static const struct nft_chain_type nft_chain_filter_inet = {
@@ -228,10 +226,9 @@ static inline void nft_chain_filter_inet_fini(void) {}
 
 #if IS_ENABLED(CONFIG_NF_TABLES_BRIDGE)
 static unsigned int
-nft_do_chain_bridge(void *priv,
-		    struct sk_buff *skb,
-		    const struct nf_hook_state *state)
+nft_do_chain_bridge(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
 	struct nft_pktinfo pkt;
 
 	nft_set_pktinfo(&pkt, skb, state);
@@ -248,7 +245,7 @@ nft_do_chain_bridge(void *priv,
 		break;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 static const struct nft_chain_type nft_chain_filter_bridge = {
@@ -284,14 +281,13 @@ static inline void nft_chain_filter_bridge_fini(void) {}
 #endif /* CONFIG_NF_TABLES_BRIDGE */
 
 #ifdef CONFIG_NF_TABLES_NETDEV
-static unsigned int nft_do_chain_netdev(void *priv, struct sk_buff *skb,
-					const struct nf_hook_state *state)
+static unsigned int nft_do_chain_netdev(const struct nf_hook_state *state)
 {
 	struct nft_pktinfo pkt;
 
-	nft_set_pktinfo(&pkt, skb, state);
+	nft_set_pktinfo(&pkt, state->skb, state);
 
-	switch (skb->protocol) {
+	switch (state->skb->protocol) {
 	case htons(ETH_P_IP):
 		nft_set_pktinfo_ipv4_validate(&pkt);
 		break;
@@ -303,7 +299,7 @@ static unsigned int nft_do_chain_netdev(void *priv, struct sk_buff *skb,
 		break;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 static const struct nft_chain_type nft_chain_filter_netdev = {
diff --git a/net/netfilter/nft_chain_nat.c b/net/netfilter/nft_chain_nat.c
index 98e4946100c5..7eff7e499f54 100644
--- a/net/netfilter/nft_chain_nat.c
+++ b/net/netfilter/nft_chain_nat.c
@@ -7,12 +7,11 @@
 #include <net/netfilter/nf_tables_ipv4.h>
 #include <net/netfilter/nf_tables_ipv6.h>
 
-static unsigned int nft_nat_do_chain(void *priv, struct sk_buff *skb,
-				     const struct nf_hook_state *state)
+static unsigned int nft_nat_do_chain(const struct nf_hook_state *state)
 {
 	struct nft_pktinfo pkt;
 
-	nft_set_pktinfo(&pkt, skb, state);
+	nft_set_pktinfo(&pkt, state->skb, state);
 
 	switch (state->pf) {
 #ifdef CONFIG_NF_TABLES_IPV4
@@ -29,7 +28,7 @@ static unsigned int nft_nat_do_chain(void *priv, struct sk_buff *skb,
 		break;
 	}
 
-	return nft_do_chain(&pkt, priv);
+	return nft_do_chain(&pkt, state->priv);
 }
 
 #ifdef CONFIG_NF_TABLES_IPV4
diff --git a/net/netfilter/nft_chain_route.c b/net/netfilter/nft_chain_route.c
index 925db0dce48d..8c9f31a96d6f 100644
--- a/net/netfilter/nft_chain_route.c
+++ b/net/netfilter/nft_chain_route.c
@@ -13,10 +13,10 @@
 #include <net/ip.h>
 
 #ifdef CONFIG_NF_TABLES_IPV4
-static unsigned int nf_route_table_hook4(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
+static unsigned int nf_route_table_hook4(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
 	const struct iphdr *iph;
 	struct nft_pktinfo pkt;
 	__be32 saddr, daddr;
@@ -62,10 +62,10 @@ static const struct nft_chain_type nft_chain_route_ipv4 = {
 #endif
 
 #ifdef CONFIG_NF_TABLES_IPV6
-static unsigned int nf_route_table_hook6(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
+static unsigned int nf_route_table_hook6(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
 	struct in6_addr saddr, daddr;
 	struct nft_pktinfo pkt;
 	u32 mark, flowlabel;
@@ -112,17 +112,17 @@ static const struct nft_chain_type nft_chain_route_ipv6 = {
 #endif
 
 #ifdef CONFIG_NF_TABLES_INET
-static unsigned int nf_route_table_inet(void *priv,
-					struct sk_buff *skb,
-					const struct nf_hook_state *state)
+static unsigned int nf_route_table_inet(const struct nf_hook_state *state)
 {
+	struct sk_buff *skb = state->skb;
+	void *priv = state->priv;
 	struct nft_pktinfo pkt;
 
 	switch (state->pf) {
 	case NFPROTO_IPV4:
-		return nf_route_table_hook4(priv, skb, state);
+		return nf_route_table_hook4(state);
 	case NFPROTO_IPV6:
-		return nf_route_table_hook6(priv, skb, state);
+		return nf_route_table_hook6(state);
 	default:
 		nft_set_pktinfo(&pkt, skb, state);
 		break;
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index e7ebd45ca345..688e8dabc037 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -5746,22 +5746,11 @@ static unsigned int selinux_ip_forward(struct sk_buff *skb,
 	return NF_ACCEPT;
 }
 
-static unsigned int selinux_ipv4_forward(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
+static unsigned int selinux_hook_forward(const struct nf_hook_state *state)
 {
-	return selinux_ip_forward(skb, state->in, PF_INET);
+	return selinux_ip_forward(state->skb, state->in, state->pf);
 }
 
-#if IS_ENABLED(CONFIG_IPV6)
-static unsigned int selinux_ipv6_forward(void *priv,
-					 struct sk_buff *skb,
-					 const struct nf_hook_state *state)
-{
-	return selinux_ip_forward(skb, state->in, PF_INET6);
-}
-#endif	/* IPV6 */
-
 static unsigned int selinux_ip_output(struct sk_buff *skb,
 				      u16 family)
 {
@@ -5804,21 +5793,10 @@ static unsigned int selinux_ip_output(struct sk_buff *skb,
 	return NF_ACCEPT;
 }
 
-static unsigned int selinux_ipv4_output(void *priv,
-					struct sk_buff *skb,
-					const struct nf_hook_state *state)
-{
-	return selinux_ip_output(skb, PF_INET);
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-static unsigned int selinux_ipv6_output(void *priv,
-					struct sk_buff *skb,
-					const struct nf_hook_state *state)
+static unsigned int selinux_hook_output(const struct nf_hook_state *state)
 {
-	return selinux_ip_output(skb, PF_INET6);
+	return selinux_ip_output(state->skb, state->pf);
 }
-#endif	/* IPV6 */
 
 static unsigned int selinux_ip_postroute_compat(struct sk_buff *skb,
 						int ifindex,
@@ -5994,22 +5972,10 @@ static unsigned int selinux_ip_postroute(struct sk_buff *skb,
 	return NF_ACCEPT;
 }
 
-static unsigned int selinux_ipv4_postroute(void *priv,
-					   struct sk_buff *skb,
-					   const struct nf_hook_state *state)
-{
-	return selinux_ip_postroute(skb, state->out, PF_INET);
-}
-
-#if IS_ENABLED(CONFIG_IPV6)
-static unsigned int selinux_ipv6_postroute(void *priv,
-					   struct sk_buff *skb,
-					   const struct nf_hook_state *state)
+static unsigned int selinux_hook_postroute(const struct nf_hook_state *state)
 {
-	return selinux_ip_postroute(skb, state->out, PF_INET6);
+	return selinux_ip_postroute(state->skb, state->out, state->pf);
 }
-#endif	/* IPV6 */
-
 #endif	/* CONFIG_NETFILTER */
 
 static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
@@ -7470,38 +7436,38 @@ DEFINE_LSM(selinux) = {
 
 static const struct nf_hook_ops selinux_nf_ops[] = {
 	{
-		.hook =		selinux_ipv4_postroute,
+		.hook =		selinux_hook_postroute,
 		.pf =		NFPROTO_IPV4,
 		.hooknum =	NF_INET_POST_ROUTING,
 		.priority =	NF_IP_PRI_SELINUX_LAST,
 	},
 	{
-		.hook =		selinux_ipv4_forward,
+		.hook =		selinux_hook_forward,
 		.pf =		NFPROTO_IPV4,
 		.hooknum =	NF_INET_FORWARD,
 		.priority =	NF_IP_PRI_SELINUX_FIRST,
 	},
 	{
-		.hook =		selinux_ipv4_output,
+		.hook =		selinux_hook_output,
 		.pf =		NFPROTO_IPV4,
 		.hooknum =	NF_INET_LOCAL_OUT,
 		.priority =	NF_IP_PRI_SELINUX_FIRST,
 	},
 #if IS_ENABLED(CONFIG_IPV6)
 	{
-		.hook =		selinux_ipv6_postroute,
+		.hook =		selinux_hook_postroute,
 		.pf =		NFPROTO_IPV6,
 		.hooknum =	NF_INET_POST_ROUTING,
 		.priority =	NF_IP6_PRI_SELINUX_LAST,
 	},
 	{
-		.hook =		selinux_ipv6_forward,
+		.hook =		selinux_hook_forward,
 		.pf =		NFPROTO_IPV6,
 		.hooknum =	NF_INET_FORWARD,
 		.priority =	NF_IP6_PRI_SELINUX_FIRST,
 	},
 	{
-		.hook =		selinux_ipv6_output,
+		.hook =		selinux_hook_output,
 		.pf =		NFPROTO_IPV6,
 		.hooknum =	NF_INET_LOCAL_OUT,
 		.priority =	NF_IP6_PRI_SELINUX_FIRST,
-- 
2.32.0



* [PATCH RFC nf-next 5/9] netfilter: reduce allowed hook count to 32
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

A limit of 1k hooks is huge: supporting it would mean adding tail call
support to the nf_hook bpf converter.

We need about 5 insns per hook at this time, ignoring prologue/epilogue.

32 should be fine: even extreme cases typically need only about 8 hooks
per hook location.
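
As a rough illustration of the size argument, the sketch below estimates
the length of the unrolled program for a given hook count.  The figure of
5 insns per hook is taken from the text above; the prologue/epilogue
overhead is not stated there, so FIXED_OVERHEAD is a placeholder, and the
function names are made up for this example.

```c
#include <stdio.h>

/* From the commit message: roughly 5 bpf insns are emitted per hook. */
#define INSNS_PER_HOOK	5
/* Placeholder: the prologue/epilogue size is not given in the message. */
#define FIXED_OVERHEAD	0

/* Estimate the insn count of an unrolled program calling nhooks hooks. */
static unsigned int estimate_insns(unsigned int nhooks)
{
	return FIXED_OVERHEAD + nhooks * INSNS_PER_HOOK;
}

/* Print the estimate for the old limit, the new limit and the
 * "extreme case" of 8 hooks mentioned above.
 */
static void print_estimates(void)
{
	printf("1024 hooks: ~%u insns\n", estimate_insns(1024));
	printf("  32 hooks: ~%u insns\n", estimate_insns(32));
	printf("   8 hooks: ~%u insns\n", estimate_insns(8));
}
```

With these assumptions the program body grows linearly with the hook
count, which is why the limit directly bounds the generated program size.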

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 3fd268afc13e..f4359179eba9 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -42,7 +42,7 @@ EXPORT_SYMBOL(nf_hooks_needed);
 static DEFINE_MUTEX(nf_hook_mutex);
 
 /* max hooks per family/hooknum */
-#define MAX_HOOK_COUNT		1024
+#define MAX_HOOK_COUNT		32
 
 #define nf_entry_dereference(e) \
 	rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
-- 
2.32.0



* [PATCH RFC nf-next 6/9] netfilter: add bpf base hook program generator
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

Add a kernel bpf program generator for netfilter base hooks.

Currently netfilter hooks are invoked by nf_hook_slow:

for i in hooks; do
  verdict = hooks[i].indirect_func(hooks[i].hook_arg, skb, state);

  switch (verdict) { ....

The autogenerator unrolls the loop, so we get:

state->priv = hooks[0].hook_arg;
v = first_hook_function(state);
if (v != ACCEPT) goto done;
state->priv = hooks[1].hook_arg;
v = second_hook_function(state); ...

Indirections are replaced by direct calls. Invocation of the
autogenerated programs is done via bpf dispatcher from nf_hook().
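
The difference between the two styles can be modelled in plain userspace
C.  This is only a sketch of the idea: the generator in this patch emits
bpf instructions, not C.  All model_* names, the reduced state structure
and the two example hooks are made up for illustration.

```c
#include <stddef.h>

#define MODEL_DROP	0	/* same numeric value as NF_DROP */
#define MODEL_ACCEPT	1	/* same numeric value as NF_ACCEPT */

/* Reduced stand-in for struct nf_hook_state: only the priv slot. */
struct model_state {
	void *priv;
};

typedef int (*model_hookfn)(struct model_state *state);

struct model_entry {
	model_hookfn hook;
	void *hook_arg;
};

/* Example hook that accepts everything. */
static int model_hook_a(struct model_state *state)
{
	(void)state;
	return MODEL_ACCEPT;
}

/* Example hook that drops when its private argument is odd. */
static int model_hook_b(struct model_state *state)
{
	const int *val = state->priv;

	return (*val & 1) ? MODEL_DROP : MODEL_ACCEPT;
}

/* Interpreter style: one indirect call per table entry. */
static int model_run_loop(const struct model_entry *e, size_t n,
			  struct model_state *state)
{
	int v = MODEL_ACCEPT;
	size_t i;

	for (i = 0; i < n; i++) {
		state->priv = e[i].hook_arg;
		v = e[i].hook(state);
		if (v != MODEL_ACCEPT)
			break;
	}
	return v;
}

/* Unrolled style for the fixed table { hook_a, hook_b }: direct calls. */
static int model_run_unrolled(const struct model_entry *e,
			      struct model_state *state)
{
	int v;

	state->priv = e[0].hook_arg;
	v = model_hook_a(state);
	if (v != MODEL_ACCEPT)
		goto done;
	state->priv = e[1].hook_arg;
	v = model_hook_b(state);
done:
	return v;
}
```

Both functions return the same verdict for the same table.  The unrolled
variant is only valid for that one table, which is why a new program has
to be generated whenever the set of hooks changes.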

The autogenerated program has the same return value scheme as
nf_hook_slow(). NF_HOOK() points are converted to call the
autogenerated bpf program instead of nf_hook_slow().
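
For reference, the return value scheme can be written down as a small
userspace function.  This is a sketch, not the kernel code: the verdict
constants carry the values from include/uapi/linux/netfilter.h, while the
function name and the bool/out-parameter convention are assumptions of
this example.  It describes the verdict that ends hook traversal; an
ACCEPT from an earlier hook simply moves on to the next hook.

```c
#include <errno.h>
#include <stdbool.h>

/* Values as in include/uapi/linux/netfilter.h. */
#define MODEL_NF_DROP		0u
#define MODEL_NF_ACCEPT		1u
#define MODEL_NF_STOLEN		2u
#define MODEL_NF_VERDICT_MASK	0x000000ffu
#define MODEL_NF_VERDICT_QBITS	16

/* Map the final verdict to the value handed back to the caller.
 * Returns false when the verdict is none of ACCEPT/STOLEN/DROP, i.e.
 * when the queueing path would have to take over.
 */
static bool model_final_verdict(unsigned int verdict, int *ret)
{
	unsigned int err;

	/* The upper bits may carry an errno or a queue number. */
	switch (verdict & MODEL_NF_VERDICT_MASK) {
	case MODEL_NF_ACCEPT:
		*ret = 1;	/* skb continues through the stack */
		return true;
	case MODEL_NF_STOLEN:
		*ret = 0;	/* hook took ownership of the skb */
		return true;
	case MODEL_NF_DROP:
		err = verdict >> MODEL_NF_VERDICT_QBITS;
		if (err == 0)
			err = EPERM;
		*ret = -(int)err;	/* the kernel also frees the skb here */
		return true;
	default:
		return false;
	}
}
```

A hook returning plain NF_DROP therefore yields -EPERM, while one that
encodes an errno in the upper bits yields that errno, negated.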

Purpose of this is to eventually add a 'netfilter prog type' to bpf and
permit attachment of (userspace generated) bpf programs to the netfilter
machinery, e.g.  'attach bpf prog id 1234 to ipv6 PREROUTING at prio -300'.

This will require exposing the context structure (the program argument,
'__nf_hook_state'), with accesses rewritten to match the nf_hook_state layout.

TODO:
1. Test !x86_64.
2. Test bridge family.

Future work:
Add support for NAT hooks.  They still use indirect calls, but that is
less of a problem because they are called only once per connection.

Could annotate the ops struct with the kinds of verdicts the
C function can return.  This would allow eliding the retval
check when a hook can only return NF_ACCEPT.

Could add extra support for INGRESS hook to move more code from
inline functions to the autogenerated program.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter.h           |  56 ++++
 include/net/netfilter/nf_hook_bpf.h |  14 +
 net/netfilter/Kconfig               |  10 +
 net/netfilter/Makefile              |   1 +
 net/netfilter/core.c                |  74 ++++-
 net/netfilter/nf_hook_bpf.c         | 425 ++++++++++++++++++++++++++++
 6 files changed, 577 insertions(+), 3 deletions(-)
 create mode 100644 include/net/netfilter/nf_hook_bpf.h
 create mode 100644 net/netfilter/nf_hook_bpf.c

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index c5de525218c2..9d22e672710c 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_NETFILTER_H
 #define __LINUX_NETFILTER_H
 
+#include <linux/filter.h>
 #include <linux/init.h>
 #include <linux/skbuff.h>
 #include <linux/net.h>
@@ -106,6 +107,9 @@ struct nf_hook_entries_rcu_head {
 };
 
 struct nf_hook_entries {
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	struct bpf_prog			*hook_prog;
+#endif
 	u16				num_hook_entries;
 	/* padding */
 	struct nf_hook_entry		hooks[];
@@ -205,6 +209,17 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 
 void nf_hook_slow_list(struct list_head *head, struct nf_hook_state *state,
 		       const struct nf_hook_entries *e);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+DECLARE_BPF_DISPATCHER(nf_hook_base);
+
+static __always_inline int bpf_prog_run_nf(const struct bpf_prog *prog,
+					   struct nf_hook_state *state)
+{
+	return __bpf_prog_run(prog, state, BPF_DISPATCHER_FUNC(nf_hook_base));
+}
+#endif
+
 /**
  *	nf_hook - call a netfilter hook
  *
@@ -259,11 +274,24 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,
 
 	if (hook_head) {
 		struct nf_hook_state state;
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+		const struct bpf_prog *p = READ_ONCE(hook_head->hook_prog);
+
+		nf_hook_state_init(&state, hook, pf, indev, outdev,
+				   sk, net, okfn);
+
+		state.priv = (void *)hook_head;
+		state.skb = skb;
 
+		migrate_disable();
+		ret = bpf_prog_run_nf(p, &state);
+		migrate_enable();
+#else
 		nf_hook_state_init(&state, hook, pf, indev, outdev,
 				   sk, net, okfn);
 
 		ret = nf_hook_slow(skb, &state, hook_head);
+#endif
 	}
 	rcu_read_unlock();
 
@@ -341,10 +369,38 @@ NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 
 	if (hook_head) {
 		struct nf_hook_state state;
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+		const struct bpf_prog *p = hook_head->hook_prog;
+		struct sk_buff *skb, *next;
+		struct list_head sublist;
+		int ret;
 
 		nf_hook_state_init(&state, hook, pf, in, out, sk, net, okfn);
 
+		INIT_LIST_HEAD(&sublist);
+
+		migrate_disable();
+
+		list_for_each_entry_safe(skb, next, head, list) {
+			skb_list_del_init(skb);
+
+			state.priv = (void *)hook_head;
+			state.skb = skb;
+
+			ret = bpf_prog_run_nf(p, &state);
+			if (ret == 1)
+				list_add_tail(&skb->list, &sublist);
+		}
+
+		migrate_enable();
+
+		/* Put passed packets back on main list */
+		list_splice(&sublist, head);
+#else
+		nf_hook_state_init(&state, hook, pf, in, out, sk, net, okfn);
+
 		nf_hook_slow_list(head, &state, hook_head);
+#endif
 	}
 	rcu_read_unlock();
 }
diff --git a/include/net/netfilter/nf_hook_bpf.h b/include/net/netfilter/nf_hook_bpf.h
new file mode 100644
index 000000000000..12304e9f3d25
--- /dev/null
+++ b/include/net/netfilter/nf_hook_bpf.h
@@ -0,0 +1,14 @@
+struct bpf_dispatcher;
+struct bpf_prog;
+
+struct bpf_prog *nf_hook_bpf_create(const struct nf_hook_entries *n);
+struct bpf_prog *nf_hook_bpf_create_fb(void);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+void nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to);
+#else
+static inline void
+nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *f, struct bpf_prog *t)
+{
+}
+#endif
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 54395266339d..6eec1720ff3d 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -19,6 +19,16 @@ config NETFILTER_FAMILY_BRIDGE
 config NETFILTER_FAMILY_ARP
 	bool
 
+config HAVE_NF_HOOK_BPF
+	bool
+
+config NF_HOOK_BPF
+	bool "netfilter base hook bpf translator"
+	depends on BPF_JIT
+	help
+	  This replaces the nf_hook_slow interpreter loop with
+	  auto-generated, partially unrolled BPF programs.
+
 config NETFILTER_NETLINK_HOOK
 	tristate "Netfilter base hook dump support"
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index aab20e575ecd..13f1b95a7809 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -16,6 +16,7 @@ nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_GRE) += nf_conntrack_proto_gre.o
 
 obj-$(CONFIG_NETFILTER) = netfilter.o
+obj-$(CONFIG_NF_HOOK_BPF) += nf_hook_bpf.o
 
 obj-$(CONFIG_NETFILTER_NETLINK) += nfnetlink.o
 obj-$(CONFIG_NETFILTER_NETLINK_ACCT) += nfnetlink_acct.o
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index f4359179eba9..56d82822cab7 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <net/net_namespace.h>
 #include <net/netfilter/nf_queue.h>
+#include <net/netfilter/nf_hook_bpf.h>
 #include <net/sock.h>
 
 #include "nf_internals.h"
@@ -47,6 +48,12 @@ static DEFINE_MUTEX(nf_hook_mutex);
 #define nf_entry_dereference(e) \
 	rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
 
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+DEFINE_BPF_DISPATCHER(nf_hook_base);
+
+static struct bpf_prog *fallback_nf_hook_slow;
+#endif
+
 static struct nf_hook_entries *allocate_hook_entries_size(u16 num)
 {
 	struct nf_hook_entries *e;
@@ -58,9 +65,25 @@ static struct nf_hook_entries *allocate_hook_entries_size(u16 num)
 	if (num == 0)
 		return NULL;
 
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	if (!fallback_nf_hook_slow) {
+		/* never free'd */
+		fallback_nf_hook_slow = nf_hook_bpf_create_fb();
+
+		if (!fallback_nf_hook_slow)
+			return NULL;
+	}
+#endif
+
 	e = kvzalloc(alloc, GFP_KERNEL);
-	if (e)
-		e->num_hook_entries = num;
+	if (!e)
+		return NULL;
+
+	e->num_hook_entries = num;
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	e->hook_prog = fallback_nf_hook_slow;
+#endif
+
 	return e;
 }
 
@@ -104,6 +127,7 @@ nf_hook_entries_grow(const struct nf_hook_entries *old,
 {
 	unsigned int i, alloc_entries, nhooks, old_entries;
 	struct nf_hook_ops **orig_ops = NULL;
+	struct bpf_prog *hook_bpf_prog;
 	struct nf_hook_ops **new_ops;
 	struct nf_hook_entries *new;
 	bool inserted = false;
@@ -156,6 +180,27 @@ nf_hook_entries_grow(const struct nf_hook_entries *old,
 		new->hooks[nhooks].priv = reg->priv;
 	}
 
+	hook_bpf_prog = nf_hook_bpf_create(new);
+
+	/* XXX: jit failure handling?
+	 * We could refuse hook registration.
+	 *
+	 * For now, allocate_hook_entries_size() sets
+	 * ->hook_prog to a small fallback program that
+	 *  calls nf_hook_slow().
+	 */
+	if (hook_bpf_prog) {
+		struct bpf_prog *old_prog = NULL;
+
+		new->hook_prog = hook_bpf_prog;
+
+		if (old)
+			old_prog = old->hook_prog;
+
+		nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base),
+					old_prog, hook_bpf_prog);
+	}
+
 	return new;
 }
 
@@ -221,6 +266,7 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 					  struct nf_hook_entries __rcu **pp)
 {
 	unsigned int i, j, skip = 0, hook_entries;
+	struct bpf_prog *hook_bpf_prog = NULL;
 	struct nf_hook_entries *new = NULL;
 	struct nf_hook_ops **orig_ops;
 	struct nf_hook_ops **new_ops;
@@ -244,8 +290,15 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 
 	hook_entries -= skip;
 	new = allocate_hook_entries_size(hook_entries);
-	if (!new)
+	if (!new) {
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+		struct bpf_prog *old_prog = old->hook_prog;
+
+		WRITE_ONCE(old->hook_prog, fallback_nf_hook_slow);
+		nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base), old_prog, NULL);
+#endif
 		return NULL;
+	}
 
 	new_ops = nf_hook_entries_get_hook_ops(new);
 	for (i = 0, j = 0; i < old->num_hook_entries; i++) {
@@ -256,7 +309,16 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 		j++;
 	}
 	hooks_validate(new);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	/* if this fails fallback prog calls nf_hook_slow. */
+	hook_bpf_prog = nf_hook_bpf_create(new);
+	if (hook_bpf_prog)
+		new->hook_prog = hook_bpf_prog;
+#endif
 out_assign:
+	nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base),
+				old ? old->hook_prog : NULL, hook_bpf_prog);
 	rcu_assign_pointer(*pp, new);
 	return old;
 }
@@ -584,6 +646,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
 	int ret;
 
 	state->skb = skb;
+
 	for (; s < e->num_hook_entries; s++) {
 		verdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);
 		switch (verdict & NF_VERDICT_MASK) {
@@ -764,6 +827,11 @@ int __init netfilter_init(void)
 	if (ret < 0)
 		goto err_pernet;
 
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	fallback_nf_hook_slow = nf_hook_bpf_create_fb();
+	WARN_ON_ONCE(!fallback_nf_hook_slow);
+#endif
+
 	return 0;
 err_pernet:
 	unregister_pernet_subsys(&netfilter_net_ops);
diff --git a/net/netfilter/nf_hook_bpf.c b/net/netfilter/nf_hook_bpf.c
new file mode 100644
index 000000000000..cd8aba6da53b
--- /dev/null
+++ b/net/netfilter/nf_hook_bpf.c
@@ -0,0 +1,425 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/string.h>
+#include <linux/hashtable.h>
+#include <linux/jhash.h>
+#include <linux/netfilter.h>
+
+#include <net/netfilter/nf_queue.h>
+
+/* BPF translator for netfilter hooks.
+ *
+ * Copyright (c) 2021 Red Hat GmbH
+ *
+ * Author: Florian Westphal <fw@strlen.de>
+ *
+ * Unroll nf_hook_slow interpreter loop into an equivalent bpf
+ * program that can be called *instead* of nf_hook_slow().
+ * This program thus has same return value as nf_hook_slow and
+ * handles nfqueue and packet drops internally.
+ *
+ * These bpf programs are called/run from nf_hook() inline function.
+ *
+ * Register usage is:
+ *
+ * BPF_REG_0: verdict.
+ * BPF_REG_1: struct nf_hook_state *
+ * BPF_REG_2: reserved as arg to nf_queue()
+ * BPF_REG_3: reserved as arg to nf_queue()
+ *
+ * Prologue storage:
+ * BPF_REG_6: copy of REG_1 (original struct nf_hook_state *)
+ * BPF_REG_7: copy of original state->priv value
+ * BPF_REG_8: hook_index.  Inited to 0, increments on each hook call.
+ */
+
+#define JMP_INVALID 0
+#define JIT_SIZE_MAX 0xffff
+
+struct nf_hook_prog {
+	struct bpf_insn *insns;
+	unsigned int pos;
+};
+
+static bool emit(struct nf_hook_prog *p, struct bpf_insn insn)
+{
+	if (WARN_ON_ONCE(p->pos >= BPF_MAXINSNS))
+		return false;
+
+	p->insns[p->pos] = insn;
+	p->pos++;
+	return true;
+}
+
+static bool xlate_one_hook(struct nf_hook_prog *p,
+			   const struct nf_hook_entries *e,
+			   const struct nf_hook_entry *h)
+{
+	int width = bytes_to_bpf_size(sizeof(h->priv));
+
+	/* if priv is NULL, the called hookfn does not use the priv member. */
+	if (!h->priv)
+		goto emit_hook_call;
+
+	if (WARN_ON_ONCE(width < 0))
+		return false;
+
+	/* x = entries[s]->priv; */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_2, BPF_REG_7,
+				 (unsigned long)&h->priv - (unsigned long)e)))
+		return false;
+
+	/* state->priv = x */
+	if (!emit(p, BPF_STX_MEM(width, BPF_REG_6, BPF_REG_2,
+				 offsetof(struct nf_hook_state, priv))))
+		return false;
+
+emit_hook_call:
+	if (!emit(p, BPF_EMIT_CALL(h->hook)))
+		return false;
+
+	/* Only advance to next hook on ACCEPT verdict.
+	 * Else, skip rest and move to tail.
+	 *
+	 * Postprocessing patches the jump offset to the
+	 * correct position, after last hook.
+	 */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, NF_ACCEPT, JMP_INVALID)))
+		return false;
+
+	return true;
+}
+
+static bool emit_mov_ptr_reg(struct nf_hook_prog *p, u8 dreg, u8 sreg)
+{
+	if (sizeof(void *) == sizeof(u64))
+		return emit(p, BPF_MOV64_REG(dreg, sreg));
+	if (sizeof(void *) == sizeof(u32))
+		return emit(p, BPF_MOV32_REG(dreg, sreg));
+
+	return false;
+}
+
+static bool do_prologue(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	if (WARN_ON_ONCE(width < 0))
+		return false;
+
+	/* argument to program is a pointer to struct nf_hook_state, in BPF_REG_1. */
+	if (!emit_mov_ptr_reg(p, BPF_REG_6, BPF_REG_1))
+		return false;
+
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_7, BPF_REG_1,
+				 offsetof(struct nf_hook_state, priv))))
+		return false;
+
+	/* Could load state->hook_index here, but we don't support index > 0 for bpf call. */
+	if (!emit(p, BPF_MOV32_IMM(BPF_REG_8, 0)))
+		return false;
+
+	return true;
+}
+
+static void patch_hook_jumps(struct nf_hook_prog *p)
+{
+	unsigned int i;
+
+	if (!p->insns)
+		return;
+
+	for (i = 0; i < p->pos; i++) {
+		if (BPF_CLASS(p->insns[i].code) != BPF_JMP)
+			continue;
+
+		if (p->insns[i].code == (BPF_EXIT | BPF_JMP))
+			continue;
+		if (p->insns[i].code == (BPF_CALL | BPF_JMP))
+			continue;
+
+		if (p->insns[i].off != JMP_INVALID)
+			continue;
+		p->insns[i].off = p->pos - i - 1;
+	}
+}
+
+static bool emit_retval(struct nf_hook_prog *p, int retval)
+{
+	if (!emit(p, BPF_MOV32_IMM(BPF_REG_0, retval)))
+		return false;
+
+	return emit(p, BPF_EXIT_INSN());
+}
+
+static bool emit_nf_hook_slow(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	/* restore the original state->priv. */
+	if (!emit(p, BPF_STX_MEM(width, BPF_REG_6, BPF_REG_7,
+				 offsetof(struct nf_hook_state, priv))))
+		return false;
+
+	/* arg1 is state->skb */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_1, BPF_REG_6,
+				 offsetof(struct nf_hook_state, skb))))
+		return false;
+
+	/* arg2 is "struct nf_hook_state *" */
+	if (!emit(p, BPF_MOV64_REG(BPF_REG_2, BPF_REG_6)))
+		return false;
+
+	/* arg3 is nf_hook_entries (original state->priv) */
+	if (!emit(p, BPF_MOV64_REG(BPF_REG_3, BPF_REG_7)))
+		return false;
+
+	if (!emit(p, BPF_EMIT_CALL(nf_hook_slow)))
+		return false;
+
+	/* No further action needed, return retval provided by nf_hook_slow */
+	return emit(p, BPF_EXIT_INSN());
+}
+
+static bool emit_nf_queue(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	if (width < 0) {
+		WARN_ON_ONCE(1);
+		return false;
+	}
+
+	/* int nf_queue(struct sk_buff *skb, struct nf_hook_state *state, unsigned int verdict) */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_1, BPF_REG_6, offsetof(struct nf_hook_state, skb))))
+		return false;
+	if (!emit(p, BPF_STX_MEM(BPF_H, BPF_REG_6, BPF_REG_8,
+				 offsetof(struct nf_hook_state, hook_index))))
+		return false;
+	/* arg2: struct nf_hook_state * */
+	if (!emit(p, BPF_MOV64_REG(BPF_REG_2, BPF_REG_6)))
+		return false;
+	/* arg3: original hook return value: (NUM << NF_VERDICT_QBITS | NF_QUEUE) */
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_3, BPF_REG_0)))
+		return false;
+	if (!emit(p, BPF_EMIT_CALL(nf_queue)))
+		return false;
+
+	/* Check nf_queue return value.  Abnormal case: nf_queue returned != 0.
+	 *
+	 * Fall back to nf_hook_slow().
+	 */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2)))
+		return false;
+
+	/* Normal case: skb was stolen. Return 0. */
+	return emit_retval(p, 0);
+}
+
+static bool do_epilogue_base_hooks(struct nf_hook_prog *p)
+{
+	int width = bytes_to_bpf_size(sizeof(void *));
+
+	if (WARN_ON_ONCE(width < 0))
+		return false;
+
+	/* last 'hook'. We arrive here if previous hook returned ACCEPT,
+	 * i.e. all hooks passed -- we are done.
+	 *
+	 * Return 1, skb can continue traversing network stack.
+	 */
+	if (!emit_retval(p, 1))
+		return false;
+
+	/* Patch all hook jumps, in case any of these are taken
+	 * we need to jump to this location.
+	 *
+	 * This happens when verdict is != ACCEPT.
+	 */
+	patch_hook_jumps(p);
+
+	/* need to ignore upper 24 bits, might contain errno or queue number */
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_3, BPF_REG_0)))
+		return false;
+	if (!emit(p, BPF_ALU32_IMM(BPF_AND, BPF_REG_3, 0xff)))
+		return false;
+
+	/* ACCEPT handled, check STOLEN. */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_3, NF_STOLEN, 2)))
+		return false;
+
+	if (!emit_retval(p, 0))
+		return false;
+
+	/* ACCEPT and STOLEN handled.  Check DROP next */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_3, NF_DROP, 1 + 2 + 2 + 2 + 2)))
+		return false;
+
+	/* First step. Extract the errno number. 1 insn. */
+	if (!emit(p, BPF_ALU32_IMM(BPF_RSH, BPF_REG_0, NF_VERDICT_QBITS)))
+		return false;
+
+	/* Second step: replace errno with EPERM if it was 0. 2 insns. */
+	if (!emit(p, BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1)))
+		return false;
+	if (!emit(p, BPF_MOV32_IMM(BPF_REG_0, EPERM)))
+		return false;
+
+	/* Third step: negate reg0 (caller expects -EFOO) and stash the result.  2 insns. */
+	if (!emit(p, BPF_ALU32_IMM(BPF_NEG, BPF_REG_0, 0)))
+		return false;
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_8, BPF_REG_0)))
+		return false;
+
+	/* Fourth step: free the skb. 2 insns. */
+	if (!emit(p, BPF_LDX_MEM(width, BPF_REG_1, BPF_REG_6, offsetof(struct nf_hook_state, skb))))
+		return false;
+	if (!emit(p, BPF_EMIT_CALL(kfree_skb)))
+		return false;
+
+	/* Last step: return. 2 insns. */
+	if (!emit(p, BPF_MOV32_REG(BPF_REG_0, BPF_REG_8)))
+		return false;
+	if (!emit(p, BPF_EXIT_INSN()))
+		return false;
+
+	/* ACCEPT, STOLEN and DROP have been handled.
+	 * REPEAT and STOP are not allowed anymore for individual hook functions.
+	 * This leaves NF_QUEUE as the only remaining return value.
+	 *
+	 * In this case BPF_REG_0 still contains the original verdict of
+	 * '(NUM << NF_VERDICT_QBITS | NF_QUEUE)', so pass it to nf_queue() as-is.
+	 */
+	if (!emit_nf_queue(p))
+		return false;
+
+	/* Increment hook index and store it in nf_hook_state so nf_hook_slow will
+	 * start at the next hook, if any.
+	 */
+	if (!emit(p, BPF_ALU32_IMM(BPF_ADD, BPF_REG_8, 1)))
+		return false;
+	if (!emit(p, BPF_STX_MEM(BPF_H, BPF_REG_6, BPF_REG_8,
+				 offsetof(struct nf_hook_state, hook_index))))
+		return false;
+
+	return emit_nf_hook_slow(p);
+}
+
+static int nf_hook_prog_init(struct nf_hook_prog *p)
+{
+	memset(p, 0, sizeof(*p));
+
+	p->insns = kcalloc(BPF_MAXINSNS, sizeof(*p->insns), GFP_KERNEL);
+	if (!p->insns)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void nf_hook_prog_free(struct nf_hook_prog *p)
+{
+	kfree(p->insns);
+}
+
+static int xlate_base_hooks(struct nf_hook_prog *p, const struct nf_hook_entries *e)
+{
+	unsigned int i, len;
+
+	len = e->num_hook_entries;
+
+	if (!do_prologue(p))
+		goto out;
+
+	for (i = 0; i < len; i++) {
+		if (!xlate_one_hook(p, e, &e->hooks[i]))
+			goto out;
+
+		if (i + 1 < len) {
+			if (!emit(p, BPF_MOV64_REG(BPF_REG_1, BPF_REG_6)))
+				goto out;
+
+			if (!emit(p, BPF_ALU32_IMM(BPF_ADD, BPF_REG_8, 1)))
+				goto out;
+		}
+	}
+
+	if (!do_epilogue_base_hooks(p))
+		goto out;
+
+	return 0;
+out:
+	return -EINVAL;
+}
+
+static struct bpf_prog *nf_hook_jit_compile(struct bpf_insn *insns, unsigned int len)
+{
+	struct bpf_prog *prog;
+	int err = 0;
+
+	prog = bpf_prog_alloc(bpf_prog_size(len), 0);
+	if (!prog)
+		return NULL;
+
+	prog->len = len;
+	prog->type = BPF_PROG_TYPE_SOCKET_FILTER;
+	memcpy(prog->insnsi, insns, prog->len * sizeof(struct bpf_insn));
+
+	prog = bpf_prog_select_runtime(prog, &err);
+	if (err) {
+		bpf_prog_free(prog);
+		return NULL;
+	}
+
+	return prog;
+}
+
+/* fallback program, invokes nf_hook_slow interpreter.
+ *
+ * Used when a hook is unregistered and a new program cannot
+ * be compiled for some reason.
+ */
+struct bpf_prog *nf_hook_bpf_create_fb(void)
+{
+	struct bpf_prog *prog = NULL;
+	struct nf_hook_prog p;
+	int err;
+
+	err = nf_hook_prog_init(&p);
+	if (err)
+		return NULL;
+
+	if (!do_prologue(&p))
+		goto err;
+
+	if (!emit_nf_hook_slow(&p))
+		goto err;
+
+	prog = nf_hook_jit_compile(p.insns, p.pos);
+err:
+	nf_hook_prog_free(&p);
+	return prog;
+}
+
+struct bpf_prog *nf_hook_bpf_create(const struct nf_hook_entries *new)
+{
+	struct bpf_prog *prog;
+	struct nf_hook_prog p;
+	int err;
+
+	err = nf_hook_prog_init(&p);
+	if (err)
+		return NULL;
+
+	err = xlate_base_hooks(&p, new);
+	if (err)
+		goto err;
+
+	prog = nf_hook_jit_compile(p.insns, p.pos);
+err:
+	nf_hook_prog_free(&p);
+	return prog;
+}
+
+void nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to)
+{
+	bpf_dispatcher_change_prog(d, from, to);
+}
-- 
2.32.0



* [PATCH RFC nf-next 7/9] netfilter: core: do not rebuild bpf program on dying netns
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
                   ` (6 preceding siblings ...)
  2021-10-14 12:10 ` [PATCH RFC nf-next 6/9] netfilter: add bpf base hook program generator Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 8/9] netfilter: ingress: switch to invocation via bpf Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 9/9] netfilter: hook_jit: add prog cache Florian Westphal
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

We can save a few cycles on netns destruction.
When a hook is removed from a dying netns, skip building a new
program for the remaining hooks: those will be removed too
in the immediate future.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/core.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 56d82822cab7..9a19d4f1673b 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -251,6 +251,7 @@ EXPORT_SYMBOL_GPL(nf_hook_entries_insert_raw);
  *
  * @old -- current hook blob at @pp
  * @pp -- location of hook blob
+ * @recompile -- false if bpf prog should not be replaced
  *
  * Hook unregistration must always succeed, so to-be-removed hooks
  * are replaced by a dummy one that will just move to next hook.
@@ -263,7 +264,8 @@ EXPORT_SYMBOL_GPL(nf_hook_entries_insert_raw);
  * Returns address to free, or NULL.
  */
 static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
-					  struct nf_hook_entries __rcu **pp)
+					  struct nf_hook_entries __rcu **pp,
+					  bool recompile)
 {
 	unsigned int i, j, skip = 0, hook_entries;
 	struct bpf_prog *hook_bpf_prog = NULL;
@@ -311,10 +313,12 @@ static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
 	hooks_validate(new);
 
 #if IS_ENABLED(CONFIG_NF_HOOK_BPF)
-	/* if this fails fallback prog calls nf_hook_slow. */
-	hook_bpf_prog = nf_hook_bpf_create(new);
-	if (hook_bpf_prog)
-		new->hook_prog = hook_bpf_prog;
+	if (recompile) {
+		/* if this fails fallback prog calls nf_hook_slow. */
+		hook_bpf_prog = nf_hook_bpf_create(new);
+		if (hook_bpf_prog)
+			new->hook_prog = hook_bpf_prog;
+	}
 #endif
 out_assign:
 	nf_hook_bpf_change_prog(BPF_DISPATCHER_PTR(nf_hook_base),
@@ -540,7 +544,7 @@ static void __nf_unregister_net_hook(struct net *net, int pf,
 		WARN_ONCE(1, "hook not found, pf %d num %d", pf, reg->hooknum);
 	}
 
-	p = __nf_hook_entries_try_shrink(p, pp);
+	p = __nf_hook_entries_try_shrink(p, pp, check_net(net));
 	mutex_unlock(&nf_hook_mutex);
 	if (!p)
 		return;
@@ -571,7 +575,7 @@ void nf_hook_entries_delete_raw(struct nf_hook_entries __rcu **pp,
 
 	p = rcu_dereference_raw(*pp);
 	if (nf_remove_net_hook(p, reg)) {
-		p = __nf_hook_entries_try_shrink(p, pp);
+		p = __nf_hook_entries_try_shrink(p, pp, false);
 		nf_hook_entries_free(p);
 	}
 }
-- 
2.32.0



* [PATCH RFC nf-next 8/9] netfilter: ingress: switch to invocation via bpf
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
                   ` (7 preceding siblings ...)
  2021-10-14 12:10 ` [PATCH RFC nf-next 7/9] netfilter: core: do not rebuild bpf program on dying netns Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  2021-10-14 12:10 ` [PATCH RFC nf-next 9/9] netfilter: hook_jit: add prog cache Florian Westphal
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter_ingress.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h
index c95f84a5badc..20e0b1c2c706 100644
--- a/include/linux/netfilter_ingress.h
+++ b/include/linux/netfilter_ingress.h
@@ -19,6 +19,9 @@ static inline bool nf_hook_ingress_active(const struct sk_buff *skb)
 static inline int nf_hook_ingress(struct sk_buff *skb)
 {
 	struct nf_hook_entries *e = rcu_dereference(skb->dev->nf_hooks_ingress);
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	const struct bpf_prog *prog;
+#endif
 	struct nf_hook_state state;
 	int ret;
 
@@ -31,7 +34,19 @@ static inline int nf_hook_ingress(struct sk_buff *skb)
 	nf_hook_state_init(&state, NF_NETDEV_INGRESS,
 			   NFPROTO_NETDEV, skb->dev, NULL, NULL,
 			   dev_net(skb->dev), NULL);
+
+#if IS_ENABLED(CONFIG_NF_HOOK_BPF)
+	prog = READ_ONCE(e->hook_prog);
+
+	state.priv = (void *)e;
+	state.skb = skb;
+
+	migrate_disable();
+	ret = bpf_prog_run_nf(prog, &state);
+	migrate_enable();
+#else
 	ret = nf_hook_slow(skb, &state, e);
+#endif
 	if (ret == 0)
 		return -1;
 
-- 
2.32.0



* [PATCH RFC nf-next 9/9] netfilter: hook_jit: add prog cache
  2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
                   ` (8 preceding siblings ...)
  2021-10-14 12:10 ` [PATCH RFC nf-next 8/9] netfilter: ingress: switch to invocation via bpf Florian Westphal
@ 2021-10-14 12:10 ` Florian Westphal
  9 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2021-10-14 12:10 UTC (permalink / raw)
  To: netfilter-devel; +Cc: bpf, netdev, me, Florian Westphal

This allows re-using the same program.  For example, an nft
ruleset that attaches filter basechains to input, forward and output
would use the same program for all three hook points.

The cache is intentionally netns agnostic, so the same configuration
in different netns will share the same programs.
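
The sharing rule can be illustrated with a small userspace model: a
program is identified only by its ordered list of hook function
addresses, and users of the same list share one refcounted entry.  This
is a sketch under simplifying assumptions.  It uses a linear scan over a
fixed array where the patch uses jhash-keyed hashtables, and an integer
id stands in for the bpf program pointer.  All model_* names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_HOOKS		8
#define MODEL_CACHE_SLOTS	4

/* One cached program, identified by its ordered list of hook addresses. */
struct model_cached_prog {
	uintptr_t hooks[MODEL_MAX_HOOKS];
	size_t hook_count;
	unsigned int refcnt;	/* 0 means the slot is free */
	int prog_id;		/* stand-in for struct bpf_prog * */
};

static struct model_cached_prog model_cache[MODEL_CACHE_SLOTS];
static int model_next_prog_id = 1;

/* Look up a program for this hook sequence and "compile" one on a miss.
 * Returns the program id, or -1 if the sequence cannot be cached.
 */
static int model_get_prog(const uintptr_t *hooks, size_t n)
{
	struct model_cached_prog *free_slot = NULL;
	size_t i, j;

	if (n == 0 || n > MODEL_MAX_HOOKS)
		return -1;

	for (i = 0; i < MODEL_CACHE_SLOTS; i++) {
		struct model_cached_prog *pc = &model_cache[i];

		if (pc->refcnt == 0) {
			if (!free_slot)
				free_slot = pc;
			continue;
		}
		if (pc->hook_count != n)
			continue;
		for (j = 0; j < n; j++) {
			if (pc->hooks[j] != hooks[j])
				break;
		}
		if (j == n) {
			pc->refcnt++;	/* cache hit: share the program */
			return pc->prog_id;
		}
	}

	if (!free_slot)
		return -1;

	for (j = 0; j < n; j++)
		free_slot->hooks[j] = hooks[j];
	free_slot->hook_count = n;
	free_slot->refcnt = 1;
	free_slot->prog_id = model_next_prog_id++;
	return free_slot->prog_id;
}

/* Drop one reference; the slot becomes reusable when the count hits 0. */
static void model_put_prog(int prog_id)
{
	size_t i;

	for (i = 0; i < MODEL_CACHE_SLOTS; i++) {
		if (model_cache[i].refcnt && model_cache[i].prog_id == prog_id) {
			model_cache[i].refcnt--;
			return;
		}
	}
}
```

Two hook points with the same functions in the same order get the same
id.  A different order counts as a different program, because the
unrolled call sequence differs.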

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_hook_bpf.c | 144 ++++++++++++++++++++++++++++++++++++
 1 file changed, 144 insertions(+)

diff --git a/net/netfilter/nf_hook_bpf.c b/net/netfilter/nf_hook_bpf.c
index cd8aba6da53b..00ac3e896f25 100644
--- a/net/netfilter/nf_hook_bpf.c
+++ b/net/netfilter/nf_hook_bpf.c
@@ -40,6 +40,24 @@ struct nf_hook_prog {
 	unsigned int pos;
 };
 
+struct nf_hook_bpf_prog {
+	struct rcu_head rcu_head;
+
+	struct hlist_node node_key;
+	struct hlist_node node_prog;
+	u32 key;
+	u16 hook_count;
+	refcount_t refcnt;
+	struct bpf_prog	*prog;
+	unsigned long hooks[64];
+};
+
+#define NF_BPF_PROG_HT_BITS	8
+
+/* users need to hold nf_hook_mutex */
+static DEFINE_HASHTABLE(nf_bpf_progs_ht_key, NF_BPF_PROG_HT_BITS);
+static DEFINE_HASHTABLE(nf_bpf_progs_ht_prog, NF_BPF_PROG_HT_BITS);
+
 static bool emit(struct nf_hook_prog *p, struct bpf_insn insn)
 {
 	if (WARN_ON_ONCE(p->pos >= BPF_MAXINSNS))
@@ -399,12 +417,106 @@ struct bpf_prog *nf_hook_bpf_create_fb(void)
 	return prog;
 }
 
+static u32 nf_hook_entries_hash(const struct nf_hook_entries *new)
+{
+	int i, hook_count = new->num_hook_entries;
+	u32 a, b, c;
+
+	a = b = c = JHASH_INITVAL + hook_count;
+	i = 0;
+	while (hook_count > 3) {
+		a += hash32_ptr(new->hooks[i+0].hook);
+		b += hash32_ptr(new->hooks[i+1].hook);
+		c += hash32_ptr(new->hooks[i+2].hook);
+		__jhash_mix(a, b, c);
+		hook_count -= 3;
+		i += 3;
+	}
+
+	switch (hook_count) {
+	case 3: c += hash32_ptr(new->hooks[i+2].hook); fallthrough;
+	case 2: b += hash32_ptr(new->hooks[i+1].hook); fallthrough;
+	case 1: a += hash32_ptr(new->hooks[i+0].hook);
+		__jhash_final(a, b, c);
+		break;
+	}
+
+	return c;
+}
+
+static struct bpf_prog *nf_hook_bpf_find_prog_by_key(const struct nf_hook_entries *new, u32 key)
+{
+	int i, hook_count = new->num_hook_entries;
+	struct nf_hook_bpf_prog *pc;
+
+	hash_for_each_possible(nf_bpf_progs_ht_key, pc, node_key, key) {
+		if (pc->hook_count != hook_count ||
+		    pc->key != key)
+			continue;
+
+		for (i = 0; i < hook_count; i++) {
+			if (pc->hooks[i] != (unsigned long)new->hooks[i].hook)
+				break;
+		}
+
+		if (i == hook_count) {
+			refcount_inc(&pc->refcnt);
+			return pc->prog;
+		}
+	}
+
+	return NULL;
+}
+
+static struct nf_hook_bpf_prog *nf_hook_bpf_find_prog(const struct bpf_prog *p)
+{
+	struct nf_hook_bpf_prog *pc;
+
+	hash_for_each_possible(nf_bpf_progs_ht_prog, pc, node_prog, (unsigned long)p) {
+		if (pc->prog == p)
+			return pc;
+	}
+
+	return NULL;
+}
+
+static void nf_hook_bpf_prog_store(const struct nf_hook_entries *new, struct bpf_prog *prog, u32 key)
+{
+	unsigned int i, hook_count = new->num_hook_entries;
+	struct nf_hook_bpf_prog *alloc;
+
+	if (hook_count >= ARRAY_SIZE(alloc->hooks))
+		return;
+
+	alloc = kzalloc(sizeof(*alloc), GFP_KERNEL);
+	if (!alloc)
+		return;
+
+	alloc->hook_count = new->num_hook_entries;
+	alloc->prog = prog;
+	alloc->key = key;
+
+	for (i = 0; i < hook_count; i++)
+		alloc->hooks[i] = (unsigned long)new->hooks[i].hook;
+
+	hash_add(nf_bpf_progs_ht_key, &alloc->node_key, key);
+	hash_add(nf_bpf_progs_ht_prog, &alloc->node_prog, (unsigned long)prog);
+	refcount_set(&alloc->refcnt, 1);
+
+	bpf_prog_inc(prog);
+}
+
 struct bpf_prog *nf_hook_bpf_create(const struct nf_hook_entries *new)
 {
+	u32 key = nf_hook_entries_hash(new);
 	struct bpf_prog *prog;
 	struct nf_hook_prog p;
 	int err;
 
+	prog = nf_hook_bpf_find_prog_by_key(new, key);
+	if (prog)
+		return prog;
+
 	err = nf_hook_prog_init(&p);
 	if (err)
 		return NULL;
@@ -414,12 +526,44 @@ struct bpf_prog *nf_hook_bpf_create(const struct nf_hook_entries *new)
 		goto err;
 
 	prog = nf_hook_jit_compile(p.insns, p.pos);
+	if (prog)
+		nf_hook_bpf_prog_store(new, prog, key);
 err:
 	nf_hook_prog_free(&p);
 	return prog;
 }
 
+static void __nf_hook_free_prog(struct rcu_head *head)
+{
+	struct nf_hook_bpf_prog *old = container_of(head, struct nf_hook_bpf_prog, rcu_head);
+
+	bpf_prog_put(old->prog);
+	kfree(old);
+}
+
+static void nf_hook_free_prog(struct nf_hook_bpf_prog *old)
+{
+	call_rcu(&old->rcu_head, __nf_hook_free_prog);
+}
+
 void nf_hook_bpf_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to)
 {
+	if (from == to)
+		return;
+
+	if (from) {
+		struct nf_hook_bpf_prog *old;
+
+		old = nf_hook_bpf_find_prog(from);
+		if (old) {
+			WARN_ON_ONCE(from != old->prog);
+			if (refcount_dec_and_test(&old->refcnt)) {
+				hash_del(&old->node_key);
+				hash_del(&old->node_prog);
+				nf_hook_free_prog(old);
+			}
+		}
+	}
+
 	bpf_dispatcher_change_prog(d, from, to);
 }
-- 
2.32.0
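
For readers following the cache logic in this patch, here is a minimal
userspace sketch of the lookup/store/release lifecycle. It is not the kernel
code: the fixed-size table, the FNV-1a hash (a stand-in for
nf_hook_entries_hash()) and every name below (cache_hash, cache_find,
cache_store, cache_put, prog_id) are assumptions made for illustration.
A plain counter stands in for refcount_t, and the RCU-deferred free is
omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HOOKS   32	/* assumption: matches the 32-hook limit of patch 5/9 */
#define MAX_ENTRIES 16	/* hypothetical flat table instead of a kernel hashtable */

struct prog_cache_entry {
	uintptr_t hooks[MAX_HOOKS];	/* hook function addresses, in order */
	unsigned int hook_count;
	uint32_t key;			/* hash over the hook addresses */
	unsigned int refcnt;		/* stand-in for refcount_t */
	int prog_id;			/* stand-in for struct bpf_prog * */
	int in_use;
};

static struct prog_cache_entry cache[MAX_ENTRIES];

/* Hypothetical FNV-1a hash over the address bytes. */
static uint32_t cache_hash(const uintptr_t *hooks, unsigned int n)
{
	const unsigned char *p = (const unsigned char *)hooks;
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < n * sizeof(*hooks); i++)
		h = (h ^ p[i]) * 16777619u;
	return h;
}

/* A hit requires equal count, equal key and identical hook addresses.
 * On a hit the entry gains one reference.
 */
static struct prog_cache_entry *cache_find(const uintptr_t *hooks,
					   unsigned int n, uint32_t key)
{
	unsigned int i, j;

	for (i = 0; i < MAX_ENTRIES; i++) {
		struct prog_cache_entry *e = &cache[i];

		if (!e->in_use || e->hook_count != n || e->key != key)
			continue;
		for (j = 0; j < n; j++)
			if (e->hooks[j] != hooks[j])
				break;
		if (j == n) {
			e->refcnt++;
			return e;
		}
	}
	return NULL;
}

/* Store a freshly built program; the caller holds the first reference. */
static struct prog_cache_entry *cache_store(const uintptr_t *hooks,
					    unsigned int n, uint32_t key,
					    int prog_id)
{
	unsigned int i;

	if (n >= MAX_HOOKS)	/* same bound check as the patch */
		return NULL;
	for (i = 0; i < MAX_ENTRIES; i++) {
		struct prog_cache_entry *e = &cache[i];

		if (e->in_use)
			continue;
		memset(e, 0, sizeof(*e));
		memcpy(e->hooks, hooks, n * sizeof(*hooks));
		e->hook_count = n;
		e->key = key;
		e->prog_id = prog_id;
		e->refcnt = 1;
		e->in_use = 1;
		return e;
	}
	return NULL;		/* table full: the program is simply not cached */
}

/* Drop one reference; returns 1 when the entry left the cache. */
static int cache_put(struct prog_cache_entry *e)
{
	if (--e->refcnt)
		return 0;
	e->in_use = 0;		/* the kernel code defers the free via call_rcu() */
	return 1;
}
```

The full address comparison after the key match is what makes a hash
collision harmless: two hook lists that share a key but differ in any
function address never share a program.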



end of thread, other threads:[~2021-10-14 12:11 UTC | newest]

Thread overview: 11+ messages
2021-10-14 12:10 [PATCH RFC nf-next 0/9] netfilter: bpf base hook program generator Florian Westphal
2021-10-14 12:10 ` [PATCH 1/1] netfilter: add " Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 1/9] netfilter: nf_queue: carry index in hook state Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 2/9] netfilter: nat: split nat hook iteration into a helper Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 3/9] netfilter: remove hook index from nf_hook_slow arguments Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 4/9] netfilter: make hook functions accept only one argument Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 5/9] netfilter: reduce allowed hook count to 32 Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 6/9] netfilter: add bpf base hook program generator Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 7/9] netfilter: core: do not rebuild bpf program on dying netns Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 8/9] netfilter: ingress: switch to invocation via bpf Florian Westphal
2021-10-14 12:10 ` [PATCH RFC nf-next 9/9] netfilter: hook_jit: add prog cache Florian Westphal
