netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/9] Netfilter updates for net-next
@ 2023-01-18 12:31 Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets Florian Westphal
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:31 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

Hello,

following patch set includes netfilter updates for your *net-next* tree.

1. Replace pr_debug use with nf_log infra for debugging in sctp
   conntrack.
2. Remove pr_debug calls, they are either useless or we have better
   options in place.
3. Avoid repeated load of ct->status in some spots.
   Some bit-flags cannot change during the lifeetime of
   a connection, so no need to re-fetch those.
4. Avoid uneeded nesting of rcu_read_lock during tuple lookup.
5. Remove the CLUSTERIP target.  Marked as obsolete for years,
   and we still have WARN splats wrt. races of the out-of-band
   /proc interface installed by this target.
6. Add static key to nf_tables to avoid the retpoline mitigation
   if/else if cascade provided the cpu doesn't need the retpoline thunk.
7. add nf_tables objref calls to the retpoline mitigation workaround.
8. Split parts of nft_ct.c that do not need symbols exported by
   the conntrack modules and place them in nf_tables directly.
   This allows to avoid indirect call for 'ct status' checks.
9. Add 'destroy' commands to nf_tables.  They are identical
   to the existing 'delete' commands, but do not indicate
   an error if the referenced object (set, chain, rule...)
   did not exist, from Fernando.

The following changes since commit c4791b3196bf46367bcf6cc56a09b32e037c4f49:

  Merge branch 'net-mdio-continue-separating-c22-and-c45' (2023-01-17 19:34:10 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git

for you to fetch changes up to f80a612dd77c4585171e44a06b490466bdeec1ae:

  netfilter: nf_tables: add support to destroy operation (2023-01-18 13:09:00 +0100)

----------------------------------------------------------------
Fernando Fernandez Mancera (1):
      netfilter: nf_tables: add support to destroy operation

Florian Westphal (8):
      netfilter: conntrack: sctp: use nf log infrastructure for invalid packets
      netfilter: conntrack: remove pr_debug calls
      netfilter: conntrack: avoid reload of ct->status
      netfilter: conntrack: move rcu read lock to nf_conntrack_find_get
      netfilter: ip_tables: remove clusterip target
      netfilter: nf_tables: add static key to skip retpoline workarounds
      netfilter: nf_tables: avoid retpoline overhead for objref calls
      netfilter: nf_tables: avoid retpoline overhead for some ct expression calls

 include/net/netfilter/nf_tables_core.h   |  16 +
 include/uapi/linux/netfilter/nf_tables.h |  14 +
 net/ipv4/netfilter/Kconfig               |  14 -
 net/ipv4/netfilter/Makefile              |   1 -
 net/ipv4/netfilter/ipt_CLUSTERIP.c       | 929 -------------------------------
 net/netfilter/Makefile                   |   6 +
 net/netfilter/nf_conntrack_core.c        |  46 +-
 net/netfilter/nf_conntrack_proto.c       |  20 +-
 net/netfilter/nf_conntrack_proto_sctp.c  |  46 +-
 net/netfilter/nf_conntrack_proto_tcp.c   |   9 -
 net/netfilter/nf_conntrack_proto_udp.c   |  10 +-
 net/netfilter/nf_tables_api.c            | 111 +++-
 net/netfilter/nf_tables_core.c           |  35 +-
 net/netfilter/nft_ct.c                   |  39 +-
 net/netfilter/nft_ct_fast.c              |  56 ++
 net/netfilter/nft_objref.c               |  12 +-
 16 files changed, 302 insertions(+), 1062 deletions(-)
 delete mode 100644 net/ipv4/netfilter/ipt_CLUSTERIP.c
 create mode 100644 net/netfilter/nft_ct_fast.c
-- 
2.38.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 13:30   ` patchwork-bot+netdevbpf
  2023-01-18 12:32 ` [PATCH net-next 2/9] netfilter: conntrack: remove pr_debug calls Florian Westphal
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

The conntrack logging facilities include useful info such as in/out
interface names and packet headers.

Use those in more places instead of pr_debug calls.
Furthermore, several pr_debug calls can be removed, they are useless
on production machines due to the sheer volume of log messages.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_proto_sctp.c | 46 ++++++++-----------------
 1 file changed, 15 insertions(+), 31 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index d88b92a8ffca..dbdfcc6cd2aa 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -168,7 +168,8 @@ for ((offset) = (dataoff) + sizeof(struct sctphdr), (count) = 0;	\
 static int do_basic_checks(struct nf_conn *ct,
 			   const struct sk_buff *skb,
 			   unsigned int dataoff,
-			   unsigned long *map)
+			   unsigned long *map,
+			   const struct nf_hook_state *state)
 {
 	u_int32_t offset, count;
 	struct sctp_chunkhdr _sch, *sch;
@@ -177,8 +178,6 @@ static int do_basic_checks(struct nf_conn *ct,
 	flag = 0;
 
 	for_each_sctp_chunk (skb, sch, _sch, offset, dataoff, count) {
-		pr_debug("Chunk Num: %d  Type: %d\n", count, sch->type);
-
 		if (sch->type == SCTP_CID_INIT ||
 		    sch->type == SCTP_CID_INIT_ACK ||
 		    sch->type == SCTP_CID_SHUTDOWN_COMPLETE)
@@ -193,7 +192,9 @@ static int do_basic_checks(struct nf_conn *ct,
 		      sch->type == SCTP_CID_COOKIE_ECHO ||
 		      flag) &&
 		     count != 0) || !sch->length) {
-			pr_debug("Basic checks failed\n");
+			nf_ct_l4proto_log_invalid(skb, ct, state,
+						  "%s failed. chunk num %d, type %d, len %d flag %d\n",
+						  __func__, count, sch->type, sch->length, flag);
 			return 1;
 		}
 
@@ -201,7 +202,6 @@ static int do_basic_checks(struct nf_conn *ct,
 			set_bit(sch->type, map);
 	}
 
-	pr_debug("Basic checks passed\n");
 	return count == 0;
 }
 
@@ -211,69 +211,51 @@ static int sctp_new_state(enum ip_conntrack_dir dir,
 {
 	int i;
 
-	pr_debug("Chunk type: %d\n", chunk_type);
-
 	switch (chunk_type) {
 	case SCTP_CID_INIT:
-		pr_debug("SCTP_CID_INIT\n");
 		i = 0;
 		break;
 	case SCTP_CID_INIT_ACK:
-		pr_debug("SCTP_CID_INIT_ACK\n");
 		i = 1;
 		break;
 	case SCTP_CID_ABORT:
-		pr_debug("SCTP_CID_ABORT\n");
 		i = 2;
 		break;
 	case SCTP_CID_SHUTDOWN:
-		pr_debug("SCTP_CID_SHUTDOWN\n");
 		i = 3;
 		break;
 	case SCTP_CID_SHUTDOWN_ACK:
-		pr_debug("SCTP_CID_SHUTDOWN_ACK\n");
 		i = 4;
 		break;
 	case SCTP_CID_ERROR:
-		pr_debug("SCTP_CID_ERROR\n");
 		i = 5;
 		break;
 	case SCTP_CID_COOKIE_ECHO:
-		pr_debug("SCTP_CID_COOKIE_ECHO\n");
 		i = 6;
 		break;
 	case SCTP_CID_COOKIE_ACK:
-		pr_debug("SCTP_CID_COOKIE_ACK\n");
 		i = 7;
 		break;
 	case SCTP_CID_SHUTDOWN_COMPLETE:
-		pr_debug("SCTP_CID_SHUTDOWN_COMPLETE\n");
 		i = 8;
 		break;
 	case SCTP_CID_HEARTBEAT:
-		pr_debug("SCTP_CID_HEARTBEAT");
 		i = 9;
 		break;
 	case SCTP_CID_HEARTBEAT_ACK:
-		pr_debug("SCTP_CID_HEARTBEAT_ACK");
 		i = 10;
 		break;
 	case SCTP_CID_DATA:
 	case SCTP_CID_SACK:
-		pr_debug("SCTP_CID_DATA/SACK");
 		i = 11;
 		break;
 	default:
 		/* Other chunks like DATA or SACK do not change the state */
-		pr_debug("Unknown chunk type, Will stay in %s\n",
-			 sctp_conntrack_names[cur_state]);
+		pr_debug("Unknown chunk type %d, Will stay in %s\n",
+			 chunk_type, sctp_conntrack_names[cur_state]);
 		return cur_state;
 	}
 
-	pr_debug("dir: %d   cur_state: %s  chunk_type: %d  new_state: %s\n",
-		 dir, sctp_conntrack_names[cur_state], chunk_type,
-		 sctp_conntrack_names[sctp_conntracks[dir][i][cur_state]]);
-
 	return sctp_conntracks[dir][i][cur_state];
 }
 
@@ -392,7 +374,7 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
 	if (sh == NULL)
 		goto out;
 
-	if (do_basic_checks(ct, skb, dataoff, map) != 0)
+	if (do_basic_checks(ct, skb, dataoff, map, state) != 0)
 		goto out;
 
 	if (!nf_ct_is_confirmed(ct)) {
@@ -414,7 +396,9 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
 		    !test_bit(SCTP_CID_HEARTBEAT, map) &&
 		    !test_bit(SCTP_CID_HEARTBEAT_ACK, map) &&
 		    sh->vtag != ct->proto.sctp.vtag[dir]) {
-			pr_debug("Verification tag check failed\n");
+			nf_ct_l4proto_log_invalid(skb, ct, state,
+						  "verification tag check failed %x vs %x for dir %d",
+						  sh->vtag, ct->proto.sctp.vtag[dir], dir);
 			goto out;
 		}
 	}
@@ -488,9 +472,10 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
 
 		/* Invalid */
 		if (new_state == SCTP_CONNTRACK_MAX) {
-			pr_debug("nf_conntrack_sctp: Invalid dir=%i ctype=%u "
-				 "conntrack=%u\n",
-				 dir, sch->type, old_state);
+			nf_ct_l4proto_log_invalid(skb, ct, state,
+						  "Invalid, old_state %d, dir %d, type %d",
+						  old_state, dir, sch->type);
+
 			goto out_unlock;
 		}
 
@@ -536,7 +521,6 @@ int nf_conntrack_sctp_packet(struct nf_conn *ct,
 	if (old_state == SCTP_CONNTRACK_COOKIE_ECHOED &&
 	    dir == IP_CT_DIR_REPLY &&
 	    new_state == SCTP_CONNTRACK_ESTABLISHED) {
-		pr_debug("Setting assured bit\n");
 		set_bit(IPS_ASSURED_BIT, &ct->status);
 		nf_conntrack_event_cache(IPCT_ASSURED, ct);
 	}
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 2/9] netfilter: conntrack: remove pr_debug calls
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 3/9] netfilter: conntrack: avoid reload of ct->status Florian Westphal
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

Those are all useless or dubious.
getorigdst() is called via setsockopt, so return value/errno will
already indicate an appropriate error.

For other pr_debug calls there are better replacements, such as
slab/slub debugging or 'conntrack -E' (ctnetlink events).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c      | 20 ++------------------
 net/netfilter/nf_conntrack_proto.c     | 20 +++-----------------
 net/netfilter/nf_conntrack_proto_tcp.c |  9 ---------
 3 files changed, 5 insertions(+), 44 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 496c4920505b..81ece117033a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -514,7 +514,6 @@ EXPORT_SYMBOL_GPL(nf_ct_get_id);
 static void
 clean_from_lists(struct nf_conn *ct)
 {
-	pr_debug("clean_from_lists(%p)\n", ct);
 	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode);
 
@@ -582,7 +581,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 {
 	struct nf_conn *ct = (struct nf_conn *)nfct;
 
-	pr_debug("%s(%p)\n", __func__, ct);
 	WARN_ON(refcount_read(&nfct->use) != 0);
 
 	if (unlikely(nf_ct_is_template(ct))) {
@@ -603,7 +601,6 @@ void nf_ct_destroy(struct nf_conntrack *nfct)
 	if (ct->master)
 		nf_ct_put(ct->master);
 
-	pr_debug("%s: returning ct=%p to slab\n", __func__, ct);
 	nf_conntrack_free(ct);
 }
 EXPORT_SYMBOL(nf_ct_destroy);
@@ -1210,7 +1207,6 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		goto dying;
 	}
 
-	pr_debug("Confirming conntrack %p\n", ct);
 	/* We have to check the DYING flag after unlink to prevent
 	 * a race against nf_ct_get_next_corpse() possibly called from
 	 * user context, else we insert an already 'dead' hash, blocking
@@ -1721,10 +1717,8 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 	struct nf_conntrack_zone tmp;
 	struct nf_conntrack_net *cnet;
 
-	if (!nf_ct_invert_tuple(&repl_tuple, tuple)) {
-		pr_debug("Can't invert tuple.\n");
+	if (!nf_ct_invert_tuple(&repl_tuple, tuple))
 		return NULL;
-	}
 
 	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
 	ct = __nf_conntrack_alloc(net, zone, tuple, &repl_tuple, GFP_ATOMIC,
@@ -1764,8 +1758,6 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
 		spin_lock_bh(&nf_conntrack_expect_lock);
 		exp = nf_ct_find_expectation(net, zone, tuple);
 		if (exp) {
-			pr_debug("expectation arrives ct=%p exp=%p\n",
-				 ct, exp);
 			/* Welcome, Mr. Bond.  We've been expecting you... */
 			__set_bit(IPS_EXPECTED_BIT, &ct->status);
 			/* exp->master safe, refcnt bumped in nf_ct_find_expectation */
@@ -1829,10 +1821,8 @@ resolve_normal_ct(struct nf_conn *tmpl,
 
 	if (!nf_ct_get_tuple(skb, skb_network_offset(skb),
 			     dataoff, state->pf, protonum, state->net,
-			     &tuple)) {
-		pr_debug("Can't get tuple\n");
+			     &tuple))
 		return 0;
-	}
 
 	/* look for tuple match */
 	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
@@ -1866,13 +1856,10 @@ resolve_normal_ct(struct nf_conn *tmpl,
 	} else {
 		/* Once we've had two way comms, always ESTABLISHED. */
 		if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
-			pr_debug("normal packet for %p\n", ct);
 			ctinfo = IP_CT_ESTABLISHED;
 		} else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) {
-			pr_debug("related packet for %p\n", ct);
 			ctinfo = IP_CT_RELATED;
 		} else {
-			pr_debug("new packet for %p\n", ct);
 			ctinfo = IP_CT_NEW;
 		}
 	}
@@ -1988,7 +1975,6 @@ nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
 	/* rcu_read_lock()ed by nf_hook_thresh */
 	dataoff = get_l4proto(skb, skb_network_offset(skb), state->pf, &protonum);
 	if (dataoff <= 0) {
-		pr_debug("not prepared to track yet or error occurred\n");
 		NF_CT_STAT_INC_ATOMIC(state->net, invalid);
 		ret = NF_ACCEPT;
 		goto out;
@@ -2027,7 +2013,6 @@ nf_conntrack_in(struct sk_buff *skb, const struct nf_hook_state *state)
 	if (ret <= 0) {
 		/* Invalid: inverse of the return code tells
 		 * the netfilter core what to do */
-		pr_debug("nf_conntrack_in: Can't track with proto module\n");
 		nf_ct_put(ct);
 		skb->_nfct = 0;
 		/* Special case: TCP tracker reports an attempt to reopen a
@@ -2066,7 +2051,6 @@ void nf_conntrack_alter_reply(struct nf_conn *ct,
 	/* Should be unconfirmed, so not in hash table yet */
 	WARN_ON(nf_ct_is_confirmed(ct));
 
-	pr_debug("Altering reply tuple of %p to ", ct);
 	nf_ct_dump_tuple(newreply);
 
 	ct->tuplehash[IP_CT_DIR_REPLY].tuple = *newreply;
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index ccef340be575..c928ff63b10e 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -284,16 +284,11 @@ getorigdst(struct sock *sk, int optval, void __user *user, int *len)
 
 	/* We only do TCP and SCTP at the moment: is there a better way? */
 	if (tuple.dst.protonum != IPPROTO_TCP &&
-	    tuple.dst.protonum != IPPROTO_SCTP) {
-		pr_debug("SO_ORIGINAL_DST: Not a TCP/SCTP socket\n");
+	    tuple.dst.protonum != IPPROTO_SCTP)
 		return -ENOPROTOOPT;
-	}
 
-	if ((unsigned int)*len < sizeof(struct sockaddr_in)) {
-		pr_debug("SO_ORIGINAL_DST: len %d not %zu\n",
-			 *len, sizeof(struct sockaddr_in));
+	if ((unsigned int)*len < sizeof(struct sockaddr_in))
 		return -EINVAL;
-	}
 
 	h = nf_conntrack_find_get(sock_net(sk), &nf_ct_zone_dflt, &tuple);
 	if (h) {
@@ -307,17 +302,12 @@ getorigdst(struct sock *sk, int optval, void __user *user, int *len)
 			.tuple.dst.u3.ip;
 		memset(sin.sin_zero, 0, sizeof(sin.sin_zero));
 
-		pr_debug("SO_ORIGINAL_DST: %pI4 %u\n",
-			 &sin.sin_addr.s_addr, ntohs(sin.sin_port));
 		nf_ct_put(ct);
 		if (copy_to_user(user, &sin, sizeof(sin)) != 0)
 			return -EFAULT;
 		else
 			return 0;
 	}
-	pr_debug("SO_ORIGINAL_DST: Can't find %pI4/%u-%pI4/%u.\n",
-		 &tuple.src.u3.ip, ntohs(tuple.src.u.tcp.port),
-		 &tuple.dst.u3.ip, ntohs(tuple.dst.u.tcp.port));
 	return -ENOENT;
 }
 
@@ -360,12 +350,8 @@ ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int *len)
 		return -EINVAL;
 
 	h = nf_conntrack_find_get(sock_net(sk), &nf_ct_zone_dflt, &tuple);
-	if (!h) {
-		pr_debug("IP6T_SO_ORIGINAL_DST: Can't find %pI6c/%u-%pI6c/%u.\n",
-			 &tuple.src.u3.ip6, ntohs(tuple.src.u.tcp.port),
-			 &tuple.dst.u3.ip6, ntohs(tuple.dst.u.tcp.port));
+	if (!h)
 		return -ENOENT;
-	}
 
 	ct = nf_ct_tuplehash_to_ctrack(h);
 
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 656631083177..21a3741162ba 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -930,7 +930,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 {
 	struct net *net = nf_ct_net(ct);
 	struct nf_tcp_net *tn = nf_tcp_pernet(net);
-	struct nf_conntrack_tuple *tuple;
 	enum tcp_conntrack new_state, old_state;
 	unsigned int index, *timeouts;
 	enum nf_ct_tcp_action res;
@@ -954,7 +953,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	dir = CTINFO2DIR(ctinfo);
 	index = get_conntrack_index(th);
 	new_state = tcp_conntracks[dir][index][old_state];
-	tuple = &ct->tuplehash[dir].tuple;
 
 	switch (new_state) {
 	case TCP_CONNTRACK_SYN_SENT:
@@ -1217,13 +1215,6 @@ int nf_conntrack_tcp_packet(struct nf_conn *ct,
 	ct->proto.tcp.last_index = index;
 	ct->proto.tcp.last_dir = dir;
 
-	pr_debug("tcp_conntracks: ");
-	nf_ct_dump_tuple(tuple);
-	pr_debug("syn=%i ack=%i fin=%i rst=%i old=%i new=%i\n",
-		 (th->syn ? 1 : 0), (th->ack ? 1 : 0),
-		 (th->fin ? 1 : 0), (th->rst ? 1 : 0),
-		 old_state, new_state);
-
 	ct->proto.tcp.state = new_state;
 	if (old_state != new_state
 	    && new_state == TCP_CONNTRACK_FIN_WAIT)
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 3/9] netfilter: conntrack: avoid reload of ct->status
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 2/9] netfilter: conntrack: remove pr_debug calls Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-23 11:38   ` Roi Dayan
  2023-01-18 12:32 ` [PATCH net-next 4/9] netfilter: conntrack: move rcu read lock to nf_conntrack_find_get Florian Westphal
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

Compiler can't merge the two test_bit() calls, so load ct->status
once and use non-atomic accesses.

This is fine because IPS_EXPECTED or NAT_CLASH are either set at ct
creation time or not at all, but compiler can't know that.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c      |  9 +++++----
 net/netfilter/nf_conntrack_proto_udp.c | 10 ++++++----
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 81ece117033a..9e12cade4e0f 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1854,14 +1854,15 @@ resolve_normal_ct(struct nf_conn *tmpl,
 	if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) {
 		ctinfo = IP_CT_ESTABLISHED_REPLY;
 	} else {
+		unsigned long status = READ_ONCE(ct->status);
+
 		/* Once we've had two way comms, always ESTABLISHED. */
-		if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
+		if (likely(status & IPS_SEEN_REPLY))
 			ctinfo = IP_CT_ESTABLISHED;
-		} else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) {
+		else if (status & IPS_EXPECTED)
 			ctinfo = IP_CT_RELATED;
-		} else {
+		else
 			ctinfo = IP_CT_NEW;
-		}
 	}
 	nf_ct_set(skb, ct, ctinfo);
 	return 0;
diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
index 3b516cffc779..6b9206635b24 100644
--- a/net/netfilter/nf_conntrack_proto_udp.c
+++ b/net/netfilter/nf_conntrack_proto_udp.c
@@ -88,6 +88,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
 			    const struct nf_hook_state *state)
 {
 	unsigned int *timeouts;
+	unsigned long status;
 
 	if (udp_error(skb, dataoff, state))
 		return -NF_ACCEPT;
@@ -96,26 +97,27 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
 	if (!timeouts)
 		timeouts = udp_get_timeouts(nf_ct_net(ct));
 
-	if (!nf_ct_is_confirmed(ct))
+	status = READ_ONCE(ct->status);
+	if ((status & IPS_CONFIRMED) == 0)
 		ct->proto.udp.stream_ts = 2 * HZ + jiffies;
 
 	/* If we've seen traffic both ways, this is some kind of UDP
 	 * stream. Set Assured.
 	 */
-	if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
+	if (status & IPS_SEEN_REPLY_BIT) {
 		unsigned long extra = timeouts[UDP_CT_UNREPLIED];
 		bool stream = false;
 
 		/* Still active after two seconds? Extend timeout. */
 		if (time_after(jiffies, ct->proto.udp.stream_ts)) {
 			extra = timeouts[UDP_CT_REPLIED];
-			stream = true;
+			stream = (status & IPS_ASSURED) == 0;
 		}
 
 		nf_ct_refresh_acct(ct, ctinfo, skb, extra);
 
 		/* never set ASSURED for IPS_NAT_CLASH, they time out soon */
-		if (unlikely((ct->status & IPS_NAT_CLASH)))
+		if (unlikely((status & IPS_NAT_CLASH)))
 			return NF_ACCEPT;
 
 		/* Also, more likely to be important, and not a probe */
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 4/9] netfilter: conntrack: move rcu read lock to nf_conntrack_find_get
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (2 preceding siblings ...)
  2023-01-18 12:32 ` [PATCH net-next 3/9] netfilter: conntrack: avoid reload of ct->status Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 5/9] netfilter: ip_tables: remove clusterip target Florian Westphal
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

Move rcu_read_lock/unlock to nf_conntrack_find_get(), this avoids
nested rcu_read_lock call from resolve_normal_ct().

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9e12cade4e0f..c00858344f02 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -783,8 +783,6 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 	struct nf_conntrack_tuple_hash *h;
 	struct nf_conn *ct;
 
-	rcu_read_lock();
-
 	h = ____nf_conntrack_find(net, zone, tuple, hash);
 	if (h) {
 		/* We have a candidate that matches the tuple we're interested
@@ -796,7 +794,7 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 			smp_acquire__after_ctrl_dep();
 
 			if (likely(nf_ct_key_equal(h, tuple, zone, net)))
-				goto found;
+				return h;
 
 			/* TYPESAFE_BY_RCU recycled the candidate */
 			nf_ct_put(ct);
@@ -804,8 +802,6 @@ __nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 
 		h = NULL;
 	}
-found:
-	rcu_read_unlock();
 
 	return h;
 }
@@ -817,16 +813,21 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 	unsigned int rid, zone_id = nf_ct_zone_id(zone, IP_CT_DIR_ORIGINAL);
 	struct nf_conntrack_tuple_hash *thash;
 
+	rcu_read_lock();
+
 	thash = __nf_conntrack_find_get(net, zone, tuple,
 					hash_conntrack_raw(tuple, zone_id, net));
 
 	if (thash)
-		return thash;
+		goto out_unlock;
 
 	rid = nf_ct_zone_id(zone, IP_CT_DIR_REPLY);
 	if (rid != zone_id)
-		return __nf_conntrack_find_get(net, zone, tuple,
-					       hash_conntrack_raw(tuple, rid, net));
+		thash = __nf_conntrack_find_get(net, zone, tuple,
+						hash_conntrack_raw(tuple, rid, net));
+
+out_unlock:
+	rcu_read_unlock();
 	return thash;
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 5/9] netfilter: ip_tables: remove clusterip target
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (3 preceding siblings ...)
  2023-01-18 12:32 ` [PATCH net-next 4/9] netfilter: conntrack: move rcu read lock to nf_conntrack_find_get Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 6/9] netfilter: nf_tables: add static key to skip retpoline workarounds Florian Westphal
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

Marked as 'to be removed soon' since kernel 4.1 (2015).
Functionality was superseded by the 'cluster' match, added in kernel
2.6.30 (2009).

clusterip_tg_check still has races that can give

 proc_dir_entry 'ipt_CLUSTERIP/10.1.1.2' already registered

followed by a WARN splat.

Remove it instead of trying to fix this up again.
clusterip uapi header is left as-is for now.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/netfilter/Kconfig         |  14 -
 net/ipv4/netfilter/Makefile        |   1 -
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 929 -----------------------------
 3 files changed, 944 deletions(-)
 delete mode 100644 net/ipv4/netfilter/ipt_CLUSTERIP.c

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index aab384126f61..f71a7e9a7de6 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -259,20 +259,6 @@ config IP_NF_MANGLE
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
-config IP_NF_TARGET_CLUSTERIP
-	tristate "CLUSTERIP target support"
-	depends on IP_NF_MANGLE
-	depends on NF_CONNTRACK
-	depends on NETFILTER_ADVANCED
-	select NF_CONNTRACK_MARK
-	select NETFILTER_FAMILY_ARP
-	help
-	  The CLUSTERIP target allows you to build load-balancing clusters of
-	  network servers without having a dedicated load-balancing
-	  router/server/switch.
-
-	  To compile it as a module, choose M here.  If unsure, say N.
-
 config IP_NF_TARGET_ECN
 	tristate "ECN target support"
 	depends on IP_NF_MANGLE
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 93bad1184251..5a26f9de1ab9 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -39,7 +39,6 @@ obj-$(CONFIG_IP_NF_MATCH_AH) += ipt_ah.o
 obj-$(CONFIG_IP_NF_MATCH_RPFILTER) += ipt_rpfilter.o
 
 # targets
-obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o
 obj-$(CONFIG_IP_NF_TARGET_ECN) += ipt_ECN.o
 obj-$(CONFIG_IP_NF_TARGET_REJECT) += ipt_REJECT.o
 obj-$(CONFIG_IP_NF_TARGET_SYNPROXY) += ipt_SYNPROXY.o
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
deleted file mode 100644
index b3cc416ed292..000000000000
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ /dev/null
@@ -1,929 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/* Cluster IP hashmark target
- * (C) 2003-2004 by Harald Welte <laforge@netfilter.org>
- * based on ideas of Fabio Olive Leite <olive@unixforge.org>
- *
- * Development of this code funded by SuSE Linux AG, https://www.suse.com/
- */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#include <linux/module.h>
-#include <linux/proc_fs.h>
-#include <linux/jhash.h>
-#include <linux/bitops.h>
-#include <linux/skbuff.h>
-#include <linux/slab.h>
-#include <linux/ip.h>
-#include <linux/tcp.h>
-#include <linux/udp.h>
-#include <linux/icmp.h>
-#include <linux/if_arp.h>
-#include <linux/seq_file.h>
-#include <linux/refcount.h>
-#include <linux/netfilter_arp.h>
-#include <linux/netfilter/x_tables.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
-#include <linux/netfilter_ipv4/ipt_CLUSTERIP.h>
-#include <net/netfilter/nf_conntrack.h>
-#include <net/net_namespace.h>
-#include <net/netns/generic.h>
-#include <net/checksum.h>
-#include <net/ip.h>
-
-#define CLUSTERIP_VERSION "0.8"
-
-MODULE_LICENSE("GPL");
-MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
-MODULE_DESCRIPTION("Xtables: CLUSTERIP target");
-
-struct clusterip_config {
-	struct list_head list;			/* list of all configs */
-	refcount_t refcount;			/* reference count */
-	refcount_t entries;			/* number of entries/rules
-						 * referencing us */
-
-	__be32 clusterip;			/* the IP address */
-	u_int8_t clustermac[ETH_ALEN];		/* the MAC address */
-	int ifindex;				/* device ifindex */
-	u_int16_t num_total_nodes;		/* total number of nodes */
-	unsigned long local_nodes;		/* node number array */
-
-#ifdef CONFIG_PROC_FS
-	struct proc_dir_entry *pde;		/* proc dir entry */
-#endif
-	enum clusterip_hashmode hash_mode;	/* which hashing mode */
-	u_int32_t hash_initval;			/* hash initialization */
-	struct rcu_head rcu;			/* for call_rcu */
-	struct net *net;			/* netns for pernet list */
-	char ifname[IFNAMSIZ];			/* device ifname */
-};
-
-#ifdef CONFIG_PROC_FS
-static const struct proc_ops clusterip_proc_ops;
-#endif
-
-struct clusterip_net {
-	struct list_head configs;
-	/* lock protects the configs list */
-	spinlock_t lock;
-
-	bool clusterip_deprecated_warning;
-#ifdef CONFIG_PROC_FS
-	struct proc_dir_entry *procdir;
-	/* mutex protects the config->pde*/
-	struct mutex mutex;
-#endif
-	unsigned int hook_users;
-};
-
-static unsigned int clusterip_arp_mangle(void *priv, struct sk_buff *skb, const struct nf_hook_state *state);
-
-static const struct nf_hook_ops cip_arp_ops = {
-	.hook = clusterip_arp_mangle,
-	.pf = NFPROTO_ARP,
-	.hooknum = NF_ARP_OUT,
-	.priority = -1
-};
-
-static unsigned int clusterip_net_id __read_mostly;
-static inline struct clusterip_net *clusterip_pernet(struct net *net)
-{
-	return net_generic(net, clusterip_net_id);
-}
-
-static inline void
-clusterip_config_get(struct clusterip_config *c)
-{
-	refcount_inc(&c->refcount);
-}
-
-static void clusterip_config_rcu_free(struct rcu_head *head)
-{
-	struct clusterip_config *config;
-	struct net_device *dev;
-
-	config = container_of(head, struct clusterip_config, rcu);
-	dev = dev_get_by_name(config->net, config->ifname);
-	if (dev) {
-		dev_mc_del(dev, config->clustermac);
-		dev_put(dev);
-	}
-	kfree(config);
-}
-
-static inline void
-clusterip_config_put(struct clusterip_config *c)
-{
-	if (refcount_dec_and_test(&c->refcount))
-		call_rcu(&c->rcu, clusterip_config_rcu_free);
-}
-
-/* decrease the count of entries using/referencing this config.  If last
- * entry(rule) is removed, remove the config from lists, but don't free it
- * yet, since proc-files could still be holding references */
-static inline void
-clusterip_config_entry_put(struct clusterip_config *c)
-{
-	struct clusterip_net *cn = clusterip_pernet(c->net);
-
-	local_bh_disable();
-	if (refcount_dec_and_lock(&c->entries, &cn->lock)) {
-		list_del_rcu(&c->list);
-		spin_unlock(&cn->lock);
-		local_bh_enable();
-		/* In case anyone still accesses the file, the open/close
-		 * functions are also incrementing the refcount on their own,
-		 * so it's safe to remove the entry even if it's in use. */
-#ifdef CONFIG_PROC_FS
-		mutex_lock(&cn->mutex);
-		if (cn->procdir)
-			proc_remove(c->pde);
-		mutex_unlock(&cn->mutex);
-#endif
-		return;
-	}
-	local_bh_enable();
-}
-
-static struct clusterip_config *
-__clusterip_config_find(struct net *net, __be32 clusterip)
-{
-	struct clusterip_config *c;
-	struct clusterip_net *cn = clusterip_pernet(net);
-
-	list_for_each_entry_rcu(c, &cn->configs, list) {
-		if (c->clusterip == clusterip)
-			return c;
-	}
-
-	return NULL;
-}
-
-static inline struct clusterip_config *
-clusterip_config_find_get(struct net *net, __be32 clusterip, int entry)
-{
-	struct clusterip_config *c;
-
-	rcu_read_lock_bh();
-	c = __clusterip_config_find(net, clusterip);
-	if (c) {
-#ifdef CONFIG_PROC_FS
-		if (!c->pde)
-			c = NULL;
-		else
-#endif
-		if (unlikely(!refcount_inc_not_zero(&c->refcount)))
-			c = NULL;
-		else if (entry) {
-			if (unlikely(!refcount_inc_not_zero(&c->entries))) {
-				clusterip_config_put(c);
-				c = NULL;
-			}
-		}
-	}
-	rcu_read_unlock_bh();
-
-	return c;
-}
-
-static void
-clusterip_config_init_nodelist(struct clusterip_config *c,
-			       const struct ipt_clusterip_tgt_info *i)
-{
-	int n;
-
-	for (n = 0; n < i->num_local_nodes; n++)
-		set_bit(i->local_nodes[n] - 1, &c->local_nodes);
-}
-
-static int
-clusterip_netdev_event(struct notifier_block *this, unsigned long event,
-		       void *ptr)
-{
-	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-	struct net *net = dev_net(dev);
-	struct clusterip_net *cn = clusterip_pernet(net);
-	struct clusterip_config *c;
-
-	spin_lock_bh(&cn->lock);
-	list_for_each_entry_rcu(c, &cn->configs, list) {
-		switch (event) {
-		case NETDEV_REGISTER:
-			if (!strcmp(dev->name, c->ifname)) {
-				c->ifindex = dev->ifindex;
-				dev_mc_add(dev, c->clustermac);
-			}
-			break;
-		case NETDEV_UNREGISTER:
-			if (dev->ifindex == c->ifindex) {
-				dev_mc_del(dev, c->clustermac);
-				c->ifindex = -1;
-			}
-			break;
-		case NETDEV_CHANGENAME:
-			if (!strcmp(dev->name, c->ifname)) {
-				c->ifindex = dev->ifindex;
-				dev_mc_add(dev, c->clustermac);
-			} else if (dev->ifindex == c->ifindex) {
-				dev_mc_del(dev, c->clustermac);
-				c->ifindex = -1;
-			}
-			break;
-		}
-	}
-	spin_unlock_bh(&cn->lock);
-
-	return NOTIFY_DONE;
-}
-
-static struct clusterip_config *
-clusterip_config_init(struct net *net, const struct ipt_clusterip_tgt_info *i,
-		      __be32 ip, const char *iniface)
-{
-	struct clusterip_net *cn = clusterip_pernet(net);
-	struct clusterip_config *c;
-	struct net_device *dev;
-	int err;
-
-	if (iniface[0] == '\0') {
-		pr_info("Please specify an interface name\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	c = kzalloc(sizeof(*c), GFP_ATOMIC);
-	if (!c)
-		return ERR_PTR(-ENOMEM);
-
-	dev = dev_get_by_name(net, iniface);
-	if (!dev) {
-		pr_info("no such interface %s\n", iniface);
-		kfree(c);
-		return ERR_PTR(-ENOENT);
-	}
-	c->ifindex = dev->ifindex;
-	strcpy(c->ifname, dev->name);
-	memcpy(&c->clustermac, &i->clustermac, ETH_ALEN);
-	dev_mc_add(dev, c->clustermac);
-	dev_put(dev);
-
-	c->clusterip = ip;
-	c->num_total_nodes = i->num_total_nodes;
-	clusterip_config_init_nodelist(c, i);
-	c->hash_mode = i->hash_mode;
-	c->hash_initval = i->hash_initval;
-	c->net = net;
-	refcount_set(&c->refcount, 1);
-
-	spin_lock_bh(&cn->lock);
-	if (__clusterip_config_find(net, ip)) {
-		err = -EBUSY;
-		goto out_config_put;
-	}
-
-	list_add_rcu(&c->list, &cn->configs);
-	spin_unlock_bh(&cn->lock);
-
-#ifdef CONFIG_PROC_FS
-	{
-		char buffer[16];
-
-		/* create proc dir entry */
-		sprintf(buffer, "%pI4", &ip);
-		mutex_lock(&cn->mutex);
-		c->pde = proc_create_data(buffer, 0600,
-					  cn->procdir,
-					  &clusterip_proc_ops, c);
-		mutex_unlock(&cn->mutex);
-		if (!c->pde) {
-			err = -ENOMEM;
-			goto err;
-		}
-	}
-#endif
-
-	refcount_set(&c->entries, 1);
-	return c;
-
-#ifdef CONFIG_PROC_FS
-err:
-#endif
-	spin_lock_bh(&cn->lock);
-	list_del_rcu(&c->list);
-out_config_put:
-	spin_unlock_bh(&cn->lock);
-	clusterip_config_put(c);
-	return ERR_PTR(err);
-}
-
-#ifdef CONFIG_PROC_FS
-static int
-clusterip_add_node(struct clusterip_config *c, u_int16_t nodenum)
-{
-
-	if (nodenum == 0 ||
-	    nodenum > c->num_total_nodes)
-		return 1;
-
-	/* check if we already have this number in our bitfield */
-	if (test_and_set_bit(nodenum - 1, &c->local_nodes))
-		return 1;
-
-	return 0;
-}
-
-static bool
-clusterip_del_node(struct clusterip_config *c, u_int16_t nodenum)
-{
-	if (nodenum == 0 ||
-	    nodenum > c->num_total_nodes)
-		return true;
-
-	if (test_and_clear_bit(nodenum - 1, &c->local_nodes))
-		return false;
-
-	return true;
-}
-#endif
-
-static inline u_int32_t
-clusterip_hashfn(const struct sk_buff *skb,
-		 const struct clusterip_config *config)
-{
-	const struct iphdr *iph = ip_hdr(skb);
-	unsigned long hashval;
-	u_int16_t sport = 0, dport = 0;
-	int poff;
-
-	poff = proto_ports_offset(iph->protocol);
-	if (poff >= 0) {
-		const u_int16_t *ports;
-		u16 _ports[2];
-
-		ports = skb_header_pointer(skb, iph->ihl * 4 + poff, 4, _ports);
-		if (ports) {
-			sport = ports[0];
-			dport = ports[1];
-		}
-	} else {
-		net_info_ratelimited("unknown protocol %u\n", iph->protocol);
-	}
-
-	switch (config->hash_mode) {
-	case CLUSTERIP_HASHMODE_SIP:
-		hashval = jhash_1word(ntohl(iph->saddr),
-				      config->hash_initval);
-		break;
-	case CLUSTERIP_HASHMODE_SIP_SPT:
-		hashval = jhash_2words(ntohl(iph->saddr), sport,
-				       config->hash_initval);
-		break;
-	case CLUSTERIP_HASHMODE_SIP_SPT_DPT:
-		hashval = jhash_3words(ntohl(iph->saddr), sport, dport,
-				       config->hash_initval);
-		break;
-	default:
-		/* to make gcc happy */
-		hashval = 0;
-		/* This cannot happen, unless the check function wasn't called
-		 * at rule load time */
-		pr_info("unknown mode %u\n", config->hash_mode);
-		BUG();
-		break;
-	}
-
-	/* node numbers are 1..n, not 0..n */
-	return reciprocal_scale(hashval, config->num_total_nodes) + 1;
-}
-
-static inline int
-clusterip_responsible(const struct clusterip_config *config, u_int32_t hash)
-{
-	return test_bit(hash - 1, &config->local_nodes);
-}
-
-/***********************************************************************
- * IPTABLES TARGET
- ***********************************************************************/
-
-static unsigned int
-clusterip_tg(struct sk_buff *skb, const struct xt_action_param *par)
-{
-	const struct ipt_clusterip_tgt_info *cipinfo = par->targinfo;
-	struct nf_conn *ct;
-	enum ip_conntrack_info ctinfo;
-	u_int32_t hash;
-
-	/* don't need to clusterip_config_get() here, since refcount
-	 * is only decremented by destroy() - and ip_tables guarantees
-	 * that the ->target() function isn't called after ->destroy() */
-
-	ct = nf_ct_get(skb, &ctinfo);
-	if (ct == NULL)
-		return NF_DROP;
-
-	/* special case: ICMP error handling. conntrack distinguishes between
-	 * error messages (RELATED) and information requests (see below) */
-	if (ip_hdr(skb)->protocol == IPPROTO_ICMP &&
-	    (ctinfo == IP_CT_RELATED ||
-	     ctinfo == IP_CT_RELATED_REPLY))
-		return XT_CONTINUE;
-
-	/* nf_conntrack_proto_icmp guarantees us that we only have ICMP_ECHO,
-	 * TIMESTAMP, INFO_REQUEST or ICMP_ADDRESS type icmp packets from here
-	 * on, which all have an ID field [relevant for hashing]. */
-
-	hash = clusterip_hashfn(skb, cipinfo->config);
-
-	switch (ctinfo) {
-	case IP_CT_NEW:
-		WRITE_ONCE(ct->mark, hash);
-		break;
-	case IP_CT_RELATED:
-	case IP_CT_RELATED_REPLY:
-		/* FIXME: we don't handle expectations at the moment.
-		 * They can arrive on a different node than
-		 * the master connection (e.g. FTP passive mode) */
-	case IP_CT_ESTABLISHED:
-	case IP_CT_ESTABLISHED_REPLY:
-		break;
-	default:			/* Prevent gcc warnings */
-		break;
-	}
-
-#ifdef DEBUG
-	nf_ct_dump_tuple_ip(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-#endif
-	pr_debug("hash=%u ct_hash=%u ", hash, READ_ONCE(ct->mark));
-	if (!clusterip_responsible(cipinfo->config, hash)) {
-		pr_debug("not responsible\n");
-		return NF_DROP;
-	}
-	pr_debug("responsible\n");
-
-	/* despite being received via linklayer multicast, this is
-	 * actually a unicast IP packet. TCP doesn't like PACKET_MULTICAST */
-	skb->pkt_type = PACKET_HOST;
-
-	return XT_CONTINUE;
-}
-
-static int clusterip_tg_check(const struct xt_tgchk_param *par)
-{
-	struct ipt_clusterip_tgt_info *cipinfo = par->targinfo;
-	struct clusterip_net *cn = clusterip_pernet(par->net);
-	const struct ipt_entry *e = par->entryinfo;
-	struct clusterip_config *config;
-	int ret, i;
-
-	if (par->nft_compat) {
-		pr_err("cannot use CLUSTERIP target from nftables compat\n");
-		return -EOPNOTSUPP;
-	}
-
-	if (cn->hook_users == UINT_MAX)
-		return -EOVERFLOW;
-
-	if (cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP &&
-	    cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT &&
-	    cipinfo->hash_mode != CLUSTERIP_HASHMODE_SIP_SPT_DPT) {
-		pr_info("unknown mode %u\n", cipinfo->hash_mode);
-		return -EINVAL;
-
-	}
-	if (e->ip.dmsk.s_addr != htonl(0xffffffff) ||
-	    e->ip.dst.s_addr == 0) {
-		pr_info("Please specify destination IP\n");
-		return -EINVAL;
-	}
-	if (cipinfo->num_local_nodes > ARRAY_SIZE(cipinfo->local_nodes)) {
-		pr_info("bad num_local_nodes %u\n", cipinfo->num_local_nodes);
-		return -EINVAL;
-	}
-	for (i = 0; i < cipinfo->num_local_nodes; i++) {
-		if (cipinfo->local_nodes[i] - 1 >=
-		    sizeof(config->local_nodes) * 8) {
-			pr_info("bad local_nodes[%d] %u\n",
-				i, cipinfo->local_nodes[i]);
-			return -EINVAL;
-		}
-	}
-
-	config = clusterip_config_find_get(par->net, e->ip.dst.s_addr, 1);
-	if (!config) {
-		if (!(cipinfo->flags & CLUSTERIP_FLAG_NEW)) {
-			pr_info("no config found for %pI4, need 'new'\n",
-				&e->ip.dst.s_addr);
-			return -EINVAL;
-		} else {
-			config = clusterip_config_init(par->net, cipinfo,
-						       e->ip.dst.s_addr,
-						       e->ip.iniface);
-			if (IS_ERR(config))
-				return PTR_ERR(config);
-		}
-	} else if (memcmp(&config->clustermac, &cipinfo->clustermac, ETH_ALEN)) {
-		clusterip_config_entry_put(config);
-		clusterip_config_put(config);
-		return -EINVAL;
-	}
-
-	ret = nf_ct_netns_get(par->net, par->family);
-	if (ret < 0) {
-		pr_info("cannot load conntrack support for proto=%u\n",
-			par->family);
-		clusterip_config_entry_put(config);
-		clusterip_config_put(config);
-		return ret;
-	}
-
-	if (cn->hook_users == 0) {
-		ret = nf_register_net_hook(par->net, &cip_arp_ops);
-
-		if (ret < 0) {
-			clusterip_config_entry_put(config);
-			clusterip_config_put(config);
-			nf_ct_netns_put(par->net, par->family);
-			return ret;
-		}
-	}
-
-	cn->hook_users++;
-
-	if (!cn->clusterip_deprecated_warning) {
-		pr_info("ipt_CLUSTERIP is deprecated and it will removed soon, "
-			"use xt_cluster instead\n");
-		cn->clusterip_deprecated_warning = true;
-	}
-
-	cipinfo->config = config;
-	return ret;
-}
-
-/* drop reference count of cluster config when rule is deleted */
-static void clusterip_tg_destroy(const struct xt_tgdtor_param *par)
-{
-	const struct ipt_clusterip_tgt_info *cipinfo = par->targinfo;
-	struct clusterip_net *cn = clusterip_pernet(par->net);
-
-	/* if no more entries are referencing the config, remove it
-	 * from the list and destroy the proc entry */
-	clusterip_config_entry_put(cipinfo->config);
-
-	clusterip_config_put(cipinfo->config);
-
-	nf_ct_netns_put(par->net, par->family);
-	cn->hook_users--;
-
-	if (cn->hook_users == 0)
-		nf_unregister_net_hook(par->net, &cip_arp_ops);
-}
-
-#ifdef CONFIG_NETFILTER_XTABLES_COMPAT
-struct compat_ipt_clusterip_tgt_info
-{
-	u_int32_t	flags;
-	u_int8_t	clustermac[6];
-	u_int16_t	num_total_nodes;
-	u_int16_t	num_local_nodes;
-	u_int16_t	local_nodes[CLUSTERIP_MAX_NODES];
-	u_int32_t	hash_mode;
-	u_int32_t	hash_initval;
-	compat_uptr_t	config;
-};
-#endif /* CONFIG_NETFILTER_XTABLES_COMPAT */
-
-static struct xt_target clusterip_tg_reg __read_mostly = {
-	.name		= "CLUSTERIP",
-	.family		= NFPROTO_IPV4,
-	.target		= clusterip_tg,
-	.checkentry	= clusterip_tg_check,
-	.destroy	= clusterip_tg_destroy,
-	.targetsize	= sizeof(struct ipt_clusterip_tgt_info),
-	.usersize	= offsetof(struct ipt_clusterip_tgt_info, config),
-#ifdef CONFIG_NETFILTER_XTABLES_COMPAT
-	.compatsize	= sizeof(struct compat_ipt_clusterip_tgt_info),
-#endif /* CONFIG_NETFILTER_XTABLES_COMPAT */
-	.me		= THIS_MODULE
-};
-
-
-/***********************************************************************
- * ARP MANGLING CODE
- ***********************************************************************/
-
-/* hardcoded for 48bit ethernet and 32bit ipv4 addresses */
-struct arp_payload {
-	u_int8_t src_hw[ETH_ALEN];
-	__be32 src_ip;
-	u_int8_t dst_hw[ETH_ALEN];
-	__be32 dst_ip;
-} __packed;
-
-#ifdef DEBUG
-static void arp_print(struct arp_payload *payload)
-{
-#define HBUFFERLEN 30
-	char hbuffer[HBUFFERLEN];
-	int j, k;
-
-	for (k = 0, j = 0; k < HBUFFERLEN - 3 && j < ETH_ALEN; j++) {
-		hbuffer[k++] = hex_asc_hi(payload->src_hw[j]);
-		hbuffer[k++] = hex_asc_lo(payload->src_hw[j]);
-		hbuffer[k++] = ':';
-	}
-	hbuffer[--k] = '\0';
-
-	pr_debug("src %pI4@%s, dst %pI4\n",
-		 &payload->src_ip, hbuffer, &payload->dst_ip);
-}
-#endif
-
-static unsigned int
-clusterip_arp_mangle(void *priv, struct sk_buff *skb,
-		     const struct nf_hook_state *state)
-{
-	struct arphdr *arp = arp_hdr(skb);
-	struct arp_payload *payload;
-	struct clusterip_config *c;
-	struct net *net = state->net;
-
-	/* we don't care about non-ethernet and non-ipv4 ARP */
-	if (arp->ar_hrd != htons(ARPHRD_ETHER) ||
-	    arp->ar_pro != htons(ETH_P_IP) ||
-	    arp->ar_pln != 4 || arp->ar_hln != ETH_ALEN)
-		return NF_ACCEPT;
-
-	/* we only want to mangle arp requests and replies */
-	if (arp->ar_op != htons(ARPOP_REPLY) &&
-	    arp->ar_op != htons(ARPOP_REQUEST))
-		return NF_ACCEPT;
-
-	payload = (void *)(arp+1);
-
-	/* if there is no clusterip configuration for the arp reply's
-	 * source ip, we don't want to mangle it */
-	c = clusterip_config_find_get(net, payload->src_ip, 0);
-	if (!c)
-		return NF_ACCEPT;
-
-	/* normally the linux kernel always replies to arp queries of
-	 * addresses on different interfacs.  However, in the CLUSTERIP case
-	 * this wouldn't work, since we didn't subscribe the mcast group on
-	 * other interfaces */
-	if (c->ifindex != state->out->ifindex) {
-		pr_debug("not mangling arp reply on different interface: cip'%d'-skb'%d'\n",
-			 c->ifindex, state->out->ifindex);
-		clusterip_config_put(c);
-		return NF_ACCEPT;
-	}
-
-	/* mangle reply hardware address */
-	memcpy(payload->src_hw, c->clustermac, arp->ar_hln);
-
-#ifdef DEBUG
-	pr_debug("mangled arp reply: ");
-	arp_print(payload);
-#endif
-
-	clusterip_config_put(c);
-
-	return NF_ACCEPT;
-}
-
-/***********************************************************************
- * PROC DIR HANDLING
- ***********************************************************************/
-
-#ifdef CONFIG_PROC_FS
-
-struct clusterip_seq_position {
-	unsigned int pos;	/* position */
-	unsigned int weight;	/* number of bits set == size */
-	unsigned int bit;	/* current bit */
-	unsigned long val;	/* current value */
-};
-
-static void *clusterip_seq_start(struct seq_file *s, loff_t *pos)
-{
-	struct clusterip_config *c = s->private;
-	unsigned int weight;
-	u_int32_t local_nodes;
-	struct clusterip_seq_position *idx;
-
-	/* FIXME: possible race */
-	local_nodes = c->local_nodes;
-	weight = hweight32(local_nodes);
-	if (*pos >= weight)
-		return NULL;
-
-	idx = kmalloc(sizeof(struct clusterip_seq_position), GFP_KERNEL);
-	if (!idx)
-		return ERR_PTR(-ENOMEM);
-
-	idx->pos = *pos;
-	idx->weight = weight;
-	idx->bit = ffs(local_nodes);
-	idx->val = local_nodes;
-	clear_bit(idx->bit - 1, &idx->val);
-
-	return idx;
-}
-
-static void *clusterip_seq_next(struct seq_file *s, void *v, loff_t *pos)
-{
-	struct clusterip_seq_position *idx = v;
-
-	*pos = ++idx->pos;
-	if (*pos >= idx->weight) {
-		kfree(v);
-		return NULL;
-	}
-	idx->bit = ffs(idx->val);
-	clear_bit(idx->bit - 1, &idx->val);
-	return idx;
-}
-
-static void clusterip_seq_stop(struct seq_file *s, void *v)
-{
-	if (!IS_ERR(v))
-		kfree(v);
-}
-
-static int clusterip_seq_show(struct seq_file *s, void *v)
-{
-	struct clusterip_seq_position *idx = v;
-
-	if (idx->pos != 0)
-		seq_putc(s, ',');
-
-	seq_printf(s, "%u", idx->bit);
-
-	if (idx->pos == idx->weight - 1)
-		seq_putc(s, '\n');
-
-	return 0;
-}
-
-static const struct seq_operations clusterip_seq_ops = {
-	.start	= clusterip_seq_start,
-	.next	= clusterip_seq_next,
-	.stop	= clusterip_seq_stop,
-	.show	= clusterip_seq_show,
-};
-
-static int clusterip_proc_open(struct inode *inode, struct file *file)
-{
-	int ret = seq_open(file, &clusterip_seq_ops);
-
-	if (!ret) {
-		struct seq_file *sf = file->private_data;
-		struct clusterip_config *c = pde_data(inode);
-
-		sf->private = c;
-
-		clusterip_config_get(c);
-	}
-
-	return ret;
-}
-
-static int clusterip_proc_release(struct inode *inode, struct file *file)
-{
-	struct clusterip_config *c = pde_data(inode);
-	int ret;
-
-	ret = seq_release(inode, file);
-
-	if (!ret)
-		clusterip_config_put(c);
-
-	return ret;
-}
-
-static ssize_t clusterip_proc_write(struct file *file, const char __user *input,
-				size_t size, loff_t *ofs)
-{
-	struct clusterip_config *c = pde_data(file_inode(file));
-#define PROC_WRITELEN	10
-	char buffer[PROC_WRITELEN+1];
-	unsigned long nodenum;
-	int rc;
-
-	if (size > PROC_WRITELEN)
-		return -EIO;
-	if (copy_from_user(buffer, input, size))
-		return -EFAULT;
-	buffer[size] = 0;
-
-	if (*buffer == '+') {
-		rc = kstrtoul(buffer+1, 10, &nodenum);
-		if (rc)
-			return rc;
-		if (clusterip_add_node(c, nodenum))
-			return -ENOMEM;
-	} else if (*buffer == '-') {
-		rc = kstrtoul(buffer+1, 10, &nodenum);
-		if (rc)
-			return rc;
-		if (clusterip_del_node(c, nodenum))
-			return -ENOENT;
-	} else
-		return -EIO;
-
-	return size;
-}
-
-static const struct proc_ops clusterip_proc_ops = {
-	.proc_open	= clusterip_proc_open,
-	.proc_read	= seq_read,
-	.proc_write	= clusterip_proc_write,
-	.proc_lseek	= seq_lseek,
-	.proc_release	= clusterip_proc_release,
-};
-
-#endif /* CONFIG_PROC_FS */
-
-static int clusterip_net_init(struct net *net)
-{
-	struct clusterip_net *cn = clusterip_pernet(net);
-
-	INIT_LIST_HEAD(&cn->configs);
-
-	spin_lock_init(&cn->lock);
-
-#ifdef CONFIG_PROC_FS
-	cn->procdir = proc_mkdir("ipt_CLUSTERIP", net->proc_net);
-	if (!cn->procdir) {
-		pr_err("Unable to proc dir entry\n");
-		return -ENOMEM;
-	}
-	mutex_init(&cn->mutex);
-#endif /* CONFIG_PROC_FS */
-
-	return 0;
-}
-
-static void clusterip_net_exit(struct net *net)
-{
-#ifdef CONFIG_PROC_FS
-	struct clusterip_net *cn = clusterip_pernet(net);
-
-	mutex_lock(&cn->mutex);
-	proc_remove(cn->procdir);
-	cn->procdir = NULL;
-	mutex_unlock(&cn->mutex);
-#endif
-}
-
-static struct pernet_operations clusterip_net_ops = {
-	.init = clusterip_net_init,
-	.exit = clusterip_net_exit,
-	.id   = &clusterip_net_id,
-	.size = sizeof(struct clusterip_net),
-};
-
-static struct notifier_block cip_netdev_notifier = {
-	.notifier_call = clusterip_netdev_event
-};
-
-static int __init clusterip_tg_init(void)
-{
-	int ret;
-
-	ret = register_pernet_subsys(&clusterip_net_ops);
-	if (ret < 0)
-		return ret;
-
-	ret = xt_register_target(&clusterip_tg_reg);
-	if (ret < 0)
-		goto cleanup_subsys;
-
-	ret = register_netdevice_notifier(&cip_netdev_notifier);
-	if (ret < 0)
-		goto unregister_target;
-
-	pr_info("ClusterIP Version %s loaded successfully\n",
-		CLUSTERIP_VERSION);
-
-	return 0;
-
-unregister_target:
-	xt_unregister_target(&clusterip_tg_reg);
-cleanup_subsys:
-	unregister_pernet_subsys(&clusterip_net_ops);
-	return ret;
-}
-
-static void __exit clusterip_tg_exit(void)
-{
-	pr_info("ClusterIP Version %s unloading\n", CLUSTERIP_VERSION);
-
-	unregister_netdevice_notifier(&cip_netdev_notifier);
-	xt_unregister_target(&clusterip_tg_reg);
-	unregister_pernet_subsys(&clusterip_net_ops);
-
-	/* Wait for completion of call_rcu()'s (clusterip_config_rcu_free) */
-	rcu_barrier();
-}
-
-module_init(clusterip_tg_init);
-module_exit(clusterip_tg_exit);
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 6/9] netfilter: nf_tables: add static key to skip retpoline workarounds
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (4 preceding siblings ...)
  2023-01-18 12:32 ` [PATCH net-next 5/9] netfilter: ip_tables: remove clusterip target Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 7/9] netfilter: nf_tables: avoid retpoline overhead for objref calls Florian Westphal
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal, Eric Dumazet

If CONFIG_RETPOLINE is enabled nf_tables avoids indirect calls for
builtin expressions.

On newer cpus indirect calls do not go through the retpoline thunk
anymore, even for RETPOLINE=y builds.

Just like with the new tc retpoline wrappers:
Add a static key to skip the if / else if cascade if the cpu
does not require retpolines.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_tables_core.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 709a736c301c..0f26d002d8b3 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -21,6 +21,26 @@
 #include <net/netfilter/nf_log.h>
 #include <net/netfilter/nft_meta.h>
 
+#if defined(CONFIG_RETPOLINE) && defined(CONFIG_X86)
+
+static struct static_key_false nf_tables_skip_direct_calls;
+
+static bool nf_skip_indirect_calls(void)
+{
+	return static_branch_likely(&nf_tables_skip_direct_calls);
+}
+
+static void __init nf_skip_indirect_calls_enable(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_RETPOLINE))
+		static_branch_enable(&nf_tables_skip_direct_calls);
+}
+#else
+static inline bool nf_skip_indirect_calls(void) { return false; }
+
+static inline void nf_skip_indirect_calls_enable(void) { }
+#endif
+
 static noinline void __nft_trace_packet(struct nft_traceinfo *info,
 					const struct nft_chain *chain,
 					enum nft_trace_types type)
@@ -193,7 +213,12 @@ static void expr_call_ops_eval(const struct nft_expr *expr,
 			       struct nft_pktinfo *pkt)
 {
 #ifdef CONFIG_RETPOLINE
-	unsigned long e = (unsigned long)expr->ops->eval;
+	unsigned long e;
+
+	if (nf_skip_indirect_calls())
+		goto indirect_call;
+
+	e = (unsigned long)expr->ops->eval;
 #define X(e, fun) \
 	do { if ((e) == (unsigned long)(fun)) \
 		return fun(expr, regs, pkt); } while (0)
@@ -210,6 +235,7 @@ static void expr_call_ops_eval(const struct nft_expr *expr,
 	X(e, nft_rt_get_eval);
 	X(e, nft_bitwise_eval);
 #undef  X
+indirect_call:
 #endif /* CONFIG_RETPOLINE */
 	expr->ops->eval(expr, regs, pkt);
 }
@@ -369,6 +395,8 @@ int __init nf_tables_core_module_init(void)
 			goto err;
 	}
 
+	nf_skip_indirect_calls_enable();
+
 	return 0;
 
 err:
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 7/9] netfilter: nf_tables: avoid retpoline overhead for objref calls
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (5 preceding siblings ...)
  2023-01-18 12:32 ` [PATCH net-next 6/9] netfilter: nf_tables: add static key to skip retpoline workarounds Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 8/9] netfilter: nf_tables: avoid retpoline overhead for some ct expression calls Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation Florian Westphal
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

objref expression is builtin, so avoid calls to it for
RETOLINE=y builds.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_tables_core.h |  4 ++++
 net/netfilter/nf_tables_core.c         |  2 ++
 net/netfilter/nft_objref.c             | 12 ++++++------
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index 3e825381ac5c..bedef373ec21 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -164,4 +164,8 @@ void nft_payload_inner_eval(const struct nft_expr *expr, struct nft_regs *regs,
 			    const struct nft_pktinfo *pkt,
 			    struct nft_inner_tun_ctx *ctx);
 
+void nft_objref_eval(const struct nft_expr *expr, struct nft_regs *regs,
+		     const struct nft_pktinfo *pkt);
+void nft_objref_map_eval(const struct nft_expr *expr, struct nft_regs *regs,
+			 const struct nft_pktinfo *pkt);
 #endif /* _NET_NF_TABLES_CORE_H */
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index 0f26d002d8b3..d9992906199f 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -234,6 +234,8 @@ static void expr_call_ops_eval(const struct nft_expr *expr,
 	X(e, nft_dynset_eval);
 	X(e, nft_rt_get_eval);
 	X(e, nft_bitwise_eval);
+	X(e, nft_objref_eval);
+	X(e, nft_objref_map_eval);
 #undef  X
 indirect_call:
 #endif /* CONFIG_RETPOLINE */
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index 7b01aa2ef653..cb37169608ba 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -13,9 +13,9 @@
 
 #define nft_objref_priv(expr)	*((struct nft_object **)nft_expr_priv(expr))
 
-static void nft_objref_eval(const struct nft_expr *expr,
-			    struct nft_regs *regs,
-			    const struct nft_pktinfo *pkt)
+void nft_objref_eval(const struct nft_expr *expr,
+		     struct nft_regs *regs,
+		     const struct nft_pktinfo *pkt)
 {
 	struct nft_object *obj = nft_objref_priv(expr);
 
@@ -100,9 +100,9 @@ struct nft_objref_map {
 	struct nft_set_binding	binding;
 };
 
-static void nft_objref_map_eval(const struct nft_expr *expr,
-				struct nft_regs *regs,
-				const struct nft_pktinfo *pkt)
+void nft_objref_map_eval(const struct nft_expr *expr,
+			 struct nft_regs *regs,
+			 const struct nft_pktinfo *pkt)
 {
 	struct nft_objref_map *priv = nft_expr_priv(expr);
 	const struct nft_set *set = priv->set;
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 8/9] netfilter: nf_tables: avoid retpoline overhead for some ct expression calls
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (6 preceding siblings ...)
  2023-01-18 12:32 ` [PATCH net-next 7/9] netfilter: nf_tables: avoid retpoline overhead for objref calls Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-18 12:32 ` [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation Florian Westphal
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Florian Westphal

nft_ct expression cannot be made builtin to nf_tables without also
forcing the conntrack itself to be builtin.

However, this can be avoided by splitting retrieval of a few
selector keys that only need to access the nf_conn structure,
i.e. no function calls to nf_conntrack code.

Many rulesets start with something like
"ct status established,related accept"

With this change, this no longer requires an indirect call, which
gives about 1.8% more throughput with a simple conntrack-enabled
forwarding test (retpoline thunk used).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_tables_core.h | 12 ++++++
 net/netfilter/Makefile                 |  6 +++
 net/netfilter/nf_tables_core.c         |  3 ++
 net/netfilter/nft_ct.c                 | 39 ++++++++++++------
 net/netfilter/nft_ct_fast.c            | 56 ++++++++++++++++++++++++++
 5 files changed, 104 insertions(+), 12 deletions(-)
 create mode 100644 net/netfilter/nft_ct_fast.c

diff --git a/include/net/netfilter/nf_tables_core.h b/include/net/netfilter/nf_tables_core.h
index bedef373ec21..780a5f6ad4a6 100644
--- a/include/net/netfilter/nf_tables_core.h
+++ b/include/net/netfilter/nf_tables_core.h
@@ -61,6 +61,16 @@ struct nft_immediate_expr {
 extern const struct nft_expr_ops nft_cmp_fast_ops;
 extern const struct nft_expr_ops nft_cmp16_fast_ops;
 
+struct nft_ct {
+	enum nft_ct_keys	key:8;
+	enum ip_conntrack_dir	dir:8;
+	u8			len;
+	union {
+		u8		dreg;
+		u8		sreg;
+	};
+};
+
 struct nft_payload {
 	enum nft_payload_bases	base:8;
 	u8			offset;
@@ -140,6 +150,8 @@ void nft_rt_get_eval(const struct nft_expr *expr,
 		     struct nft_regs *regs, const struct nft_pktinfo *pkt);
 void nft_counter_eval(const struct nft_expr *expr, struct nft_regs *regs,
                       const struct nft_pktinfo *pkt);
+void nft_ct_get_fast_eval(const struct nft_expr *expr,
+			  struct nft_regs *regs, const struct nft_pktinfo *pkt);
 
 enum {
 	NFT_PAYLOAD_CTX_INNER_TUN	= (1 << 0),
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 3754eb06fb41..ba2a6b5e93d9 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -98,6 +98,12 @@ nf_tables-objs += nft_set_pipapo_avx2.o
 endif
 endif
 
+ifdef CONFIG_NFT_CT
+ifdef CONFIG_RETPOLINE
+nf_tables-objs += nft_ct_fast.o
+endif
+endif
+
 obj-$(CONFIG_NF_TABLES)		+= nf_tables.o
 obj-$(CONFIG_NFT_COMPAT)	+= nft_compat.o
 obj-$(CONFIG_NFT_CONNLIMIT)	+= nft_connlimit.o
diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index d9992906199f..6ecd0ba2e546 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -228,6 +228,9 @@ static void expr_call_ops_eval(const struct nft_expr *expr,
 	X(e, nft_counter_eval);
 	X(e, nft_meta_get_eval);
 	X(e, nft_lookup_eval);
+#if IS_ENABLED(CONFIG_NFT_CT)
+	X(e, nft_ct_get_fast_eval);
+#endif
 	X(e, nft_range_eval);
 	X(e, nft_immediate_eval);
 	X(e, nft_byteorder_eval);
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index c68e2151defe..b9c84499438b 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -12,7 +12,7 @@
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
-#include <net/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_acct.h>
 #include <net/netfilter/nf_conntrack_tuple.h>
@@ -23,16 +23,6 @@
 #include <net/netfilter/nf_conntrack_l4proto.h>
 #include <net/netfilter/nf_conntrack_expect.h>
 
-struct nft_ct {
-	enum nft_ct_keys	key:8;
-	enum ip_conntrack_dir	dir:8;
-	u8			len;
-	union {
-		u8		dreg;
-		u8		sreg;
-	};
-};
-
 struct nft_ct_helper_obj  {
 	struct nf_conntrack_helper *helper4;
 	struct nf_conntrack_helper *helper6;
@@ -759,6 +749,18 @@ static bool nft_ct_set_reduce(struct nft_regs_track *track,
 	return false;
 }
 
+#ifdef CONFIG_RETPOLINE
+static const struct nft_expr_ops nft_ct_get_fast_ops = {
+	.type		= &nft_ct_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_ct)),
+	.eval		= nft_ct_get_fast_eval,
+	.init		= nft_ct_get_init,
+	.destroy	= nft_ct_get_destroy,
+	.dump		= nft_ct_get_dump,
+	.reduce		= nft_ct_set_reduce,
+};
+#endif
+
 static const struct nft_expr_ops nft_ct_set_ops = {
 	.type		= &nft_ct_type,
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_ct)),
@@ -791,8 +793,21 @@ nft_ct_select_ops(const struct nft_ctx *ctx,
 	if (tb[NFTA_CT_DREG] && tb[NFTA_CT_SREG])
 		return ERR_PTR(-EINVAL);
 
-	if (tb[NFTA_CT_DREG])
+	if (tb[NFTA_CT_DREG]) {
+#ifdef CONFIG_RETPOLINE
+		u32 k = ntohl(nla_get_be32(tb[NFTA_CT_KEY]));
+
+		switch (k) {
+		case NFT_CT_STATE:
+		case NFT_CT_DIRECTION:
+		case NFT_CT_STATUS:
+		case NFT_CT_MARK:
+		case NFT_CT_SECMARK:
+			return &nft_ct_get_fast_ops;
+		}
+#endif
 		return &nft_ct_get_ops;
+	}
 
 	if (tb[NFTA_CT_SREG]) {
 #ifdef CONFIG_NF_CONNTRACK_ZONES
diff --git a/net/netfilter/nft_ct_fast.c b/net/netfilter/nft_ct_fast.c
new file mode 100644
index 000000000000..89983b0613fa
--- /dev/null
+++ b/net/netfilter/nft_ct_fast.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#if IS_ENABLED(CONFIG_NFT_CT)
+#include <linux/netfilter/nf_tables.h>
+#include <net/netfilter/nf_tables_core.h>
+#include <net/netfilter/nf_conntrack.h>
+
+void nft_ct_get_fast_eval(const struct nft_expr *expr,
+			  struct nft_regs *regs,
+			  const struct nft_pktinfo *pkt)
+{
+	const struct nft_ct *priv = nft_expr_priv(expr);
+	u32 *dest = &regs->data[priv->dreg];
+	enum ip_conntrack_info ctinfo;
+	const struct nf_conn *ct;
+	unsigned int state;
+
+	ct = nf_ct_get(pkt->skb, &ctinfo);
+	if (!ct) {
+		regs->verdict.code = NFT_BREAK;
+		return;
+	}
+
+	switch (priv->key) {
+	case NFT_CT_STATE:
+		if (ct)
+			state = NF_CT_STATE_BIT(ctinfo);
+		else if (ctinfo == IP_CT_UNTRACKED)
+			state = NF_CT_STATE_UNTRACKED_BIT;
+		else
+			state = NF_CT_STATE_INVALID_BIT;
+		*dest = state;
+		return;
+	case NFT_CT_DIRECTION:
+		nft_reg_store8(dest, CTINFO2DIR(ctinfo));
+		return;
+	case NFT_CT_STATUS:
+		*dest = ct->status;
+		return;
+#ifdef CONFIG_NF_CONNTRACK_MARK
+	case NFT_CT_MARK:
+		*dest = ct->mark;
+		return;
+#endif
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+	case NFT_CT_SECMARK:
+		*dest = ct->secmark;
+		return;
+#endif
+	default:
+		WARN_ON_ONCE(1);
+		regs->verdict.code = NFT_BREAK;
+		break;
+	}
+}
+EXPORT_SYMBOL_GPL(nft_ct_get_fast_eval);
+#endif
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation
  2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (7 preceding siblings ...)
  2023-01-18 12:32 ` [PATCH net-next 8/9] netfilter: nf_tables: avoid retpoline overhead for some ct expression calls Florian Westphal
@ 2023-01-18 12:32 ` Florian Westphal
  2023-01-19  7:29   ` Vlad Buslov
  8 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2023-01-18 12:32 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Fernando Fernandez Mancera, Pablo Neira Ayuso,
	Florian Westphal

From: Fernando Fernandez Mancera <ffmancera@riseup.net>

Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a
delete operation but ignoring the ENOENT errors.

This is useful for the transaction semantics, where failing to delete an
object which does not exist results in aborting the transaction.

This new command allows the transaction to proceed in case the object
does not exist.

Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/uapi/linux/netfilter/nf_tables.h |  14 +++
 net/netfilter/nf_tables_api.c            | 111 +++++++++++++++++++++--
 2 files changed, 117 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index cfa844da1ce6..ff677f3a6cad 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -98,6 +98,13 @@ enum nft_verdicts {
  * @NFT_MSG_GETFLOWTABLE: get flow table (enum nft_flowtable_attributes)
  * @NFT_MSG_DELFLOWTABLE: delete flow table (enum nft_flowtable_attributes)
  * @NFT_MSG_GETRULE_RESET: get rules and reset stateful expressions (enum nft_obj_attributes)
+ * @NFT_MSG_DESTROYTABLE: destroy a table (enum nft_table_attributes)
+ * @NFT_MSG_DESTROYCHAIN: destroy a chain (enum nft_chain_attributes)
+ * @NFT_MSG_DESTROYRULE: destroy a rule (enum nft_rule_attributes)
+ * @NFT_MSG_DESTROYSET: destroy a set (enum nft_set_attributes)
+ * @NFT_MSG_DESTROYSETELEM: destroy a set element (enum nft_set_elem_attributes)
+ * @NFT_MSG_DESTROYOBJ: destroy a stateful object (enum nft_object_attributes)
+ * @NFT_MSG_DESTROYFLOWTABLE: destroy flow table (enum nft_flowtable_attributes)
  */
 enum nf_tables_msg_types {
 	NFT_MSG_NEWTABLE,
@@ -126,6 +133,13 @@ enum nf_tables_msg_types {
 	NFT_MSG_GETFLOWTABLE,
 	NFT_MSG_DELFLOWTABLE,
 	NFT_MSG_GETRULE_RESET,
+	NFT_MSG_DESTROYTABLE,
+	NFT_MSG_DESTROYCHAIN,
+	NFT_MSG_DESTROYRULE,
+	NFT_MSG_DESTROYSET,
+	NFT_MSG_DESTROYSETELEM,
+	NFT_MSG_DESTROYOBJ,
+	NFT_MSG_DESTROYFLOWTABLE,
 	NFT_MSG_MAX,
 };
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 8c09e4d12ac1..974b95dece1d 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1401,6 +1401,10 @@ static int nf_tables_deltable(struct sk_buff *skb, const struct nfnl_info *info,
 	}
 
 	if (IS_ERR(table)) {
+		if (PTR_ERR(table) == -ENOENT &&
+		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYTABLE)
+			return 0;
+
 		NL_SET_BAD_ATTR(extack, attr);
 		return PTR_ERR(table);
 	}
@@ -2639,6 +2643,10 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info,
 		chain = nft_chain_lookup(net, table, attr, genmask);
 	}
 	if (IS_ERR(chain)) {
+		if (PTR_ERR(chain) == -ENOENT &&
+		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYCHAIN)
+			return 0;
+
 		NL_SET_BAD_ATTR(extack, attr);
 		return PTR_ERR(chain);
 	}
@@ -3716,6 +3724,10 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info,
 		chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN],
 					 genmask);
 		if (IS_ERR(chain)) {
+			if (PTR_ERR(rule) == -ENOENT &&
+			    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYRULE)
+				return 0;
+
 			NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN]);
 			return PTR_ERR(chain);
 		}
@@ -3729,6 +3741,10 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info,
 		if (nla[NFTA_RULE_HANDLE]) {
 			rule = nft_rule_lookup(chain, nla[NFTA_RULE_HANDLE]);
 			if (IS_ERR(rule)) {
+				if (PTR_ERR(rule) == -ENOENT &&
+				    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYRULE)
+					return 0;
+
 				NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_HANDLE]);
 				return PTR_ERR(rule);
 			}
@@ -4808,6 +4824,10 @@ static int nf_tables_delset(struct sk_buff *skb, const struct nfnl_info *info,
 	}
 
 	if (IS_ERR(set)) {
+		if (PTR_ERR(set) == -ENOENT &&
+		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYSET)
+			return 0;
+
 		NL_SET_BAD_ATTR(extack, attr);
 		return PTR_ERR(set);
 	}
@@ -6690,6 +6710,10 @@ static int nf_tables_delsetelem(struct sk_buff *skb,
 
 	nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
 		err = nft_del_setelem(&ctx, set, attr);
+		if (err == -ENOENT &&
+		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYSETELEM)
+			continue;
+
 		if (err < 0) {
 			NL_SET_BAD_ATTR(extack, attr);
 			break;
@@ -7334,6 +7358,10 @@ static int nf_tables_delobj(struct sk_buff *skb, const struct nfnl_info *info,
 	}
 
 	if (IS_ERR(obj)) {
+		if (PTR_ERR(obj) == -ENOENT &&
+		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYOBJ)
+			return 0;
+
 		NL_SET_BAD_ATTR(extack, attr);
 		return PTR_ERR(obj);
 	}
@@ -7964,6 +7992,10 @@ static int nf_tables_delflowtable(struct sk_buff *skb,
 	}
 
 	if (IS_ERR(flowtable)) {
+		if (PTR_ERR(flowtable) == -ENOENT &&
+		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYFLOWTABLE)
+			return 0;
+
 		NL_SET_BAD_ATTR(extack, attr);
 		return PTR_ERR(flowtable);
 	}
@@ -8373,6 +8405,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_TABLE_MAX,
 		.policy		= nft_table_policy,
 	},
+	[NFT_MSG_DESTROYTABLE] = {
+		.call		= nf_tables_deltable,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_TABLE_MAX,
+		.policy		= nft_table_policy,
+	},
 	[NFT_MSG_NEWCHAIN] = {
 		.call		= nf_tables_newchain,
 		.type		= NFNL_CB_BATCH,
@@ -8391,6 +8429,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_CHAIN_MAX,
 		.policy		= nft_chain_policy,
 	},
+	[NFT_MSG_DESTROYCHAIN] = {
+		.call		= nf_tables_delchain,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_CHAIN_MAX,
+		.policy		= nft_chain_policy,
+	},
 	[NFT_MSG_NEWRULE] = {
 		.call		= nf_tables_newrule,
 		.type		= NFNL_CB_BATCH,
@@ -8415,6 +8459,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_RULE_MAX,
 		.policy		= nft_rule_policy,
 	},
+	[NFT_MSG_DESTROYRULE] = {
+		.call		= nf_tables_delrule,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_RULE_MAX,
+		.policy		= nft_rule_policy,
+	},
 	[NFT_MSG_NEWSET] = {
 		.call		= nf_tables_newset,
 		.type		= NFNL_CB_BATCH,
@@ -8433,6 +8483,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_SET_MAX,
 		.policy		= nft_set_policy,
 	},
+	[NFT_MSG_DESTROYSET] = {
+		.call		= nf_tables_delset,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_SET_MAX,
+		.policy		= nft_set_policy,
+	},
 	[NFT_MSG_NEWSETELEM] = {
 		.call		= nf_tables_newsetelem,
 		.type		= NFNL_CB_BATCH,
@@ -8451,6 +8507,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_SET_ELEM_LIST_MAX,
 		.policy		= nft_set_elem_list_policy,
 	},
+	[NFT_MSG_DESTROYSETELEM] = {
+		.call		= nf_tables_delsetelem,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_SET_ELEM_LIST_MAX,
+		.policy		= nft_set_elem_list_policy,
+	},
 	[NFT_MSG_GETGEN] = {
 		.call		= nf_tables_getgen,
 		.type		= NFNL_CB_RCU,
@@ -8473,6 +8535,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_OBJ_MAX,
 		.policy		= nft_obj_policy,
 	},
+	[NFT_MSG_DESTROYOBJ] = {
+		.call		= nf_tables_delobj,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_OBJ_MAX,
+		.policy		= nft_obj_policy,
+	},
 	[NFT_MSG_GETOBJ_RESET] = {
 		.call		= nf_tables_getobj,
 		.type		= NFNL_CB_RCU,
@@ -8497,6 +8565,12 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = {
 		.attr_count	= NFTA_FLOWTABLE_MAX,
 		.policy		= nft_flowtable_policy,
 	},
+	[NFT_MSG_DESTROYFLOWTABLE] = {
+		.call		= nf_tables_delflowtable,
+		.type		= NFNL_CB_BATCH,
+		.attr_count	= NFTA_FLOWTABLE_MAX,
+		.policy		= nft_flowtable_policy,
+	},
 };
 
 static int nf_tables_validate(struct net *net)
@@ -8590,6 +8664,7 @@ static void nft_commit_release(struct nft_trans *trans)
 {
 	switch (trans->msg_type) {
 	case NFT_MSG_DELTABLE:
+	case NFT_MSG_DESTROYTABLE:
 		nf_tables_table_destroy(&trans->ctx);
 		break;
 	case NFT_MSG_NEWCHAIN:
@@ -8597,23 +8672,29 @@ static void nft_commit_release(struct nft_trans *trans)
 		kfree(nft_trans_chain_name(trans));
 		break;
 	case NFT_MSG_DELCHAIN:
+	case NFT_MSG_DESTROYCHAIN:
 		nf_tables_chain_destroy(&trans->ctx);
 		break;
 	case NFT_MSG_DELRULE:
+	case NFT_MSG_DESTROYRULE:
 		nf_tables_rule_destroy(&trans->ctx, nft_trans_rule(trans));
 		break;
 	case NFT_MSG_DELSET:
+	case NFT_MSG_DESTROYSET:
 		nft_set_destroy(&trans->ctx, nft_trans_set(trans));
 		break;
 	case NFT_MSG_DELSETELEM:
+	case NFT_MSG_DESTROYSETELEM:
 		nf_tables_set_elem_destroy(&trans->ctx,
 					   nft_trans_elem_set(trans),
 					   nft_trans_elem(trans).priv);
 		break;
 	case NFT_MSG_DELOBJ:
+	case NFT_MSG_DESTROYOBJ:
 		nft_obj_destroy(&trans->ctx, nft_trans_obj(trans));
 		break;
 	case NFT_MSG_DELFLOWTABLE:
+	case NFT_MSG_DESTROYFLOWTABLE:
 		if (nft_trans_flowtable_update(trans))
 			nft_flowtable_hooks_destroy(&nft_trans_flowtable_hooks(trans));
 		else
@@ -9065,8 +9146,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_DELTABLE:
+		case NFT_MSG_DESTROYTABLE:
 			list_del_rcu(&trans->ctx.table->list);
-			nf_tables_table_notify(&trans->ctx, NFT_MSG_DELTABLE);
+			nf_tables_table_notify(&trans->ctx, trans->msg_type);
 			break;
 		case NFT_MSG_NEWCHAIN:
 			if (nft_trans_chain_update(trans)) {
@@ -9081,8 +9163,9 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			}
 			break;
 		case NFT_MSG_DELCHAIN:
+		case NFT_MSG_DESTROYCHAIN:
 			nft_chain_del(trans->ctx.chain);
-			nf_tables_chain_notify(&trans->ctx, NFT_MSG_DELCHAIN);
+			nf_tables_chain_notify(&trans->ctx, trans->msg_type);
 			nf_tables_unregister_hook(trans->ctx.net,
 						  trans->ctx.table,
 						  trans->ctx.chain);
@@ -9098,10 +9181,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_DELRULE:
+		case NFT_MSG_DESTROYRULE:
 			list_del_rcu(&nft_trans_rule(trans)->list);
 			nf_tables_rule_notify(&trans->ctx,
 					      nft_trans_rule(trans),
-					      NFT_MSG_DELRULE);
+					      trans->msg_type);
 			nft_rule_expr_deactivate(&trans->ctx,
 						 nft_trans_rule(trans),
 						 NFT_TRANS_COMMIT);
@@ -9129,9 +9213,10 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_DELSET:
+		case NFT_MSG_DESTROYSET:
 			list_del_rcu(&nft_trans_set(trans)->list);
 			nf_tables_set_notify(&trans->ctx, nft_trans_set(trans),
-					     NFT_MSG_DELSET, GFP_KERNEL);
+					     trans->msg_type, GFP_KERNEL);
 			break;
 		case NFT_MSG_NEWSETELEM:
 			te = (struct nft_trans_elem *)trans->data;
@@ -9143,11 +9228,12 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_DELSETELEM:
+		case NFT_MSG_DESTROYSETELEM:
 			te = (struct nft_trans_elem *)trans->data;
 
 			nf_tables_setelem_notify(&trans->ctx, te->set,
 						 &te->elem,
-						 NFT_MSG_DELSETELEM);
+						 trans->msg_type);
 			nft_setelem_remove(net, te->set, &te->elem);
 			if (!nft_setelem_is_catchall(te->set, &te->elem)) {
 				atomic_dec(&te->set->nelems);
@@ -9169,9 +9255,10 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			}
 			break;
 		case NFT_MSG_DELOBJ:
+		case NFT_MSG_DESTROYOBJ:
 			nft_obj_del(nft_trans_obj(trans));
 			nf_tables_obj_notify(&trans->ctx, nft_trans_obj(trans),
-					     NFT_MSG_DELOBJ);
+					     trans->msg_type);
 			break;
 		case NFT_MSG_NEWFLOWTABLE:
 			if (nft_trans_flowtable_update(trans)) {
@@ -9193,11 +9280,12 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_DELFLOWTABLE:
+		case NFT_MSG_DESTROYFLOWTABLE:
 			if (nft_trans_flowtable_update(trans)) {
 				nf_tables_flowtable_notify(&trans->ctx,
 							   nft_trans_flowtable(trans),
 							   &nft_trans_flowtable_hooks(trans),
-							   NFT_MSG_DELFLOWTABLE);
+							   trans->msg_type);
 				nft_unregister_flowtable_net_hooks(net,
 								   &nft_trans_flowtable_hooks(trans));
 			} else {
@@ -9205,7 +9293,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 				nf_tables_flowtable_notify(&trans->ctx,
 							   nft_trans_flowtable(trans),
 							   &nft_trans_flowtable(trans)->hook_list,
-							   NFT_MSG_DELFLOWTABLE);
+							   trans->msg_type);
 				nft_unregister_flowtable_net_hooks(net,
 						&nft_trans_flowtable(trans)->hook_list);
 			}
@@ -9301,6 +9389,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			}
 			break;
 		case NFT_MSG_DELTABLE:
+		case NFT_MSG_DESTROYTABLE:
 			nft_clear(trans->ctx.net, trans->ctx.table);
 			nft_trans_destroy(trans);
 			break;
@@ -9322,6 +9411,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			}
 			break;
 		case NFT_MSG_DELCHAIN:
+		case NFT_MSG_DESTROYCHAIN:
 			trans->ctx.table->use++;
 			nft_clear(trans->ctx.net, trans->ctx.chain);
 			nft_trans_destroy(trans);
@@ -9336,6 +9426,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 				nft_flow_rule_destroy(nft_trans_flow_rule(trans));
 			break;
 		case NFT_MSG_DELRULE:
+		case NFT_MSG_DESTROYRULE:
 			trans->ctx.chain->use++;
 			nft_clear(trans->ctx.net, nft_trans_rule(trans));
 			nft_rule_expr_activate(&trans->ctx, nft_trans_rule(trans));
@@ -9357,6 +9448,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			list_del_rcu(&nft_trans_set(trans)->list);
 			break;
 		case NFT_MSG_DELSET:
+		case NFT_MSG_DESTROYSET:
 			trans->ctx.table->use++;
 			nft_clear(trans->ctx.net, nft_trans_set(trans));
 			nft_trans_destroy(trans);
@@ -9372,6 +9464,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 				atomic_dec(&te->set->nelems);
 			break;
 		case NFT_MSG_DELSETELEM:
+		case NFT_MSG_DESTROYSETELEM:
 			te = (struct nft_trans_elem *)trans->data;
 
 			nft_setelem_data_activate(net, te->set, &te->elem);
@@ -9391,6 +9484,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			}
 			break;
 		case NFT_MSG_DELOBJ:
+		case NFT_MSG_DESTROYOBJ:
 			trans->ctx.table->use++;
 			nft_clear(trans->ctx.net, nft_trans_obj(trans));
 			nft_trans_destroy(trans);
@@ -9407,6 +9501,7 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action)
 			}
 			break;
 		case NFT_MSG_DELFLOWTABLE:
+		case NFT_MSG_DESTROYFLOWTABLE:
 			if (nft_trans_flowtable_update(trans)) {
 				list_splice(&nft_trans_flowtable_hooks(trans),
 					    &nft_trans_flowtable(trans)->hook_list);
-- 
2.38.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets
  2023-01-18 12:32 ` [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets Florian Westphal
@ 2023-01-18 13:30   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-01-18 13:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, kuba, edumazet, pabeni, davem, netfilter-devel

Hello:

This series was applied to netdev/net-next.git (master)
by Florian Westphal <fw@strlen.de>:

On Wed, 18 Jan 2023 13:32:00 +0100 you wrote:
> The conntrack logging facilities include useful info such as in/out
> interface names and packet headers.
> 
> Use those in more places instead of pr_debug calls.
> Furthermore, several pr_debug calls can be removed, they are useless
> on production machines due to the sheer volume of log messages.
> 
> [...]

Here is the summary with links:
  - [net-next,1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets
    https://git.kernel.org/netdev/net-next/c/f71cb8f45d09
  - [net-next,2/9] netfilter: conntrack: remove pr_debug calls
    https://git.kernel.org/netdev/net-next/c/50bfbb8957ab
  - [net-next,3/9] netfilter: conntrack: avoid reload of ct->status
    https://git.kernel.org/netdev/net-next/c/4883ec512c17
  - [net-next,4/9] netfilter: conntrack: move rcu read lock to nf_conntrack_find_get
    https://git.kernel.org/netdev/net-next/c/2a2fa2efc65f
  - [net-next,5/9] netfilter: ip_tables: remove clusterip target
    https://git.kernel.org/netdev/net-next/c/9db5d918e2c0
  - [net-next,6/9] netfilter: nf_tables: add static key to skip retpoline workarounds
    https://git.kernel.org/netdev/net-next/c/d8d760627855
  - [net-next,7/9] netfilter: nf_tables: avoid retpoline overhead for objref calls
    https://git.kernel.org/netdev/net-next/c/2032e907d8d4
  - [net-next,8/9] netfilter: nf_tables: avoid retpoline overhead for some ct expression calls
    https://git.kernel.org/netdev/net-next/c/d9e789147605
  - [net-next,9/9] netfilter: nf_tables: add support to destroy operation
    https://git.kernel.org/netdev/net-next/c/f80a612dd77c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation
  2023-01-18 12:32 ` [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation Florian Westphal
@ 2023-01-19  7:29   ` Vlad Buslov
  2023-01-20  9:58     ` Fernando F. Mancera
  0 siblings, 1 reply; 15+ messages in thread
From: Vlad Buslov @ 2023-01-19  7:29 UTC (permalink / raw)
  To: Florian Westphal, Fernando Fernandez Mancera
  Cc: netdev, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	David S. Miller, netfilter-devel, Pablo Neira Ayuso,
	Maor Dickman

On Wed 18 Jan 2023 at 13:32, Florian Westphal <fw@strlen.de> wrote:
> From: Fernando Fernandez Mancera <ffmancera@riseup.net>
>
> Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a
> delete operation but ignoring the ENOENT errors.
>
> This is useful for the transaction semantics, where failing to delete an
> object which does not exist results in aborting the transaction.
>
> This new command allows the transaction to proceed in case the object
> does not exist.
>
> Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  include/uapi/linux/netfilter/nf_tables.h |  14 +++
>  net/netfilter/nf_tables_api.c            | 111 +++++++++++++++++++++--
>  2 files changed, 117 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
> index cfa844da1ce6..ff677f3a6cad 100644
> --- a/include/uapi/linux/netfilter/nf_tables.h
> +++ b/include/uapi/linux/netfilter/nf_tables.h
> @@ -98,6 +98,13 @@ enum nft_verdicts {
>   * @NFT_MSG_GETFLOWTABLE: get flow table (enum nft_flowtable_attributes)
>   * @NFT_MSG_DELFLOWTABLE: delete flow table (enum nft_flowtable_attributes)
>   * @NFT_MSG_GETRULE_RESET: get rules and reset stateful expressions (enum nft_obj_attributes)
> + * @NFT_MSG_DESTROYTABLE: destroy a table (enum nft_table_attributes)
> + * @NFT_MSG_DESTROYCHAIN: destroy a chain (enum nft_chain_attributes)
> + * @NFT_MSG_DESTROYRULE: destroy a rule (enum nft_rule_attributes)
> + * @NFT_MSG_DESTROYSET: destroy a set (enum nft_set_attributes)
> + * @NFT_MSG_DESTROYSETELEM: destroy a set element (enum nft_set_elem_attributes)
> + * @NFT_MSG_DESTROYOBJ: destroy a stateful object (enum nft_object_attributes)
> + * @NFT_MSG_DESTROYFLOWTABLE: destroy flow table (enum nft_flowtable_attributes)
>   */
>  enum nf_tables_msg_types {
>  	NFT_MSG_NEWTABLE,
> @@ -126,6 +133,13 @@ enum nf_tables_msg_types {
>  	NFT_MSG_GETFLOWTABLE,
>  	NFT_MSG_DELFLOWTABLE,
>  	NFT_MSG_GETRULE_RESET,
> +	NFT_MSG_DESTROYTABLE,
> +	NFT_MSG_DESTROYCHAIN,
> +	NFT_MSG_DESTROYRULE,
> +	NFT_MSG_DESTROYSET,
> +	NFT_MSG_DESTROYSETELEM,
> +	NFT_MSG_DESTROYOBJ,
> +	NFT_MSG_DESTROYFLOWTABLE,
>  	NFT_MSG_MAX,
>  };
>  
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 8c09e4d12ac1..974b95dece1d 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -1401,6 +1401,10 @@ static int nf_tables_deltable(struct sk_buff *skb, const struct nfnl_info *info,
>  	}
>  
>  	if (IS_ERR(table)) {
> +		if (PTR_ERR(table) == -ENOENT &&
> +		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYTABLE)
> +			return 0;
> +
>  		NL_SET_BAD_ATTR(extack, attr);
>  		return PTR_ERR(table);
>  	}
> @@ -2639,6 +2643,10 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info,
>  		chain = nft_chain_lookup(net, table, attr, genmask);
>  	}
>  	if (IS_ERR(chain)) {
> +		if (PTR_ERR(chain) == -ENOENT &&
> +		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYCHAIN)
> +			return 0;
> +
>  		NL_SET_BAD_ATTR(extack, attr);
>  		return PTR_ERR(chain);
>  	}
> @@ -3716,6 +3724,10 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info,
>  		chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN],
>  					 genmask);
>  		if (IS_ERR(chain)) {
> +			if (PTR_ERR(rule) == -ENOENT &&

Coverity complains that at this point rule is not initialized yet, which
looks like to be the case to me.

[...]


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation
  2023-01-19  7:29   ` Vlad Buslov
@ 2023-01-20  9:58     ` Fernando F. Mancera
  2023-01-20 10:06       ` Fernando F. Mancera
  0 siblings, 1 reply; 15+ messages in thread
From: Fernando F. Mancera @ 2023-01-20  9:58 UTC (permalink / raw)
  To: Vlad Buslov, Florian Westphal
  Cc: netdev, Jakub Kicinski, Eric Dumazet, Paolo Abeni,
	David S. Miller, netfilter-devel, Pablo Neira Ayuso,
	Maor Dickman

Hi Vlad,

On 19/01/2023 08:29, Vlad Buslov wrote:
> On Wed 18 Jan 2023 at 13:32, Florian Westphal <fw@strlen.de> wrote:
>> From: Fernando Fernandez Mancera <ffmancera@riseup.net>
>>
>> Introduce NFT_MSG_DESTROY* message type. The destroy operation performs a
>> delete operation but ignoring the ENOENT errors.
>>
>> This is useful for the transaction semantics, where failing to delete an
>> object which does not exist results in aborting the transaction.
>>
>> This new command allows the transaction to proceed in case the object
>> does not exist.
>>
>> Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> ---
>>   include/uapi/linux/netfilter/nf_tables.h |  14 +++
>>   net/netfilter/nf_tables_api.c            | 111 +++++++++++++++++++++--
>>   2 files changed, 117 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
>> index cfa844da1ce6..ff677f3a6cad 100644
>> --- a/include/uapi/linux/netfilter/nf_tables.h
>> +++ b/include/uapi/linux/netfilter/nf_tables.h
>> @@ -98,6 +98,13 @@ enum nft_verdicts {
>>    * @NFT_MSG_GETFLOWTABLE: get flow table (enum nft_flowtable_attributes)
>>    * @NFT_MSG_DELFLOWTABLE: delete flow table (enum nft_flowtable_attributes)
>>    * @NFT_MSG_GETRULE_RESET: get rules and reset stateful expressions (enum nft_obj_attributes)
>> + * @NFT_MSG_DESTROYTABLE: destroy a table (enum nft_table_attributes)
>> + * @NFT_MSG_DESTROYCHAIN: destroy a chain (enum nft_chain_attributes)
>> + * @NFT_MSG_DESTROYRULE: destroy a rule (enum nft_rule_attributes)
>> + * @NFT_MSG_DESTROYSET: destroy a set (enum nft_set_attributes)
>> + * @NFT_MSG_DESTROYSETELEM: destroy a set element (enum nft_set_elem_attributes)
>> + * @NFT_MSG_DESTROYOBJ: destroy a stateful object (enum nft_object_attributes)
>> + * @NFT_MSG_DESTROYFLOWTABLE: destroy flow table (enum nft_flowtable_attributes)
>>    */
>>   enum nf_tables_msg_types {
>>   	NFT_MSG_NEWTABLE,
>> @@ -126,6 +133,13 @@ enum nf_tables_msg_types {
>>   	NFT_MSG_GETFLOWTABLE,
>>   	NFT_MSG_DELFLOWTABLE,
>>   	NFT_MSG_GETRULE_RESET,
>> +	NFT_MSG_DESTROYTABLE,
>> +	NFT_MSG_DESTROYCHAIN,
>> +	NFT_MSG_DESTROYRULE,
>> +	NFT_MSG_DESTROYSET,
>> +	NFT_MSG_DESTROYSETELEM,
>> +	NFT_MSG_DESTROYOBJ,
>> +	NFT_MSG_DESTROYFLOWTABLE,
>>   	NFT_MSG_MAX,
>>   };
>>   
>> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
>> index 8c09e4d12ac1..974b95dece1d 100644
>> --- a/net/netfilter/nf_tables_api.c
>> +++ b/net/netfilter/nf_tables_api.c
>> @@ -1401,6 +1401,10 @@ static int nf_tables_deltable(struct sk_buff *skb, const struct nfnl_info *info,
>>   	}
>>   
>>   	if (IS_ERR(table)) {
>> +		if (PTR_ERR(table) == -ENOENT &&
>> +		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYTABLE)
>> +			return 0;
>> +
>>   		NL_SET_BAD_ATTR(extack, attr);
>>   		return PTR_ERR(table);
>>   	}
>> @@ -2639,6 +2643,10 @@ static int nf_tables_delchain(struct sk_buff *skb, const struct nfnl_info *info,
>>   		chain = nft_chain_lookup(net, table, attr, genmask);
>>   	}
>>   	if (IS_ERR(chain)) {
>> +		if (PTR_ERR(chain) == -ENOENT &&
>> +		    NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_DESTROYCHAIN)
>> +			return 0;
>> +
>>   		NL_SET_BAD_ATTR(extack, attr);
>>   		return PTR_ERR(chain);
>>   	}
>> @@ -3716,6 +3724,10 @@ static int nf_tables_delrule(struct sk_buff *skb, const struct nfnl_info *info,
>>   		chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN],
>>   					 genmask);
>>   		if (IS_ERR(chain)) {
>> +			if (PTR_ERR(rule) == -ENOENT &&
> 
> Coverity complains that at this point rule is not initialized yet, which
> looks like to be the case to me.
> 

Thanks, I am sending a patch fixing this.

> [...]
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation
  2023-01-20  9:58     ` Fernando F. Mancera
@ 2023-01-20 10:06       ` Fernando F. Mancera
  0 siblings, 0 replies; 15+ messages in thread
From: Fernando F. Mancera @ 2023-01-20 10:06 UTC (permalink / raw)
  To: Vlad Buslov, Florian Westphal
  Cc: netdev, netfilter-devel, Pablo Neira Ayuso, Maor Dickman, yangyingliang

On 20/01/2023 10:58, Fernando F. Mancera wrote:
> Hi Vlad,
> 
> On 19/01/2023 08:29, Vlad Buslov wrote:
>> On Wed 18 Jan 2023 at 13:32, Florian Westphal <fw@strlen.de> wrote:
>>> From: Fernando Fernandez Mancera <ffmancera@riseup.net>
>>>
>>> Introduce NFT_MSG_DESTROY* message type. The destroy operation 
>>> performs a
>>> delete operation but ignoring the ENOENT errors.
>>>
>>> This is useful for the transaction semantics, where failing to delete an
>>> object which does not exist results in aborting the transaction.
>>>
>>> This new command allows the transaction to proceed in case the object
>>> does not exist.
>>>
>>> Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>> ---
>>>   include/uapi/linux/netfilter/nf_tables.h |  14 +++
>>>   net/netfilter/nf_tables_api.c            | 111 +++++++++++++++++++++--
>>>   2 files changed, 117 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/netfilter/nf_tables.h 
>>> b/include/uapi/linux/netfilter/nf_tables.h
>>> index cfa844da1ce6..ff677f3a6cad 100644
>>> --- a/include/uapi/linux/netfilter/nf_tables.h
>>> +++ b/include/uapi/linux/netfilter/nf_tables.h
>>> @@ -98,6 +98,13 @@ enum nft_verdicts {
>>>    * @NFT_MSG_GETFLOWTABLE: get flow table (enum 
>>> nft_flowtable_attributes)
>>>    * @NFT_MSG_DELFLOWTABLE: delete flow table (enum 
>>> nft_flowtable_attributes)
>>>    * @NFT_MSG_GETRULE_RESET: get rules and reset stateful expressions 
>>> (enum nft_obj_attributes)
>>> + * @NFT_MSG_DESTROYTABLE: destroy a table (enum nft_table_attributes)
>>> + * @NFT_MSG_DESTROYCHAIN: destroy a chain (enum nft_chain_attributes)
>>> + * @NFT_MSG_DESTROYRULE: destroy a rule (enum nft_rule_attributes)
>>> + * @NFT_MSG_DESTROYSET: destroy a set (enum nft_set_attributes)
>>> + * @NFT_MSG_DESTROYSETELEM: destroy a set element (enum 
>>> nft_set_elem_attributes)
>>> + * @NFT_MSG_DESTROYOBJ: destroy a stateful object (enum 
>>> nft_object_attributes)
>>> + * @NFT_MSG_DESTROYFLOWTABLE: destroy flow table (enum 
>>> nft_flowtable_attributes)
>>>    */
>>>   enum nf_tables_msg_types {
>>>       NFT_MSG_NEWTABLE,
>>> @@ -126,6 +133,13 @@ enum nf_tables_msg_types {
>>>       NFT_MSG_GETFLOWTABLE,
>>>       NFT_MSG_DELFLOWTABLE,
>>>       NFT_MSG_GETRULE_RESET,
>>> +    NFT_MSG_DESTROYTABLE,
>>> +    NFT_MSG_DESTROYCHAIN,
>>> +    NFT_MSG_DESTROYRULE,
>>> +    NFT_MSG_DESTROYSET,
>>> +    NFT_MSG_DESTROYSETELEM,
>>> +    NFT_MSG_DESTROYOBJ,
>>> +    NFT_MSG_DESTROYFLOWTABLE,
>>>       NFT_MSG_MAX,
>>>   };
>>> diff --git a/net/netfilter/nf_tables_api.c 
>>> b/net/netfilter/nf_tables_api.c
>>> index 8c09e4d12ac1..974b95dece1d 100644
>>> --- a/net/netfilter/nf_tables_api.c
>>> +++ b/net/netfilter/nf_tables_api.c
>>> @@ -1401,6 +1401,10 @@ static int nf_tables_deltable(struct sk_buff 
>>> *skb, const struct nfnl_info *info,
>>>       }
>>>       if (IS_ERR(table)) {
>>> +        if (PTR_ERR(table) == -ENOENT &&
>>> +            NFNL_MSG_TYPE(info->nlh->nlmsg_type) == 
>>> NFT_MSG_DESTROYTABLE)
>>> +            return 0;
>>> +
>>>           NL_SET_BAD_ATTR(extack, attr);
>>>           return PTR_ERR(table);
>>>       }
>>> @@ -2639,6 +2643,10 @@ static int nf_tables_delchain(struct sk_buff 
>>> *skb, const struct nfnl_info *info,
>>>           chain = nft_chain_lookup(net, table, attr, genmask);
>>>       }
>>>       if (IS_ERR(chain)) {
>>> +        if (PTR_ERR(chain) == -ENOENT &&
>>> +            NFNL_MSG_TYPE(info->nlh->nlmsg_type) == 
>>> NFT_MSG_DESTROYCHAIN)
>>> +            return 0;
>>> +
>>>           NL_SET_BAD_ATTR(extack, attr);
>>>           return PTR_ERR(chain);
>>>       }
>>> @@ -3716,6 +3724,10 @@ static int nf_tables_delrule(struct sk_buff 
>>> *skb, const struct nfnl_info *info,
>>>           chain = nft_chain_lookup(net, table, nla[NFTA_RULE_CHAIN],
>>>                        genmask);
>>>           if (IS_ERR(chain)) {
>>> +            if (PTR_ERR(rule) == -ENOENT &&
>>
>> Coverity complains that at this point rule is not initialized yet, which
>> looks like to be the case to me.
>>
> 
> Thanks, I am sending a patch fixing this.
> 
>> [...]
>>

There is already a patch sent by Yang Yingliang, 
https://lore.kernel.org/netfilter-devel/20230119075125.3598627-1-yangyingliang@huawei.com/T/#u

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next 3/9] netfilter: conntrack: avoid reload of ct->status
  2023-01-18 12:32 ` [PATCH net-next 3/9] netfilter: conntrack: avoid reload of ct->status Florian Westphal
@ 2023-01-23 11:38   ` Roi Dayan
  0 siblings, 0 replies; 15+ messages in thread
From: Roi Dayan @ 2023-01-23 11:38 UTC (permalink / raw)
  To: Florian Westphal, netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Maor Dickman



On 18/01/2023 14:32, Florian Westphal wrote:
> Compiler can't merge the two test_bit() calls, so load ct->status
> once and use non-atomic accesses.
> 
> This is fine because IPS_EXPECTED or NAT_CLASH are either set at ct
> creation time or not at all, but compiler can't know that.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nf_conntrack_core.c      |  9 +++++----
>  net/netfilter/nf_conntrack_proto_udp.c | 10 ++++++----
>  2 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 81ece117033a..9e12cade4e0f 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1854,14 +1854,15 @@ resolve_normal_ct(struct nf_conn *tmpl,
>  	if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY) {
>  		ctinfo = IP_CT_ESTABLISHED_REPLY;
>  	} else {
> +		unsigned long status = READ_ONCE(ct->status);
> +
>  		/* Once we've had two way comms, always ESTABLISHED. */
> -		if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> +		if (likely(status & IPS_SEEN_REPLY))
>  			ctinfo = IP_CT_ESTABLISHED;
> -		} else if (test_bit(IPS_EXPECTED_BIT, &ct->status)) {
> +		else if (status & IPS_EXPECTED)
>  			ctinfo = IP_CT_RELATED;
> -		} else {
> +		else
>  			ctinfo = IP_CT_NEW;
> -		}
>  	}
>  	nf_ct_set(skb, ct, ctinfo);
>  	return 0;
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> index 3b516cffc779..6b9206635b24 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -88,6 +88,7 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
>  			    const struct nf_hook_state *state)
>  {
>  	unsigned int *timeouts;
> +	unsigned long status;
>  
>  	if (udp_error(skb, dataoff, state))
>  		return -NF_ACCEPT;
> @@ -96,26 +97,27 @@ int nf_conntrack_udp_packet(struct nf_conn *ct,
>  	if (!timeouts)
>  		timeouts = udp_get_timeouts(nf_ct_net(ct));
>  
> -	if (!nf_ct_is_confirmed(ct))
> +	status = READ_ONCE(ct->status);
> +	if ((status & IPS_CONFIRMED) == 0)
>  		ct->proto.udp.stream_ts = 2 * HZ + jiffies;
>  
>  	/* If we've seen traffic both ways, this is some kind of UDP
>  	 * stream. Set Assured.
>  	 */
> -	if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
> +	if (status & IPS_SEEN_REPLY_BIT) {

Hi,

This change has a bug not matching reply status anymore.
Since you don't use test_bit() you should use IPS_SEEN_REPLY
instead of IPS_SEEN_REPLY_BIT.

Thanks,
Roi

>  		unsigned long extra = timeouts[UDP_CT_UNREPLIED];
>  		bool stream = false;
>  
>  		/* Still active after two seconds? Extend timeout. */
>  		if (time_after(jiffies, ct->proto.udp.stream_ts)) {
>  			extra = timeouts[UDP_CT_REPLIED];
> -			stream = true;
> +			stream = (status & IPS_ASSURED) == 0;
>  		}
>  
>  		nf_ct_refresh_acct(ct, ctinfo, skb, extra);
>  
>  		/* never set ASSURED for IPS_NAT_CLASH, they time out soon */
> -		if (unlikely((ct->status & IPS_NAT_CLASH)))
> +		if (unlikely((status & IPS_NAT_CLASH)))
>  			return NF_ACCEPT;
>  
>  		/* Also, more likely to be important, and not a probe */

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2023-01-23 11:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-18 12:31 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 1/9] netfilter: conntrack: sctp: use nf log infrastructure for invalid packets Florian Westphal
2023-01-18 13:30   ` patchwork-bot+netdevbpf
2023-01-18 12:32 ` [PATCH net-next 2/9] netfilter: conntrack: remove pr_debug calls Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 3/9] netfilter: conntrack: avoid reload of ct->status Florian Westphal
2023-01-23 11:38   ` Roi Dayan
2023-01-18 12:32 ` [PATCH net-next 4/9] netfilter: conntrack: move rcu read lock to nf_conntrack_find_get Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 5/9] netfilter: ip_tables: remove clusterip target Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 6/9] netfilter: nf_tables: add static key to skip retpoline workarounds Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 7/9] netfilter: nf_tables: avoid retpoline overhead for objref calls Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 8/9] netfilter: nf_tables: avoid retpoline overhead for some ct expression calls Florian Westphal
2023-01-18 12:32 ` [PATCH net-next 9/9] netfilter: nf_tables: add support to destroy operation Florian Westphal
2023-01-19  7:29   ` Vlad Buslov
2023-01-20  9:58     ` Fernando F. Mancera
2023-01-20 10:06       ` Fernando F. Mancera

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).