netdev.vger.kernel.org archive mirror
* [PATCH net-next 0/9] Netfilter updates for net-next
@ 2023-05-18 10:07 Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 1/9] netfilter: nf_tables: relax set/map validation checks Florian Westphal
                   ` (8 more replies)
  0 siblings, 9 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel

Hello,

[ sorry if you get this twice, wrong mail aliases in v1 ]

this PR contains updates for your *net-next* tree.

nftables updates:

1. Allow key existence checks with maps.
   At the moment the kernel requires userspace to pass a destination
   register for the associated value.  Make this optional so userspace
   can query whether the key exists, just like with normal sets.

2. nftables maintains a counter per set that holds the number of
   elements.  This counter gets decremented on element removal,
   but it is only incremented if the set has a maximum size.
   Increment unconditionally; this will allow us to update the
   maximum size later on.

3. Add DCCP option matching, from Jeremy Sowden.

4. Use the struct_size() macro, from Christophe JAILLET.

Conntrack:

5. Squash holes in struct nf_conntrack_expect, also Christophe JAILLET.

6. Allow clash resolution for GRE Protocol to avoid a packet drop,
   from Faicker Mo.

Flowtable:

Simplify route logic and split large functions into smaller
chunks, from Pablo Neira Ayuso.

The following changes since commit b50a8b0d57ab1ef11492171e98a030f48682eac3:

  net: openvswitch: Use struct_size() (2023-05-17 21:25:46 -0700)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next.git tags/nf-next-2023-05-18

for you to fetch changes up to e05b5362166b18a224c30502e81416e4d622d3e4:

  netfilter: flowtable: split IPv6 datapath in helper functions (2023-05-18 08:48:55 +0200)

----------------------------------------------------------------
Christophe JAILLET (2):
      netfilter: Reorder fields in 'struct nf_conntrack_expect'
      netfilter: nft_set_pipapo: Use struct_size()

Faicker Mo (1):
      netfilter: conntrack: allow insertion clash of gre protocol

Florian Westphal (2):
      netfilter: nf_tables: relax set/map validation checks
      netfilter: nf_tables: always increment set element count

Jeremy Sowden (1):
      netfilter: nft_exthdr: add boolean DCCP option matching

Pablo Neira Ayuso (3):
      netfilter: flowtable: simplify route logic
      netfilter: flowtable: split IPv4 datapath in helper functions
      netfilter: flowtable: split IPv6 datapath in helper functions

 include/net/netfilter/nf_conntrack_expect.h |  18 +--
 include/net/netfilter/nf_flow_table.h       |   4 +-
 include/uapi/linux/netfilter/nf_tables.h    |   2 +
 net/netfilter/nf_conntrack_proto_gre.c      |   1 +
 net/netfilter/nf_flow_table_core.c          |  24 +--
 net/netfilter/nf_flow_table_ip.c            | 231 ++++++++++++++++++----------
 net/netfilter/nf_tables_api.c               |  11 +-
 net/netfilter/nft_exthdr.c                  | 106 +++++++++++++
 net/netfilter/nft_flow_offload.c            |  12 +-
 net/netfilter/nft_lookup.c                  |  23 ++-
 net/netfilter/nft_set_pipapo.c              |   6 +-
 11 files changed, 303 insertions(+), 135 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH net-next 1/9] netfilter: nf_tables: relax set/map validation checks
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 22:50   ` patchwork-bot+netdevbpf
  2023-05-18 10:07 ` [PATCH net-next 2/9] netfilter: nf_tables: always increment set element count Florian Westphal
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel

It's currently not allowed to perform queries on a map, for example:

table t {
	map m {
		typeof ip saddr : meta mark
		..

	chain c {
		ip saddr @m counter

will fail, because kernel requires that userspace provides a destination
register when the referenced set is a map.

However, internally there is no real distinction between sets and maps,
maps are just sets where each key is associated with a value.

Relax this so that maps can be used just like sets.

This allows rules that query whether a given key exists
without making use of the associated value.

This also permits != checks, which previously did not work for map lookups.

When no destination register is given for a map, permit this only for
named maps.

Data and dump paths need to be updated to consider priv->dreg_set
instead of the 'set-is-a-map' check.

Checks in reduce and validate callbacks are not changed; they
can be relaxed later if the need arises.
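
As an illustration only, the behaviour described above can be modelled in
userspace C. This is a minimal sketch, not the kernel code: every toy_*
name is hypothetical, the map is a plain array, and the "register" is a
single uint32_t. It shows the value being copied out only when a
destination register was supplied, and an inverted (!=) check working on
a map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; none of these are kernel types. */
struct toy_elem {
	uint32_t key;
	uint32_t value;
};

struct toy_lookup {
	const struct toy_elem *elems;
	size_t nelems;
	bool dreg_set;	/* did userspace supply a destination register? */
	bool invert;	/* models an inverted ("!=") lookup */
};

/* Returns true if the rule matches.  *dreg is written only when a
 * destination register was supplied and the key was found.
 */
static bool toy_lookup_eval(const struct toy_lookup *l, uint32_t key,
			    uint32_t *dreg)
{
	bool found = false;
	size_t i;

	for (i = 0; i < l->nelems; i++) {
		if (l->elems[i].key != key)
			continue;

		found = true;
		if (l->dreg_set && dreg)
			*dreg = l->elems[i].value;
		break;
	}

	return found != l->invert;
}
```

With dreg_set left false the map behaves like a plain set: the caller
learns only whether the key exists.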

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_lookup.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index 03ef4fdaa460..29ac48cdd6db 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -19,6 +19,7 @@ struct nft_lookup {
 	struct nft_set			*set;
 	u8				sreg;
 	u8				dreg;
+	bool				dreg_set;
 	bool				invert;
 	struct nft_set_binding		binding;
 };
@@ -75,7 +76,7 @@ void nft_lookup_eval(const struct nft_expr *expr,
 	}
 
 	if (ext) {
-		if (set->flags & NFT_SET_MAP)
+		if (priv->dreg_set)
 			nft_data_copy(&regs->data[priv->dreg],
 				      nft_set_ext_data(ext), set->dlen);
 
@@ -122,11 +123,8 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 		if (flags & ~NFT_LOOKUP_F_INV)
 			return -EINVAL;
 
-		if (flags & NFT_LOOKUP_F_INV) {
-			if (set->flags & NFT_SET_MAP)
-				return -EINVAL;
+		if (flags & NFT_LOOKUP_F_INV)
 			priv->invert = true;
-		}
 	}
 
 	if (tb[NFTA_LOOKUP_DREG] != NULL) {
@@ -140,8 +138,17 @@ static int nft_lookup_init(const struct nft_ctx *ctx,
 					       set->dlen);
 		if (err < 0)
 			return err;
-	} else if (set->flags & NFT_SET_MAP)
-		return -EINVAL;
+		priv->dreg_set = true;
+	} else if (set->flags & NFT_SET_MAP) {
+		/* Map given, but user asks for lookup only (i.e. to
+		 * ignore value assoicated with key).
+		 *
+		 * This makes no sense for anonymous maps since they are
+		 * scoped to the rule, but for named sets this can be useful.
+		 */
+		if (set->flags & NFT_SET_ANONYMOUS)
+			return -EINVAL;
+	}
 
 	priv->binding.flags = set->flags & NFT_SET_MAP;
 
@@ -188,7 +195,7 @@ static int nft_lookup_dump(struct sk_buff *skb,
 		goto nla_put_failure;
 	if (nft_dump_register(skb, NFTA_LOOKUP_SREG, priv->sreg))
 		goto nla_put_failure;
-	if (priv->set->flags & NFT_SET_MAP)
+	if (priv->dreg_set)
 		if (nft_dump_register(skb, NFTA_LOOKUP_DREG, priv->dreg))
 			goto nla_put_failure;
 	if (nla_put_be32(skb, NFTA_LOOKUP_FLAGS, htonl(flags)))
-- 
2.40.1



* [PATCH net-next 2/9] netfilter: nf_tables: always increment set element count
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 1/9] netfilter: nf_tables: relax set/map validation checks Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching Florian Westphal
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel

At this time, set->nelems counter only increments when the set has
a maximum size.

Element removal, however, decrements the counter unconditionally;
this is confusing.

Increment the counter unconditionally to make this symmetrical.
This would also allow changing the set maximum size after set creation
in a later patch.
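
For readers unfamiliar with the add-unless idiom, here is a userspace
sketch using C11 atomics. It is an assumption-laden model, not the kernel
implementation: toy_add_unless() only imitates the semantics of "add
unless the counter already equals a limit", and struct toy_set is a
hypothetical stand-in with just the three fields that matter here.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the set bookkeeping. */
struct toy_set {
	atomic_uint nelems;
	unsigned int size;	/* 0 means "no maximum configured" */
	unsigned int ndeact;	/* elements pending removal */
};

/* Add 'a' to *v unless *v currently equals 'u'.  Returns true if added. */
static bool toy_add_unless(atomic_uint *v, unsigned int a, unsigned int u)
{
	unsigned int old = atomic_load(v);

	do {
		if (old == u)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old + a));

	return true;
}

/* Always count the element; only the limit depends on set->size. */
static bool toy_set_account_elem(struct toy_set *set)
{
	unsigned int max = set->size ? set->size + set->ndeact : UINT_MAX;

	return toy_add_unless(&set->nelems, 1, max);
}
```

Because the counter now moves for unbounded sets too, a limit could be
applied later without first having to recount the elements.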

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_tables_api.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 59fb8320ab4d..7a61de80a8d1 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6541,10 +6541,13 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err_element_clash;
 	}
 
-	if (!(flags & NFT_SET_ELEM_CATCHALL) && set->size &&
-	    !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact)) {
-		err = -ENFILE;
-		goto err_set_full;
+	if (!(flags & NFT_SET_ELEM_CATCHALL)) {
+		unsigned int max = set->size ? set->size + set->ndeact : UINT_MAX;
+
+		if (!atomic_add_unless(&set->nelems, 1, max)) {
+			err = -ENFILE;
+			goto err_set_full;
+		}
 	}
 
 	nft_trans_elem(trans) = elem;
-- 
2.40.1



* [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 1/9] netfilter: nf_tables: relax set/map validation checks Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 2/9] netfilter: nf_tables: always increment set element count Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 21:04   ` Jakub Kicinski
  2023-05-18 10:07 ` [PATCH net-next 4/9] netfilter: Reorder fields in 'struct nf_conntrack_expect' Florian Westphal
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Jeremy Sowden

From: Jeremy Sowden <jeremy@azazel.net>

The xt_dccp iptables module supports the matching of DCCP packets based
on the presence or absence of DCCP options.  Extend nft_exthdr to add
this functionality to nftables.
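
The option walk can be sketched in userspace over a flat byte buffer.
This is a simplified model under stated assumptions: the options area
has already been located and copied into memory, the function name and
the TOY_ constant are hypothetical, and only presence is reported, never
option data. Option types 0-31 are a single byte; all others carry a
length byte that counts the type and length bytes themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest single-byte option type (hypothetical local name). */
#define TOY_DCCPO_MAX_RESERVED 31

/* Returns true if an option of type 'wanted' is present in opts[]. */
static bool toy_dccp_has_option(const uint8_t *opts, size_t optlen,
				uint8_t wanted)
{
	size_t i = 0;

	while (i < optlen) {
		uint8_t type = opts[i];
		uint8_t len;

		if (type == wanted)
			return true;

		/* Single-byte options have no length field. */
		if (type <= TOY_DCCPO_MAX_RESERVED) {
			i++;
			continue;
		}

		/* Multi-byte option: need room for the length byte. */
		if (optlen - i < 2)
			return false;

		len = opts[i + 1];
		if (len < 2)	/* malformed: would not advance */
			return false;

		i += len;
	}

	return false;
}
```

Skipping by the length byte is what keeps option data from being
mistaken for an option type.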

Link: https://bugzilla.netfilter.org/show_bug.cgi?id=930
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/uapi/linux/netfilter/nf_tables.h |   2 +
 net/netfilter/nft_exthdr.c               | 106 +++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index c4d4d8e42dc8..e059dc2644df 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -859,12 +859,14 @@ enum nft_exthdr_flags {
  * @NFT_EXTHDR_OP_TCP: match against tcp options
  * @NFT_EXTHDR_OP_IPV4: match against ipv4 options
  * @NFT_EXTHDR_OP_SCTP: match against sctp chunks
+ * @NFT_EXTHDR_OP_DCCP: match against dccp otions
  */
 enum nft_exthdr_op {
 	NFT_EXTHDR_OP_IPV6,
 	NFT_EXTHDR_OP_TCPOPT,
 	NFT_EXTHDR_OP_IPV4,
 	NFT_EXTHDR_OP_SCTP,
+	NFT_EXTHDR_OP_DCCP,
 	__NFT_EXTHDR_OP_MAX
 };
 #define NFT_EXTHDR_OP_MAX	(__NFT_EXTHDR_OP_MAX - 1)
diff --git a/net/netfilter/nft_exthdr.c b/net/netfilter/nft_exthdr.c
index a54a7f772cec..671474e59817 100644
--- a/net/netfilter/nft_exthdr.c
+++ b/net/netfilter/nft_exthdr.c
@@ -10,6 +10,7 @@
 #include <linux/netlink.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter/nf_tables.h>
+#include <linux/dccp.h>
 #include <linux/sctp.h>
 #include <net/netfilter/nf_tables_core.h>
 #include <net/netfilter/nf_tables.h>
@@ -406,6 +407,82 @@ static void nft_exthdr_sctp_eval(const struct nft_expr *expr,
 		regs->verdict.code = NFT_BREAK;
 }
 
+static void nft_exthdr_dccp_eval(const struct nft_expr *expr,
+				 struct nft_regs *regs,
+				 const struct nft_pktinfo *pkt)
+{
+	struct nft_exthdr *priv = nft_expr_priv(expr);
+	unsigned int thoff, dataoff, optoff, optlen, i;
+	u32 *dest = &regs->data[priv->dreg];
+	const struct dccp_hdr *dh;
+	struct dccp_hdr _dh;
+
+	if (pkt->tprot != IPPROTO_DCCP || pkt->fragoff)
+		goto err;
+
+	thoff = nft_thoff(pkt);
+
+	dh = skb_header_pointer(pkt->skb, thoff, sizeof(_dh), &_dh);
+	if (!dh)
+		goto err;
+
+	dataoff = dh->dccph_doff * sizeof(u32);
+	optoff = __dccp_hdr_len(dh);
+	if (dataoff <= optoff)
+		goto err;
+
+	optlen = dataoff - optoff;
+
+	for (i = 0; i < optlen; ) {
+		/* Options 0 (DCCPO_PADDING) - 31 (DCCPO_MAX_RESERVED) are 1B in
+		 * the length; the remaining options are at least 2B long.  In
+		 * all cases, the first byte contains the option type.  In
+		 * multi-byte options, the second byte contains the option
+		 * length, which must be at least two: 1 for the type plus 1 for
+		 * the length plus 0-253 for any following option data.  We
+		 * aren't interested in the option data, only the type and the
+		 * length, so we don't need to read more than two bytes at a
+		 * time.
+		 */
+		unsigned int buflen = optlen - i;
+		u8 buf[2], *bufp;
+		u8 type, len;
+
+		if (buflen > sizeof(buf))
+			buflen = sizeof(buf);
+
+		bufp = skb_header_pointer(pkt->skb, thoff + optoff + i, buflen,
+					  &buf);
+		if (!bufp)
+			goto err;
+
+		type = bufp[0];
+
+		if (type == priv->type) {
+			*dest = 1;
+			return;
+		}
+
+		if (type <= DCCPO_MAX_RESERVED) {
+			i++;
+			continue;
+		}
+
+		if (buflen < 2)
+			goto err;
+
+		len = bufp[1];
+
+		if (len < 2)
+			goto err;
+
+		i += len;
+	}
+
+err:
+	*dest = 0;
+}
+
 static const struct nla_policy nft_exthdr_policy[NFTA_EXTHDR_MAX + 1] = {
 	[NFTA_EXTHDR_DREG]		= { .type = NLA_U32 },
 	[NFTA_EXTHDR_TYPE]		= { .type = NLA_U8 },
@@ -557,6 +634,22 @@ static int nft_exthdr_ipv4_init(const struct nft_ctx *ctx,
 	return 0;
 }
 
+static int nft_exthdr_dccp_init(const struct nft_ctx *ctx,
+				const struct nft_expr *expr,
+				const struct nlattr * const tb[])
+{
+	struct nft_exthdr *priv = nft_expr_priv(expr);
+	int err = nft_exthdr_init(ctx, expr, tb);
+
+	if (err < 0)
+		return err;
+
+	if (!(priv->flags & NFT_EXTHDR_F_PRESENT))
+		return -EOPNOTSUPP;
+
+	return 0;
+}
+
 static int nft_exthdr_dump_common(struct sk_buff *skb, const struct nft_exthdr *priv)
 {
 	if (nla_put_u8(skb, NFTA_EXTHDR_TYPE, priv->type))
@@ -686,6 +779,15 @@ static const struct nft_expr_ops nft_exthdr_sctp_ops = {
 	.reduce		= nft_exthdr_reduce,
 };
 
+static const struct nft_expr_ops nft_exthdr_dccp_ops = {
+	.type		= &nft_exthdr_type,
+	.size		= NFT_EXPR_SIZE(sizeof(struct nft_exthdr)),
+	.eval		= nft_exthdr_dccp_eval,
+	.init		= nft_exthdr_dccp_init,
+	.dump		= nft_exthdr_dump,
+	.reduce		= nft_exthdr_reduce,
+};
+
 static const struct nft_expr_ops *
 nft_exthdr_select_ops(const struct nft_ctx *ctx,
 		      const struct nlattr * const tb[])
@@ -720,6 +822,10 @@ nft_exthdr_select_ops(const struct nft_ctx *ctx,
 		if (tb[NFTA_EXTHDR_DREG])
 			return &nft_exthdr_sctp_ops;
 		break;
+	case NFT_EXTHDR_OP_DCCP:
+		if (tb[NFTA_EXTHDR_DREG])
+			return &nft_exthdr_dccp_ops;
+		break;
 	}
 
 	return ERR_PTR(-EOPNOTSUPP);
-- 
2.40.1



* [PATCH net-next 4/9] netfilter: Reorder fields in 'struct nf_conntrack_expect'
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (2 preceding siblings ...)
  2023-05-18 10:07 ` [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 5/9] netfilter: nft_set_pipapo: Use struct_size() Florian Westphal
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Christophe JAILLET, Simon Horman

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nf_conntrack_expect' from 264
to 256 bytes.

This structure deserves a dedicated cache, so reducing its size looks nice.
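
The effect of grouping same-sized members can be seen with a small
userspace example. The two structs below are hypothetical and much
smaller than the real one; they only illustrate how interleaving 4-byte
and 8-byte members creates padding. The exact sizes depend on the ABI,
so the figures in the comments assume a typical LP64 target.

```c
#include <stddef.h>

/* Interleaved: on LP64 each int is followed by 4 bytes of padding. */
struct toy_holey {
	void *p;
	int a;
	void *q;
	int b;
};

/* Grouped: the two ints share one 8-byte slot on LP64. */
struct toy_packed {
	void *p;
	void *q;
	int a;
	int b;
};

/* Bytes saved by the reordering on the current ABI (may be 0). */
static size_t toy_bytes_saved(void)
{
	return sizeof(struct toy_holey) - sizeof(struct toy_packed);
}
```

On an LP64 target this reports 8 bytes saved (32 versus 24), the same
kind of gain the patch obtains by moving three 4-byte members together.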

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_expect.h | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_expect.h b/include/net/netfilter/nf_conntrack_expect.h
index 0855b60fba17..cf0d81be5a96 100644
--- a/include/net/netfilter/nf_conntrack_expect.h
+++ b/include/net/netfilter/nf_conntrack_expect.h
@@ -26,6 +26,15 @@ struct nf_conntrack_expect {
 	struct nf_conntrack_tuple tuple;
 	struct nf_conntrack_tuple_mask mask;
 
+	/* Usage count. */
+	refcount_t use;
+
+	/* Flags */
+	unsigned int flags;
+
+	/* Expectation class */
+	unsigned int class;
+
 	/* Function to call after setup and insertion */
 	void (*expectfn)(struct nf_conn *new,
 			 struct nf_conntrack_expect *this);
@@ -39,15 +48,6 @@ struct nf_conntrack_expect {
 	/* Timer function; deletes the expectation. */
 	struct timer_list timeout;
 
-	/* Usage count. */
-	refcount_t use;
-
-	/* Flags */
-	unsigned int flags;
-
-	/* Expectation class */
-	unsigned int class;
-
 #if IS_ENABLED(CONFIG_NF_NAT)
 	union nf_inet_addr saved_addr;
 	/* This is the original per-proto part, used to map the
-- 
2.40.1



* [PATCH net-next 5/9] netfilter: nft_set_pipapo: Use struct_size()
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (3 preceding siblings ...)
  2023-05-18 10:07 ` [PATCH net-next 4/9] netfilter: Reorder fields in 'struct nf_conntrack_expect' Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 6/9] netfilter: conntrack: allow insertion clash of gre protocol Florian Westphal
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Christophe JAILLET, Simon Horman

From: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Use struct_size() instead of hand writing it.
This is less verbose and more informative.
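
For context, the size being computed is "header plus N trailing array
elements" for a struct ending in a flexible array member. The userspace
sketch below imitates that calculation with an explicit overflow check;
toy_flex_size() and the toy_ structs are hypothetical and are not the
kernel macro, which is type-aware and takes the pointer, member name and
count instead of raw sizes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_field {
	unsigned int groups;
	unsigned int bb;
};

struct toy_match {
	int field_count;
	struct toy_field f[];	/* flexible array member */
};

/* base + elem * n, saturating to SIZE_MAX if the result would overflow. */
static size_t toy_flex_size(size_t base, size_t elem, size_t n)
{
	if (elem && n > (SIZE_MAX - base) / elem)
		return SIZE_MAX;

	return base + elem * n;
}

/* Allocate a match with room for 'field_count' trailing fields. */
static struct toy_match *toy_match_alloc(size_t field_count)
{
	size_t sz = toy_flex_size(sizeof(struct toy_match),
				  sizeof(struct toy_field), field_count);
	struct toy_match *m;

	if (sz == SIZE_MAX)
		return NULL;

	m = malloc(sz);
	if (m)
		m->field_count = (int)field_count;

	return m;
}
```

Keeping the arithmetic in one helper means the overflow check cannot be
forgotten at individual allocation sites.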

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 06d46d182634..34c684e121d3 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1274,8 +1274,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	struct nft_pipapo_match *new;
 	int i;
 
-	new = kmalloc(sizeof(*new) + sizeof(*dst) * old->field_count,
-		      GFP_KERNEL);
+	new = kmalloc(struct_size(new, f, old->field_count), GFP_KERNEL);
 	if (!new)
 		return ERR_PTR(-ENOMEM);
 
@@ -2059,8 +2058,7 @@ static int nft_pipapo_init(const struct nft_set *set,
 	if (field_count > NFT_PIPAPO_MAX_FIELDS)
 		return -EINVAL;
 
-	m = kmalloc(sizeof(*priv->match) + sizeof(*f) * field_count,
-		    GFP_KERNEL);
+	m = kmalloc(struct_size(m, f, field_count), GFP_KERNEL);
 	if (!m)
 		return -ENOMEM;
 
-- 
2.40.1



* [PATCH net-next 6/9] netfilter: conntrack: allow insertion clash of gre protocol
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (4 preceding siblings ...)
  2023-05-18 10:07 ` [PATCH net-next 5/9] netfilter: nft_set_pipapo: Use struct_size() Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 7/9] netfilter: flowtable: simplify route logic Florian Westphal
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Faicker Mo

From: Faicker Mo <faicker.mo@ucloud.cn>

NVGRE tunnels are used for VM-to-VM communication. The VM packets
are encapsulated in NVGRE and sent from the host. For NVGRE, the host
conntrack entry is keyed on only two tuple fields (outer source IP and
outer destination IP), so insertion clashes are more likely to happen
when the VM sends concurrent connections.

Signed-off-by: Faicker Mo <faicker.mo@ucloud.cn>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_proto_gre.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nf_conntrack_proto_gre.c b/net/netfilter/nf_conntrack_proto_gre.c
index 728eeb0aea87..ad6f0ca40cd2 100644
--- a/net/netfilter/nf_conntrack_proto_gre.c
+++ b/net/netfilter/nf_conntrack_proto_gre.c
@@ -296,6 +296,7 @@ void nf_conntrack_gre_init_net(struct net *net)
 /* protocol helper struct */
 const struct nf_conntrack_l4proto nf_conntrack_l4proto_gre = {
 	.l4proto	 = IPPROTO_GRE,
+	.allow_clash	 = true,
 #ifdef CONFIG_NF_CONNTRACK_PROCFS
 	.print_conntrack = gre_print_conntrack,
 #endif
-- 
2.40.1



* [PATCH net-next 7/9] netfilter: flowtable: simplify route logic
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (5 preceding siblings ...)
  2023-05-18 10:07 ` [PATCH net-next 6/9] netfilter: conntrack: allow insertion clash of gre protocol Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 8/9] netfilter: flowtable: split IPv4 datapath in helper functions Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 9/9] netfilter: flowtable: split IPv6 " Florian Westphal
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Pablo Neira Ayuso

From: Pablo Neira Ayuso <pablo@netfilter.org>

Grab reference to dst from skbuff earlier to simplify route caching.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_flow_table.h |  4 ++--
 net/netfilter/nf_flow_table_core.c    | 24 +++---------------------
 net/netfilter/nft_flow_offload.c      | 12 ++++++++----
 3 files changed, 13 insertions(+), 27 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index ebb28ec5b6fa..546fc4a9b939 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -263,8 +263,8 @@ nf_flow_table_offload_del_cb(struct nf_flowtable *flow_table,
 	up_write(&flow_table->flow_block_lock);
 }
 
-int flow_offload_route_init(struct flow_offload *flow,
-			    const struct nf_flow_route *route);
+void flow_offload_route_init(struct flow_offload *flow,
+			     const struct nf_flow_route *route);
 
 int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
 void flow_offload_refresh(struct nf_flowtable *flow_table,
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 04bd0ed4d2ae..b46dd897f2c5 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -125,9 +125,6 @@ static int flow_offload_fill_route(struct flow_offload *flow,
 		break;
 	case FLOW_OFFLOAD_XMIT_XFRM:
 	case FLOW_OFFLOAD_XMIT_NEIGH:
-		if (!dst_hold_safe(route->tuple[dir].dst))
-			return -1;
-
 		flow_tuple->dst_cache = dst;
 		flow_tuple->dst_cookie = flow_offload_dst_cookie(flow_tuple);
 		break;
@@ -148,27 +145,12 @@ static void nft_flow_dst_release(struct flow_offload *flow,
 		dst_release(flow->tuplehash[dir].tuple.dst_cache);
 }
 
-int flow_offload_route_init(struct flow_offload *flow,
+void flow_offload_route_init(struct flow_offload *flow,
 			    const struct nf_flow_route *route)
 {
-	int err;
-
-	err = flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_ORIGINAL);
-	if (err < 0)
-		return err;
-
-	err = flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_REPLY);
-	if (err < 0)
-		goto err_route_reply;
-
+	flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_ORIGINAL);
+	flow_offload_fill_route(flow, route, FLOW_OFFLOAD_DIR_REPLY);
 	flow->type = NF_FLOW_OFFLOAD_ROUTE;
-
-	return 0;
-
-err_route_reply:
-	nft_flow_dst_release(flow, FLOW_OFFLOAD_DIR_ORIGINAL);
-
-	return err;
 }
 EXPORT_SYMBOL_GPL(flow_offload_route_init);
 
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index e860d8fe0e5e..5ef9146e74ad 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -250,9 +250,14 @@ static int nft_flow_route(const struct nft_pktinfo *pkt,
 		break;
 	}
 
+	if (!dst_hold_safe(this_dst))
+		return -ENOENT;
+
 	nf_route(nft_net(pkt), &other_dst, &fl, false, nft_pf(pkt));
-	if (!other_dst)
+	if (!other_dst) {
+		dst_release(this_dst);
 		return -ENOENT;
+	}
 
 	nft_default_forward_path(route, this_dst, dir);
 	nft_default_forward_path(route, other_dst, !dir);
@@ -349,8 +354,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 	if (!flow)
 		goto err_flow_alloc;
 
-	if (flow_offload_route_init(flow, &route) < 0)
-		goto err_flow_add;
+	flow_offload_route_init(flow, &route);
 
 	if (tcph) {
 		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
@@ -361,12 +365,12 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 	if (ret < 0)
 		goto err_flow_add;
 
-	dst_release(route.tuple[!dir].dst);
 	return;
 
 err_flow_add:
 	flow_offload_free(flow);
 err_flow_alloc:
+	dst_release(route.tuple[dir].dst);
 	dst_release(route.tuple[!dir].dst);
 err_flow_route:
 	clear_bit(IPS_OFFLOAD_BIT, &ct->status);
-- 
2.40.1



* [PATCH net-next 8/9] netfilter: flowtable: split IPv4 datapath in helper functions
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (6 preceding siblings ...)
  2023-05-18 10:07 ` [PATCH net-next 7/9] netfilter: flowtable: simplify route logic Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  2023-05-18 10:07 ` [PATCH net-next 9/9] netfilter: flowtable: split IPv6 " Florian Westphal
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Pablo Neira Ayuso

From: Pablo Neira Ayuso <pablo@netfilter.org>

Add a context structure and helper functions to look up a matching
IPv4 entry in the flowtable and to forward packets.

No functional changes are intended.
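
The shape of the split can be sketched in userspace as follows. This is
an illustrative model only: all toy_* names, the packet flags and the
placeholder header size are hypothetical. It shows a small context struct
carried between a lookup helper and a forward helper, with the forward
helper's three-way result (negative, zero, positive) mapped back to a
verdict by the hook.

```c
#include <stdbool.h>

enum toy_verdict { TOY_DROP, TOY_ACCEPT, TOY_FORWARDED };

/* Per-packet state shared by the helpers (hypothetical fields). */
struct toy_ctx {
	unsigned int offset;
	unsigned int hdrsize;
};

struct toy_pkt {
	bool flow_known;
	bool exceeds_mtu;
	bool writable;
};

/* Lookup step: fills the context, reports whether a flow entry exists. */
static bool toy_lookup(struct toy_ctx *ctx, const struct toy_pkt *pkt)
{
	ctx->hdrsize = 20;	/* placeholder transport header size */
	return pkt->flow_known;
}

/* Forward step: <0 = drop, 0 = back to the normal path, 1 = forwarded. */
static int toy_forward(const struct toy_ctx *ctx, const struct toy_pkt *pkt)
{
	(void)ctx;	/* a real helper would use offset/hdrsize here */

	if (pkt->exceeds_mtu)
		return 0;
	if (!pkt->writable)
		return -1;

	return 1;
}

/* Hook: glue that maps helper results to verdicts. */
static enum toy_verdict toy_hook(const struct toy_pkt *pkt)
{
	struct toy_ctx ctx = { 0 };
	int ret;

	if (!toy_lookup(&ctx, pkt))
		return TOY_ACCEPT;

	ret = toy_forward(&ctx, pkt);
	if (ret < 0)
		return TOY_DROP;
	if (ret == 0)
		return TOY_ACCEPT;

	return TOY_FORWARDED;
}
```

Returning a plain int from the forward step keeps the helper free of
verdict constants, so the hook is the only place that chooses between
accept and drop.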

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_flow_table_ip.c | 119 ++++++++++++++++++++-----------
 1 file changed, 77 insertions(+), 42 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 19efba1e51ef..3fb476167d1d 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -163,38 +163,43 @@ static void nf_flow_tuple_encap(struct sk_buff *skb,
 	}
 }
 
-static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
-			    struct flow_offload_tuple *tuple, u32 *hdrsize,
-			    u32 offset)
+struct nf_flowtable_ctx {
+	const struct net_device	*in;
+	u32			offset;
+	u32			hdrsize;
+};
+
+static int nf_flow_tuple_ip(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
+			    struct flow_offload_tuple *tuple)
 {
 	struct flow_ports *ports;
 	unsigned int thoff;
 	struct iphdr *iph;
 	u8 ipproto;
 
-	if (!pskb_may_pull(skb, sizeof(*iph) + offset))
+	if (!pskb_may_pull(skb, sizeof(*iph) + ctx->offset))
 		return -1;
 
-	iph = (struct iphdr *)(skb_network_header(skb) + offset);
+	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
 	thoff = (iph->ihl * 4);
 
 	if (ip_is_fragment(iph) ||
 	    unlikely(ip_has_options(thoff)))
 		return -1;
 
-	thoff += offset;
+	thoff += ctx->offset;
 
 	ipproto = iph->protocol;
 	switch (ipproto) {
 	case IPPROTO_TCP:
-		*hdrsize = sizeof(struct tcphdr);
+		ctx->hdrsize = sizeof(struct tcphdr);
 		break;
 	case IPPROTO_UDP:
-		*hdrsize = sizeof(struct udphdr);
+		ctx->hdrsize = sizeof(struct udphdr);
 		break;
 #ifdef CONFIG_NF_CT_PROTO_GRE
 	case IPPROTO_GRE:
-		*hdrsize = sizeof(struct gre_base_hdr);
+		ctx->hdrsize = sizeof(struct gre_base_hdr);
 		break;
 #endif
 	default:
@@ -204,7 +209,7 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
 	if (iph->ttl <= 1)
 		return -1;
 
-	if (!pskb_may_pull(skb, thoff + *hdrsize))
+	if (!pskb_may_pull(skb, thoff + ctx->hdrsize))
 		return -1;
 
 	switch (ipproto) {
@@ -224,13 +229,13 @@ static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
 	}
 	}
 
-	iph = (struct iphdr *)(skb_network_header(skb) + offset);
+	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
 
 	tuple->src_v4.s_addr	= iph->saddr;
 	tuple->dst_v4.s_addr	= iph->daddr;
 	tuple->l3proto		= AF_INET;
 	tuple->l4proto		= ipproto;
-	tuple->iifidx		= dev->ifindex;
+	tuple->iifidx		= ctx->in->ifindex;
 	nf_flow_tuple_encap(skb, tuple);
 
 	return 0;
@@ -336,58 +341,56 @@ static unsigned int nf_flow_queue_xmit(struct net *net, struct sk_buff *skb,
 	return NF_STOLEN;
 }
 
-unsigned int
-nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
-			const struct nf_hook_state *state)
+static struct flow_offload_tuple_rhash *
+nf_flow_offload_lookup(struct nf_flowtable_ctx *ctx,
+		       struct nf_flowtable *flow_table, struct sk_buff *skb)
 {
-	struct flow_offload_tuple_rhash *tuplehash;
-	struct nf_flowtable *flow_table = priv;
 	struct flow_offload_tuple tuple = {};
-	enum flow_offload_tuple_dir dir;
-	struct flow_offload *flow;
-	struct net_device *outdev;
-	u32 hdrsize, offset = 0;
-	unsigned int thoff, mtu;
-	struct rtable *rt;
-	struct iphdr *iph;
-	__be32 nexthop;
-	int ret;
 
 	if (skb->protocol != htons(ETH_P_IP) &&
-	    !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &offset))
-		return NF_ACCEPT;
+	    !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IP), &ctx->offset))
+		return NULL;
 
-	if (nf_flow_tuple_ip(skb, state->in, &tuple, &hdrsize, offset) < 0)
-		return NF_ACCEPT;
+	if (nf_flow_tuple_ip(ctx, skb, &tuple) < 0)
+		return NULL;
 
-	tuplehash = flow_offload_lookup(flow_table, &tuple);
-	if (tuplehash == NULL)
-		return NF_ACCEPT;
+	return flow_offload_lookup(flow_table, &tuple);
+}
+
+static int nf_flow_offload_forward(struct nf_flowtable_ctx *ctx,
+				   struct nf_flowtable *flow_table,
+				   struct flow_offload_tuple_rhash *tuplehash,
+				   struct sk_buff *skb)
+{
+	enum flow_offload_tuple_dir dir;
+	struct flow_offload *flow;
+	unsigned int thoff, mtu;
+	struct iphdr *iph;
 
 	dir = tuplehash->tuple.dir;
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 
-	mtu = flow->tuplehash[dir].tuple.mtu + offset;
+	mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
 	if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
-		return NF_ACCEPT;
+		return 0;
 
-	iph = (struct iphdr *)(skb_network_header(skb) + offset);
-	thoff = (iph->ihl * 4) + offset;
+	iph = (struct iphdr *)(skb_network_header(skb) + ctx->offset);
+	thoff = (iph->ihl * 4) + ctx->offset;
 	if (nf_flow_state_check(flow, iph->protocol, skb, thoff))
-		return NF_ACCEPT;
+		return 0;
 
 	if (!nf_flow_dst_check(&tuplehash->tuple)) {
 		flow_offload_teardown(flow);
-		return NF_ACCEPT;
+		return 0;
 	}
 
-	if (skb_try_make_writable(skb, thoff + hdrsize))
-		return NF_DROP;
+	if (skb_try_make_writable(skb, thoff + ctx->hdrsize))
+		return -1;
 
 	flow_offload_refresh(flow_table, flow);
 
 	nf_flow_encap_pop(skb, tuplehash);
-	thoff -= offset;
+	thoff -= ctx->offset;
 
 	iph = ip_hdr(skb);
 	nf_flow_nat_ip(flow, skb, thoff, dir, iph);
@@ -398,6 +401,35 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	if (flow_table->flags & NF_FLOWTABLE_COUNTER)
 		nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len);
 
+	return 1;
+}
+
+unsigned int
+nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
+			const struct nf_hook_state *state)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *flow_table = priv;
+	enum flow_offload_tuple_dir dir;
+	struct nf_flowtable_ctx ctx = {
+		.in	= state->in,
+	};
+	struct flow_offload *flow;
+	struct net_device *outdev;
+	struct rtable *rt;
+	__be32 nexthop;
+	int ret;
+
+	tuplehash = nf_flow_offload_lookup(&ctx, flow_table, skb);
+	if (!tuplehash)
+		return NF_ACCEPT;
+
+	ret = nf_flow_offload_forward(&ctx, flow_table, tuplehash, skb);
+	if (ret < 0)
+		return NF_DROP;
+	else if (ret == 0)
+		return NF_ACCEPT;
+
 	if (unlikely(tuplehash->tuple.xmit_type == FLOW_OFFLOAD_XMIT_XFRM)) {
 		rt = (struct rtable *)tuplehash->tuple.dst_cache;
 		memset(skb->cb, 0, sizeof(struct inet_skb_parm));
@@ -406,6 +438,9 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 		return nf_flow_xmit_xfrm(skb, state, &rt->dst);
 	}
 
+	dir = tuplehash->tuple.dir;
+	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+
 	switch (tuplehash->tuple.xmit_type) {
 	case FLOW_OFFLOAD_XMIT_NEIGH:
 		rt = (struct rtable *)tuplehash->tuple.dst_cache;
-- 
2.40.1



* [PATCH net-next 9/9] netfilter: flowtable: split IPv6 datapath in helper functions
  2023-05-18 10:07 [PATCH net-next 0/9] Netfilter updates for net-next Florian Westphal
                   ` (7 preceding siblings ...)
  2023-05-18 10:07 ` [PATCH net-next 8/9] netfilter: flowtable: split IPv4 datapath in helper functions Florian Westphal
@ 2023-05-18 10:07 ` Florian Westphal
  8 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-18 10:07 UTC (permalink / raw)
  To: netdev
  Cc: Jakub Kicinski, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Pablo Neira Ayuso

From: Pablo Neira Ayuso <pablo@netfilter.org>

Add a context structure and helper functions to look up a matching
IPv6 entry in the flowtable and to forward packets.

No functional changes are intended.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_flow_table_ip.c | 112 ++++++++++++++++++++-----------
 1 file changed, 71 insertions(+), 41 deletions(-)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index 3fb476167d1d..d248763917ad 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -570,32 +570,31 @@ static void nf_flow_nat_ipv6(const struct flow_offload *flow,
 	}
 }
 
-static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
-			      struct flow_offload_tuple *tuple, u32 *hdrsize,
-			      u32 offset)
+static int nf_flow_tuple_ipv6(struct nf_flowtable_ctx *ctx, struct sk_buff *skb,
+			      struct flow_offload_tuple *tuple)
 {
 	struct flow_ports *ports;
 	struct ipv6hdr *ip6h;
 	unsigned int thoff;
 	u8 nexthdr;
 
-	thoff = sizeof(*ip6h) + offset;
+	thoff = sizeof(*ip6h) + ctx->offset;
 	if (!pskb_may_pull(skb, thoff))
 		return -1;
 
-	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
+	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
 
 	nexthdr = ip6h->nexthdr;
 	switch (nexthdr) {
 	case IPPROTO_TCP:
-		*hdrsize = sizeof(struct tcphdr);
+		ctx->hdrsize = sizeof(struct tcphdr);
 		break;
 	case IPPROTO_UDP:
-		*hdrsize = sizeof(struct udphdr);
+		ctx->hdrsize = sizeof(struct udphdr);
 		break;
 #ifdef CONFIG_NF_CT_PROTO_GRE
 	case IPPROTO_GRE:
-		*hdrsize = sizeof(struct gre_base_hdr);
+		ctx->hdrsize = sizeof(struct gre_base_hdr);
 		break;
 #endif
 	default:
@@ -605,7 +604,7 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
 	if (ip6h->hop_limit <= 1)
 		return -1;
 
-	if (!pskb_may_pull(skb, thoff + *hdrsize))
+	if (!pskb_may_pull(skb, thoff + ctx->hdrsize))
 		return -1;
 
 	switch (nexthdr) {
@@ -625,65 +624,47 @@ static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev,
 	}
 	}
 
-	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
+	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
 
 	tuple->src_v6		= ip6h->saddr;
 	tuple->dst_v6		= ip6h->daddr;
 	tuple->l3proto		= AF_INET6;
 	tuple->l4proto		= nexthdr;
-	tuple->iifidx		= dev->ifindex;
+	tuple->iifidx		= ctx->in->ifindex;
 	nf_flow_tuple_encap(skb, tuple);
 
 	return 0;
 }
 
-unsigned int
-nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
-			  const struct nf_hook_state *state)
+static int nf_flow_offload_ipv6_forward(struct nf_flowtable_ctx *ctx,
+					struct nf_flowtable *flow_table,
+					struct flow_offload_tuple_rhash *tuplehash,
+					struct sk_buff *skb)
 {
-	struct flow_offload_tuple_rhash *tuplehash;
-	struct nf_flowtable *flow_table = priv;
-	struct flow_offload_tuple tuple = {};
 	enum flow_offload_tuple_dir dir;
-	const struct in6_addr *nexthop;
 	struct flow_offload *flow;
-	struct net_device *outdev;
 	unsigned int thoff, mtu;
-	u32 hdrsize, offset = 0;
 	struct ipv6hdr *ip6h;
-	struct rt6_info *rt;
-	int ret;
-
-	if (skb->protocol != htons(ETH_P_IPV6) &&
-	    !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IPV6), &offset))
-		return NF_ACCEPT;
-
-	if (nf_flow_tuple_ipv6(skb, state->in, &tuple, &hdrsize, offset) < 0)
-		return NF_ACCEPT;
-
-	tuplehash = flow_offload_lookup(flow_table, &tuple);
-	if (tuplehash == NULL)
-		return NF_ACCEPT;
 
 	dir = tuplehash->tuple.dir;
 	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
 
-	mtu = flow->tuplehash[dir].tuple.mtu + offset;
+	mtu = flow->tuplehash[dir].tuple.mtu + ctx->offset;
 	if (unlikely(nf_flow_exceeds_mtu(skb, mtu)))
-		return NF_ACCEPT;
+		return 0;
 
-	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + offset);
-	thoff = sizeof(*ip6h) + offset;
+	ip6h = (struct ipv6hdr *)(skb_network_header(skb) + ctx->offset);
+	thoff = sizeof(*ip6h) + ctx->offset;
 	if (nf_flow_state_check(flow, ip6h->nexthdr, skb, thoff))
-		return NF_ACCEPT;
+		return 0;
 
 	if (!nf_flow_dst_check(&tuplehash->tuple)) {
 		flow_offload_teardown(flow);
-		return NF_ACCEPT;
+		return 0;
 	}
 
-	if (skb_try_make_writable(skb, thoff + hdrsize))
-		return NF_DROP;
+	if (skb_try_make_writable(skb, thoff + ctx->hdrsize))
+		return -1;
 
 	flow_offload_refresh(flow_table, flow);
 
@@ -698,6 +679,52 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	if (flow_table->flags & NF_FLOWTABLE_COUNTER)
 		nf_ct_acct_update(flow->ct, tuplehash->tuple.dir, skb->len);
 
+	return 1;
+}
+
+static struct flow_offload_tuple_rhash *
+nf_flow_offload_ipv6_lookup(struct nf_flowtable_ctx *ctx,
+			    struct nf_flowtable *flow_table,
+			    struct sk_buff *skb)
+{
+	struct flow_offload_tuple tuple = {};
+
+	if (skb->protocol != htons(ETH_P_IPV6) &&
+	    !nf_flow_skb_encap_protocol(skb, htons(ETH_P_IPV6), &ctx->offset))
+		return NULL;
+
+	if (nf_flow_tuple_ipv6(ctx, skb, &tuple) < 0)
+		return NULL;
+
+	return flow_offload_lookup(flow_table, &tuple);
+}
+
+unsigned int
+nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
+			  const struct nf_hook_state *state)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *flow_table = priv;
+	enum flow_offload_tuple_dir dir;
+	struct nf_flowtable_ctx ctx = {
+		.in	= state->in,
+	};
+	const struct in6_addr *nexthop;
+	struct flow_offload *flow;
+	struct net_device *outdev;
+	struct rt6_info *rt;
+	int ret;
+
+	tuplehash = nf_flow_offload_ipv6_lookup(&ctx, flow_table, skb);
+	if (tuplehash == NULL)
+		return NF_ACCEPT;
+
+	ret = nf_flow_offload_ipv6_forward(&ctx, flow_table, tuplehash, skb);
+	if (ret < 0)
+		return NF_DROP;
+	else if (ret == 0)
+		return NF_ACCEPT;
+
 	if (unlikely(tuplehash->tuple.xmit_type == FLOW_OFFLOAD_XMIT_XFRM)) {
 		rt = (struct rt6_info *)tuplehash->tuple.dst_cache;
 		memset(skb->cb, 0, sizeof(struct inet6_skb_parm));
@@ -706,6 +733,9 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 		return nf_flow_xmit_xfrm(skb, state, &rt->dst);
 	}
 
+	dir = tuplehash->tuple.dir;
+	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+
 	switch (tuplehash->tuple.xmit_type) {
 	case FLOW_OFFLOAD_XMIT_NEIGH:
 		rt = (struct rt6_info *)tuplehash->tuple.dst_cache;
-- 
2.40.1
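
The split above leaves the hook with a small return-value contract: the
lookup helper returns NULL when there is no flowtable entry, and the
forward helper returns a negative value to drop, 0 to hand the packet back
to the normal path, and a positive value to continue to the transmit step.
A minimal user-space sketch of that contract follows. Every sketch_* name,
the SKETCH_XMIT value and the simplified MTU and writability checks are
illustrative assumptions, not kernel code; only NF_DROP and NF_ACCEPT
mirror the kernel's verdict values.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict values as in the kernel's netfilter uapi header. */
enum { NF_DROP = 0, NF_ACCEPT = 1 };

/* Hypothetical marker meaning "fast path continues to the xmit step". */
enum { SKETCH_XMIT = 100 };

/* Simplified stand-in for struct nf_flowtable_ctx (assumption). */
struct sketch_ctx {
	unsigned int offset;	/* encapsulation bytes before the IP header */
	unsigned int hdrsize;	/* transport header size */
};

/* Simplified stand-ins for a flow entry and a packet (assumptions). */
struct sketch_flow {
	unsigned int mtu;
};

struct sketch_pkt {
	int matches_flow;	/* 1 if a flowtable entry exists */
	unsigned int len;	/* packet length in bytes */
	int can_make_writable;	/* 1 if the headers can be made writable */
};

/* Lookup helper: fills the context, returns NULL when nothing matches. */
static const struct sketch_flow *
sketch_lookup(struct sketch_ctx *ctx, const struct sketch_flow *flow,
	      const struct sketch_pkt *pkt)
{
	assert(ctx != NULL);
	ctx->offset = 0;	/* pretend: no encapsulation */
	ctx->hdrsize = 20;	/* pretend: TCP header */

	return pkt->matches_flow ? flow : NULL;
}

/* Forward helper: -1 = drop, 0 = back to the normal path, 1 = continue. */
static int sketch_forward(const struct sketch_ctx *ctx,
			  const struct sketch_flow *flow,
			  const struct sketch_pkt *pkt)
{
	if (pkt->len > flow->mtu + ctx->offset)
		return 0;
	if (!pkt->can_make_writable)
		return -1;

	return 1;
}

/* Hook: maps the helper results onto verdicts, as the patched hooks do. */
static int sketch_hook(const struct sketch_flow *flow,
		       const struct sketch_pkt *pkt)
{
	struct sketch_ctx ctx = { 0 };
	const struct sketch_flow *match;
	int ret;

	match = sketch_lookup(&ctx, flow, pkt);
	if (!match)
		return NF_ACCEPT;

	ret = sketch_forward(&ctx, match, pkt);
	if (ret < 0)
		return NF_DROP;
	else if (ret == 0)
		return NF_ACCEPT;

	return SKETCH_XMIT;
}
```

With these stand-ins, a packet that matches a flow and fits the MTU reaches
the transmit step, an unmatched or oversized packet yields NF_ACCEPT, and a
packet that cannot be made writable yields NF_DROP.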



* Re: [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching
  2023-05-18 10:07 ` [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching Florian Westphal
@ 2023-05-18 21:04   ` Jakub Kicinski
  2023-05-19 10:53     ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2023-05-18 21:04 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Jeremy Sowden

On Thu, 18 May 2023 12:07:53 +0200 Florian Westphal wrote:
> From: Jeremy Sowden <jeremy@azazel.net>
> 
> The xt_dccp iptables module supports the matching of DCCP packets based
> on the presence or absence of DCCP options.  Extend nft_exthdr to add
> this functionality to nftables.

Someone is actually using DCCP ? :o


* Re: [PATCH net-next 1/9] netfilter: nf_tables: relax set/map validation checks
  2023-05-18 10:07 ` [PATCH net-next 1/9] netfilter: nf_tables: relax set/map validation checks Florian Westphal
@ 2023-05-18 22:50   ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-05-18 22:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev, kuba, edumazet, pabeni, davem, netfilter-devel

Hello:

This series was applied to netdev/net-next.git (main)
by Florian Westphal <fw@strlen.de>:

On Thu, 18 May 2023 12:07:51 +0200 you wrote:
> It's currently not allowed to perform queries on a map, for example:
> 
> table t {
> 	map m {
> 		typeof ip saddr : meta mark
> 		..
> 
> [...]

Here is the summary with links:
  - [net-next,1/9] netfilter: nf_tables: relax set/map validation checks
    https://git.kernel.org/netdev/net-next/c/a4878eeae390
  - [net-next,2/9] netfilter: nf_tables: always increment set element count
    https://git.kernel.org/netdev/net-next/c/d4b7f29eb85c
  - [net-next,3/9] netfilter: nft_exthdr: add boolean DCCP option matching
    https://git.kernel.org/netdev/net-next/c/b9f9a485fb0e
  - [net-next,4/9] netfilter: Reorder fields in 'struct nf_conntrack_expect'
    https://git.kernel.org/netdev/net-next/c/61e03e912da8
  - [net-next,5/9] netfilter: nft_set_pipapo: Use struct_size()
    https://git.kernel.org/netdev/net-next/c/a2a0ffb08468
  - [net-next,6/9] netfilter: conntrack: allow insertion clash of gre protocol
    https://git.kernel.org/netdev/net-next/c/d671fd82eaa9
  - [net-next,7/9] netfilter: flowtable: simplify route logic
    https://git.kernel.org/netdev/net-next/c/fa502c865666
  - [net-next,8/9] netfilter: flowtable: split IPv4 datapath in helper functions
    https://git.kernel.org/netdev/net-next/c/a10fa0b489d6
  - [net-next,9/9] netfilter: flowtable: split IPv6 datapath in helper functions
    https://git.kernel.org/netdev/net-next/c/e05b5362166b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching
  2023-05-18 21:04   ` Jakub Kicinski
@ 2023-05-19 10:53     ` Florian Westphal
  2023-05-19 15:21       ` Jakub Kicinski
  0 siblings, 1 reply; 15+ messages in thread
From: Florian Westphal @ 2023-05-19 10:53 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Florian Westphal, netdev, Eric Dumazet, Paolo Abeni,
	David S. Miller, netfilter-devel, Jeremy Sowden

Jakub Kicinski <kuba@kernel.org> wrote:
> On Thu, 18 May 2023 12:07:53 +0200 Florian Westphal wrote:
> > From: Jeremy Sowden <jeremy@azazel.net>
> > 
> > The xt_dccp iptables module supports the matching of DCCP packets based
> > on the presence or absence of DCCP options.  Extend nft_exthdr to add
> > this functionality to nftables.
> 
> Someone is actually using DCCP ? :o

Don't know, but it's still seeing *some* activity.
When I asked the same question I was pointed at

https://multipath-dccp.org/

respectively the out-of-tree implementation at
https://github.com/telekom/mp-dccp/

There is also some IETF activity for DCCP, e.g.
BBR-like CC:
https://www.ietf.org/archive/id/draft-romo-iccrg-ccid5-00.html


* Re: [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching
  2023-05-19 10:53     ` Florian Westphal
@ 2023-05-19 15:21       ` Jakub Kicinski
  2023-05-19 15:25         ` Florian Westphal
  0 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2023-05-19 15:21 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, Eric Dumazet, Paolo Abeni, David S. Miller,
	netfilter-devel, Jeremy Sowden

On Fri, 19 May 2023 12:53:48 +0200 Florian Westphal wrote:
> > Someone is actually using DCCP ? :o  
> 
> Don't know, but it's still seeing *some* activity.
> When I asked the same question I was pointed at
> 
> https://multipath-dccp.org/
> 
> respectively the out-of-tree implementation at
> https://github.com/telekom/mp-dccp/
> 
> There is also some IETF activity for DCCP, e.g.
> BBR-like CC:
> https://www.ietf.org/archive/id/draft-romo-iccrg-ccid5-00.html

Oh, Deutsche Telekom, ISDN and now DCCP?
I wonder if we could make one of them a maintainer, because DCCP
is an Orphan.. but then the GH tree has such gold as:
net/dccp/non_gpl_scheduler/ 
😑️


* Re: [PATCH net-next 3/9] netfilter: nft_exthdr: add boolean DCCP option matching
  2023-05-19 15:21       ` Jakub Kicinski
@ 2023-05-19 15:25         ` Florian Westphal
  0 siblings, 0 replies; 15+ messages in thread
From: Florian Westphal @ 2023-05-19 15:25 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Florian Westphal, netdev, Eric Dumazet, Paolo Abeni,
	David S. Miller, netfilter-devel, Jeremy Sowden

Jakub Kicinski <kuba@kernel.org> wrote:
> Oh, Deutsche Telekom, ISDN and now DCCP?
> I wonder if we could make one of them a maintainer, because DCCP
> is an Orphan.. but then the GH tree has such gold as:
> net/dccp/non_gpl_scheduler/ 

Could just mark it CONFIG_BROKEN or rip it out
altogether.  It can be brought back if anyone cares.

