netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort
@ 2024-04-25 12:06 Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 1/8] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Changes since v1:
 - rebase on top of nf/net-next tree
 - patch
   "netfilter: nf_tables: pass new nft_iter_type hint to walker" has
   been removed, a similar change has been applied already

I've retained Stefanos RvB tags for commits that have not been changed.
Tested with nftables py/shell tests and nft_concat_range on a debug
kernel.

V1 cover letter:

pipapo keeps one active set data (used from datapath) and one shadow
copy, in priv->clone, used from transactional path to update the set.

On abort and commit, the clone/shadow becomes the active set,
and a new clone is made for the next transaction.

The problem with this is that we cannot fail in ->commit.

This patchset rearranges priv->clone allocation so the cloning occurs on
the first insertion/removal.

set flush needs a bit of extra work, this is done by adding a iter_type
hint to the walker callbacks so that a set flush will be able to perform
the needed clone.

The dirty flag is no longer meaningful after these changes, so last
patch removes it again.

After this patch it is possible to elide calls to nft_setelem_remove
from the abort path IFF the set backend implements an abort() function,
but this change isn't included here.

Florian Westphal (8):
  netfilter: nft_set_pipapo: move prove_locking helper around
  netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  netfilter: nft_set_pipapo: merge deactivate helper into caller
  netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  netfilter: nft_set_pipapo: remove dirty flag

 net/netfilter/nft_set_pipapo.c | 261 ++++++++++++++++-----------------
 net/netfilter/nft_set_pipapo.h |   2 -
 2 files changed, 126 insertions(+), 137 deletions(-)

-- 
2.43.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 1/8] netfilter: nft_set_pipapo: move prove_locking helper around
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 2/8] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Preparation patch, the helper will soon get called from insert
function too.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 187138afac45..b8205d961ba4 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1247,6 +1247,17 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 	return 0;
 }
 
+static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
+{
+#ifdef CONFIG_PROVE_LOCKING
+	const struct net *net = read_pnet(&set->net);
+
+	return lockdep_is_held(&nft_pernet(net)->commit_mutex);
+#else
+	return true;
+#endif
+}
+
 /**
  * nft_pipapo_insert() - Validate and insert ranged elements
  * @net:	Network namespace
@@ -1799,17 +1810,6 @@ static void nft_pipapo_commit(struct nft_set *set)
 	priv->clone = new_clone;
 }
 
-static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
-{
-#ifdef CONFIG_PROVE_LOCKING
-	const struct net *net = read_pnet(&set->net);
-
-	return lockdep_is_held(&nft_pernet(net)->commit_mutex);
-#else
-	return true;
-#endif
-}
-
 static void nft_pipapo_abort(const struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 2/8] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 1/8] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 3/8] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Currently it returns an error pointer, but the only possible failure
is ENOMEM.

After a followup patch, we'd need to discard the errno code, i.e.

x = pipapo_clone()
if (IS_ERR(x))
	return NULL

or make more changes to fix up callers to expect IS_ERR() code
from set->ops->deactivate().

So simplify this and make it return ptr-or-null.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index b8205d961ba4..7b6d5d2d0d54 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1395,7 +1395,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
  * pipapo_clone() - Clone matching data to create new working copy
  * @old:	Existing matching data
  *
- * Return: copy of matching data passed as 'old', error pointer on failure
+ * Return: copy of matching data passed as 'old' or NULL.
  */
 static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 {
@@ -1405,7 +1405,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 
 	new = kmalloc(struct_size(new, f, old->field_count), GFP_KERNEL);
 	if (!new)
-		return ERR_PTR(-ENOMEM);
+		return NULL;
 
 	new->field_count = old->field_count;
 	new->bsize_max = old->bsize_max;
@@ -1477,7 +1477,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	free_percpu(new->scratch);
 	kfree(new);
 
-	return ERR_PTR(-ENOMEM);
+	return NULL;
 }
 
 /**
@@ -1797,7 +1797,7 @@ static void nft_pipapo_commit(struct nft_set *set)
 		return;
 
 	new_clone = pipapo_clone(priv->clone);
-	if (IS_ERR(new_clone))
+	if (!new_clone)
 		return;
 
 	priv->dirty = false;
@@ -1821,7 +1821,7 @@ static void nft_pipapo_abort(const struct nft_set *set)
 	m = rcu_dereference_protected(priv->match, nft_pipapo_transaction_mutex_held(set));
 
 	new_clone = pipapo_clone(m);
-	if (IS_ERR(new_clone))
+	if (!new_clone)
 		return;
 
 	priv->dirty = false;
@@ -2269,8 +2269,8 @@ static int nft_pipapo_init(const struct nft_set *set,
 
 	/* Create an initial clone of matching data for next insertion */
 	priv->clone = pipapo_clone(m);
-	if (IS_ERR(priv->clone)) {
-		err = PTR_ERR(priv->clone);
+	if (!priv->clone) {
+		err = -ENOMEM;
 		goto out_free;
 	}
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 3/8] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 1/8] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 2/8] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 4/8] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Once priv->clone can be NULL in case no insertions/removals occurred
in the last transaction we need to drop set elements from priv->match
if priv->clone is NULL.

While at it, condense this function by reusing the pipapo_free_match
helper instead of open-coded version.

The rcu_barrier() is removed, its not needed: old call_rcu instances
for pipapo_reclaim_match do not access struct nft_set.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 7b6d5d2d0d54..459e2dd5050c 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2326,33 +2326,18 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
 	struct nft_pipapo_match *m;
-	int cpu;
 
 	m = rcu_dereference_protected(priv->match, true);
-	if (m) {
-		rcu_barrier();
-
-		for_each_possible_cpu(cpu)
-			pipapo_free_scratch(m, cpu);
-		free_percpu(m->scratch);
-		pipapo_free_fields(m);
-		kfree(m);
-		priv->match = NULL;
-	}
 
 	if (priv->clone) {
-		m = priv->clone;
-
-		nft_set_pipapo_match_destroy(ctx, set, m);
-
-		for_each_possible_cpu(cpu)
-			pipapo_free_scratch(priv->clone, cpu);
-		free_percpu(priv->clone->scratch);
-
-		pipapo_free_fields(priv->clone);
-		kfree(priv->clone);
+		nft_set_pipapo_match_destroy(ctx, set, priv->clone);
+		pipapo_free_match(priv->clone);
 		priv->clone = NULL;
+	} else {
+		nft_set_pipapo_match_destroy(ctx, set, m);
 	}
+
+	pipapo_free_match(m);
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 4/8] netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (2 preceding siblings ...)
  2024-04-25 12:06 ` [PATCH nf-next v2 3/8] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-26  5:44   ` Stefano Brivio
  2024-04-25 12:06 ` [PATCH nf-next v2 5/8] netfilter: nft_set_pipapo: merge deactivate helper into caller Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

The existing code uses iter->type to figure out what data is needed, the
live copy (READ) or clone (UPDATE).

Without pending updates, priv->clone and priv->match will point to
different memory locations, but they have identical content.

Future patch will make priv->clone == NULL if there are no pending changes,
in this case we must copy the live data for the UPDATE case.

Currently this would require GFP_ATOMIC allocation.  Split the walk
function in two parts: one that does the walk and one that decides which
data is needed.

In the UPDATE case, callers hold the transaction mutex so we do not need
the rcu read lock.  This allows to use GFP_KERNEL allocation while
cloning.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v2: rebase to account for the previously-merged new iter->type field.

 net/netfilter/nft_set_pipapo.c | 60 ++++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 21 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 459e2dd5050c..3e1d44a2b1c1 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -2106,35 +2106,23 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
 }
 
 /**
- * nft_pipapo_walk() - Walk over elements
+ * __nft_pipapo_walk() - Walk over elements in m
  * @ctx:	nftables API context
  * @set:	nftables API set representation
+ * @m:		matching data pointing to key mapping array
  * @iter:	Iterator
  *
  * As elements are referenced in the mapping array for the last field, directly
  * scan that array: there's no need to follow rule mappings from the first
- * field.
+ * field. @m is protected either by RCU read lock or by transaction mutex.
  */
-static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
-			    struct nft_set_iter *iter)
+static void nft_pipapo_do_walk(const struct nft_ctx *ctx, struct nft_set *set,
+			       const struct nft_pipapo_match *m,
+			       struct nft_set_iter *iter)
 {
-	struct nft_pipapo *priv = nft_set_priv(set);
-	const struct nft_pipapo_match *m;
 	const struct nft_pipapo_field *f;
 	unsigned int i, r;
 
-	WARN_ON_ONCE(iter->type != NFT_ITER_READ &&
-		     iter->type != NFT_ITER_UPDATE);
-
-	rcu_read_lock();
-	if (iter->type == NFT_ITER_READ)
-		m = rcu_dereference(priv->match);
-	else
-		m = priv->clone;
-
-	if (unlikely(!m))
-		goto out;
-
 	for (i = 0, f = m->f; i < m->field_count - 1; i++, f++)
 		;
 
@@ -2151,14 +2139,44 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
 
 		iter->err = iter->fn(ctx, set, iter, &e->priv);
 		if (iter->err < 0)
-			goto out;
+			return;
 
 cont:
 		iter->count++;
 	}
+}
 
-out:
-	rcu_read_unlock();
+/**
+ * nft_pipapo_walk() - Walk over elements
+ * @ctx:	nftables API context
+ * @set:	nftables API set representation
+ * @iter:	Iterator
+ *
+ * Test if destructive action is needed or not, clone active backend if needed
+ * and call the real function to work on the data.
+ */
+static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
+			    struct nft_set_iter *iter)
+{
+	struct nft_pipapo *priv = nft_set_priv(set);
+	const struct nft_pipapo_match *m;
+
+	switch (iter->type) {
+	case NFT_ITER_UPDATE:
+		m = priv->clone;
+		nft_pipapo_do_walk(ctx, set, m, iter);
+		break;
+	case NFT_ITER_READ:
+		rcu_read_lock();
+		m = rcu_dereference(priv->match);
+		nft_pipapo_do_walk(ctx, set, m, iter);
+		rcu_read_unlock();
+		break;
+	default:
+		iter->err = -EINVAL;
+		WARN_ON_ONCE(1);
+		break;
+	}
 }
 
 /**
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 5/8] netfilter: nft_set_pipapo: merge deactivate helper into caller
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (3 preceding siblings ...)
  2024-04-25 12:06 ` [PATCH nf-next v2 4/8] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 6/8] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Its the only remaining call site so there is no need for this to
be separated anymore.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 39 ++++++++--------------------------
 1 file changed, 9 insertions(+), 30 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 3e1d44a2b1c1..a922d39f7f25 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1851,52 +1851,31 @@ static void nft_pipapo_activate(const struct net *net,
 }
 
 /**
- * pipapo_deactivate() - Check that element is in set, mark as inactive
+ * nft_pipapo_deactivate() - Search for element and make it inactive
  * @net:	Network namespace
  * @set:	nftables API set representation
- * @data:	Input key data
- * @ext:	nftables API extension pointer, used to check for end element
- *
- * This is a convenience function that can be called from both
- * nft_pipapo_deactivate() and nft_pipapo_flush(), as they are in fact the same
- * operation.
+ * @elem:	nftables API element representation containing key data
  *
  * Return: deactivated element if found, NULL otherwise.
  */
-static void *pipapo_deactivate(const struct net *net, const struct nft_set *set,
-			       const u8 *data, const struct nft_set_ext *ext)
+static struct nft_elem_priv *
+nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
+		      const struct nft_set_elem *elem)
 {
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, data, nft_genmask_next(net),
-		       nft_net_tstamp(net), GFP_KERNEL);
+	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+		       nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL);
 	if (IS_ERR(e))
 		return NULL;
 
 	nft_set_elem_change_active(net, set, &e->ext);
 
-	return e;
-}
-
-/**
- * nft_pipapo_deactivate() - Call pipapo_deactivate() to make element inactive
- * @net:	Network namespace
- * @set:	nftables API set representation
- * @elem:	nftables API element representation containing key data
- *
- * Return: deactivated element if found, NULL otherwise.
- */
-static struct nft_elem_priv *
-nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
-		      const struct nft_set_elem *elem)
-{
-	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
-
-	return pipapo_deactivate(net, set, (const u8 *)elem->key.val.data, ext);
+	return &e->priv;
 }
 
 /**
- * nft_pipapo_flush() - Call pipapo_deactivate() to make element inactive
+ * nft_pipapo_flush() - make element inactive
  * @net:	Network namespace
  * @set:	nftables API set representation
  * @elem_priv:	nftables API element representation containing key data
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 6/8] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (4 preceding siblings ...)
  2024-04-25 12:06 ` [PATCH nf-next v2 5/8] netfilter: nft_set_pipapo: merge deactivate helper into caller Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 7/8] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 8/8] netfilter: nft_set_pipapo: remove dirty flag Florian Westphal
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

The helper uses priv->clone unconditionally which will fail once we do
the clone conditionally on first insert or removal.

'nft get element' from userspace needs to use priv->match if priv->clone
is null.

Prepare for this by passing the match backend data as argument.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index a922d39f7f25..9c8da9a0861d 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -504,6 +504,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
  * pipapo_get() - Get matching element reference given key data
  * @net:	Network namespace
  * @set:	nftables API set representation
+ * @m:		storage containing active/existing elements
  * @data:	Key data to be matched against existing elements
  * @genmask:	If set, check that element is active in given genmask
  * @tstamp:	timestamp to check for expired elements
@@ -517,17 +518,15 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
  */
 static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 					  const struct nft_set *set,
+					  const struct nft_pipapo_match *m,
 					  const u8 *data, u8 genmask,
 					  u64 tstamp, gfp_t gfp)
 {
 	struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT);
-	struct nft_pipapo *priv = nft_set_priv(set);
 	unsigned long *res_map, *fill_map = NULL;
-	const struct nft_pipapo_match *m;
 	const struct nft_pipapo_field *f;
 	int i;
 
-	m = priv->clone;
 	if (m->bsize_max == 0)
 		return ret;
 
@@ -612,9 +611,11 @@ static struct nft_elem_priv *
 nft_pipapo_get(const struct net *net, const struct nft_set *set,
 	       const struct nft_set_elem *elem, unsigned int flags)
 {
+	struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_match *m = priv->clone;
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_cur(net), get_jiffies_64(),
 		       GFP_ATOMIC);
 	if (IS_ERR(e))
@@ -1288,7 +1289,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	else
 		end = start;
 
-	dup = pipapo_get(net, set, start, genmask, tstamp, GFP_KERNEL);
+	dup = pipapo_get(net, set, m, start, genmask, tstamp, GFP_KERNEL);
 	if (!IS_ERR(dup)) {
 		/* Check if we already have the same exact entry */
 		const struct nft_data *dup_key, *dup_end;
@@ -1310,7 +1311,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 
 	if (PTR_ERR(dup) == -ENOENT) {
 		/* Look for partially overlapping entries */
-		dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp,
+		dup = pipapo_get(net, set, m, end, nft_genmask_next(net), tstamp,
 				 GFP_KERNEL);
 	}
 
@@ -1862,9 +1863,11 @@ static struct nft_elem_priv *
 nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
 		      const struct nft_set_elem *elem)
 {
+	const struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_match *m = priv->clone;
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL);
 	if (IS_ERR(e))
 		return NULL;
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 7/8] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (5 preceding siblings ...)
  2024-04-25 12:06 ` [PATCH nf-next v2 6/8] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  2024-04-25 12:06 ` [PATCH nf-next v2 8/8] netfilter: nft_set_pipapo: remove dirty flag Florian Westphal
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

This set type keeps two copies of the sets' content,
   priv->match (live version, used to match from packet path)
   priv->clone (work-in-progress version of the 'future' priv->match).

All additions and removals are done on priv->clone.  When transaction
completes, priv->clone becomes priv->match and a new clone is allocated
for use by next transaction.

Problem is that the cloning requires GFP_KERNEL allocations but we
cannot fail at either commit or abort time.

This patch defers the clone until we get an insertion or removal
request.  This allows us to handle OOM situations correctly.

This also allows to remove ->dirty in a followup change:

If ->clone exists, ->dirty is always true
If ->clone is NULL, ->dirty is always false, no elements were added
or removed (except catchall elements which are external to the specific
set backend).

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 73 ++++++++++++++++++++++++----------
 1 file changed, 52 insertions(+), 21 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 9c8da9a0861d..2b1c91e56ca1 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -615,6 +615,9 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
 	struct nft_pipapo_match *m = priv->clone;
 	struct nft_pipapo_elem *e;
 
+	if (!m)
+		m = rcu_dereference(priv->match);
+
 	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_cur(net), get_jiffies_64(),
 		       GFP_ATOMIC);
@@ -1259,6 +1262,29 @@ static bool nft_pipapo_transaction_mutex_held(const struct nft_set *set)
 #endif
 }
 
+static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old);
+
+/**
+ * pipapo_maybe_clone() - Build clone for pending data changes, if not existing
+ * @set:	nftables API set representation
+ *
+ * Return: newly created or existing clone, if any. NULL on allocation failure
+ */
+static struct nft_pipapo_match *pipapo_maybe_clone(const struct nft_set *set)
+{
+	struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_match *m;
+
+	if (priv->clone)
+		return priv->clone;
+
+	m = rcu_dereference_protected(priv->match,
+				      nft_pipapo_transaction_mutex_held(set));
+	priv->clone = pipapo_clone(m);
+
+	return priv->clone;
+}
+
 /**
  * nft_pipapo_insert() - Validate and insert ranged elements
  * @net:	Network namespace
@@ -1275,8 +1301,8 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
 	const u8 *start = (const u8 *)elem->key.val.data, *end;
+	struct nft_pipapo_match *m = pipapo_maybe_clone(set);
 	struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *m = priv->clone;
 	u8 genmask = nft_genmask_next(net);
 	struct nft_pipapo_elem *e, *dup;
 	u64 tstamp = nft_net_tstamp(net);
@@ -1284,6 +1310,9 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	const u8 *start_p, *end_p;
 	int i, bsize_max, err = 0;
 
+	if (!m)
+		return -ENOMEM;
+
 	if (nft_set_ext_exists(ext, NFT_SET_EXT_KEY_END))
 		end = (const u8 *)nft_set_ext_key_end(ext)->data;
 	else
@@ -1789,7 +1818,10 @@ static void pipapo_reclaim_match(struct rcu_head *rcu)
 static void nft_pipapo_commit(struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *new_clone, *old;
+	struct nft_pipapo_match *old;
+
+	if (!priv->clone)
+		return;
 
 	if (time_after_eq(jiffies, priv->last_gc + nft_set_gc_interval(set)))
 		pipapo_gc(set, priv->clone);
@@ -1797,38 +1829,27 @@ static void nft_pipapo_commit(struct nft_set *set)
 	if (!priv->dirty)
 		return;
 
-	new_clone = pipapo_clone(priv->clone);
-	if (!new_clone)
-		return;
-
+	old = rcu_replace_pointer(priv->match, priv->clone,
+				  nft_pipapo_transaction_mutex_held(set));
+	priv->clone = NULL;
 	priv->dirty = false;
 
-	old = rcu_access_pointer(priv->match);
-	rcu_assign_pointer(priv->match, priv->clone);
 	if (old)
 		call_rcu(&old->rcu, pipapo_reclaim_match);
-
-	priv->clone = new_clone;
 }
 
 static void nft_pipapo_abort(const struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *new_clone, *m;
 
 	if (!priv->dirty)
 		return;
 
-	m = rcu_dereference_protected(priv->match, nft_pipapo_transaction_mutex_held(set));
-
-	new_clone = pipapo_clone(m);
-	if (!new_clone)
+	if (!priv->clone)
 		return;
-
 	priv->dirty = false;
-
 	pipapo_free_match(priv->clone);
-	priv->clone = new_clone;
+	priv->clone = NULL;
 }
 
 /**
@@ -1863,10 +1884,15 @@ static struct nft_elem_priv *
 nft_pipapo_deactivate(const struct net *net, const struct nft_set *set,
 		      const struct nft_set_elem *elem)
 {
-	const struct nft_pipapo *priv = nft_set_priv(set);
-	struct nft_pipapo_match *m = priv->clone;
+	struct nft_pipapo_match *m = pipapo_maybe_clone(set);
 	struct nft_pipapo_elem *e;
 
+	/* removal must occur on priv->clone, if we are low on memory
+	 * we have no choice and must fail the removal request.
+	 */
+	if (!m)
+		return NULL;
+
 	e = pipapo_get(net, set, m, (const u8 *)elem->key.val.data,
 		       nft_genmask_next(net), nft_net_tstamp(net), GFP_KERNEL);
 	if (IS_ERR(e))
@@ -2145,7 +2171,12 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
 
 	switch (iter->type) {
 	case NFT_ITER_UPDATE:
-		m = priv->clone;
+		m = pipapo_maybe_clone(set);
+		if (!m) {
+			iter->err = -ENOMEM;
+			return;
+		}
+
 		nft_pipapo_do_walk(ctx, set, m, iter);
 		break;
 	case NFT_ITER_READ:
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH nf-next v2 8/8] netfilter: nft_set_pipapo: remove dirty flag
  2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
                   ` (6 preceding siblings ...)
  2024-04-25 12:06 ` [PATCH nf-next v2 7/8] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
@ 2024-04-25 12:06 ` Florian Westphal
  7 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-04-25 12:06 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

After previous change:
 ->clone exists: ->dirty is always true
 ->clone == NULL ->dirty is always false

So remove this flag.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
 net/netfilter/nft_set_pipapo.c | 25 -------------------------
 net/netfilter/nft_set_pipapo.h |  2 --
 2 files changed, 27 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 2b1c91e56ca1..be5c554ca4d3 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1302,7 +1302,6 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
 	const u8 *start = (const u8 *)elem->key.val.data, *end;
 	struct nft_pipapo_match *m = pipapo_maybe_clone(set);
-	struct nft_pipapo *priv = nft_set_priv(set);
 	u8 genmask = nft_genmask_next(net);
 	struct nft_pipapo_elem *e, *dup;
 	u64 tstamp = nft_net_tstamp(net);
@@ -1373,8 +1372,6 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	}
 
 	/* Insert */
-	priv->dirty = true;
-
 	bsize_max = m->bsize_max;
 
 	nft_pipapo_for_each_field(f, i, m) {
@@ -1739,8 +1736,6 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
 		 * NFT_SET_ELEM_DEAD_BIT.
 		 */
 		if (__nft_set_elem_expired(&e->ext, tstamp)) {
-			priv->dirty = true;
-
 			gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
 			if (!gc)
 				return;
@@ -1826,13 +1821,9 @@ static void nft_pipapo_commit(struct nft_set *set)
 	if (time_after_eq(jiffies, priv->last_gc + nft_set_gc_interval(set)))
 		pipapo_gc(set, priv->clone);
 
-	if (!priv->dirty)
-		return;
-
 	old = rcu_replace_pointer(priv->match, priv->clone,
 				  nft_pipapo_transaction_mutex_held(set));
 	priv->clone = NULL;
-	priv->dirty = false;
 
 	if (old)
 		call_rcu(&old->rcu, pipapo_reclaim_match);
@@ -1842,12 +1833,8 @@ static void nft_pipapo_abort(const struct nft_set *set)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
 
-	if (!priv->dirty)
-		return;
-
 	if (!priv->clone)
 		return;
-	priv->dirty = false;
 	pipapo_free_match(priv->clone);
 	priv->clone = NULL;
 }
@@ -2101,7 +2088,6 @@ static void nft_pipapo_remove(const struct net *net, const struct nft_set *set,
 			match_end += NFT_PIPAPO_GROUPS_PADDED_SIZE(f);
 
 			if (last && f->mt[rulemap[i].to].e == e) {
-				priv->dirty = true;
 				pipapo_drop(m, rulemap);
 				return;
 			}
@@ -2298,21 +2284,10 @@ static int nft_pipapo_init(const struct nft_set *set,
 		f->mt = NULL;
 	}
 
-	/* Create an initial clone of matching data for next insertion */
-	priv->clone = pipapo_clone(m);
-	if (!priv->clone) {
-		err = -ENOMEM;
-		goto out_free;
-	}
-
-	priv->dirty = false;
-
 	rcu_assign_pointer(priv->match, m);
 
 	return 0;
 
-out_free:
-	free_percpu(m->scratch);
 out_scratch:
 	kfree(m);
 
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 24cd1ff73f98..0d2e40e10f7f 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -155,14 +155,12 @@ struct nft_pipapo_match {
  * @match:	Currently in-use matching data
  * @clone:	Copy where pending insertions and deletions are kept
  * @width:	Total bytes to be matched for one packet, including padding
- * @dirty:	Working copy has pending insertions or deletions
  * @last_gc:	Timestamp of last garbage collection run, jiffies
  */
 struct nft_pipapo {
 	struct nft_pipapo_match __rcu *match;
 	struct nft_pipapo_match *clone;
 	int width;
-	bool dirty;
 	unsigned long last_gc;
 };
 
-- 
2.43.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH nf-next v2 4/8] netfilter: nft_set_pipapo: prepare walk function for on-demand clone
  2024-04-25 12:06 ` [PATCH nf-next v2 4/8] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
@ 2024-04-26  5:44   ` Stefano Brivio
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Brivio @ 2024-04-26  5:44 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Thu, 25 Apr 2024 14:06:43 +0200
Florian Westphal <fw@strlen.de> wrote:

> The existing code uses iter->type to figure out what data is needed, the
> live copy (READ) or clone (UPDATE).
> 
> Without pending updates, priv->clone and priv->match will point to
> different memory locations, but they have identical content.
> 
> Future patch will make priv->clone == NULL if there are no pending changes,
> in this case we must copy the live data for the UPDATE case.
> 
> Currently this would require GFP_ATOMIC allocation.  Split the walk
> function in two parts: one that does the walk and one that decides which
> data is needed.
> 
> In the UPDATE case, callers hold the transaction mutex so we do not need
> the rcu read lock.  This allows to use GFP_KERNEL allocation while
> cloning.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-26  5:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-25 12:06 [PATCH nf-next v2 0/8] nft_set_pipapo: remove cannot-fail allocations on commit and abort Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 1/8] netfilter: nft_set_pipapo: move prove_locking helper around Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 2/8] netfilter: nft_set_pipapo: make pipapo_clone helper return NULL Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 3/8] netfilter: nft_set_pipapo: prepare destroy function for on-demand clone Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 4/8] netfilter: nft_set_pipapo: prepare walk " Florian Westphal
2024-04-26  5:44   ` Stefano Brivio
2024-04-25 12:06 ` [PATCH nf-next v2 5/8] netfilter: nft_set_pipapo: merge deactivate helper into caller Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 6/8] netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 7/8] netfilter: nft_set_pipapo: move cloning of match info to insert/removal path Florian Westphal
2024-04-25 12:06 ` [PATCH nf-next v2 8/8] netfilter: nft_set_pipapo: remove dirty flag Florian Westphal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).