All of lore.kernel.org
* [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set
@ 2024-02-06 12:23 Florian Westphal
  2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Florian Westphal @ 2024-02-06 12:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

While working on speeding up pipapo rule insertions, I found that the
map_index needs to be percpu and per-set, not just percpu.

As it stands, it is possible for a pipapo set to fill the all-zero part
with ones and then treat the 'might have bits set' half as the
'start-from-zero' area.

The first patch changes the scratchpad area to a structure that provides
space for a per-set-and-cpu toggle, and uses that instead of the percpu one.

The second patch prepares for patch 3 by adding a new free helper.

The third patch removes the scratch_aligned pointer and makes the AVX2
implementation use the exact same memory addresses for reading and storing
the matching state.

Florian Westphal (3):
  netfilter: nft_set_pipapo: store index in scratch maps
  netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  netfilter: nft_set_pipapo: remove scratch_aligned pointer

 net/netfilter/nft_set_pipapo.c      | 96 +++++++++++++----------------
 net/netfilter/nft_set_pipapo.h      | 18 ++++--
 net/netfilter/nft_set_pipapo_avx2.c | 17 +++--
 3 files changed, 63 insertions(+), 68 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
  2024-02-06 12:23 [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set Florian Westphal
@ 2024-02-06 12:23 ` Florian Westphal
  2024-02-07 14:15   ` Stefano Brivio
  2024-02-06 12:23 ` [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Florian Westphal
  2024-02-06 12:23 ` [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer Florian Westphal
  2 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2024-02-06 12:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Pipapo needs a scratchpad area to keep state during matching.
This state can be large and thus cannot reside on stack.

Each set preallocates percpu areas for this.

On each match stage, one scratchpad half starts out all-zero and the other
is initialized to all-ones.

At the end of each stage, the half that started as all-ones is
always all-zero.  Before the next field is tested, the pointers to the two
halves are swapped, i.e. the resmap pointer turns into the fill pointer and
vice versa.

After the last field has been processed, pipapo stashes the
index toggle in a percpu variable, with the assumption that the next
packet will start with the all-zero half and set all bits in the other
half to 1.

This isn't reliable.

There can be multiple sets, and we can't be sure that the upper
and lower halves of all sets' scratch maps are always in sync (lookups
can be conditional), so one set might have swapped while another might
not have been queried.

Thus we need to keep the index per-set-and-cpu, just like the
scratchpad.
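
The fix described here can be sketched in plain user-space C. This is an
illustrative model with made-up names and sizes, not the kernel's actual
code (the real structure and lookup live in nft_set_pipapo.{c,h}):

```c
#include <stdbool.h>
#include <string.h>

#define BSIZE_MAX 4	/* assumed per-field bucket size, in longs */

/* Per-set-and-cpu scratch: the toggle travels with the maps it indexes */
struct scratch_sketch {
	bool map_index;				/* which half is the result map */
	unsigned long map[BSIZE_MAX * 2];	/* result half + fill half */
};

/* One lookup pass: swap result/fill after each field, stash the toggle */
static bool lookup_sketch(struct scratch_sketch *s, int n_fields)
{
	bool idx = s->map_index;
	unsigned long *res  = s->map + (idx ? BSIZE_MAX : 0);
	unsigned long *fill = s->map + (idx ? 0 : BSIZE_MAX);
	int i;

	memset(res, 0xff, BSIZE_MAX * sizeof(*res));

	for (i = 0; i < n_fields; i++) {
		/* ... match field i: consume res, write matches into fill ... */
		memset(res, 0, BSIZE_MAX * sizeof(*res)); /* consumed half ends all-zero */

		unsigned long *tmp = res;	/* swap halves for the next field */
		res = fill;
		fill = tmp;
		idx = !idx;
	}

	s->map_index = idx;	/* per-set-and-cpu, not a global percpu variable */
	return true;
}
```

Because the toggle is stored next to the maps it refers to, a lookup in
one set can no longer leave a stale index behind for another set on the
same CPU.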

Note that this bug fix is incomplete; there is a related issue.

The avx2 and normal implementations might use slightly different areas of
the map array space due to the avx2 alignment requirements, so
m->scratch (generic/fallback implementation) and ->scratch_aligned
(avx) may partially overlap. scratch and scratch_aligned are not distinct
objects; the latter is just the aligned address of the former.

After this change, a write to scratch_aligned->map_index may write to
scratch->map, so this issue becomes more prominent: we can set to 1
a bit in the supposedly-all-zero area of scratch->map[].
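
The overlap can be reproduced in a small user-space model. All sizes and
names here are assumptions for illustration (32-byte alignment standing
in for NFT_PIPAPO_ALIGN, a buffer standing in for the kzalloc'd area):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_TO 32	/* stand-in for NFT_PIPAPO_ALIGN */

struct scratch_sketch {
	uint8_t map_index;
	unsigned long map[];
};

static _Alignas(ALIGN_TO) char buf[256];

/* Returns 1 if writing map_index through the aligned view dirtied the
 * base view's supposedly-all-zero map[] area.
 */
static int overlap_demo(void)
{
	struct scratch_sketch *base, *aligned;
	uintptr_t map_addr, aligned_map;
	size_t idx;

	memset(buf, 0, sizeof(buf));

	/* Pretend kzalloc returned an address aligned to 8, not ALIGN_TO */
	base = (void *)(buf + 8);

	/* ->scratch_aligned: align the start of ->map, then step back */
	map_addr = (uintptr_t)base + offsetof(struct scratch_sketch, map);
	aligned_map = (map_addr + ALIGN_TO - 1) & ~(uintptr_t)(ALIGN_TO - 1);
	aligned = (void *)(aligned_map - offsetof(struct scratch_sketch, map));

	aligned->map_index = 1;	/* AVX2 side stashes its toggle... */

	/* ...but that byte lies inside base->map[]: a stray set bit */
	idx = ((uintptr_t)aligned - (uintptr_t)base->map) / sizeof(unsigned long);
	return base->map[idx] != 0;
}
```

In this model the two struct views are 16 bytes apart, so the aligned
view's map_index byte lands in the middle of one of the base view's
map[] words, exactly the stray bit described above.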

A followup patch will remove scratch_aligned and make the generic and
avx code use the same (aligned) area.

It is done in a separate change to ease review.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c      | 34 +++++++++++++++--------------
 net/netfilter/nft_set_pipapo.h      | 14 ++++++++++--
 net/netfilter/nft_set_pipapo_avx2.c | 15 ++++++-------
 3 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index efd523496be4..a8aa915f3f0b 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -342,9 +342,6 @@
 #include "nft_set_pipapo_avx2.h"
 #include "nft_set_pipapo.h"
 
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_scratch_index);
-
 /**
  * pipapo_refill() - For each set bit, set bits from selected mapping table item
  * @map:	Bitmap to be scanned for set bits
@@ -412,6 +409,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 		       const u32 *key, const struct nft_set_ext **ext)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_scratch *scratch;
 	unsigned long *res_map, *fill_map;
 	u8 genmask = nft_genmask_cur(net);
 	const u8 *rp = (const u8 *)key;
@@ -422,15 +420,17 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 
 	local_bh_disable();
 
-	map_index = raw_cpu_read(nft_pipapo_scratch_index);
-
 	m = rcu_dereference(priv->match);
 
 	if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
 		goto out;
 
-	res_map  = *raw_cpu_ptr(m->scratch) + (map_index ? m->bsize_max : 0);
-	fill_map = *raw_cpu_ptr(m->scratch) + (map_index ? 0 : m->bsize_max);
+	scratch = *raw_cpu_ptr(m->scratch);
+
+	map_index = scratch->map_index;
+
+	res_map  = scratch->map + (map_index ? m->bsize_max : 0);
+	fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
 
 	memset(res_map, 0xff, m->bsize_max * sizeof(*res_map));
 
@@ -460,7 +460,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 		b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt,
 				  last);
 		if (b < 0) {
-			raw_cpu_write(nft_pipapo_scratch_index, map_index);
+			scratch->map_index = map_index;
 			local_bh_enable();
 
 			return false;
@@ -477,7 +477,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 			 * current inactive bitmap is clean and can be reused as
 			 * *next* bitmap (not initial) for the next packet.
 			 */
-			raw_cpu_write(nft_pipapo_scratch_index, map_index);
+			scratch->map_index = map_index;
 			local_bh_enable();
 
 			return true;
@@ -1121,12 +1121,12 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 	int i;
 
 	for_each_possible_cpu(i) {
-		unsigned long *scratch;
+		struct nft_pipapo_scratch *scratch;
 #ifdef NFT_PIPAPO_ALIGN
-		unsigned long *scratch_aligned;
+		void *scratch_aligned;
 #endif
-
-		scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2 +
+		scratch = kzalloc_node(struct_size(scratch, map,
+						   bsize_max * 2) +
 				       NFT_PIPAPO_ALIGN_HEADROOM,
 				       GFP_KERNEL, cpu_to_node(i));
 		if (!scratch) {
@@ -1145,7 +1145,9 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 		*per_cpu_ptr(clone->scratch, i) = scratch;
 
 #ifdef NFT_PIPAPO_ALIGN
-		scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
+		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
+		/* Need to align ->map start, not start of structure itself */
+		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
 		*per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
 #endif
 	}
@@ -2132,7 +2134,7 @@ static int nft_pipapo_init(const struct nft_set *set,
 	m->field_count = field_count;
 	m->bsize_max = 0;
 
-	m->scratch = alloc_percpu(unsigned long *);
+	m->scratch = alloc_percpu(struct nft_pipapo_scratch *);
 	if (!m->scratch) {
 		err = -ENOMEM;
 		goto out_scratch;
@@ -2141,7 +2143,7 @@ static int nft_pipapo_init(const struct nft_set *set,
 		*per_cpu_ptr(m->scratch, i) = NULL;
 
 #ifdef NFT_PIPAPO_ALIGN
-	m->scratch_aligned = alloc_percpu(unsigned long *);
+	m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *);
 	if (!m->scratch_aligned) {
 		err = -ENOMEM;
 		goto out_free;
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 1040223da5fa..144b186c4caf 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -130,6 +130,16 @@ struct nft_pipapo_field {
 	union nft_pipapo_map_bucket *mt;
 };
 
+/**
+ * struct nft_pipapo_scratch - percpu data used for lookup and matching
+ * @map_index	Current working bitmap index, toggled between field matches
+ * @map		store partial matching results during lookup
+ */
+struct nft_pipapo_scratch {
+	u8 map_index;
+	unsigned long map[];
+};
+
 /**
  * struct nft_pipapo_match - Data used for lookup and matching
  * @field_count		Amount of fields in set
@@ -142,9 +152,9 @@ struct nft_pipapo_field {
 struct nft_pipapo_match {
 	int field_count;
 #ifdef NFT_PIPAPO_ALIGN
-	unsigned long * __percpu *scratch_aligned;
+	struct nft_pipapo_scratch * __percpu *scratch_aligned;
 #endif
-	unsigned long * __percpu *scratch;
+	struct nft_pipapo_scratch * __percpu *scratch;
 	size_t bsize_max;
 	struct rcu_head rcu;
 	struct nft_pipapo_field f[] __counted_by(field_count);
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 52e0d026d30a..8cad7b2e759d 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -71,9 +71,6 @@
 #define NFT_PIPAPO_AVX2_ZERO(reg)					\
 	asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
 
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_avx2_scratch_index);
-
 /**
  * nft_pipapo_avx2_prepare() - Prepare before main algorithm body
  *
@@ -1120,11 +1117,12 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 			    const u32 *key, const struct nft_set_ext **ext)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-	unsigned long *res, *fill, *scratch;
+	struct nft_pipapo_scratch *scratch;
 	u8 genmask = nft_genmask_cur(net);
 	const u8 *rp = (const u8 *)key;
 	struct nft_pipapo_match *m;
 	struct nft_pipapo_field *f;
+	unsigned long *res, *fill;
 	bool map_index;
 	int i, ret = 0;
 
@@ -1146,10 +1144,11 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 		kernel_fpu_end();
 		return false;
 	}
-	map_index = raw_cpu_read(nft_pipapo_avx2_scratch_index);
 
-	res  = scratch + (map_index ? m->bsize_max : 0);
-	fill = scratch + (map_index ? 0 : m->bsize_max);
+	map_index = scratch->map_index;
+
+	res = scratch->map + (map_index ? m->bsize_max : 0);
+	fill = scratch->map + (map_index ? 0 : m->bsize_max);
 
 	/* Starting map doesn't need to be set for this implementation */
 
@@ -1221,7 +1220,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 out:
 	if (i % 2)
-		raw_cpu_write(nft_pipapo_avx2_scratch_index, !map_index);
+		scratch->map_index = !map_index;
 	kernel_fpu_end();
 
 	return ret >= 0;
-- 
2.43.0



* [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  2024-02-06 12:23 [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set Florian Westphal
  2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
@ 2024-02-06 12:23 ` Florian Westphal
  2024-02-07 17:29   ` Stefano Brivio
  2024-02-06 12:23 ` [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer Florian Westphal
  2 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2024-02-06 12:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

After the next patch, a simple kfree() is not enough anymore, so add
a helper for it.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index a8aa915f3f0b..3d308e31b048 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1108,6 +1108,19 @@ static void pipapo_map(struct nft_pipapo_match *m,
 		f->mt[map[i].to + j].e = e;
 }
 
+static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)
+{
+	struct nft_pipapo_scratch *s;
+	void *mem;
+
+	s = *per_cpu_ptr(m->scratch, cpu);
+	if (!s)
+		return;
+
+	mem = s;
+	kfree(mem);
+}
+
 /**
  * pipapo_realloc_scratch() - Reallocate scratch maps for partial match results
  * @clone:	Copy of matching data with pending insertions and deletions
@@ -1140,7 +1153,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 			return -ENOMEM;
 		}
 
-		kfree(*per_cpu_ptr(clone->scratch, i));
+		pipapo_free_scratch(clone, i);
 
 		*per_cpu_ptr(clone->scratch, i) = scratch;
 
@@ -1359,7 +1372,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	}
 out_scratch_realloc:
 	for_each_possible_cpu(i)
-		kfree(*per_cpu_ptr(new->scratch, i));
+		pipapo_free_scratch(new, i);
 #ifdef NFT_PIPAPO_ALIGN
 	free_percpu(new->scratch_aligned);
 #endif
@@ -1642,7 +1655,7 @@ static void pipapo_free_match(struct nft_pipapo_match *m)
 	int i;
 
 	for_each_possible_cpu(i)
-		kfree(*per_cpu_ptr(m->scratch, i));
+		pipapo_free_scratch(m, i);
 
 #ifdef NFT_PIPAPO_ALIGN
 	free_percpu(m->scratch_aligned);
@@ -2242,7 +2255,7 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 		free_percpu(m->scratch_aligned);
 #endif
 		for_each_possible_cpu(cpu)
-			kfree(*per_cpu_ptr(m->scratch, cpu));
+			pipapo_free_scratch(m, cpu);
 		free_percpu(m->scratch);
 		pipapo_free_fields(m);
 		kfree(m);
@@ -2259,7 +2272,7 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 		free_percpu(priv->clone->scratch_aligned);
 #endif
 		for_each_possible_cpu(cpu)
-			kfree(*per_cpu_ptr(priv->clone->scratch, cpu));
+			pipapo_free_scratch(priv->clone, cpu);
 		free_percpu(priv->clone->scratch);
 
 		pipapo_free_fields(priv->clone);
-- 
2.43.0



* [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer
  2024-02-06 12:23 [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set Florian Westphal
  2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
  2024-02-06 12:23 ` [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Florian Westphal
@ 2024-02-06 12:23 ` Florian Westphal
  2024-02-07 17:29   ` Stefano Brivio
  2 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2024-02-06 12:23 UTC (permalink / raw)
  To: netfilter-devel; +Cc: sbrivio, Florian Westphal

Use ->scratch for both avx2 and the generic implementation.

After the previous change the scratch->map member is always properly
aligned for AVX2, so we can just use scratch->map in AVX2 too.

The alignment offset (align_off) delta is stored in the scratchpad so we
can reconstruct the correct address when freeing the area again.
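
A user-space sketch of the allocate/align/free round trip follows. The
constants and names are assumptions for illustration; the kernel
counterparts are pipapo_realloc_scratch() and pipapo_free_scratch():

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ALIGN_TO 32	/* stand-in for NFT_PIPAPO_ALIGN */

struct scratch_sketch {
	uint8_t map_index;
	uint32_t align_off;	/* distance back to the address malloc returned */
	unsigned long map[];
};

static struct scratch_sketch *scratch_alloc(size_t n_longs)
{
	/* Headroom lets us shift the struct up until ->map is aligned */
	void *mem = calloc(1, sizeof(struct scratch_sketch) +
			      n_longs * sizeof(unsigned long) + ALIGN_TO - 1);
	struct scratch_sketch *s;
	uintptr_t map_addr, aligned;

	if (!mem)
		return NULL;

	/* Align the start of ->map, not the start of the structure itself */
	map_addr = (uintptr_t)mem + offsetof(struct scratch_sketch, map);
	aligned  = (map_addr + ALIGN_TO - 1) & ~(uintptr_t)(ALIGN_TO - 1);

	s = (void *)(aligned - offsetof(struct scratch_sketch, map));
	s->align_off = (uint32_t)((uintptr_t)s - (uintptr_t)mem);
	return s;
}

static void scratch_free(struct scratch_sketch *s)
{
	if (s)
		free((char *)s - s->align_off);	/* reconstruct original address */
}
```

With align_off stored inside the scratchpad itself, a single pointer is
enough for both implementations, which is what lets the separate
scratch_aligned percpu pointer go away.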

Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nft_set_pipapo.c      | 41 +++++------------------------
 net/netfilter/nft_set_pipapo.h      |  6 ++---
 net/netfilter/nft_set_pipapo_avx2.c |  2 +-
 3 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 3d308e31b048..7ec284fd5093 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1118,6 +1118,7 @@ static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int c
 		return;
 
 	mem = s;
+	mem -= s->align_off;
 	kfree(mem);
 }
 
@@ -1137,6 +1138,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 		struct nft_pipapo_scratch *scratch;
 #ifdef NFT_PIPAPO_ALIGN
 		void *scratch_aligned;
+		u32 align_off;
 #endif
 		scratch = kzalloc_node(struct_size(scratch, map,
 						   bsize_max * 2) +
@@ -1155,14 +1157,16 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 
 		pipapo_free_scratch(clone, i);
 
-		*per_cpu_ptr(clone->scratch, i) = scratch;
-
 #ifdef NFT_PIPAPO_ALIGN
 		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
 		/* Need to align ->map start, not start of structure itself */
 		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
-		*per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
+		align_off = scratch_aligned - (void *)scratch;
+
+		scratch = scratch_aligned;
+		scratch->align_off = align_off;
 #endif
+		*per_cpu_ptr(clone->scratch, i) = scratch;
 	}
 
 	return 0;
@@ -1316,11 +1320,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	if (!new->scratch)
 		goto out_scratch;
 
-#ifdef NFT_PIPAPO_ALIGN
-	new->scratch_aligned = alloc_percpu(*new->scratch_aligned);
-	if (!new->scratch_aligned)
-		goto out_scratch;
-#endif
 	for_each_possible_cpu(i)
 		*per_cpu_ptr(new->scratch, i) = NULL;
 
@@ -1373,9 +1372,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 out_scratch_realloc:
 	for_each_possible_cpu(i)
 		pipapo_free_scratch(new, i);
-#ifdef NFT_PIPAPO_ALIGN
-	free_percpu(new->scratch_aligned);
-#endif
 out_scratch:
 	free_percpu(new->scratch);
 	kfree(new);
@@ -1657,11 +1653,7 @@ static void pipapo_free_match(struct nft_pipapo_match *m)
 	for_each_possible_cpu(i)
 		pipapo_free_scratch(m, i);
 
-#ifdef NFT_PIPAPO_ALIGN
-	free_percpu(m->scratch_aligned);
-#endif
 	free_percpu(m->scratch);
-
 	pipapo_free_fields(m);
 
 	kfree(m);
@@ -2155,16 +2147,6 @@ static int nft_pipapo_init(const struct nft_set *set,
 	for_each_possible_cpu(i)
 		*per_cpu_ptr(m->scratch, i) = NULL;
 
-#ifdef NFT_PIPAPO_ALIGN
-	m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *);
-	if (!m->scratch_aligned) {
-		err = -ENOMEM;
-		goto out_free;
-	}
-	for_each_possible_cpu(i)
-		*per_cpu_ptr(m->scratch_aligned, i) = NULL;
-#endif
-
 	rcu_head_init(&m->rcu);
 
 	nft_pipapo_for_each_field(f, i, m) {
@@ -2195,9 +2177,6 @@ static int nft_pipapo_init(const struct nft_set *set,
 	return 0;
 
 out_free:
-#ifdef NFT_PIPAPO_ALIGN
-	free_percpu(m->scratch_aligned);
-#endif
 	free_percpu(m->scratch);
 out_scratch:
 	kfree(m);
@@ -2251,9 +2230,6 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 
 		nft_set_pipapo_match_destroy(ctx, set, m);
 
-#ifdef NFT_PIPAPO_ALIGN
-		free_percpu(m->scratch_aligned);
-#endif
 		for_each_possible_cpu(cpu)
 			pipapo_free_scratch(m, cpu);
 		free_percpu(m->scratch);
@@ -2268,9 +2244,6 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 		if (priv->dirty)
 			nft_set_pipapo_match_destroy(ctx, set, m);
 
-#ifdef NFT_PIPAPO_ALIGN
-		free_percpu(priv->clone->scratch_aligned);
-#endif
 		for_each_possible_cpu(cpu)
 			pipapo_free_scratch(priv->clone, cpu);
 		free_percpu(priv->clone->scratch);
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 144b186c4caf..e5f67c5cf30a 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -133,10 +133,12 @@ struct nft_pipapo_field {
 /**
  * struct nft_pipapo_scratch - percpu data used for lookup and matching
  * @map_index	Current working bitmap index, toggled between field matches
+ * @align_off	Offset to get the originally allocated address
  * @map		store partial matching results during lookup
  */
 struct nft_pipapo_scratch {
 	u8 map_index;
+	u32 align_off;
 	unsigned long map[];
 };
 
@@ -144,16 +146,12 @@ struct nft_pipapo_scratch {
  * struct nft_pipapo_match - Data used for lookup and matching
  * @field_count		Amount of fields in set
  * @scratch:		Preallocated per-CPU maps for partial matching results
- * @scratch_aligned:	Version of @scratch aligned to NFT_PIPAPO_ALIGN bytes
  * @bsize_max:		Maximum lookup table bucket size of all fields, in longs
  * @rcu			Matching data is swapped on commits
  * @f:			Fields, with lookup and mapping tables
  */
 struct nft_pipapo_match {
 	int field_count;
-#ifdef NFT_PIPAPO_ALIGN
-	struct nft_pipapo_scratch * __percpu *scratch_aligned;
-#endif
 	struct nft_pipapo_scratch * __percpu *scratch;
 	size_t bsize_max;
 	struct rcu_head rcu;
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 8cad7b2e759d..107cffa8200c 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1139,7 +1139,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 	 */
 	kernel_fpu_begin_mask(0);
 
-	scratch = *raw_cpu_ptr(m->scratch_aligned);
+	scratch = *raw_cpu_ptr(m->scratch);
 	if (unlikely(!scratch)) {
 		kernel_fpu_end();
 		return false;
-- 
2.43.0



* Re: [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
  2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
@ 2024-02-07 14:15   ` Stefano Brivio
  2024-02-07 15:23     ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Brivio @ 2024-02-07 14:15 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue,  6 Feb 2024 13:23:06 +0100
Florian Westphal <fw@strlen.de> wrote:

> Pipapo needs a scratchpad area to keep state during matching.
> This state can be large and thus cannot reside on stack.
> 
> Each set preallocates percpu areas for this.
> 
> On each match stage, one scratchpad half starts out all-zero and the other
> is initialized to all-ones.
> 
> At the end of each stage, the half that started as all-ones is
> always all-zero.  Before the next field is tested, the pointers to the two
> halves are swapped, i.e. the resmap pointer turns into the fill pointer and
> vice versa.
> 
> After the last field has been processed, pipapo stashes the
> index toggle in a percpu variable, with the assumption that the next
> packet will start with the all-zero half and set all bits in the other
> half to 1.
> 
> This isn't reliable.

Uh oh. In hindsight, this sounds so obvious now...

> There can be multiple sets, and we can't be sure that the upper
> and lower halves of all sets' scratch maps are always in sync (lookups
> can be conditional), so one set might have swapped while another might
> not have been queried.
> 
> Thus we need to keep the index per-set-and-cpu, just like the
> scratchpad.
> 
> Note that this bug fix is incomplete; there is a related issue.
> 
> The avx2 and normal implementations might use slightly different areas of
> the map array space due to the avx2 alignment requirements, so
> m->scratch (generic/fallback implementation) and ->scratch_aligned
> (avx) may partially overlap. scratch and scratch_aligned are not distinct
> objects; the latter is just the aligned address of the former.
> 
> After this change, a write to scratch_aligned->map_index may write to
> scratch->map, so this issue becomes more prominent: we can set to 1
> a bit in the supposedly-all-zero area of scratch->map[].
> 
> A followup patch will remove scratch_aligned and make the generic and
> avx code use the same (aligned) area.
> 
> It is done in a separate change to ease review.
> 
> Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> Signed-off-by: Florian Westphal <fw@strlen.de>

Minus one nit (not worth respinning) and one half-doubt below,

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

...I'm still reviewing the rest.

> ---
>  net/netfilter/nft_set_pipapo.c      | 34 +++++++++++++++--------------
>  net/netfilter/nft_set_pipapo.h      | 14 ++++++++++--
>  net/netfilter/nft_set_pipapo_avx2.c | 15 ++++++-------
>  3 files changed, 37 insertions(+), 26 deletions(-)
> 
> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index efd523496be4..a8aa915f3f0b 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -342,9 +342,6 @@
>  #include "nft_set_pipapo_avx2.h"
>  #include "nft_set_pipapo.h"
>  
> -/* Current working bitmap index, toggled between field matches */
> -static DEFINE_PER_CPU(bool, nft_pipapo_scratch_index);
> -
>  /**
>   * pipapo_refill() - For each set bit, set bits from selected mapping table item
>   * @map:	Bitmap to be scanned for set bits
> @@ -412,6 +409,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
>  		       const u32 *key, const struct nft_set_ext **ext)
>  {
>  	struct nft_pipapo *priv = nft_set_priv(set);
> +	struct nft_pipapo_scratch *scratch;
>  	unsigned long *res_map, *fill_map;
>  	u8 genmask = nft_genmask_cur(net);
>  	const u8 *rp = (const u8 *)key;
> @@ -422,15 +420,17 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
>  
>  	local_bh_disable();
>  
> -	map_index = raw_cpu_read(nft_pipapo_scratch_index);
> -
>  	m = rcu_dereference(priv->match);
>  
>  	if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
>  		goto out;
>  
> -	res_map  = *raw_cpu_ptr(m->scratch) + (map_index ? m->bsize_max : 0);
> -	fill_map = *raw_cpu_ptr(m->scratch) + (map_index ? 0 : m->bsize_max);
> +	scratch = *raw_cpu_ptr(m->scratch);
> +
> +	map_index = scratch->map_index;
> +
> +	res_map  = scratch->map + (map_index ? m->bsize_max : 0);
> +	fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
>  
>  	memset(res_map, 0xff, m->bsize_max * sizeof(*res_map));
>  
> @@ -460,7 +460,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
>  		b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt,
>  				  last);
>  		if (b < 0) {
> -			raw_cpu_write(nft_pipapo_scratch_index, map_index);
> +			scratch->map_index = map_index;
>  			local_bh_enable();
>  
>  			return false;
> @@ -477,7 +477,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
>  			 * current inactive bitmap is clean and can be reused as
>  			 * *next* bitmap (not initial) for the next packet.
>  			 */
> -			raw_cpu_write(nft_pipapo_scratch_index, map_index);
> +			scratch->map_index = map_index;
>  			local_bh_enable();
>  
>  			return true;
> @@ -1121,12 +1121,12 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
>  	int i;
>  
>  	for_each_possible_cpu(i) {
> -		unsigned long *scratch;
> +		struct nft_pipapo_scratch *scratch;
>  #ifdef NFT_PIPAPO_ALIGN
> -		unsigned long *scratch_aligned;
> +		void *scratch_aligned;
>  #endif
> -
> -		scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2 +
> +		scratch = kzalloc_node(struct_size(scratch, map,
> +						   bsize_max * 2) +
>  				       NFT_PIPAPO_ALIGN_HEADROOM,
>  				       GFP_KERNEL, cpu_to_node(i));
>  		if (!scratch) {
> @@ -1145,7 +1145,9 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
>  		*per_cpu_ptr(clone->scratch, i) = scratch;
>  
>  #ifdef NFT_PIPAPO_ALIGN
> -		scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
> +		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
> +		/* Need to align ->map start, not start of structure itself */
> +		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);

This should be fine with the current version of
NFT_PIPAPO_ALIGN_HEADROOM, but it took me quite some pondering; reasoning
below if you feel like double-checking.

Let's say ARCH_KMALLOC_MINALIGN is 4, NFT_PIPAPO_LT_ALIGN is 32, we
need 100 bytes for the scratch map (existing implementation), and the
address, x, we get from kzalloc_node() is k + 28, with k aligned to 32
bytes.

Then, we'll ask to allocate 32 - 4 = 28 extra bytes
(NFT_PIPAPO_ALIGN_HEADROOM), that is, 128 bytes, and we'll start using
the area at x + 4 (aligned to 32 bytes), with 124 bytes in front of us.

With this change, and the current NFT_PIPAPO_ALIGN_HEADROOM, we'll
allocate (usually) 4 bytes more, 132 bytes, and we'll start using the
area at x + 4 anyway, with 128 bytes in front of us, and we could have
asked to allocate less, which made me think for a moment that
NFT_PIPAPO_ALIGN_HEADROOM needed to be adjusted too.

However, 'scratch' at k + 28 is not the worst case: k + 32 is. At that
point, we need anyway to ask for 28 extra bytes, because 'map_index'
will force us to start from x + 32.

>  		*per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
>  #endif
>  	}
> @@ -2132,7 +2134,7 @@ static int nft_pipapo_init(const struct nft_set *set,
>  	m->field_count = field_count;
>  	m->bsize_max = 0;
>  
> -	m->scratch = alloc_percpu(unsigned long *);
> +	m->scratch = alloc_percpu(struct nft_pipapo_scratch *);
>  	if (!m->scratch) {
>  		err = -ENOMEM;
>  		goto out_scratch;
> @@ -2141,7 +2143,7 @@ static int nft_pipapo_init(const struct nft_set *set,
>  		*per_cpu_ptr(m->scratch, i) = NULL;
>  
>  #ifdef NFT_PIPAPO_ALIGN
> -	m->scratch_aligned = alloc_percpu(unsigned long *);
> +	m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *);
>  	if (!m->scratch_aligned) {
>  		err = -ENOMEM;
>  		goto out_free;
> diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
> index 1040223da5fa..144b186c4caf 100644
> --- a/net/netfilter/nft_set_pipapo.h
> +++ b/net/netfilter/nft_set_pipapo.h
> @@ -130,6 +130,16 @@ struct nft_pipapo_field {
>  	union nft_pipapo_map_bucket *mt;
>  };
>  
> +/**
> + * struct nft_pipapo_scratch - percpu data used for lookup and matching
> + * @map_index	Current working bitmap index, toggled between field matches
> + * @map		store partial matching results during lookup
> + */
> +struct nft_pipapo_scratch {
> +	u8 map_index;
> +	unsigned long map[];
> +};
> +
>  /**
>   * struct nft_pipapo_match - Data used for lookup and matching
>   * @field_count		Amount of fields in set
> @@ -142,9 +152,9 @@ struct nft_pipapo_field {
>  struct nft_pipapo_match {
>  	int field_count;
>  #ifdef NFT_PIPAPO_ALIGN
> -	unsigned long * __percpu *scratch_aligned;
> +	struct nft_pipapo_scratch * __percpu *scratch_aligned;
>  #endif
> -	unsigned long * __percpu *scratch;
> +	struct nft_pipapo_scratch * __percpu *scratch;
>  	size_t bsize_max;
>  	struct rcu_head rcu;
>  	struct nft_pipapo_field f[] __counted_by(field_count);
> diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
> index 52e0d026d30a..8cad7b2e759d 100644
> --- a/net/netfilter/nft_set_pipapo_avx2.c
> +++ b/net/netfilter/nft_set_pipapo_avx2.c
> @@ -71,9 +71,6 @@
>  #define NFT_PIPAPO_AVX2_ZERO(reg)					\
>  	asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
>  
> -/* Current working bitmap index, toggled between field matches */
> -static DEFINE_PER_CPU(bool, nft_pipapo_avx2_scratch_index);
> -
>  /**
>   * nft_pipapo_avx2_prepare() - Prepare before main algorithm body
>   *
> @@ -1120,11 +1117,12 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
>  			    const u32 *key, const struct nft_set_ext **ext)
>  {
>  	struct nft_pipapo *priv = nft_set_priv(set);
> -	unsigned long *res, *fill, *scratch;
> +	struct nft_pipapo_scratch *scratch;
>  	u8 genmask = nft_genmask_cur(net);
>  	const u8 *rp = (const u8 *)key;
>  	struct nft_pipapo_match *m;
>  	struct nft_pipapo_field *f;
> +	unsigned long *res, *fill;
>  	bool map_index;
>  	int i, ret = 0;
>  
> @@ -1146,10 +1144,11 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
>  		kernel_fpu_end();
>  		return false;
>  	}
> -	map_index = raw_cpu_read(nft_pipapo_avx2_scratch_index);
>  
> -	res  = scratch + (map_index ? m->bsize_max : 0);
> -	fill = scratch + (map_index ? 0 : m->bsize_max);
> +	map_index = scratch->map_index;
> +
> +	res = scratch->map + (map_index ? m->bsize_max : 0);
> +	fill = scratch->map + (map_index ? 0 : m->bsize_max);

Nit (if you respin for any reason): the existing version had one extra
space to highlight the symmetry between 'res' and 'fill' in the right
operand. You kept that in nft_pipapo_lookup(), but dropped it here.

>  
>  	/* Starting map doesn't need to be set for this implementation */
>  
> @@ -1221,7 +1220,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
>  
>  out:
>  	if (i % 2)
> -		raw_cpu_write(nft_pipapo_avx2_scratch_index, !map_index);
> +		scratch->map_index = !map_index;
>  	kernel_fpu_end();
>  
>  	return ret >= 0;

-- 
Stefano



* Re: [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
  2024-02-07 14:15   ` Stefano Brivio
@ 2024-02-07 15:23     ` Florian Westphal
  2024-02-07 17:27       ` Stefano Brivio
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2024-02-07 15:23 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Florian Westphal, netfilter-devel

Stefano Brivio <sbrivio@redhat.com> wrote:
> > This isn't reliable.
> 
> Uh oh. In hindsight, this sounds so obvious now...

That's a recurring theme with a lot of bugs.
So, no, it was not obvious.

> > There can be multiple sets and we can't be sure that the upper
> > and lower half of all set scratch map is always in sync (lookups
> > can be conditional), so one set might have swapped, but other might
> > not have been queried.
> > 
> > Thus we need to keep the index per-set-and-cpu, just like the
> > scratchpad.
> > 
> > Note that this bug fix is incomplete, there is a related issue.
> > 
> > avx2 and normal implementation might use slightly different areas of the
> > map array space due to the avx2 alignment requirements, so
> > m->scratch (generic/fallback implementation) and ->scratch_aligned
> > (avx) may partially overlap. scratch and scratch_aligned are not distinct
> > objects, the latter is just the aligned address of the former.
> > 
> > After this change, a write to scratch_aligned->map_index may write to
> > scratch->map, so this issue becomes more prominent: we can set to 1
> > a bit in the supposedly-all-zero area of scratch->map[].
> > 
> > A followup patch will remove scratch_aligned and make the generic and
> > AVX code use the same (aligned) area.
> > 
> > It's done in a separate change to ease review.
> > 
> > Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> 
> Minus one nit (not worth respinning) and one half-doubt below,
> 
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> 
> ...I'm still reviewing the rest.

Thanks for reviewing.

> >  #ifdef NFT_PIPAPO_ALIGN
> > -		scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
> > +		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
> > +		/* Need to align ->map start, not start of structure itself */
> > +		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
> 
> This should be fine with the current version of
> NFT_PIPAPO_ALIGN_HEADROOM, but it took me quite some pondering, reasoning
> below if you feel like double checking.

Good point.

> Let's say ARCH_KMALLOC_MINALIGN is 4, NFT_PIPAPO_LT_ALIGN is 32, we
> need 100 bytes for the scratch map (existing implementation), and the
> address, x, we get from kzalloc_node() is k + 28, with k aligned to 32
> bytes.
> Then, we'll ask to allocate 32 - 4 = 28 extra bytes
> (NFT_PIPAPO_ALIGN_HEADROOM), that is, 128 bytes, and we'll start using
> the area at x + 4 (aligned to 32 bytes), with 124 bytes in front of us.
>
> With this change, and the current NFT_PIPAPO_ALIGN_HEADROOM, we'll
> allocate (usually) 4 bytes more, 132 bytes, and we'll start using the
> area at x + 4 anyway, with 128 bytes in front of us, and we could have
> asked to allocate less, which made me think for a moment that
> NFT_PIPAPO_ALIGN_HEADROOM needed to be adjusted too.

We'll allocate sizeof(long) more space (map_index), which is 4 bytes in
your example.

> However, 'scratch' at k + 28 is not the worst case: k + 32 is. At that
> point, we need anyway to ask for 28 extra bytes, because 'map_index'
> will force us to start from x + 32.

Wait.  k + 32 is really "k" for old code: slab gave us an aligned
address.

In new code, k+4 is the perfect "already-aligned" address where we would
'no-op' the address on an ARCH_KMALLOC_MINALIGN == 4 system.

Let's assume we get address k, and that address is the worst
possible aligned one (with minalign 4), so we align
(k + 4) (&scratch->map[0]), then subtract the index/struct head,
which means we store (align(k+4) - 4), which would be 28.

The worst-case aligned value the allocator can provide for new code
is k or k + 32 (makes no difference):

Let's say the address we got from the allocator is 0x4:

NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 32, we store 28
as start of struct, so ->map[0] is at address 32.

Let's say the address we got from the allocator is 0x20 (32):
NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 64, we store 60
as start of struct, so ->map[0] at 64.

In the worst case, ALIGN() eats 28 bytes of the allocation, which is
exactly what we accounted for as NFT_PIPAPO_ALIGN_HEADROOM.

Maybe that's what you were saying.  I could try to add/expand the
comments here for the alignment calculations.

> > -	res  = scratch + (map_index ? m->bsize_max : 0);
> > -	fill = scratch + (map_index ? 0 : m->bsize_max);
> > +	map_index = scratch->map_index;
> > +
> > +	res = scratch->map + (map_index ? m->bsize_max : 0);
> > +	fill = scratch->map + (map_index ? 0 : m->bsize_max);
> 
> Nit (if you respin for any reason): the existing version had one extra
> space to highlight the symmetry between 'res' and 'fill' in the right
> operand. You kept that in nft_pipapo_lookup(), but dropped it here.

Oh, indeed, i'll fix this up.


* Re: [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
  2024-02-07 15:23     ` Florian Westphal
@ 2024-02-07 17:27       ` Stefano Brivio
  2024-02-07 19:24         ` Florian Westphal
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Brivio @ 2024-02-07 17:27 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, 7 Feb 2024 16:23:28 +0100
Florian Westphal <fw@strlen.de> wrote:

> Stefano Brivio <sbrivio@redhat.com> wrote:
> > > This isn't reliable.  
> > 
> > Uh oh. In hindsight, this sounds so obvious now...  
> 
> That's a recurring theme with a lot of bugs.
> So, no, it was not obvious.
> 
> > > There can be multiple sets and we can't be sure that the upper
> > > and lower half of all set scratch map is always in sync (lookups
> > > can be conditional), so one set might have swapped, but other might
> > > not have been queried.
> > > 
> > > Thus we need to keep the index per-set-and-cpu, just like the
> > > scratchpad.
> > > 
> > > Note that this bug fix is incomplete, there is a related issue.
> > > 
> > > avx2 and normal implementation might use slightly different areas of the
> > > map array space due to the avx2 alignment requirements, so
> > > m->scratch (generic/fallback implementation) and ->scratch_aligned
> > > (avx) may partially overlap. scratch and scratch_aligned are not distinct
> > > objects, the latter is just the aligned address of the former.
> > > 
> > > After this change, a write to scratch_aligned->map_index may write to
> > > scratch->map, so this issue becomes more prominent: we can set to 1
> > > a bit in the supposedly-all-zero area of scratch->map[].
> > > 
> > > A followup patch will remove scratch_aligned and make the generic and
> > > AVX code use the same (aligned) area.
> > > 
> > > It's done in a separate change to ease review.
> > > 
> > > Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
> > > Signed-off-by: Florian Westphal <fw@strlen.de>  
> > 
> > Minus one nit (not worth respinning) and one half-doubt below,
> > 
> > Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> > 
> > ...I'm still reviewing the rest.  
> 
> Thanks for reviewing.
> 
> > >  #ifdef NFT_PIPAPO_ALIGN
> > > -		scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
> > > +		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
> > > +		/* Need to align ->map start, not start of structure itself */
> > > +		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);  
> > 
> > This should be fine with the current version of
> > NFT_PIPAPO_ALIGN_HEADROOM, but it took me quite some pondering, reasoning
> > below if you feel like double checking.  
> 
> Good point.
> 
> > Let's say ARCH_KMALLOC_MINALIGN is 4, NFT_PIPAPO_LT_ALIGN is 32, we
> > need 100 bytes for the scratch map (existing implementation), and the
> > address, x, we get from kzalloc_node() is k + 28, with k aligned to 32
> > bytes.
> > Then, we'll ask to allocate 32 - 4 = 28 extra bytes
> > (NFT_PIPAPO_ALIGN_HEADROOM), that is, 128 bytes, and we'll start using
> > the area at x + 4 (aligned to 32 bytes), with 124 bytes in front of us.
> >
> > With this change, and the current NFT_PIPAPO_ALIGN_HEADROOM, we'll
> > allocate (usually) 4 bytes more, 132 bytes, and we'll start using the
> > area at x + 4 anyway, with 128 bytes in front of us, and we could have
> > asked to allocate less, which made me think for a moment that
> > NFT_PIPAPO_ALIGN_HEADROOM needed to be adjusted too.  
> 
> We'll allocate sizeof(long) more space (map_index), which is 4 bytes in
> your example.

Oh, right, it could be 8 bytes too. Let's stick to 4 for simplicity.

> > However, 'scratch' at k + 28 is not the worst case: k + 32 is. At that
> > point, we need anyway to ask for 28 extra bytes, because 'map_index'
> > will force us to start from x + 32.  
> 
> Wait.  k + 32 is really "k" for old code: slab gave us an aligned
> address.

Yes, I meant that for the *new* code: k + 32 mod 32 (that is, k) is the
worst case for the new code.

> In new code, k+4 is the perfect "already-aligned" address where we would
> 'no-op' the address on an ARCH_KMALLOC_MINALIGN == 4 system.

Isn't the already-aligned address k - 4, that is, k + 28? With k + 4,
we would have &scratch->map[0] at k + 8. But anyway:

> Let's assume we get address k, and that address is the worst
> possible aligned one (with minalign 4), so we align
> (k + 4) (&scratch->map[0]), then subtract the index/struct head,
> which means we store (align(k+4) - 4), which would be 28.
> 
> The worst-case aligned value the allocator can provide for new code
> is k or k + 32 (makes no difference):
> 
> Let's say the address we got from the allocator is 0x4:
> 
> NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 32, we store 28
> as start of struct, so ->map[0] is at address 32.
> 
> Let's say the address we got from the allocator is 0x20 (32):
> NFT_PIPAPO_LT_ALIGN(&scratch->map); -> aligned to 64, we store 60
> as start of struct, so ->map[0] at 64.
> 
> In the worst case, ALIGN() eats 28 bytes of the allocation, which is
> exactly what we accounted for as NFT_PIPAPO_ALIGN_HEADROOM.
> 
> Maybe that's what you were saying.  I could try to add/expand the
> comments here for the alignment calculations.

...yes, the rest is exactly what I meant. I'm not really satisfied with
the paragraph below, but maybe something along the lines of:

	/* Align &scratch->map (not the struct itself): the extra
	 * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() above
	 * guarantee we can waste up to those bytes in order to align the map
	 * field regardless of its offset within the struct.
	 */

-- 
Stefano



* Re: [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  2024-02-06 12:23 ` [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Florian Westphal
@ 2024-02-07 17:29   ` Stefano Brivio
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Brivio @ 2024-02-07 17:29 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue,  6 Feb 2024 13:23:07 +0100
Florian Westphal <fw@strlen.de> wrote:

> After next patch simple kfree() is not enough anymore, so add
> a helper for it.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nft_set_pipapo.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
> index a8aa915f3f0b..3d308e31b048 100644
> --- a/net/netfilter/nft_set_pipapo.c
> +++ b/net/netfilter/nft_set_pipapo.c
> @@ -1108,6 +1108,19 @@ static void pipapo_map(struct nft_pipapo_match *m,
>  		f->mt[map[i].to + j].e = e;
>  }
>  
> +static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)

Almost everything else here has kerneldoc-style comments, perhaps
(already accounting for 3/3):

/**
 * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address
 * @m:		Matching data
 * @cpu:	CPU number
 */

Other than this,

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano



* Re: [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer
  2024-02-06 12:23 ` [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer Florian Westphal
@ 2024-02-07 17:29   ` Stefano Brivio
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Brivio @ 2024-02-07 17:29 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Tue,  6 Feb 2024 13:23:08 +0100
Florian Westphal <fw@strlen.de> wrote:

> use ->scratch for both avx2 and the generic implementation.
> 
> After previous change the scratch->map member is always aligned properly
> for AVX2, so we can just use scratch->map in AVX2 too.
> 
> The alignoff delta is stored in the scratchpad so we can reconstruct
> the correct address to free the area again.

That's a nice simplification, thanks.

> Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
> Signed-off-by: Florian Westphal <fw@strlen.de>

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>

-- 
Stefano



* Re: [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps
  2024-02-07 17:27       ` Stefano Brivio
@ 2024-02-07 19:24         ` Florian Westphal
  0 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2024-02-07 19:24 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: Florian Westphal, netfilter-devel

Stefano Brivio <sbrivio@redhat.com> wrote:
> > In new code, k+4 is the perfect "already-aligned" address where we would
> > 'no-op' the address on an ARCH_KMALLOC_MINALIGN == 4 system.
> 
> Isn't the already aligned-address k - 4, that is, k + 28? With k + 4,
> we would have &scratch->map[0] at k + 8. But anyway:

Yes, k-4 (k+28).

[..]

> > Maybe thats what you were saying.  I could try to add/expand the
> > comments here for the alignment calculations.
> 
> ...yes, the rest is exactly what I meant. I'm not really satisfied of
> the paragraph below but maybe something on the lines of:
> 
> 	/* Align &scratch->map (not the struct itself): the extra
> 	 * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node() above
> 	 * guarantee we can waste up to those bytes in order to align the map
> 	 * field regardless of its offset within the struct.
> 	 */

Thanks, that's good, I'll use that.

I think I'll also add a BUILD_BUG_ON to assert map
offset <= NFT_PIPAPO_ALIGN_HEADROOM, just in case.


end of thread, other threads:[~2024-02-07 19:24 UTC | newest]

Thread overview: 10+ messages
2024-02-06 12:23 [PATCH nf 0/3] netfilter: nft_set_pipapo: map_index must be per set Florian Westphal
2024-02-06 12:23 ` [PATCH nf 1/3] netfilter: nft_set_pipapo: store index in scratch maps Florian Westphal
2024-02-07 14:15   ` Stefano Brivio
2024-02-07 15:23     ` Florian Westphal
2024-02-07 17:27       ` Stefano Brivio
2024-02-07 19:24         ` Florian Westphal
2024-02-06 12:23 ` [PATCH nf 2/3] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Florian Westphal
2024-02-07 17:29   ` Stefano Brivio
2024-02-06 12:23 ` [PATCH nf 3/3] netfilter: nft_set_pipapo: remove scratch_aligned pointer Florian Westphal
2024-02-07 17:29   ` Stefano Brivio
