* [PATCH net 0/5] Netfilter fixes for net
@ 2023-08-10  7:08 Pablo Neira Ayuso
From: Pablo Neira Ayuso @ 2023-08-10  7:08 UTC
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, stable

Hi,

The following patchset contains Netfilter fixes for net.

The existing attempt to resolve races between control plane and GC work
is error prone: as reported by Bien Pham <phamnnb@sea.com>, some places
forgot to call nft_set_elem_mark_busy(), leading to double deactivation
of elements.

This series contains the following patches:

1) Do not skip expired elements during walk, otherwise elements might
   never decrement the reference counter on data, leading to a memleak.

2) Add a GC transaction API to replace the former attempt to deal with
   races between control plane and GC. The GC worker sets
   NFT_SET_ELEM_DEAD_BIT on expired elements and creates a GC
   transaction to remove them. The GC transaction can be aborted in
   case of interference with the control plane and retried later (GC
   async). Set backends such as rbtree and pipapo also perform GC from
   the control plane (GC sync). In that case, element deactivation and
   removal are safe because the mutex is held; the collected elements
   are then released via call_rcu().

3) Adapt existing set backends to use the GC transaction API.

4) Update the rhash set backend to set the _DEAD bit on elements deleted
   from the datapath, so that they are reported to GC.

5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT.

Florian Westphal (1):
  netfilter: nf_tables: don't skip expired elements during walk

Pablo Neira Ayuso (4):
  netfilter: nf_tables: GC transaction API to avoid race with control plane
  netfilter: nf_tables: adapt set backend to use GC transaction API
  netfilter: nft_set_hash: mark set element as dead when deleting from packet path
  netfilter: nf_tables: remove busy mark and gc batch API

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-08-10

Thanks.

----------------------------------------------------------------

The following changes since commit c5ccff70501d92db445a135fa49cf9bc6b98c444:

  Merge branch 'net-sched-bind-logic-fixes-for-cls_fw-cls_u32-and-cls_route' (2023-07-31 20:10:39 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-08-10

for you to fetch changes up to a2dd0233cbc4d8a0abb5f64487487ffc9265beb5:

  netfilter: nf_tables: remove busy mark and gc batch API (2023-08-10 08:25:27 +0200)

----------------------------------------------------------------
netfilter pull request 23-08-10

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: nf_tables: don't skip expired elements during walk

Pablo Neira Ayuso (4):
      netfilter: nf_tables: GC transaction API to avoid race with control plane
      netfilter: nf_tables: adapt set backend to use GC transaction API
      netfilter: nft_set_hash: mark set element as dead when deleting from packet path
      netfilter: nf_tables: remove busy mark and gc batch API

 include/net/netfilter/nf_tables.h | 120 ++++++---------
 net/netfilter/nf_tables_api.c     | 307 ++++++++++++++++++++++++++++++--------
 net/netfilter/nft_set_hash.c      |  85 +++++++----
 net/netfilter/nft_set_pipapo.c    |  66 +++++---
 net/netfilter/nft_set_rbtree.c    | 146 ++++++++++--------
 5 files changed, 476 insertions(+), 248 deletions(-)


* [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk
From: Pablo Neira Ayuso @ 2023-08-10  7:08 UTC
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, stable

From: Florian Westphal <fw@strlen.de>

There is an asymmetry between commit/abort and preparation phase if the
following conditions are met:

1. set is a verdict map ("1.2.3.4 : jump foo")
2. timeouts are enabled

In this case, the following sequence is problematic:

1. element E in set S refers to chain C
2. userspace requests removal of set S
3. kernel does a set walk to decrement chain->use count for all elements
   from preparation phase
4. kernel does another set walk to remove elements from the commit phase
   (or another walk to do a chain->use increment for all elements from
    abort phase)

If E has already expired in 1), it will be ignored during the set walk,
so its use count won't have been changed.

Then, when set is culled, ->destroy callback will zap the element via
nf_tables_set_elem_destroy(), but this function is only safe for
elements that have been deactivated earlier from the preparation phase:
lack of earlier deactivate removes the element but leaks the chain use
count, which results in a WARN splat when the chain gets removed later,
plus a leak of the nft_chain structure.
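
The leak described above can be sketched in plain userspace C. This is
only a toy model, not kernel code: toy_chain, toy_elem and the helpers
below are hypothetical names, and the "walk" is a simple array loop.
It shows that a walk which skips expired elements leaves one chain
reference behind per skipped element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a chain and a verdict-map element. */
struct toy_chain { int use; };
struct toy_elem  { struct toy_chain *chain; bool expired; };

/* Preparation-phase walk: drop one chain reference per visited element.
 * skip_expired=true models walkers that ignore expired elements.
 */
static void toy_walk_deactivate(struct toy_elem *elems, size_t n,
				bool skip_expired)
{
	for (size_t i = 0; i < n; i++) {
		assert(elems[i].chain != NULL);
		if (skip_expired && elems[i].expired)
			continue;
		elems[i].chain->use--;
	}
}

/* Returns the chain use count that is left after the walk. */
static int toy_remaining_use(bool skip_expired)
{
	struct toy_chain c = { .use = 0 };
	struct toy_elem elems[2] = {
		{ .chain = &c, .expired = false },
		{ .chain = &c, .expired = true },
	};

	for (size_t i = 0; i < 2; i++)
		c.use++;	/* each element holds one reference */

	toy_walk_deactivate(elems, 2, skip_expired);
	return c.use;
}
```

In this model, toy_remaining_use(true) leaves a non-zero use count
behind, while toy_remaining_use(false) brings it back to zero.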

Update pipapo_get() not to skip expired elements, otherwise flush
command reports bogus ENOENT errors.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c  |  4 ++++
 net/netfilter/nft_set_hash.c   |  2 --
 net/netfilter/nft_set_pipapo.c | 18 ++++++++++++------
 net/netfilter/nft_set_rbtree.c |  2 --
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d3c6ecd1f5a6..b4321869e5c6 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5602,8 +5602,12 @@ static int nf_tables_dump_setelem(const struct nft_ctx *ctx,
 				  const struct nft_set_iter *iter,
 				  struct nft_set_elem *elem)
 {
+	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	struct nft_set_dump_args *args;
 
+	if (nft_set_elem_expired(ext))
+		return 0;
+
 	args = container_of(iter, struct nft_set_dump_args, iter);
 	return nf_tables_fill_setelem(args->skb, set, elem, args->reset);
 }
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 0b73cb0e752f..24caa31fa231 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -278,8 +278,6 @@ static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
 
 		if (iter->count < iter->skip)
 			goto cont;
-		if (nft_set_elem_expired(&he->ext))
-			goto cont;
 		if (!nft_set_elem_active(&he->ext, iter->genmask))
 			goto cont;
 
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 49915a2a58eb..d54784ea465b 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -566,8 +566,7 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 			goto out;
 
 		if (last) {
-			if (nft_set_elem_expired(&f->mt[b].e->ext) ||
-			    (genmask &&
+			if ((genmask &&
 			     !nft_set_elem_active(&f->mt[b].e->ext, genmask)))
 				goto next_match;
 
@@ -601,8 +600,17 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 static void *nft_pipapo_get(const struct net *net, const struct nft_set *set,
 			    const struct nft_set_elem *elem, unsigned int flags)
 {
-	return pipapo_get(net, set, (const u8 *)elem->key.val.data,
-			  nft_genmask_cur(net));
+	struct nft_pipapo_elem *ret;
+
+	ret = pipapo_get(net, set, (const u8 *)elem->key.val.data,
+			 nft_genmask_cur(net));
+	if (IS_ERR(ret))
+		return ret;
+
+	if (nft_set_elem_expired(&ret->ext))
+		return ERR_PTR(-ENOENT);
+
+	return ret;
 }
 
 /**
@@ -2005,8 +2013,6 @@ static void nft_pipapo_walk(const struct nft_ctx *ctx, struct nft_set *set,
 			goto cont;
 
 		e = f->mt[r].e;
-		if (nft_set_elem_expired(&e->ext))
-			goto cont;
 
 		elem.priv = e;
 
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 8d73fffd2d09..39956e5341c9 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -552,8 +552,6 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 
 		if (iter->count < iter->skip)
 			goto cont;
-		if (nft_set_elem_expired(&rbe->ext))
-			goto cont;
 		if (!nft_set_elem_active(&rbe->ext, iter->genmask))
 			goto cont;
 
-- 
2.30.2



* [PATCH net 2/5] netfilter: nf_tables: GC transaction API to avoid race with control plane
From: Pablo Neira Ayuso @ 2023-08-10  7:08 UTC
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, stable

The set types rhashtable and rbtree use a GC worker to reclaim memory.
The worker runs from the system work queue and scans the table at
periodic intervals.

The major caveat here is that the nft transaction mutex is not held.
This causes a race between control plane and GC when they attempt to
delete the same element.

We cannot grab the netlink mutex from the work queue, because the
control plane has to wait for the GC work queue in case the set is to be
removed, so we get the following deadlock:

   cpu 1                                cpu 2
     GC work                            transaction comes in, locks nft mutex
       `acquire nft mutex // BLOCKS
                                        transaction asks to remove the set
                                        set destruction calls cancel_work_sync()

cancel_work_sync() will now block forever, because it is waiting for the
GC work, which in turn is waiting for the mutex the caller already owns.

This patch adds a new API that deals with garbage collection in two
steps:

1) Lockless GC of expired elements sets the NFT_SET_ELEM_DEAD_BIT on
   them so they are not visible via lookup. Annotate the current GC
   sequence in the GC transaction. Enqueue the GC transaction work as
   soon as it is full. If the ruleset is updated, then the GC
   transaction is aborted and retried later.

2) GC work grabs the mutex. If the GC sequence has changed, then this GC
   transaction lost the race with the control plane: abort it, as it
   contains stale references to objects, and let GC try again later. If
   the ruleset is intact, then this GC transaction deactivates and
   removes the elements and uses call_rcu() to destroy them.
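
The two steps above can be sketched as a single-threaded userspace
model. This is an illustration under stated assumptions, not the kernel
implementation: toy_elem, toy_gc, TOY_BATCH and the helpers are
hypothetical, the commit mutex is only mentioned in a comment, and a
"commit" is modelled as a plain increment of the sequence number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH 8	/* hypothetical batch size, not the kernel's */

struct toy_elem {
	bool expired;
	bool dead;	/* models NFT_SET_ELEM_DEAD_BIT */
	bool removed;
};

struct toy_gc {
	unsigned int seq;	/* GC sequence seen while collecting */
	size_t count;
	struct toy_elem *priv[TOY_BATCH];
};

/* Step 1: mark expired elements dead and collect pointers, no removal. */
static void toy_gc_collect(struct toy_gc *gc, unsigned int gc_seq,
			   struct toy_elem *elems, size_t n)
{
	gc->seq = gc_seq;
	gc->count = 0;
	for (size_t i = 0; i < n && gc->count < TOY_BATCH; i++) {
		if (!elems[i].expired)
			continue;
		elems[i].dead = true;
		gc->priv[gc->count++] = &elems[i];
	}
}

/* Step 2: would run with the commit mutex held. Abort on a stale batch. */
static bool toy_gc_run(const struct toy_gc *gc, unsigned int gc_seq)
{
	if (gc->seq != gc_seq)
		return false;	/* control plane won the race, retry later */

	for (size_t i = 0; i < gc->count; i++)
		gc->priv[i]->removed = true;
	return true;
}

/* Returns how many elements were removed; a commit bumps the sequence. */
static size_t toy_gc_demo(bool commit_in_between)
{
	struct toy_elem elems[3] = {
		{ .expired = true }, { .expired = false }, { .expired = true },
	};
	unsigned int gc_seq = 0;
	struct toy_gc gc;
	size_t removed = 0;

	toy_gc_collect(&gc, gc_seq, elems, 3);
	assert(gc.count == 2);
	if (commit_in_between)
		gc_seq++;
	toy_gc_run(&gc, gc_seq);

	for (size_t i = 0; i < 3; i++)
		removed += elems[i].removed;
	return removed;
}
```

With no commit in between, the batch removes both expired elements.
With a commit in between, the batch is dropped and nothing is removed;
the elements stay marked dead and a later GC run would pick them up
again.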

Note that no elements are removed from the lockless GC path; the _DEAD
bit is set and pointers are collected. GC catchall does not remove the
elements anymore either. There is a new set->dead flag that is set to
abort the GC transaction; it deals with the set->ops->destroy() path,
which removes the remaining elements in the set from commit_release,
where no mutex is held.

To deal with GC when the mutex is held, which allows safe deactivation
and removal, add a sync GC API which releases the set element object via
call_rcu(). This is used by the rbtree and pipapo backends, which also
perform garbage collection from the control plane path.

Since element removal from sets can happen from control plane and
element garbage collection/timeout, it is necessary to keep the set
structure alive until all elements have been deactivated and destroyed.

We cannot do a cancel_work_sync() or flush_work() in nft_set_destroy()
because it is called with the transaction mutex held, but the
aforementioned async work queue might be blocked on the very mutex that
the nft_set_destroy() callchain is sitting on.

This gives us the choice of ABBA deadlock or UaF.

To avoid both, add a set->refs refcount_t member. The GC API can then
increment the set refcount and release it once the elements have been
freed.
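
The refcount idea can be sketched in userspace with C11 atomics. This is
a toy model under stated assumptions: toy_set and its helpers are
hypothetical names, and atomic_int stands in for the kernel's
refcount_t. The point is only that the set memory outlives the control
plane "destroy" as long as a GC batch still holds a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical set object, only the reference counter is modelled. */
struct toy_set { atomic_int refs; };

static struct toy_set *toy_set_alloc(void)
{
	struct toy_set *set = malloc(sizeof(*set));

	if (set)
		atomic_init(&set->refs, 1);	/* owner reference */
	return set;
}

static void toy_set_get(struct toy_set *set)
{
	atomic_fetch_add(&set->refs, 1);
}

/* Drops one reference; returns true if this call freed the set. */
static bool toy_set_put(struct toy_set *set)
{
	if (atomic_fetch_sub(&set->refs, 1) != 1)
		return false;

	free(set);
	return true;
}

/* Returns true if the set survived destroy and died with the GC batch. */
static bool toy_refs_demo(void)
{
	struct toy_set *set = toy_set_alloc();
	bool freed_by_destroy, freed_by_gc;

	if (!set)
		return false;

	toy_set_get(set);			/* GC batch takes a reference */
	freed_by_destroy = toy_set_put(set);	/* control plane destroys set */
	freed_by_gc = toy_set_put(set);		/* GC batch has been released */

	return !freed_by_destroy && freed_by_gc;
}
```

The last reference holder frees the object, so neither side needs to
wait for the other while holding the mutex.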

Set backends are adapted to use the GC transaction API in a follow-up
patch entitled:

  ("netfilter: nf_tables: adapt set backend to use GC transaction API")

This is joint work with Florian Westphal.

Fixes: cfed7e1b1f8e ("netfilter: nf_tables: add set garbage collection helpers")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |  64 +++++++-
 net/netfilter/nf_tables_api.c     | 248 ++++++++++++++++++++++++++++--
 2 files changed, 300 insertions(+), 12 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 640441a2f926..7256e9c80477 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -512,6 +512,7 @@ struct nft_set_elem_expr {
  *
  *	@list: table set list node
  *	@bindings: list of set bindings
+ *	@refs: internal refcounting for async set destruction
  *	@table: table this set belongs to
  *	@net: netnamespace this set belongs to
  * 	@name: name of the set
@@ -541,6 +542,7 @@ struct nft_set_elem_expr {
 struct nft_set {
 	struct list_head		list;
 	struct list_head		bindings;
+	refcount_t			refs;
 	struct nft_table		*table;
 	possible_net_t			net;
 	char				*name;
@@ -562,7 +564,8 @@ struct nft_set {
 	struct list_head		pending_update;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
-	u16				flags:14,
+	u16				flags:13,
+					dead:1,
 					genmask:2;
 	u8				klen;
 	u8				dlen;
@@ -1592,6 +1595,32 @@ static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext)
 	clear_bit(NFT_SET_ELEM_BUSY_BIT, word);
 }
 
+#define NFT_SET_ELEM_DEAD_MASK	(1 << 3)
+
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+#define NFT_SET_ELEM_DEAD_BIT	3
+#elif defined(__BIG_ENDIAN_BITFIELD)
+#define NFT_SET_ELEM_DEAD_BIT	(BITS_PER_LONG - BITS_PER_BYTE + 3)
+#else
+#error
+#endif
+
+static inline void nft_set_elem_dead(struct nft_set_ext *ext)
+{
+	unsigned long *word = (unsigned long *)ext;
+
+	BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0);
+	set_bit(NFT_SET_ELEM_DEAD_BIT, word);
+}
+
+static inline int nft_set_elem_is_dead(const struct nft_set_ext *ext)
+{
+	unsigned long *word = (unsigned long *)ext;
+
+	BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0);
+	return test_bit(NFT_SET_ELEM_DEAD_BIT, word);
+}
+
 /**
  *	struct nft_trans - nf_tables object update in transaction
  *
@@ -1732,6 +1761,38 @@ struct nft_trans_flowtable {
 #define nft_trans_flowtable_flags(trans)	\
 	(((struct nft_trans_flowtable *)trans->data)->flags)
 
+#define NFT_TRANS_GC_BATCHCOUNT	256
+
+struct nft_trans_gc {
+	struct list_head	list;
+	struct net		*net;
+	struct nft_set		*set;
+	u32			seq;
+	u16			count;
+	void			*priv[NFT_TRANS_GC_BATCHCOUNT];
+	struct rcu_head		rcu;
+};
+
+struct nft_trans_gc *nft_trans_gc_alloc(struct nft_set *set,
+					unsigned int gc_seq, gfp_t gfp);
+void nft_trans_gc_destroy(struct nft_trans_gc *trans);
+
+struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc,
+					      unsigned int gc_seq, gfp_t gfp);
+void nft_trans_gc_queue_async_done(struct nft_trans_gc *gc);
+
+struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp);
+void nft_trans_gc_queue_sync_done(struct nft_trans_gc *trans);
+
+void nft_trans_gc_elem_add(struct nft_trans_gc *gc, void *priv);
+
+struct nft_trans_gc *nft_trans_gc_catchall(struct nft_trans_gc *gc,
+					   unsigned int gc_seq);
+
+void nft_setelem_data_deactivate(const struct net *net,
+				 const struct nft_set *set,
+				 struct nft_set_elem *elem);
+
 int __init nft_chain_filter_init(void);
 void nft_chain_filter_fini(void);
 
@@ -1758,6 +1819,7 @@ struct nftables_pernet {
 	struct mutex		commit_mutex;
 	u64			table_handle;
 	unsigned int		base_seq;
+	unsigned int		gc_seq;
 };
 
 extern unsigned int nf_tables_net_id;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b4321869e5c6..c28bacb9479b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -31,7 +31,9 @@ static LIST_HEAD(nf_tables_expressions);
 static LIST_HEAD(nf_tables_objects);
 static LIST_HEAD(nf_tables_flowtables);
 static LIST_HEAD(nf_tables_destroy_list);
+static LIST_HEAD(nf_tables_gc_list);
 static DEFINE_SPINLOCK(nf_tables_destroy_list_lock);
+static DEFINE_SPINLOCK(nf_tables_gc_list_lock);
 
 enum {
 	NFT_VALIDATE_SKIP	= 0,
@@ -120,6 +122,9 @@ static void nft_validate_state_update(struct nft_table *table, u8 new_validate_s
 static void nf_tables_trans_destroy_work(struct work_struct *w);
 static DECLARE_WORK(trans_destroy_work, nf_tables_trans_destroy_work);
 
+static void nft_trans_gc_work(struct work_struct *work);
+static DECLARE_WORK(trans_gc_work, nft_trans_gc_work);
+
 static void nft_ctx_init(struct nft_ctx *ctx,
 			 struct net *net,
 			 const struct sk_buff *skb,
@@ -582,10 +587,6 @@ static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type,
 	return __nft_trans_set_add(ctx, msg_type, set, NULL);
 }
 
-static void nft_setelem_data_deactivate(const struct net *net,
-					const struct nft_set *set,
-					struct nft_set_elem *elem);
-
 static int nft_mapelem_deactivate(const struct nft_ctx *ctx,
 				  struct nft_set *set,
 				  const struct nft_set_iter *iter,
@@ -5055,6 +5056,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
 
 	INIT_LIST_HEAD(&set->bindings);
 	INIT_LIST_HEAD(&set->catchall_list);
+	refcount_set(&set->refs, 1);
 	set->table = table;
 	write_pnet(&set->net, net);
 	set->ops = ops;
@@ -5122,6 +5124,14 @@ static void nft_set_catchall_destroy(const struct nft_ctx *ctx,
 	}
 }
 
+static void nft_set_put(struct nft_set *set)
+{
+	if (refcount_dec_and_test(&set->refs)) {
+		kfree(set->name);
+		kvfree(set);
+	}
+}
+
 static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set)
 {
 	int i;
@@ -5134,8 +5144,7 @@ static void nft_set_destroy(const struct nft_ctx *ctx, struct nft_set *set)
 
 	set->ops->destroy(ctx, set);
 	nft_set_catchall_destroy(ctx, set);
-	kfree(set->name);
-	kvfree(set);
+	nft_set_put(set);
 }
 
 static int nf_tables_delset(struct sk_buff *skb, const struct nfnl_info *info,
@@ -6278,7 +6287,8 @@ struct nft_set_ext *nft_set_catchall_lookup(const struct net *net,
 	list_for_each_entry_rcu(catchall, &set->catchall_list, list) {
 		ext = nft_set_elem_ext(set, catchall->elem);
 		if (nft_set_elem_active(ext, genmask) &&
-		    !nft_set_elem_expired(ext))
+		    !nft_set_elem_expired(ext) &&
+		    !nft_set_elem_is_dead(ext))
 			return ext;
 	}
 
@@ -6933,9 +6943,9 @@ static void nft_setelem_data_activate(const struct net *net,
 		nft_use_inc_restore(&(*nft_set_ext_obj(ext))->use);
 }
 
-static void nft_setelem_data_deactivate(const struct net *net,
-					const struct nft_set *set,
-					struct nft_set_elem *elem)
+void nft_setelem_data_deactivate(const struct net *net,
+				 const struct nft_set *set,
+				 struct nft_set_elem *elem)
 {
 	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 
@@ -9418,6 +9428,207 @@ void nft_chain_del(struct nft_chain *chain)
 	list_del_rcu(&chain->list);
 }
 
+static void nft_trans_gc_setelem_remove(struct nft_ctx *ctx,
+					struct nft_trans_gc *trans)
+{
+	void **priv = trans->priv;
+	unsigned int i;
+
+	for (i = 0; i < trans->count; i++) {
+		struct nft_set_elem elem = {
+			.priv = priv[i],
+		};
+
+		nft_setelem_data_deactivate(ctx->net, trans->set, &elem);
+		nft_setelem_remove(ctx->net, trans->set, &elem);
+	}
+}
+
+void nft_trans_gc_destroy(struct nft_trans_gc *trans)
+{
+	nft_set_put(trans->set);
+	put_net(trans->net);
+	kfree(trans);
+}
+
+static void nft_trans_gc_trans_free(struct rcu_head *rcu)
+{
+	struct nft_set_elem elem = {};
+	struct nft_trans_gc *trans;
+	struct nft_ctx ctx = {};
+	unsigned int i;
+
+	trans = container_of(rcu, struct nft_trans_gc, rcu);
+	ctx.net	= read_pnet(&trans->set->net);
+
+	for (i = 0; i < trans->count; i++) {
+		elem.priv = trans->priv[i];
+		if (!nft_setelem_is_catchall(trans->set, &elem))
+			atomic_dec(&trans->set->nelems);
+
+		nf_tables_set_elem_destroy(&ctx, trans->set, elem.priv);
+	}
+
+	nft_trans_gc_destroy(trans);
+}
+
+static bool nft_trans_gc_work_done(struct nft_trans_gc *trans)
+{
+	struct nftables_pernet *nft_net;
+	struct nft_ctx ctx = {};
+
+	nft_net = nft_pernet(trans->net);
+
+	mutex_lock(&nft_net->commit_mutex);
+
+	/* Check for race with transaction, otherwise this batch refers to
+	 * stale objects that might not be there anymore. Skip transaction if
+	 * set has been destroyed from control plane transaction in case gc
+	 * worker loses race.
+	 */
+	if (READ_ONCE(nft_net->gc_seq) != trans->seq || trans->set->dead) {
+		mutex_unlock(&nft_net->commit_mutex);
+		return false;
+	}
+
+	ctx.net = trans->net;
+	ctx.table = trans->set->table;
+
+	nft_trans_gc_setelem_remove(&ctx, trans);
+	mutex_unlock(&nft_net->commit_mutex);
+
+	return true;
+}
+
+static void nft_trans_gc_work(struct work_struct *work)
+{
+	struct nft_trans_gc *trans, *next;
+	LIST_HEAD(trans_gc_list);
+
+	spin_lock(&nf_tables_gc_list_lock);
+	list_splice_init(&nf_tables_gc_list, &trans_gc_list);
+	spin_unlock(&nf_tables_gc_list_lock);
+
+	list_for_each_entry_safe(trans, next, &trans_gc_list, list) {
+		list_del(&trans->list);
+		if (!nft_trans_gc_work_done(trans)) {
+			nft_trans_gc_destroy(trans);
+			continue;
+		}
+		call_rcu(&trans->rcu, nft_trans_gc_trans_free);
+	}
+}
+
+struct nft_trans_gc *nft_trans_gc_alloc(struct nft_set *set,
+					unsigned int gc_seq, gfp_t gfp)
+{
+	struct net *net = read_pnet(&set->net);
+	struct nft_trans_gc *trans;
+
+	trans = kzalloc(sizeof(*trans), gfp);
+	if (!trans)
+		return NULL;
+
+	refcount_inc(&set->refs);
+	trans->set = set;
+	trans->net = get_net(net);
+	trans->seq = gc_seq;
+
+	return trans;
+}
+
+void nft_trans_gc_elem_add(struct nft_trans_gc *trans, void *priv)
+{
+	trans->priv[trans->count++] = priv;
+}
+
+static void nft_trans_gc_queue_work(struct nft_trans_gc *trans)
+{
+	spin_lock(&nf_tables_gc_list_lock);
+	list_add_tail(&trans->list, &nf_tables_gc_list);
+	spin_unlock(&nf_tables_gc_list_lock);
+
+	schedule_work(&trans_gc_work);
+}
+
+static int nft_trans_gc_space(struct nft_trans_gc *trans)
+{
+	return NFT_TRANS_GC_BATCHCOUNT - trans->count;
+}
+
+struct nft_trans_gc *nft_trans_gc_queue_async(struct nft_trans_gc *gc,
+					      unsigned int gc_seq, gfp_t gfp)
+{
+	if (nft_trans_gc_space(gc))
+		return gc;
+
+	nft_trans_gc_queue_work(gc);
+
+	return nft_trans_gc_alloc(gc->set, gc_seq, gfp);
+}
+
+void nft_trans_gc_queue_async_done(struct nft_trans_gc *trans)
+{
+	if (trans->count == 0) {
+		nft_trans_gc_destroy(trans);
+		return;
+	}
+
+	nft_trans_gc_queue_work(trans);
+}
+
+struct nft_trans_gc *nft_trans_gc_queue_sync(struct nft_trans_gc *gc, gfp_t gfp)
+{
+	if (WARN_ON_ONCE(!lockdep_commit_lock_is_held(gc->net)))
+		return NULL;
+
+	if (nft_trans_gc_space(gc))
+		return gc;
+
+	call_rcu(&gc->rcu, nft_trans_gc_trans_free);
+
+	return nft_trans_gc_alloc(gc->set, 0, gfp);
+}
+
+void nft_trans_gc_queue_sync_done(struct nft_trans_gc *trans)
+{
+	WARN_ON_ONCE(!lockdep_commit_lock_is_held(trans->net));
+
+	if (trans->count == 0) {
+		nft_trans_gc_destroy(trans);
+		return;
+	}
+
+	call_rcu(&trans->rcu, nft_trans_gc_trans_free);
+}
+
+struct nft_trans_gc *nft_trans_gc_catchall(struct nft_trans_gc *gc,
+					   unsigned int gc_seq)
+{
+	struct nft_set_elem_catchall *catchall;
+	const struct nft_set *set = gc->set;
+	struct nft_set_ext *ext;
+
+	list_for_each_entry_rcu(catchall, &set->catchall_list, list) {
+		ext = nft_set_elem_ext(set, catchall->elem);
+
+		if (!nft_set_elem_expired(ext))
+			continue;
+		if (nft_set_elem_is_dead(ext))
+			goto dead_elem;
+
+		nft_set_elem_dead(ext);
+dead_elem:
+		gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC);
+		if (!gc)
+			return NULL;
+
+		nft_trans_gc_elem_add(gc, catchall->elem);
+	}
+
+	return gc;
+}
+
 static void nf_tables_module_autoload_cleanup(struct net *net)
 {
 	struct nftables_pernet *nft_net = nft_pernet(net);
@@ -9580,11 +9791,11 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 {
 	struct nftables_pernet *nft_net = nft_pernet(net);
 	struct nft_trans *trans, *next;
+	unsigned int base_seq, gc_seq;
 	LIST_HEAD(set_update_list);
 	struct nft_trans_elem *te;
 	struct nft_chain *chain;
 	struct nft_table *table;
-	unsigned int base_seq;
 	LIST_HEAD(adl);
 	int err;
 
@@ -9661,6 +9872,10 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 
 	WRITE_ONCE(nft_net->base_seq, base_seq);
 
+	/* Bump gc counter, it becomes odd, this is the busy mark. */
+	gc_seq = READ_ONCE(nft_net->gc_seq);
+	WRITE_ONCE(nft_net->gc_seq, ++gc_seq);
+
 	/* step 3. Start new generation, rules_gen_X now in use. */
 	net->nft.gencursor = nft_gencursor_next(net);
 
@@ -9768,6 +9983,7 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 			break;
 		case NFT_MSG_DELSET:
 		case NFT_MSG_DESTROYSET:
+			nft_trans_set(trans)->dead = 1;
 			list_del_rcu(&nft_trans_set(trans)->list);
 			nf_tables_set_notify(&trans->ctx, nft_trans_set(trans),
 					     trans->msg_type, GFP_KERNEL);
@@ -9870,6 +10086,8 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb)
 	nft_commit_notify(net, NETLINK_CB(skb).portid);
 	nf_tables_gen_notify(net, skb, NFT_MSG_NEWGEN);
 	nf_tables_commit_audit_log(&adl, nft_net->base_seq);
+
+	WRITE_ONCE(nft_net->gc_seq, ++gc_seq);
 	nf_tables_commit_release(net);
 
 	return 0;
@@ -10919,6 +11137,7 @@ static int __net_init nf_tables_init_net(struct net *net)
 	INIT_LIST_HEAD(&nft_net->notify_list);
 	mutex_init(&nft_net->commit_mutex);
 	nft_net->base_seq = 1;
+	nft_net->gc_seq = 0;
 
 	return 0;
 }
@@ -10947,10 +11166,16 @@ static void __net_exit nf_tables_exit_net(struct net *net)
 	WARN_ON_ONCE(!list_empty(&nft_net->notify_list));
 }
 
+static void nf_tables_exit_batch(struct list_head *net_exit_list)
+{
+	flush_work(&trans_gc_work);
+}
+
 static struct pernet_operations nf_tables_net_ops = {
 	.init		= nf_tables_init_net,
 	.pre_exit	= nf_tables_pre_exit_net,
 	.exit		= nf_tables_exit_net,
+	.exit_batch	= nf_tables_exit_batch,
 	.id		= &nf_tables_net_id,
 	.size		= sizeof(struct nftables_pernet),
 };
@@ -11022,6 +11247,7 @@ static void __exit nf_tables_module_exit(void)
 	nft_chain_filter_fini();
 	nft_chain_route_fini();
 	unregister_pernet_subsys(&nf_tables_net_ops);
+	cancel_work_sync(&trans_gc_work);
 	cancel_work_sync(&trans_destroy_work);
 	rcu_barrier();
 	rhltable_destroy(&nft_objname_ht);
-- 
2.30.2



* [PATCH net 3/5] netfilter: nf_tables: adapt set backend to use GC transaction API
From: Pablo Neira Ayuso @ 2023-08-10  7:08 UTC
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, stable

Use the GC transaction API to replace the old and buggy gc API and the
busy mark approach.

No set elements are removed from async garbage collection anymore.
Instead, the _DEAD bit is set so that the set element is no longer
visible from the lookup path. Async GC enqueues transaction work that
might be aborted and retried later.

The rbtree and pipapo set backends do not set the _DEAD bit from the
sync GC path, since this runs in the control plane path where the mutex
is held. In this case, set elements are deactivated, removed and then
released via RCU callback. Sync GC never fails.
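
The effect of the _DEAD bit on the lookup path can be sketched as
follows. This is a simplified userspace model: toy_elem, toy_lookup()
and toy_async_gc_mark() are hypothetical, the set is a flat array, and
needs_gc is a made-up flag standing in for whatever makes async GC pick
an element. Unlike the real backends, the toy lookup checks only the
dead flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_elem {
	int key;
	bool needs_gc;	/* hypothetical: GC wants to reclaim this element */
	bool dead;	/* models the _DEAD bit */
};

/* Lookup path: dead elements are treated as if they were not there. */
static struct toy_elem *toy_lookup(struct toy_elem *elems, size_t n, int key)
{
	for (size_t i = 0; i < n; i++) {
		if (elems[i].key == key && !elems[i].dead)
			return &elems[i];
	}
	return NULL;
}

/* Async GC: only mark, never unlink. Returns the number of new marks. */
static size_t toy_async_gc_mark(struct toy_elem *elems, size_t n)
{
	size_t marked = 0;

	for (size_t i = 0; i < n; i++) {
		if (!elems[i].needs_gc || elems[i].dead)
			continue;
		elems[i].dead = true;
		marked++;
	}
	return marked;
}

/* Returns true if marking hides key 1 but leaves key 2 reachable. */
static bool toy_dead_demo(void)
{
	struct toy_elem elems[2] = {
		{ .key = 1, .needs_gc = true },
		{ .key = 2 },
	};

	assert(toy_lookup(elems, 2, 1) != NULL);
	if (toy_async_gc_mark(elems, 2) != 1)
		return false;

	return toy_lookup(elems, 2, 1) == NULL &&
	       toy_lookup(elems, 2, 2) == &elems[1];
}
```

The element stays in the array after marking, which mirrors the idea
that the actual removal is left to the GC transaction work.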

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support")
Fixes: 9d0982927e79 ("netfilter: nft_hash: add support for timeouts")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c  |   7 +-
 net/netfilter/nft_set_hash.c   |  77 +++++++++++-------
 net/netfilter/nft_set_pipapo.c |  48 ++++++++---
 net/netfilter/nft_set_rbtree.c | 144 ++++++++++++++++++++-------------
 4 files changed, 173 insertions(+), 103 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index c28bacb9479b..fd4b5da7ac3c 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6380,7 +6380,6 @@ static void nft_setelem_activate(struct net *net, struct nft_set *set,
 
 	if (nft_setelem_is_catchall(set, elem)) {
 		nft_set_elem_change_active(net, set, ext);
-		nft_set_elem_clear_busy(ext);
 	} else {
 		set->ops->activate(net, set, elem);
 	}
@@ -6395,8 +6394,7 @@ static int nft_setelem_catchall_deactivate(const struct net *net,
 
 	list_for_each_entry(catchall, &set->catchall_list, list) {
 		ext = nft_set_elem_ext(set, catchall->elem);
-		if (!nft_is_active(net, ext) ||
-		    nft_set_elem_mark_busy(ext))
+		if (!nft_is_active(net, ext))
 			continue;
 
 		kfree(elem->priv);
@@ -7109,8 +7107,7 @@ static int nft_set_catchall_flush(const struct nft_ctx *ctx,
 
 	list_for_each_entry_rcu(catchall, &set->catchall_list, list) {
 		ext = nft_set_elem_ext(set, catchall->elem);
-		if (!nft_set_elem_active(ext, genmask) ||
-		    nft_set_elem_mark_busy(ext))
+		if (!nft_set_elem_active(ext, genmask))
 			continue;
 
 		elem.priv = catchall->elem;
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 24caa31fa231..2f067e4596b0 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -59,6 +59,8 @@ static inline int nft_rhash_cmp(struct rhashtable_compare_arg *arg,
 
 	if (memcmp(nft_set_ext_key(&he->ext), x->key, x->set->klen))
 		return 1;
+	if (nft_set_elem_is_dead(&he->ext))
+		return 1;
 	if (nft_set_elem_expired(&he->ext))
 		return 1;
 	if (!nft_set_elem_active(&he->ext, x->genmask))
@@ -188,7 +190,6 @@ static void nft_rhash_activate(const struct net *net, const struct nft_set *set,
 	struct nft_rhash_elem *he = elem->priv;
 
 	nft_set_elem_change_active(net, set, &he->ext);
-	nft_set_elem_clear_busy(&he->ext);
 }
 
 static bool nft_rhash_flush(const struct net *net,
@@ -196,12 +197,9 @@ static bool nft_rhash_flush(const struct net *net,
 {
 	struct nft_rhash_elem *he = priv;
 
-	if (!nft_set_elem_mark_busy(&he->ext) ||
-	    !nft_is_active(net, &he->ext)) {
-		nft_set_elem_change_active(net, set, &he->ext);
-		return true;
-	}
-	return false;
+	nft_set_elem_change_active(net, set, &he->ext);
+
+	return true;
 }
 
 static void *nft_rhash_deactivate(const struct net *net,
@@ -218,9 +216,8 @@ static void *nft_rhash_deactivate(const struct net *net,
 
 	rcu_read_lock();
 	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
-	if (he != NULL &&
-	    !nft_rhash_flush(net, set, he))
-		he = NULL;
+	if (he)
+		nft_set_elem_change_active(net, set, &he->ext);
 
 	rcu_read_unlock();
 
@@ -312,25 +309,48 @@ static bool nft_rhash_expr_needs_gc_run(const struct nft_set *set,
 
 static void nft_rhash_gc(struct work_struct *work)
 {
+	struct nftables_pernet *nft_net;
 	struct nft_set *set;
 	struct nft_rhash_elem *he;
 	struct nft_rhash *priv;
-	struct nft_set_gc_batch *gcb = NULL;
 	struct rhashtable_iter hti;
+	struct nft_trans_gc *gc;
+	struct net *net;
+	u32 gc_seq;
 
 	priv = container_of(work, struct nft_rhash, gc_work.work);
 	set  = nft_set_container_of(priv);
+	net  = read_pnet(&set->net);
+	nft_net = nft_pernet(net);
+	gc_seq = READ_ONCE(nft_net->gc_seq);
+
+	gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL);
+	if (!gc)
+		goto done;
 
 	rhashtable_walk_enter(&priv->ht, &hti);
 	rhashtable_walk_start(&hti);
 
 	while ((he = rhashtable_walk_next(&hti))) {
 		if (IS_ERR(he)) {
-			if (PTR_ERR(he) != -EAGAIN)
-				break;
+			if (PTR_ERR(he) != -EAGAIN) {
+				nft_trans_gc_destroy(gc);
+				gc = NULL;
+				goto try_later;
+			}
 			continue;
 		}
 
+		/* Ruleset has been updated, try later. */
+		if (READ_ONCE(nft_net->gc_seq) != gc_seq) {
+			nft_trans_gc_destroy(gc);
+			gc = NULL;
+			goto try_later;
+		}
+
+		if (nft_set_elem_is_dead(&he->ext))
+			goto dead_elem;
+
 		if (nft_set_ext_exists(&he->ext, NFT_SET_EXT_EXPRESSIONS) &&
 		    nft_rhash_expr_needs_gc_run(set, &he->ext))
 			goto needs_gc_run;
@@ -338,26 +358,26 @@ static void nft_rhash_gc(struct work_struct *work)
 		if (!nft_set_elem_expired(&he->ext))
 			continue;
 needs_gc_run:
-		if (nft_set_elem_mark_busy(&he->ext))
-			continue;
+		nft_set_elem_dead(&he->ext);
+dead_elem:
+		gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC);
+		if (!gc)
+			goto try_later;
 
-		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
-		if (gcb == NULL)
-			break;
-		rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params);
-		atomic_dec(&set->nelems);
-		nft_set_gc_batch_add(gcb, he);
+		nft_trans_gc_elem_add(gc, he);
 	}
+
+	gc = nft_trans_gc_catchall(gc, gc_seq);
+
+try_later:
+	/* catchall list iteration requires rcu read side lock. */
 	rhashtable_walk_stop(&hti);
 	rhashtable_walk_exit(&hti);
 
-	he = nft_set_catchall_gc(set);
-	if (he) {
-		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
-		if (gcb)
-			nft_set_gc_batch_add(gcb, he);
-	}
-	nft_set_gc_batch_complete(gcb);
+	if (gc)
+		nft_trans_gc_queue_async_done(gc);
+
+done:
 	queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
 			   nft_set_gc_interval(set));
 }
@@ -420,7 +440,6 @@ static void nft_rhash_destroy(const struct nft_ctx *ctx,
 	};
 
 	cancel_delayed_work_sync(&priv->gc_work);
-	rcu_barrier();
 	rhashtable_free_and_destroy(&priv->ht, nft_rhash_elem_destroy,
 				    (void *)&rhash_ctx);
 }
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index d54784ea465b..a5b8301afe4a 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1536,16 +1536,34 @@ static void pipapo_drop(struct nft_pipapo_match *m,
 	}
 }
 
+static void nft_pipapo_gc_deactivate(struct net *net, struct nft_set *set,
+				     struct nft_pipapo_elem *e)
+
+{
+	struct nft_set_elem elem = {
+		.priv	= e,
+	};
+
+	nft_setelem_data_deactivate(net, set, &elem);
+}
+
 /**
  * pipapo_gc() - Drop expired entries from set, destroy start and end elements
  * @set:	nftables API set representation
  * @m:		Matching data
  */
-static void pipapo_gc(const struct nft_set *set, struct nft_pipapo_match *m)
+static void pipapo_gc(const struct nft_set *_set, struct nft_pipapo_match *m)
 {
+	struct nft_set *set = (struct nft_set *) _set;
 	struct nft_pipapo *priv = nft_set_priv(set);
+	struct net *net = read_pnet(&set->net);
 	int rules_f0, first_rule = 0;
 	struct nft_pipapo_elem *e;
+	struct nft_trans_gc *gc;
+
+	gc = nft_trans_gc_alloc(set, 0, GFP_KERNEL);
+	if (!gc)
+		return;
 
 	while ((rules_f0 = pipapo_rules_same_key(m->f, first_rule))) {
 		union nft_pipapo_map_bucket rulemap[NFT_PIPAPO_MAX_FIELDS];
@@ -1569,13 +1587,20 @@ static void pipapo_gc(const struct nft_set *set, struct nft_pipapo_match *m)
 		f--;
 		i--;
 		e = f->mt[rulemap[i].to].e;
-		if (nft_set_elem_expired(&e->ext) &&
-		    !nft_set_elem_mark_busy(&e->ext)) {
+
+		/* synchronous gc never fails, there is no need to set on
+		 * NFT_SET_ELEM_DEAD_BIT.
+		 */
+		if (nft_set_elem_expired(&e->ext)) {
 			priv->dirty = true;
-			pipapo_drop(m, rulemap);
 
-			rcu_barrier();
-			nft_set_elem_destroy(set, e, true);
+			gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC);
+			if (!gc)
+				break;
+
+			nft_pipapo_gc_deactivate(net, set, e);
+			pipapo_drop(m, rulemap);
+			nft_trans_gc_elem_add(gc, e);
 
 			/* And check again current first rule, which is now the
 			 * first we haven't checked.
@@ -1585,11 +1610,11 @@ static void pipapo_gc(const struct nft_set *set, struct nft_pipapo_match *m)
 		}
 	}
 
-	e = nft_set_catchall_gc(set);
-	if (e)
-		nft_set_elem_destroy(set, e, true);
-
-	priv->last_gc = jiffies;
+	gc = nft_trans_gc_catchall(gc, 0);
+	if (gc) {
+		nft_trans_gc_queue_sync_done(gc);
+		priv->last_gc = jiffies;
+	}
 }
 
 /**
@@ -1714,7 +1739,6 @@ static void nft_pipapo_activate(const struct net *net,
 		return;
 
 	nft_set_elem_change_active(net, set, &e->ext);
-	nft_set_elem_clear_busy(&e->ext);
 }
 
 /**
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 39956e5341c9..f9d4c8fcbbf8 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -46,6 +46,12 @@ static int nft_rbtree_cmp(const struct nft_set *set,
 		      set->klen);
 }
 
+static bool nft_rbtree_elem_expired(const struct nft_rbtree_elem *rbe)
+{
+	return nft_set_elem_expired(&rbe->ext) ||
+	       nft_set_elem_is_dead(&rbe->ext);
+}
+
 static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
 				const u32 *key, const struct nft_set_ext **ext,
 				unsigned int seq)
@@ -80,7 +86,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
 				continue;
 			}
 
-			if (nft_set_elem_expired(&rbe->ext))
+			if (nft_rbtree_elem_expired(rbe))
 				return false;
 
 			if (nft_rbtree_interval_end(rbe)) {
@@ -98,7 +104,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
 
 	if (set->flags & NFT_SET_INTERVAL && interval != NULL &&
 	    nft_set_elem_active(&interval->ext, genmask) &&
-	    !nft_set_elem_expired(&interval->ext) &&
+	    !nft_rbtree_elem_expired(interval) &&
 	    nft_rbtree_interval_start(interval)) {
 		*ext = &interval->ext;
 		return true;
@@ -215,6 +221,18 @@ static void *nft_rbtree_get(const struct net *net, const struct nft_set *set,
 	return rbe;
 }
 
+static void nft_rbtree_gc_remove(struct net *net, struct nft_set *set,
+				 struct nft_rbtree *priv,
+				 struct nft_rbtree_elem *rbe)
+{
+	struct nft_set_elem elem = {
+		.priv	= rbe,
+	};
+
+	nft_setelem_data_deactivate(net, set, &elem);
+	rb_erase(&rbe->node, &priv->root);
+}
+
 static int nft_rbtree_gc_elem(const struct nft_set *__set,
 			      struct nft_rbtree *priv,
 			      struct nft_rbtree_elem *rbe,
@@ -222,11 +240,12 @@ static int nft_rbtree_gc_elem(const struct nft_set *__set,
 {
 	struct nft_set *set = (struct nft_set *)__set;
 	struct rb_node *prev = rb_prev(&rbe->node);
+	struct net *net = read_pnet(&set->net);
 	struct nft_rbtree_elem *rbe_prev;
-	struct nft_set_gc_batch *gcb;
+	struct nft_trans_gc *gc;
 
-	gcb = nft_set_gc_batch_check(set, NULL, GFP_ATOMIC);
-	if (!gcb)
+	gc = nft_trans_gc_alloc(set, 0, GFP_ATOMIC);
+	if (!gc)
 		return -ENOMEM;
 
 	/* search for end interval coming before this element.
@@ -244,17 +263,28 @@ static int nft_rbtree_gc_elem(const struct nft_set *__set,
 
 	if (prev) {
 		rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
+		nft_rbtree_gc_remove(net, set, priv, rbe_prev);
 
-		rb_erase(&rbe_prev->node, &priv->root);
-		atomic_dec(&set->nelems);
-		nft_set_gc_batch_add(gcb, rbe_prev);
+		/* There is always room in this trans gc for this element,
+		 * memory allocation never actually happens, hence, the warning
+		 * splat in such case. No need to set NFT_SET_ELEM_DEAD_BIT,
+		 * this is synchronous gc which never fails.
+		 */
+		gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC);
+		if (WARN_ON_ONCE(!gc))
+			return -ENOMEM;
+
+		nft_trans_gc_elem_add(gc, rbe_prev);
 	}
 
-	rb_erase(&rbe->node, &priv->root);
-	atomic_dec(&set->nelems);
+	nft_rbtree_gc_remove(net, set, priv, rbe);
+	gc = nft_trans_gc_queue_sync(gc, GFP_ATOMIC);
+	if (WARN_ON_ONCE(!gc))
+		return -ENOMEM;
+
+	nft_trans_gc_elem_add(gc, rbe);
 
-	nft_set_gc_batch_add(gcb, rbe);
-	nft_set_gc_batch_complete(gcb);
+	nft_trans_gc_queue_sync_done(gc);
 
 	return 0;
 }
@@ -482,7 +512,6 @@ static void nft_rbtree_activate(const struct net *net,
 	struct nft_rbtree_elem *rbe = elem->priv;
 
 	nft_set_elem_change_active(net, set, &rbe->ext);
-	nft_set_elem_clear_busy(&rbe->ext);
 }
 
 static bool nft_rbtree_flush(const struct net *net,
@@ -490,12 +519,9 @@ static bool nft_rbtree_flush(const struct net *net,
 {
 	struct nft_rbtree_elem *rbe = priv;
 
-	if (!nft_set_elem_mark_busy(&rbe->ext) ||
-	    !nft_is_active(net, &rbe->ext)) {
-		nft_set_elem_change_active(net, set, &rbe->ext);
-		return true;
-	}
-	return false;
+	nft_set_elem_change_active(net, set, &rbe->ext);
+
+	return true;
 }
 
 static void *nft_rbtree_deactivate(const struct net *net,
@@ -570,26 +596,40 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 
 static void nft_rbtree_gc(struct work_struct *work)
 {
-	struct nft_rbtree_elem *rbe, *rbe_end = NULL, *rbe_prev = NULL;
-	struct nft_set_gc_batch *gcb = NULL;
+	struct nft_rbtree_elem *rbe, *rbe_end = NULL;
+	struct nftables_pernet *nft_net;
 	struct nft_rbtree *priv;
+	struct nft_trans_gc *gc;
 	struct rb_node *node;
 	struct nft_set *set;
+	unsigned int gc_seq;
 	struct net *net;
-	u8 genmask;
 
 	priv = container_of(work, struct nft_rbtree, gc_work.work);
 	set  = nft_set_container_of(priv);
 	net  = read_pnet(&set->net);
-	genmask = nft_genmask_cur(net);
+	nft_net = nft_pernet(net);
+	gc_seq  = READ_ONCE(nft_net->gc_seq);
+
+	gc = nft_trans_gc_alloc(set, gc_seq, GFP_KERNEL);
+	if (!gc)
+		goto done;
 
 	write_lock_bh(&priv->lock);
 	write_seqcount_begin(&priv->count);
 	for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
+
+		/* Ruleset has been updated, try later. */
+		if (READ_ONCE(nft_net->gc_seq) != gc_seq) {
+			nft_trans_gc_destroy(gc);
+			gc = NULL;
+			goto try_later;
+		}
+
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
 
-		if (!nft_set_elem_active(&rbe->ext, genmask))
-			continue;
+		if (nft_set_elem_is_dead(&rbe->ext))
+			goto dead_elem;
 
 		/* elements are reversed in the rbtree for historical reasons,
 		 * from highest to lowest value, that is why end element is
@@ -602,46 +642,36 @@ static void nft_rbtree_gc(struct work_struct *work)
 		if (!nft_set_elem_expired(&rbe->ext))
 			continue;
 
-		if (nft_set_elem_mark_busy(&rbe->ext)) {
-			rbe_end = NULL;
+		nft_set_elem_dead(&rbe->ext);
+
+		if (!rbe_end)
 			continue;
-		}
 
-		if (rbe_prev) {
-			rb_erase(&rbe_prev->node, &priv->root);
-			rbe_prev = NULL;
-		}
-		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
-		if (!gcb)
-			break;
+		nft_set_elem_dead(&rbe_end->ext);
 
-		atomic_dec(&set->nelems);
-		nft_set_gc_batch_add(gcb, rbe);
-		rbe_prev = rbe;
+		gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC);
+		if (!gc)
+			goto try_later;
 
-		if (rbe_end) {
-			atomic_dec(&set->nelems);
-			nft_set_gc_batch_add(gcb, rbe_end);
-			rb_erase(&rbe_end->node, &priv->root);
-			rbe_end = NULL;
-		}
-		node = rb_next(node);
-		if (!node)
-			break;
+		nft_trans_gc_elem_add(gc, rbe_end);
+		rbe_end = NULL;
+dead_elem:
+		gc = nft_trans_gc_queue_async(gc, gc_seq, GFP_ATOMIC);
+		if (!gc)
+			goto try_later;
+
+		nft_trans_gc_elem_add(gc, rbe);
 	}
-	if (rbe_prev)
-		rb_erase(&rbe_prev->node, &priv->root);
+
+	gc = nft_trans_gc_catchall(gc, gc_seq);
+
+try_later:
 	write_seqcount_end(&priv->count);
 	write_unlock_bh(&priv->lock);
 
-	rbe = nft_set_catchall_gc(set);
-	if (rbe) {
-		gcb = nft_set_gc_batch_check(set, gcb, GFP_ATOMIC);
-		if (gcb)
-			nft_set_gc_batch_add(gcb, rbe);
-	}
-	nft_set_gc_batch_complete(gcb);
-
+	if (gc)
+		nft_trans_gc_queue_async_done(gc);
+done:
 	queue_delayed_work(system_power_efficient_wq, &priv->gc_work,
 			   nft_set_gc_interval(set));
 }
-- 
2.30.2
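
For readers following the async path in the nft_rhash_gc() and
nft_rbtree_gc() hunks above, the "sample gc_seq, re-check on every
iteration, give up if it moved" idea can be modelled in user-space C.
This is only a sketch under stated assumptions: ruleset_seq, struct
gc_batch and gc_walk() are hypothetical names, and the bump_at
parameter merely simulates a control plane update in the middle of the
walk. It is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for nft_net->gc_seq, bumped by the control plane. */
static atomic_uint ruleset_seq;

struct gc_batch {
	unsigned int seq;	/* sequence sampled when the walk started */
	size_t count;		/* elements queued so far */
};

/*
 * Walk n elements and queue the expired ones. Returns true if the batch
 * is complete, false if the walk was abandoned because the sequence moved.
 * bump_at simulates a ruleset update right before visiting that index;
 * pass a value >= n for an undisturbed walk.
 */
static bool gc_walk(const bool *expired, size_t n, size_t bump_at,
		    struct gc_batch *batch)
{
	batch->seq = atomic_load(&ruleset_seq);
	batch->count = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == bump_at)
			atomic_fetch_add(&ruleset_seq, 1);

		/* Ruleset has been updated: drop the batch, try later. */
		if (atomic_load(&ruleset_seq) != batch->seq) {
			batch->count = 0;
			return false;
		}

		if (expired[i])
			batch->count++;
	}

	return true;
}
```

An undisturbed walk queues every expired element; a walk that sees the
sequence change queues nothing and is expected to be rescheduled, which
is the role the try_later label plays in the hunks above.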



* [PATCH net 4/5] netfilter: nft_set_hash: mark set element as dead when deleting from packet path
  2023-08-10  7:08 [PATCH net 0/5] Netfilter fixes for net Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2023-08-10  7:08 ` [PATCH net 3/5] netfilter: nf_tables: adapt set backend to use GC transaction API Pablo Neira Ayuso
@ 2023-08-10  7:08 ` Pablo Neira Ayuso
  2023-08-10  7:08 ` [PATCH net 5/5] netfilter: nf_tables: remove busy mark and gc batch API Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-08-10  7:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, stable

Set the NFT_SET_ELEM_DEAD_BIT flag on this element instead of removing
it from the packet path, since removal might race with an ongoing
transaction. Enable gc when the dynamic flag (NFT_SET_EVAL) is set,
since dynset deletion requires garbage collection after this patch.

Fixes: d0a8d877da97 ("netfilter: nft_dynset: support for element deletion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
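
As a reading aid, the "mark dead now, unlink later from gc" approach
described above can be sketched in user-space C. Everything here is a
simplified model and an assumption on my side: struct elem, ELEM_DEAD,
elem_delete(), elem_lookup() and gc_collect() are hypothetical names,
and a flat array stands in for the rhashtable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ELEM_DEAD	(1u << 2)	/* plays the role of the _DEAD bit */

struct elem {
	int key;
	atomic_uint flags;
	bool linked;		/* still present in the toy table */
};

/* Packet path: only flag the element, never touch the table. */
static bool elem_delete(struct elem *e)
{
	if (!e)
		return false;

	atomic_fetch_or(&e->flags, ELEM_DEAD);
	return true;
}

/* Lookup treats dead elements as if they were already gone. */
static struct elem *elem_lookup(struct elem *tbl, size_t n, int key)
{
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].linked || tbl[i].key != key)
			continue;
		if (atomic_load(&tbl[i].flags) & ELEM_DEAD)
			continue;
		return &tbl[i];
	}

	return NULL;
}

/* GC pass: unlink every dead element, return how many were collected. */
static size_t gc_collect(struct elem *tbl, size_t n)
{
	size_t collected = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].linked &&
		    (atomic_load(&tbl[i].flags) & ELEM_DEAD)) {
			tbl[i].linked = false;
			collected++;
		}
	}

	return collected;
}
```

After elem_delete() the element is invisible to elem_lookup() but stays
linked until gc_collect() runs, which is why the patch also needs gc
enabled for dynamic sets.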
 net/netfilter/nft_set_hash.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 2f067e4596b0..cef5df846000 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -249,7 +249,9 @@ static bool nft_rhash_delete(const struct nft_set *set,
 	if (he == NULL)
 		return false;
 
-	return rhashtable_remove_fast(&priv->ht, &he->node, nft_rhash_params) == 0;
+	nft_set_elem_dead(&he->ext);
+
+	return true;
 }
 
 static void nft_rhash_walk(const struct nft_ctx *ctx, struct nft_set *set,
@@ -412,7 +414,7 @@ static int nft_rhash_init(const struct nft_set *set,
 		return err;
 
 	INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rhash_gc);
-	if (set->flags & NFT_SET_TIMEOUT)
+	if (set->flags & (NFT_SET_TIMEOUT | NFT_SET_EVAL))
 		nft_rhash_gc_init(set);
 
 	return 0;
-- 
2.30.2



* [PATCH net 5/5] netfilter: nf_tables: remove busy mark and gc batch API
  2023-08-10  7:08 [PATCH net 0/5] Netfilter fixes for net Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2023-08-10  7:08 ` [PATCH net 4/5] netfilter: nft_set_hash: mark set element as dead when deleting from packet path Pablo Neira Ayuso
@ 2023-08-10  7:08 ` Pablo Neira Ayuso
  2023-08-10  7:49 ` [PATCH net 0/5] Netfilter fixes for net Greg KH
  2023-08-10 17:46 ` Jakub Kicinski
  6 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-08-10  7:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, stable

Ditch it: it has been replaced by the GC transaction API and it has no
clients anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
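
Note on the bit renumbering below: the _DEAD flag lives in the first
byte of the extension (the genmask), but the bit helpers operate on an
unsigned long, so the bit index depends on byte order. The following
user-space sketch checks that relationship at run time. It is an
illustration only: DEAD_MASK, is_little_endian(), dead_bit() and
first_byte_after_set() are hypothetical names, and a run-time
endianness probe is used here instead of the kernel's
__LITTLE_ENDIAN_BITFIELD / __BIG_ENDIAN_BITFIELD macros.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define DEAD_MASK	(1u << 2)	/* mask within the first byte */

/* Probe byte order by looking at the first byte of the value 1. */
static int is_little_endian(void)
{
	const unsigned long one = 1UL;
	unsigned char first;

	memcpy(&first, &one, 1);
	return first == 1;
}

/* Bit index, within an unsigned long, that maps to bit 2 of byte 0. */
static unsigned int dead_bit(void)
{
	const unsigned int bits_per_long = sizeof(unsigned long) * CHAR_BIT;

	return is_little_endian() ? 2 : bits_per_long - CHAR_BIT + 2;
}

/* Set that bit in a zeroed word and return the first byte in memory. */
static unsigned char first_byte_after_set(void)
{
	unsigned long word = 0;
	unsigned char first;

	word |= 1UL << dead_bit();
	memcpy(&first, &word, 1);
	return first;
}
```

On either byte order the first byte ends up equal to DEAD_MASK, which
is the property the two NFT_SET_ELEM_DEAD_BIT definitions are meant to
preserve.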
 include/net/netfilter/nf_tables.h | 98 +------------------------------
 net/netfilter/nf_tables_api.c     | 48 +--------------
 2 files changed, 4 insertions(+), 142 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 7256e9c80477..35870858ddf2 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -599,7 +599,6 @@ struct nft_set *nft_set_lookup_global(const struct net *net,
 
 struct nft_set_ext *nft_set_catchall_lookup(const struct net *net,
 					    const struct nft_set *set);
-void *nft_set_catchall_gc(const struct nft_set *set);
 
 static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
 {
@@ -816,62 +815,6 @@ void nft_set_elem_destroy(const struct nft_set *set, void *elem,
 void nf_tables_set_elem_destroy(const struct nft_ctx *ctx,
 				const struct nft_set *set, void *elem);
 
-/**
- *	struct nft_set_gc_batch_head - nf_tables set garbage collection batch
- *
- *	@rcu: rcu head
- *	@set: set the elements belong to
- *	@cnt: count of elements
- */
-struct nft_set_gc_batch_head {
-	struct rcu_head			rcu;
-	const struct nft_set		*set;
-	unsigned int			cnt;
-};
-
-#define NFT_SET_GC_BATCH_SIZE	((PAGE_SIZE -				  \
-				  sizeof(struct nft_set_gc_batch_head)) / \
-				 sizeof(void *))
-
-/**
- *	struct nft_set_gc_batch - nf_tables set garbage collection batch
- *
- * 	@head: GC batch head
- * 	@elems: garbage collection elements
- */
-struct nft_set_gc_batch {
-	struct nft_set_gc_batch_head	head;
-	void				*elems[NFT_SET_GC_BATCH_SIZE];
-};
-
-struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set,
-						gfp_t gfp);
-void nft_set_gc_batch_release(struct rcu_head *rcu);
-
-static inline void nft_set_gc_batch_complete(struct nft_set_gc_batch *gcb)
-{
-	if (gcb != NULL)
-		call_rcu(&gcb->head.rcu, nft_set_gc_batch_release);
-}
-
-static inline struct nft_set_gc_batch *
-nft_set_gc_batch_check(const struct nft_set *set, struct nft_set_gc_batch *gcb,
-		       gfp_t gfp)
-{
-	if (gcb != NULL) {
-		if (gcb->head.cnt + 1 < ARRAY_SIZE(gcb->elems))
-			return gcb;
-		nft_set_gc_batch_complete(gcb);
-	}
-	return nft_set_gc_batch_alloc(set, gfp);
-}
-
-static inline void nft_set_gc_batch_add(struct nft_set_gc_batch *gcb,
-					void *elem)
-{
-	gcb->elems[gcb->head.cnt++] = elem;
-}
-
 struct nft_expr_ops;
 /**
  *	struct nft_expr_type - nf_tables expression type
@@ -1560,47 +1503,12 @@ static inline void nft_set_elem_change_active(const struct net *net,
 
 #endif /* IS_ENABLED(CONFIG_NF_TABLES) */
 
-/*
- * We use a free bit in the genmask field to indicate the element
- * is busy, meaning it is currently being processed either by
- * the netlink API or GC.
- *
- * Even though the genmask is only a single byte wide, this works
- * because the extension structure if fully constant once initialized,
- * so there are no non-atomic write accesses unless it is already
- * marked busy.
- */
-#define NFT_SET_ELEM_BUSY_MASK	(1 << 2)
-
-#if defined(__LITTLE_ENDIAN_BITFIELD)
-#define NFT_SET_ELEM_BUSY_BIT	2
-#elif defined(__BIG_ENDIAN_BITFIELD)
-#define NFT_SET_ELEM_BUSY_BIT	(BITS_PER_LONG - BITS_PER_BYTE + 2)
-#else
-#error
-#endif
-
-static inline int nft_set_elem_mark_busy(struct nft_set_ext *ext)
-{
-	unsigned long *word = (unsigned long *)ext;
-
-	BUILD_BUG_ON(offsetof(struct nft_set_ext, genmask) != 0);
-	return test_and_set_bit(NFT_SET_ELEM_BUSY_BIT, word);
-}
-
-static inline void nft_set_elem_clear_busy(struct nft_set_ext *ext)
-{
-	unsigned long *word = (unsigned long *)ext;
-
-	clear_bit(NFT_SET_ELEM_BUSY_BIT, word);
-}
-
-#define NFT_SET_ELEM_DEAD_MASK	(1 << 3)
+#define NFT_SET_ELEM_DEAD_MASK	(1 << 2)
 
 #if defined(__LITTLE_ENDIAN_BITFIELD)
-#define NFT_SET_ELEM_DEAD_BIT	3
+#define NFT_SET_ELEM_DEAD_BIT	2
 #elif defined(__BIG_ENDIAN_BITFIELD)
-#define NFT_SET_ELEM_DEAD_BIT	(BITS_PER_LONG - BITS_PER_BYTE + 3)
+#define NFT_SET_ELEM_DEAD_BIT	(BITS_PER_LONG - BITS_PER_BYTE + 2)
 #else
 #error
 #endif
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fd4b5da7ac3c..c62227ae7746 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -6296,29 +6296,6 @@ struct nft_set_ext *nft_set_catchall_lookup(const struct net *net,
 }
 EXPORT_SYMBOL_GPL(nft_set_catchall_lookup);
 
-void *nft_set_catchall_gc(const struct nft_set *set)
-{
-	struct nft_set_elem_catchall *catchall, *next;
-	struct nft_set_ext *ext;
-	void *elem = NULL;
-
-	list_for_each_entry_safe(catchall, next, &set->catchall_list, list) {
-		ext = nft_set_elem_ext(set, catchall->elem);
-
-		if (!nft_set_elem_expired(ext) ||
-		    nft_set_elem_mark_busy(ext))
-			continue;
-
-		elem = catchall->elem;
-		list_del_rcu(&catchall->list);
-		kfree_rcu(catchall, rcu);
-		break;
-	}
-
-	return elem;
-}
-EXPORT_SYMBOL_GPL(nft_set_catchall_gc);
-
 static int nft_setelem_catchall_insert(const struct net *net,
 				       struct nft_set *set,
 				       const struct nft_set_elem *elem,
@@ -6789,7 +6766,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err_elem_free;
 	}
 
-	ext->genmask = nft_genmask_cur(ctx->net) | NFT_SET_ELEM_BUSY_MASK;
+	ext->genmask = nft_genmask_cur(ctx->net);
 
 	err = nft_setelem_insert(ctx->net, set, &elem, &ext2, flags);
 	if (err) {
@@ -7181,29 +7158,6 @@ static int nf_tables_delsetelem(struct sk_buff *skb,
 	return err;
 }
 
-void nft_set_gc_batch_release(struct rcu_head *rcu)
-{
-	struct nft_set_gc_batch *gcb;
-	unsigned int i;
-
-	gcb = container_of(rcu, struct nft_set_gc_batch, head.rcu);
-	for (i = 0; i < gcb->head.cnt; i++)
-		nft_set_elem_destroy(gcb->head.set, gcb->elems[i], true);
-	kfree(gcb);
-}
-
-struct nft_set_gc_batch *nft_set_gc_batch_alloc(const struct nft_set *set,
-						gfp_t gfp)
-{
-	struct nft_set_gc_batch *gcb;
-
-	gcb = kzalloc(sizeof(*gcb), gfp);
-	if (gcb == NULL)
-		return gcb;
-	gcb->head.set = set;
-	return gcb;
-}
-
 /*
  * Stateful objects
  */
-- 
2.30.2



* Re: [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk
  2023-08-10  7:08 ` [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk Pablo Neira Ayuso
@ 2023-08-10  7:10   ` kernel test robot
  2023-08-10 18:00   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 26+ messages in thread
From: kernel test robot @ 2023-08-10  7:10 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

Rule: 'Cc: stable@vger.kernel.org' or 'commit <sha1> upstream.'
Subject: [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk
Link: https://lore.kernel.org/stable/20230810070830.24064-2-pablo%40netfilter.org

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki





* Re: [PATCH net 0/5] Netfilter fixes for net
  2023-08-10  7:08 [PATCH net 0/5] Netfilter fixes for net Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2023-08-10  7:08 ` [PATCH net 5/5] netfilter: nf_tables: remove busy mark and gc batch API Pablo Neira Ayuso
@ 2023-08-10  7:49 ` Greg KH
  2023-08-10 10:29   ` Pablo Neira Ayuso
  2023-08-10 17:46 ` Jakub Kicinski
  6 siblings, 1 reply; 26+ messages in thread
From: Greg KH @ 2023-08-10  7:49 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, stable

On Thu, Aug 10, 2023 at 09:08:25AM +0200, Pablo Neira Ayuso wrote:
> Hi,
> 
> The following patchset contains Netfilter fixes for net.
> 
> The existing attempt to resolve races between control plane and GC work
> is error prone, as reported by Bien Pham <phamnnb@sea.com>, some places
> forgot to call nft_set_elem_mark_busy(), leading to double-deactivation
> of elements.
> 
> This series contains the following patches:
> 
> 1) Do not skip expired elements during walk otherwise elements might
>    never decrement the reference counter on data, leading to memleak.
> 
> 2) Add a GC transaction API to replace the former attempt to deal with
>    races between control plane and GC. GC worker sets on NFT_SET_ELEM_DEAD_BIT
>    on elements and it creates a GC transaction to remove the expired
>    elements, GC transaction could abort in case of interference with
>    control plane and retried later (GC async). Set backends such as
>    rbtree and pipapo also perform GC from control plane (GC sync), in
>    such case, element deactivation and removal is safe because mutex
>    is held then collected elements are released via call_rcu().
> 
> 3) Adapt existing set backends to use the GC transaction API.
> 
> 4) Update rhash set backend to set on _DEAD bit to report deleted
>    elements from datapath for GC.
> 
> 5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT.
> 
> Florian Westphal (1):
>   netfilter: nf_tables: don't skip expired elements during walk
> 
> Pablo Neira Ayuso (4):
>   netfilter: nf_tables: GC transaction API to avoid race with control plane
>   netfilter: nf_tables: adapt set backend to use GC transaction API
>   netfilter: nft_set_hash: mark set element as dead when deleting from packet path
>   netfilter: nf_tables: remove busy mark and gc batch API
> 
> Please, pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-08-10
> 
> Thanks.
> 
> ----------------------------------------------------------------
> 
> The following changes since commit c5ccff70501d92db445a135fa49cf9bc6b98c444:
> 
>   Merge branch 'net-sched-bind-logic-fixes-for-cls_fw-cls_u32-and-cls_route' (2023-07-31 20:10:39 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-08-10
> 
> for you to fetch changes up to a2dd0233cbc4d8a0abb5f64487487ffc9265beb5:
> 
>   netfilter: nf_tables: remove busy mark and gc batch API (2023-08-10 08:25:27 +0200)
> 
> ----------------------------------------------------------------
> netfilter pull request 23-08-10
> 
> ----------------------------------------------------------------
> Florian Westphal (1):
>       netfilter: nf_tables: don't skip expired elements during walk
> 
> Pablo Neira Ayuso (4):
>       netfilter: nf_tables: GC transaction API to avoid race with control plane
>       netfilter: nf_tables: adapt set backend to use GC transaction API
>       netfilter: nft_set_hash: mark set element as dead when deleting from packet path
>       netfilter: nf_tables: remove busy mark and gc batch API
> 
>  include/net/netfilter/nf_tables.h | 120 ++++++---------
>  net/netfilter/nf_tables_api.c     | 307 ++++++++++++++++++++++++++++++--------
>  net/netfilter/nft_set_hash.c      |  85 +++++++----
>  net/netfilter/nft_set_pipapo.c    |  66 +++++---
>  net/netfilter/nft_set_rbtree.c    | 146 ++++++++++--------
>  5 files changed, 476 insertions(+), 248 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>


* Re: [PATCH net 0/5] Netfilter fixes for net
  2023-08-10  7:49 ` [PATCH net 0/5] Netfilter fixes for net Greg KH
@ 2023-08-10 10:29   ` Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-08-10 10:29 UTC (permalink / raw)
  To: Greg KH; +Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, stable

On Thu, Aug 10, 2023 at 09:49:11AM +0200, Greg KH wrote:
> On Thu, Aug 10, 2023 at 09:08:25AM +0200, Pablo Neira Ayuso wrote:
> > [...]
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
>     https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.

I will re-submit this once it hits upstream.

Thanks.


* Re: [PATCH net 0/5] Netfilter fixes for net
  2023-08-10  7:08 [PATCH net 0/5] Netfilter fixes for net Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2023-08-10  7:49 ` [PATCH net 0/5] Netfilter fixes for net Greg KH
@ 2023-08-10 17:46 ` Jakub Kicinski
  6 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2023-08-10 17:46 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, pabeni, edumazet, stable

We've got some new kdoc warnings here:

net/netfilter/nft_set_pipapo.c:1557: warning: Function parameter or member '_set' not described in 'pipapo_gc'
net/netfilter/nft_set_pipapo.c:1557: warning: Excess function parameter 'set' description in 'pipapo_gc'
include/net/netfilter/nf_tables.h:577: warning: Function parameter or member 'dead' not described in 'nft_set'

Don't think Linus will care enough to complain but it'd be good to get
those cleaned up.
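
For reference, the first two warnings say that the kdoc for pipapo_gc()
still names @set while the parameter is now called _set, and the third
says that the 'dead' member of struct nft_set has no description. A
minimal sketch of what matching kdoc could look like follows. The
trimmed-down structs and the empty function body are hypothetical
stand-ins so the fragment compiles on its own, and the wording of the
@dead description is an assumption, not the text of any actual fix.

```c
#include <assert.h>

/**
 * struct nft_set - toy subset of the real structure (stand-in)
 * @dead: set is being torn down (description wording is an assumption)
 */
struct nft_set {
	unsigned int dead:1;
};

/* Stand-in for the real matching data, contents irrelevant here. */
struct nft_pipapo_match {
	int unused;
};

/**
 * pipapo_gc() - Drop expired entries from set, destroy start and end elements
 * @_set:	nftables API set representation
 * @m:		Matching data
 */
static void pipapo_gc(const struct nft_set *_set, struct nft_pipapo_match *m)
{
	/* Body elided: only the kdoc/parameter name pairing matters here. */
	(void)_set;
	(void)m;
}
```

The alternative is to rename the parameter back to set and keep the
existing @set line; either way the kdoc tag and the parameter name need
to agree.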


* Re: [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk
  2023-08-10  7:08 ` [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk Pablo Neira Ayuso
  2023-08-10  7:10   ` kernel test robot
@ 2023-08-10 18:00   ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 26+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-10 18:00 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, stable

Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Thu, 10 Aug 2023 09:08:26 +0200 you wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> There is an asymmetry between commit/abort and preparation phase if the
> following conditions are met:
> 
> 1. set is a verdict map ("1.2.3.4 : jump foo")
> 2. timeouts are enabled
> 
> [...]

Here is the summary with links:
  - [net,1/5] netfilter: nf_tables: don't skip expired elements during walk
    https://git.kernel.org/netdev/net/c/24138933b97b
  - [net,2/5] netfilter: nf_tables: GC transaction API to avoid race with control plane
    https://git.kernel.org/netdev/net/c/5f68718b34a5
  - [net,3/5] netfilter: nf_tables: adapt set backend to use GC transaction API
    https://git.kernel.org/netdev/net/c/f6c383b8c31a
  - [net,4/5] netfilter: nft_set_hash: mark set element as dead when deleting from packet path
    https://git.kernel.org/netdev/net/c/c92db3030492
  - [net,5/5] netfilter: nf_tables: remove busy mark and gc batch API
    https://git.kernel.org/netdev/net/c/a2dd0233cbc4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2024-03-07  2:15 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2024-03-07  2:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

Hi,

The following patchset contains fixes for net:

Patch #1 disallows anonymous sets with timeout, except for dynamic sets.
         Anonymous sets with timeouts using the pipapo set backend make
         no sense from a userspace perspective.

Patch #2 rejects constant sets with timeout, which have no practical use case.
         This kind of set, once bound, contains elements that expire but
         no new elements can be added.

Patch #3 restores custom conntrack expectations with NFPROTO_INET,
         from Florian Westphal.

Patch #4 marks rhashtable anonymous sets with timeout as dead from the
         commit path so that async GC does not collect these elements. Rules
         that refer to the anonymous set are released with no mutex held
         from the commit path.

Patch #5 fixes a UBSAN shift overflow in H.323 conntrack helper,
         from Lena Wang.
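
For readers unfamiliar with the class of bug in patch #5: in C, shifting a
32-bit value by 32 or more bits is undefined behaviour, which is what UBSAN
reports. The userspace sketch below only illustrates a range check in front
of a shift. The names top_bits_mask() and BITS_MAX are hypothetical and this
is not the H.323 helper's actual code.

```c
#include <stdint.h>

#define BITS_MAX 32u	/* hypothetical limit: width of the bitmap word */

/*
 * Build a mask with the top `len` bits of a 32-bit word set.
 * Returns 0 on success, -1 if `len` is out of range.
 */
static int top_bits_mask(unsigned int len, uint32_t *mask)
{
	if (len > BITS_MAX)
		return -1;	/* reject instead of shifting out of range */

	if (len == 0) {
		*mask = 0;	/* a shift by 32 would be undefined, so special-case it */
		return 0;
	}

	*mask = UINT32_MAX << (BITS_MAX - len);
	return 0;
}
```

A caller would treat the -1 return as a parse error and stop decoding.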

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-24-03-07

Thanks.

----------------------------------------------------------------

The following changes since commit c055fc00c07be1f0df7375ab0036cebd1106ed38:

  net/rds: fix WARNING in rds_conn_connect_if_down (2024-03-06 11:58:42 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-24-03-07

for you to fetch changes up to 767146637efc528b5e3d31297df115e85a2fd362:

  netfilter: nf_conntrack_h323: Add protection for bmp length out of range (2024-03-07 03:10:35 +0100)

----------------------------------------------------------------
netfilter pull request 24-03-07

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: nft_ct: fix l3num expectations with inet pseudo family

Lena Wang (1):
      netfilter: nf_conntrack_h323: Add protection for bmp length out of range

Pablo Neira Ayuso (3):
      netfilter: nf_tables: disallow anonymous set with timeout flag
      netfilter: nf_tables: reject constant set with timeout
      netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout

 net/netfilter/nf_conntrack_h323_asn1.c |  4 ++++
 net/netfilter/nf_tables_api.c          |  7 +++++++
 net/netfilter/nft_ct.c                 | 11 +++++------
 3 files changed, 16 insertions(+), 6 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2024-02-22  0:08 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-22  0:08 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

Hi,

The following patchset contains Netfilter fixes for net:

1) If user requests to wake up a table and hook fails, restore the
   dormant flag from the error path, from Florian Westphal.

2) Reset dst after transferring it to the flow object, otherwise dst
   gets released twice from the error path.

3) Release dst in case the flowtable selects a direct xmit path, eg.
   transmission to bridge port. Otherwise, dst is memleaked.

4) Register basechain and flowtable hooks at the end of the command.
   Error path releases these data structures without waiting for the
   rcu grace period.

5) Use kzalloc() to initialize struct nft_hook to fix a KMSAN report
   on access to hook type, also from Florian Westphal.
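
The double release described in item 2 follows a common ownership pattern:
once a reference has been handed over to another object, the original
pointer must be cleared so that the error path does not drop it a second
time. A minimal userspace sketch, assuming hypothetical types and helpers
(struct dst, struct route, struct flow, flow_take_dst() are illustrative
names only, not the kernel's API):

```c
#include <stddef.h>

struct dst   { int refcnt; };		/* hypothetical refcounted object */
struct route { struct dst *dst; };
struct flow  { struct dst *dst; };

static void dst_put(struct dst *d)
{
	if (d)
		d->refcnt--;
}

/* Move the reference from the route to the flow, then reset the source. */
static void flow_take_dst(struct flow *f, struct route *r)
{
	f->dst = r->dst;
	r->dst = NULL;	/* without this, both release paths would drop the ref */
}

static void route_release(struct route *r)
{
	dst_put(r->dst);
	r->dst = NULL;
}

static void flow_release(struct flow *f)
{
	dst_put(f->dst);
	f->dst = NULL;
}
```

With the reset in place, calling route_release() from an error path after
the transfer is a no-op and only flow_release() drops the reference.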

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-24-02-22

Thanks.

----------------------------------------------------------------

The following changes since commit 40b9385dd8e6a0515e1c9cd06a277483556b7286:

  enic: Avoid false positive under FORTIFY_SOURCE (2024-02-19 10:57:27 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-24-02-22

for you to fetch changes up to 195e5f88c2e48330ba5483e0bad2de3b3fad484f:

  netfilter: nf_tables: use kzalloc for hook allocation (2024-02-22 00:15:58 +0100)

----------------------------------------------------------------
netfilter pull request 24-02-22

----------------------------------------------------------------
Florian Westphal (2):
      netfilter: nf_tables: set dormant flag on hook register failure
      netfilter: nf_tables: use kzalloc for hook allocation

Pablo Neira Ayuso (3):
      netfilter: nft_flow_offload: reset dst in route object after setting up flow
      netfilter: nft_flow_offload: release dst in case direct xmit path is used
      netfilter: nf_tables: register hooks last when adding new chain/flowtable

 include/net/netfilter/nf_flow_table.h |  2 +-
 net/netfilter/nf_flow_table_core.c    | 17 ++++++--
 net/netfilter/nf_tables_api.c         | 81 ++++++++++++++++++-----------------
 3 files changed, 57 insertions(+), 43 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2023-11-08 15:57 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-11-08 15:57 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

Hi,

The following patchset contains Netfilter fixes for net:

1) Add missing netfilter modules description to fix W=1, from Florian Westphal.

2) Fix catch-all element GC with timeout when used with the pipapo set
   backend. This has remained broken since I tried to fix it this summer,
   followed by another recent attempt to fix it.

3) Add missing IPVS modules descriptions to fix W=1, also from Florian.

4) xt_recent allocated a too small buffer to store an IPv4-mapped IPv6
   address which can be parsed by in6_pton(), from Maciej Zenczykowski.
   Broken for many releases.

5) Skip IPv4-mapped IPv6, IPv4-compat IPv6, site/link local scoped IPv6
   addresses to set up IPv6 NAT redirect, also from Florian. This is
   broken since 2012.
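
To illustrate item 4: an IPv6 literal written with a dotted-quad tail is
longer than the longest all-hex form, so a buffer sized for the latter is
too small. The userspace sketch below uses inet_pton() as a stand-in for
in6_pton(). The helper names are hypothetical and the buffer sizes are for
illustration, not taken from xt_recent.

```c
#include <arpa/inet.h>
#include <string.h>

/* Longest all-hex literal: 8 groups of 4 digits plus 7 colons. */
static const char hex_literal[] =
	"ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff";

/* Longest literal with a dotted-quad tail: 6 hex groups plus an IPv4 address. */
static const char mixed_literal[] =
	"ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255";

/* Returns 1 if `literal` plus its terminating NUL fits in `buflen` bytes. */
static int literal_fits(const char *literal, size_t buflen)
{
	return strlen(literal) + 1 <= buflen;
}

/* Returns 1 if `literal` is accepted as an IPv6 address. */
static int parses_as_ipv6(const char *literal)
{
	struct in6_addr addr;

	return inet_pton(AF_INET6, literal, &addr) == 1;
}
```

Both literals parse, but only the all-hex one fits in a 40-byte buffer.
INET6_ADDRSTRLEN (46) is large enough for either.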

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-11-08

Thanks.

----------------------------------------------------------------

The following changes since commit d93f9528573e1d419b69ca5ff4130201d05f6b90:

  nfsd: regenerate user space parsers after ynl-gen changes (2023-11-06 09:03:46 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-11-08

for you to fetch changes up to 80abbe8a8263106fe45a4f293b92b5c74cc9cc8a:

  netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses (2023-11-08 16:40:30 +0100)

----------------------------------------------------------------
netfilter pull request 23-11-08

----------------------------------------------------------------
Florian Westphal (3):
      netfilter: add missing module descriptions
      ipvs: add missing module descriptions
      netfilter: nat: fix ipv6 nat redirect with mapped and scoped addresses

Maciej Żenczykowski (1):
      netfilter: xt_recent: fix (increase) ipv6 literal buffer length

Pablo Neira Ayuso (1):
      netfilter: nf_tables: remove catchall element in GC sync path

 net/bridge/netfilter/ebtable_broute.c      |  1 +
 net/bridge/netfilter/ebtable_filter.c      |  1 +
 net/bridge/netfilter/ebtable_nat.c         |  1 +
 net/bridge/netfilter/ebtables.c            |  1 +
 net/bridge/netfilter/nf_conntrack_bridge.c |  1 +
 net/ipv4/netfilter/iptable_nat.c           |  1 +
 net/ipv4/netfilter/iptable_raw.c           |  1 +
 net/ipv4/netfilter/nf_defrag_ipv4.c        |  1 +
 net/ipv4/netfilter/nf_reject_ipv4.c        |  1 +
 net/ipv6/netfilter/ip6table_nat.c          |  1 +
 net/ipv6/netfilter/ip6table_raw.c          |  1 +
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |  1 +
 net/ipv6/netfilter/nf_reject_ipv6.c        |  1 +
 net/netfilter/ipvs/ip_vs_core.c            |  1 +
 net/netfilter/ipvs/ip_vs_dh.c              |  1 +
 net/netfilter/ipvs/ip_vs_fo.c              |  1 +
 net/netfilter/ipvs/ip_vs_ftp.c             |  1 +
 net/netfilter/ipvs/ip_vs_lblc.c            |  1 +
 net/netfilter/ipvs/ip_vs_lblcr.c           |  1 +
 net/netfilter/ipvs/ip_vs_lc.c              |  1 +
 net/netfilter/ipvs/ip_vs_nq.c              |  1 +
 net/netfilter/ipvs/ip_vs_ovf.c             |  1 +
 net/netfilter/ipvs/ip_vs_pe_sip.c          |  1 +
 net/netfilter/ipvs/ip_vs_rr.c              |  1 +
 net/netfilter/ipvs/ip_vs_sed.c             |  1 +
 net/netfilter/ipvs/ip_vs_sh.c              |  1 +
 net/netfilter/ipvs/ip_vs_twos.c            |  1 +
 net/netfilter/ipvs/ip_vs_wlc.c             |  1 +
 net/netfilter/ipvs/ip_vs_wrr.c             |  1 +
 net/netfilter/nf_conntrack_broadcast.c     |  1 +
 net/netfilter/nf_conntrack_netlink.c       |  1 +
 net/netfilter/nf_conntrack_proto.c         |  1 +
 net/netfilter/nf_nat_core.c                |  1 +
 net/netfilter/nf_nat_redirect.c            | 27 ++++++++++++++++++++++++++-
 net/netfilter/nf_tables_api.c              | 23 ++++++++++++++++++-----
 net/netfilter/nfnetlink_osf.c              |  1 +
 net/netfilter/nft_chain_nat.c              |  1 +
 net/netfilter/nft_fib.c                    |  1 +
 net/netfilter/nft_fwd_netdev.c             |  1 +
 net/netfilter/xt_recent.c                  |  2 +-
 40 files changed, 82 insertions(+), 7 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2023-08-30 23:59 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-08-30 23:59 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet

Hi,

The following patchset contains Netfilter fixes for net:

1) Fix mangling of TCP options with non-linear skbuff, from Xiao Liang.

2) OOB read in xt_sctp due to missing sanitization of array length field.
   From Wander Lairson Costa.

3) OOB read in xt_u32 due to missing sanitization of array length field.
   Also from Wander Lairson Costa.

All of the above have been broken for several releases.

4) Missing audit log for set element reset command, from Phil Sutter.

5) Missing audit log for rule reset command, also from Phil.

Audit log support for these commands is missing in 6.5.
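
Items 2 and 3 share one pattern: a count supplied by userspace is used as a
loop bound over a fixed-size array without being checked first. A minimal
sketch of the validation step, assuming a hypothetical structure layout
(struct match_info, MAX_ENTRIES and both helpers are illustrative, not the
xt_sctp or xt_u32 definitions):

```c
#include <stddef.h>

#define MAX_ENTRIES 4	/* hypothetical capacity of the fixed-size array */

struct match_info {
	unsigned int count;		/* supplied by userspace, untrusted */
	int entries[MAX_ENTRIES];
};

/* Called once when the rule is loaded: reject counts that overrun the array. */
static int check_match_info(const struct match_info *info)
{
	return info->count <= MAX_ENTRIES ? 0 : -1;
}

/* Walks only the validated number of entries. */
static int sum_entries(const struct match_info *info, int *sum)
{
	unsigned int i;

	if (check_match_info(info) < 0)
		return -1;

	*sum = 0;
	for (i = 0; i < info->count; i++)
		*sum += info->entries[i];
	return 0;
}
```

Doing the check at rule load time keeps the per-packet path free of extra
branches, since the loop bound is then known to be in range.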

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-08-31

Thanks.

----------------------------------------------------------------

The following changes since commit bd6c11bc43c496cddfc6cf603b5d45365606dbd5:

  Merge tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next (2023-08-29 11:33:01 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-08-31

for you to fetch changes up to ea078ae9108e25fc881c84369f7c03931d22e555:

  netfilter: nf_tables: Audit log rule reset (2023-08-31 01:29:28 +0200)

----------------------------------------------------------------
netfilter pull request 23-08-31

----------------------------------------------------------------
Phil Sutter (2):
      netfilter: nf_tables: Audit log setelem reset
      netfilter: nf_tables: Audit log rule reset

Wander Lairson Costa (2):
      netfilter: xt_sctp: validate the flag_info count
      netfilter: xt_u32: validate user space input

Xiao Liang (1):
      netfilter: nft_exthdr: Fix non-linear header modification

 include/linux/audit.h         |  2 ++
 kernel/auditsc.c              |  2 ++
 net/netfilter/nf_tables_api.c | 49 ++++++++++++++++++++++++++++++++++++++++---
 net/netfilter/nft_exthdr.c    | 20 +++++++-----------
 net/netfilter/xt_sctp.c       |  2 ++
 net/netfilter/xt_u32.c        | 21 +++++++++++++++++++
 6 files changed, 81 insertions(+), 15 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2023-06-06 22:58 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-06-06 22:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

Hi,

The following patchset contains Netfilter fixes for net:

1) Missing NULL check in basechain hook netlink dump path, from Gavrilov Ilia.

2) Fix bitwise register tracking, from Jeremy Sowden.

3) Null pointer dereference when accessing conntrack helper,
   from Tijs Van Buggenhout.

4) Add schedule point to ipset's call_ad, from Kuniyuki Iwashima.

5) Incorrect boundary check when building chain blob.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-23-06-07

Thanks.

----------------------------------------------------------------

The following changes since commit 9025944fddfed5966c8f102f1fe921ab3aee2c12:

  net: fec: add dma_wmb to ensure correct descriptor values (2023-05-19 09:17:53 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-23-06-07

for you to fetch changes up to 08e42a0d3ad30f276f9597b591f975971a1b0fcf:

  netfilter: nf_tables: out-of-bound check in chain blob (2023-06-07 00:43:44 +0200)

----------------------------------------------------------------
netfilter pull request 23-06-07

----------------------------------------------------------------
Gavrilov Ilia (1):
      netfilter: nf_tables: Add null check for nla_nest_start_noflag() in nft_dump_basechain_hook()

Jeremy Sowden (1):
      netfilter: nft_bitwise: fix register tracking

Kuniyuki Iwashima (1):
      netfilter: ipset: Add schedule point in call_ad().

Pablo Neira Ayuso (1):
      netfilter: nf_tables: out-of-bound check in chain blob

Tijs Van Buggenhout (1):
      netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper

 net/netfilter/ipset/ip_set_core.c | 8 ++++++++
 net/netfilter/nf_conntrack_core.c | 3 +++
 net/netfilter/nf_tables_api.c     | 4 +++-
 net/netfilter/nft_bitwise.c       | 2 +-
 4 files changed, 15 insertions(+), 2 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2023-04-18 14:50 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-18 14:50 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet

Hi,

The following patchset contains Netfilter fixes for net:

1) Unbreak br_netfilter physdev match support, from Florian Westphal.

2) Use GFP_KERNEL_ACCOUNT for stateful/policy objects, from Chen Aotian.

3) Use IS_ENABLED() in nf_reset_trace(), from Florian Westphal.

4) Fix validation of catch-all set element.

5) Tighten requirements for catch-all set elements.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 24e3fce00c0b557491ff596c0682a29dee6fe848:

  net: stmmac: Add queue reset into stmmac_xdp_open() function (2023-04-05 19:02:56 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git HEAD

for you to fetch changes up to d4eb7e39929a3b1ff30fb751b4859fc2410702a0:

  netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements (2023-04-18 09:30:21 +0200)

----------------------------------------------------------------
Chen Aotian (1):
      netfilter: nf_tables: Modify nla_memdup's flag to GFP_KERNEL_ACCOUNT

Florian Westphal (2):
      netfilter: br_netfilter: fix recent physdev match breakage
      netfilter: nf_tables: fix ifdef to also consider nf_tables=m

Pablo Neira Ayuso (2):
      netfilter: nf_tables: validate catch-all set elements
      netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements

 include/linux/skbuff.h            |  5 +--
 include/net/netfilter/nf_tables.h |  4 +++
 net/bridge/br_netfilter_hooks.c   | 17 ++++++----
 net/netfilter/nf_tables_api.c     | 69 ++++++++++++++++++++++++++++++++++-----
 net/netfilter/nft_lookup.c        | 36 +++-----------------
 5 files changed, 83 insertions(+), 48 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2022-06-21  8:56 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2022-06-21  8:56 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet

Hi,

The following patchset contains Netfilter fixes for net:

1) Use get_random_u32() instead of prandom_u32_state() in nft_meta
   and nft_numgen, from Florian Westphal.

2) Incorrect list head in nfnetlink_cttimeout in recent update coming
   from previous development cycle. Also from Florian.

3) Incorrect path to pktgen scripts for nft_concat_range.sh selftest.
   From Jie2x Zhou.

4) Two fixes for nft_fwd and nft_dup egress support, from Florian.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit f5826c8c9d57210a17031af5527056eefdc2b7eb:

  net/mlx4_en: Fix wrong return value on ioctl EEPROM query failure (2022-06-07 20:49:58 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git HEAD

for you to fetch changes up to fcd53c51d03709bc429822086f1e9b3e88904284:

  netfilter: nf_dup_netdev: add and use recursion counter (2022-06-21 10:50:41 +0200)

----------------------------------------------------------------
Florian Westphal (4):
      netfilter: use get_random_u32 instead of prandom
      netfilter: cttimeout: fix slab-out-of-bounds read typo in cttimeout_net_exit
      netfilter: nf_dup_netdev: do not push mac header a second time
      netfilter: nf_dup_netdev: add and use recursion counter

Jie2x Zhou (1):
      selftests: netfilter: correct PKTGEN_SCRIPT_PATHS in nft_concat_range.sh

 net/netfilter/nf_dup_netdev.c                      | 25 ++++++++++++++++++----
 net/netfilter/nfnetlink_cttimeout.c                |  2 +-
 net/netfilter/nft_meta.c                           | 13 ++---------
 net/netfilter/nft_numgen.c                         | 12 +++--------
 .../selftests/netfilter/nft_concat_range.sh        |  2 +-
 5 files changed, 28 insertions(+), 26 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2022-05-31 21:58 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2022-05-31 21:58 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet

Hi,

1) Missing proper sanitization for nft_set_desc_concat_parse().

2) Missing mutex in nf_tables pre_exit path.

3) Possible double hook unregistration from clean_net path.

4) Missing FLOWI_FLAG_ANYSRC flag in flowtable route lookup.
   Fix incorrect source and destination address in case of NAT.
   Patch from wenxu.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 09e545f7381459c015b6fa0cd0ac6f010ef8cc25:

  xen/netback: fix incorrect usage of RING_HAS_UNCONSUMED_REQUESTS() (2022-05-31 12:22:22 +0200)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git HEAD

for you to fetch changes up to 97629b237a8cb7ac655c3969b8d5e57300ff6598:

  netfilter: flowtable: fix nft_flow_route source address for nat case (2022-05-31 23:32:53 +0200)

----------------------------------------------------------------
Pablo Neira Ayuso (3):
      netfilter: nf_tables: sanitize nft_set_desc_concat_parse()
      netfilter: nf_tables: hold mutex on netns pre_exit path
      netfilter: nf_tables: double hook unregistration in netns path

wenxu (2):
      netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag
      netfilter: flowtable: fix nft_flow_route source address for nat case

 net/netfilter/nf_tables_api.c    | 75 +++++++++++++++++++++++++++++++---------
 net/netfilter/nft_flow_offload.c |  6 ++--
 2 files changed, 62 insertions(+), 19 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2022-01-20 12:52 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2022-01-20 12:52 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

Hi,

The following patchset contains Netfilter fixes for net:

1) Incorrect helper module alias in netbios_ns, from Florian Westphal.

2) Remove unused variable in nf_tables.

3) Uninitialized last expression in nf_tables register tracking.

4) Memleak in nft_connlimit after moving stateful data out of the
   expression data area.

5) Bogus invalid stats update when NF_REPEAT is returned, from Florian.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 7d6019b602de660bfc6a542a68630006ace83b90:

  Revert "net: vertexcom: default to disabled on kbuild" (2022-01-10 21:11:07 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 830af2eba40327abec64325a5b08b1e85c37a2e0:

  netfilter: conntrack: don't increment invalid counter on NF_REPEAT (2022-01-16 00:55:27 +0100)

----------------------------------------------------------------
Florian Westphal (2):
      netfilter: nf_conntrack_netbios_ns: fix helper module alias
      netfilter: conntrack: don't increment invalid counter on NF_REPEAT

Pablo Neira Ayuso (3):
      netfilter: nf_tables: remove unused variable
      netfilter: nf_tables: set last expression in register tracking area
      netfilter: nft_connlimit: memleak if nf_ct_netns_get() fails

 net/netfilter/nf_conntrack_core.c       |  8 +++++---
 net/netfilter/nf_conntrack_netbios_ns.c |  5 +++--
 net/netfilter/nf_tables_api.c           |  4 +---
 net/netfilter/nft_connlimit.c           | 11 ++++++++++-
 4 files changed, 19 insertions(+), 9 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2021-09-29 23:04 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-29 23:04 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

Hi,

The following patchset contains Netfilter fixes for net:

1) Move back the defrag users fields to the global netns_nf area.
   Kernel fails to boot if conntrack is builtin and kernel is booted
   with: nf_conntrack.enable_hooks=1. From Florian Westphal.

2) Rule event notification is missing relevant context such as
   the position handle and the NLM_F_APPEND flag.

3) Rule replacement is expanded to add + delete using the existing
   rule handle, reverse order of this operation so it makes sense
   from rule notification standpoint.

4) Remove superfluous check in the dynamic set extension which
   disallows update commands on a set without timeout.

5) Propagate to userspace the NLM_F_CREATE and NLM_F_EXCL flags
   from the rule notification path.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 3b1b6e82fb5e08e2cb355d7b2ee8644ec289de66:

  net: phy: enhance GPY115 loopback disable function (2021-09-27 13:49:38 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 3d3b30175a51cf027201670af3e2e5b05447b985:

  netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification (2021-09-28 13:04:56 +0200)

----------------------------------------------------------------
Florian Westphal (1):
      netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1

Pablo Neira Ayuso (4):
      netfilter: nf_tables: add position handle in event notification
      netfilter: nf_tables: reverse order in rule replacement expansion
      netfilter: nft_dynset: relax superfluous check on set updates
      netfilter: nf_tables: honor NLM_F_CREATE and NLM_F_EXCL in event notification

 include/net/netfilter/ipv6/nf_defrag_ipv6.h |  1 -
 include/net/netfilter/nf_tables.h           |  2 +-
 include/net/netns/netfilter.h               |  6 ++
 net/ipv4/netfilter/nf_defrag_ipv4.c         | 30 +++-------
 net/ipv6/netfilter/nf_conntrack_reasm.c     |  2 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c   | 25 +++-----
 net/netfilter/nf_tables_api.c               | 91 ++++++++++++++++++++---------
 net/netfilter/nft_dynset.c                  | 11 +---
 net/netfilter/nft_quota.c                   |  2 +-
 9 files changed, 92 insertions(+), 78 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2021-09-03 16:30 Pablo Neira Ayuso
  0 siblings, 0 replies; 26+ messages in thread
From: Pablo Neira Ayuso @ 2021-09-03 16:30 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

Hi,

The following patchset contains Netfilter fixes for net:

1) Protect nft_ct template with global mutex, from Pavel Skripkin.

2) Two recent commits switched inet rt and nexthop exception hashes
   from jhash to siphash. If those two spots are problematic then
   conntrack is affected as well, so switch over to siphash too.
   While at it, add a hard upper limit on chain lengths and reject
   insertion if this is hit. Patches from Florian Westphal.

3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN,
   from Benjamin Hesmans.
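
The hard upper limit on chain lengths mentioned in item 2 can be pictured
as a bounded insert into a hash bucket: walk the chain, and refuse the
insertion once the limit is reached. This is only a sketch under
assumptions. MAX_CHAIN_LEN, struct node and chain_insert() are hypothetical
names and the limit value is arbitrary, not the one conntrack uses.

```c
#include <stddef.h>

#define MAX_CHAIN_LEN 3	/* hypothetical limit, chosen small for illustration */

struct node {
	struct node *next;
	unsigned int key;
};

/*
 * Insert `n` at the head of the chain unless the chain already holds
 * MAX_CHAIN_LEN nodes. Returns 0 on success, -1 if the insert is refused.
 */
static int chain_insert(struct node **head, struct node *n)
{
	const struct node *p;
	unsigned int len = 0;

	for (p = *head; p; p = p->next) {
		if (++len >= MAX_CHAIN_LEN)
			return -1;	/* chain has grown too large */
	}

	n->next = *head;
	*head = n;
	return 0;
}
```

Bounding the walk this way limits the cost an attacker can impose by
steering many entries into one bucket, independently of the hash function.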

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 519133debcc19f5c834e7e28480b60bdc234fe02:

  net: bridge: fix memleak in br_add_if() (2021-08-10 13:25:14 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 730affed24bffcd1eebd5903171960f5ff9f1f22:

  netfilter: socket: icmp6: fix use-after-scope (2021-09-03 18:25:31 +0200)

----------------------------------------------------------------
Benjamin Hesmans (1):
      netfilter: socket: icmp6: fix use-after-scope

Florian Westphal (3):
      netfilter: conntrack: sanitize table size default settings
      netfilter: conntrack: switch to siphash
      netfilter: refuse insertion if chain has grown too large

Pavel Skripkin (1):
      netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex

 Documentation/networking/nf_conntrack-sysctl.rst   |  13 ++-
 include/linux/netfilter/nf_conntrack_common.h      |   1 +
 include/uapi/linux/netfilter/nfnetlink_conntrack.h |   1 +
 net/ipv6/netfilter/nf_socket_ipv6.c                |   4 +-
 net/netfilter/nf_conntrack_core.c                  | 103 ++++++++++++++-------
 net/netfilter/nf_conntrack_expect.c                |  25 +++--
 net/netfilter/nf_conntrack_netlink.c               |   4 +-
 net/netfilter/nf_conntrack_standalone.c            |   4 +-
 net/netfilter/nf_nat_core.c                        |  18 +++-
 net/netfilter/nft_ct.c                             |   9 +-
 10 files changed, 123 insertions(+), 59 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net 0/5] Netfilter fixes for net
  2020-11-27 19:03 Pablo Neira Ayuso
@ 2020-11-28 21:23 ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2020-11-28 21:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Fri, 27 Nov 2020 20:03:08 +0100 Pablo Neira Ayuso wrote:
> 1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported
>    by syzbot.
> 
> 2) Remove spurious reports on nf_tables when lockdep gets disabled,
>    from Florian Westphal.
> 
> 3) Fix memleak in the error path of
>    ip_vs_control_net_init(), from Wang Hai.
> 
> 4) Fix missing control data in flow dissector, otherwise IP address
>    matching in hardware offload infra does not work.
> 
> 5) Fix hardware offload match on prefix IP address when userspace
>    does not send a bitwise expression to represent the prefix.
> 
> Please, pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH net 0/5] Netfilter fixes for net
@ 2020-11-27 19:03 Pablo Neira Ayuso
  2020-11-28 21:23 ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2020-11-27 19:03 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba

Hi,

The following patchset contains Netfilter fixes for net:

1) Fix insufficient validation of IPSET_ATTR_IPADDR_IPV6 reported
   by syzbot.

2) Remove spurious reports on nf_tables when lockdep gets disabled,
   from Florian Westphal.

3) Fix memleak in the error path of
   ip_vs_control_net_init(), from Wang Hai.

4) Fix missing control data in flow dissector, otherwise IP address
   matching in hardware offload infra does not work.

5) Fix hardware offload match on prefix IP address when userspace
   does not send a bitwise expression to represent the prefix.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks Jakub.

----------------------------------------------------------------

The following changes since commit 90cf87d16bd566cff40c2bc8e32e6d4cd3af23f0:

  enetc: Let the hardware auto-advance the taprio base-time of 0 (2020-11-25 12:36:27 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to a5d45bc0dc50f9dd83703510e9804d813a9cac32:

  netfilter: nftables_offload: build mask based from the matching bytes (2020-11-27 12:10:47 +0100)

----------------------------------------------------------------
Eric Dumazet (1):
      netfilter: ipset: prevent uninit-value in hash_ip6_add

Florian Westphal (1):
      netfilter: nf_tables: avoid false-postive lockdep splat

Pablo Neira Ayuso (2):
      netfilter: nftables_offload: set address type in control dissector
      netfilter: nftables_offload: build mask based from the matching bytes

Wang Hai (1):
      ipvs: fix possible memory leak in ip_vs_control_net_init

 include/net/netfilter/nf_tables_offload.h |  7 ++++
 net/netfilter/ipset/ip_set_core.c         |  3 +-
 net/netfilter/ipvs/ip_vs_ctl.c            | 31 +++++++++++---
 net/netfilter/nf_tables_api.c             |  3 +-
 net/netfilter/nf_tables_offload.c         | 17 ++++++++
 net/netfilter/nft_cmp.c                   |  8 ++--
 net/netfilter/nft_meta.c                  | 16 +++----
 net/netfilter/nft_payload.c               | 70 +++++++++++++++++++++++--------
 8 files changed, 117 insertions(+), 38 deletions(-)

* Re: [PATCH net 0/5] Netfilter fixes for net
  2020-10-31 18:14 Pablo Neira Ayuso
@ 2020-11-01  1:02 ` Jakub Kicinski
  0 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2020-11-01  1:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Sat, 31 Oct 2020 19:14:32 +0100 Pablo Neira Ayuso wrote:
> Hi,
> 
> The following patchset contains Netfilter fixes for net:
> 
> 1) Incorrect netlink report logic in flowtable and genID.
> 
> 2) Add a selftest to check that wireguard passes the right sk
>    to ip_route_me_harder, from Jason A. Donenfeld.
> 
> 3) Pass the actual sk to ip_route_me_harder(), also from Jason.
> 
> 4) Missing expression validation of updates via nft --check.
> 
> 5) Update byte and packet counters regardless of whether they
>    match, from Stefano Brivio.

Pulled, thanks Pablo!

* [PATCH net 0/5] Netfilter fixes for net
@ 2020-10-31 18:14 Pablo Neira Ayuso
  2020-11-01  1:02 ` Jakub Kicinski
  0 siblings, 1 reply; 26+ messages in thread
From: Pablo Neira Ayuso @ 2020-10-31 18:14 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi,

The following patchset contains Netfilter fixes for net:

1) Incorrect netlink report logic in flowtable and genID.

2) Add a selftest to check that wireguard passes the right sk
   to ip_route_me_harder, from Jason A. Donenfeld.

3) Pass the actual sk to ip_route_me_harder(), also from Jason.

4) Missing expression validation of updates via nft --check.

5) Update byte and packet counters regardless of whether they
   match, from Stefano Brivio.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks.

----------------------------------------------------------------

The following changes since commit 07e0887302450a62f51dba72df6afb5fabb23d1c:

  Merge tag 'fallthrough-fixes-clang-5.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux (2020-10-29 13:02:52 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 7d10e62c2ff8e084c136c94d32d9a94de4d31248:

  netfilter: ipset: Update byte and packet counters regardless of whether they match (2020-10-31 11:11:11 +0100)

----------------------------------------------------------------
Jason A. Donenfeld (2):
      wireguard: selftests: check that route_me_harder packets use the right sk
      netfilter: use actual socket sk rather than skb sk when routing harder

Pablo Neira Ayuso (2):
      netfilter: nftables: fix netlink report logic in flowtable and genid
      netfilter: nf_tables: missing validation from the abort path

Stefano Brivio (1):
      netfilter: ipset: Update byte and packet counters regardless of whether they match

 include/linux/netfilter/nfnetlink.h                |  9 ++++++++-
 include/linux/netfilter_ipv4.h                     |  2 +-
 include/linux/netfilter_ipv6.h                     | 10 +++++-----
 net/ipv4/netfilter.c                               |  8 +++++---
 net/ipv4/netfilter/iptable_mangle.c                |  2 +-
 net/ipv4/netfilter/nf_reject_ipv4.c                |  2 +-
 net/ipv6/netfilter.c                               |  6 +++---
 net/ipv6/netfilter/ip6table_mangle.c               |  2 +-
 net/netfilter/ipset/ip_set_core.c                  |  3 ++-
 net/netfilter/ipvs/ip_vs_core.c                    |  4 ++--
 net/netfilter/nf_nat_proto.c                       |  4 ++--
 net/netfilter/nf_synproxy_core.c                   |  2 +-
 net/netfilter/nf_tables_api.c                      | 19 ++++++++++++-------
 net/netfilter/nfnetlink.c                          | 22 ++++++++++++++++++----
 net/netfilter/nft_chain_route.c                    |  4 ++--
 net/netfilter/utils.c                              |  4 ++--
 tools/testing/selftests/wireguard/netns.sh         |  8 ++++++++
 .../testing/selftests/wireguard/qemu/kernel.config |  2 ++
 18 files changed, 76 insertions(+), 37 deletions(-)

Thread overview: 26+ messages
2023-08-10  7:08 [PATCH net 0/5] Netfilter fixes for net Pablo Neira Ayuso
2023-08-10  7:08 ` [PATCH net 1/5] netfilter: nf_tables: don't skip expired elements during walk Pablo Neira Ayuso
2023-08-10  7:10   ` kernel test robot
2023-08-10 18:00   ` patchwork-bot+netdevbpf
2023-08-10  7:08 ` [PATCH net 2/5] netfilter: nf_tables: GC transaction API to avoid race with control plane Pablo Neira Ayuso
2023-08-10  7:08 ` [PATCH net 3/5] netfilter: nf_tables: adapt set backend to use GC transaction API Pablo Neira Ayuso
2023-08-10  7:08 ` [PATCH net 4/5] netfilter: nft_set_hash: mark set element as dead when deleting from packet path Pablo Neira Ayuso
2023-08-10  7:08 ` [PATCH net 5/5] netfilter: nf_tables: remove busy mark and gc batch API Pablo Neira Ayuso
2023-08-10  7:49 ` [PATCH net 0/5] Netfilter fixes for net Greg KH
2023-08-10 10:29   ` Pablo Neira Ayuso
2023-08-10 17:46 ` Jakub Kicinski
  -- strict thread matches above, loose matches on Subject: below --
2024-03-07  2:15 Pablo Neira Ayuso
2024-02-22  0:08 Pablo Neira Ayuso
2023-11-08 15:57 Pablo Neira Ayuso
2023-08-30 23:59 Pablo Neira Ayuso
2023-06-06 22:58 Pablo Neira Ayuso
2023-04-18 14:50 Pablo Neira Ayuso
2022-06-21  8:56 Pablo Neira Ayuso
2022-05-31 21:58 Pablo Neira Ayuso
2022-01-20 12:52 Pablo Neira Ayuso
2021-09-29 23:04 Pablo Neira Ayuso
2021-09-03 16:30 Pablo Neira Ayuso
2020-11-27 19:03 Pablo Neira Ayuso
2020-11-28 21:23 ` Jakub Kicinski
2020-10-31 18:14 Pablo Neira Ayuso
2020-11-01  1:02 ` Jakub Kicinski
