netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net,v2 00/13] Netfilter fixes for net
@ 2024-02-08 11:28 Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits Pablo Neira Ayuso
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

This v2 including changes requested by Paolo Abeni.

-o-

Hi,

The following patchset contains Netfilter fixes for net:

1) Narrow down target/match revision to u8 in nft_compat.

2) Bail out with unused flags in nft_compat.

3) Restrict layer 4 protocol to u16 in nft_compat.

4) Remove static in pipapo get command that slipped through when
   reducing set memory footprint.

5) Follow up incremental fix for the ipset performance regression,
   this includes the missing gc cancellation, from Jozsef Kadlecsik.

6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
   as no filtering, from Felix Huettner.

7) Reject direction for NFT_CT_ID.

8) Use timestamp to check for set element expiration while transaction
   is handled to prevent garbage collection from removing set elements
   that were just added by this transaction. Packet path and netlink
   dump/get path still use current time to check for expiration.

9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.

10) map_index needs to be percpu and per-set, not just percpu.
    At this time its possible for a pipapo set to fill the all-zero part
    with ones and take the 'might have bits set' as 'start-from-zero' area.
    From Florian Westphal. This includes three patches:

    - Change scratchpad area to a structure that provides space for a
      per-set-and-cpu toggle and uses it of the percpu one.

    - Add a new free helper to prepare for the next patch.

    - Remove the scratch_aligned pointer and makes AVX2 implementation
      use the exact same memory addresses for read/store of the matching
      state.

Please, pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git nf-24-02-08

Thanks.

----------------------------------------------------------------

The following changes since commit eef00a82c568944f113f2de738156ac591bbd5cd:

  inet: read sk->sk_family once in inet_recv_error() (2024-02-04 16:06:53 +0000)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf.git tags/nf-24-02-08

for you to fetch changes up to 5a8cdf6fd860ac5e6d08d72edbcecee049a7fec4:

  netfilter: nft_set_pipapo: remove scratch_aligned pointer (2024-02-08 12:24:02 +0100)

----------------------------------------------------------------
netfilter pull request 24-02-08

----------------------------------------------------------------
Felix Huettner (1):
      netfilter: ctnetlink: fix filtering for zone 0

Florian Westphal (4):
      netfilter: nfnetlink_queue: un-break NF_REPEAT
      netfilter: nft_set_pipapo: store index in scratch maps
      netfilter: nft_set_pipapo: add helper to release pcpu scratch area
      netfilter: nft_set_pipapo: remove scratch_aligned pointer

Jozsef Kadlecsik (1):
      netfilter: ipset: Missing gc cancellations fixed

Pablo Neira Ayuso (7):
      netfilter: nft_compat: narrow down revision to unsigned 8-bits
      netfilter: nft_compat: reject unused compat flag
      netfilter: nft_compat: restrict match/target protocol to u16
      netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
      netfilter: nft_ct: reject direction for ct id
      netfilter: nf_tables: use timestamp to check for set element timeout
      netfilter: nft_set_rbtree: skip end interval element from gc

 include/net/netfilter/nf_tables.h                  |  16 ++-
 include/uapi/linux/netfilter/nf_tables.h           |   2 +
 net/netfilter/ipset/ip_set_core.c                  |   2 +
 net/netfilter/ipset/ip_set_hash_gen.h              |   4 +-
 net/netfilter/nf_conntrack_netlink.c               |  12 +-
 net/netfilter/nf_tables_api.c                      |   4 +-
 net/netfilter/nfnetlink_queue.c                    |  13 ++-
 net/netfilter/nft_compat.c                         |  17 ++-
 net/netfilter/nft_ct.c                             |   3 +
 net/netfilter/nft_set_hash.c                       |   8 +-
 net/netfilter/nft_set_pipapo.c                     | 128 +++++++++++----------
 net/netfilter/nft_set_pipapo.h                     |  18 ++-
 net/netfilter/nft_set_pipapo_avx2.c                |  17 ++-
 net/netfilter/nft_set_rbtree.c                     |  17 +--
 .../selftests/netfilter/conntrack_dump_flush.c     |  43 ++++++-
 15 files changed, 202 insertions(+), 102 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 12:30   ` patchwork-bot+netdevbpf
  2024-02-08 11:28 ` [PATCH net 02/13] netfilter: nft_compat: reject unused compat flag Pablo Neira Ayuso
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

xt_find_revision() expects u8, restrict it to this datatype.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index f0eeda97bfcd..001b6841a4b6 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -135,7 +135,7 @@ static void nft_target_eval_bridge(const struct nft_expr *expr,
 
 static const struct nla_policy nft_target_policy[NFTA_TARGET_MAX + 1] = {
 	[NFTA_TARGET_NAME]	= { .type = NLA_NUL_STRING },
-	[NFTA_TARGET_REV]	= { .type = NLA_U32 },
+	[NFTA_TARGET_REV]	= NLA_POLICY_MAX(NLA_BE32, 255),
 	[NFTA_TARGET_INFO]	= { .type = NLA_BINARY },
 };
 
@@ -419,7 +419,7 @@ static void nft_match_eval(const struct nft_expr *expr,
 
 static const struct nla_policy nft_match_policy[NFTA_MATCH_MAX + 1] = {
 	[NFTA_MATCH_NAME]	= { .type = NLA_NUL_STRING },
-	[NFTA_MATCH_REV]	= { .type = NLA_U32 },
+	[NFTA_MATCH_REV]	= NLA_POLICY_MAX(NLA_BE32, 255),
 	[NFTA_MATCH_INFO]	= { .type = NLA_BINARY },
 };
 
@@ -724,7 +724,7 @@ static int nfnl_compat_get_rcu(struct sk_buff *skb,
 static const struct nla_policy nfnl_compat_policy_get[NFTA_COMPAT_MAX+1] = {
 	[NFTA_COMPAT_NAME]	= { .type = NLA_NUL_STRING,
 				    .len = NFT_COMPAT_NAME_MAX-1 },
-	[NFTA_COMPAT_REV]	= { .type = NLA_U32 },
+	[NFTA_COMPAT_REV]	= NLA_POLICY_MAX(NLA_BE32, 255),
 	[NFTA_COMPAT_TYPE]	= { .type = NLA_U32 },
 };
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 02/13] netfilter: nft_compat: reject unused compat flag
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 03/13] netfilter: nft_compat: restrict match/target protocol to u16 Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

Flag (1 << 0) is ignored is set, never used, reject it it with EINVAL
instead.

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 2 ++
 net/netfilter/nft_compat.c               | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index ca30232b7bc8..117c6a9b845b 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -285,9 +285,11 @@ enum nft_rule_attributes {
 /**
  * enum nft_rule_compat_flags - nf_tables rule compat flags
  *
+ * @NFT_RULE_COMPAT_F_UNUSED: unused
  * @NFT_RULE_COMPAT_F_INV: invert the check result
  */
 enum nft_rule_compat_flags {
+	NFT_RULE_COMPAT_F_UNUSED = (1 << 0),
 	NFT_RULE_COMPAT_F_INV	= (1 << 1),
 	NFT_RULE_COMPAT_F_MASK	= NFT_RULE_COMPAT_F_INV,
 };
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index 001b6841a4b6..ed71d5ecbe0a 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -212,7 +212,8 @@ static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
 		return -EINVAL;
 
 	flags = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_FLAGS]));
-	if (flags & ~NFT_RULE_COMPAT_F_MASK)
+	if (flags & NFT_RULE_COMPAT_F_UNUSED ||
+	    flags & ~NFT_RULE_COMPAT_F_MASK)
 		return -EINVAL;
 	if (flags & NFT_RULE_COMPAT_F_INV)
 		*inv = true;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 03/13] netfilter: nft_compat: restrict match/target protocol to u16
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 02/13] netfilter: nft_compat: reject unused compat flag Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 04/13] netfilter: nft_set_pipapo: remove static in nft_pipapo_get() Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

xt_check_{match,target} expects u16, but NFTA_RULE_COMPAT_PROTO is u32.

NLA_POLICY_MAX(NLA_BE32, 65535) cannot be used because .max in
nla_policy is s16, see 3e48be05f3c7 ("netlink: add attribute range
validation to policy").

Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_compat.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
index ed71d5ecbe0a..1f9474fefe84 100644
--- a/net/netfilter/nft_compat.c
+++ b/net/netfilter/nft_compat.c
@@ -200,6 +200,7 @@ static const struct nla_policy nft_rule_compat_policy[NFTA_RULE_COMPAT_MAX + 1]
 static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
 {
 	struct nlattr *tb[NFTA_RULE_COMPAT_MAX+1];
+	u32 l4proto;
 	u32 flags;
 	int err;
 
@@ -218,7 +219,12 @@ static int nft_parse_compat(const struct nlattr *attr, u16 *proto, bool *inv)
 	if (flags & NFT_RULE_COMPAT_F_INV)
 		*inv = true;
 
-	*proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO]));
+	l4proto = ntohl(nla_get_be32(tb[NFTA_RULE_COMPAT_PROTO]));
+	if (l4proto > U16_MAX)
+		return -EINVAL;
+
+	*proto = l4proto;
+
 	return 0;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 04/13] netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 03/13] netfilter: nft_compat: restrict match/target protocol to u16 Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

This has slipped through when reducing memory footprint for set
elements, remove it.

Fixes: 9dad402b89e8 ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_pipapo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index efd523496be4..f24ecdaa1c1e 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -603,7 +603,7 @@ static struct nft_elem_priv *
 nft_pipapo_get(const struct net *net, const struct nft_set *set,
 	       const struct nft_set_elem *elem, unsigned int flags)
 {
-	static struct nft_pipapo_elem *e;
+	struct nft_pipapo_elem *e;
 
 	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
 		       nft_genmask_cur(net));
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 04/13] netfilter: nft_set_pipapo: remove static in nft_pipapo_get() Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 06/13] netfilter: ctnetlink: fix filtering for zone 0 Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

From: Jozsef Kadlecsik <kadlec@netfilter.org>

The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
in swap operation") missed to add the calls to gc cancellations
at the error path of create operations and at module unload. Also,
because the half of the destroy operations now executed by a
function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
or rcu read lock is held and therefore the checking of them results
false warnings.

Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")
Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Стас Ничипорович <stasn77@gmail.com>
Tested-by: Brad Spengler <spender@grsecurity.net>
Tested-by: Стас Ничипорович <stasn77@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c     | 2 ++
 net/netfilter/ipset/ip_set_hash_gen.h | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index bcaad9c009fe..3184cc6be4c9 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
 	return ret;
 
 cleanup:
+	set->variant->cancel_gc(set);
 	set->variant->destroy(set);
 put_out:
 	module_put(set->type->me);
@@ -2378,6 +2379,7 @@ ip_set_net_exit(struct net *net)
 		set = ip_set(inst, i);
 		if (set) {
 			ip_set(inst, i) = NULL;
+			set->variant->cancel_gc(set);
 			ip_set_destroy_set(set);
 		}
 	}
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 1136510521a8..cf3ce72c3de6 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -432,7 +432,7 @@ mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
 	u32 i;
 
 	for (i = 0; i < jhash_size(t->htable_bits); i++) {
-		n = __ipset_dereference(hbucket(t, i));
+		n = (__force struct hbucket *)hbucket(t, i);
 		if (!n)
 			continue;
 		if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
@@ -452,7 +452,7 @@ mtype_destroy(struct ip_set *set)
 	struct htype *h = set->data;
 	struct list_head *l, *lt;
 
-	mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
+	mtype_ahash_destroy(set, (__force struct htable *)h->table, true);
 	list_for_each_safe(l, lt, &h->ad) {
 		list_del(l);
 		kfree(l);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 06/13] netfilter: ctnetlink: fix filtering for zone 0
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 07/13] netfilter: nft_ct: reject direction for ct id Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

From: Felix Huettner <felix.huettner@mail.schwarz>

previously filtering for the default zone would actually skip the zone
filter and flush all zones.

Fixes: eff3c558bb7e ("netfilter: ctnetlink: support filtering by zone")
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Closes: https://lore.kernel.org/netdev/2032238f-31ac-4106-8f22-522e76df5a12@ovn.org/
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_netlink.c          | 12 ++++--
 .../netfilter/conntrack_dump_flush.c          | 43 ++++++++++++++++++-
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 0c22a02c2035..3b846cbdc050 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -876,6 +876,7 @@ struct ctnetlink_filter_u32 {
 
 struct ctnetlink_filter {
 	u8 family;
+	bool zone_filter;
 
 	u_int32_t orig_flags;
 	u_int32_t reply_flags;
@@ -992,9 +993,12 @@ ctnetlink_alloc_filter(const struct nlattr * const cda[], u8 family)
 	if (err)
 		goto err_filter;
 
-	err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
-	if (err < 0)
-		goto err_filter;
+	if (cda[CTA_ZONE]) {
+		err = ctnetlink_parse_zone(cda[CTA_ZONE], &filter->zone);
+		if (err < 0)
+			goto err_filter;
+		filter->zone_filter = true;
+	}
 
 	if (!cda[CTA_FILTER])
 		return filter;
@@ -1148,7 +1152,7 @@ static int ctnetlink_filter_match(struct nf_conn *ct, void *data)
 	if (filter->family && nf_ct_l3num(ct) != filter->family)
 		goto ignore_entry;
 
-	if (filter->zone.id != NF_CT_DEFAULT_ZONE_ID &&
+	if (filter->zone_filter &&
 	    !nf_ct_zone_equal_any(ct, &filter->zone))
 		goto ignore_entry;
 
diff --git a/tools/testing/selftests/netfilter/conntrack_dump_flush.c b/tools/testing/selftests/netfilter/conntrack_dump_flush.c
index f18c6db13bbf..b11ea8ee6719 100644
--- a/tools/testing/selftests/netfilter/conntrack_dump_flush.c
+++ b/tools/testing/selftests/netfilter/conntrack_dump_flush.c
@@ -13,7 +13,7 @@
 #include "../kselftest_harness.h"
 
 #define TEST_ZONE_ID 123
-#define CTA_FILTER_F_CTA_TUPLE_ZONE (1 << 2)
+#define NF_CT_DEFAULT_ZONE_ID 0
 
 static int reply_counter;
 
@@ -336,6 +336,9 @@ FIXTURE_SETUP(conntrack_dump_flush)
 	ret = conntrack_data_generate_v4(self->sock, 0xf4f4f4f4, 0xf5f5f5f5,
 					 TEST_ZONE_ID + 2);
 	EXPECT_EQ(ret, 0);
+	ret = conntrack_data_generate_v4(self->sock, 0xf6f6f6f6, 0xf7f7f7f7,
+					 NF_CT_DEFAULT_ZONE_ID);
+	EXPECT_EQ(ret, 0);
 
 	src = (struct in6_addr) {{
 		.__u6_addr32 = {
@@ -395,6 +398,26 @@ FIXTURE_SETUP(conntrack_dump_flush)
 					 TEST_ZONE_ID + 2);
 	EXPECT_EQ(ret, 0);
 
+	src = (struct in6_addr) {{
+		.__u6_addr32 = {
+			0xb80d0120,
+			0x00000000,
+			0x00000000,
+			0x07000000
+		}
+	}};
+	dst = (struct in6_addr) {{
+		.__u6_addr32 = {
+			0xb80d0120,
+			0x00000000,
+			0x00000000,
+			0x08000000
+		}
+	}};
+	ret = conntrack_data_generate_v6(self->sock, src, dst,
+					 NF_CT_DEFAULT_ZONE_ID);
+	EXPECT_EQ(ret, 0);
+
 	ret = conntracK_count_zone(self->sock, TEST_ZONE_ID);
 	EXPECT_GE(ret, 2);
 	if (ret > 2)
@@ -425,6 +448,24 @@ TEST_F(conntrack_dump_flush, test_flush_by_zone)
 	EXPECT_EQ(ret, 2);
 	ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 2);
 	EXPECT_EQ(ret, 2);
+	ret = conntracK_count_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+	EXPECT_EQ(ret, 2);
+}
+
+TEST_F(conntrack_dump_flush, test_flush_by_zone_default)
+{
+	int ret;
+
+	ret = conntrack_flush_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+	EXPECT_EQ(ret, 0);
+	ret = conntracK_count_zone(self->sock, TEST_ZONE_ID);
+	EXPECT_EQ(ret, 2);
+	ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 1);
+	EXPECT_EQ(ret, 2);
+	ret = conntracK_count_zone(self->sock, TEST_ZONE_ID + 2);
+	EXPECT_EQ(ret, 2);
+	ret = conntracK_count_zone(self->sock, NF_CT_DEFAULT_ZONE_ID);
+	EXPECT_EQ(ret, 0);
 }
 
 TEST_HARNESS_MAIN
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 07/13] netfilter: nft_ct: reject direction for ct id
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 06/13] netfilter: ctnetlink: fix filtering for zone 0 Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 08/13] netfilter: nf_tables: use timestamp to check for set element timeout Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

Direction attribute is ignored, reject it in case this ever needs to be
supported

Fixes: 3087c3f7c23b ("netfilter: nft_ct: Add ct id support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_ct.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index aac98a3c966e..bfd3e5a14dab 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -476,6 +476,9 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
 		break;
 #endif
 	case NFT_CT_ID:
+		if (tb[NFTA_CT_DIRECTION])
+			return -EINVAL;
+
 		len = sizeof(u32);
 		break;
 	default:
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 08/13] netfilter: nf_tables: use timestamp to check for set element timeout
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 07/13] netfilter: nft_ct: reject direction for ct id Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 09/13] netfilter: nfnetlink_queue: un-break NF_REPEAT Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

Add a timestamp field at the beginning of the transaction, store it
in the nftables per-netns area.

Update set backend .insert, .deactivate and sync gc path to use the
timestamp, this avoids that an element expires while control plane
transaction is still unfinished.

.lookup and .update, which are used from packet path, still use the
current time to check if the element has expired. And .get path and dump
also since this runs lockless under rcu read size lock. Then, there is
async gc which also needs to check the current time since it runs
asynchronously from a workqueue.

Fixes: c3e1b005ed1c ("netfilter: nf_tables: add set element timeout support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h | 16 ++++++++++++++--
 net/netfilter/nf_tables_api.c     |  4 +++-
 net/netfilter/nft_set_hash.c      |  8 +++++++-
 net/netfilter/nft_set_pipapo.c    | 18 +++++++++++-------
 net/netfilter/nft_set_rbtree.c    | 11 +++++++----
 5 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 001226c34621..510244cc0f8f 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -808,10 +808,16 @@ static inline struct nft_set_elem_expr *nft_set_ext_expr(const struct nft_set_ex
 	return nft_set_ext(ext, NFT_SET_EXT_EXPRESSIONS);
 }
 
-static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+static inline bool __nft_set_elem_expired(const struct nft_set_ext *ext,
+					  u64 tstamp)
 {
 	return nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION) &&
-	       time_is_before_eq_jiffies64(*nft_set_ext_expiration(ext));
+	       time_after_eq64(tstamp, *nft_set_ext_expiration(ext));
+}
+
+static inline bool nft_set_elem_expired(const struct nft_set_ext *ext)
+{
+	return __nft_set_elem_expired(ext, get_jiffies_64());
 }
 
 static inline struct nft_set_ext *nft_set_elem_ext(const struct nft_set *set,
@@ -1779,6 +1785,7 @@ struct nftables_pernet {
 	struct list_head	notify_list;
 	struct mutex		commit_mutex;
 	u64			table_handle;
+	u64			tstamp;
 	unsigned int		base_seq;
 	unsigned int		gc_seq;
 	u8			validate_state;
@@ -1791,6 +1798,11 @@ static inline struct nftables_pernet *nft_pernet(const struct net *net)
 	return net_generic(net, nf_tables_net_id);
 }
 
+static inline u64 nft_net_tstamp(const struct net *net)
+{
+	return nft_pernet(net)->tstamp;
+}
+
 #define __NFT_REDUCE_READONLY	1UL
 #define NFT_REDUCE_READONLY	(void *)__NFT_REDUCE_READONLY
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index fc016befb46f..f8e3f70c35bd 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -9827,6 +9827,7 @@ struct nft_trans_gc *nft_trans_gc_catchall_async(struct nft_trans_gc *gc,
 struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc)
 {
 	struct nft_set_elem_catchall *catchall, *next;
+	u64 tstamp = nft_net_tstamp(gc->net);
 	const struct nft_set *set = gc->set;
 	struct nft_elem_priv *elem_priv;
 	struct nft_set_ext *ext;
@@ -9836,7 +9837,7 @@ struct nft_trans_gc *nft_trans_gc_catchall_sync(struct nft_trans_gc *gc)
 	list_for_each_entry_safe(catchall, next, &set->catchall_list, list) {
 		ext = nft_set_elem_ext(set, catchall->elem);
 
-		if (!nft_set_elem_expired(ext))
+		if (!__nft_set_elem_expired(ext, tstamp))
 			continue;
 
 		gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
@@ -10622,6 +10623,7 @@ static bool nf_tables_valid_genid(struct net *net, u32 genid)
 	bool genid_ok;
 
 	mutex_lock(&nft_net->commit_mutex);
+	nft_net->tstamp = get_jiffies_64();
 
 	genid_ok = genid == 0 || nft_net->base_seq == genid;
 	if (!genid_ok)
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 6c2061bfdae6..6968a3b34236 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -36,6 +36,7 @@ struct nft_rhash_cmp_arg {
 	const struct nft_set		*set;
 	const u32			*key;
 	u8				genmask;
+	u64				tstamp;
 };
 
 static inline u32 nft_rhash_key(const void *data, u32 len, u32 seed)
@@ -62,7 +63,7 @@ static inline int nft_rhash_cmp(struct rhashtable_compare_arg *arg,
 		return 1;
 	if (nft_set_elem_is_dead(&he->ext))
 		return 1;
-	if (nft_set_elem_expired(&he->ext))
+	if (__nft_set_elem_expired(&he->ext, x->tstamp))
 		return 1;
 	if (!nft_set_elem_active(&he->ext, x->genmask))
 		return 1;
@@ -87,6 +88,7 @@ bool nft_rhash_lookup(const struct net *net, const struct nft_set *set,
 		.genmask = nft_genmask_cur(net),
 		.set	 = set,
 		.key	 = key,
+		.tstamp  = get_jiffies_64(),
 	};
 
 	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -106,6 +108,7 @@ nft_rhash_get(const struct net *net, const struct nft_set *set,
 		.genmask = nft_genmask_cur(net),
 		.set	 = set,
 		.key	 = elem->key.val.data,
+		.tstamp  = get_jiffies_64(),
 	};
 
 	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -131,6 +134,7 @@ static bool nft_rhash_update(struct nft_set *set, const u32 *key,
 		.genmask = NFT_GENMASK_ANY,
 		.set	 = set,
 		.key	 = key,
+		.tstamp  = get_jiffies_64(),
 	};
 
 	he = rhashtable_lookup(&priv->ht, &arg, nft_rhash_params);
@@ -175,6 +179,7 @@ static int nft_rhash_insert(const struct net *net, const struct nft_set *set,
 		.genmask = nft_genmask_next(net),
 		.set	 = set,
 		.key	 = elem->key.val.data,
+		.tstamp	 = nft_net_tstamp(net),
 	};
 	struct nft_rhash_elem *prev;
 
@@ -216,6 +221,7 @@ nft_rhash_deactivate(const struct net *net, const struct nft_set *set,
 		.genmask = nft_genmask_next(net),
 		.set	 = set,
 		.key	 = elem->key.val.data,
+		.tstamp	 = nft_net_tstamp(net),
 	};
 
 	rcu_read_lock();
diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index f24ecdaa1c1e..b17c18203416 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -504,6 +504,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
  * @set:	nftables API set representation
  * @data:	Key data to be matched against existing elements
  * @genmask:	If set, check that element is active in given genmask
+ * @tstamp:	timestamp to check for expired elements
  *
  * This is essentially the same as the lookup function, except that it matches
  * key data against the uncommitted copy and doesn't use preallocated maps for
@@ -513,7 +514,8 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
  */
 static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 					  const struct nft_set *set,
-					  const u8 *data, u8 genmask)
+					  const u8 *data, u8 genmask,
+					  u64 tstamp)
 {
 	struct nft_pipapo_elem *ret = ERR_PTR(-ENOENT);
 	struct nft_pipapo *priv = nft_set_priv(set);
@@ -566,7 +568,7 @@ static struct nft_pipapo_elem *pipapo_get(const struct net *net,
 			goto out;
 
 		if (last) {
-			if (nft_set_elem_expired(&f->mt[b].e->ext))
+			if (__nft_set_elem_expired(&f->mt[b].e->ext, tstamp))
 				goto next_match;
 			if ((genmask &&
 			     !nft_set_elem_active(&f->mt[b].e->ext, genmask)))
@@ -606,7 +608,7 @@ nft_pipapo_get(const struct net *net, const struct nft_set *set,
 	struct nft_pipapo_elem *e;
 
 	e = pipapo_get(net, set, (const u8 *)elem->key.val.data,
-		       nft_genmask_cur(net));
+		       nft_genmask_cur(net), get_jiffies_64());
 	if (IS_ERR(e))
 		return ERR_CAST(e);
 
@@ -1173,6 +1175,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	struct nft_pipapo_match *m = priv->clone;
 	u8 genmask = nft_genmask_next(net);
 	struct nft_pipapo_elem *e, *dup;
+	u64 tstamp = nft_net_tstamp(net);
 	struct nft_pipapo_field *f;
 	const u8 *start_p, *end_p;
 	int i, bsize_max, err = 0;
@@ -1182,7 +1185,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 	else
 		end = start;
 
-	dup = pipapo_get(net, set, start, genmask);
+	dup = pipapo_get(net, set, start, genmask, tstamp);
 	if (!IS_ERR(dup)) {
 		/* Check if we already have the same exact entry */
 		const struct nft_data *dup_key, *dup_end;
@@ -1204,7 +1207,7 @@ static int nft_pipapo_insert(const struct net *net, const struct nft_set *set,
 
 	if (PTR_ERR(dup) == -ENOENT) {
 		/* Look for partially overlapping entries */
-		dup = pipapo_get(net, set, end, nft_genmask_next(net));
+		dup = pipapo_get(net, set, end, nft_genmask_next(net), tstamp);
 	}
 
 	if (PTR_ERR(dup) != -ENOENT) {
@@ -1560,6 +1563,7 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
 	struct net *net = read_pnet(&set->net);
+	u64 tstamp = nft_net_tstamp(net);
 	int rules_f0, first_rule = 0;
 	struct nft_pipapo_elem *e;
 	struct nft_trans_gc *gc;
@@ -1594,7 +1598,7 @@ static void pipapo_gc(struct nft_set *set, struct nft_pipapo_match *m)
 		/* synchronous gc never fails, there is no need to set on
 		 * NFT_SET_ELEM_DEAD_BIT.
 		 */
-		if (nft_set_elem_expired(&e->ext)) {
+		if (__nft_set_elem_expired(&e->ext, tstamp)) {
 			priv->dirty = true;
 
 			gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
@@ -1769,7 +1773,7 @@ static void *pipapo_deactivate(const struct net *net, const struct nft_set *set,
 {
 	struct nft_pipapo_elem *e;
 
-	e = pipapo_get(net, set, data, nft_genmask_next(net));
+	e = pipapo_get(net, set, data, nft_genmask_next(net), nft_net_tstamp(net));
 	if (IS_ERR(e))
 		return NULL;
 
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index baa3fea4fe65..5fd74f993988 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -313,6 +313,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 	struct nft_rbtree *priv = nft_set_priv(set);
 	u8 cur_genmask = nft_genmask_cur(net);
 	u8 genmask = nft_genmask_next(net);
+	u64 tstamp = nft_net_tstamp(net);
 	int d;
 
 	/* Descend the tree to search for an existing element greater than the
@@ -360,7 +361,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 		/* perform garbage collection to avoid bogus overlap reports
 		 * but skip new elements in this transaction.
 		 */
-		if (nft_set_elem_expired(&rbe->ext) &&
+		if (__nft_set_elem_expired(&rbe->ext, tstamp) &&
 		    nft_set_elem_active(&rbe->ext, cur_genmask)) {
 			const struct nft_rbtree_elem *removed_end;
 
@@ -551,6 +552,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
 	const struct nft_rbtree *priv = nft_set_priv(set);
 	const struct rb_node *parent = priv->root.rb_node;
 	u8 genmask = nft_genmask_next(net);
+	u64 tstamp = nft_net_tstamp(net);
 	int d;
 
 	while (parent != NULL) {
@@ -571,7 +573,7 @@ nft_rbtree_deactivate(const struct net *net, const struct nft_set *set,
 				   nft_rbtree_interval_end(this)) {
 				parent = parent->rb_right;
 				continue;
-			} else if (nft_set_elem_expired(&rbe->ext)) {
+			} else if (__nft_set_elem_expired(&rbe->ext, tstamp)) {
 				break;
 			} else if (!nft_set_elem_active(&rbe->ext, genmask)) {
 				parent = parent->rb_left;
@@ -624,9 +626,10 @@ static void nft_rbtree_gc(struct nft_set *set)
 {
 	struct nft_rbtree *priv = nft_set_priv(set);
 	struct nft_rbtree_elem *rbe, *rbe_end = NULL;
+	struct net *net = read_pnet(&set->net);
+	u64 tstamp = nft_net_tstamp(net);
 	struct rb_node *node, *next;
 	struct nft_trans_gc *gc;
-	struct net *net;
 
 	set  = nft_set_container_of(priv);
 	net  = read_pnet(&set->net);
@@ -648,7 +651,7 @@ static void nft_rbtree_gc(struct nft_set *set)
 			rbe_end = rbe;
 			continue;
 		}
-		if (!nft_set_elem_expired(&rbe->ext))
+		if (!__nft_set_elem_expired(&rbe->ext, tstamp))
 			continue;
 
 		gc = nft_trans_gc_queue_sync(gc, GFP_KERNEL);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 09/13] netfilter: nfnetlink_queue: un-break NF_REPEAT
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 08/13] netfilter: nf_tables: use timestamp to check for set element timeout Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 10/13] netfilter: nft_set_rbtree: skip end interval element from gc Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

From: Florian Westphal <fw@strlen.de>

Only override userspace verdict if the ct hook returns something
other than ACCEPT.

Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
(move to next hook).

Fixes: 6291b3a67ad5 ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
Reported-by: l.6diay@passmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nfnetlink_queue.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 171d1f52d3dd..5cf38fc0a366 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -232,18 +232,25 @@ static void nfqnl_reinject(struct nf_queue_entry *entry, unsigned int verdict)
 	if (verdict == NF_ACCEPT ||
 	    verdict == NF_REPEAT ||
 	    verdict == NF_STOP) {
+		unsigned int ct_verdict = verdict;
+
 		rcu_read_lock();
 		ct_hook = rcu_dereference(nf_ct_hook);
 		if (ct_hook)
-			verdict = ct_hook->update(entry->state.net, entry->skb);
+			ct_verdict = ct_hook->update(entry->state.net, entry->skb);
 		rcu_read_unlock();
 
-		switch (verdict & NF_VERDICT_MASK) {
+		switch (ct_verdict & NF_VERDICT_MASK) {
+		case NF_ACCEPT:
+			/* follow userspace verdict, could be REPEAT */
+			break;
 		case NF_STOLEN:
 			nf_queue_entry_free(entry);
 			return;
+		default:
+			verdict = ct_verdict & NF_VERDICT_MASK;
+			break;
 		}
-
 	}
 	nf_reinject(entry, verdict);
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 10/13] netfilter: nft_set_rbtree: skip end interval element from gc
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 09/13] netfilter: nfnetlink_queue: un-break NF_REPEAT Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 11/13] netfilter: nft_set_pipapo: store index in scratch maps Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

rbtree lazy gc on insert might collect an end interval element that has
been just added in this transactions, skip end interval elements that
are not yet active.

Fixes: f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk")
Cc: stable@vger.kernel.org
Reported-by: lonial con <kongln9170@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_rbtree.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 5fd74f993988..9944fe479e53 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -234,7 +234,7 @@ static void nft_rbtree_gc_elem_remove(struct net *net, struct nft_set *set,
 
 static const struct nft_rbtree_elem *
 nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
-		   struct nft_rbtree_elem *rbe, u8 genmask)
+		   struct nft_rbtree_elem *rbe)
 {
 	struct nft_set *set = (struct nft_set *)__set;
 	struct rb_node *prev = rb_prev(&rbe->node);
@@ -253,7 +253,7 @@ nft_rbtree_gc_elem(const struct nft_set *__set, struct nft_rbtree *priv,
 	while (prev) {
 		rbe_prev = rb_entry(prev, struct nft_rbtree_elem, node);
 		if (nft_rbtree_interval_end(rbe_prev) &&
-		    nft_set_elem_active(&rbe_prev->ext, genmask))
+		    nft_set_elem_active(&rbe_prev->ext, NFT_GENMASK_ANY))
 			break;
 
 		prev = rb_prev(prev);
@@ -365,7 +365,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 		    nft_set_elem_active(&rbe->ext, cur_genmask)) {
 			const struct nft_rbtree_elem *removed_end;
 
-			removed_end = nft_rbtree_gc_elem(set, priv, rbe, genmask);
+			removed_end = nft_rbtree_gc_elem(set, priv, rbe);
 			if (IS_ERR(removed_end))
 				return PTR_ERR(removed_end);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 11/13] netfilter: nft_set_pipapo: store index in scratch maps
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 10/13] netfilter: nft_set_rbtree: skip end interval element from gc Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 12/13] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 13/13] netfilter: nft_set_pipapo: remove scratch_aligned pointer Pablo Neira Ayuso
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

From: Florian Westphal <fw@strlen.de>

Pipapo needs a scratchpad area to keep state during matching.
This state can be large and thus cannot reside on stack.

Each set preallocates percpu areas for this.

On each match stage, one scratchpad half starts with all-zero and the other
is inited to all-ones.

At the end of each stage, the half that starts with all-ones is
always zero.  Before next field is tested, pointers to the two halves
are swapped, i.e.  resmap pointer turns into fill pointer and vice versa.

After the last field has been processed, pipapo stashes the
index toggle in a percpu variable, with assumption that next packet
will start with the all-zero half and sets all bits in the other to 1.

This isn't reliable.

There can be multiple sets and we can't be sure that the upper
and lower half of all set scratch map is always in sync (lookups
can be conditional), so one set might have swapped, but other might
not have been queried.

Thus we need to keep the index per-set-and-cpu, just like the
scratchpad.

Note that this bug fix is incomplete, there is a related issue.

avx2 and normal implementation might use slightly different areas of the
map array space due to the avx2 alignment requirements, so
m->scratch (generic/fallback implementation) and ->scratch_aligned
(avx) may partially overlap. scratch and scratch_aligned are not distinct
objects, the latter is just the aligned address of the former.

After this change, write to scratch_align->map_index may write to
scratch->map, so this issue becomes more prominent, we can set to 1
a bit in the supposedly-all-zero area of scratch->map[].

A followup patch will remove the scratch_aligned and makes generic and
avx code use the same (aligned) area.

Its done in a separate change to ease review.

Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_pipapo.c      | 41 ++++++++++++++++++-----------
 net/netfilter/nft_set_pipapo.h      | 14 ++++++++--
 net/netfilter/nft_set_pipapo_avx2.c | 15 +++++------
 3 files changed, 44 insertions(+), 26 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index b17c18203416..54d0bac140a3 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -342,9 +342,6 @@
 #include "nft_set_pipapo_avx2.h"
 #include "nft_set_pipapo.h"
 
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_scratch_index);
-
 /**
  * pipapo_refill() - For each set bit, set bits from selected mapping table item
  * @map:	Bitmap to be scanned for set bits
@@ -412,6 +409,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 		       const u32 *key, const struct nft_set_ext **ext)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
+	struct nft_pipapo_scratch *scratch;
 	unsigned long *res_map, *fill_map;
 	u8 genmask = nft_genmask_cur(net);
 	const u8 *rp = (const u8 *)key;
@@ -422,15 +420,17 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 
 	local_bh_disable();
 
-	map_index = raw_cpu_read(nft_pipapo_scratch_index);
-
 	m = rcu_dereference(priv->match);
 
 	if (unlikely(!m || !*raw_cpu_ptr(m->scratch)))
 		goto out;
 
-	res_map  = *raw_cpu_ptr(m->scratch) + (map_index ? m->bsize_max : 0);
-	fill_map = *raw_cpu_ptr(m->scratch) + (map_index ? 0 : m->bsize_max);
+	scratch = *raw_cpu_ptr(m->scratch);
+
+	map_index = scratch->map_index;
+
+	res_map  = scratch->map + (map_index ? m->bsize_max : 0);
+	fill_map = scratch->map + (map_index ? 0 : m->bsize_max);
 
 	memset(res_map, 0xff, m->bsize_max * sizeof(*res_map));
 
@@ -460,7 +460,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 		b = pipapo_refill(res_map, f->bsize, f->rules, fill_map, f->mt,
 				  last);
 		if (b < 0) {
-			raw_cpu_write(nft_pipapo_scratch_index, map_index);
+			scratch->map_index = map_index;
 			local_bh_enable();
 
 			return false;
@@ -477,7 +477,7 @@ bool nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
 			 * current inactive bitmap is clean and can be reused as
 			 * *next* bitmap (not initial) for the next packet.
 			 */
-			raw_cpu_write(nft_pipapo_scratch_index, map_index);
+			scratch->map_index = map_index;
 			local_bh_enable();
 
 			return true;
@@ -1123,12 +1123,12 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 	int i;
 
 	for_each_possible_cpu(i) {
-		unsigned long *scratch;
+		struct nft_pipapo_scratch *scratch;
 #ifdef NFT_PIPAPO_ALIGN
-		unsigned long *scratch_aligned;
+		void *scratch_aligned;
 #endif
-
-		scratch = kzalloc_node(bsize_max * sizeof(*scratch) * 2 +
+		scratch = kzalloc_node(struct_size(scratch, map,
+						   bsize_max * 2) +
 				       NFT_PIPAPO_ALIGN_HEADROOM,
 				       GFP_KERNEL, cpu_to_node(i));
 		if (!scratch) {
@@ -1147,7 +1147,16 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 		*per_cpu_ptr(clone->scratch, i) = scratch;
 
 #ifdef NFT_PIPAPO_ALIGN
-		scratch_aligned = NFT_PIPAPO_LT_ALIGN(scratch);
+		/* Align &scratch->map (not the struct itself): the extra
+		 * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node()
+		 * above guarantee we can waste up to those bytes in order
+		 * to align the map field regardless of its offset within
+		 * the struct.
+		 */
+		BUILD_BUG_ON(offsetof(struct nft_pipapo_scratch, map) > NFT_PIPAPO_ALIGN_HEADROOM);
+
+		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
+		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
 		*per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
 #endif
 	}
@@ -2136,7 +2145,7 @@ static int nft_pipapo_init(const struct nft_set *set,
 	m->field_count = field_count;
 	m->bsize_max = 0;
 
-	m->scratch = alloc_percpu(unsigned long *);
+	m->scratch = alloc_percpu(struct nft_pipapo_scratch *);
 	if (!m->scratch) {
 		err = -ENOMEM;
 		goto out_scratch;
@@ -2145,7 +2154,7 @@ static int nft_pipapo_init(const struct nft_set *set,
 		*per_cpu_ptr(m->scratch, i) = NULL;
 
 #ifdef NFT_PIPAPO_ALIGN
-	m->scratch_aligned = alloc_percpu(unsigned long *);
+	m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *);
 	if (!m->scratch_aligned) {
 		err = -ENOMEM;
 		goto out_free;
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index 1040223da5fa..d3bc1551694f 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -130,6 +130,16 @@ struct nft_pipapo_field {
 	union nft_pipapo_map_bucket *mt;
 };
 
+/**
+ * struct nft_pipapo_scratch - percpu data used for lookup and matching
+ * @map_index:	Current working bitmap index, toggled between field matches
+ * @map:	store partial matching results during lookup
+ */
+struct nft_pipapo_scratch {
+	u8 map_index;
+	unsigned long map[];
+};
+
 /**
  * struct nft_pipapo_match - Data used for lookup and matching
  * @field_count		Amount of fields in set
@@ -142,9 +152,9 @@ struct nft_pipapo_field {
 struct nft_pipapo_match {
 	int field_count;
 #ifdef NFT_PIPAPO_ALIGN
-	unsigned long * __percpu *scratch_aligned;
+	struct nft_pipapo_scratch * __percpu *scratch_aligned;
 #endif
-	unsigned long * __percpu *scratch;
+	struct nft_pipapo_scratch * __percpu *scratch;
 	size_t bsize_max;
 	struct rcu_head rcu;
 	struct nft_pipapo_field f[] __counted_by(field_count);
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 52e0d026d30a..78213c73af2e 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -71,9 +71,6 @@
 #define NFT_PIPAPO_AVX2_ZERO(reg)					\
 	asm volatile("vpxor %ymm" #reg ", %ymm" #reg ", %ymm" #reg)
 
-/* Current working bitmap index, toggled between field matches */
-static DEFINE_PER_CPU(bool, nft_pipapo_avx2_scratch_index);
-
 /**
  * nft_pipapo_avx2_prepare() - Prepare before main algorithm body
  *
@@ -1120,11 +1117,12 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 			    const u32 *key, const struct nft_set_ext **ext)
 {
 	struct nft_pipapo *priv = nft_set_priv(set);
-	unsigned long *res, *fill, *scratch;
+	struct nft_pipapo_scratch *scratch;
 	u8 genmask = nft_genmask_cur(net);
 	const u8 *rp = (const u8 *)key;
 	struct nft_pipapo_match *m;
 	struct nft_pipapo_field *f;
+	unsigned long *res, *fill;
 	bool map_index;
 	int i, ret = 0;
 
@@ -1146,10 +1144,11 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 		kernel_fpu_end();
 		return false;
 	}
-	map_index = raw_cpu_read(nft_pipapo_avx2_scratch_index);
 
-	res  = scratch + (map_index ? m->bsize_max : 0);
-	fill = scratch + (map_index ? 0 : m->bsize_max);
+	map_index = scratch->map_index;
+
+	res  = scratch->map + (map_index ? m->bsize_max : 0);
+	fill = scratch->map + (map_index ? 0 : m->bsize_max);
 
 	/* Starting map doesn't need to be set for this implementation */
 
@@ -1221,7 +1220,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 
 out:
 	if (i % 2)
-		raw_cpu_write(nft_pipapo_avx2_scratch_index, !map_index);
+		scratch->map_index = !map_index;
 	kernel_fpu_end();
 
 	return ret >= 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 12/13] netfilter: nft_set_pipapo: add helper to release pcpu scratch area
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 11/13] netfilter: nft_set_pipapo: store index in scratch maps Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  2024-02-08 11:28 ` [PATCH net 13/13] netfilter: nft_set_pipapo: remove scratch_aligned pointer Pablo Neira Ayuso
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

From: Florian Westphal <fw@strlen.de>

After next patch simple kfree() is not enough anymore, so add
a helper for it.

Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_pipapo.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 54d0bac140a3..5094d4c439c3 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1110,6 +1110,24 @@ static void pipapo_map(struct nft_pipapo_match *m,
 		f->mt[map[i].to + j].e = e;
 }
 
+/**
+ * pipapo_free_scratch() - Free per-CPU map at original (not aligned) address
+ * @m:		Matching data
+ * @cpu:	CPU number
+ */
+static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int cpu)
+{
+	struct nft_pipapo_scratch *s;
+	void *mem;
+
+	s = *per_cpu_ptr(m->scratch, cpu);
+	if (!s)
+		return;
+
+	mem = s;
+	kfree(mem);
+}
+
 /**
  * pipapo_realloc_scratch() - Reallocate scratch maps for partial match results
  * @clone:	Copy of matching data with pending insertions and deletions
@@ -1142,7 +1160,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 			return -ENOMEM;
 		}
 
-		kfree(*per_cpu_ptr(clone->scratch, i));
+		pipapo_free_scratch(clone, i);
 
 		*per_cpu_ptr(clone->scratch, i) = scratch;
 
@@ -1369,7 +1387,7 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	}
 out_scratch_realloc:
 	for_each_possible_cpu(i)
-		kfree(*per_cpu_ptr(new->scratch, i));
+		pipapo_free_scratch(new, i);
 #ifdef NFT_PIPAPO_ALIGN
 	free_percpu(new->scratch_aligned);
 #endif
@@ -1653,7 +1671,7 @@ static void pipapo_free_match(struct nft_pipapo_match *m)
 	int i;
 
 	for_each_possible_cpu(i)
-		kfree(*per_cpu_ptr(m->scratch, i));
+		pipapo_free_scratch(m, i);
 
 #ifdef NFT_PIPAPO_ALIGN
 	free_percpu(m->scratch_aligned);
@@ -2253,7 +2271,7 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 		free_percpu(m->scratch_aligned);
 #endif
 		for_each_possible_cpu(cpu)
-			kfree(*per_cpu_ptr(m->scratch, cpu));
+			pipapo_free_scratch(m, cpu);
 		free_percpu(m->scratch);
 		pipapo_free_fields(m);
 		kfree(m);
@@ -2270,7 +2288,7 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 		free_percpu(priv->clone->scratch_aligned);
 #endif
 		for_each_possible_cpu(cpu)
-			kfree(*per_cpu_ptr(priv->clone->scratch, cpu));
+			pipapo_free_scratch(priv->clone, cpu);
 		free_percpu(priv->clone->scratch);
 
 		pipapo_free_fields(priv->clone);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net 13/13] netfilter: nft_set_pipapo: remove scratch_aligned pointer
  2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
                   ` (11 preceding siblings ...)
  2024-02-08 11:28 ` [PATCH net 12/13] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Pablo Neira Ayuso
@ 2024-02-08 11:28 ` Pablo Neira Ayuso
  12 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08 11:28 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw, kadlec

From: Florian Westphal <fw@strlen.de>

use ->scratch for both avx2 and the generic implementation.

After previous change the scratch->map member is always aligned properly
for AVX2, so we can just use scratch->map in AVX2 too.

The alignoff delta is stored in the scratchpad so we can reconstruct
the correct address to free the area again.

Fixes: 7400b063969b ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_set_pipapo.c      | 41 +++++------------------------
 net/netfilter/nft_set_pipapo.h      |  6 ++---
 net/netfilter/nft_set_pipapo_avx2.c |  2 +-
 3 files changed, 10 insertions(+), 39 deletions(-)

diff --git a/net/netfilter/nft_set_pipapo.c b/net/netfilter/nft_set_pipapo.c
index 5094d4c439c3..aa1d9e93a9a0 100644
--- a/net/netfilter/nft_set_pipapo.c
+++ b/net/netfilter/nft_set_pipapo.c
@@ -1125,6 +1125,7 @@ static void pipapo_free_scratch(const struct nft_pipapo_match *m, unsigned int c
 		return;
 
 	mem = s;
+	mem -= s->align_off;
 	kfree(mem);
 }
 
@@ -1144,6 +1145,7 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 		struct nft_pipapo_scratch *scratch;
 #ifdef NFT_PIPAPO_ALIGN
 		void *scratch_aligned;
+		u32 align_off;
 #endif
 		scratch = kzalloc_node(struct_size(scratch, map,
 						   bsize_max * 2) +
@@ -1162,8 +1164,6 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 
 		pipapo_free_scratch(clone, i);
 
-		*per_cpu_ptr(clone->scratch, i) = scratch;
-
 #ifdef NFT_PIPAPO_ALIGN
 		/* Align &scratch->map (not the struct itself): the extra
 		 * %NFT_PIPAPO_ALIGN_HEADROOM bytes passed to kzalloc_node()
@@ -1175,8 +1175,12 @@ static int pipapo_realloc_scratch(struct nft_pipapo_match *clone,
 
 		scratch_aligned = NFT_PIPAPO_LT_ALIGN(&scratch->map);
 		scratch_aligned -= offsetof(struct nft_pipapo_scratch, map);
-		*per_cpu_ptr(clone->scratch_aligned, i) = scratch_aligned;
+		align_off = scratch_aligned - (void *)scratch;
+
+		scratch = scratch_aligned;
+		scratch->align_off = align_off;
 #endif
+		*per_cpu_ptr(clone->scratch, i) = scratch;
 	}
 
 	return 0;
@@ -1331,11 +1335,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 	if (!new->scratch)
 		goto out_scratch;
 
-#ifdef NFT_PIPAPO_ALIGN
-	new->scratch_aligned = alloc_percpu(*new->scratch_aligned);
-	if (!new->scratch_aligned)
-		goto out_scratch;
-#endif
 	for_each_possible_cpu(i)
 		*per_cpu_ptr(new->scratch, i) = NULL;
 
@@ -1388,9 +1387,6 @@ static struct nft_pipapo_match *pipapo_clone(struct nft_pipapo_match *old)
 out_scratch_realloc:
 	for_each_possible_cpu(i)
 		pipapo_free_scratch(new, i);
-#ifdef NFT_PIPAPO_ALIGN
-	free_percpu(new->scratch_aligned);
-#endif
 out_scratch:
 	free_percpu(new->scratch);
 	kfree(new);
@@ -1673,11 +1669,7 @@ static void pipapo_free_match(struct nft_pipapo_match *m)
 	for_each_possible_cpu(i)
 		pipapo_free_scratch(m, i);
 
-#ifdef NFT_PIPAPO_ALIGN
-	free_percpu(m->scratch_aligned);
-#endif
 	free_percpu(m->scratch);
-
 	pipapo_free_fields(m);
 
 	kfree(m);
@@ -2171,16 +2163,6 @@ static int nft_pipapo_init(const struct nft_set *set,
 	for_each_possible_cpu(i)
 		*per_cpu_ptr(m->scratch, i) = NULL;
 
-#ifdef NFT_PIPAPO_ALIGN
-	m->scratch_aligned = alloc_percpu(struct nft_pipapo_scratch *);
-	if (!m->scratch_aligned) {
-		err = -ENOMEM;
-		goto out_free;
-	}
-	for_each_possible_cpu(i)
-		*per_cpu_ptr(m->scratch_aligned, i) = NULL;
-#endif
-
 	rcu_head_init(&m->rcu);
 
 	nft_pipapo_for_each_field(f, i, m) {
@@ -2211,9 +2193,6 @@ static int nft_pipapo_init(const struct nft_set *set,
 	return 0;
 
 out_free:
-#ifdef NFT_PIPAPO_ALIGN
-	free_percpu(m->scratch_aligned);
-#endif
 	free_percpu(m->scratch);
 out_scratch:
 	kfree(m);
@@ -2267,9 +2246,6 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 
 		nft_set_pipapo_match_destroy(ctx, set, m);
 
-#ifdef NFT_PIPAPO_ALIGN
-		free_percpu(m->scratch_aligned);
-#endif
 		for_each_possible_cpu(cpu)
 			pipapo_free_scratch(m, cpu);
 		free_percpu(m->scratch);
@@ -2284,9 +2260,6 @@ static void nft_pipapo_destroy(const struct nft_ctx *ctx,
 		if (priv->dirty)
 			nft_set_pipapo_match_destroy(ctx, set, m);
 
-#ifdef NFT_PIPAPO_ALIGN
-		free_percpu(priv->clone->scratch_aligned);
-#endif
 		for_each_possible_cpu(cpu)
 			pipapo_free_scratch(priv->clone, cpu);
 		free_percpu(priv->clone->scratch);
diff --git a/net/netfilter/nft_set_pipapo.h b/net/netfilter/nft_set_pipapo.h
index d3bc1551694f..f59a0cd81105 100644
--- a/net/netfilter/nft_set_pipapo.h
+++ b/net/netfilter/nft_set_pipapo.h
@@ -133,10 +133,12 @@ struct nft_pipapo_field {
 /**
  * struct nft_pipapo_scratch - percpu data used for lookup and matching
  * @map_index:	Current working bitmap index, toggled between field matches
+ * @align_off:	Offset to get the originally allocated address
  * @map:	store partial matching results during lookup
  */
 struct nft_pipapo_scratch {
 	u8 map_index;
+	u32 align_off;
 	unsigned long map[];
 };
 
@@ -144,16 +146,12 @@ struct nft_pipapo_scratch {
  * struct nft_pipapo_match - Data used for lookup and matching
  * @field_count		Amount of fields in set
  * @scratch:		Preallocated per-CPU maps for partial matching results
- * @scratch_aligned:	Version of @scratch aligned to NFT_PIPAPO_ALIGN bytes
  * @bsize_max:		Maximum lookup table bucket size of all fields, in longs
  * @rcu			Matching data is swapped on commits
  * @f:			Fields, with lookup and mapping tables
  */
 struct nft_pipapo_match {
 	int field_count;
-#ifdef NFT_PIPAPO_ALIGN
-	struct nft_pipapo_scratch * __percpu *scratch_aligned;
-#endif
 	struct nft_pipapo_scratch * __percpu *scratch;
 	size_t bsize_max;
 	struct rcu_head rcu;
diff --git a/net/netfilter/nft_set_pipapo_avx2.c b/net/netfilter/nft_set_pipapo_avx2.c
index 78213c73af2e..90e275bb3e5d 100644
--- a/net/netfilter/nft_set_pipapo_avx2.c
+++ b/net/netfilter/nft_set_pipapo_avx2.c
@@ -1139,7 +1139,7 @@ bool nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
 	 */
 	kernel_fpu_begin_mask(0);
 
-	scratch = *raw_cpu_ptr(m->scratch_aligned);
+	scratch = *raw_cpu_ptr(m->scratch);
 	if (unlikely(!scratch)) {
 		kernel_fpu_end();
 		return false;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits
  2024-02-08 11:28 ` [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits Pablo Neira Ayuso
@ 2024-02-08 12:30   ` patchwork-bot+netdevbpf
  2024-02-08 14:51     ` Paolo Abeni
  0 siblings, 1 reply; 22+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-02-08 12:30 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, kadlec

Hello:

This series was applied to netdev/net.git (main)
by Pablo Neira Ayuso <pablo@netfilter.org>:

On Thu,  8 Feb 2024 12:28:22 +0100 you wrote:
> xt_find_revision() expects u8, restrict it to this datatype.
> 
> Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  net/netfilter/nft_compat.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Here is the summary with links:
  - [net,01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits
    https://git.kernel.org/netdev/net/c/36fa8d697132
  - [net,02/13] netfilter: nft_compat: reject unused compat flag
    https://git.kernel.org/netdev/net/c/292781c3c548
  - [net,03/13] netfilter: nft_compat: restrict match/target protocol to u16
    https://git.kernel.org/netdev/net/c/d694b754894c
  - [net,04/13] netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
    https://git.kernel.org/netdev/net/c/ab0beafd52b9
  - [net,05/13] netfilter: ipset: Missing gc cancellations fixed
    https://git.kernel.org/netdev/net/c/27c5a095e251
  - [net,06/13] netfilter: ctnetlink: fix filtering for zone 0
    https://git.kernel.org/netdev/net/c/fa173a1b4e3f
  - [net,07/13] netfilter: nft_ct: reject direction for ct id
    https://git.kernel.org/netdev/net/c/38ed1c7062ad
  - [net,08/13] netfilter: nf_tables: use timestamp to check for set element timeout
    https://git.kernel.org/netdev/net/c/7395dfacfff6
  - [net,09/13] netfilter: nfnetlink_queue: un-break NF_REPEAT
    https://git.kernel.org/netdev/net/c/f82777e8ce6c
  - [net,10/13] netfilter: nft_set_rbtree: skip end interval element from gc
    https://git.kernel.org/netdev/net/c/60c0c230c6f0
  - [net,11/13] netfilter: nft_set_pipapo: store index in scratch maps
    https://git.kernel.org/netdev/net/c/76313d1a4aa9
  - [net,12/13] netfilter: nft_set_pipapo: add helper to release pcpu scratch area
    https://git.kernel.org/netdev/net/c/47b1c03c3c1a
  - [net,13/13] netfilter: nft_set_pipapo: remove scratch_aligned pointer
    https://git.kernel.org/netdev/net/c/5a8cdf6fd860

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits
  2024-02-08 12:30   ` patchwork-bot+netdevbpf
@ 2024-02-08 14:51     ` Paolo Abeni
  0 siblings, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2024-02-08 14:51 UTC (permalink / raw)
  To: patchwork-bot+netdevbpf, Pablo Neira Ayuso
  Cc: netfilter-devel, davem, netdev, kuba, edumazet, fw, kadlec

On Thu, 2024-02-08 at 12:30 +0000, patchwork-bot+netdevbpf@kernel.org
wrote:
> Hello:
> 
> This series was applied to netdev/net.git (main)
> by Pablo Neira Ayuso <pablo@netfilter.org>:
> 
> On Thu,  8 Feb 2024 12:28:22 +0100 you wrote:
> > xt_find_revision() expects u8, restrict it to this datatype.
> > 
> > Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables")
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  net/netfilter/nft_compat.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> Here is the summary with links:
>   - [net,01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits
>     https://git.kernel.org/netdev/net/c/36fa8d697132
>   - [net,02/13] netfilter: nft_compat: reject unused compat flag
>     https://git.kernel.org/netdev/net/c/292781c3c548
>   - [net,03/13] netfilter: nft_compat: restrict match/target protocol to u16
>     https://git.kernel.org/netdev/net/c/d694b754894c
>   - [net,04/13] netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
>     https://git.kernel.org/netdev/net/c/ab0beafd52b9
>   - [net,05/13] netfilter: ipset: Missing gc cancellations fixed
>     https://git.kernel.org/netdev/net/c/27c5a095e251
>   - [net,06/13] netfilter: ctnetlink: fix filtering for zone 0
>     https://git.kernel.org/netdev/net/c/fa173a1b4e3f
>   - [net,07/13] netfilter: nft_ct: reject direction for ct id
>     https://git.kernel.org/netdev/net/c/38ed1c7062ad
>   - [net,08/13] netfilter: nf_tables: use timestamp to check for set element timeout
>     https://git.kernel.org/netdev/net/c/7395dfacfff6
>   - [net,09/13] netfilter: nfnetlink_queue: un-break NF_REPEAT
>     https://git.kernel.org/netdev/net/c/f82777e8ce6c
>   - [net,10/13] netfilter: nft_set_rbtree: skip end interval element from gc
>     https://git.kernel.org/netdev/net/c/60c0c230c6f0
>   - [net,11/13] netfilter: nft_set_pipapo: store index in scratch maps
>     https://git.kernel.org/netdev/net/c/76313d1a4aa9
>   - [net,12/13] netfilter: nft_set_pipapo: add helper to release pcpu scratch area
>     https://git.kernel.org/netdev/net/c/47b1c03c3c1a
>   - [net,13/13] netfilter: nft_set_pipapo: remove scratch_aligned pointer
>     https://git.kernel.org/netdev/net/c/5a8cdf6fd860
> 
> You are awesome, thank you!

FTR, the patchwork bot went wild, I pulled _v2_ 

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-08  9:01   ` Paolo Abeni
@ 2024-02-08  9:31     ` Jozsef Kadlecsik
  0 siblings, 0 replies; 22+ messages in thread
From: Jozsef Kadlecsik @ 2024-02-08  9:31 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Pablo Neira Ayuso, netfilter-devel, David Miller, netdev, kuba,
	edumazet, Florian Westphal

[-- Attachment #1: Type: text/plain, Size: 3916 bytes --]

On Thu, 8 Feb 2024, Paolo Abeni wrote:

> On Thu, 2024-02-08 at 00:37 +0100, Pablo Neira Ayuso wrote:
> > From: Jozsef Kadlecsik <kadlec@netfilter.org>
> > 
> > The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
> > in swap operation") missed to add the calls to gc cancellations
> > at the error path of create operations and at module unload. Also,
> > because the half of the destroy operations now executed by a
> > function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
> > or rcu read lock is held and therefore the checking of them results
> > false warnings.
> > 
> > Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
> > Reported-by: Brad Spengler <spender@grsecurity.net>
> > Reported-by: Стас Ничипорович <stasn77@gmail.com>
> > Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation")
> > Tested-by: Brad Spengler <spender@grsecurity.net>
> > Tested-by: Стас Ничипорович <stasn77@gmail.com>
> > Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  net/netfilter/ipset/ip_set_core.c     | 2 ++
> >  net/netfilter/ipset/ip_set_hash_gen.h | 4 ++--
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> > index bcaad9c009fe..3184cc6be4c9 100644
> > --- a/net/netfilter/ipset/ip_set_core.c
> > +++ b/net/netfilter/ipset/ip_set_core.c
> > @@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
> >  	return ret;
> >  
> >  cleanup:
> > +	set->variant->cancel_gc(set);
> >  	set->variant->destroy(set);
> >  put_out:
> >  	module_put(set->type->me);
> > @@ -2378,6 +2379,7 @@ ip_set_net_exit(struct net *net)
> >  		set = ip_set(inst, i);
> >  		if (set) {
> >  			ip_set(inst, i) = NULL;
> > +			set->variant->cancel_gc(set);
> >  			ip_set_destroy_set(set);
> >  		}
> >  	}
> > diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> > index 1136510521a8..cfa5eecbe393 100644
> > --- a/net/netfilter/ipset/ip_set_hash_gen.h
> > +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> > @@ -432,7 +432,7 @@ mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
> >  	u32 i;
> >  
> >  	for (i = 0; i < jhash_size(t->htable_bits); i++) {
> > -		n = __ipset_dereference(hbucket(t, i));
> 
> AFAICS __ipset_dereference() should not trigger any warning, as it
> boils down to rcu_dereference_protected()

The destroy operation is split into two parts. The second one is called 
now via call_rcu() when neither NFNL_SUBSYS_IPSET is held nor it is an rcu 
protected area, which conditions are checked by __ipset_dereference(). So 
the original lines above and below would generate suspicious RCU usage 
warnings. That's why I removed these two __ipset_dereference() calls.
 
> > +		n = hbucket(t, i);
> 
> This statement instead triggers a sparse warning.

Yeah, that's due to 'struct hbucket *' != 'struct hbucket __rcu *'. 
I'll send a patch to fix it. Thanks!

Best regards,
Jozsef
 
> >  		if (!n)
> >  			continue;
> >  		if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
> > @@ -452,7 +452,7 @@ mtype_destroy(struct ip_set *set)
> >  	struct htype *h = set->data;
> >  	struct list_head *l, *lt;
> >  
> > -	mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
> > +	mtype_ahash_destroy(set, h->table, true);
> 
> The same here. What about using always __ipset_dereference()?
> 
> Cheers,
> 
> Paolo
> 
> 
> 

-- 
E-mail  : kadlec@blackhole.kfki.hu, kadlecsik.jozsef@wigner.hu
PGP key : https://wigner.hu/~kadlec/pgp_public_key.txt
Address : Wigner Research Centre for Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-08  8:50     ` Paolo Abeni
@ 2024-02-08  9:20       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-08  9:20 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: davem, netdev, kuba, edumazet, fw, netfilter-devel

Hi Paolo,

Working on v2 series, it should be ready in before noon.

On Thu, Feb 08, 2024 at 09:50:55AM +0100, Paolo Abeni wrote:
> Hi,
> 
> On Thu, 2024-02-08 at 06:48 +0100, Thorsten Leemhuis wrote:
> > On 08.02.24 00:37, Pablo Neira Ayuso wrote:
> > > From: Jozsef Kadlecsik <kadlec@netfilter.org>
> > > 
> > > The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
> > > in swap operation") missed to add the calls to gc cancellations
> > > at the error path of create operations and at module unload. Also,
> > > because the half of the destroy operations now executed by a
> > > function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
> > > or rcu read lock is held and therefore the checking of them results
> > > false warnings.
> > > 
> > > Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
> > > Reported-by: Brad Spengler <spender@grsecurity.net>
> > > Reported-by: Стас Ничипорович <stasn77@gmail.com>
> > > Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation")
> > 
> > FWIW, in case anyone cares: that afaics should be
> > 
> >  Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")
> > 
> > instead, as noted yesterday elsewhere[1].
> > 
> > Ciao, Thorsten
> > 
> > [1] https://lore.kernel.org/all/07cf1cf8-825e-47b9-9837-f91ae958dd6b@leemhuis.info/
> 
> I think it would be better to update the commit message, to help stable
> teams. 
> 
> Unless you absolutely need series in today PR, could you please send
> out a v2? Note that if v2 comes soon enough it can still land into the
> mentioned PR.
> 
> Thanks,
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-07 23:37 ` [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed Pablo Neira Ayuso
  2024-02-08  5:48   ` Thorsten Leemhuis
@ 2024-02-08  9:01   ` Paolo Abeni
  2024-02-08  9:31     ` Jozsef Kadlecsik
  1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-08  9:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: davem, netdev, kuba, edumazet, fw

On Thu, 2024-02-08 at 00:37 +0100, Pablo Neira Ayuso wrote:
> From: Jozsef Kadlecsik <kadlec@netfilter.org>
> 
> The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
> in swap operation") missed to add the calls to gc cancellations
> at the error path of create operations and at module unload. Also,
> because the half of the destroy operations now executed by a
> function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
> or rcu read lock is held and therefore the checking of them results
> false warnings.
> 
> Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
> Reported-by: Brad Spengler <spender@grsecurity.net>
> Reported-by: Стас Ничипорович <stasn77@gmail.com>
> Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation")
> Tested-by: Brad Spengler <spender@grsecurity.net>
> Tested-by: Стас Ничипорович <stasn77@gmail.com>
> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  net/netfilter/ipset/ip_set_core.c     | 2 ++
>  net/netfilter/ipset/ip_set_hash_gen.h | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> index bcaad9c009fe..3184cc6be4c9 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
>  	return ret;
>  
>  cleanup:
> +	set->variant->cancel_gc(set);
>  	set->variant->destroy(set);
>  put_out:
>  	module_put(set->type->me);
> @@ -2378,6 +2379,7 @@ ip_set_net_exit(struct net *net)
>  		set = ip_set(inst, i);
>  		if (set) {
>  			ip_set(inst, i) = NULL;
> +			set->variant->cancel_gc(set);
>  			ip_set_destroy_set(set);
>  		}
>  	}
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 1136510521a8..cfa5eecbe393 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -432,7 +432,7 @@ mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
>  	u32 i;
>  
>  	for (i = 0; i < jhash_size(t->htable_bits); i++) {
> -		n = __ipset_dereference(hbucket(t, i));

AFAICS __ipset_dereference() should not trigger any warning, as it
boils down to rcu_dereference_protected()

> +		n = hbucket(t, i);

This statement instead triggers a sparse warning.

>  		if (!n)
>  			continue;
>  		if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
> @@ -452,7 +452,7 @@ mtype_destroy(struct ip_set *set)
>  	struct htype *h = set->data;
>  	struct list_head *l, *lt;
>  
> -	mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
> +	mtype_ahash_destroy(set, h->table, true);

The same here. What about using always __ipset_dereference()?

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-08  5:48   ` Thorsten Leemhuis
@ 2024-02-08  8:50     ` Paolo Abeni
  2024-02-08  9:20       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-02-08  8:50 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: davem, netdev, kuba, edumazet, fw, netfilter-devel

Hi,

On Thu, 2024-02-08 at 06:48 +0100, Thorsten Leemhuis wrote:
> On 08.02.24 00:37, Pablo Neira Ayuso wrote:
> > From: Jozsef Kadlecsik <kadlec@netfilter.org>
> > 
> > The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
> > in swap operation") missed to add the calls to gc cancellations
> > at the error path of create operations and at module unload. Also,
> > because the half of the destroy operations now executed by a
> > function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
> > or rcu read lock is held and therefore the checking of them results
> > false warnings.
> > 
> > Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
> > Reported-by: Brad Spengler <spender@grsecurity.net>
> > Reported-by: Стас Ничипорович <stasn77@gmail.com>
> > Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation")
> 
> FWIW, in case anyone cares: that afaics should be
> 
>  Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")
> 
> instead, as noted yesterday elsewhere[1].
> 
> Ciao, Thorsten
> 
> [1] https://lore.kernel.org/all/07cf1cf8-825e-47b9-9837-f91ae958dd6b@leemhuis.info/

I think it would be better to update the commit message, to help stable
teams. 

Unless you absolutely need series in today PR, could you please send
out a v2? Note that if v2 comes soon enough it can still land into the
mentioned PR.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-07 23:37 ` [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed Pablo Neira Ayuso
@ 2024-02-08  5:48   ` Thorsten Leemhuis
  2024-02-08  8:50     ` Paolo Abeni
  2024-02-08  9:01   ` Paolo Abeni
  1 sibling, 1 reply; 22+ messages in thread
From: Thorsten Leemhuis @ 2024-02-08  5:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel
  Cc: davem, netdev, kuba, pabeni, edumazet, fw

On 08.02.24 00:37, Pablo Neira Ayuso wrote:
> From: Jozsef Kadlecsik <kadlec@netfilter.org>
> 
> The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
> in swap operation") missed to add the calls to gc cancellations
> at the error path of create operations and at module unload. Also,
> because the half of the destroy operations now executed by a
> function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
> or rcu read lock is held and therefore the checking of them results
> false warnings.
> 
> Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
> Reported-by: Brad Spengler <spender@grsecurity.net>
> Reported-by: Стас Ничипорович <stasn77@gmail.com>
> Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation")

FWIW, in case anyone cares: that afaics should be

 Fixes: 97f7cf1cd80e ("netfilter: ipset: fix performance regression in swap operation")

instead, as noted yesterday elsewhere[1].

Ciao, Thorsten

[1] https://lore.kernel.org/all/07cf1cf8-825e-47b9-9837-f91ae958dd6b@leemhuis.info/

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed
  2024-02-07 23:37 [PATCH net 00/13] Netfilter fixes for net Pablo Neira Ayuso
@ 2024-02-07 23:37 ` Pablo Neira Ayuso
  2024-02-08  5:48   ` Thorsten Leemhuis
  2024-02-08  9:01   ` Paolo Abeni
  0 siblings, 2 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2024-02-07 23:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, kuba, pabeni, edumazet, fw

From: Jozsef Kadlecsik <kadlec@netfilter.org>

The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
in swap operation") missed to add the calls to gc cancellations
at the error path of create operations and at module unload. Also,
because the half of the destroy operations now executed by a
function registered by call_rcu(), neither NFNL_SUBSYS_IPSET mutex
or rcu read lock is held and therefore the checking of them results
false warnings.

Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Стас Ничипорович <stasn77@gmail.com>
Fixes: fdb8e12cc2cc ("netfilter: ipset: fix performance regression in swap operation")
Tested-by: Brad Spengler <spender@grsecurity.net>
Tested-by: Стас Ничипорович <stasn77@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipset/ip_set_core.c     | 2 ++
 net/netfilter/ipset/ip_set_hash_gen.h | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index bcaad9c009fe..3184cc6be4c9 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1154,6 +1154,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
 	return ret;
 
 cleanup:
+	set->variant->cancel_gc(set);
 	set->variant->destroy(set);
 put_out:
 	module_put(set->type->me);
@@ -2378,6 +2379,7 @@ ip_set_net_exit(struct net *net)
 		set = ip_set(inst, i);
 		if (set) {
 			ip_set(inst, i) = NULL;
+			set->variant->cancel_gc(set);
 			ip_set_destroy_set(set);
 		}
 	}
diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
index 1136510521a8..cfa5eecbe393 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -432,7 +432,7 @@ mtype_ahash_destroy(struct ip_set *set, struct htable *t, bool ext_destroy)
 	u32 i;
 
 	for (i = 0; i < jhash_size(t->htable_bits); i++) {
-		n = __ipset_dereference(hbucket(t, i));
+		n = hbucket(t, i);
 		if (!n)
 			continue;
 		if (set->extensions & IPSET_EXT_DESTROY && ext_destroy)
@@ -452,7 +452,7 @@ mtype_destroy(struct ip_set *set)
 	struct htype *h = set->data;
 	struct list_head *l, *lt;
 
-	mtype_ahash_destroy(set, ipset_dereference_nfnl(h->table), true);
+	mtype_ahash_destroy(set, h->table, true);
 	list_for_each_safe(l, lt, &h->ad) {
 		list_del(l);
 		kfree(l);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-02-08 14:51 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-08 11:28 [PATCH net,v2 00/13] Netfilter fixes for net Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 01/13] netfilter: nft_compat: narrow down revision to unsigned 8-bits Pablo Neira Ayuso
2024-02-08 12:30   ` patchwork-bot+netdevbpf
2024-02-08 14:51     ` Paolo Abeni
2024-02-08 11:28 ` [PATCH net 02/13] netfilter: nft_compat: reject unused compat flag Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 03/13] netfilter: nft_compat: restrict match/target protocol to u16 Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 04/13] netfilter: nft_set_pipapo: remove static in nft_pipapo_get() Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 06/13] netfilter: ctnetlink: fix filtering for zone 0 Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 07/13] netfilter: nft_ct: reject direction for ct id Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 08/13] netfilter: nf_tables: use timestamp to check for set element timeout Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 09/13] netfilter: nfnetlink_queue: un-break NF_REPEAT Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 10/13] netfilter: nft_set_rbtree: skip end interval element from gc Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 11/13] netfilter: nft_set_pipapo: store index in scratch maps Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 12/13] netfilter: nft_set_pipapo: add helper to release pcpu scratch area Pablo Neira Ayuso
2024-02-08 11:28 ` [PATCH net 13/13] netfilter: nft_set_pipapo: remove scratch_aligned pointer Pablo Neira Ayuso
  -- strict thread matches above, loose matches on Subject: below --
2024-02-07 23:37 [PATCH net 00/13] Netfilter fixes for net Pablo Neira Ayuso
2024-02-07 23:37 ` [PATCH net 05/13] netfilter: ipset: Missing gc cancellations fixed Pablo Neira Ayuso
2024-02-08  5:48   ` Thorsten Leemhuis
2024-02-08  8:50     ` Paolo Abeni
2024-02-08  9:20       ` Pablo Neira Ayuso
2024-02-08  9:01   ` Paolo Abeni
2024-02-08  9:31     ` Jozsef Kadlecsik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).