* [PATCH 1/6] netfilter: conntrack: avoid gcc-10 zero-length-bounds warning
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
@ 2020-05-14 12:19 ` Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 2/6] netfilter: flowtable: Add pending bit for offload work Pablo Neira Ayuso
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2020-05-14 12:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Arnd Bergmann <arnd@arndb.de>
gcc-10 warns around a suspicious access to an empty struct member:
net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
1522 | memset(&ct->__nfct_init_offset[0], 0,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from net/netfilter/nf_conntrack_core.c:37:
include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
90 | u8 __nfct_init_offset[0];
| ^~~~~~~~~~~~~~~~~~
The code is correct but a bit unusual. Rework it slightly in a way that
does not trigger the warning, using an empty struct instead of an empty
array. There are probably more elegant ways to do this, but this is the
smallest change.
Fixes: c41884ce0562 ("netfilter: conntrack: avoid zeroing timer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack.h | 2 +-
net/netfilter/nf_conntrack_core.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 9f551f3b69c6..90690e37a56f 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -87,7 +87,7 @@ struct nf_conn {
struct hlist_node nat_bysource;
#endif
/* all members below initialized via memset */
- u8 __nfct_init_offset[0];
+ struct { } __nfct_init_offset;
/* If we were expected by an expectation, this will be it */
struct nf_conn *master;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c4582eb71766..0173398f4ced 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1519,9 +1519,9 @@ __nf_conntrack_alloc(struct net *net,
ct->status = 0;
ct->timeout = 0;
write_pnet(&ct->ct_net, net);
- memset(&ct->__nfct_init_offset[0], 0,
+ memset(&ct->__nfct_init_offset, 0,
offsetof(struct nf_conn, proto) -
- offsetof(struct nf_conn, __nfct_init_offset[0]));
+ offsetof(struct nf_conn, __nfct_init_offset));
nf_ct_zone_add(ct, zone);
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/6] netfilter: flowtable: Add pending bit for offload work
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 1/6] netfilter: conntrack: avoid gcc-10 zero-length-bounds warning Pablo Neira Ayuso
@ 2020-05-14 12:19 ` Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 3/6] netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueue Pablo Neira Ayuso
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2020-05-14 12:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Paul Blakey <paulb@mellanox.com>
Gc step can queue offloaded flow del work or stats work.
Those work items can race each other and a flow could be freed
before the stats work is executed and querying it.
To avoid that, add a pending bit that if a work exists for a flow
don't queue another work for it.
This will also avoid adding multiple stats works in case stats work
didn't complete but gc step started again.
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_flow_table.h | 1 +
net/netfilter/nf_flow_table_offload.c | 8 +++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 6bf69652f57d..c54a7f707e50 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -127,6 +127,7 @@ enum nf_flow_flags {
NF_FLOW_HW_DYING,
NF_FLOW_HW_DEAD,
NF_FLOW_HW_REFRESH,
+ NF_FLOW_HW_PENDING,
};
enum flow_offload_type {
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index e3b099c14eff..3d4ca62c81f9 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -817,6 +817,7 @@ static void flow_offload_work_handler(struct work_struct *work)
WARN_ON_ONCE(1);
}
+ clear_bit(NF_FLOW_HW_PENDING, &offload->flow->flags);
kfree(offload);
}
@@ -831,9 +832,14 @@ nf_flow_offload_work_alloc(struct nf_flowtable *flowtable,
{
struct flow_offload_work *offload;
+ if (test_and_set_bit(NF_FLOW_HW_PENDING, &flow->flags))
+ return NULL;
+
offload = kmalloc(sizeof(struct flow_offload_work), GFP_ATOMIC);
- if (!offload)
+ if (!offload) {
+ clear_bit(NF_FLOW_HW_PENDING, &flow->flags);
return NULL;
+ }
offload->cmd = cmd;
offload->flow = flow;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/6] netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueue
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 1/6] netfilter: conntrack: avoid gcc-10 zero-length-bounds warning Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 2/6] netfilter: flowtable: Add pending bit for offload work Pablo Neira Ayuso
@ 2020-05-14 12:19 ` Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 4/6] netfilter: conntrack: fix infinite loop on rmmod Pablo Neira Ayuso
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2020-05-14 12:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Roi Dayan <roid@mellanox.com>
This workqueue is in charge of handling offloaded flow tasks like
add/del/stats we should not use WQ_MEM_RECLAIM flag.
The flag can result in the following warning.
[ 485.557189] ------------[ cut here ]------------
[ 485.562976] workqueue: WQ_MEM_RECLAIM nf_flow_table_offload:flow_offload_worr
[ 485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 check_flush0
[ 485.590191] Kernel panic - not syncing: panic_on_warn set ...
[ 485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 5.7.0-rc1.21802
[ 485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177
[ 485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_f]
[ 485.624834] Call Trace:
[ 485.628077] dump_stack+0x50/0x70
[ 485.632280] panic+0xfb/0x2d7
[ 485.636083] ? check_flush_dependency+0x110/0x130
[ 485.641830] __warn.cold.12+0x20/0x2a
[ 485.646405] ? check_flush_dependency+0x110/0x130
[ 485.652154] ? check_flush_dependency+0x110/0x130
[ 485.657900] report_bug+0xb8/0x100
[ 485.662187] ? sched_clock_cpu+0xc/0xb0
[ 485.666974] do_error_trap+0x9f/0xc0
[ 485.671464] do_invalid_op+0x36/0x40
[ 485.675950] ? check_flush_dependency+0x110/0x130
[ 485.681699] invalid_op+0x28/0x30
Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command")
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_flow_table_offload.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 3d4ca62c81f9..2276a73ccba2 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1062,7 +1062,7 @@ static struct flow_indr_block_entry block_ing_entry = {
int nf_flow_table_offload_init(void)
{
nf_flow_offload_wq = alloc_workqueue("nf_flow_table_offload",
- WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
+ WQ_UNBOUND, 0);
if (!nf_flow_offload_wq)
return -ENOMEM;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 4/6] netfilter: conntrack: fix infinite loop on rmmod
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (2 preceding siblings ...)
2020-05-14 12:19 ` [PATCH 3/6] netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueue Pablo Neira Ayuso
@ 2020-05-14 12:19 ` Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 5/6] netfilter: flowtable: set NF_FLOW_TEARDOWN flag on entry expiration Pablo Neira Ayuso
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2020-05-14 12:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Florian Westphal <fw@strlen.de>
'rmmod nf_conntrack' can hang forever, because the netns exit
gets stuck in nf_conntrack_cleanup_net_list():
i_see_dead_people:
busy = 0;
list_for_each_entry(net, net_exit_list, exit_list) {
nf_ct_iterate_cleanup(kill_all, net, 0, 0);
if (atomic_read(&net->ct.count) != 0)
busy = 1;
}
if (busy) {
schedule();
goto i_see_dead_people;
}
When nf_ct_iterate_cleanup iterates the conntrack table, all nf_conn
structures can be found twice:
once for the original tuple and once for the conntracks reply tuple.
get_next_corpse() only calls the iterator when the entry is
in original direction -- the idea was to avoid unneeded invocations
of the iterator callback.
When support for clashing entries was added, the assumption that
all nf_conn objects are added twice, once in original, once for reply
tuple no longer holds -- NF_CLASH_BIT entries are only added in
the non-clashing reply direction.
Thus, if at least one NF_CLASH entry is in the list then
nf_conntrack_cleanup_net_list() always skips it completely.
During normal netns destruction, this causes a hang of several
seconds, until the gc worker removes the entry (NF_CLASH entries
always have a 1 second timeout).
But in the rmmod case, the gc worker has already been stopped, so
ct.count never becomes 0.
We can fix this in two ways:
1. Add a second test for CLASH_BIT and call iterator for those
entries as well, or:
2. Skip the original tuple direction and use the reply tuple.
2) is simpler, so do that.
Fixes: 6a757c07e51f80ac ("netfilter: conntrack: allow insertion of clashing entries")
Reported-by: Chen Yi <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_conntrack_core.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 0173398f4ced..1d57b95d3481 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2139,8 +2139,19 @@ get_next_corpse(int (*iter)(struct nf_conn *i, void *data),
nf_conntrack_lock(lockp);
if (*bucket < nf_conntrack_htable_size) {
hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[*bucket], hnnode) {
- if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
+ if (NF_CT_DIRECTION(h) != IP_CT_DIR_REPLY)
continue;
+ /* All nf_conn objects are added to hash table twice, one
+ * for original direction tuple, once for the reply tuple.
+ *
+ * Exception: In the IPS_NAT_CLASH case, only the reply
+ * tuple is added (the original tuple already existed for
+ * a different object).
+ *
+ * We only need to call the iterator once for each
+ * conntrack, so we just use the 'reply' direction
+ * tuple while iterating.
+ */
ct = nf_ct_tuplehash_to_ctrack(h);
if (iter(ct, data))
goto found;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 5/6] netfilter: flowtable: set NF_FLOW_TEARDOWN flag on entry expiration
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (3 preceding siblings ...)
2020-05-14 12:19 ` [PATCH 4/6] netfilter: conntrack: fix infinite loop on rmmod Pablo Neira Ayuso
@ 2020-05-14 12:19 ` Pablo Neira Ayuso
2020-05-14 12:19 ` [PATCH 6/6] netfilter: nft_set_rbtree: Add missing expired checks Pablo Neira Ayuso
2020-05-14 20:15 ` [PATCH 0/6] Netfilter fixes for net David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2020-05-14 12:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
If the flow timer expires, the gc sets on the NF_FLOW_TEARDOWN flag.
Otherwise, the flowtable software path might race to refresh the
timeout, leaving the state machine in inconsistent state.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_flow_table_core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 4344e572b7f9..42da6e337276 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -284,7 +284,7 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
if (nf_flow_has_expired(flow))
flow_offload_fixup_ct(flow->ct);
- else if (test_bit(NF_FLOW_TEARDOWN, &flow->flags))
+ else
flow_offload_fixup_ct_timeout(flow->ct);
flow_offload_free(flow);
@@ -361,8 +361,10 @@ static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data)
{
struct nf_flowtable *flow_table = data;
- if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct) ||
- test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
+ if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct))
+ set_bit(NF_FLOW_TEARDOWN, &flow->flags);
+
+ if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
if (test_bit(NF_FLOW_HW, &flow->flags)) {
if (!test_bit(NF_FLOW_HW_DYING, &flow->flags))
nf_flow_offload_del(flow_table, flow);
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 6/6] netfilter: nft_set_rbtree: Add missing expired checks
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (4 preceding siblings ...)
2020-05-14 12:19 ` [PATCH 5/6] netfilter: flowtable: set NF_FLOW_TEARDOWN flag on entry expiration Pablo Neira Ayuso
@ 2020-05-14 12:19 ` Pablo Neira Ayuso
2020-05-14 20:15 ` [PATCH 0/6] Netfilter fixes for net David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2020-05-14 12:19 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Phil Sutter <phil@nwl.cc>
Expired intervals would still match and be dumped to user space until
garbage collection wiped them out. Make sure they stop matching and
disappear (from users' perspective) as soon as they expire.
Fixes: 8d8540c4f5e03 ("netfilter: nft_set_rbtree: add timeout support")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_set_rbtree.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 3ffef454d469..62f416bc0579 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -79,6 +79,10 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
parent = rcu_dereference_raw(parent->rb_left);
continue;
}
+
+ if (nft_set_elem_expired(&rbe->ext))
+ return false;
+
if (nft_rbtree_interval_end(rbe)) {
if (nft_set_is_anonymous(set))
return false;
@@ -94,6 +98,7 @@ static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set
if (set->flags & NFT_SET_INTERVAL && interval != NULL &&
nft_set_elem_active(&interval->ext, genmask) &&
+ !nft_set_elem_expired(&interval->ext) &&
nft_rbtree_interval_start(interval)) {
*ext = &interval->ext;
return true;
@@ -154,6 +159,9 @@ static bool __nft_rbtree_get(const struct net *net, const struct nft_set *set,
continue;
}
+ if (nft_set_elem_expired(&rbe->ext))
+ return false;
+
if (!nft_set_ext_exists(&rbe->ext, NFT_SET_EXT_FLAGS) ||
(*nft_set_ext_flags(&rbe->ext) & NFT_SET_ELEM_INTERVAL_END) ==
(flags & NFT_SET_ELEM_INTERVAL_END)) {
@@ -170,6 +178,7 @@ static bool __nft_rbtree_get(const struct net *net, const struct nft_set *set,
if (set->flags & NFT_SET_INTERVAL && interval != NULL &&
nft_set_elem_active(&interval->ext, genmask) &&
+ !nft_set_elem_expired(&interval->ext) &&
((!nft_rbtree_interval_end(interval) &&
!(flags & NFT_SET_ELEM_INTERVAL_END)) ||
(nft_rbtree_interval_end(interval) &&
@@ -418,6 +427,8 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
if (iter->count < iter->skip)
goto cont;
+ if (nft_set_elem_expired(&rbe->ext))
+ goto cont;
if (!nft_set_elem_active(&rbe->ext, iter->genmask))
goto cont;
--
2.20.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/6] Netfilter fixes for net
2020-05-14 12:19 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (5 preceding siblings ...)
2020-05-14 12:19 ` [PATCH 6/6] netfilter: nft_set_rbtree: Add missing expired checks Pablo Neira Ayuso
@ 2020-05-14 20:15 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2020-05-14 20:15 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 14 May 2020 14:19:07 +0200
> The following patchset contains Netfilter fixes for net:
>
> 1) Fix gcc-10 compilation warning in nf_conntrack, from Arnd Bergmann.
>
> 2) Add NF_FLOW_HW_PENDING to avoid races between stats and deletion
> commands, from Paul Blakey.
>
> 3) Remove WQ_MEM_RECLAIM from the offload workqueue, from Roi Dayan.
>
> 4) Infinite loop when removing nf_conntrack module, from Florian Westphal.
>
> 5) Set NF_FLOW_TEARDOWN bit on expiration to avoid races when refreshing
> the timeout from the software path.
>
> 6) Missing nft_set_elem_expired() check in the rbtree, from Phil Sutter.
>
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
Pulled, thank you.
^ permalink raw reply [flat|nested] 8+ messages in thread