netfilter-devel.vger.kernel.org archive mirror
* [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay
@ 2022-09-16  9:29 Antoine Tenart
  2022-09-16  9:29 ` [PATCH nf 1/2] " Antoine Tenart
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Antoine Tenart @ 2022-09-16  9:29 UTC (permalink / raw)
  To: pablo, kadlec, fw; +Cc: Antoine Tenart, netfilter-devel, netdev

Hello,

The first patch fixes how the conntrack gc rescheduling delay is
computed, to avoid a bias depending on the order of entries in the set.
This change has a side effect though: the logic is no longer triggered
until very large sets are used. Patch 2 fixes this by changing the
initial conntrack gc rescheduling bias.

Thanks,
Antoine

Antoine Tenart (2):
  netfilter: conntrack: fix the gc rescheduling delay
  netfilter: conntrack: revisit the gc initial rescheduling bias

 net/netfilter/nf_conntrack_core.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

-- 
2.37.3



* [PATCH nf 1/2] netfilter: conntrack: fix the gc rescheduling delay
  2022-09-16  9:29 [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay Antoine Tenart
@ 2022-09-16  9:29 ` Antoine Tenart
  2022-09-16  9:29 ` [PATCH nf 2/2] netfilter: conntrack: revisit the gc initial rescheduling bias Antoine Tenart
  2022-09-19 16:19 ` [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay Florian Westphal
  2 siblings, 0 replies; 4+ messages in thread
From: Antoine Tenart @ 2022-09-16  9:29 UTC (permalink / raw)
  To: pablo, kadlec, fw; +Cc: Antoine Tenart, netfilter-devel, netdev

Commit 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
changed the eviction rescheduling to use the average expiry of scanned
entries (clamped within 1-60s) by doing:

  for (...) {
      expires = clamp(nf_ct_expires(tmp), ...);
      next_run += expires;
      next_run /= 2;
  }

The issue is that the above makes the average ('next_run' here) depend
more on the last expiration values than on the first ones (for sets
larger than 2). Depending on the expiration values used to compute the
average, the result can be quite different from what's expected. To fix
this we can do the following:

  for (...) {
      expires = clamp(nf_ct_expires(tmp), ...);
      next_run += (expires - next_run) / ++count;
  }

Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/netfilter/nf_conntrack_core.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 1357a2729a4b..2e6d5f1e6d63 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -67,6 +67,7 @@ struct conntrack_gc_work {
 	struct delayed_work	dwork;
 	u32			next_bucket;
 	u32			avg_timeout;
+	u32			count;
 	u32			start_time;
 	bool			exiting;
 	bool			early_drop;
@@ -1466,6 +1467,7 @@ static void gc_worker(struct work_struct *work)
 	unsigned int expired_count = 0;
 	unsigned long next_run;
 	s32 delta_time;
+	long count;
 
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
@@ -1475,10 +1477,12 @@ static void gc_worker(struct work_struct *work)
 
 	if (i == 0) {
 		gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT;
+		gc_work->count = 1;
 		gc_work->start_time = start_time;
 	}
 
 	next_run = gc_work->avg_timeout;
+	count = gc_work->count;
 
 	end_time = start_time + GC_SCAN_MAX_DURATION;
 
@@ -1498,8 +1502,8 @@ static void gc_worker(struct work_struct *work)
 
 		hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
 			struct nf_conntrack_net *cnet;
-			unsigned long expires;
 			struct net *net;
+			long expires;
 
 			tmp = nf_ct_tuplehash_to_ctrack(h);
 
@@ -1513,6 +1517,7 @@ static void gc_worker(struct work_struct *work)
 
 				gc_work->next_bucket = i;
 				gc_work->avg_timeout = next_run;
+				gc_work->count = count;
 
 				delta_time = nfct_time_stamp - gc_work->start_time;
 
@@ -1528,8 +1533,8 @@ static void gc_worker(struct work_struct *work)
 			}
 
 			expires = clamp(nf_ct_expires(tmp), GC_SCAN_INTERVAL_MIN, GC_SCAN_INTERVAL_CLAMP);
+			expires = (expires - (long)next_run) / ++count;
 			next_run += expires;
-			next_run /= 2u;
 
 			if (nf_conntrack_max95 == 0 || gc_worker_skip_ct(tmp))
 				continue;
@@ -1570,6 +1575,7 @@ static void gc_worker(struct work_struct *work)
 		delta_time = nfct_time_stamp - end_time;
 		if (delta_time > 0 && i < hashsz) {
 			gc_work->avg_timeout = next_run;
+			gc_work->count = count;
 			gc_work->next_bucket = i;
 			next_run = 0;
 			goto early_exit;
-- 
2.37.3



* [PATCH nf 2/2] netfilter: conntrack: revisit the gc initial rescheduling bias
  2022-09-16  9:29 [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay Antoine Tenart
  2022-09-16  9:29 ` [PATCH nf 1/2] " Antoine Tenart
@ 2022-09-16  9:29 ` Antoine Tenart
  2022-09-19 16:19 ` [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay Florian Westphal
  2 siblings, 0 replies; 4+ messages in thread
From: Antoine Tenart @ 2022-09-16  9:29 UTC (permalink / raw)
  To: pablo, kadlec, fw; +Cc: Antoine Tenart, netfilter-devel, netdev

The previous commit changed the way the rescheduling delay is computed,
which has a side effect: the bias now carries the same weight as any
other entry in the rescheduling delay, so the logic only kicks in with
very large sets, as the initial interval is very large (INT_MAX).

Revisit the GC initial bias to allow more frequent GC for smaller sets
while still avoiding wakeups when a machine is mostly idle. We're moving
from a large initial value to pretending we have 100 entries expiring at
the upper bound. This way a few entries with a small timeout won't
impact the rescheduling delay much, while non-idle machines will have
enough entries to lower the delay when needed. This also improves
readability, as the initial bias is now linked to what is computed
instead of being an arbitrary large value.

Fixes: 2cfadb761d3d ("netfilter: conntrack: revisit gc autotuning")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
---
 net/netfilter/nf_conntrack_core.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 2e6d5f1e6d63..8f261cd5b3a5 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -86,10 +86,12 @@ static DEFINE_MUTEX(nf_conntrack_mutex);
 /* clamp timeouts to this value (TCP unacked) */
 #define GC_SCAN_INTERVAL_CLAMP	(300ul * HZ)
 
-/* large initial bias so that we don't scan often just because we have
- * three entries with a 1s timeout.
+/* Initial bias pretending we have 100 entries at the upper bound so we don't
+ * wakeup often just because we have three entries with a 1s timeout while still
+ * allowing non-idle machines to wakeup more often when needed.
  */
-#define GC_SCAN_INTERVAL_INIT	INT_MAX
+#define GC_SCAN_INITIAL_COUNT	100
+#define GC_SCAN_INTERVAL_INIT	GC_SCAN_INTERVAL_MAX
 
 #define GC_SCAN_MAX_DURATION	msecs_to_jiffies(10)
 #define GC_SCAN_EXPIRED_MAX	(64000u / HZ)
@@ -1477,7 +1479,7 @@ static void gc_worker(struct work_struct *work)
 
 	if (i == 0) {
 		gc_work->avg_timeout = GC_SCAN_INTERVAL_INIT;
-		gc_work->count = 1;
+		gc_work->count = GC_SCAN_INITIAL_COUNT;
 		gc_work->start_time = start_time;
 	}
 
-- 
2.37.3



* Re: [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay
  2022-09-16  9:29 [PATCH nf 0/2] netfilter: conntrack: fix the gc rescheduling delay Antoine Tenart
  2022-09-16  9:29 ` [PATCH nf 1/2] " Antoine Tenart
  2022-09-16  9:29 ` [PATCH nf 2/2] netfilter: conntrack: revisit the gc initial rescheduling bias Antoine Tenart
@ 2022-09-19 16:19 ` Florian Westphal
  2 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2022-09-19 16:19 UTC (permalink / raw)
  To: Antoine Tenart; +Cc: pablo, kadlec, fw, netfilter-devel, netdev

Antoine Tenart <atenart@kernel.org> wrote:
> Antoine Tenart (2):
>   netfilter: conntrack: fix the gc rescheduling delay
>   netfilter: conntrack: revisit the gc initial rescheduling bias

Series:
Reviewed-by: Florian Westphal <fw@strlen.de>

