All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux
@ 2017-01-16 10:56 Florian Westphal
  2017-01-16 16:42 ` Nicolas Dichtel
  0 siblings, 1 reply; 3+ messages in thread
From: Florian Westphal @ 2017-01-16 10:56 UTC (permalink / raw)
  To: netfilter-devel; +Cc: nuclearcat, Florian Westphal, Nicolas Dichtel

This further refines the changes made to conntrack gc_worker in
commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").

The main of this change is to reduce the scan interval when evictions
take place.

However, on the reporters' setup, there are 1-2 Million conntrack entries
in total, and roughly 8k new connections per second.

In this case we'll always evict at least one entry per gc cycle and scan
interval is always at 1 jiffy.

Given we scan ~10k entries per run its clearly wrong to reduce interval
based on number of evictions, it will only waste cpu cycles.

Thus only look at the ratio (scanned entries vs. evicted entries) to make
a decision on whether to reduce or not.

Given the fact that evictor is supposed to only kick in when system turns
idle after a busy period, we should pick a high ratio -- this makes it 50%.

Also, get rid of GC_MAX_EVICTS.  Instead of breaking loop and instant
resched, just don't bother checking this at all (the loop calls
cond_resched for every bucket anyway).

Removal of GC_MAX_EVICTS was suggested by Nicolas in the past; I
should have listened.

I tested this with a vm under synflood (with
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).

While flood is ongoing, interval now stays at 2 seconds even though
evictions are ongoing.

After stopping flood, table got cleaned up in less than 10 seconds.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3a073cd9fcf4..6c4a1311f401 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -88,8 +88,8 @@ static __read_mostly bool nf_conntrack_locks_all;
 #define GC_MAX_BUCKETS_DIV	64u
 /* upper bound of scan intervals */
 #define GC_INTERVAL_MAX		(2 * HZ)
-/* maximum conntracks to evict per gc run */
-#define GC_MAX_EVICTS		256u
+/* desired ratio of entries found to be expired */
+#define GC_EVICT_RATIO	50u
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -979,8 +979,7 @@ static void gc_worker(struct work_struct *work)
 		 */
 		rcu_read_unlock();
 		cond_resched_rcu_qs();
-	} while (++buckets < goal &&
-		 expired_count < GC_MAX_EVICTS);
+	} while (++buckets < goal);
 
 	if (gc_work->exiting)
 		return;
@@ -997,26 +996,19 @@ static void gc_worker(struct work_struct *work)
 	 * 1. Minimize time until we notice a stale entry
 	 * 2. Maximize scan intervals to not waste cycles
 	 *
-	 * Normally, expired_count will be 0, this increases the next_run time
-	 * to priorize 2) above.
+	 * Normally, expire ratio will be close to 0.
 	 *
-	 * As soon as a timed-out entry is found, move towards 1) and increase
-	 * the scan frequency.
-	 * In case we have lots of evictions next scan is done immediately.
+	 * As soon as a sizeable fraction of the entries have expired
+	 * increase scan frequency.
 	 */
+	if (gc_work->next_gc_run < GC_INTERVAL_MAX)
+		gc_work->next_gc_run += msecs_to_jiffies(10);
+
 	ratio = scanned ? expired_count * 100 / scanned : 0;
-	if (ratio >= 90 || expired_count == GC_MAX_EVICTS) {
-		gc_work->next_gc_run = 0;
-		next_run = 0;
-	} else if (expired_count) {
+	if (ratio > GC_EVICT_RATIO)
 		gc_work->next_gc_run /= 2U;
-		next_run = msecs_to_jiffies(1);
-	} else {
-		if (gc_work->next_gc_run < GC_INTERVAL_MAX)
-			gc_work->next_gc_run += msecs_to_jiffies(1);
 
-		next_run = gc_work->next_gc_run;
-	}
+	next_run = gc_work->next_gc_run;
 
 	gc_work->last_bucket = i;
 	queue_delayed_work(system_long_wq, &gc_work->dwork, next_run);
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux
  2017-01-16 10:56 [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux Florian Westphal
@ 2017-01-16 16:42 ` Nicolas Dichtel
  2017-01-16 17:33   ` Florian Westphal
  0 siblings, 1 reply; 3+ messages in thread
From: Nicolas Dichtel @ 2017-01-16 16:42 UTC (permalink / raw)
  To: Florian Westphal, netfilter-devel; +Cc: nuclearcat

Le 16/01/2017 à 11:56, Florian Westphal a écrit :
> This further refines the changes made to conntrack gc_worker in
> commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
> 
> The main of this change is to reduce the scan interval when evictions
> take place.
> 
> However, on the reporters' setup, there are 1-2 Million conntrack entries
> in total, and roughly 8k new connections per second.
> 
> In this case we'll always evict at least one entry per gc cycle and scan
> interval is always at 1 jiffy.
> 
> Given we scan ~10k entries per run its clearly wrong to reduce interval
> based on number of evictions, it will only waste cpu cycles.
> 
> Thus only look at the ratio (scanned entries vs. evicted entries) to make
> a decision on whether to reduce or not.
> 
> Given the fact that evictor is supposed to only kick in when system turns
> idle after a busy period, we should pick a high ratio -- this makes it 50%.
> 
> Also, get rid of GC_MAX_EVICTS.  Instead of breaking loop and instant
> resched, just don't bother checking this at all (the loop calls
> cond_resched for every bucket anyway).
> 
> Removal of GC_MAX_EVICTS was suggested by Nicolas in the past; I
> should have listened.
I (still) agree with that part, maybe it could be done in another patch.

> 
> I tested this with a vm under synflood (with
> sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
> 
> While flood is ongoing, interval now stays at 2 seconds even though
> evictions are ongoing.
> 
> After stopping flood, table got cleaned up in less than 10 seconds.
> 
> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Thank you for CCing me ;-)

> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
> Signed-off-by: Florian Westphal <fw@strlen.de>
With that patch, it's worse than before. I regularly have a delay > 30 seconds
in my tests (this was infrequent before the patch).

A solution, to fix the balance between cpu cycles consumption and short ct
events delays may be to introduce a new sysctl which allows the user to choose
between two predefined configs: short delay or long delay for ct events.
Even if I don't like this, maybe it could be acceptable with a default value set
to 'short delay'.


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux
  2017-01-16 16:42 ` Nicolas Dichtel
@ 2017-01-16 17:33   ` Florian Westphal
  0 siblings, 0 replies; 3+ messages in thread
From: Florian Westphal @ 2017-01-16 17:33 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: Florian Westphal, netfilter-devel, nuclearcat

Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> Le 16/01/2017 à 11:56, Florian Westphal a écrit :
> > This further refines the changes made to conntrack gc_worker in
> > commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
> > 
> > The main of this change is to reduce the scan interval when evictions
> > take place.
> > 
> > However, on the reporters' setup, there are 1-2 Million conntrack entries
> > in total, and roughly 8k new connections per second.
> > 
> > In this case we'll always evict at least one entry per gc cycle and scan
> > interval is always at 1 jiffy.
> > 
> > Given we scan ~10k entries per run its clearly wrong to reduce interval
> > based on number of evictions, it will only waste cpu cycles.
> > 
> > Thus only look at the ratio (scanned entries vs. evicted entries) to make
> > a decision on whether to reduce or not.
> > 
> > Given the fact that evictor is supposed to only kick in when system turns
> > idle after a busy period, we should pick a high ratio -- this makes it 50%.
> > 
> > Also, get rid of GC_MAX_EVICTS.  Instead of breaking loop and instant
> > resched, just don't bother checking this at all (the loop calls
> > cond_resched for every bucket anyway).
> > 
> > Removal of GC_MAX_EVICTS was suggested by Nicolas in the past; I
> > should have listened.
> I (still) agree with that part, maybe it could be done in another patch.

Done.

> > Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> With that patch, it's worse than before. I regularly have a delay > 30 seconds
> in my tests (this was infrequent before the patch).

:-/

We could lower the ratio of course, but under synflood with 3 second syn_recv timeout
I get a scan/evict ratio of ~15%, and we should not schedule worker all the time in that case.

> A solution, to fix the balance between cpu cycles consumption and short ct
> events delays may be to introduce a new sysctl which allows the user to choose
> between two predefined configs: short delay or long delay for ct events.
> Even if I don't like this, maybe it could be acceptable with a default value set
> to 'short delay'.

TBH in that case I'd prefer to just give up, and change worker to this:

        for (i = 0; i < nf_conntrack_htable_size; i++) {
         [..]
                hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
                        tmp = nf_ct_tuplehash_to_ctrack(h);

                        if (nf_ct_is_expired(tmp)) {
                                nf_ct_gc_expired(tmp);
                                continue;
                        }
                }
	[..]
             cond_resched_rcu_qs();
        }

        if (!gc_work->exiting)
                queue_delayed_work(system_long_wq, &gc_work->dwork, nf_conntrack_gc_interval);
}

Where nf_conntrack_gc_interval gets exported to userspace (in milliseconds),
with a default set e.g. to one minute.

Obvious downside: with low setting like 1 second we'll do frequent
rescan of entire table.


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-01-16 17:34 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 10:56 [PATCH nf] netfilter: conntrack: refine gc worker heuristics, redux Florian Westphal
2017-01-16 16:42 ` Nicolas Dichtel
2017-01-16 17:33   ` Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.