netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* netfilter: get rid of per-object conntrack timers
@ 2016-08-25 13:33 Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 1/7] netfilter: restart search if moved to other chain Florian Westphal
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel

During NFWS 2016 it was mentioned that per-conntrack timers have
two drawbacks:
 - the 5-day default established timeout is very large and brings
   extra constraints for the timer subsystem.
 - most distros enable timer stats so timer struct eats 80 bytes
   in each conntrack object.

This series replaces the per-object struct timer with a u32 jiffie
stamp and one global delayed work queue for conntrack eviction.
Size of nf_conn struct is reduced to 256 bytes on x86_64.

Eviction is performed from the packet path when doing
table lookup, for cases where we have idle periods the work
queue is used.

Tested with following script, conntrackd running in 'relible
event mode'.  and httpterm running on other host:

-----------------------------------------------------------
random_resize() {
        while true; do
                RND=$RANDOM%256000
                RND=$((RND+8192))
                sysctl net.netfilter.nf_conntrack_buckets=$RND
                sleep $((RANDOM % 120))
        done
}

random_flush() {
        while true; do
                sleep $((RANDOM % 120))
                conntrack -F
        done
}

random_startstop() {
        while true; do
                sleep $((RANDOM % 120))
                pkill -STOP conntrackd
                sleep $((RANDOM % 10))
                pkill -CONT conntrackd
        done
}

http-client -u 1000 -t 3 -F 192.168.10.50 -G 192.168.10.17:8001 &
http-client -u 1000 -t 3 -F 192.168.10.51 -G 192.168.10.17:8001 &
http-client -u 1000 -t 3 -F 192.168.10.52 -G 192.168.10.17:8001 &

random_resize &
random_flush &
random_startstop &

wait
-----------------------------------------------------------
 include/net/netfilter/nf_conntrack.h        |   36 +++--
 include/net/netfilter/nf_conntrack_ecache.h |   17 +-
 net/netfilter/nf_conntrack_core.c           |  192 +++++++++++++++++++++-------
 net/netfilter/nf_conntrack_ecache.c         |   22 ++-
 net/netfilter/nf_conntrack_netlink.c        |   39 ++++-
 net/netfilter/nf_conntrack_pptp.c           |    3 
 net/netfilter/nf_nat_core.c                 |    6 
 7 files changed, 221 insertions(+), 94 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 1/7] netfilter: restart search if moved to other chain
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 2/7] netfilter: don't rely on DYING bit to detect when destroy event was sent Florian Westphal
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

In case nf_conntrack_tuple_taken did not find a conflicting entry
check that all entries in this hash slot were tested and restart
in case an entry was moved to another chain.

Reported-by: Eric Dumazet <edumazet@google.com>
Fixes: ea781f197d6a ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()")
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/netfilter/nf_conntrack_core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7d90a5d..887926a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -809,6 +809,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	zone = nf_ct_zone(ignored_conntrack);
 
 	rcu_read_lock();
+ begin:
 	nf_conntrack_get_ht(&ct_hash, &hsize);
 	hash = __hash_conntrack(net, tuple, hsize);
 
@@ -822,6 +823,12 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 		}
 		NF_CT_STAT_INC_ATOMIC(net, searched);
 	}
+
+	if (get_nulls_value(n) != hash) {
+		NF_CT_STAT_INC_ATOMIC(net, search_restart);
+		goto begin;
+	}
+
 	rcu_read_unlock();
 
 	return 0;
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 2/7] netfilter: don't rely on DYING bit to detect when destroy event was sent
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 1/7] netfilter: restart search if moved to other chain Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 3/7] netfilter: conntrack: get rid of conntrack timer Florian Westphal
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

The reliable event delivery mode currently (ab)uses the DYING bit to
detect which entries on the dying list have to be skipped when
re-delivering events from the eache worker in reliable event mode.

Currently when we delete the conntrack from main table we only set this
bit if we could also deliver the netlink destroy event to userspace.

If we fail we move it to the dying list, the ecache worker will
reattempt event delivery for all confirmed conntracks on the dying list
that do not have the DYING bit set.

Once timer is gone, we can no longer use if (del_timer()) to detect
when we 'stole' the reference count owned by the timer/hash entry, so
we need some other way to avoid racing with other cpu.

Pablo suggested to add a marker in the ecache extension that skips
entries that have been unhashed from main table but are still waiting
for the last reference count to be dropped (e.g. because one skb waiting
on nfqueue verdict still holds a reference).

We do this by adding a tristate.
If we fail to deliver the destroy event, make a note of this in the
eache extension.  The worker can then skip all entries that are in
a different state.  Either they never delivered a destroy event,
e.g. because the netlink backend was not loaded, or redelivery took
place already.

Once the conntrack timer is removed we will now be able to replace
del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid
racing with other cpu that tries to evict the same conntrack.

Because DYING will then be set right before we report the destroy event
we can no longer skip event reporting when dying bit is set.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/netfilter/nf_conntrack_ecache.h | 17 ++++++++++++-----
 net/netfilter/nf_conntrack_ecache.c         | 22 ++++++++++++++--------
 2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_ecache.h b/include/net/netfilter/nf_conntrack_ecache.h
index fa36447..12d967b 100644
--- a/include/net/netfilter/nf_conntrack_ecache.h
+++ b/include/net/netfilter/nf_conntrack_ecache.h
@@ -12,12 +12,19 @@
 #include <linux/netfilter/nf_conntrack_tuple_common.h>
 #include <net/netfilter/nf_conntrack_extend.h>
 
+enum nf_ct_ecache_state {
+	NFCT_ECACHE_UNKNOWN,		/* destroy event not sent */
+	NFCT_ECACHE_DESTROY_FAIL,	/* tried but failed to send destroy event */
+	NFCT_ECACHE_DESTROY_SENT,	/* sent destroy event after failure */
+};
+
 struct nf_conntrack_ecache {
-	unsigned long cache;	/* bitops want long */
-	unsigned long missed;	/* missed events */
-	u16 ctmask;		/* bitmask of ct events to be delivered */
-	u16 expmask;		/* bitmask of expect events to be delivered */
-	u32 portid;		/* netlink portid of destroyer */
+	unsigned long cache;		/* bitops want long */
+	unsigned long missed;		/* missed events */
+	u16 ctmask;			/* bitmask of ct events to be delivered */
+	u16 expmask;			/* bitmask of expect events to be delivered */
+	u32 portid;			/* netlink portid of destroyer */
+	enum nf_ct_ecache_state state;	/* ecache state */
 };
 
 static inline struct nf_conntrack_ecache *
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index d28011b..da9df2d 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -49,8 +49,13 @@ static enum retry_state ecache_work_evict_list(struct ct_pcpu *pcpu)
 
 	hlist_nulls_for_each_entry(h, n, &pcpu->dying, hnnode) {
 		struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+		struct nf_conntrack_ecache *e;
 
-		if (nf_ct_is_dying(ct))
+		if (!nf_ct_is_confirmed(ct))
+			continue;
+
+		e = nf_ct_ecache_find(ct);
+		if (!e || e->state != NFCT_ECACHE_DESTROY_FAIL)
 			continue;
 
 		if (nf_conntrack_event(IPCT_DESTROY, ct)) {
@@ -58,8 +63,7 @@ static enum retry_state ecache_work_evict_list(struct ct_pcpu *pcpu)
 			break;
 		}
 
-		/* we've got the event delivered, now it's dying */
-		set_bit(IPS_DYING_BIT, &ct->status);
+		e->state = NFCT_ECACHE_DESTROY_SENT;
 		refs[evicted] = ct;
 
 		if (++evicted >= ARRAY_SIZE(refs)) {
@@ -130,7 +134,7 @@ int nf_conntrack_eventmask_report(unsigned int eventmask, struct nf_conn *ct,
 	if (!e)
 		goto out_unlock;
 
-	if (nf_ct_is_confirmed(ct) && !nf_ct_is_dying(ct)) {
+	if (nf_ct_is_confirmed(ct)) {
 		struct nf_ct_event item = {
 			.ct	= ct,
 			.portid	= e->portid ? e->portid : portid,
@@ -150,11 +154,13 @@ int nf_conntrack_eventmask_report(unsigned int eventmask, struct nf_conn *ct,
 				 * triggered by a process, we store the PORTID
 				 * to include it in the retransmission.
 				 */
-				if (eventmask & (1 << IPCT_DESTROY) &&
-				    e->portid == 0 && portid != 0)
-					e->portid = portid;
-				else
+				if (eventmask & (1 << IPCT_DESTROY)) {
+					if (e->portid == 0 && portid != 0)
+						e->portid = portid;
+					e->state = NFCT_ECACHE_DESTROY_FAIL;
+				} else {
 					e->missed |= eventmask;
+				}
 			} else {
 				e->missed &= ~missed;
 			}
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 3/7] netfilter: conntrack: get rid of conntrack timer
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 1/7] netfilter: restart search if moved to other chain Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 2/7] netfilter: don't rely on DYING bit to detect when destroy event was sent Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 4/7] netfilter: evict stale entries on netlink dumps Florian Westphal
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as
Eric Dumazet pointed out during netfilter workshop 2016.

Eric also says: "Another reason was the fact that Thomas was about to
change max timer range [..]" (500462a9de657f8, 'timers: Switch to
a non-cascading wheel').

Remove the timer and use a 32bit jiffies value containing timestamp until
entry is valid.

During conntrack lookup, even before doing tuple comparision, check
the timeout value and evict the entry in case it is too old.

The dying bit is used as a synchronization point to avoid races where
multiple cpus try to evict the same entry.

Because lookup is always lockless, we need to bump the refcnt once
when we evict, else we could try to evict already-dead entry that
is being recycled.

This is the standard/expected way when conntrack entries are destroyed.

Followup patches will introduce garbage colliction via work queue
and further places where we can reap obsoleted entries (e.g. during
netlink dumps), this is needed to avoid expired conntracks from hanging
around for too long when lookup rate is low after a busy period.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 include/net/netfilter/nf_conntrack.h | 23 +++++++--
 net/netfilter/nf_conntrack_core.c    | 91 ++++++++++++++++++++----------------
 net/netfilter/nf_conntrack_netlink.c | 14 ++----
 net/netfilter/nf_conntrack_pptp.c    |  3 +-
 net/netfilter/nf_nat_core.c          |  6 ---
 5 files changed, 74 insertions(+), 63 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 2a12748..7277751 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -42,7 +42,6 @@ union nf_conntrack_expect_proto {
 
 #include <linux/types.h>
 #include <linux/skbuff.h>
-#include <linux/timer.h>
 
 #ifdef CONFIG_NETFILTER_DEBUG
 #define NF_CT_ASSERT(x)		WARN_ON(!(x))
@@ -73,7 +72,7 @@ struct nf_conn_help {
 #include <net/netfilter/ipv6/nf_conntrack_ipv6.h>
 
 struct nf_conn {
-	/* Usage count in here is 1 for hash table/destruct timer, 1 per skb,
+	/* Usage count in here is 1 for hash table, 1 per skb,
 	 * plus 1 for any connection(s) we are `master' for
 	 *
 	 * Hint, SKB address this struct and refcnt via skb->nfct and
@@ -96,8 +95,8 @@ struct nf_conn {
 	/* Have we seen traffic both ways yet? (bitset) */
 	unsigned long status;
 
-	/* Timer function; drops refcnt when it goes off. */
-	struct timer_list timeout;
+	/* jiffies32 when this ct is considered dead */
+	u32 timeout;
 
 	possible_net_t ct_net;
 
@@ -291,14 +290,28 @@ static inline bool nf_is_loopback_packet(const struct sk_buff *skb)
 	return skb->dev && skb->skb_iif && skb->dev->flags & IFF_LOOPBACK;
 }
 
+#define nfct_time_stamp ((u32)(jiffies))
+
 /* jiffies until ct expires, 0 if already expired */
 static inline unsigned long nf_ct_expires(const struct nf_conn *ct)
 {
-	long timeout = (long)ct->timeout.expires - (long)jiffies;
+	s32 timeout = ct->timeout - nfct_time_stamp;
 
 	return timeout > 0 ? timeout : 0;
 }
 
+static inline bool nf_ct_is_expired(const struct nf_conn *ct)
+{
+	return (__s32)(ct->timeout - nfct_time_stamp) <= 0;
+}
+
+/* use after obtaining a reference count */
+static inline bool nf_ct_should_gc(const struct nf_conn *ct)
+{
+	return nf_ct_is_expired(ct) && nf_ct_is_confirmed(ct) &&
+	       !nf_ct_is_dying(ct);
+}
+
 struct kernel_param;
 
 int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 887926a..87ee6da 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -371,7 +371,6 @@ destroy_conntrack(struct nf_conntrack *nfct)
 
 	pr_debug("destroy_conntrack(%p)\n", ct);
 	NF_CT_ASSERT(atomic_read(&nfct->use) == 0);
-	NF_CT_ASSERT(!timer_pending(&ct->timeout));
 
 	if (unlikely(nf_ct_is_template(ct))) {
 		nf_ct_tmpl_free(ct);
@@ -434,35 +433,30 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
 {
 	struct nf_conn_tstamp *tstamp;
 
+	if (test_and_set_bit(IPS_DYING_BIT, &ct->status))
+		return false;
+
 	tstamp = nf_conn_tstamp_find(ct);
 	if (tstamp && tstamp->stop == 0)
 		tstamp->stop = ktime_get_real_ns();
 
-	if (nf_ct_is_dying(ct))
-		goto delete;
-
 	if (nf_conntrack_event_report(IPCT_DESTROY, ct,
 				    portid, report) < 0) {
-		/* destroy event was not delivered */
+		/* destroy event was not delivered. nf_ct_put will
+		 * be done by event cache worker on redelivery.
+		 */
 		nf_ct_delete_from_lists(ct);
 		nf_conntrack_ecache_delayed_work(nf_ct_net(ct));
 		return false;
 	}
 
 	nf_conntrack_ecache_work(nf_ct_net(ct));
-	set_bit(IPS_DYING_BIT, &ct->status);
- delete:
 	nf_ct_delete_from_lists(ct);
 	nf_ct_put(ct);
 	return true;
 }
 EXPORT_SYMBOL_GPL(nf_ct_delete);
 
-static void death_by_timeout(unsigned long ul_conntrack)
-{
-	nf_ct_delete((struct nf_conn *)ul_conntrack, 0, 0);
-}
-
 static inline bool
 nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
 		const struct nf_conntrack_tuple *tuple,
@@ -480,6 +474,18 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
 	       net_eq(net, nf_ct_net(ct));
 }
 
+/* caller must hold rcu readlock and none of the nf_conntrack_locks */
+static void nf_ct_gc_expired(struct nf_conn *ct)
+{
+	if (!atomic_inc_not_zero(&ct->ct_general.use))
+		return;
+
+	if (nf_ct_should_gc(ct))
+		nf_ct_kill(ct);
+
+	nf_ct_put(ct);
+}
+
 /*
  * Warning :
  * - Caller must take a reference on returned object
@@ -499,6 +505,17 @@ begin:
 	bucket = reciprocal_scale(hash, hsize);
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
+		struct nf_conn *ct;
+
+		ct = nf_ct_tuplehash_to_ctrack(h);
+		if (nf_ct_is_expired(ct)) {
+			nf_ct_gc_expired(ct);
+			continue;
+		}
+
+		if (nf_ct_is_dying(ct))
+			continue;
+
 		if (nf_ct_key_equal(h, tuple, zone, net)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			return h;
@@ -597,7 +614,6 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 				    zone, net))
 			goto out;
 
-	add_timer(&ct->timeout);
 	smp_wmb();
 	/* The caller holds a reference to this object */
 	atomic_set(&ct->ct_general.use, 2);
@@ -750,8 +766,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	/* Timer relative to confirmation time, not original
 	   setting time, otherwise we'd get timer wrap in
 	   weird delay cases. */
-	ct->timeout.expires += jiffies;
-	add_timer(&ct->timeout);
+	ct->timeout += nfct_time_stamp;
 	atomic_inc(&ct->ct_general.use);
 	ct->status |= IPS_CONFIRMED;
 
@@ -815,8 +830,16 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
-		if (ct != ignored_conntrack &&
-		    nf_ct_key_equal(h, tuple, zone, net)) {
+
+		if (ct == ignored_conntrack)
+			continue;
+
+		if (nf_ct_is_expired(ct)) {
+			nf_ct_gc_expired(ct);
+			continue;
+		}
+
+		if (nf_ct_key_equal(h, tuple, zone, net)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			rcu_read_unlock();
 			return 1;
@@ -850,6 +873,11 @@ static unsigned int early_drop_list(struct net *net,
 	hlist_nulls_for_each_entry_rcu(h, n, head, hnnode) {
 		tmp = nf_ct_tuplehash_to_ctrack(h);
 
+		if (nf_ct_is_expired(tmp)) {
+			nf_ct_gc_expired(tmp);
+			continue;
+		}
+
 		if (test_bit(IPS_ASSURED_BIT, &tmp->status) ||
 		    !net_eq(nf_ct_net(tmp), net) ||
 		    nf_ct_is_dying(tmp))
@@ -867,7 +895,6 @@ static unsigned int early_drop_list(struct net *net,
 		 */
 		if (net_eq(nf_ct_net(tmp), net) &&
 		    nf_ct_is_confirmed(tmp) &&
-		    del_timer(&tmp->timeout) &&
 		    nf_ct_delete(tmp, 0, 0))
 			drops++;
 
@@ -937,8 +964,6 @@ __nf_conntrack_alloc(struct net *net,
 	/* save hash for reusing when confirming */
 	*(unsigned long *)(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev) = hash;
 	ct->status = 0;
-	/* Don't set timer yet: wait for confirmation */
-	setup_timer(&ct->timeout, death_by_timeout, (unsigned long)ct);
 	write_pnet(&ct->ct_net, net);
 	memset(&ct->__nfct_init_offset[0], 0,
 	       offsetof(struct nf_conn, proto) -
@@ -1312,7 +1337,6 @@ void __nf_ct_refresh_acct(struct nf_conn *ct,
 			  unsigned long extra_jiffies,
 			  int do_acct)
 {
-	NF_CT_ASSERT(ct->timeout.data == (unsigned long)ct);
 	NF_CT_ASSERT(skb);
 
 	/* Only update if this is not a fixed timeout */
@@ -1320,18 +1344,10 @@ void __nf_ct_refresh_acct(struct nf_conn *ct,
 		goto acct;
 
 	/* If not in hash table, timer will not be active yet */
-	if (!nf_ct_is_confirmed(ct)) {
-		ct->timeout.expires = extra_jiffies;
-	} else {
-		unsigned long newtime = jiffies + extra_jiffies;
-
-		/* Only update the timeout if the new timeout is at least
-		   HZ jiffies from the old timeout. Need del_timer for race
-		   avoidance (may already be dying). */
-		if (newtime - ct->timeout.expires >= HZ)
-			mod_timer_pending(&ct->timeout, newtime);
-	}
+	if (nf_ct_is_confirmed(ct))
+		extra_jiffies += nfct_time_stamp;
 
+	ct->timeout = extra_jiffies;
 acct:
 	if (do_acct)
 		nf_ct_acct_update(ct, ctinfo, skb->len);
@@ -1346,11 +1362,7 @@ bool __nf_ct_kill_acct(struct nf_conn *ct,
 	if (do_acct)
 		nf_ct_acct_update(ct, ctinfo, skb->len);
 
-	if (del_timer(&ct->timeout)) {
-		ct->timeout.function((unsigned long)ct);
-		return true;
-	}
-	return false;
+	return nf_ct_delete(ct, 0, 0);
 }
 EXPORT_SYMBOL_GPL(__nf_ct_kill_acct);
 
@@ -1485,11 +1497,8 @@ void nf_ct_iterate_cleanup(struct net *net,
 
 	while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
 		/* Time to push up daises... */
-		if (del_timer(&ct->timeout))
-			nf_ct_delete(ct, portid, report);
-
-		/* ... else the timer will get him soon. */
 
+		nf_ct_delete(ct, portid, report);
 		nf_ct_put(ct);
 		cond_resched();
 	}
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 68800c1..81fd34c 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1144,9 +1144,7 @@ static int ctnetlink_del_conntrack(struct net *net, struct sock *ctnl,
 		}
 	}
 
-	if (del_timer(&ct->timeout))
-		nf_ct_delete(ct, NETLINK_CB(skb).portid, nlmsg_report(nlh));
-
+	nf_ct_delete(ct, NETLINK_CB(skb).portid, nlmsg_report(nlh));
 	nf_ct_put(ct);
 
 	return 0;
@@ -1514,11 +1512,10 @@ static int ctnetlink_change_timeout(struct nf_conn *ct,
 {
 	u_int32_t timeout = ntohl(nla_get_be32(cda[CTA_TIMEOUT]));
 
-	if (!del_timer(&ct->timeout))
-		return -ETIME;
+	ct->timeout = nfct_time_stamp + timeout * HZ;
 
-	ct->timeout.expires = jiffies + timeout * HZ;
-	add_timer(&ct->timeout);
+	if (test_bit(IPS_DYING_BIT, &ct->status))
+		return -ETIME;
 
 	return 0;
 }
@@ -1716,9 +1713,8 @@ ctnetlink_create_conntrack(struct net *net,
 
 	if (!cda[CTA_TIMEOUT])
 		goto err1;
-	ct->timeout.expires = ntohl(nla_get_be32(cda[CTA_TIMEOUT]));
 
-	ct->timeout.expires = jiffies + ct->timeout.expires * HZ;
+	ct->timeout = nfct_time_stamp + ntohl(nla_get_be32(cda[CTA_TIMEOUT])) * HZ;
 
 	rcu_read_lock();
  	if (cda[CTA_HELP]) {
diff --git a/net/netfilter/nf_conntrack_pptp.c b/net/netfilter/nf_conntrack_pptp.c
index 5588c7a..f60a475 100644
--- a/net/netfilter/nf_conntrack_pptp.c
+++ b/net/netfilter/nf_conntrack_pptp.c
@@ -157,8 +157,7 @@ static int destroy_sibling_or_exp(struct net *net, struct nf_conn *ct,
 		pr_debug("setting timeout of conntrack %p to 0\n", sibling);
 		sibling->proto.gre.timeout	  = 0;
 		sibling->proto.gre.stream_timeout = 0;
-		if (del_timer(&sibling->timeout))
-			sibling->timeout.function((unsigned long)sibling);
+		nf_ct_kill(sibling);
 		nf_ct_put(sibling);
 		return 1;
 	} else {
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index de31818..81ae41f 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -565,16 +565,10 @@ static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
 	 * Else, when the conntrack is destoyed, nf_nat_cleanup_conntrack()
 	 * will delete entry from already-freed table.
 	 */
-	if (!del_timer(&ct->timeout))
-		return 1;
-
 	ct->status &= ~IPS_NAT_DONE_MASK;
-
 	rhashtable_remove_fast(&nf_nat_bysource_table, &ct->nat_bysource,
 			       nf_nat_bysource_params);
 
-	add_timer(&ct->timeout);
-
 	/* don't delete conntrack.  Although that would make things a lot
 	 * simpler, we'd end up flushing all conntracks on nat rmmod.
 	 */
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 4/7] netfilter: evict stale entries on netlink dumps
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
                   ` (2 preceding siblings ...)
  2016-08-25 13:33 ` [PATCH v3 nf-next 3/7] netfilter: conntrack: get rid of conntrack timer Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-25 13:33 ` [PATCH v3 nf-next 5/7] netfilter: conntrack: add gc worker to remove timed-out entries Florian Westphal
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

When dumping we already have to look at the entire table, so we might
as well toss those entries whose timeout value is in the past.

We also look at every entry during resize operations.
However, eviction there is not as simple because we hold the
global resize lock so we can't evict without adding a 'expired' list
to drop from later.  Considering that resizes are very rare it doesn't
seem worth doing it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
---
 net/netfilter/nf_conntrack_netlink.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 81fd34c..dedbe4b 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -815,14 +815,23 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	struct hlist_nulls_node *n;
 	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
 	u_int8_t l3proto = nfmsg->nfgen_family;
-	int res;
+	struct nf_conn *nf_ct_evict[8];
+	int res, i;
 	spinlock_t *lockp;
 
 	last = (struct nf_conn *)cb->args[1];
+	i = 0;
 
 	local_bh_disable();
 	for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
 restart:
+		while (i) {
+			i--;
+			if (nf_ct_should_gc(nf_ct_evict[i]))
+				nf_ct_kill(nf_ct_evict[i]);
+			nf_ct_put(nf_ct_evict[i]);
+		}
+
 		lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
 		if (cb->args[0] >= nf_conntrack_htable_size) {
@@ -834,6 +843,13 @@ restart:
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (nf_ct_is_expired(ct)) {
+				if (i < ARRAY_SIZE(nf_ct_evict) &&
+				    atomic_inc_not_zero(&ct->ct_general.use))
+					nf_ct_evict[i++] = ct;
+				continue;
+			}
+
 			if (!net_eq(net, nf_ct_net(ct)))
 				continue;
 
@@ -875,6 +891,13 @@ out:
 	if (last)
 		nf_ct_put(last);
 
+	while (i) {
+		i--;
+		if (nf_ct_should_gc(nf_ct_evict[i]))
+			nf_ct_kill(nf_ct_evict[i]);
+		nf_ct_put(nf_ct_evict[i]);
+	}
+
 	return skb->len;
 }
 
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 5/7] netfilter: conntrack: add gc worker to remove timed-out entries
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
                   ` (3 preceding siblings ...)
  2016-08-25 13:33 ` [PATCH v3 nf-next 4/7] netfilter: evict stale entries on netlink dumps Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-25 16:06   ` Eric Dumazet
  2016-08-25 13:33 ` [PATCH v3 nf-next 6/7] netfilter: conntrack: resched gc again if eviction rate is high Florian Westphal
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Conntrack gc worker to evict stale entries.

GC happens once every 5 seconds, but we only scan at most 1/64th of the
table (and not more than 8k) buckets to avoid hogging cpu.

This means that a complete scan of the table will take several minutes
of wall-clock time.

Considering that the gc run will never have to evict any entries
during normal operation because those will happen from packet path
this should be fine.

We only need gc to make sure userspace (conntrack event listeners)
eventually learn of the timeout, and for resource reclaim in case the
system becomes idle.

We do not disable BH and cond_resched for every bucket so this should
not introduce noticeable latencies either.

A followup patch will add a small change to speed up GC for the extreme
case where most entries are timed out on an otherwise idle system.

v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on
nulls value change in gc worker, suggested by Eric Dumazet.

v3: don't call cancel_delayed_work_sync twice (again, Eric).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 76 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 87ee6da..f95a9e9 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -72,11 +72,24 @@ EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
 struct hlist_nulls_head *nf_conntrack_hash __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_hash);
 
+struct conntrack_gc_work {
+	struct delayed_work	dwork;
+	u32			last_bucket;
+	bool			exiting;
+};
+
 static __read_mostly struct kmem_cache *nf_conntrack_cachep;
 static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
+#define GC_MAX_BUCKETS_DIV	64u
+#define GC_MAX_BUCKETS		8192u
+#define GC_INTERVAL		(5 * HZ)
+#define GC_MAX_EVICTS		256u
+
+static struct conntrack_gc_work conntrack_gc_work;
+
 void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
 {
 	spin_lock(lock);
@@ -928,6 +941,63 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 	return false;
 }
 
+static void gc_worker(struct work_struct *work)
+{
+	unsigned int i, goal, buckets = 0, expired_count = 0;
+	unsigned long next_run = GC_INTERVAL;
+	struct conntrack_gc_work *gc_work;
+
+	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
+
+	goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
+	i = gc_work->last_bucket;
+
+	do {
+		struct nf_conntrack_tuple_hash *h;
+		struct hlist_nulls_head *ct_hash;
+		struct hlist_nulls_node *n;
+		unsigned int hashsz;
+		struct nf_conn *tmp;
+
+		i++;
+		rcu_read_lock();
+
+		nf_conntrack_get_ht(&ct_hash, &hashsz);
+		if (i >= hashsz)
+			i = 0;
+
+		hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
+			tmp = nf_ct_tuplehash_to_ctrack(h);
+
+			if (nf_ct_is_expired(tmp)) {
+				nf_ct_gc_expired(tmp);
+				expired_count++;
+				continue;
+			}
+		}
+
+		/* could check get_nulls_value() here and restart if ct
+		 * was moved to another chain.  But given gc is best-effort
+		 * we will just continue with next hash slot.
+		 */
+		rcu_read_unlock();
+		cond_resched_rcu_qs();
+	} while (++buckets < goal &&
+		 expired_count < GC_MAX_EVICTS);
+
+	if (gc_work->exiting)
+		return;
+
+	gc_work->last_bucket = i;
+	schedule_delayed_work(&gc_work->dwork, next_run);
+}
+
+static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
+{
+	INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
+	gc_work->exiting = false;
+}
+
 static struct nf_conn *
 __nf_conntrack_alloc(struct net *net,
 		     const struct nf_conntrack_zone *zone,
@@ -1534,6 +1604,7 @@ static int untrack_refs(void)
 
 void nf_conntrack_cleanup_start(void)
 {
+	conntrack_gc_work.exiting = true;
 	RCU_INIT_POINTER(ip_ct_attach, NULL);
 }
 
@@ -1543,6 +1614,7 @@ void nf_conntrack_cleanup_end(void)
 	while (untrack_refs() > 0)
 		schedule();
 
+	cancel_delayed_work_sync(&conntrack_gc_work.dwork);
 	nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
 
 	nf_conntrack_proto_fini();
@@ -1817,6 +1889,10 @@ int nf_conntrack_init_start(void)
 	}
 	/*  - and look it like as a confirmed connection */
 	nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
+
+	conntrack_gc_work_init(&conntrack_gc_work);
+	schedule_delayed_work(&conntrack_gc_work.dwork, GC_INTERVAL);
+
 	return 0;
 
 err_proto:
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 6/7] netfilter: conntrack: resched gc again if eviction rate is high
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
                   ` (4 preceding siblings ...)
  2016-08-25 13:33 ` [PATCH v3 nf-next 5/7] netfilter: conntrack: add gc worker to remove timed-out entries Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-25 16:07   ` Eric Dumazet
  2016-08-25 13:33 ` [PATCH v3 nf-next 7/7] netfilter: remove __nf_ct_kill_acct helper Florian Westphal
  2016-08-30  9:43 ` netfilter: get rid of per-object conntrack timers Pablo Neira Ayuso
  7 siblings, 1 reply; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

If we evicted a large fraction of the scanned conntrack entries re-schedule
the next gc cycle for immediate execution.

This triggers during tests where load is high, then drops to zero and
many connections will be in TW/CLOSE state with < 30 second timeouts.

Without this change it will take several minutes until conntrack count
comes back to normal.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f95a9e9..7c66ce40 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -945,6 +945,7 @@ static void gc_worker(struct work_struct *work)
 {
 	unsigned int i, goal, buckets = 0, expired_count = 0;
 	unsigned long next_run = GC_INTERVAL;
+	unsigned int ratio, scanned = 0;
 	struct conntrack_gc_work *gc_work;
 
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
@@ -969,6 +970,7 @@ static void gc_worker(struct work_struct *work)
 		hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[i], hnnode) {
 			tmp = nf_ct_tuplehash_to_ctrack(h);
 
+			scanned++;
 			if (nf_ct_is_expired(tmp)) {
 				nf_ct_gc_expired(tmp);
 				expired_count++;
@@ -988,6 +990,10 @@ static void gc_worker(struct work_struct *work)
 	if (gc_work->exiting)
 		return;
 
+	ratio = scanned ? expired_count * 100 / scanned : 0;
+	if (ratio >= 90)
+		next_run = 0;
+
 	gc_work->last_bucket = i;
 	schedule_delayed_work(&gc_work->dwork, next_run);
 }
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v3 nf-next 7/7] netfilter: remove __nf_ct_kill_acct helper
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
                   ` (5 preceding siblings ...)
  2016-08-25 13:33 ` [PATCH v3 nf-next 6/7] netfilter: conntrack: resched gc again if eviction rate is high Florian Westphal
@ 2016-08-25 13:33 ` Florian Westphal
  2016-08-30  9:43 ` netfilter: get rid of per-object conntrack timers Pablo Neira Ayuso
  7 siblings, 0 replies; 11+ messages in thread
From: Florian Westphal @ 2016-08-25 13:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

After timer removal this just calls nf_ct_delete so remove the __ prefix
version and make nf_ct_kill a shorthand for nf_ct_delete.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack.h | 13 +++----------
 net/netfilter/nf_conntrack_core.c    | 12 +++++-------
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 7277751..5041805 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -219,21 +219,14 @@ static inline void nf_ct_refresh(struct nf_conn *ct,
 	__nf_ct_refresh_acct(ct, 0, skb, extra_jiffies, 0);
 }
 
-bool __nf_ct_kill_acct(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
-		       const struct sk_buff *skb, int do_acct);
-
 /* kill conntrack and do accounting */
-static inline bool nf_ct_kill_acct(struct nf_conn *ct,
-				   enum ip_conntrack_info ctinfo,
-				   const struct sk_buff *skb)
-{
-	return __nf_ct_kill_acct(ct, ctinfo, skb, 1);
-}
+bool nf_ct_kill_acct(struct nf_conn *ct, enum ip_conntrack_info ctinfo,
+		     const struct sk_buff *skb);
 
 /* kill conntrack without accounting */
 static inline bool nf_ct_kill(struct nf_conn *ct)
 {
-	return __nf_ct_kill_acct(ct, 0, NULL, 0);
+	return nf_ct_delete(ct, 0, 0);
 }
 
 /* These are for NAT.  Icky. */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7c66ce40..ac1db40 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1430,17 +1430,15 @@ acct:
 }
 EXPORT_SYMBOL_GPL(__nf_ct_refresh_acct);
 
-bool __nf_ct_kill_acct(struct nf_conn *ct,
-		       enum ip_conntrack_info ctinfo,
-		       const struct sk_buff *skb,
-		       int do_acct)
+bool nf_ct_kill_acct(struct nf_conn *ct,
+		     enum ip_conntrack_info ctinfo,
+		     const struct sk_buff *skb)
 {
-	if (do_acct)
-		nf_ct_acct_update(ct, ctinfo, skb->len);
+	nf_ct_acct_update(ct, ctinfo, skb->len);
 
 	return nf_ct_delete(ct, 0, 0);
 }
-EXPORT_SYMBOL_GPL(__nf_ct_kill_acct);
+EXPORT_SYMBOL_GPL(nf_ct_kill_acct);
 
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
 
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 nf-next 5/7] netfilter: conntrack: add gc worker to remove timed-out entries
  2016-08-25 13:33 ` [PATCH v3 nf-next 5/7] netfilter: conntrack: add gc worker to remove timed-out entries Florian Westphal
@ 2016-08-25 16:06   ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2016-08-25 16:06 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Thu, 2016-08-25 at 15:33 +0200, Florian Westphal wrote:
> Conntrack gc worker to evict stale entries.
> 
> GC happens once every 5 seconds, but we only scan at most 1/64th of the
> table (and not more than 8k) buckets to avoid hogging cpu.
> 
> This means that a complete scan of the table will take several minutes
> of wall-clock time.
> 
> Considering that the gc run will never have to evict any entries
> during normal operation because those will happen from packet path
> this should be fine.
> 
> We only need gc to make sure userspace (conntrack event listeners)
> eventually learn of the timeout, and for resource reclaim in case the
> system becomes idle.
> 
> We do not disable BH and cond_resched for every bucket so this should
> not introduce noticeable latencies either.
> 
> A followup patch will add a small change to speed up GC for the extreme
> case where most entries are timed out on an otherwise idle system.
> 
> v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on
> nulls value change in gc worker, suggested by Eric Dumazet.
> 
> v3: don't call cancel_delayed_work_sync twice (again, Eric).
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nf_conntrack_core.c | 76 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)

Acked-by: Eric Dumazet <edumazet@google.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 nf-next 6/7] netfilter: conntrack: resched gc again if eviction rate is high
  2016-08-25 13:33 ` [PATCH v3 nf-next 6/7] netfilter: conntrack: resched gc again if eviction rate is high Florian Westphal
@ 2016-08-25 16:07   ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2016-08-25 16:07 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Thu, 2016-08-25 at 15:33 +0200, Florian Westphal wrote:
> If we evicted a large fraction of the scanned conntrack entries re-schedule
> the next gc cycle for immediate execution.
> 
> This triggers during tests where load is high, then drops to zero and
> many connections will be in TW/CLOSE state with < 30 second timeouts.
> 
> Without this change it will take several minutes until conntrack count
> comes back to normal.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---

Acked-by: Eric Dumazet <edumazet@google.com>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: netfilter: get rid of per-object conntrack timers
  2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
                   ` (6 preceding siblings ...)
  2016-08-25 13:33 ` [PATCH v3 nf-next 7/7] netfilter: remove __nf_ct_kill_acct helper Florian Westphal
@ 2016-08-30  9:43 ` Pablo Neira Ayuso
  7 siblings, 0 replies; 11+ messages in thread
From: Pablo Neira Ayuso @ 2016-08-30  9:43 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Thu, Aug 25, 2016 at 03:33:28PM +0200, Florian Westphal wrote:
> During NFWS 2016 it was mentioned that per-conntrack timers have
> two drawbacks:
>  - the 5-day default established timeout is very large and brings
>    extra constraints for the timer subsystem.
>  - most distros enable timer stats so timer struct eats 80 bytes
>    in each conntrack object.
> 
> This series replaces the per-object struct timer with a u32 jiffie
> stamp and one global delayed work queue for conntrack eviction.
> Size of nf_conn struct is reduced to 256 bytes on x86_64.
> 
> Eviction is performed from the packet path when doing
> table lookup, for cases where we have idle periods the work
> queue is used.

Series applied, thanks Florian!

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-08-30  9:44 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-25 13:33 netfilter: get rid of per-object conntrack timers Florian Westphal
2016-08-25 13:33 ` [PATCH v3 nf-next 1/7] netfilter: restart search if moved to other chain Florian Westphal
2016-08-25 13:33 ` [PATCH v3 nf-next 2/7] netfilter: don't rely on DYING bit to detect when destroy event was sent Florian Westphal
2016-08-25 13:33 ` [PATCH v3 nf-next 3/7] netfilter: conntrack: get rid of conntrack timer Florian Westphal
2016-08-25 13:33 ` [PATCH v3 nf-next 4/7] netfilter: evict stale entries on netlink dumps Florian Westphal
2016-08-25 13:33 ` [PATCH v3 nf-next 5/7] netfilter: conntrack: add gc worker to remove timed-out entries Florian Westphal
2016-08-25 16:06   ` Eric Dumazet
2016-08-25 13:33 ` [PATCH v3 nf-next 6/7] netfilter: conntrack: resched gc again if eviction rate is high Florian Westphal
2016-08-25 16:07   ` Eric Dumazet
2016-08-25 13:33 ` [PATCH v3 nf-next 7/7] netfilter: remove __nf_ct_kill_acct helper Florian Westphal
2016-08-30  9:43 ` netfilter: get rid of per-object conntrack timers Pablo Neira Ayuso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).