* [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
@ 2016-04-28 17:13 Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 1/9] netfilter: conntrack: keep BH enabled during lookup Florian Westphal
                   ` (15 more replies)
  0 siblings, 16 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev

[ CCing netdev so netns folks can have a look too ]

This patch series removes the per-netns connection tracking tables.
All conntrack objects are then stored in one global table.

This avoids the infamous 'vmalloc' when lots of namespaces are used:
we no longer allocate a new conntrack table for each namespace (with 64k
buckets this saves 512kb of memory per netns).

- the net namespace address is made part of the conntrack hash, to spread
  conntracks over the entire table even if namespaces have overlapping ip
  addresses.
- lookups and iterators use net_eq() to skip conntracks living in a different
  namespace (both changes are sketched below).
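
Simplified excerpts of the two changes (the real patches are 6 and 7 below):

    /* lookup key comparison also checks the owning netns (patch 6) */
    return nf_ct_tuple_equal(tuple, &h->tuple) &&
           nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h)) &&
           nf_ct_is_confirmed(ct) &&
           net_eq(net, nf_ct_net(ct));

    /* the hash seed mixes in a hash of the netns pointer (patch 7) */
    seed = nf_conntrack_hash_rnd ^ nf_conntrack_netns_hash(net);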

Only the main conntrack table is converted here:
the NAT bysrc and expectation hashes are still per namespace (they will be
unified in a followup series).  This series also retains the per-namespace
kmem cache for the conntrack objects; that, too, will be resolved in a
followup series.

Comments welcome.

 include/net/netfilter/nf_conntrack_core.h             |   11 
 include/net/netns/conntrack.h                         |    2 
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c        |    2 
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c |   38 ++
 net/netfilter/nf_conntrack_core.c                     |  233 +++++++++---------
 net/netfilter/nf_conntrack_helper.c                   |    6 
 net/netfilter/nf_conntrack_netlink.c                  |   11 
 net/netfilter/nf_conntrack_standalone.c               |   13 -
 net/netfilter/nf_nat_core.c                           |    2 
 net/netfilter/nfnetlink_cttimeout.c                   |    6 
 10 files changed, 179 insertions(+), 145 deletions(-)


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH nf-next 1/9] netfilter: conntrack: keep BH enabled during lookup
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 2/9] netfilter: conntrack: fix lookup race during hash resize Florian Westphal
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

No need to disable BH here anymore:

the stats are switched to the _ATOMIC variant (== this_cpu_inc()), which
nowadays generates the same code as the non-_ATOMIC NF_CT_STAT_INC, at
least on x86.
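
For context: as noted above, the _ATOMIC variant is just a this_cpu_inc()
of the per-cpu counter.  this_cpu_inc() is safe against preemption and
interrupts on its own, so the lookup path no longer needs
local_bh_disable()/local_bh_enable() around the stat updates; on x86 it
compiles to a single per-cpu increment either way.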

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 25 ++++++++-----------------
 1 file changed, 8 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 1fd0ff1..1b63359 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -472,18 +472,13 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 	struct hlist_nulls_node *n;
 	unsigned int bucket = hash_bucket(hash, net);
 
-	/* Disable BHs the entire time since we normally need to disable them
-	 * at least once for the stats anyway.
-	 */
-	local_bh_disable();
 begin:
 	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[bucket], hnnode) {
 		if (nf_ct_key_equal(h, tuple, zone)) {
-			NF_CT_STAT_INC(net, found);
-			local_bh_enable();
+			NF_CT_STAT_INC_ATOMIC(net, found);
 			return h;
 		}
-		NF_CT_STAT_INC(net, searched);
+		NF_CT_STAT_INC_ATOMIC(net, searched);
 	}
 	/*
 	 * if the nulls value we got at the end of this lookup is
@@ -491,10 +486,9 @@ begin:
 	 * We probably met an item that was moved to another chain.
 	 */
 	if (get_nulls_value(n) != bucket) {
-		NF_CT_STAT_INC(net, search_restart);
+		NF_CT_STAT_INC_ATOMIC(net, search_restart);
 		goto begin;
 	}
-	local_bh_enable();
 
 	return NULL;
 }
@@ -735,22 +729,19 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	zone = nf_ct_zone(ignored_conntrack);
 	hash = hash_conntrack(net, tuple);
 
-	/* Disable BHs the entire time since we need to disable them at
-	 * least once for the stats anyway.
-	 */
-	rcu_read_lock_bh();
+	rcu_read_lock();
 	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		if (ct != ignored_conntrack &&
 		    nf_ct_tuple_equal(tuple, &h->tuple) &&
 		    nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h))) {
-			NF_CT_STAT_INC(net, found);
-			rcu_read_unlock_bh();
+			NF_CT_STAT_INC_ATOMIC(net, found);
+			rcu_read_unlock();
 			return 1;
 		}
-		NF_CT_STAT_INC(net, searched);
+		NF_CT_STAT_INC_ATOMIC(net, searched);
 	}
-	rcu_read_unlock_bh();
+	rcu_read_unlock();
 
 	return 0;
 }
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 2/9] netfilter: conntrack: fix lookup race during hash resize
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 1/9] netfilter: conntrack: keep BH enabled during lookup Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table Florian Westphal
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

When resizing the conntrack hash table at runtime via
echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with
the conntrack lookup path -- reads can happen in parallel and nothing
prevents readers from observing the newly allocated hash but the old
size (or vice versa).

So access to hash[bucket] can trigger an out-of-bounds read in case the
table got expanded and we saw the new size but the old hash pointer (or it
got shrunk and we got the new hash pointer but the size of the old, larger
table):

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107
[..]
Call Trace:
[<ffffffff822c3d6a>] ? nf_conntrack_tuple_taken+0x12a/0xe90
[<ffffffff822c3ac1>] ? nf_ct_invert_tuplepr+0x221/0x3a0
[<ffffffff8230e703>] get_unique_tuple+0xfb3/0x2760

Use the generation counter to obtain the address/length of the same table.

Also add a synchronize_net() before freeing the old hash.
AFAICS, without it we might access ct_hash[bucket] after ct_hash has been
freed, provided that a lockless reader got delayed by another event:

CPU1			CPU2
seq_begin
seq_retry
<delay>			resize occurs
			free oldhash
for_each(oldhash[size])

Note that resize is only supported in init_netns; it took over 2 minutes
of constant resizing+flooding to produce the warning, so this isn't a
big problem in practice.
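
For reference, the reader side added below pairs with the resize path
(writer) roughly as follows; this is a simplified sketch, not the exact
code (the writer's locking sequence is reconstructed from the unlock side
visible in the hunk):

    /* resize (nf_conntrack_set_hashsize), simplified: */
    local_bh_disable();
    nf_conntrack_all_lock();
    write_seqcount_begin(&nf_conntrack_generation);
    /* ... move all entries into the new table ... */
    init_net.ct.hash = hash;
    init_net.ct.htable_size = hashsize;
    write_seqcount_end(&nf_conntrack_generation);
    nf_conntrack_all_unlock();
    local_bh_enable();
    synchronize_net();  /* added by this patch: let rcu readers finish */
    nf_ct_free_hashtable(old_hash, old_size);

    /* lookup (reader), as added below: */
    do {
        sequence = read_seqcount_begin(&nf_conntrack_generation);
        bucket = hash_bucket(hash, net);
        ct_hash = net->ct.hash;
    } while (read_seqcount_retry(&nf_conntrack_generation, sequence));

A reader thus always sees a matching (pointer, size) pair, and
synchronize_net() guarantees the old table is not freed while such a
reader may still be walking it.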

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 1b63359..29fa08b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -469,11 +469,18 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 		      const struct nf_conntrack_tuple *tuple, u32 hash)
 {
 	struct nf_conntrack_tuple_hash *h;
+	struct hlist_nulls_head *ct_hash;
 	struct hlist_nulls_node *n;
-	unsigned int bucket = hash_bucket(hash, net);
+	unsigned int bucket, sequence;
 
 begin:
-	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[bucket], hnnode) {
+	do {
+		sequence = read_seqcount_begin(&nf_conntrack_generation);
+		bucket = hash_bucket(hash, net);
+		ct_hash = net->ct.hash;
+	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
+	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
 		if (nf_ct_key_equal(h, tuple, zone)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			return h;
@@ -722,15 +729,21 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	struct net *net = nf_ct_net(ignored_conntrack);
 	const struct nf_conntrack_zone *zone;
 	struct nf_conntrack_tuple_hash *h;
+	struct hlist_nulls_head *ct_hash;
+	unsigned int hash, sequence;
 	struct hlist_nulls_node *n;
 	struct nf_conn *ct;
-	unsigned int hash;
 
 	zone = nf_ct_zone(ignored_conntrack);
-	hash = hash_conntrack(net, tuple);
 
 	rcu_read_lock();
-	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
+	do {
+		sequence = read_seqcount_begin(&nf_conntrack_generation);
+		hash = hash_conntrack(net, tuple);
+		ct_hash = net->ct.hash;
+	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
+	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		if (ct != ignored_conntrack &&
 		    nf_ct_tuple_equal(tuple, &h->tuple) &&
@@ -1607,6 +1620,7 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	nf_conntrack_all_unlock();
 	local_bh_enable();
 
+	synchronize_net();
 	nf_ct_free_hashtable(old_hash, old_size);
 	return 0;
 }
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 1/9] netfilter: conntrack: keep BH enabled during lookup Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 2/9] netfilter: conntrack: fix lookup race during hash resize Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-05-03 17:03   ` Pablo Neira Ayuso
  2016-04-28 17:13 ` [PATCH nf-next 4/9] netfilter: conntrack: use nf_ct_key_equal() in more places Florian Westphal
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

Once we place all conntracks into the same table, iteration becomes more
costly because the table contains conntracks that we are not interested
in (those belonging to other netns).

So don't bother scanning if the current namespace has no entries.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 29fa08b..f2e75a5 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1428,6 +1428,9 @@ void nf_ct_iterate_cleanup(struct net *net,
 
 	might_sleep();
 
+	if (atomic_read(&net->ct.count) == 0)
+		return;
+
 	while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
 		/* Time to push up daises... */
 		if (del_timer(&ct->timeout))
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 4/9] netfilter: conntrack: use nf_ct_key_equal() in more places
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (2 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf Florian Westphal
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

This prepares for an upcoming change that places all conntracks into a
single, global table.  For this to work we will also need to compare the
net pointer during lookup.  To avoid open-coding such a check, use the
nf_ct_key_equal() helper and later extend it to also consider net_eq().

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index f2e75a5..3b9c302 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -572,16 +572,13 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 
 	/* See if there's one in the list already, including reverse */
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
-		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
-				      &h->tuple) &&
-		    nf_ct_zone_equal(nf_ct_tuplehash_to_ctrack(h), zone,
-				     NF_CT_DIRECTION(h)))
+		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+				    zone))
 			goto out;
+
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
-		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-				      &h->tuple) &&
-		    nf_ct_zone_equal(nf_ct_tuplehash_to_ctrack(h), zone,
-				     NF_CT_DIRECTION(h)))
+		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
+				    zone))
 			goto out;
 
 	add_timer(&ct->timeout);
@@ -665,16 +662,13 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	   NAT could have grabbed it without realizing, since we're
 	   not in the hash.  If there is, we lost race. */
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
-		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
-				      &h->tuple) &&
-		    nf_ct_zone_equal(nf_ct_tuplehash_to_ctrack(h), zone,
-				     NF_CT_DIRECTION(h)))
+		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+				    zone))
 			goto out;
+
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
-		if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-				      &h->tuple) &&
-		    nf_ct_zone_equal(nf_ct_tuplehash_to_ctrack(h), zone,
-				     NF_CT_DIRECTION(h)))
+		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
+				    zone))
 			goto out;
 
 	/* Timer relative to confirmation time, not original
@@ -746,8 +740,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		if (ct != ignored_conntrack &&
-		    nf_ct_tuple_equal(tuple, &h->tuple) &&
-		    nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h))) {
+		    nf_ct_key_equal(h, tuple, zone)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			rcu_read_unlock();
 			return 1;
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (3 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 4/9] netfilter: conntrack: use nf_ct_key_equal() in more places Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-05-03 18:12   ` Pablo Neira Ayuso
  2016-04-28 17:13 ` [PATCH nf-next 6/9] netfilter: conntrack: check netns when comparing conntrack objects Florian Westphal
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

The iteration process is lockless, so we test if the conntrack object is
eligible for printing (e.g. is AF_INET) after obtaining the reference
count.

Once we put all conntracks into the same hash table we might see more
entries that need to be skipped.

So add a helper and first perform the test in a lockless fashion
for a fast skip.

Once we obtain the reference count, just repeat the check.
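
(The repeated check after taking the reference is needed because conntrack
objects are SLAB_DESTROY_BY_RCU: the object we grabbed a reference on may
have been freed and recycled for a different flow in the meantime, hence
the "check if we raced w. object reuse" test in the hunk below.)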

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 24 +++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index f0dfe92..483cf79 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -114,6 +114,19 @@ static inline void ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
 }
 #endif
 
+static bool ct_seq_should_skip(const struct nf_conn *ct,
+			       const struct nf_conntrack_tuple_hash *hash)
+{
+	/* we only want to print DIR_ORIGINAL */
+	if (NF_CT_DIRECTION(hash))
+		return true;
+
+	if (nf_ct_l3num(ct) != AF_INET)
+		return true;
+
+	return false;
+}
+
 static int ct_seq_show(struct seq_file *s, void *v)
 {
 	struct nf_conntrack_tuple_hash *hash = v;
@@ -123,14 +136,15 @@ static int ct_seq_show(struct seq_file *s, void *v)
 	int ret = 0;
 
 	NF_CT_ASSERT(ct);
-	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
+	if (ct_seq_should_skip(ct, hash))
 		return 0;
 
+	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
+		return 0;
 
-	/* we only want to print DIR_ORIGINAL */
-	if (NF_CT_DIRECTION(hash))
-		goto release;
-	if (nf_ct_l3num(ct) != AF_INET)
+	/* check if we raced w. object reuse */
+	if (!nf_ct_is_confirmed(ct) ||
+	    ct_seq_should_skip(ct, hash))
 		goto release;
 
 	l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 6/9] netfilter: conntrack: check netns when comparing conntrack objects
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (4 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

Once we place all conntracks in the same hash table we must also compare
the netns pointer to skip conntracks that belong to a different namespace.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   |  8 ++++++--
 net/netfilter/nf_conntrack_core.c                  | 23 ++++++++++++----------
 net/netfilter/nf_conntrack_netlink.c               |  3 +++
 3 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index 483cf79..171aba1 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -115,6 +115,7 @@ static inline void ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
 #endif
 
 static bool ct_seq_should_skip(const struct nf_conn *ct,
+			       const struct net *net,
 			       const struct nf_conntrack_tuple_hash *hash)
 {
 	/* we only want to print DIR_ORIGINAL */
@@ -124,6 +125,9 @@ static bool ct_seq_should_skip(const struct nf_conn *ct,
 	if (nf_ct_l3num(ct) != AF_INET)
 		return true;
 
+	if (!net_eq(nf_ct_net(ct), net))
+		return true;
+
 	return false;
 }
 
@@ -136,7 +140,7 @@ static int ct_seq_show(struct seq_file *s, void *v)
 	int ret = 0;
 
 	NF_CT_ASSERT(ct);
-	if (ct_seq_should_skip(ct, hash))
+	if (ct_seq_should_skip(ct, seq_file_net(s), hash))
 		return 0;
 
 	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
@@ -144,7 +148,7 @@ static int ct_seq_show(struct seq_file *s, void *v)
 
 	/* check if we raced w. object reuse */
 	if (!nf_ct_is_confirmed(ct) ||
-	    ct_seq_should_skip(ct, hash))
+	    ct_seq_should_skip(ct, seq_file_net(s), hash))
 		goto release;
 
 	l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3b9c302..10ae2ee 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -447,7 +447,8 @@ static void death_by_timeout(unsigned long ul_conntrack)
 static inline bool
 nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
 		const struct nf_conntrack_tuple *tuple,
-		const struct nf_conntrack_zone *zone)
+		const struct nf_conntrack_zone *zone,
+		const struct net *net)
 {
 	struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
 
@@ -456,7 +457,8 @@ nf_ct_key_equal(struct nf_conntrack_tuple_hash *h,
 	 */
 	return nf_ct_tuple_equal(tuple, &h->tuple) &&
 	       nf_ct_zone_equal(ct, zone, NF_CT_DIRECTION(h)) &&
-	       nf_ct_is_confirmed(ct);
+	       nf_ct_is_confirmed(ct) &&
+	       net_eq(net, nf_ct_net(ct));
 }
 
 /*
@@ -481,7 +483,7 @@ begin:
 	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
-		if (nf_ct_key_equal(h, tuple, zone)) {
+		if (nf_ct_key_equal(h, tuple, zone, net)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			return h;
 		}
@@ -517,7 +519,7 @@ begin:
 			     !atomic_inc_not_zero(&ct->ct_general.use)))
 			h = NULL;
 		else {
-			if (unlikely(!nf_ct_key_equal(h, tuple, zone))) {
+			if (unlikely(!nf_ct_key_equal(h, tuple, zone, net))) {
 				nf_ct_put(ct);
 				goto begin;
 			}
@@ -573,12 +575,12 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	/* See if there's one in the list already, including reverse */
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
-				    zone))
+				    zone, net))
 			goto out;
 
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-				    zone))
+				    zone, net))
 			goto out;
 
 	add_timer(&ct->timeout);
@@ -663,12 +665,12 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	   not in the hash.  If there is, we lost race. */
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
-				    zone))
+				    zone, net))
 			goto out;
 
 	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-				    zone))
+				    zone, net))
 			goto out;
 
 	/* Timer relative to confirmation time, not original
@@ -740,7 +742,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		if (ct != ignored_conntrack &&
-		    nf_ct_key_equal(h, tuple, zone)) {
+		    nf_ct_key_equal(h, tuple, zone, net)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			rcu_read_unlock();
 			return 1;
@@ -1383,7 +1385,8 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 				if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 					continue;
 				ct = nf_ct_tuplehash_to_ctrack(h);
-				if (iter(ct, data))
+				if (net_eq(nf_ct_net(ct), net) &&
+				    iter(ct, data))
 					goto found;
 			}
 		}
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 294a8e2..f6bbcb2 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -837,6 +837,9 @@ restart:
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
+			if (!net_eq(net, nf_ct_net(ct)))
+				continue;
+
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 7/9] netfilter: conntrack: make netns address part of hash
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (5 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 6/9] netfilter: conntrack: check netns when comparing conntrack objects Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

Once we place all conntracks into a global hash table we want them to be
spread across the entire hash table, even if namespaces have overlapping ip
addresses.

We add the nf_conntrack_netns_hash helper so it can later be re-used for the
nat bysrc and expectation hash handling.  The helper also allows us to avoid
the (then pointless) hashing of init_net if the kernel is built without netns
support.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/net/netfilter/nf_conntrack_core.h | 10 +++++++++
 net/netfilter/nf_conntrack_core.c         | 34 +++++++++++++++----------------
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 62e17d1..389e6da 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -12,6 +12,7 @@
 #ifndef _NF_CONNTRACK_CORE_H
 #define _NF_CONNTRACK_CORE_H
 
+#include <linux/hash.h>
 #include <linux/netfilter.h>
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
@@ -86,4 +87,13 @@ void nf_conntrack_lock(spinlock_t *lock);
 
 extern spinlock_t nf_conntrack_expect_lock;
 
+static inline u32 nf_conntrack_netns_hash(const struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	return hash_ptr(net, 32);
+#else
+	return 0;
+#endif
+}
+
 #endif /* _NF_CONNTRACK_CORE_H */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 10ae2ee..c29b929 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -144,9 +144,11 @@ EXPORT_PER_CPU_SYMBOL(nf_conntrack_untracked);
 
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
-static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple)
+static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
+			      const struct net *net)
 {
 	unsigned int n;
+	u32 seed;
 
 	get_random_once(&nf_conntrack_hash_rnd, sizeof(nf_conntrack_hash_rnd));
 
@@ -154,32 +156,29 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple)
 	 * destination ports (which is a multiple of 4) and treat the last
 	 * three bytes manually.
 	 */
+	seed = nf_conntrack_hash_rnd ^ nf_conntrack_netns_hash(net);
 	n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);
-	return jhash2((u32 *)tuple, n, nf_conntrack_hash_rnd ^
+	return jhash2((u32 *)tuple, n, seed ^
 		      (((__force __u16)tuple->dst.u.all << 16) |
 		      tuple->dst.protonum));
 }
 
-static u32 __hash_bucket(u32 hash, unsigned int size)
-{
-	return reciprocal_scale(hash, size);
-}
-
 static u32 hash_bucket(u32 hash, const struct net *net)
 {
-	return __hash_bucket(hash, net->ct.htable_size);
+	return reciprocal_scale(hash, net->ct.htable_size);
 }
 
-static u_int32_t __hash_conntrack(const struct nf_conntrack_tuple *tuple,
-				  unsigned int size)
+static u32 __hash_conntrack(const struct net *net,
+			    const struct nf_conntrack_tuple *tuple,
+			    unsigned int size)
 {
-	return __hash_bucket(hash_conntrack_raw(tuple), size);
+	return reciprocal_scale(hash_conntrack_raw(tuple, net), size);
 }
 
-static inline u_int32_t hash_conntrack(const struct net *net,
-				       const struct nf_conntrack_tuple *tuple)
+static u32 hash_conntrack(const struct net *net,
+			  const struct nf_conntrack_tuple *tuple)
 {
-	return __hash_conntrack(tuple, net->ct.htable_size);
+	return __hash_conntrack(net, tuple, net->ct.htable_size);
 }
 
 bool
@@ -535,7 +534,7 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 		      const struct nf_conntrack_tuple *tuple)
 {
 	return __nf_conntrack_find_get(net, zone, tuple,
-				       hash_conntrack_raw(tuple));
+				       hash_conntrack_raw(tuple, net));
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
 
@@ -1041,7 +1040,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 
 	/* look for tuple match */
 	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
-	hash = hash_conntrack_raw(&tuple);
+	hash = hash_conntrack_raw(&tuple, net);
 	h = __nf_conntrack_find_get(net, zone, &tuple, hash);
 	if (!h) {
 		h = init_conntrack(net, tmpl, &tuple, l3proto, l4proto,
@@ -1605,7 +1604,8 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 					struct nf_conntrack_tuple_hash, hnnode);
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			hlist_nulls_del_rcu(&h->hnnode);
-			bucket = __hash_conntrack(&h->tuple, hashsize);
+			bucket = __hash_conntrack(nf_ct_net(ct),
+						  &h->tuple, hashsize);
 			hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]);
 		}
 	}
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (6 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-04-29 15:04   ` Florian Westphal
  2016-04-28 17:13 ` [PATCH nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic Florian Westphal
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

We already include netns address in the hash and compare the netns pointers
during lookup, so even if namespaces have overlapping addresses entries
will be spread across the table.

Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a
64bit system.
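
(For reference: 65536 buckets * sizeof(struct hlist_nulls_head), i.e. one
pointer or 8 bytes on 64 bit, works out to 512 KiB per table, previously
allocated once per namespace.)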

NAT bysrc and expectation hashes are still per namespace; those will be
changed soon, too.

Future patch will also make conntrack object slab cache global again.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 checkpatch complains about 'WARNING: line over 80 characters' but
 forcing line breaks looked even worse to me.

 include/net/netfilter/nf_conntrack_core.h          |  1 +
 include/net/netns/conntrack.h                      |  2 -
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c     |  2 +-
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 10 ++-
 net/netfilter/nf_conntrack_core.c                  | 78 +++++++++++-----------
 net/netfilter/nf_conntrack_helper.c                |  6 +-
 net/netfilter/nf_conntrack_netlink.c               |  8 +--
 net/netfilter/nf_conntrack_standalone.c            | 13 ++--
 net/netfilter/nf_nat_core.c                        |  2 +-
 net/netfilter/nfnetlink_cttimeout.c                |  6 +-
 10 files changed, 60 insertions(+), 68 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 389e6da..e8ad0ad 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -82,6 +82,7 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
 
 #define CONNTRACK_LOCKS 1024
 
+extern struct hlist_nulls_head *nf_conntrack_hash;
 extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
 void nf_conntrack_lock(spinlock_t *lock);
 
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index b052785..251c435 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -93,9 +93,7 @@ struct netns_ct {
 	int			sysctl_tstamp;
 	int			sysctl_checksum;
 
-	unsigned int		htable_size;
 	struct kmem_cache	*nf_conntrack_cachep;
-	struct hlist_nulls_head	*hash;
 	struct hlist_head	*expect_hash;
 	struct ct_pcpu __percpu *pcpu_lists;
 	struct ip_conntrack_stat __percpu *stat;
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index e3c46e8..ae1a71a 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -360,7 +360,7 @@ static int ipv4_init_net(struct net *net)
 
 	in->ctl_table[0].data = &nf_conntrack_max;
 	in->ctl_table[1].data = &net->ct.count;
-	in->ctl_table[2].data = &net->ct.htable_size;
+	in->ctl_table[2].data = &nf_conntrack_htable_size;
 	in->ctl_table[3].data = &net->ct.sysctl_checksum;
 	in->ctl_table[4].data = &net->ct.sysctl_log_invalid;
 #endif
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index 171aba1..f8fc7ab 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -31,15 +31,14 @@ struct ct_iter_state {
 
 static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 	struct hlist_nulls_node *n;
 
 	for (st->bucket = 0;
-	     st->bucket < net->ct.htable_size;
+	     st->bucket < nf_conntrack_htable_size;
 	     st->bucket++) {
 		n = rcu_dereference(
-			hlist_nulls_first_rcu(&net->ct.hash[st->bucket]));
+			hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
 		if (!is_a_nulls(n))
 			return n;
 	}
@@ -49,17 +48,16 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
 				      struct hlist_nulls_node *head)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 
 	head = rcu_dereference(hlist_nulls_next_rcu(head));
 	while (is_a_nulls(head)) {
 		if (likely(get_nulls_value(head) == st->bucket)) {
-			if (++st->bucket >= net->ct.htable_size)
+			if (++st->bucket >= nf_conntrack_htable_size)
 				return NULL;
 		}
 		head = rcu_dereference(
-			hlist_nulls_first_rcu(&net->ct.hash[st->bucket]));
+			hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
 	}
 	return head;
 }
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c29b929..d58b597 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -68,6 +68,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_locks);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
 EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
 
+struct hlist_nulls_head *nf_conntrack_hash __read_mostly;
+EXPORT_SYMBOL_GPL(nf_conntrack_hash);
+
 static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly seqcount_t nf_conntrack_generation;
 static __read_mostly bool nf_conntrack_locks_all;
@@ -163,9 +166,9 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
 		      tuple->dst.protonum));
 }
 
-static u32 hash_bucket(u32 hash, const struct net *net)
+static u32 scale_hash(u32 hash)
 {
-	return reciprocal_scale(hash, net->ct.htable_size);
+	return reciprocal_scale(hash, nf_conntrack_htable_size);
 }
 
 static u32 __hash_conntrack(const struct net *net,
@@ -178,7 +181,7 @@ static u32 __hash_conntrack(const struct net *net,
 static u32 hash_conntrack(const struct net *net,
 			  const struct nf_conntrack_tuple *tuple)
 {
-	return __hash_conntrack(net, tuple, net->ct.htable_size);
+	return scale_hash(hash_conntrack_raw(tuple, net));
 }
 
 bool
@@ -477,8 +480,8 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 begin:
 	do {
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
-		bucket = hash_bucket(hash, net);
-		ct_hash = net->ct.hash;
+		bucket = scale_hash(hash);
+		ct_hash = nf_conntrack_hash;
 	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
@@ -542,12 +545,10 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
 				       unsigned int hash,
 				       unsigned int reply_hash)
 {
-	struct net *net = nf_ct_net(ct);
-
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-			   &net->ct.hash[hash]);
+			   &nf_conntrack_hash[hash]);
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
-			   &net->ct.hash[reply_hash]);
+			   &nf_conntrack_hash[reply_hash]);
 }
 
 int
@@ -572,12 +573,12 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	} while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));
 
 	/* See if there's one in the list already, including reverse */
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
 				    zone, net))
 			goto out;
 
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
 				    zone, net))
 			goto out;
@@ -632,7 +633,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
 		/* reuse the hash saved before */
 		hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev;
-		hash = hash_bucket(hash, net);
+		hash = scale_hash(hash);
 		reply_hash = hash_conntrack(net,
 					   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
@@ -662,12 +663,12 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	/* See if there's one in the list already, including reverse:
 	   NAT could have grabbed it without realizing, since we're
 	   not in the hash.  If there is, we lost race. */
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
 				    zone, net))
 			goto out;
 
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
 				    zone, net))
 			goto out;
@@ -735,7 +736,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	do {
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
 		hash = hash_conntrack(net, tuple);
-		ct_hash = net->ct.hash;
+		ct_hash = nf_conntrack_hash;
 	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
@@ -772,16 +773,16 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 	local_bh_disable();
 restart:
 	sequence = read_seqcount_begin(&nf_conntrack_generation);
-	hash = hash_bucket(_hash, net);
-	for (; i < net->ct.htable_size; i++) {
+	hash = scale_hash(_hash);
+	for (; i < nf_conntrack_htable_size; i++) {
 		lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
 		if (read_seqcount_retry(&nf_conntrack_generation, sequence)) {
 			spin_unlock(lockp);
 			goto restart;
 		}
-		hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash],
-					 hnnode) {
+		hlist_nulls_for_each_entry_rcu(h, n, &nf_conntrack_hash[hash],
+					       hnnode) {
 			tmp = nf_ct_tuplehash_to_ctrack(h);
 			if (!test_bit(IPS_ASSURED_BIT, &tmp->status) &&
 			    !nf_ct_is_dying(tmp) &&
@@ -792,7 +793,7 @@ restart:
 			cnt++;
 		}
 
-		hash = (hash + 1) % net->ct.htable_size;
+		hash = (hash + 1) % nf_conntrack_htable_size;
 		spin_unlock(lockp);
 
 		if (ct || cnt >= NF_CT_EVICTION_RANGE)
@@ -1375,12 +1376,12 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	int cpu;
 	spinlock_t *lockp;
 
-	for (; *bucket < net->ct.htable_size; (*bucket)++) {
+	for (; *bucket < nf_conntrack_htable_size; (*bucket)++) {
 		lockp = &nf_conntrack_locks[*bucket % CONNTRACK_LOCKS];
 		local_bh_disable();
 		nf_conntrack_lock(lockp);
-		if (*bucket < net->ct.htable_size) {
-			hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
+		if (*bucket < nf_conntrack_htable_size) {
+			hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[*bucket], hnnode) {
 				if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 					continue;
 				ct = nf_ct_tuplehash_to_ctrack(h);
@@ -1527,7 +1528,6 @@ i_see_dead_people:
 	}
 
 	list_for_each_entry(net, net_exit_list, exit_list) {
-		nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
 		nf_conntrack_proto_pernet_fini(net);
 		nf_conntrack_helper_pernet_fini(net);
 		nf_conntrack_ecache_pernet_fini(net);
@@ -1598,10 +1598,10 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	 * though since that required taking the locks.
 	 */
 
-	for (i = 0; i < init_net.ct.htable_size; i++) {
-		while (!hlist_nulls_empty(&init_net.ct.hash[i])) {
-			h = hlist_nulls_entry(init_net.ct.hash[i].first,
-					struct nf_conntrack_tuple_hash, hnnode);
+	for (i = 0; i < nf_conntrack_htable_size; i++) {
+		while (!hlist_nulls_empty(&nf_conntrack_hash[i])) {
+			h = hlist_nulls_entry(nf_conntrack_hash[i].first,
+					      struct nf_conntrack_tuple_hash, hnnode);
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			hlist_nulls_del_rcu(&h->hnnode);
 			bucket = __hash_conntrack(nf_ct_net(ct),
@@ -1609,11 +1609,11 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 			hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]);
 		}
 	}
-	old_size = init_net.ct.htable_size;
-	old_hash = init_net.ct.hash;
+	old_size = nf_conntrack_htable_size;
+	old_hash = nf_conntrack_hash;
 
-	init_net.ct.htable_size = nf_conntrack_htable_size = hashsize;
-	init_net.ct.hash = hash;
+	nf_conntrack_hash = hash;
+	nf_conntrack_htable_size = hashsize;
 
 	write_seqcount_end(&nf_conntrack_generation);
 	nf_conntrack_all_unlock();
@@ -1669,6 +1669,11 @@ int nf_conntrack_init_start(void)
 		 * entries. */
 		max_factor = 4;
 	}
+
+	nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);
+	if (!nf_conntrack_hash)
+		return -ENOMEM;
+
 	nf_conntrack_max = max_factor * nf_conntrack_htable_size;
 
 	printk(KERN_INFO "nf_conntrack version %s (%u buckets, %d max)\n",
@@ -1747,6 +1752,7 @@ err_tstamp:
 err_acct:
 	nf_conntrack_expect_fini();
 err_expect:
+	nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
 	return ret;
 }
 
@@ -1799,12 +1805,6 @@ int nf_conntrack_init_net(struct net *net)
 		goto err_cache;
 	}
 
-	net->ct.htable_size = nf_conntrack_htable_size;
-	net->ct.hash = nf_ct_alloc_hashtable(&net->ct.htable_size, 1);
-	if (!net->ct.hash) {
-		printk(KERN_ERR "Unable to create nf_conntrack_hash\n");
-		goto err_hash;
-	}
 	ret = nf_conntrack_expect_pernet_init(net);
 	if (ret < 0)
 		goto err_expect;
@@ -1836,8 +1836,6 @@ err_tstamp:
 err_acct:
 	nf_conntrack_expect_pernet_fini(net);
 err_expect:
-	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
-err_hash:
 	kmem_cache_destroy(net->ct.nf_conntrack_cachep);
 err_cache:
 	kfree(net->ct.slabname);
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 498bf74..cb48e6a 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -424,10 +424,10 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 		spin_unlock_bh(&pcpu->lock);
 	}
 	local_bh_disable();
-	for (i = 0; i < net->ct.htable_size; i++) {
+	for (i = 0; i < nf_conntrack_htable_size; i++) {
 		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
-		if (i < net->ct.htable_size) {
-			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
+		if (i < nf_conntrack_htable_size) {
+			hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
 				unhelp(h, me);
 		}
 		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index f6bbcb2..e00f178 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -824,16 +824,16 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	last = (struct nf_conn *)cb->args[1];
 
 	local_bh_disable();
-	for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
+	for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
 restart:
 		lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
-		if (cb->args[0] >= net->ct.htable_size) {
+		if (cb->args[0] >= nf_conntrack_htable_size) {
 			spin_unlock(lockp);
 			goto out;
 		}
-		hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]],
-					 hnnode) {
+		hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
+					   hnnode) {
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 0f1a45b..f87e84e 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -54,14 +54,13 @@ struct ct_iter_state {
 
 static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 	struct hlist_nulls_node *n;
 
 	for (st->bucket = 0;
-	     st->bucket < net->ct.htable_size;
+	     st->bucket < nf_conntrack_htable_size;
 	     st->bucket++) {
-		n = rcu_dereference(hlist_nulls_first_rcu(&net->ct.hash[st->bucket]));
+		n = rcu_dereference(hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
 		if (!is_a_nulls(n))
 			return n;
 	}
@@ -71,18 +70,17 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
 				      struct hlist_nulls_node *head)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 
 	head = rcu_dereference(hlist_nulls_next_rcu(head));
 	while (is_a_nulls(head)) {
 		if (likely(get_nulls_value(head) == st->bucket)) {
-			if (++st->bucket >= net->ct.htable_size)
+			if (++st->bucket >= nf_conntrack_htable_size)
 				return NULL;
 		}
 		head = rcu_dereference(
 				hlist_nulls_first_rcu(
-					&net->ct.hash[st->bucket]));
+					&nf_conntrack_hash[st->bucket]));
 	}
 	return head;
 }
@@ -458,7 +456,7 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 	},
 	{
 		.procname       = "nf_conntrack_buckets",
-		.data           = &init_net.ct.htable_size,
+		.data           = &nf_conntrack_htable_size,
 		.maxlen         = sizeof(unsigned int),
 		.mode           = 0444,
 		.proc_handler   = proc_dointvec,
@@ -512,7 +510,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 		goto out_kmemdup;
 
 	table[1].data = &net->ct.count;
-	table[2].data = &net->ct.htable_size;
 	table[3].data = &net->ct.sysctl_checksum;
 	table[4].data = &net->ct.sysctl_log_invalid;
 
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 3d52271..d74e716 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -824,7 +824,7 @@ nfnetlink_parse_nat_setup(struct nf_conn *ct,
 static int __net_init nf_nat_net_init(struct net *net)
 {
 	/* Leave them the same for the moment. */
-	net->ct.nat_htable_size = net->ct.htable_size;
+	net->ct.nat_htable_size = nf_conntrack_htable_size;
 	net->ct.nat_bysource = nf_ct_alloc_hashtable(&net->ct.nat_htable_size, 0);
 	if (!net->ct.nat_bysource)
 		return -ENOMEM;
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 2671b9d..3c84f14 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -306,10 +306,10 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 	int i;
 
 	local_bh_disable();
-	for (i = 0; i < net->ct.htable_size; i++) {
+	for (i = 0; i < nf_conntrack_htable_size; i++) {
 		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
-		if (i < net->ct.htable_size) {
-			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
+		if (i < nf_conntrack_htable_size) {
+			hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
 				untimeout(h, timeout);
 		}
 		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
-- 
2.7.3

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (7 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
@ 2016-04-28 17:13 ` Florian Westphal
  2016-05-02 16:39 ` [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-28 17:13 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

When iterating, skip conntrack entries living in a different netns.

We could ignore the netns and kill some other non-assured entry, but that
has two problems:

- a netns could kill non-assured conntracks in another namespace
- we would start to 'over-subscribe' the affected/overlimit netns.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 43 +++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index d58b597..418e4bc 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -763,18 +763,20 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 {
 	/* Use oldest entry, which is roughly LRU */
 	struct nf_conntrack_tuple_hash *h;
-	struct nf_conn *ct = NULL, *tmp;
+	struct nf_conn *tmp;
 	struct hlist_nulls_node *n;
-	unsigned int i = 0, cnt = 0;
-	int dropped = 0;
-	unsigned int hash, sequence;
+	unsigned int i, hash, sequence;
+	struct nf_conn *ct = NULL;
 	spinlock_t *lockp;
+	bool ret = false;
+
+	i = 0;
 
 	local_bh_disable();
 restart:
 	sequence = read_seqcount_begin(&nf_conntrack_generation);
-	hash = scale_hash(_hash);
-	for (; i < nf_conntrack_htable_size; i++) {
+	for (; i < NF_CT_EVICTION_RANGE; i++) {
+		hash = scale_hash(_hash++);
 		lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
 		if (read_seqcount_retry(&nf_conntrack_generation, sequence)) {
@@ -784,35 +786,40 @@ restart:
 		hlist_nulls_for_each_entry_rcu(h, n, &nf_conntrack_hash[hash],
 					       hnnode) {
 			tmp = nf_ct_tuplehash_to_ctrack(h);
-			if (!test_bit(IPS_ASSURED_BIT, &tmp->status) &&
-			    !nf_ct_is_dying(tmp) &&
-			    atomic_inc_not_zero(&tmp->ct_general.use)) {
+
+			if (test_bit(IPS_ASSURED_BIT, &tmp->status) ||
+			    !net_eq(nf_ct_net(tmp), net) ||
+			    nf_ct_is_dying(tmp))
+				continue;
+
+			if (atomic_inc_not_zero(&tmp->ct_general.use)) {
 				ct = tmp;
 				break;
 			}
-			cnt++;
 		}
 
-		hash = (hash + 1) % nf_conntrack_htable_size;
 		spin_unlock(lockp);
-
-		if (ct || cnt >= NF_CT_EVICTION_RANGE)
+		if (ct)
 			break;
-
 	}
+
 	local_bh_enable();
 
 	if (!ct)
-		return dropped;
+		return false;
 
-	if (del_timer(&ct->timeout)) {
+	/* kill only if in same netns -- might have moved due to
+	 * SLAB_DESTROY_BY_RCU rules
+	 */
+	if (net_eq(nf_ct_net(ct), net) && del_timer(&ct->timeout)) {
 		if (nf_ct_delete(ct, 0, 0)) {
-			dropped = 1;
 			NF_CT_STAT_INC_ATOMIC(net, early_drop);
+			ret = true;
 		}
 	}
+
 	nf_ct_put(ct);
-	return dropped;
+	return ret;
 }
 
 static struct nf_conn *
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces
  2016-04-28 17:13 ` [PATCH nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
@ 2016-04-29 15:04   ` Florian Westphal
  0 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-04-29 15:04 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

Florian Westphal <fw@strlen.de> wrote:
> We already include netns address in the hash and compare the netns pointers
> during lookup, so even if namespaces have overlapping addresses entries
> will be spread across the table.
> 
> Assuming 64k bucket size, this change saves 0.5 mbyte per namespace on a
> 64bit system.
> 
> NAT bysrc and expectation hashes are still per namespace; those will be
> changed soon, too.
> 
> Future patch will also make conntrack object slab cache global again.
> 
> @@ -1527,7 +1528,6 @@ i_see_dead_people:
>  	}
>  
>  	list_for_each_entry(net, net_exit_list, exit_list) {
> -		nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);

Removing this is ok, but nf_ct_free_hashtable() must now be called in
nf_conntrack_cleanup_end().

I'll hold off on v2 for a couple of days.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (8 preceding siblings ...)
  2016-04-28 17:13 ` [PATCH nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic Florian Westphal
@ 2016-05-02 16:39 ` Florian Westphal
  2016-05-02 16:51   ` Eric Dumazet
  2016-05-02 16:39 ` [PATCH v2 nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-05-02 16:39 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Once we place all conntracks into a global hash table we want them to be
spread across the entire hash table, even if namespaces have overlapping ip
addresses.

We add the nf_conntrack_netns_hash helper so it can later be re-used for the
nat bysrc and expectation hash handling.  The helper also allows us to avoid
the (then pointless) hashing of init_net if the kernel is built without netns
support.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
 use hash32_ptr instead of hash_ptr

 include/net/netfilter/nf_conntrack_core.h | 10 +++++++++
 net/netfilter/nf_conntrack_core.c         | 34 +++++++++++++++----------------
 2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 62e17d1..c05ee81 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -12,6 +12,7 @@
 #ifndef _NF_CONNTRACK_CORE_H
 #define _NF_CONNTRACK_CORE_H
 
+#include <linux/hash.h>
 #include <linux/netfilter.h>
 #include <net/netfilter/nf_conntrack_l3proto.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
@@ -86,4 +87,13 @@ void nf_conntrack_lock(spinlock_t *lock);
 
 extern spinlock_t nf_conntrack_expect_lock;
 
+static inline u32 nf_conntrack_netns_hash(const struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	return hash32_ptr(net);
+#else
+	return 0;
+#endif
+}
+
 #endif /* _NF_CONNTRACK_CORE_H */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 10ae2ee..c29b929 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -144,9 +144,11 @@ EXPORT_PER_CPU_SYMBOL(nf_conntrack_untracked);
 
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
-static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple)
+static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
+			      const struct net *net)
 {
 	unsigned int n;
+	u32 seed;
 
 	get_random_once(&nf_conntrack_hash_rnd, sizeof(nf_conntrack_hash_rnd));
 
@@ -154,32 +156,29 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple)
 	 * destination ports (which is a multiple of 4) and treat the last
 	 * three bytes manually.
 	 */
+	seed = nf_conntrack_hash_rnd ^ nf_conntrack_netns_hash(net);
 	n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);
-	return jhash2((u32 *)tuple, n, nf_conntrack_hash_rnd ^
+	return jhash2((u32 *)tuple, n, seed ^
 		      (((__force __u16)tuple->dst.u.all << 16) |
 		      tuple->dst.protonum));
 }
 
-static u32 __hash_bucket(u32 hash, unsigned int size)
-{
-	return reciprocal_scale(hash, size);
-}
-
 static u32 hash_bucket(u32 hash, const struct net *net)
 {
-	return __hash_bucket(hash, net->ct.htable_size);
+	return reciprocal_scale(hash, net->ct.htable_size);
 }
 
-static u_int32_t __hash_conntrack(const struct nf_conntrack_tuple *tuple,
-				  unsigned int size)
+static u32 __hash_conntrack(const struct net *net,
+			    const struct nf_conntrack_tuple *tuple,
+			    unsigned int size)
 {
-	return __hash_bucket(hash_conntrack_raw(tuple), size);
+	return reciprocal_scale(hash_conntrack_raw(tuple, net), size);
 }
 
-static inline u_int32_t hash_conntrack(const struct net *net,
-				       const struct nf_conntrack_tuple *tuple)
+static u32 hash_conntrack(const struct net *net,
+			  const struct nf_conntrack_tuple *tuple)
 {
-	return __hash_conntrack(tuple, net->ct.htable_size);
+	return __hash_conntrack(net, tuple, net->ct.htable_size);
 }
 
 bool
@@ -535,7 +534,7 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 		      const struct nf_conntrack_tuple *tuple)
 {
 	return __nf_conntrack_find_get(net, zone, tuple,
-				       hash_conntrack_raw(tuple));
+				       hash_conntrack_raw(tuple, net));
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
 
@@ -1041,7 +1040,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 
 	/* look for tuple match */
 	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
-	hash = hash_conntrack_raw(&tuple);
+	hash = hash_conntrack_raw(&tuple, net);
 	h = __nf_conntrack_find_get(net, zone, &tuple, hash);
 	if (!h) {
 		h = init_conntrack(net, tmpl, &tuple, l3proto, l4proto,
@@ -1605,7 +1604,8 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 					struct nf_conntrack_tuple_hash, hnnode);
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			hlist_nulls_del_rcu(&h->hnnode);
-			bucket = __hash_conntrack(&h->tuple, hashsize);
+			bucket = __hash_conntrack(nf_ct_net(ct),
+						  &h->tuple, hashsize);
 			hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]);
 		}
 	}
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (9 preceding siblings ...)
  2016-05-02 16:39 ` [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
@ 2016-05-02 16:39 ` Florian Westphal
  2016-05-02 16:40 ` [PATCH v2 nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic Florian Westphal
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-05-02 16:39 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

We already include netns address in the hash and compare the netns pointers
during lookup, so even if namespaces have overlapping addresses entries
will be spread across the table.

Assuming a 64k bucket size, this change saves 0.5 MB per namespace on a
64-bit system.
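
(For reference, the arithmetic: each bucket is a single struct
hlist_nulls_head, i.e. one pointer, so 65536 buckets * 8 bytes comes to
512 kbyte of hash table memory per namespace on a 64-bit system.)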

The NAT bysrc and expectation hashes are still per namespace; those will
be changed soon, too.

A future patch will also make the conntrack object slab cache global again.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
 - call nf_ct_free_hashtable in nf_conntrack_cleanup_end().

 checkpatch complains about 'WARNING: line over 80 characters' but
 forcing line breaks looked even worse to me.

 include/net/netfilter/nf_conntrack_core.h          |  1 +
 include/net/netns/conntrack.h                      |  2 -
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c     |  2 +-
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 10 ++-
 net/netfilter/nf_conntrack_core.c                  | 80 +++++++++++-----------
 net/netfilter/nf_conntrack_helper.c                |  6 +-
 net/netfilter/nf_conntrack_netlink.c               |  8 +--
 net/netfilter/nf_conntrack_standalone.c            | 13 ++--
 net/netfilter/nf_nat_core.c                        |  2 +-
 net/netfilter/nfnetlink_cttimeout.c                |  6 +-
 10 files changed, 62 insertions(+), 68 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index c05ee81..9ccd9c0 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -82,6 +82,7 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
 
 #define CONNTRACK_LOCKS 1024
 
+extern struct hlist_nulls_head *nf_conntrack_hash;
 extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
 void nf_conntrack_lock(spinlock_t *lock);
 
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index b052785..251c435 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -93,9 +93,7 @@ struct netns_ct {
 	int			sysctl_tstamp;
 	int			sysctl_checksum;
 
-	unsigned int		htable_size;
 	struct kmem_cache	*nf_conntrack_cachep;
-	struct hlist_nulls_head	*hash;
 	struct hlist_head	*expect_hash;
 	struct ct_pcpu __percpu *pcpu_lists;
 	struct ip_conntrack_stat __percpu *stat;
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index e3c46e8..ae1a71a 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -360,7 +360,7 @@ static int ipv4_init_net(struct net *net)
 
 	in->ctl_table[0].data = &nf_conntrack_max;
 	in->ctl_table[1].data = &net->ct.count;
-	in->ctl_table[2].data = &net->ct.htable_size;
+	in->ctl_table[2].data = &nf_conntrack_htable_size;
 	in->ctl_table[3].data = &net->ct.sysctl_checksum;
 	in->ctl_table[4].data = &net->ct.sysctl_log_invalid;
 #endif
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index 171aba1..f8fc7ab 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -31,15 +31,14 @@ struct ct_iter_state {
 
 static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 	struct hlist_nulls_node *n;
 
 	for (st->bucket = 0;
-	     st->bucket < net->ct.htable_size;
+	     st->bucket < nf_conntrack_htable_size;
 	     st->bucket++) {
 		n = rcu_dereference(
-			hlist_nulls_first_rcu(&net->ct.hash[st->bucket]));
+			hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
 		if (!is_a_nulls(n))
 			return n;
 	}
@@ -49,17 +48,16 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
 				      struct hlist_nulls_node *head)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 
 	head = rcu_dereference(hlist_nulls_next_rcu(head));
 	while (is_a_nulls(head)) {
 		if (likely(get_nulls_value(head) == st->bucket)) {
-			if (++st->bucket >= net->ct.htable_size)
+			if (++st->bucket >= nf_conntrack_htable_size)
 				return NULL;
 		}
 		head = rcu_dereference(
-			hlist_nulls_first_rcu(&net->ct.hash[st->bucket]));
+			hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
 	}
 	return head;
 }
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index c29b929..9091d48 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -68,6 +68,9 @@ EXPORT_SYMBOL_GPL(nf_conntrack_locks);
 __cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
 EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
 
+struct hlist_nulls_head *nf_conntrack_hash __read_mostly;
+EXPORT_SYMBOL_GPL(nf_conntrack_hash);
+
 static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
 static __read_mostly seqcount_t nf_conntrack_generation;
 static __read_mostly bool nf_conntrack_locks_all;
@@ -163,9 +166,9 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
 		      tuple->dst.protonum));
 }
 
-static u32 hash_bucket(u32 hash, const struct net *net)
+static u32 scale_hash(u32 hash)
 {
-	return reciprocal_scale(hash, net->ct.htable_size);
+	return reciprocal_scale(hash, nf_conntrack_htable_size);
 }
 
 static u32 __hash_conntrack(const struct net *net,
@@ -178,7 +181,7 @@ static u32 __hash_conntrack(const struct net *net,
 static u32 hash_conntrack(const struct net *net,
 			  const struct nf_conntrack_tuple *tuple)
 {
-	return __hash_conntrack(net, tuple, net->ct.htable_size);
+	return scale_hash(hash_conntrack_raw(tuple, net));
 }
 
 bool
@@ -477,8 +480,8 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 begin:
 	do {
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
-		bucket = hash_bucket(hash, net);
-		ct_hash = net->ct.hash;
+		bucket = scale_hash(hash);
+		ct_hash = nf_conntrack_hash;
 	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
@@ -542,12 +545,10 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
 				       unsigned int hash,
 				       unsigned int reply_hash)
 {
-	struct net *net = nf_ct_net(ct);
-
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
-			   &net->ct.hash[hash]);
+			   &nf_conntrack_hash[hash]);
 	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnnode,
-			   &net->ct.hash[reply_hash]);
+			   &nf_conntrack_hash[reply_hash]);
 }
 
 int
@@ -572,12 +573,12 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
 	} while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));
 
 	/* See if there's one in the list already, including reverse */
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
 				    zone, net))
 			goto out;
 
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
 				    zone, net))
 			goto out;
@@ -632,7 +633,7 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
 		/* reuse the hash saved before */
 		hash = *(unsigned long *)&ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev;
-		hash = hash_bucket(hash, net);
+		hash = scale_hash(hash);
 		reply_hash = hash_conntrack(net,
 					   &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
@@ -662,12 +663,12 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 	/* See if there's one in the list already, including reverse:
 	   NAT could have grabbed it without realizing, since we're
 	   not in the hash.  If there is, we lost race. */
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
 				    zone, net))
 			goto out;
 
-	hlist_nulls_for_each_entry(h, n, &net->ct.hash[reply_hash], hnnode)
+	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[reply_hash], hnnode)
 		if (nf_ct_key_equal(h, &ct->tuplehash[IP_CT_DIR_REPLY].tuple,
 				    zone, net))
 			goto out;
@@ -735,7 +736,7 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	do {
 		sequence = read_seqcount_begin(&nf_conntrack_generation);
 		hash = hash_conntrack(net, tuple);
-		ct_hash = net->ct.hash;
+		ct_hash = nf_conntrack_hash;
 	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
 
 	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
@@ -772,16 +773,16 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 	local_bh_disable();
 restart:
 	sequence = read_seqcount_begin(&nf_conntrack_generation);
-	hash = hash_bucket(_hash, net);
-	for (; i < net->ct.htable_size; i++) {
+	hash = scale_hash(_hash);
+	for (; i < nf_conntrack_htable_size; i++) {
 		lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
 		if (read_seqcount_retry(&nf_conntrack_generation, sequence)) {
 			spin_unlock(lockp);
 			goto restart;
 		}
-		hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash],
-					 hnnode) {
+		hlist_nulls_for_each_entry_rcu(h, n, &nf_conntrack_hash[hash],
+					       hnnode) {
 			tmp = nf_ct_tuplehash_to_ctrack(h);
 			if (!test_bit(IPS_ASSURED_BIT, &tmp->status) &&
 			    !nf_ct_is_dying(tmp) &&
@@ -792,7 +793,7 @@ restart:
 			cnt++;
 		}
 
-		hash = (hash + 1) % net->ct.htable_size;
+		hash = (hash + 1) % nf_conntrack_htable_size;
 		spin_unlock(lockp);
 
 		if (ct || cnt >= NF_CT_EVICTION_RANGE)
@@ -1375,12 +1376,12 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
 	int cpu;
 	spinlock_t *lockp;
 
-	for (; *bucket < net->ct.htable_size; (*bucket)++) {
+	for (; *bucket < nf_conntrack_htable_size; (*bucket)++) {
 		lockp = &nf_conntrack_locks[*bucket % CONNTRACK_LOCKS];
 		local_bh_disable();
 		nf_conntrack_lock(lockp);
-		if (*bucket < net->ct.htable_size) {
-			hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
+		if (*bucket < nf_conntrack_htable_size) {
+			hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[*bucket], hnnode) {
 				if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 					continue;
 				ct = nf_ct_tuplehash_to_ctrack(h);
@@ -1477,6 +1478,8 @@ void nf_conntrack_cleanup_end(void)
 	while (untrack_refs() > 0)
 		schedule();
 
+	nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
+
 #ifdef CONFIG_NF_CONNTRACK_ZONES
 	nf_ct_extend_unregister(&nf_ct_zone_extend);
 #endif
@@ -1527,7 +1530,6 @@ i_see_dead_people:
 	}
 
 	list_for_each_entry(net, net_exit_list, exit_list) {
-		nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
 		nf_conntrack_proto_pernet_fini(net);
 		nf_conntrack_helper_pernet_fini(net);
 		nf_conntrack_ecache_pernet_fini(net);
@@ -1598,10 +1600,10 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	 * though since that required taking the locks.
 	 */
 
-	for (i = 0; i < init_net.ct.htable_size; i++) {
-		while (!hlist_nulls_empty(&init_net.ct.hash[i])) {
-			h = hlist_nulls_entry(init_net.ct.hash[i].first,
-					struct nf_conntrack_tuple_hash, hnnode);
+	for (i = 0; i < nf_conntrack_htable_size; i++) {
+		while (!hlist_nulls_empty(&nf_conntrack_hash[i])) {
+			h = hlist_nulls_entry(nf_conntrack_hash[i].first,
+					      struct nf_conntrack_tuple_hash, hnnode);
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			hlist_nulls_del_rcu(&h->hnnode);
 			bucket = __hash_conntrack(nf_ct_net(ct),
@@ -1609,11 +1611,11 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 			hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]);
 		}
 	}
-	old_size = init_net.ct.htable_size;
-	old_hash = init_net.ct.hash;
+	old_size = nf_conntrack_htable_size;
+	old_hash = nf_conntrack_hash;
 
-	init_net.ct.htable_size = nf_conntrack_htable_size = hashsize;
-	init_net.ct.hash = hash;
+	nf_conntrack_hash = hash;
+	nf_conntrack_htable_size = hashsize;
 
 	write_seqcount_end(&nf_conntrack_generation);
 	nf_conntrack_all_unlock();
@@ -1669,6 +1671,11 @@ int nf_conntrack_init_start(void)
 		 * entries. */
 		max_factor = 4;
 	}
+
+	nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);
+	if (!nf_conntrack_hash)
+		return -ENOMEM;
+
 	nf_conntrack_max = max_factor * nf_conntrack_htable_size;
 
 	printk(KERN_INFO "nf_conntrack version %s (%u buckets, %d max)\n",
@@ -1747,6 +1754,7 @@ err_tstamp:
 err_acct:
 	nf_conntrack_expect_fini();
 err_expect:
+	nf_ct_free_hashtable(nf_conntrack_hash, nf_conntrack_htable_size);
 	return ret;
 }
 
@@ -1799,12 +1807,6 @@ int nf_conntrack_init_net(struct net *net)
 		goto err_cache;
 	}
 
-	net->ct.htable_size = nf_conntrack_htable_size;
-	net->ct.hash = nf_ct_alloc_hashtable(&net->ct.htable_size, 1);
-	if (!net->ct.hash) {
-		printk(KERN_ERR "Unable to create nf_conntrack_hash\n");
-		goto err_hash;
-	}
 	ret = nf_conntrack_expect_pernet_init(net);
 	if (ret < 0)
 		goto err_expect;
@@ -1836,8 +1838,6 @@ err_tstamp:
 err_acct:
 	nf_conntrack_expect_pernet_fini(net);
 err_expect:
-	nf_ct_free_hashtable(net->ct.hash, net->ct.htable_size);
-err_hash:
 	kmem_cache_destroy(net->ct.nf_conntrack_cachep);
 err_cache:
 	kfree(net->ct.slabname);
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index 498bf74..cb48e6a 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -424,10 +424,10 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
 		spin_unlock_bh(&pcpu->lock);
 	}
 	local_bh_disable();
-	for (i = 0; i < net->ct.htable_size; i++) {
+	for (i = 0; i < nf_conntrack_htable_size; i++) {
 		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
-		if (i < net->ct.htable_size) {
-			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
+		if (i < nf_conntrack_htable_size) {
+			hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
 				unhelp(h, me);
 		}
 		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index f6bbcb2..e00f178 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -824,16 +824,16 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 	last = (struct nf_conn *)cb->args[1];
 
 	local_bh_disable();
-	for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
+	for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
 restart:
 		lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
-		if (cb->args[0] >= net->ct.htable_size) {
+		if (cb->args[0] >= nf_conntrack_htable_size) {
 			spin_unlock(lockp);
 			goto out;
 		}
-		hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]],
-					 hnnode) {
+		hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
+					   hnnode) {
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 0f1a45b..f87e84e 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -54,14 +54,13 @@ struct ct_iter_state {
 
 static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 	struct hlist_nulls_node *n;
 
 	for (st->bucket = 0;
-	     st->bucket < net->ct.htable_size;
+	     st->bucket < nf_conntrack_htable_size;
 	     st->bucket++) {
-		n = rcu_dereference(hlist_nulls_first_rcu(&net->ct.hash[st->bucket]));
+		n = rcu_dereference(hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket]));
 		if (!is_a_nulls(n))
 			return n;
 	}
@@ -71,18 +70,17 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq)
 static struct hlist_nulls_node *ct_get_next(struct seq_file *seq,
 				      struct hlist_nulls_node *head)
 {
-	struct net *net = seq_file_net(seq);
 	struct ct_iter_state *st = seq->private;
 
 	head = rcu_dereference(hlist_nulls_next_rcu(head));
 	while (is_a_nulls(head)) {
 		if (likely(get_nulls_value(head) == st->bucket)) {
-			if (++st->bucket >= net->ct.htable_size)
+			if (++st->bucket >= nf_conntrack_htable_size)
 				return NULL;
 		}
 		head = rcu_dereference(
 				hlist_nulls_first_rcu(
-					&net->ct.hash[st->bucket]));
+					&nf_conntrack_hash[st->bucket]));
 	}
 	return head;
 }
@@ -458,7 +456,7 @@ static struct ctl_table nf_ct_sysctl_table[] = {
 	},
 	{
 		.procname       = "nf_conntrack_buckets",
-		.data           = &init_net.ct.htable_size,
+		.data           = &nf_conntrack_htable_size,
 		.maxlen         = sizeof(unsigned int),
 		.mode           = 0444,
 		.proc_handler   = proc_dointvec,
@@ -512,7 +510,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 		goto out_kmemdup;
 
 	table[1].data = &net->ct.count;
-	table[2].data = &net->ct.htable_size;
 	table[3].data = &net->ct.sysctl_checksum;
 	table[4].data = &net->ct.sysctl_log_invalid;
 
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 3d52271..d74e716 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -824,7 +824,7 @@ nfnetlink_parse_nat_setup(struct nf_conn *ct,
 static int __net_init nf_nat_net_init(struct net *net)
 {
 	/* Leave them the same for the moment. */
-	net->ct.nat_htable_size = net->ct.htable_size;
+	net->ct.nat_htable_size = nf_conntrack_htable_size;
 	net->ct.nat_bysource = nf_ct_alloc_hashtable(&net->ct.nat_htable_size, 0);
 	if (!net->ct.nat_bysource)
 		return -ENOMEM;
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 2671b9d..3c84f14 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -306,10 +306,10 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
 	int i;
 
 	local_bh_disable();
-	for (i = 0; i < net->ct.htable_size; i++) {
+	for (i = 0; i < nf_conntrack_htable_size; i++) {
 		nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
-		if (i < net->ct.htable_size) {
-			hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
+		if (i < nf_conntrack_htable_size) {
+			hlist_nulls_for_each_entry(h, nn, &nf_conntrack_hash[i], hnnode)
 				untimeout(h, timeout);
 		}
 		spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (10 preceding siblings ...)
  2016-05-02 16:39 ` [PATCH v2 nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
@ 2016-05-02 16:40 ` Florian Westphal
  2016-05-02 22:25 ` [PATCH v3 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-05-02 16:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

When iterating, skip conntrack entries living in a different netns.

We could ignore netns and kill some other non-assured one, but it
has two problems:

- a netns can kill non-assured conntracks in other namespace
- we would start to 'over-subscribe' the affected/overlimit netns.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_core.c | 43 +++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 9091d48..c02e935 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -763,18 +763,20 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 {
 	/* Use oldest entry, which is roughly LRU */
 	struct nf_conntrack_tuple_hash *h;
-	struct nf_conn *ct = NULL, *tmp;
+	struct nf_conn *tmp;
 	struct hlist_nulls_node *n;
-	unsigned int i = 0, cnt = 0;
-	int dropped = 0;
-	unsigned int hash, sequence;
+	unsigned int i, hash, sequence;
+	struct nf_conn *ct = NULL;
 	spinlock_t *lockp;
+	bool ret = false;
+
+	i = 0;
 
 	local_bh_disable();
 restart:
 	sequence = read_seqcount_begin(&nf_conntrack_generation);
-	hash = scale_hash(_hash);
-	for (; i < nf_conntrack_htable_size; i++) {
+	for (; i < NF_CT_EVICTION_RANGE; i++) {
+		hash = scale_hash(_hash++);
 		lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS];
 		nf_conntrack_lock(lockp);
 		if (read_seqcount_retry(&nf_conntrack_generation, sequence)) {
@@ -784,35 +786,40 @@ restart:
 		hlist_nulls_for_each_entry_rcu(h, n, &nf_conntrack_hash[hash],
 					       hnnode) {
 			tmp = nf_ct_tuplehash_to_ctrack(h);
-			if (!test_bit(IPS_ASSURED_BIT, &tmp->status) &&
-			    !nf_ct_is_dying(tmp) &&
-			    atomic_inc_not_zero(&tmp->ct_general.use)) {
+
+			if (test_bit(IPS_ASSURED_BIT, &tmp->status) ||
+			    !net_eq(nf_ct_net(tmp), net) ||
+			    nf_ct_is_dying(tmp))
+				continue;
+
+			if (atomic_inc_not_zero(&tmp->ct_general.use)) {
 				ct = tmp;
 				break;
 			}
-			cnt++;
 		}
 
-		hash = (hash + 1) % nf_conntrack_htable_size;
 		spin_unlock(lockp);
-
-		if (ct || cnt >= NF_CT_EVICTION_RANGE)
+		if (ct)
 			break;
-
 	}
+
 	local_bh_enable();
 
 	if (!ct)
-		return dropped;
+		return false;
 
-	if (del_timer(&ct->timeout)) {
+	/* kill only if in same netns -- might have moved due to
+	 * SLAB_DESTROY_BY_RCU rules
+	 */
+	if (net_eq(nf_ct_net(ct), net) && del_timer(&ct->timeout)) {
 		if (nf_ct_delete(ct, 0, 0)) {
-			dropped = 1;
 			NF_CT_STAT_INC_ATOMIC(net, early_drop);
+			ret = true;
 		}
 	}
+
 	nf_ct_put(ct);
-	return dropped;
+	return ret;
 }
 
 static struct nf_conn *
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash
  2016-05-02 16:39 ` [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
@ 2016-05-02 16:51   ` Eric Dumazet
  2016-05-02 21:52     ` Florian Westphal
  0 siblings, 1 reply; 33+ messages in thread
From: Eric Dumazet @ 2016-05-02 16:51 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Mon, 2016-05-02 at 18:39 +0200, Florian Westphal wrote:
> Once we place all conntracks into a global hash table we want them to be
> spread across entire hash table, even if namespaces have overlapping ip
> addresses.
>  
> +static inline u32 nf_conntrack_netns_hash(const struct net *net)
> +{
> +#ifdef CONFIG_NET_NS
> +	return hash32_ptr(net);
> +#else
> +	return 0;
> +#endif
> +}
> +

Are you reinventing net_hash_mix() ?

If net_hash_mix() is not good enough, please fix it ;)

hash_ptr() is not that good, as ongoing thread in lkml shows.

Thanks.



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash
  2016-05-02 16:51   ` Eric Dumazet
@ 2016-05-02 21:52     ` Florian Westphal
  0 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-05-02 21:52 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-05-02 at 18:39 +0200, Florian Westphal wrote:
> > +static inline u32 nf_conntrack_netns_hash(const struct net *net)
> > +{
> > +#ifdef CONFIG_NET_NS
> > +	return hash32_ptr(net);
> > +#else
> > +	return 0;
> > +#endif
> > +}
> > +
> 
> Are you reinventing net_hash_mix() ?

Yes, will respin, thanks Eric!

> If net_hash_mix() is not good enough, please fix it ;)

No, I did not know about it, thats all.  It will do just fine, thanks
for the hint!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3 nf-next 7/9] netfilter: conntrack: make netns address part of hash
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (11 preceding siblings ...)
  2016-05-02 16:40 ` [PATCH v2 nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic Florian Westphal
@ 2016-05-02 22:25 ` Florian Westphal
  2016-05-03 22:30 ` [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 33+ messages in thread
From: Florian Westphal @ 2016-05-02 22:25 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal

Once we place all conntracks into a global hash table we want them to be
spread across the entire hash table, even if namespaces have overlapping ip
addresses.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v2:
 use net_hash_mix() instead of re-inventing it (Eric Dumazet)
 Changes since v1:
 use hash32_ptr instead of hash_ptr
 net/netfilter/nf_conntrack_core.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 10ae2ee..ebafa77 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -54,6 +54,7 @@
 #include <net/netfilter/nf_nat.h>
 #include <net/netfilter/nf_nat_core.h>
 #include <net/netfilter/nf_nat_helper.h>
+#include <net/netns/hash.h>
 
 #define NF_CONNTRACK_VERSION	"0.5.0"
 
@@ -144,9 +145,11 @@ EXPORT_PER_CPU_SYMBOL(nf_conntrack_untracked);
 
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
-static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple)
+static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
+			      const struct net *net)
 {
 	unsigned int n;
+	u32 seed;
 
 	get_random_once(&nf_conntrack_hash_rnd, sizeof(nf_conntrack_hash_rnd));
 
@@ -154,32 +157,29 @@ static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple)
 	 * destination ports (which is a multiple of 4) and treat the last
 	 * three bytes manually.
 	 */
+	seed = nf_conntrack_hash_rnd ^ net_hash_mix(net);
 	n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);
-	return jhash2((u32 *)tuple, n, nf_conntrack_hash_rnd ^
+	return jhash2((u32 *)tuple, n, seed ^
 		      (((__force __u16)tuple->dst.u.all << 16) |
 		      tuple->dst.protonum));
 }
 
-static u32 __hash_bucket(u32 hash, unsigned int size)
-{
-	return reciprocal_scale(hash, size);
-}
-
 static u32 hash_bucket(u32 hash, const struct net *net)
 {
-	return __hash_bucket(hash, net->ct.htable_size);
+	return reciprocal_scale(hash, net->ct.htable_size);
 }
 
-static u_int32_t __hash_conntrack(const struct nf_conntrack_tuple *tuple,
-				  unsigned int size)
+static u32 __hash_conntrack(const struct net *net,
+			    const struct nf_conntrack_tuple *tuple,
+			    unsigned int size)
 {
-	return __hash_bucket(hash_conntrack_raw(tuple), size);
+	return reciprocal_scale(hash_conntrack_raw(tuple, net), size);
 }
 
-static inline u_int32_t hash_conntrack(const struct net *net,
-				       const struct nf_conntrack_tuple *tuple)
+static u32 hash_conntrack(const struct net *net,
+			  const struct nf_conntrack_tuple *tuple)
 {
-	return __hash_conntrack(tuple, net->ct.htable_size);
+	return __hash_conntrack(net, tuple, net->ct.htable_size);
 }
 
 bool
@@ -535,7 +535,7 @@ nf_conntrack_find_get(struct net *net, const struct nf_conntrack_zone *zone,
 		      const struct nf_conntrack_tuple *tuple)
 {
 	return __nf_conntrack_find_get(net, zone, tuple,
-				       hash_conntrack_raw(tuple));
+				       hash_conntrack_raw(tuple, net));
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_find_get);
 
@@ -1041,7 +1041,7 @@ resolve_normal_ct(struct net *net, struct nf_conn *tmpl,
 
 	/* look for tuple match */
 	zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
-	hash = hash_conntrack_raw(&tuple);
+	hash = hash_conntrack_raw(&tuple, net);
 	h = __nf_conntrack_find_get(net, zone, &tuple, hash);
 	if (!h) {
 		h = init_conntrack(net, tmpl, &tuple, l3proto, l4proto,
@@ -1605,7 +1605,8 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 					struct nf_conntrack_tuple_hash, hnnode);
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			hlist_nulls_del_rcu(&h->hnnode);
-			bucket = __hash_conntrack(&h->tuple, hashsize);
+			bucket = __hash_conntrack(nf_ct_net(ct),
+						  &h->tuple, hashsize);
 			hlist_nulls_add_head_rcu(&h->hnnode, &hash[bucket]);
 		}
 	}
-- 
2.7.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table
  2016-04-28 17:13 ` [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table Florian Westphal
@ 2016-05-03 17:03   ` Pablo Neira Ayuso
  2016-05-03 17:17     ` Florian Westphal
  0 siblings, 1 reply; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-03 17:03 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Thu, Apr 28, 2016 at 07:13:42PM +0200, Florian Westphal wrote:
> Once we place all conntracks into same table iteration becomes more
> costly because the table contains conntracks that we are not interested
> in (belonging to other netns).
> 
> So don't bother scanning if the current namespace has no entries.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/nf_conntrack_core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 29fa08b..f2e75a5 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1428,6 +1428,9 @@ void nf_ct_iterate_cleanup(struct net *net,
>  
>  	might_sleep();
>  
> +	if (atomic_read(&net->ct.count) == 0)
> +		return;

This optimization gets defeated with just one single conntrack (ie.
net->ct.count == 1), so I wonder if this is a practical thing.

At the cost of consuming more memory per conntrack, we may consider
adding a per-net list so this iteration doesn't become a problem.

>  	while ((ct = get_next_corpse(net, iter, data, &bucket)) != NULL) {
>  		/* Time to push up daises... */
>  		if (del_timer(&ct->timeout))
> -- 
> 2.7.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table
  2016-05-03 17:03   ` Pablo Neira Ayuso
@ 2016-05-03 17:17     ` Florian Westphal
  2016-05-03 17:41       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-05-03 17:17 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, netdev

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Thu, Apr 28, 2016 at 07:13:42PM +0200, Florian Westphal wrote:
> > Once we place all conntracks into same table iteration becomes more
> > costly because the table contains conntracks that we are not interested
> > in (belonging to other netns).
> > 
> > So don't bother scanning if the current namespace has no entries.
> > 
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> >  net/netfilter/nf_conntrack_core.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 29fa08b..f2e75a5 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1428,6 +1428,9 @@ void nf_ct_iterate_cleanup(struct net *net,
> >  
> >  	might_sleep();
> >  
> > +	if (atomic_read(&net->ct.count) == 0)
> > +		return;
> 
> This optimization gets defeated with just one single conntrack (ie.
> net->ct.count == 1), so I wonder if this is practical thing.

I was thinking of the cleanup we do in the netns exit path
(in nf_conntrack_cleanup_net_list() ).

If you don't like this I can move the check here:

i_see_dead_people:
    busy = 0;
    list_for_each_entry(net, net_exit_list, exit_list) {
    // here
    if (atomic_read .. > 0)
       nf_ct_iterate_cleanup(net, kill_all, ...
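
Spelled out a little more (still just a sketch; the kill_all iterator and
the trailing arguments are assumed to stay as in the current cleanup path):

i_see_dead_people:
	busy = 0;
	list_for_each_entry(net, net_exit_list, exit_list) {
		/* no conntrack entries in this netns, skip the table walk */
		if (atomic_read(&net->ct.count) > 0)
			nf_ct_iterate_cleanup(net, kill_all, NULL, 0, 0);
		...
	}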

> At the cost of consuming more memory per conntrack, we may consider
> adding a per-net list so this iteration doesn't become a problem.

I don't think that will be needed.   We don't have any such iterations
in the fast path.

For dumps via ctnetlink it shouldn't be a big deal either, if needed
we can optimize that to use rcu read locks only and 'upgrade' to the
locked path only when we want to dump the candidate ct.

early_drop will go away soon (i'll rework it to do the early_drop from a
work queue, for deferred pruning).

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table
  2016-05-03 17:17     ` Florian Westphal
@ 2016-05-03 17:41       ` Pablo Neira Ayuso
  2016-05-03 17:55         ` Florian Westphal
  0 siblings, 1 reply; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-03 17:41 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Tue, May 03, 2016 at 07:17:44PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Thu, Apr 28, 2016 at 07:13:42PM +0200, Florian Westphal wrote:
> > > Once we place all conntracks into same table iteration becomes more
> > > costly because the table contains conntracks that we are not interested
> > > in (belonging to other netns).
> > > 
> > > So don't bother scanning if the current namespace has no entries.
> > > 
> > > Signed-off-by: Florian Westphal <fw@strlen.de>
> > > ---
> > >  net/netfilter/nf_conntrack_core.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 29fa08b..f2e75a5 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -1428,6 +1428,9 @@ void nf_ct_iterate_cleanup(struct net *net,
> > >  
> > >  	might_sleep();
> > >  
> > > +	if (atomic_read(&net->ct.count) == 0)
> > > +		return;
> > 
> > This optimization gets defeated with just one single conntrack (ie.
> > net->ct.count == 1), so I wonder if this is practical thing.
> 
> I was thinking of the cleanup we do in the netns exit path
> (in nf_conntrack_cleanup_net_list() ).

Right, but in that path we still have entries in the table.

> If you don't like this I can move the check here:
> 
> i_see_dead_people:
>     busy = 0;
>     list_for_each_entry(net, net_exit_list, exit_list) {
>     // here
>     if (atomic_read .. > 0)
>        nf_ct_iterate_cleanup(net, kill_all, ...

I don't mind placing this here or there; as I said, my question is
how often we will hit this optimization in a real scenario.

If you think the answer is often, then this will help.

Otherwise, every time we go down the container destruction path, we'll hit
the slow path, i.e. scanning the full table.

> > At the cost of consuming more memory per conntrack, we may consider
> > adding a per-net list so this iteration doesn't become a problem.
> 
> I don't think that will be needed.   We don't have any such iterations
> in the fast path.
>
> For dumps via ctnetlink it shouldn't be a big deal either, if needed
> we can optimize that to use rcu readlocks only and 'upgrade' to locked
> path only when we want to dump the candidate ct.
> for deferred pruning).
> early_drop will go away soon (i'll rework it to do the early_drop from
> work queue).

OK.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table
  2016-05-03 17:41       ` Pablo Neira Ayuso
@ 2016-05-03 17:55         ` Florian Westphal
  2016-05-03 22:27           ` Pablo Neira Ayuso
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-05-03 17:55 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, netdev

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I was thinking of the cleanup we do in the netns exit path
> > (in nf_conntrack_cleanup_net_list() ).
> 
> Right, but in that path we still have entries in the table.

Not necessarily, they might have already been removed
(timeout, close).

> > If you don't like this I can move the check here:
> > 
> > i_see_dead_people:
> >     busy = 0;
> >     list_for_each_entry(net, net_exit_list, exit_list) {
> >     // here
> >     if (atomic_read .. > 0)
> >        nf_ct_iterate_cleanup(net, kill_all, ...
> 
> I don't mind about placing this or there, as I said, my question is
> how often we will hit this optimization in a real scenario.
> 
> If you think the answer is often, then this will help.

I think the extra atomic_read in this code does no harm and
saves us the entire scan.  Also, in the exit path, when we hit the
'i_see_dead_people' label we restart the entire loop, so if we
have 200 netns on the list and the last one caused that restart,
we re-iterate needlessly for 199 netns...

> Otherwise, every time we'll go container destruction path, we'll hit
> slow path, ie.  scanning the full table.

Yes, but I see no other choice.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf
  2016-04-28 17:13 ` [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf Florian Westphal
@ 2016-05-03 18:12   ` Pablo Neira Ayuso
  2016-05-03 22:27     ` Florian Westphal
  2016-05-03 22:28     ` Pablo Neira Ayuso
  0 siblings, 2 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-03 18:12 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Thu, Apr 28, 2016 at 07:13:44PM +0200, Florian Westphal wrote:
> The iteration process is lockless, so we test if the conntrack object is
> eligible for printing (e.g. is AF_INET) after obtaining the reference
> count.
> 
> Once we put all conntracks into same hash table we might see more
> entries that need to be skipped.
> 
> So add a helper and first perform the test in a lockless fashion
> for fast skip.
> 
> Once we obtain the reference count, just repeat the check.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 24 +++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> index f0dfe92..483cf79 100644
> --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> @@ -114,6 +114,19 @@ static inline void ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
>  }
>  #endif
>  
> +static bool ct_seq_should_skip(const struct nf_conn *ct,
> +			       const struct nf_conntrack_tuple_hash *hash)
> +{
> +	/* we only want to print DIR_ORIGINAL */
> +	if (NF_CT_DIRECTION(hash))
> +		return true;
> +
> +	if (nf_ct_l3num(ct) != AF_INET)
> +		return true;
> +
> +	return false;
> +}
> +
>  static int ct_seq_show(struct seq_file *s, void *v)
>  {
>  	struct nf_conntrack_tuple_hash *hash = v;
> @@ -123,14 +136,15 @@ static int ct_seq_show(struct seq_file *s, void *v)
>  	int ret = 0;
>  
>  	NF_CT_ASSERT(ct);
> -	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
> +	if (ct_seq_should_skip(ct, hash))
>  		return 0;
>  
> +	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
> +		return 0;
>  
> -	/* we only want to print DIR_ORIGINAL */
> -	if (NF_CT_DIRECTION(hash))
> -		goto release;
> -	if (nf_ct_l3num(ct) != AF_INET)
> +	/* check if we raced w. object reuse */
> +	if (!nf_ct_is_confirmed(ct) ||

This refactoring includes this new check, is this intentional?

> +	    ct_seq_should_skip(ct, hash))
>  		goto release;
>  
>  	l3proto = __nf_ct_l3proto_find(nf_ct_l3num(ct));
> -- 
> 2.7.3
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf
  2016-05-03 18:12   ` Pablo Neira Ayuso
@ 2016-05-03 22:27     ` Florian Westphal
  2016-05-04  9:19       ` Pablo Neira Ayuso
  2016-05-03 22:28     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-05-03 22:27 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, netdev

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > -	if (NF_CT_DIRECTION(hash))
> > -		goto release;
> > -	if (nf_ct_l3num(ct) != AF_INET)
> > +	/* check if we raced w. object reuse */
> > +	if (!nf_ct_is_confirmed(ct) ||
> 
> This refactoring includes this new check, is this intentional?

Hmm, yes and no.

I should have put it in an extra commit :-/

Without this, we might erroneously print a conntrack that is NEW
and which isn't confirmed yet.

We won't crash since seq_print doesn't depend on extensions being
set up properly, but it seems better to only display those conntracks
that are part of the conntrack hash table (i.e., have the confirmed bit
set).

Let me know if you want me to respin this as a separate fix, thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table
  2016-05-03 17:55         ` Florian Westphal
@ 2016-05-03 22:27           ` Pablo Neira Ayuso
  0 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-03 22:27 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Tue, May 03, 2016 at 07:55:59PM +0200, Florian Westphal wrote:
> > Otherwise, every time we'll go container destruction path, we'll hit
> > slow path, ie.  scanning the full table.
> 
> Yes, but I see no other choice.

Fair enough, will place this in nf-next, thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf
  2016-05-03 18:12   ` Pablo Neira Ayuso
  2016-05-03 22:27     ` Florian Westphal
@ 2016-05-03 22:28     ` Pablo Neira Ayuso
  1 sibling, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-03 22:28 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Tue, May 03, 2016 at 08:12:50PM +0200, Pablo Neira Ayuso wrote:
> On Thu, Apr 28, 2016 at 07:13:44PM +0200, Florian Westphal wrote:
> > The iteration process is lockless, so we test if the conntrack object is
> > eligible for printing (e.g. is AF_INET) after obtaining the reference
> > count.
> > 
> > Once we put all conntracks into same hash table we might see more
> > entries that need to be skipped.
> > 
> > So add a helper and first perform the test in a lockless fashion
> > for fast skip.
> > 
> > Once we obtain the reference count, just repeat the check.
> > 
> > Signed-off-by: Florian Westphal <fw@strlen.de>
> > ---
> >  .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   | 24 +++++++++++++++++-----
> >  1 file changed, 19 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> > index f0dfe92..483cf79 100644
> > --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> > +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> > @@ -114,6 +114,19 @@ static inline void ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
> >  }
> >  #endif
> >  
> > +static bool ct_seq_should_skip(const struct nf_conn *ct,
> > +			       const struct nf_conntrack_tuple_hash *hash)
> > +{
> > +	/* we only want to print DIR_ORIGINAL */
> > +	if (NF_CT_DIRECTION(hash))
> > +		return true;
> > +
> > +	if (nf_ct_l3num(ct) != AF_INET)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> >  static int ct_seq_show(struct seq_file *s, void *v)
> >  {
> >  	struct nf_conntrack_tuple_hash *hash = v;
> > @@ -123,14 +136,15 @@ static int ct_seq_show(struct seq_file *s, void *v)
> >  	int ret = 0;
> >  
> >  	NF_CT_ASSERT(ct);
> > -	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
> > +	if (ct_seq_should_skip(ct, hash))
> >  		return 0;
> >  
> > +	if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use)))
> > +		return 0;
> >  
> > -	/* we only want to print DIR_ORIGINAL */
> > -	if (NF_CT_DIRECTION(hash))
> > -		goto release;
> > -	if (nf_ct_l3num(ct) != AF_INET)
> > +	/* check if we raced w. object reuse */
> > +	if (!nf_ct_is_confirmed(ct) ||
> 
> This refactoring includes this new check, is this intentional?

It seems this check was previously missing, I can just amend the
commit log with a couple of lines to document that this patch also
includes this missing check. No problem.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (12 preceding siblings ...)
  2016-05-02 22:25 ` [PATCH v3 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
@ 2016-05-03 22:30 ` Pablo Neira Ayuso
  2016-05-05 11:54 ` Pablo Neira Ayuso
  2016-05-05 20:27 ` Brian Haley
  15 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-03 22:30 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Thu, Apr 28, 2016 at 07:13:39PM +0200, Florian Westphal wrote:
> [ CCing netdev so netns folks can have a look too ]
> 
> This patch series removes the per-netns connection tracking tables.
> All conntrack objects are then stored in one global global table.
> 
> This avoids the infamous 'vmalloc' when lots of namespaces are used:
> We no longer allocate a new conntrack table for each namespace (with 64k
> size this saves 512kb of memory per netns).
> 
> - net namespace address is made part of conntrack hash, to spread
>   conntracks over entire table even if netns has overlapping ip addresses.
> - lookup and iterators net_eq() to skip conntracks living in a different
>   namespace.
> 
> Only the main conntrack table is converted here:
> NAT bysrc and expectation hashes are still per namespace (will be unified
> in a followup series).  Also, this retains the per-namespace kmem cache
> for the conntrack objects.  This will also be resolved in a followup series.

This rework in important, I'm going to place this batch in the tree so
you can keep working on this. Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf
  2016-05-03 22:27     ` Florian Westphal
@ 2016-05-04  9:19       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-04  9:19 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Wed, May 04, 2016 at 12:27:36AM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > -	if (NF_CT_DIRECTION(hash))
> > > -		goto release;
> > > -	if (nf_ct_l3num(ct) != AF_INET)
> > > +	/* check if we raced w. object reuse */
> > > +	if (!nf_ct_is_confirmed(ct) ||
> > 
> > This refactoring includes this new check, is this intentional?
> 
> Hmm, yes and no.
> 
> I should have put it in an extra commit :-/
> 
> Without this, we might erronously print a conntrack that is NEW
> and which isn't confirmed yet.
> 
> We won't crash since seq_print doesn't depend on extensions being
> set up properly, but it seems better to only display those conntracks
> that are part of the conntrack hash table (i.e., have the confirmed bit
> set).

I see, a conntrack that shouldn't be printed can sneak into the listing.

> Let me know if you want me to respin this as a separate fix, thanks!

I will just append a note to the commit message before applying.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (13 preceding siblings ...)
  2016-05-03 22:30 ` [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Pablo Neira Ayuso
@ 2016-05-05 11:54 ` Pablo Neira Ayuso
  2016-05-05 20:27 ` Brian Haley
  15 siblings, 0 replies; 33+ messages in thread
From: Pablo Neira Ayuso @ 2016-05-05 11:54 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On Thu, Apr 28, 2016 at 07:13:39PM +0200, Florian Westphal wrote:
> [ CCing netdev so netns folks can have a look too ]
> 
> This patch series removes the per-netns connection tracking tables.
> All conntrack objects are then stored in one global global table.
> 
> This avoids the infamous 'vmalloc' when lots of namespaces are used:
> We no longer allocate a new conntrack table for each namespace (with 64k
> size this saves 512kb of memory per netns).
> 
> - net namespace address is made part of conntrack hash, to spread
>   conntracks over entire table even if netns has overlapping ip addresses.
> - lookup and iterators net_eq() to skip conntracks living in a different
>   namespace.
> 
> Only the main conntrack table is converted here:
> NAT bysrc and expectation hashes are still per namespace (will be unified
> in a followup series).  Also, this retains the per-namespace kmem cache
> for the conntrack objects.  This will also be resolved in a followup series.

Series applied, thanks Florian.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
                   ` (14 preceding siblings ...)
  2016-05-05 11:54 ` Pablo Neira Ayuso
@ 2016-05-05 20:27 ` Brian Haley
  2016-05-05 20:54   ` Florian Westphal
  15 siblings, 1 reply; 33+ messages in thread
From: Brian Haley @ 2016-05-05 20:27 UTC (permalink / raw)
  To: Florian Westphal, netfilter-devel; +Cc: netdev

On 04/28/2016 01:13 PM, Florian Westphal wrote:
> [ CCing netdev so netns folks can have a look too ]
>
> This patch series removes the per-netns connection tracking tables.
> All conntrack objects are then stored in one global global table.
>
> This avoids the infamous 'vmalloc' when lots of namespaces are used:
> We no longer allocate a new conntrack table for each namespace (with 64k
> size this saves 512kb of memory per netns).
>
> - net namespace address is made part of conntrack hash, to spread
>    conntracks over entire table even if netns has overlapping ip addresses.
> - lookup and iterators net_eq() to skip conntracks living in a different
>    namespace.

Hi Florian,

Question on this series.

Openstack networking creates virtual routers using namespaces for isolation 
between users.  VETH pairs are used to connect the interfaces on these routers 
to different networks, whether they are internal (private) or external (public). 
  In most cases NAT is done inside the namespace as packets move between the 
networks.

I've seen cases where certain users are attacked, where the CT table is filled 
such that we start seeing "nf_conntrack: table full, dropping packet" messages 
(as expected).  But other users continue to function normally, unaffected.  Is 
this still the case - each netns has some limit it can't exceed?  I didn't see 
it, but your comment in 9/9 seemed like something was there -  "we would start 
to 'over-subscribe' the affected/overlimit netns".

Thanks,

-Brian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-05-05 20:27 ` Brian Haley
@ 2016-05-05 20:54   ` Florian Westphal
  2016-05-05 22:22     ` Brian Haley
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-05-05 20:54 UTC (permalink / raw)
  To: Brian Haley; +Cc: Florian Westphal, netfilter-devel, netdev

Brian Haley <brian.haley@hpe.com> wrote:
> Openstack networking creates virtual routers using namespaces for isolation
> between users.  VETH pairs are used to connect the interfaces on these
> routers to different networks, whether they are internal (private) or
> external (public).  In most cases NAT is done inside the namespace as
> packets move between the networks.
> 
> I've seen cases where certain users are attacked, where the CT table is
> filled such that we start seeing "nf_conntrack: table full, dropping packet"
> messages (as expected).  But other users continue to function normally,
> unaffected.  Is this still the case - each netns has some limit it can't
> exceed?

The limit is global, the accounting per namespace.

If the bucket count (net.netfilter.nf_conntrack_buckets) is high enough
to accommodate the expected load and no one can create an arbitrary number
of net namespaces, things are fine.
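
(For reference, the global table can also be grown at runtime by writing a
new bucket count to /sys/module/nf_conntrack/parameters/hashsize, which is
handled by nf_conntrack_set_hashsize(); the right value depends on the
expected total number of entries across all namespaces.)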

I haven't changed the way this works yet because I did not have a better
idea so far.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-05-05 20:54   ` Florian Westphal
@ 2016-05-05 22:22     ` Brian Haley
  2016-05-05 22:36       ` Florian Westphal
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Haley @ 2016-05-05 22:22 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On 05/05/2016 04:54 PM, Florian Westphal wrote:
> Brian Haley <brian.haley@hpe.com> wrote:
>> Openstack networking creates virtual routers using namespaces for isolation
>> between users.  VETH pairs are used to connect the interfaces on these
>> routers to different networks, whether they are internal (private) or
>> external (public).  In most cases NAT is done inside the namespace as
>> packets move between the networks.
>>
>> I've seen cases where certain users are attacked, where the CT table is
>> filled such that we start seeing "nf_conntrack: table full, dropping packet"
>> messages (as expected).  But other users continue to function normally,
>> unaffected.  Is this still the case - each netns has some limit it can't
>> exceed?
>
> The limit is global, the accounting per namespace.

So this is a change from the existing behavior.

> If the bucket count (net.netfilter.nf_conntrack_buckets) is high enough
> to accommodate the expected load and no one can create an arbitrary number
> of net namespaces, things are fine.

In my case we can't control the number of namespaces; each user gets one when a 
virtual router is created.  We could change how we size things, but that 
doesn't stop one user from consuming more than their 1/N share of entries. 
Typically we just increase the number of systems hosting these "routers" when we 
hit a limit, which decreases the netns count per node.

> I haven't changed the way this works yet because I haven't had a better
> idea so far.

Creating a per-netns maximum seems doable, but maybe not practical from the 
accounting side of things.  Can't think of anything else at the moment.
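
Just to sketch the idea (hypothetical userspace C, all names invented,
nothing like this exists in the series):

/* Hypothetical per-netns cap layered on top of the global limit; "ns_max"
 * and the toy_* names are made up for illustration only.
 */
#include <stdbool.h>

struct toy_netns_capped {
        unsigned int ct_count;  /* entries owned by this namespace */
        unsigned int ns_max;    /* per-netns cap, 0 = none */
};

static unsigned int toy_conntrack_max = 262144; /* global limit */

static bool toy_may_allocate(const struct toy_netns_capped *ns)
{
        if (toy_conntrack_max && ns->ct_count > toy_conntrack_max)
                return false;   /* over the global limit */
        if (ns->ns_max && ns->ct_count > ns->ns_max)
                return false;   /* over this namespace's own cap */
        return true;
}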

-Brian

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-05-05 22:22     ` Brian Haley
@ 2016-05-05 22:36       ` Florian Westphal
  2016-05-05 22:55         ` Brian Haley
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Westphal @ 2016-05-05 22:36 UTC (permalink / raw)
  To: Brian Haley; +Cc: Florian Westphal, netfilter-devel, netdev

Brian Haley <brian.haley@hpe.com> wrote:
> >>I've seen cases where certain users are attacked, where the CT table is
> >>filled such that we start seeing "nf_conntrack: table full, dropping packet"
> >>messages (as expected).  But other users continue to function normally,
> >>unaffected.  Is this still the case - each netns has some limit it can't
> >>exceed?
> >
> >The limit is global, the accounting per namespace.
> 
> So this is a change from the existing behavior.

No, see __nf_conntrack_alloc():

        if (nf_conntrack_max &&
            unlikely(atomic_read(&net->ct.count) > nf_conntrack_max)) {
		...

ct.count is whatever number of entries the namespace has allocated,
so the maximum number of possible conntracks is effectively unbounded if the
number of net namespaces is unlimited (barring memory constraints, of course).

I did not change this.
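
As a back-of-the-envelope illustration (made-up numbers, only to show that
the aggregate worst case is roughly nr_netns * nf_conntrack_max when each
namespace is checked against the same global value):

#include <stdio.h>

int main(void)
{
        unsigned long conntrack_max = 262144;   /* global sysctl value (example) */
        unsigned long nr_netns = 1000;          /* e.g. one netns per virtual router */

        printf("worst-case total entries: %lu\n", nr_netns * conntrack_max);
        return 0;
}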

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1
  2016-05-05 22:36       ` Florian Westphal
@ 2016-05-05 22:55         ` Brian Haley
  0 siblings, 0 replies; 33+ messages in thread
From: Brian Haley @ 2016-05-05 22:55 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

On 05/05/2016 06:36 PM, Florian Westphal wrote:
> Brian Haley <brian.haley@hpe.com> wrote:
>>>> I've seen cases where certain users are attacked, where the CT table is
>>>> filled such that we start seeing "nf_conntrack: table full, dropping packet"
>>>> messages (as expected).  But other users continue to function normally,
>>>> unaffected.  Is this still the case - each netns has some limit it can't
>>>> exceed?
>>>
>>> The limit is global, the accounting per namespace.
>>
>> So this is a change from the existing behavior.
>
> No, see __nf_conntrack_alloc():
>
>          if (nf_conntrack_max &&
>              unlikely(atomic_read(&net->ct.count) > nf_conntrack_max)) {
> 		...
>
> ct.count is whatever number of entries the namespace has allocated,
> so the maximum number of possible conntracks is effectively unbounded if the
> number of net namespaces is unlimited (barring memory constraints, of course).

Ah yes, nf_conntrack_max is a global, thanks for setting me straight.  So I 
guess the tuning might just mean increasing the bucket count to try to keep 
the number of items in each bucket small, since there will be more entries 
in this single table now.
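
Roughly, the arithmetic behind that tuning (made-up numbers, just for
illustration):

/* average chain length ~= entries across all netns / nf_conntrack_buckets */
#include <stdio.h>

int main(void)
{
        unsigned long buckets = 65536;  /* net.netfilter.nf_conntrack_buckets */
        unsigned long entries = 500000; /* conntracks summed over all namespaces */

        printf("average chain length: %.2f\n", (double)entries / buckets);
        return 0;
}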

Thanks,

-Brian

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-05-05 22:55 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-28 17:13 [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 1/9] netfilter: conntrack: keep BH enabled during lookup Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 2/9] netfilter: conntrack: fix lookup race during hash resize Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 3/9] netfilter: conntrack: don't attempt to iterate over empty table Florian Westphal
2016-05-03 17:03   ` Pablo Neira Ayuso
2016-05-03 17:17     ` Florian Westphal
2016-05-03 17:41       ` Pablo Neira Ayuso
2016-05-03 17:55         ` Florian Westphal
2016-05-03 22:27           ` Pablo Neira Ayuso
2016-04-28 17:13 ` [PATCH nf-next 4/9] netfilter: conntrack: use nf_ct_key_equal() in more places Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 5/9] netfilter: conntrack: small refactoring of conntrack seq_printf Florian Westphal
2016-05-03 18:12   ` Pablo Neira Ayuso
2016-05-03 22:27     ` Florian Westphal
2016-05-04  9:19       ` Pablo Neira Ayuso
2016-05-03 22:28     ` Pablo Neira Ayuso
2016-04-28 17:13 ` [PATCH nf-next 6/9] netfilter: conntrack: check netns when comparing conntrack objects Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
2016-04-29 15:04   ` Florian Westphal
2016-04-28 17:13 ` [PATCH nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic Florian Westphal
2016-05-02 16:39 ` [PATCH v2 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
2016-05-02 16:51   ` Eric Dumazet
2016-05-02 21:52     ` Florian Westphal
2016-05-02 16:39 ` [PATCH v2 nf-next 8/9] netfilter: conntrack: use a single hashtable for all namespaces Florian Westphal
2016-05-02 16:40 ` [PATCH v2 nf-next 9/9] netfilter: conntrack: consider ct netns in early_drop logic Florian Westphal
2016-05-02 22:25 ` [PATCH v3 nf-next 7/9] netfilter: conntrack: make netns address part of hash Florian Westphal
2016-05-03 22:30 ` [PATCH nf-next 0/9] netfilter: remove per-netns conntrack tables, part 1 Pablo Neira Ayuso
2016-05-05 11:54 ` Pablo Neira Ayuso
2016-05-05 20:27 ` Brian Haley
2016-05-05 20:54   ` Florian Westphal
2016-05-05 22:22     ` Brian Haley
2016-05-05 22:36       ` Florian Westphal
2016-05-05 22:55         ` Brian Haley
