* [PATCH 1/6] netfilter: nft_byteorder: avoid unneeded le/be conversion steps
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
@ 2016-01-20 17:03 ` Pablo Neira Ayuso
2016-01-20 17:04 ` [PATCH 2/6] netfilter: ipset: allow a 0 netmask with hash_netiface type Pablo Neira Ayuso
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2016-01-20 17:03 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Florian Westphal <fw@strlen.de>
David points out that we to three le/be conversions instead
of just one. Doesn't matter on x86_64 w. gcc, but other
architectures might be less lucky.
Since it also simplifies code just follow his advice.
Fixes: c0f3275f5cb ("nftables: byteorder: provide le/be 64 bit conversion helper")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nft_byteorder.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nft_byteorder.c b/net/netfilter/nft_byteorder.c
index 383c171..b78c28b 100644
--- a/net/netfilter/nft_byteorder.c
+++ b/net/netfilter/nft_byteorder.c
@@ -46,16 +46,14 @@ static void nft_byteorder_eval(const struct nft_expr *expr,
switch (priv->op) {
case NFT_BYTEORDER_NTOH:
for (i = 0; i < priv->len / 8; i++) {
- src64 = get_unaligned_be64(&src[i]);
- src64 = be64_to_cpu((__force __be64)src64);
+ src64 = get_unaligned((u64 *)&src[i]);
put_unaligned_be64(src64, &dst[i]);
}
break;
case NFT_BYTEORDER_HTON:
for (i = 0; i < priv->len / 8; i++) {
src64 = get_unaligned_be64(&src[i]);
- src64 = (__force u64)cpu_to_be64(src64);
- put_unaligned_be64(src64, &dst[i]);
+ put_unaligned(src64, (u64 *)&dst[i]);
}
break;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 2/6] netfilter: ipset: allow a 0 netmask with hash_netiface type
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
2016-01-20 17:03 ` [PATCH 1/6] netfilter: nft_byteorder: avoid unneeded le/be conversion steps Pablo Neira Ayuso
@ 2016-01-20 17:04 ` Pablo Neira Ayuso
2016-01-20 17:04 ` [PATCH 3/6] netfilter: nft_ct: keep counters away from CONFIG_NF_CONNTRACK_LABELS Pablo Neira Ayuso
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2016-01-20 17:04 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Florian Westphal <fw@strlen.de>
Jozsef says:
The correct behaviour is that if we have
ipset create test1 hash:net,iface
ipset add test1 0.0.0.0/0,eth0
iptables -A INPUT -m set --match-set test1 src,src
then the rule should match for any traffic coming in through eth0.
This removes the -EINVAL runtime test to make matching work
in case packet arrived via the specified interface.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1297092
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/ipset/ip_set_hash_netiface.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c
index 43d8c98..f0f688d 100644
--- a/net/netfilter/ipset/ip_set_hash_netiface.c
+++ b/net/netfilter/ipset/ip_set_hash_netiface.c
@@ -164,8 +164,6 @@ hash_netiface4_kadt(struct ip_set *set, const struct sk_buff *skb,
};
struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
- if (e.cidr == 0)
- return -EINVAL;
if (adt == IPSET_TEST)
e.cidr = HOST_MASK;
@@ -377,8 +375,6 @@ hash_netiface6_kadt(struct ip_set *set, const struct sk_buff *skb,
};
struct ip_set_ext ext = IP_SET_INIT_KEXT(skb, opt, set);
- if (e.cidr == 0)
- return -EINVAL;
if (adt == IPSET_TEST)
e.cidr = HOST_MASK;
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/6] netfilter: nft_ct: keep counters away from CONFIG_NF_CONNTRACK_LABELS
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
2016-01-20 17:03 ` [PATCH 1/6] netfilter: nft_byteorder: avoid unneeded le/be conversion steps Pablo Neira Ayuso
2016-01-20 17:04 ` [PATCH 2/6] netfilter: ipset: allow a 0 netmask with hash_netiface type Pablo Neira Ayuso
@ 2016-01-20 17:04 ` Pablo Neira Ayuso
2016-01-20 17:04 ` [PATCH 4/6] netfilter: xt_TCPMSS: handle CHECKSUM_COMPLETE in tcpmss_tg6() Pablo Neira Ayuso
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2016-01-20 17:04 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
This is accidental, they don't depend on the label infrastructure.
Fixes: 48f66c905a97 ("netfilter: nft_ct: add byte/packet counter support")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
---
net/netfilter/nft_ct.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index a0eb216..d4a4619 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -127,6 +127,7 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
NF_CT_LABELS_MAX_SIZE - size);
return;
}
+#endif
case NFT_CT_BYTES: /* fallthrough */
case NFT_CT_PKTS: {
const struct nf_conn_acct *acct = nf_conn_acct_find(ct);
@@ -138,7 +139,6 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
memcpy(dest, &count, sizeof(count));
return;
}
-#endif
default:
break;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 4/6] netfilter: xt_TCPMSS: handle CHECKSUM_COMPLETE in tcpmss_tg6()
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (2 preceding siblings ...)
2016-01-20 17:04 ` [PATCH 3/6] netfilter: nft_ct: keep counters away from CONFIG_NF_CONNTRACK_LABELS Pablo Neira Ayuso
@ 2016-01-20 17:04 ` Pablo Neira Ayuso
2016-01-20 17:04 ` [PATCH 5/6] netfilter: nf_tables_netdev: fix error path in module initialization Pablo Neira Ayuso
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2016-01-20 17:04 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Eric Dumazet <edumazet@google.com>
In case MSS option is added in TCP options, skb length increases by 4.
IPv6 needs to update skb->csum if skb has CHECKSUM_COMPLETE,
otherwise kernel complains loudly in netdev_rx_csum_fault() with a
stack dump.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/xt_TCPMSS.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/xt_TCPMSS.c b/net/netfilter/xt_TCPMSS.c
index b7c43de..e118397 100644
--- a/net/netfilter/xt_TCPMSS.c
+++ b/net/netfilter/xt_TCPMSS.c
@@ -228,7 +228,7 @@ tcpmss_tg6(struct sk_buff *skb, const struct xt_action_param *par)
{
struct ipv6hdr *ipv6h = ipv6_hdr(skb);
u8 nexthdr;
- __be16 frag_off;
+ __be16 frag_off, oldlen, newlen;
int tcphoff;
int ret;
@@ -244,7 +244,12 @@ tcpmss_tg6(struct sk_buff *skb, const struct xt_action_param *par)
return NF_DROP;
if (ret > 0) {
ipv6h = ipv6_hdr(skb);
- ipv6h->payload_len = htons(ntohs(ipv6h->payload_len) + ret);
+ oldlen = ipv6h->payload_len;
+ newlen = htons(ntohs(oldlen) + ret);
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_add(csum_sub(skb->csum, oldlen),
+ newlen);
+ ipv6h->payload_len = newlen;
}
return XT_CONTINUE;
}
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 5/6] netfilter: nf_tables_netdev: fix error path in module initialization
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (3 preceding siblings ...)
2016-01-20 17:04 ` [PATCH 4/6] netfilter: xt_TCPMSS: handle CHECKSUM_COMPLETE in tcpmss_tg6() Pablo Neira Ayuso
@ 2016-01-20 17:04 ` Pablo Neira Ayuso
2016-01-20 17:04 ` [PATCH 6/6] netfilter: nf_conntrack: use safer way to lock all buckets Pablo Neira Ayuso
2016-01-21 2:57 ` [PATCH 0/6] Netfilter fixes for net David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2016-01-20 17:04 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
Unregister the chain type and return error, otherwise this leaks the
subscription to the netdevice notifier call chain.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
net/netfilter/nf_tables_netdev.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nf_tables_netdev.c b/net/netfilter/nf_tables_netdev.c
index b6605e0..5eefe4a 100644
--- a/net/netfilter/nf_tables_netdev.c
+++ b/net/netfilter/nf_tables_netdev.c
@@ -224,12 +224,12 @@ static int __init nf_tables_netdev_init(void)
nft_register_chain_type(&nft_filter_chain_netdev);
ret = register_pernet_subsys(&nf_tables_netdev_net_ops);
- if (ret < 0)
+ if (ret < 0) {
nft_unregister_chain_type(&nft_filter_chain_netdev);
-
+ return ret;
+ }
register_netdevice_notifier(&nf_tables_netdev_notifier);
-
- return ret;
+ return 0;
}
static void __exit nf_tables_netdev_exit(void)
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 6/6] netfilter: nf_conntrack: use safer way to lock all buckets
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (4 preceding siblings ...)
2016-01-20 17:04 ` [PATCH 5/6] netfilter: nf_tables_netdev: fix error path in module initialization Pablo Neira Ayuso
@ 2016-01-20 17:04 ` Pablo Neira Ayuso
2016-01-21 2:57 ` [PATCH 0/6] Netfilter fixes for net David Miller
6 siblings, 0 replies; 8+ messages in thread
From: Pablo Neira Ayuso @ 2016-01-20 17:04 UTC (permalink / raw)
To: netfilter-devel; +Cc: davem, netdev
From: Sasha Levin <sasha.levin@oracle.com>
When we need to lock all buckets in the connection hashtable we'd attempt to
lock 1024 spinlocks, which is way more preemption levels than supported by
the kernel. Furthermore, this behavior was hidden by checking if lockdep is
enabled, and if it was - use only 8 buckets(!).
Fix this by using a global lock and synchronize all buckets on it when we
need to lock them all. This is pretty heavyweight, but is only done when we
need to resize the hashtable, and that doesn't happen often enough (or at all).
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack_core.h | 8 +++----
net/netfilter/nf_conntrack_core.c | 38 +++++++++++++++++++++++--------
net/netfilter/nf_conntrack_helper.c | 2 +-
net/netfilter/nf_conntrack_netlink.c | 2 +-
net/netfilter/nfnetlink_cttimeout.c | 4 ++--
5 files changed, 35 insertions(+), 19 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h
index 788ef58..62e17d1 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -79,12 +79,10 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple,
const struct nf_conntrack_l3proto *l3proto,
const struct nf_conntrack_l4proto *proto);
-#ifdef CONFIG_LOCKDEP
-# define CONNTRACK_LOCKS 8
-#else
-# define CONNTRACK_LOCKS 1024
-#endif
+#define CONNTRACK_LOCKS 1024
+
extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS];
+void nf_conntrack_lock(spinlock_t *lock);
extern spinlock_t nf_conntrack_expect_lock;
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3cb3cb8..58882de 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -66,6 +66,21 @@ EXPORT_SYMBOL_GPL(nf_conntrack_locks);
__cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock);
EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock);
+static __read_mostly spinlock_t nf_conntrack_locks_all_lock;
+static __read_mostly bool nf_conntrack_locks_all;
+
+void nf_conntrack_lock(spinlock_t *lock) __acquires(lock)
+{
+ spin_lock(lock);
+ while (unlikely(nf_conntrack_locks_all)) {
+ spin_unlock(lock);
+ spin_lock(&nf_conntrack_locks_all_lock);
+ spin_unlock(&nf_conntrack_locks_all_lock);
+ spin_lock(lock);
+ }
+}
+EXPORT_SYMBOL_GPL(nf_conntrack_lock);
+
static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2)
{
h1 %= CONNTRACK_LOCKS;
@@ -82,12 +97,12 @@ static bool nf_conntrack_double_lock(struct net *net, unsigned int h1,
h1 %= CONNTRACK_LOCKS;
h2 %= CONNTRACK_LOCKS;
if (h1 <= h2) {
- spin_lock(&nf_conntrack_locks[h1]);
+ nf_conntrack_lock(&nf_conntrack_locks[h1]);
if (h1 != h2)
spin_lock_nested(&nf_conntrack_locks[h2],
SINGLE_DEPTH_NESTING);
} else {
- spin_lock(&nf_conntrack_locks[h2]);
+ nf_conntrack_lock(&nf_conntrack_locks[h2]);
spin_lock_nested(&nf_conntrack_locks[h1],
SINGLE_DEPTH_NESTING);
}
@@ -102,16 +117,19 @@ static void nf_conntrack_all_lock(void)
{
int i;
- for (i = 0; i < CONNTRACK_LOCKS; i++)
- spin_lock_nested(&nf_conntrack_locks[i], i);
+ spin_lock(&nf_conntrack_locks_all_lock);
+ nf_conntrack_locks_all = true;
+
+ for (i = 0; i < CONNTRACK_LOCKS; i++) {
+ spin_lock(&nf_conntrack_locks[i]);
+ spin_unlock(&nf_conntrack_locks[i]);
+ }
}
static void nf_conntrack_all_unlock(void)
{
- int i;
-
- for (i = 0; i < CONNTRACK_LOCKS; i++)
- spin_unlock(&nf_conntrack_locks[i]);
+ nf_conntrack_locks_all = false;
+ spin_unlock(&nf_conntrack_locks_all_lock);
}
unsigned int nf_conntrack_htable_size __read_mostly;
@@ -757,7 +775,7 @@ restart:
hash = hash_bucket(_hash, net);
for (; i < net->ct.htable_size; i++) {
lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS];
- spin_lock(lockp);
+ nf_conntrack_lock(lockp);
if (read_seqcount_retry(&net->ct.generation, sequence)) {
spin_unlock(lockp);
goto restart;
@@ -1382,7 +1400,7 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data),
for (; *bucket < net->ct.htable_size; (*bucket)++) {
lockp = &nf_conntrack_locks[*bucket % CONNTRACK_LOCKS];
local_bh_disable();
- spin_lock(lockp);
+ nf_conntrack_lock(lockp);
if (*bucket < net->ct.htable_size) {
hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) {
if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index bd9d315..3b40ec5 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -425,7 +425,7 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me,
}
local_bh_disable();
for (i = 0; i < net->ct.htable_size; i++) {
- spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
if (i < net->ct.htable_size) {
hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
unhelp(h, me);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index dbb1bb3..355e855 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -840,7 +840,7 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
restart:
lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS];
- spin_lock(lockp);
+ nf_conntrack_lock(lockp);
if (cb->args[0] >= net->ct.htable_size) {
spin_unlock(lockp);
goto out;
diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c
index 5d010f2..94837d2 100644
--- a/net/netfilter/nfnetlink_cttimeout.c
+++ b/net/netfilter/nfnetlink_cttimeout.c
@@ -307,12 +307,12 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout)
local_bh_disable();
for (i = 0; i < net->ct.htable_size; i++) {
- spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
if (i < net->ct.htable_size) {
hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode)
untimeout(h, timeout);
}
- spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
+ nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]);
}
local_bh_enable();
}
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 0/6] Netfilter fixes for net
2016-01-20 17:03 [PATCH 0/6] Netfilter fixes for net Pablo Neira Ayuso
` (5 preceding siblings ...)
2016-01-20 17:04 ` [PATCH 6/6] netfilter: nf_conntrack: use safer way to lock all buckets Pablo Neira Ayuso
@ 2016-01-21 2:57 ` David Miller
6 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2016-01-21 2:57 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 20 Jan 2016 18:03:58 +0100
> The following patchset contains Netfilter fixes for your net tree, they
> are:
>
> 1) Fix accidental 3-times le/be conversion for 64-bits in nft_byteorder,
> from Florian Westphal.
>
> 2) Get rid of defensive cidr = 0 check in the ipset hash:netiface set
> type which doesn't allow valid 0.0.0.0/0 elements, also from Florian.
>
> 3) Relocate #endif in nft_ct counter support, this doesn't have any
> relation with labels.
>
> 4) Fix TCPMSS target for IPv6 when skb has CHECKSUM_COMPLETE, from
> Eric Dumazet.
>
> 5) Fix netdevice notifier leak from the error path of nf_tables_netdev.
>
> 6) Safe conntrack hashtable resizing by introducing a global lock and
> synchronize all buckets to avoid going over the maximum number of
> preemption levels, from Sasha Levin.
>
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
Pulled, thanks Pablo.
^ permalink raw reply [flat|nested] 8+ messages in thread