All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/14] Netfilter fixes for net
@ 2017-01-26 16:37 Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 01/14] netfilter: use fwmark_reflect in nf_send_reset Pablo Neira Ayuso
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains a large batch with Netfilter fixes for
your net tree, they are:

1) Two patches to solve conntrack garbage collector cpu hogging, one to
   remove GC_MAX_EVICTS and another to look at the ratio (scanned entries
   vs. evicted entries) to make a decision on whether to reduce or not
   the scanning interval. From Florian Westphal.

2) Two patches to fix incorrect set element counting if NLM_F_EXCL is
   is not set. Moreover, don't decrenent set->nelems from abort patch
   if -ENFILE which leaks a spare slot in the set. This includes a
   patch to deconstify the set walk callback to update set->ndeact.

3) Two fixes for the fwmark_reflect sysctl feature: Propagate mark to
   reply packets both from nf_reject and local stack, from Pau Espin Pedrol.

4) Fix incorrect handling of loopback traffic in rpfilter and nf_tables
   fib expression, from Liping Zhang.

5) Fix oops on stateful objects netlink dump, when no filter is specified.
   Also from Liping Zhang.

6) Fix a build error if proc is not available in ipt_CLUSTERIP, related
   to fix that was applied in the previous batch for net. From Arnd Bergmann.

7) Fix lack of string validation in table, chain, set and stateful
   object names in nf_tables, from Liping Zhang. Moreover, restrict
   maximum log prefix length to 127 bytes, otherwise explicitly bail
   out.

8) Two patches to fix spelling and typos in nf_tables uapi header file
   and Kconfig, patches from Alexander Alemayhu and William Breathitt Gray.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 03430fa10b99e95e3a15eb7c00978fb1652f3b24:

  Merge branch 'bcm_sf2-fixes' (2017-01-08 22:01:22 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to b2c11e4b9536ebab6b39929e1fe15f57039ab445:

  netfilter: nf_tables: bump set->ndeact on set flush (2017-01-24 21:46:59 +0100)

----------------------------------------------------------------
Alexander Alemayhu (1):
      netfilter: nf_tables: fix spelling mistakes

Arnd Bergmann (1):
      netfilter: ipt_CLUSTERIP: fix build error without procfs

Florian Westphal (2):
      netfilter: conntrack: remove GC_MAX_EVICTS break
      netfilter: conntrack: refine gc worker heuristics, redux

Liping Zhang (4):
      netfilter: rpfilter: fix incorrect loopback packet judgment
      netfilter: nf_tables: fix possible oops when dumping stateful objects
      netfilter: nf_tables: validate the name size when possible
      netfilter: nft_log: restrict the log prefix length to 127

Pablo Neira Ayuso (3):
      netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL
      netfilter: nf_tables: deconstify walk callback function
      netfilter: nf_tables: bump set->ndeact on set flush

Pau Espin Pedrol (2):
      netfilter: use fwmark_reflect in nf_send_reset
      tcp: fix mark propagation with fwmark_reflect enabled

William Breathitt Gray (1):
      netfilter: Fix typo in NF_CONNTRACK Kconfig option description

 include/net/netfilter/nf_tables.h        |  6 +--
 include/net/netfilter/nft_fib.h          |  6 +++
 include/uapi/linux/netfilter/nf_log.h    |  2 +
 include/uapi/linux/netfilter/nf_tables.h |  4 +-
 net/ipv4/ip_output.c                     |  1 +
 net/ipv4/netfilter/ipt_CLUSTERIP.c       |  7 +++-
 net/ipv4/netfilter/ipt_rpfilter.c        |  8 ++--
 net/ipv4/netfilter/nf_reject_ipv4.c      |  2 +
 net/ipv4/netfilter/nft_fib_ipv4.c        | 15 +++----
 net/ipv6/netfilter/ip6t_rpfilter.c       |  8 ++--
 net/ipv6/netfilter/nf_reject_ipv6.c      |  3 ++
 net/ipv6/netfilter/nft_fib_ipv6.c        | 13 ++-----
 net/ipv6/tcp_ipv6.c                      |  1 +
 net/netfilter/Kconfig                    |  2 +-
 net/netfilter/nf_conntrack_core.c        | 44 ++++++++++-----------
 net/netfilter/nf_log.c                   |  1 -
 net/netfilter/nf_tables_api.c            | 67 +++++++++++++++++++-------------
 net/netfilter/nft_dynset.c               |  3 +-
 net/netfilter/nft_log.c                  |  3 +-
 net/netfilter/nft_lookup.c               |  3 +-
 net/netfilter/nft_objref.c               |  6 ++-
 net/netfilter/nft_set_hash.c             |  2 +-
 net/netfilter/nft_set_rbtree.c           |  2 +-
 23 files changed, 116 insertions(+), 93 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 01/14] netfilter: use fwmark_reflect in nf_send_reset
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
@ 2017-01-26 16:37 ` Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled Pablo Neira Ayuso
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pau Espin Pedrol <pau.espin@tessares.net>

Otherwise, RST packets generated by ipt_REJECT always have mark 0 when
the routing is checked later in the same code path.

Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_reject_ipv4.c | 2 ++
 net/ipv6/netfilter/nf_reject_ipv6.c | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index fd8220213afc..146d86105183 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -126,6 +126,8 @@ void nf_send_reset(struct net *net, struct sk_buff *oldskb, int hook)
 	/* ip_route_me_harder expects skb->dst to be set */
 	skb_dst_set_noref(nskb, skb_dst(oldskb));
 
+	nskb->mark = IP4_REPLY_MARK(net, oldskb->mark);
+
 	skb_reserve(nskb, LL_MAX_HEADER);
 	niph = nf_reject_iphdr_put(nskb, oldskb, IPPROTO_TCP,
 				   ip4_dst_hoplimit(skb_dst(nskb)));
diff --git a/net/ipv6/netfilter/nf_reject_ipv6.c b/net/ipv6/netfilter/nf_reject_ipv6.c
index 10090400c72f..eedee5d108d9 100644
--- a/net/ipv6/netfilter/nf_reject_ipv6.c
+++ b/net/ipv6/netfilter/nf_reject_ipv6.c
@@ -157,6 +157,7 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 	fl6.fl6_sport = otcph->dest;
 	fl6.fl6_dport = otcph->source;
 	fl6.flowi6_oif = l3mdev_master_ifindex(skb_dst(oldskb)->dev);
+	fl6.flowi6_mark = IP6_REPLY_MARK(net, oldskb->mark);
 	security_skb_classify_flow(oldskb, flowi6_to_flowi(&fl6));
 	dst = ip6_route_output(net, NULL, &fl6);
 	if (dst->error) {
@@ -180,6 +181,8 @@ void nf_send_reset6(struct net *net, struct sk_buff *oldskb, int hook)
 
 	skb_dst_set(nskb, dst);
 
+	nskb->mark = fl6.flowi6_mark;
+
 	skb_reserve(nskb, hh_len + dst->header_len);
 	ip6h = nf_reject_ip6hdr_put(nskb, oldskb, IPPROTO_TCP,
 				    ip6_dst_hoplimit(dst));
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 01/14] netfilter: use fwmark_reflect in nf_send_reset Pablo Neira Ayuso
@ 2017-01-26 16:37 ` Pablo Neira Ayuso
  2017-01-26 18:02   ` Eric Dumazet
  2017-01-26 16:37 ` [PATCH 03/14] netfilter: nf_tables: fix spelling mistakes Pablo Neira Ayuso
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pau Espin Pedrol <pespin.shar@gmail.com>

Otherwise, RST packets generated by the TCP stack for non-existing
sockets always have mark 0.
The mark from the original packet is assigned to the netns_ipv4/6
socket used to send the response so that it can get copied into the
response skb when the socket sends it.

Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/ip_output.c | 1 +
 net/ipv6/tcp_ipv6.c  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index fac275c48108..b67719f45953 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1629,6 +1629,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
 	sk->sk_protocol = ip_hdr(skb)->protocol;
 	sk->sk_bound_dev_if = arg->bound_dev_if;
 	sk->sk_sndbuf = sysctl_wmem_default;
+	sk->sk_mark = fl4.flowi4_mark;
 	err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
 			     len, 0, &ipc, &rt, MSG_DONTWAIT);
 	if (unlikely(err)) {
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 73bc8fc68acd..2b20622a5824 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -840,6 +840,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
 	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
 	if (!IS_ERR(dst)) {
 		skb_dst_set(buff, dst);
+		ctl_sk->sk_mark = fl6.flowi6_mark;
 		ip6_xmit(ctl_sk, buff, &fl6, NULL, tclass);
 		TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
 		if (rst)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 03/14] netfilter: nf_tables: fix spelling mistakes
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 01/14] netfilter: use fwmark_reflect in nf_send_reset Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled Pablo Neira Ayuso
@ 2017-01-26 16:37 ` Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 04/14] netfilter: rpfilter: fix incorrect loopback packet judgment Pablo Neira Ayuso
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Alexander Alemayhu <alexander@alemayhu.com>

o s/numerice/numeric
o s/opertaor/operator

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_tables.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 881d49e94569..e3f27e09eb2b 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -235,7 +235,7 @@ enum nft_rule_compat_flags {
 /**
  * enum nft_rule_compat_attributes - nf_tables rule compat attributes
  *
- * @NFTA_RULE_COMPAT_PROTO: numerice value of handled protocol (NLA_U32)
+ * @NFTA_RULE_COMPAT_PROTO: numeric value of handled protocol (NLA_U32)
  * @NFTA_RULE_COMPAT_FLAGS: bitmask of enum nft_rule_compat_flags (NLA_U32)
  */
 enum nft_rule_compat_attributes {
@@ -499,7 +499,7 @@ enum nft_bitwise_attributes {
  * enum nft_byteorder_ops - nf_tables byteorder operators
  *
  * @NFT_BYTEORDER_NTOH: network to host operator
- * @NFT_BYTEORDER_HTON: host to network opertaor
+ * @NFT_BYTEORDER_HTON: host to network operator
  */
 enum nft_byteorder_ops {
 	NFT_BYTEORDER_NTOH,
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 04/14] netfilter: rpfilter: fix incorrect loopback packet judgment
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (2 preceding siblings ...)
  2017-01-26 16:37 ` [PATCH 03/14] netfilter: nf_tables: fix spelling mistakes Pablo Neira Ayuso
@ 2017-01-26 16:37 ` Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 05/14] netfilter: nf_tables: fix possible oops when dumping stateful objects Pablo Neira Ayuso
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Liping Zhang <zlpnobody@gmail.com>

Currently, we check the existing rtable in PREROUTING hook, if RTCF_LOCAL
is set, we assume that the packet is loopback.

But this assumption is incorrect, for example, a packet encapsulated
in ipsec transport mode was received and routed to local, after
decapsulation, it would be delivered to local again, and the rtable
was not dropped, so RTCF_LOCAL check would trigger. But actually, the
packet was not loopback.

So for these normal loopback packets, we can check whether the in device
is IFF_LOOPBACK or not. For these locally generated broadcast/multicast,
we can check whether the skb->pkt_type is PACKET_LOOPBACK or not.

Finally, there's a subtle difference between nft fib expr and xtables
rpfilter extension, user can add the following nft rule to do strict
rpfilter check:
  # nft add rule x y meta iif eth0 fib saddr . iif oif != eth0 drop

So when the packet is loopback, it's better to store the in device
instead of the LOOPBACK_IFINDEX, otherwise, after adding the above
nft rule, locally generated broad/multicast packets will be dropped
incorrectly.

Fixes: f83a7ea2075c ("netfilter: xt_rpfilter: skip locally generated broadcast/multicast, too")
Fixes: f6d0cbcf09c5 ("netfilter: nf_tables: add fib expression")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nft_fib.h    |  6 ++++++
 net/ipv4/netfilter/ipt_rpfilter.c  |  8 ++++----
 net/ipv4/netfilter/nft_fib_ipv4.c  | 15 +++++----------
 net/ipv6/netfilter/ip6t_rpfilter.c |  8 ++++----
 net/ipv6/netfilter/nft_fib_ipv6.c  | 13 ++++---------
 5 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/include/net/netfilter/nft_fib.h b/include/net/netfilter/nft_fib.h
index cbedda077db2..5ceb2205e4e3 100644
--- a/include/net/netfilter/nft_fib.h
+++ b/include/net/netfilter/nft_fib.h
@@ -9,6 +9,12 @@ struct nft_fib {
 
 extern const struct nla_policy nft_fib_policy[];
 
+static inline bool
+nft_fib_is_loopback(const struct sk_buff *skb, const struct net_device *in)
+{
+	return skb->pkt_type == PACKET_LOOPBACK || in->flags & IFF_LOOPBACK;
+}
+
 int nft_fib_dump(struct sk_buff *skb, const struct nft_expr *expr);
 int nft_fib_init(const struct nft_ctx *ctx, const struct nft_expr *expr,
 		 const struct nlattr * const tb[]);
diff --git a/net/ipv4/netfilter/ipt_rpfilter.c b/net/ipv4/netfilter/ipt_rpfilter.c
index f273098e48fd..37fb9552e858 100644
--- a/net/ipv4/netfilter/ipt_rpfilter.c
+++ b/net/ipv4/netfilter/ipt_rpfilter.c
@@ -63,10 +63,10 @@ static bool rpfilter_lookup_reverse(struct net *net, struct flowi4 *fl4,
 	return dev_match || flags & XT_RPFILTER_LOOSE;
 }
 
-static bool rpfilter_is_local(const struct sk_buff *skb)
+static bool
+rpfilter_is_loopback(const struct sk_buff *skb, const struct net_device *in)
 {
-	const struct rtable *rt = skb_rtable(skb);
-	return rt && (rt->rt_flags & RTCF_LOCAL);
+	return skb->pkt_type == PACKET_LOOPBACK || in->flags & IFF_LOOPBACK;
 }
 
 static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
@@ -79,7 +79,7 @@ static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	info = par->matchinfo;
 	invert = info->flags & XT_RPFILTER_INVERT;
 
-	if (rpfilter_is_local(skb))
+	if (rpfilter_is_loopback(skb, xt_in(par)))
 		return true ^ invert;
 
 	iph = ip_hdr(skb);
diff --git a/net/ipv4/netfilter/nft_fib_ipv4.c b/net/ipv4/netfilter/nft_fib_ipv4.c
index 965b1a161369..2981291910dd 100644
--- a/net/ipv4/netfilter/nft_fib_ipv4.c
+++ b/net/ipv4/netfilter/nft_fib_ipv4.c
@@ -26,13 +26,6 @@ static __be32 get_saddr(__be32 addr)
 	return addr;
 }
 
-static bool fib4_is_local(const struct sk_buff *skb)
-{
-	const struct rtable *rt = skb_rtable(skb);
-
-	return rt && (rt->rt_flags & RTCF_LOCAL);
-}
-
 #define DSCP_BITS     0xfc
 
 void nft_fib4_eval_type(const struct nft_expr *expr, struct nft_regs *regs,
@@ -95,8 +88,10 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
 	else
 		oif = NULL;
 
-	if (nft_hook(pkt) == NF_INET_PRE_ROUTING && fib4_is_local(pkt->skb)) {
-		nft_fib_store_result(dest, priv->result, pkt, LOOPBACK_IFINDEX);
+	if (nft_hook(pkt) == NF_INET_PRE_ROUTING &&
+	    nft_fib_is_loopback(pkt->skb, nft_in(pkt))) {
+		nft_fib_store_result(dest, priv->result, pkt,
+				     nft_in(pkt)->ifindex);
 		return;
 	}
 
@@ -131,7 +126,7 @@ void nft_fib4_eval(const struct nft_expr *expr, struct nft_regs *regs,
 	switch (res.type) {
 	case RTN_UNICAST:
 		break;
-	case RTN_LOCAL:	/* should not appear here, see fib4_is_local() above */
+	case RTN_LOCAL: /* Should not see RTN_LOCAL here */
 		return;
 	default:
 		break;
diff --git a/net/ipv6/netfilter/ip6t_rpfilter.c b/net/ipv6/netfilter/ip6t_rpfilter.c
index d5263dc364a9..b12e61b7b16c 100644
--- a/net/ipv6/netfilter/ip6t_rpfilter.c
+++ b/net/ipv6/netfilter/ip6t_rpfilter.c
@@ -72,10 +72,10 @@ static bool rpfilter_lookup_reverse6(struct net *net, const struct sk_buff *skb,
 	return ret;
 }
 
-static bool rpfilter_is_local(const struct sk_buff *skb)
+static bool
+rpfilter_is_loopback(const struct sk_buff *skb, const struct net_device *in)
 {
-	const struct rt6_info *rt = (const void *) skb_dst(skb);
-	return rt && (rt->rt6i_flags & RTF_LOCAL);
+	return skb->pkt_type == PACKET_LOOPBACK || in->flags & IFF_LOOPBACK;
 }
 
 static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
@@ -85,7 +85,7 @@ static bool rpfilter_mt(const struct sk_buff *skb, struct xt_action_param *par)
 	struct ipv6hdr *iph;
 	bool invert = info->flags & XT_RPFILTER_INVERT;
 
-	if (rpfilter_is_local(skb))
+	if (rpfilter_is_loopback(skb, xt_in(par)))
 		return true ^ invert;
 
 	iph = ipv6_hdr(skb);
diff --git a/net/ipv6/netfilter/nft_fib_ipv6.c b/net/ipv6/netfilter/nft_fib_ipv6.c
index c947aad8bcc6..765facf03d45 100644
--- a/net/ipv6/netfilter/nft_fib_ipv6.c
+++ b/net/ipv6/netfilter/nft_fib_ipv6.c
@@ -18,13 +18,6 @@
 #include <net/ip6_fib.h>
 #include <net/ip6_route.h>
 
-static bool fib6_is_local(const struct sk_buff *skb)
-{
-	const struct rt6_info *rt = (const void *)skb_dst(skb);
-
-	return rt && (rt->rt6i_flags & RTF_LOCAL);
-}
-
 static int get_ifindex(const struct net_device *dev)
 {
 	return dev ? dev->ifindex : 0;
@@ -164,8 +157,10 @@ void nft_fib6_eval(const struct nft_expr *expr, struct nft_regs *regs,
 
 	lookup_flags = nft_fib6_flowi_init(&fl6, priv, pkt, oif);
 
-	if (nft_hook(pkt) == NF_INET_PRE_ROUTING && fib6_is_local(pkt->skb)) {
-		nft_fib_store_result(dest, priv->result, pkt, LOOPBACK_IFINDEX);
+	if (nft_hook(pkt) == NF_INET_PRE_ROUTING &&
+	    nft_fib_is_loopback(pkt->skb, nft_in(pkt))) {
+		nft_fib_store_result(dest, priv->result, pkt,
+				     nft_in(pkt)->ifindex);
 		return;
 	}
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 05/14] netfilter: nf_tables: fix possible oops when dumping stateful objects
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (3 preceding siblings ...)
  2017-01-26 16:37 ` [PATCH 04/14] netfilter: rpfilter: fix incorrect loopback packet judgment Pablo Neira Ayuso
@ 2017-01-26 16:37 ` Pablo Neira Ayuso
  2017-01-26 16:37 ` [PATCH 06/14] netfilter: Fix typo in NF_CONNTRACK Kconfig option description Pablo Neira Ayuso
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Liping Zhang <zlpnobody@gmail.com>

When dumping nft stateful objects, if NFTA_OBJ_TABLE and NFTA_OBJ_TYPE
attributes are not specified either, filter will become NULL, so oops
will happen(actually nft utility will always set NFTA_OBJ_TABLE attr,
so I write a test program to make this happen):

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: nf_tables_dump_obj+0x17c/0x330 [nf_tables]
  [...]
  Call Trace:
  ? nf_tables_dump_obj+0x5/0x330 [nf_tables]
  ? __kmalloc_reserve.isra.35+0x31/0x90
  ? __alloc_skb+0x5b/0x1e0
  netlink_dump+0x124/0x2a0
  __netlink_dump_start+0x161/0x190
  nf_tables_getobj+0xe8/0x280 [nf_tables]

Fixes: a9fea2a3c3cf ("netfilter: nf_tables: allow to filter stateful object dumps by type")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 0db5f9782265..091d2dcc63b2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4262,10 +4262,11 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb)
 				if (idx > s_idx)
 					memset(&cb->args[1], 0,
 					       sizeof(cb->args) - sizeof(cb->args[0]));
-				if (filter->table[0] &&
+				if (filter && filter->table[0] &&
 				    strcmp(filter->table, table->name))
 					goto cont;
-				if (filter->type != NFT_OBJECT_UNSPEC &&
+				if (filter &&
+				    filter->type != NFT_OBJECT_UNSPEC &&
 				    obj->type->type != filter->type)
 					goto cont;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 06/14] netfilter: Fix typo in NF_CONNTRACK Kconfig option description
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (4 preceding siblings ...)
  2017-01-26 16:37 ` [PATCH 05/14] netfilter: nf_tables: fix possible oops when dumping stateful objects Pablo Neira Ayuso
@ 2017-01-26 16:37 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 07/14] netfilter: ipt_CLUSTERIP: fix build error without procfs Pablo Neira Ayuso
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:37 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: William Breathitt Gray <vilhelm.gray@gmail.com>

The NF_CONNTRACK Kconfig option description makes an incorrect reference
to the "meta" expression where the "ct" expression would be correct.This
patch fixes the respective typographical error.

Fixes: d497c6352736 ("netfilter: add help information to new nf_tables Kconfig options")
Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 63729b489c2c..bbc45f8a7b2d 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -494,7 +494,7 @@ config NFT_CT
 	depends on NF_CONNTRACK
 	tristate "Netfilter nf_tables conntrack module"
 	help
-	  This option adds the "meta" expression that you can use to match
+	  This option adds the "ct" expression that you can use to match
 	  connection tracking information such as the flow state.
 
 config NFT_SET_RBTREE
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 07/14] netfilter: ipt_CLUSTERIP: fix build error without procfs
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (5 preceding siblings ...)
  2017-01-26 16:37 ` [PATCH 06/14] netfilter: Fix typo in NF_CONNTRACK Kconfig option description Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 08/14] netfilter: conntrack: remove GC_MAX_EVICTS break Pablo Neira Ayuso
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Arnd Bergmann <arnd@arndb.de>

We can't access c->pde if CONFIG_PROC_FS is disabled:

net/ipv4/netfilter/ipt_CLUSTERIP.c: In function 'clusterip_config_find_get':
net/ipv4/netfilter/ipt_CLUSTERIP.c:147:9: error: 'struct clusterip_config' has no member named 'pde'

This moves the check inside of another #ifdef.

Fixes: 6c5d5cfbe3c5 ("netfilter: ipt_CLUSTERIP: check duplicate config when initializing")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index a6b8c1a4102b..0a783cd73faf 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -144,7 +144,12 @@ clusterip_config_find_get(struct net *net, __be32 clusterip, int entry)
 	rcu_read_lock_bh();
 	c = __clusterip_config_find(net, clusterip);
 	if (c) {
-		if (!c->pde || unlikely(!atomic_inc_not_zero(&c->refcount)))
+#ifdef CONFIG_PROC_FS
+		if (!c->pde)
+			c = NULL;
+		else
+#endif
+		if (unlikely(!atomic_inc_not_zero(&c->refcount)))
 			c = NULL;
 		else if (entry)
 			atomic_inc(&c->entries);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 08/14] netfilter: conntrack: remove GC_MAX_EVICTS break
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (6 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 07/14] netfilter: ipt_CLUSTERIP: fix build error without procfs Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux Pablo Neira Ayuso
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

Instead of breaking loop and instant resched, don't bother checking
this in first place (the loop calls cond_resched for every bucket anyway).

Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 3a073cd9fcf4..6feb5d370319 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -88,8 +88,6 @@ static __read_mostly bool nf_conntrack_locks_all;
 #define GC_MAX_BUCKETS_DIV	64u
 /* upper bound of scan intervals */
 #define GC_INTERVAL_MAX		(2 * HZ)
-/* maximum conntracks to evict per gc run */
-#define GC_MAX_EVICTS		256u
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -979,8 +977,7 @@ static void gc_worker(struct work_struct *work)
 		 */
 		rcu_read_unlock();
 		cond_resched_rcu_qs();
-	} while (++buckets < goal &&
-		 expired_count < GC_MAX_EVICTS);
+	} while (++buckets < goal);
 
 	if (gc_work->exiting)
 		return;
@@ -1005,7 +1002,7 @@ static void gc_worker(struct work_struct *work)
 	 * In case we have lots of evictions next scan is done immediately.
 	 */
 	ratio = scanned ? expired_count * 100 / scanned : 0;
-	if (ratio >= 90 || expired_count == GC_MAX_EVICTS) {
+	if (ratio >= 90) {
 		gc_work->next_gc_run = 0;
 		next_run = 0;
 	} else if (expired_count) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (7 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 08/14] netfilter: conntrack: remove GC_MAX_EVICTS break Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-27 16:51   ` Nicolas Dichtel
  2017-01-26 16:38 ` [PATCH 10/14] netfilter: nf_tables: validate the name size when possible Pablo Neira Ayuso
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Florian Westphal <fw@strlen.de>

This further refines the changes made to conntrack gc_worker in
commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").

The main idea of that change was to reduce the scan interval when evictions
take place.

However, on the reporters' setup, there are 1-2 million conntrack entries
in total and roughly 8k new (and closing) connections per second.

In this case we'll always evict at least one entry per gc cycle and scan
interval is always at 1 jiffy because of this test:

 } else if (expired_count) {
     gc_work->next_gc_run /= 2U;
     next_run = msecs_to_jiffies(1);

being true almost all the time.

Given we scan ~10k entries per run its clearly wrong to reduce interval
based on nonzero eviction count, it will only waste cpu cycles since a vast
majorities of conntracks are not timed out.

Thus only look at the ratio (scanned entries vs. evicted entries) to make
a decision on whether to reduce or not.

Because evictor is supposed to only kick in when system turns idle after
a busy period, pick a high ratio -- this makes it 50%.  We thus keep
the idea of increasing scan rate when its likely that table contains many
expired entries.

In order to not let timed-out entries hang around for too long
(important when using event logging, in which case we want to timely
destroy events), we now scan the full table within at most
GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
timed-out entries sit in same slot.

I tested this with a vm under synflood (with
sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).

While flood is ongoing, interval now stays at its max rate
(GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).

With feedback from Nicolas Dichtel.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 6feb5d370319..4e8083c5e01d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -85,9 +85,11 @@ static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
 /* every gc cycle scans at most 1/GC_MAX_BUCKETS_DIV part of table */
-#define GC_MAX_BUCKETS_DIV	64u
-/* upper bound of scan intervals */
-#define GC_INTERVAL_MAX		(2 * HZ)
+#define GC_MAX_BUCKETS_DIV	128u
+/* upper bound of full table scan */
+#define GC_MAX_SCAN_JIFFIES	(16u * HZ)
+/* desired ratio of entries found to be expired */
+#define GC_EVICT_RATIO	50u
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -936,6 +938,7 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 
 static void gc_worker(struct work_struct *work)
 {
+	unsigned int min_interval = max(HZ / GC_MAX_BUCKETS_DIV, 1u);
 	unsigned int i, goal, buckets = 0, expired_count = 0;
 	struct conntrack_gc_work *gc_work;
 	unsigned int ratio, scanned = 0;
@@ -994,27 +997,25 @@ static void gc_worker(struct work_struct *work)
 	 * 1. Minimize time until we notice a stale entry
 	 * 2. Maximize scan intervals to not waste cycles
 	 *
-	 * Normally, expired_count will be 0, this increases the next_run time
-	 * to priorize 2) above.
+	 * Normally, expire ratio will be close to 0.
 	 *
-	 * As soon as a timed-out entry is found, move towards 1) and increase
-	 * the scan frequency.
-	 * In case we have lots of evictions next scan is done immediately.
+	 * As soon as a sizeable fraction of the entries have expired
+	 * increase scan frequency.
 	 */
 	ratio = scanned ? expired_count * 100 / scanned : 0;
-	if (ratio >= 90) {
-		gc_work->next_gc_run = 0;
-		next_run = 0;
-	} else if (expired_count) {
-		gc_work->next_gc_run /= 2U;
-		next_run = msecs_to_jiffies(1);
+	if (ratio > GC_EVICT_RATIO) {
+		gc_work->next_gc_run = min_interval;
 	} else {
-		if (gc_work->next_gc_run < GC_INTERVAL_MAX)
-			gc_work->next_gc_run += msecs_to_jiffies(1);
+		unsigned int max = GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV;
 
-		next_run = gc_work->next_gc_run;
+		BUILD_BUG_ON((GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV) == 0);
+
+		gc_work->next_gc_run += min_interval;
+		if (gc_work->next_gc_run > max)
+			gc_work->next_gc_run = max;
 	}
 
+	next_run = gc_work->next_gc_run;
 	gc_work->last_bucket = i;
 	queue_delayed_work(system_long_wq, &gc_work->dwork, next_run);
 }
@@ -1022,7 +1023,7 @@ static void gc_worker(struct work_struct *work)
 static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
 {
 	INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
-	gc_work->next_gc_run = GC_INTERVAL_MAX;
+	gc_work->next_gc_run = HZ;
 	gc_work->exiting = false;
 }
 
@@ -1914,7 +1915,7 @@ int nf_conntrack_init_start(void)
 	nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
 
 	conntrack_gc_work_init(&conntrack_gc_work);
-	queue_delayed_work(system_long_wq, &conntrack_gc_work.dwork, GC_INTERVAL_MAX);
+	queue_delayed_work(system_long_wq, &conntrack_gc_work.dwork, HZ);
 
 	return 0;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 10/14] netfilter: nf_tables: validate the name size when possible
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (8 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 11/14] netfilter: nft_log: restrict the log prefix length to 127 Pablo Neira Ayuso
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Liping Zhang <zlpnobody@gmail.com>

Currently, if the user add a stateful object with the name size exceed
NFT_OBJ_MAXNAMELEN - 1 (i.e. 31), we truncate it down to 31 silently.
This is not friendly, furthermore, this will cause duplicated stateful
objects when the first 31 characters of the name is same. So limit the
stateful object's name size to NFT_OBJ_MAXNAMELEN - 1.

After apply this patch, error message will be printed out like this:
  # name_32=$(printf "%0.sQ" {1..32})
  # nft add counter filter $name_32
  <cmdline>:1:1-52: Error: Could not process rule: Numerical result out
  of range
  add counter filter QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Also this patch cleans up the codes which missing the name size limit
validation in nftables.

Fixes: e50092404c1b ("netfilter: nf_tables: add stateful objects")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 21 ++++++++++++++-------
 net/netfilter/nft_dynset.c    |  3 ++-
 net/netfilter/nft_lookup.c    |  3 ++-
 net/netfilter/nft_objref.c    |  6 ++++--
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 091d2dcc63b2..b84c7b25219b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -928,7 +928,8 @@ static struct nft_chain *nf_tables_chain_lookup(const struct nft_table *table,
 }
 
 static const struct nla_policy nft_chain_policy[NFTA_CHAIN_MAX + 1] = {
-	[NFTA_CHAIN_TABLE]	= { .type = NLA_STRING },
+	[NFTA_CHAIN_TABLE]	= { .type = NLA_STRING,
+				    .len = NFT_TABLE_MAXNAMELEN - 1 },
 	[NFTA_CHAIN_HANDLE]	= { .type = NLA_U64 },
 	[NFTA_CHAIN_NAME]	= { .type = NLA_STRING,
 				    .len = NFT_CHAIN_MAXNAMELEN - 1 },
@@ -1854,7 +1855,8 @@ static struct nft_rule *nf_tables_rule_lookup(const struct nft_chain *chain,
 }
 
 static const struct nla_policy nft_rule_policy[NFTA_RULE_MAX + 1] = {
-	[NFTA_RULE_TABLE]	= { .type = NLA_STRING },
+	[NFTA_RULE_TABLE]	= { .type = NLA_STRING,
+				    .len = NFT_TABLE_MAXNAMELEN - 1 },
 	[NFTA_RULE_CHAIN]	= { .type = NLA_STRING,
 				    .len = NFT_CHAIN_MAXNAMELEN - 1 },
 	[NFTA_RULE_HANDLE]	= { .type = NLA_U64 },
@@ -2443,7 +2445,8 @@ nft_select_set_ops(const struct nlattr * const nla[],
 }
 
 static const struct nla_policy nft_set_policy[NFTA_SET_MAX + 1] = {
-	[NFTA_SET_TABLE]		= { .type = NLA_STRING },
+	[NFTA_SET_TABLE]		= { .type = NLA_STRING,
+					    .len = NFT_TABLE_MAXNAMELEN - 1 },
 	[NFTA_SET_NAME]			= { .type = NLA_STRING,
 					    .len = NFT_SET_MAXNAMELEN - 1 },
 	[NFTA_SET_FLAGS]		= { .type = NLA_U32 },
@@ -3192,8 +3195,10 @@ static const struct nla_policy nft_set_elem_policy[NFTA_SET_ELEM_MAX + 1] = {
 };
 
 static const struct nla_policy nft_set_elem_list_policy[NFTA_SET_ELEM_LIST_MAX + 1] = {
-	[NFTA_SET_ELEM_LIST_TABLE]	= { .type = NLA_STRING },
-	[NFTA_SET_ELEM_LIST_SET]	= { .type = NLA_STRING },
+	[NFTA_SET_ELEM_LIST_TABLE]	= { .type = NLA_STRING,
+					    .len = NFT_TABLE_MAXNAMELEN - 1 },
+	[NFTA_SET_ELEM_LIST_SET]	= { .type = NLA_STRING,
+					    .len = NFT_SET_MAXNAMELEN - 1 },
 	[NFTA_SET_ELEM_LIST_ELEMENTS]	= { .type = NLA_NESTED },
 	[NFTA_SET_ELEM_LIST_SET_ID]	= { .type = NLA_U32 },
 };
@@ -4032,8 +4037,10 @@ struct nft_object *nf_tables_obj_lookup(const struct nft_table *table,
 EXPORT_SYMBOL_GPL(nf_tables_obj_lookup);
 
 static const struct nla_policy nft_obj_policy[NFTA_OBJ_MAX + 1] = {
-	[NFTA_OBJ_TABLE]	= { .type = NLA_STRING },
-	[NFTA_OBJ_NAME]		= { .type = NLA_STRING },
+	[NFTA_OBJ_TABLE]	= { .type = NLA_STRING,
+				    .len = NFT_TABLE_MAXNAMELEN - 1 },
+	[NFTA_OBJ_NAME]		= { .type = NLA_STRING,
+				    .len = NFT_OBJ_MAXNAMELEN - 1 },
 	[NFTA_OBJ_TYPE]		= { .type = NLA_U32 },
 	[NFTA_OBJ_DATA]		= { .type = NLA_NESTED },
 };
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 7de2f46734a4..049ad2d9ee66 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -98,7 +98,8 @@ static void nft_dynset_eval(const struct nft_expr *expr,
 }
 
 static const struct nla_policy nft_dynset_policy[NFTA_DYNSET_MAX + 1] = {
-	[NFTA_DYNSET_SET_NAME]	= { .type = NLA_STRING },
+	[NFTA_DYNSET_SET_NAME]	= { .type = NLA_STRING,
+				    .len = NFT_SET_MAXNAMELEN - 1 },
 	[NFTA_DYNSET_SET_ID]	= { .type = NLA_U32 },
 	[NFTA_DYNSET_OP]	= { .type = NLA_U32 },
 	[NFTA_DYNSET_SREG_KEY]	= { .type = NLA_U32 },
diff --git a/net/netfilter/nft_lookup.c b/net/netfilter/nft_lookup.c
index d4f97fa7e21d..e21aea7e5ec8 100644
--- a/net/netfilter/nft_lookup.c
+++ b/net/netfilter/nft_lookup.c
@@ -49,7 +49,8 @@ static void nft_lookup_eval(const struct nft_expr *expr,
 }
 
 static const struct nla_policy nft_lookup_policy[NFTA_LOOKUP_MAX + 1] = {
-	[NFTA_LOOKUP_SET]	= { .type = NLA_STRING },
+	[NFTA_LOOKUP_SET]	= { .type = NLA_STRING,
+				    .len = NFT_SET_MAXNAMELEN - 1 },
 	[NFTA_LOOKUP_SET_ID]	= { .type = NLA_U32 },
 	[NFTA_LOOKUP_SREG]	= { .type = NLA_U32 },
 	[NFTA_LOOKUP_DREG]	= { .type = NLA_U32 },
diff --git a/net/netfilter/nft_objref.c b/net/netfilter/nft_objref.c
index 415a65ba2b85..1ae8c49ca4a1 100644
--- a/net/netfilter/nft_objref.c
+++ b/net/netfilter/nft_objref.c
@@ -193,10 +193,12 @@ nft_objref_select_ops(const struct nft_ctx *ctx,
 }
 
 static const struct nla_policy nft_objref_policy[NFTA_OBJREF_MAX + 1] = {
-	[NFTA_OBJREF_IMM_NAME]	= { .type = NLA_STRING },
+	[NFTA_OBJREF_IMM_NAME]	= { .type = NLA_STRING,
+				    .len = NFT_OBJ_MAXNAMELEN - 1 },
 	[NFTA_OBJREF_IMM_TYPE]	= { .type = NLA_U32 },
 	[NFTA_OBJREF_SET_SREG]	= { .type = NLA_U32 },
-	[NFTA_OBJREF_SET_NAME]	= { .type = NLA_STRING },
+	[NFTA_OBJREF_SET_NAME]	= { .type = NLA_STRING,
+				    .len = NFT_SET_MAXNAMELEN - 1 },
 	[NFTA_OBJREF_SET_ID]	= { .type = NLA_U32 },
 };
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 11/14] netfilter: nft_log: restrict the log prefix length to 127
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (9 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 10/14] netfilter: nf_tables: validate the name size when possible Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 12/14] netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL Pablo Neira Ayuso
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Liping Zhang <zlpnobody@gmail.com>

First, log prefix will be truncated to NF_LOG_PREFIXLEN-1, i.e. 127,
at nf_log_packet(), so the extra part is useless.

Second, after adding a log rule with a very very long prefix, we will
fail to dump the nft rules after this _special_ one, but acctually,
they do exist. For example:
  # name_65000=$(printf "%0.sQ" {1..65000})
  # nft add rule filter output log prefix "$name_65000"
  # nft add rule filter output counter
  # nft add rule filter output counter
  # nft list chain filter output
  table ip filter {
      chain output {
          type filter hook output priority 0; policy accept;
      }
  }

So now, restrict the log prefix length to NF_LOG_PREFIXLEN-1.

Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/netfilter/nf_log.h | 2 ++
 net/netfilter/nf_log.c                | 1 -
 net/netfilter/nft_log.c               | 3 ++-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/netfilter/nf_log.h b/include/uapi/linux/netfilter/nf_log.h
index 8be21e02387d..d0b5fa91ff54 100644
--- a/include/uapi/linux/netfilter/nf_log.h
+++ b/include/uapi/linux/netfilter/nf_log.h
@@ -9,4 +9,6 @@
 #define NF_LOG_MACDECODE	0x20	/* Decode MAC header */
 #define NF_LOG_MASK		0x2f
 
+#define NF_LOG_PREFIXLEN	128
+
 #endif /* _NETFILTER_NF_LOG_H */
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 3dca90dc24ad..ffb9e8ada899 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -13,7 +13,6 @@
 /* Internal logging interface, which relies on the real
    LOG target modules */
 
-#define NF_LOG_PREFIXLEN		128
 #define NFLOGGER_NAME_LEN		64
 
 static struct nf_logger __rcu *loggers[NFPROTO_NUMPROTO][NF_LOG_TYPE_MAX] __read_mostly;
diff --git a/net/netfilter/nft_log.c b/net/netfilter/nft_log.c
index 6271e40a3dd6..6f6e64423643 100644
--- a/net/netfilter/nft_log.c
+++ b/net/netfilter/nft_log.c
@@ -39,7 +39,8 @@ static void nft_log_eval(const struct nft_expr *expr,
 
 static const struct nla_policy nft_log_policy[NFTA_LOG_MAX + 1] = {
 	[NFTA_LOG_GROUP]	= { .type = NLA_U16 },
-	[NFTA_LOG_PREFIX]	= { .type = NLA_STRING },
+	[NFTA_LOG_PREFIX]	= { .type = NLA_STRING,
+				    .len = NF_LOG_PREFIXLEN - 1 },
 	[NFTA_LOG_SNAPLEN]	= { .type = NLA_U32 },
 	[NFTA_LOG_QTHRESHOLD]	= { .type = NLA_U16 },
 	[NFTA_LOG_LEVEL]	= { .type = NLA_U32 },
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 12/14] netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (10 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 11/14] netfilter: nft_log: restrict the log prefix length to 127 Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 13/14] netfilter: nf_tables: deconstify walk callback function Pablo Neira Ayuso
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

If the element exists and no NLM_F_EXCL is specified, do not bump
set->nelems, otherwise we leak one set element slot. This problem
amplifies if the set is full since the abort path always decrements the
counter for the -ENFILE case too, giving one spare extra slot.

Fix this by moving set->nelems update to nft_add_set_elem() after
successful element insertion. Moreover, remove the element if the set is
full so there is no need to rely on the abort path to undo things
anymore.

Fixes: c016c7e45ddf ("netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertion")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index b84c7b25219b..831a9a16f563 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3745,10 +3745,18 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set,
 		goto err5;
 	}
 
+	if (set->size &&
+	    !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact)) {
+		err = -ENFILE;
+		goto err6;
+	}
+
 	nft_trans_elem(trans) = elem;
 	list_add_tail(&trans->list, &ctx->net->nft.commit_list);
 	return 0;
 
+err6:
+	set->ops->remove(set, &elem);
 err5:
 	kfree(trans);
 err4:
@@ -3795,15 +3803,9 @@ static int nf_tables_newsetelem(struct net *net, struct sock *nlsk,
 		return -EBUSY;
 
 	nla_for_each_nested(attr, nla[NFTA_SET_ELEM_LIST_ELEMENTS], rem) {
-		if (set->size &&
-		    !atomic_add_unless(&set->nelems, 1, set->size + set->ndeact))
-			return -ENFILE;
-
 		err = nft_add_set_elem(&ctx, set, attr, nlh->nlmsg_flags);
-		if (err < 0) {
-			atomic_dec(&set->nelems);
+		if (err < 0)
 			break;
-		}
 	}
 	return err;
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 13/14] netfilter: nf_tables: deconstify walk callback function
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (11 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 12/14] netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 16:38 ` [PATCH 14/14] netfilter: nf_tables: bump set->ndeact on set flush Pablo Neira Ayuso
  2017-01-26 17:59 ` [PATCH 00/14] Netfilter fixes for net David Miller
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

The flush operation needs to modify set and element objects, so let's
deconstify this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |  6 +++---
 net/netfilter/nf_tables_api.c     | 24 ++++++++++++------------
 net/netfilter/nft_set_hash.c      |  2 +-
 net/netfilter/nft_set_rbtree.c    |  2 +-
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 924325c46aab..7dfdb517f0be 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -207,9 +207,9 @@ struct nft_set_iter {
 	unsigned int	skip;
 	int		err;
 	int		(*fn)(const struct nft_ctx *ctx,
-			      const struct nft_set *set,
+			      struct nft_set *set,
 			      const struct nft_set_iter *iter,
-			      const struct nft_set_elem *elem);
+			      struct nft_set_elem *elem);
 };
 
 /**
@@ -301,7 +301,7 @@ struct nft_set_ops {
 	void				(*remove)(const struct nft_set *set,
 						  const struct nft_set_elem *elem);
 	void				(*walk)(const struct nft_ctx *ctx,
-						const struct nft_set *set,
+						struct nft_set *set,
 						struct nft_set_iter *iter);
 
 	unsigned int			(*privsize)(const struct nlattr * const nla[]);
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 831a9a16f563..5bd0068320fb 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3087,9 +3087,9 @@ static int nf_tables_delset(struct net *net, struct sock *nlsk,
 }
 
 static int nf_tables_bind_check_setelem(const struct nft_ctx *ctx,
-					const struct nft_set *set,
+					struct nft_set *set,
 					const struct nft_set_iter *iter,
-					const struct nft_set_elem *elem)
+					struct nft_set_elem *elem)
 {
 	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	enum nft_registers dreg;
@@ -3308,9 +3308,9 @@ struct nft_set_dump_args {
 };
 
 static int nf_tables_dump_setelem(const struct nft_ctx *ctx,
-				  const struct nft_set *set,
+				  struct nft_set *set,
 				  const struct nft_set_iter *iter,
-				  const struct nft_set_elem *elem)
+				  struct nft_set_elem *elem)
 {
 	struct nft_set_dump_args *args;
 
@@ -3322,7 +3322,7 @@ static int nf_tables_dump_set(struct sk_buff *skb, struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
 	u8 genmask = nft_genmask_cur(net);
-	const struct nft_set *set;
+	struct nft_set *set;
 	struct nft_set_dump_args args;
 	struct nft_ctx ctx;
 	struct nlattr *nla[NFTA_SET_ELEM_LIST_MAX + 1];
@@ -3890,9 +3890,9 @@ static int nft_del_setelem(struct nft_ctx *ctx, struct nft_set *set,
 }
 
 static int nft_flush_set(const struct nft_ctx *ctx,
-			 const struct nft_set *set,
+			 struct nft_set *set,
 			 const struct nft_set_iter *iter,
-			 const struct nft_set_elem *elem)
+			 struct nft_set_elem *elem)
 {
 	struct nft_trans *trans;
 	int err;
@@ -3907,8 +3907,8 @@ static int nft_flush_set(const struct nft_ctx *ctx,
 		goto err1;
 	}
 
-	nft_trans_elem_set(trans) = (struct nft_set *)set;
-	nft_trans_elem(trans) = *((struct nft_set_elem *)elem);
+	nft_trans_elem_set(trans) = set;
+	nft_trans_elem(trans) = *elem;
 	list_add_tail(&trans->list, &ctx->net->nft.commit_list);
 
 	return 0;
@@ -5019,9 +5019,9 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx,
 				 const struct nft_chain *chain);
 
 static int nf_tables_loop_check_setelem(const struct nft_ctx *ctx,
-					const struct nft_set *set,
+					struct nft_set *set,
 					const struct nft_set_iter *iter,
-					const struct nft_set_elem *elem)
+					struct nft_set_elem *elem)
 {
 	const struct nft_set_ext *ext = nft_set_elem_ext(set, elem->priv);
 	const struct nft_data *data;
@@ -5045,7 +5045,7 @@ static int nf_tables_check_loops(const struct nft_ctx *ctx,
 {
 	const struct nft_rule *rule;
 	const struct nft_expr *expr, *last;
-	const struct nft_set *set;
+	struct nft_set *set;
 	struct nft_set_binding *binding;
 	struct nft_set_iter iter;
 
diff --git a/net/netfilter/nft_set_hash.c b/net/netfilter/nft_set_hash.c
index 1e20e2bbb6d9..e36069fb76ae 100644
--- a/net/netfilter/nft_set_hash.c
+++ b/net/netfilter/nft_set_hash.c
@@ -212,7 +212,7 @@ static void nft_hash_remove(const struct nft_set *set,
 	rhashtable_remove_fast(&priv->ht, &he->node, nft_hash_params);
 }
 
-static void nft_hash_walk(const struct nft_ctx *ctx, const struct nft_set *set,
+static void nft_hash_walk(const struct nft_ctx *ctx, struct nft_set *set,
 			  struct nft_set_iter *iter)
 {
 	struct nft_hash *priv = nft_set_priv(set);
diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 08376e50f6cd..f06f55ee516d 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -221,7 +221,7 @@ static void *nft_rbtree_deactivate(const struct net *net,
 }
 
 static void nft_rbtree_walk(const struct nft_ctx *ctx,
-			    const struct nft_set *set,
+			    struct nft_set *set,
 			    struct nft_set_iter *iter)
 {
 	const struct nft_rbtree *priv = nft_set_priv(set);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 14/14] netfilter: nf_tables: bump set->ndeact on set flush
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (12 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 13/14] netfilter: nf_tables: deconstify walk callback function Pablo Neira Ayuso
@ 2017-01-26 16:38 ` Pablo Neira Ayuso
  2017-01-26 17:59 ` [PATCH 00/14] Netfilter fixes for net David Miller
  14 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 16:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Add missing set->ndeact update on each deactivated element from the set
flush path. Otherwise, sets with fixed size break after flush since
accounting breaks.

 # nft add set x y { type ipv4_addr\; size 2\; }
 # nft add element x y { 1.1.1.1 }
 # nft add element x y { 1.1.1.2 }
 # nft flush set x y
 # nft add element x y { 1.1.1.1 }
 <cmdline>:1:1-28: Error: Could not process rule: Too many open files in system

Fixes: 8411b6442e59 ("netfilter: nf_tables: support for set flushing")
Reported-by: Elise Lennion <elise.lennion@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 5bd0068320fb..1b913760f205 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3906,6 +3906,7 @@ static int nft_flush_set(const struct nft_ctx *ctx,
 		err = -ENOENT;
 		goto err1;
 	}
+	set->ndeact++;
 
 	nft_trans_elem_set(trans) = set;
 	nft_trans_elem(trans) = *elem;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 00/14] Netfilter fixes for net
  2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
                   ` (13 preceding siblings ...)
  2017-01-26 16:38 ` [PATCH 14/14] netfilter: nf_tables: bump set->ndeact on set flush Pablo Neira Ayuso
@ 2017-01-26 17:59 ` David Miller
  14 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2017-01-26 17:59 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Thu, 26 Jan 2017 17:37:53 +0100

> The following patchset contains a large batch with Netfilter fixes for
> your net tree, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled
  2017-01-26 16:37 ` [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled Pablo Neira Ayuso
@ 2017-01-26 18:02   ` Eric Dumazet
  2017-01-26 19:19     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2017-01-26 18:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Thu, 2017-01-26 at 17:37 +0100, Pablo Neira Ayuso wrote:
> From: Pau Espin Pedrol <pespin.shar@gmail.com>
> 
> Otherwise, RST packets generated by the TCP stack for non-existing
> sockets always have mark 0.
> The mark from the original packet is assigned to the netns_ipv4/6
> socket used to send the response so that it can get copied into the
> response skb when the socket sends it.
> 
> Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
>  net/ipv4/ip_output.c | 1 +
>  net/ipv6/tcp_ipv6.c  | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index fac275c48108..b67719f45953 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1629,6 +1629,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
>  	sk->sk_protocol = ip_hdr(skb)->protocol;
>  	sk->sk_bound_dev_if = arg->bound_dev_if;
>  	sk->sk_sndbuf = sysctl_wmem_default;
> +	sk->sk_mark = fl4.flowi4_mark;
>  	err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
>  			     len, 0, &ipc, &rt, MSG_DONTWAIT);
>  	if (unlikely(err)) {
> diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> index 73bc8fc68acd..2b20622a5824 100644
> --- a/net/ipv6/tcp_ipv6.c
> +++ b/net/ipv6/tcp_ipv6.c
> @@ -840,6 +840,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
>  	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
>  	if (!IS_ERR(dst)) {
>  		skb_dst_set(buff, dst);
> +		ctl_sk->sk_mark = fl6.flowi6_mark;
>  		ip6_xmit(ctl_sk, buff, &fl6, NULL, tclass);
>  		TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
>  		if (rst)


This patch is wrong.

ctl_sk is a shared socket, and tcp_v6_send_response() can be called from
many different cpus at the same time.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled
  2017-01-26 18:02   ` Eric Dumazet
@ 2017-01-26 19:19     ` Pablo Neira Ayuso
  2017-01-26 19:28       ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-01-26 19:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netfilter-devel, davem, netdev

On Thu, Jan 26, 2017 at 10:02:40AM -0800, Eric Dumazet wrote:
> On Thu, 2017-01-26 at 17:37 +0100, Pablo Neira Ayuso wrote:
> > From: Pau Espin Pedrol <pespin.shar@gmail.com>
> > 
> > Otherwise, RST packets generated by the TCP stack for non-existing
> > sockets always have mark 0.
> > The mark from the original packet is assigned to the netns_ipv4/6
> > socket used to send the response so that it can get copied into the
> > response skb when the socket sends it.
> > 
> > Fixes: e110861f8609 ("net: add a sysctl to reflect the fwmark on replies")
> > Cc: Lorenzo Colitti <lorenzo@google.com>
> > Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> >  net/ipv4/ip_output.c | 1 +
> >  net/ipv6/tcp_ipv6.c  | 1 +
> >  2 files changed, 2 insertions(+)
> > 
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index fac275c48108..b67719f45953 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -1629,6 +1629,7 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb,
> >  	sk->sk_protocol = ip_hdr(skb)->protocol;
> >  	sk->sk_bound_dev_if = arg->bound_dev_if;
> >  	sk->sk_sndbuf = sysctl_wmem_default;
> > +	sk->sk_mark = fl4.flowi4_mark;
> >  	err = ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base,
> >  			     len, 0, &ipc, &rt, MSG_DONTWAIT);
> >  	if (unlikely(err)) {
> > diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
> > index 73bc8fc68acd..2b20622a5824 100644
> > --- a/net/ipv6/tcp_ipv6.c
> > +++ b/net/ipv6/tcp_ipv6.c
> > @@ -840,6 +840,7 @@ static void tcp_v6_send_response(const struct sock *sk, struct sk_buff *skb, u32
> >  	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
> >  	if (!IS_ERR(dst)) {
> >  		skb_dst_set(buff, dst);
> > +		ctl_sk->sk_mark = fl6.flowi6_mark;
> >  		ip6_xmit(ctl_sk, buff, &fl6, NULL, tclass);
> >  		TCP_INC_STATS(net, TCP_MIB_OUTSEGS);
> >  		if (rst)
> 
> 
> This patch is wrong.
> 
> ctl_sk is a shared socket, and tcp_v6_send_response() can be called from
> many different cpus at the same time.

Right. This is not percpu as in IPv4.

I can send a follow up patch to get this in sync with the way we do it
in IPv4, ie. add percpu socket.

Fine with this approach? Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled
  2017-01-26 19:19     ` Pablo Neira Ayuso
@ 2017-01-26 19:28       ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2017-01-26 19:28 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel, davem, netdev

On Thu, 2017-01-26 at 20:19 +0100, Pablo Neira Ayuso wrote:

> Right. This is not percpu as in IPv4.
> 
> I can send a follow up patch to get this in sync with the way we do it
> in IPv4, ie. add percpu socket.
> 
> Fine with this approach? Thanks!

Not really.

percpu sockets are going to slow down network namespace creation /
deletion and increase memory foot print.

IPv6 is cleaner because it does not really have to use different
sockets.

Ultimately would would like to have the same for IPv4.

I would rather carry the mark either in an additional parameter,
or in the flow that is already passed as a parameter.




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux
  2017-01-26 16:38 ` [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux Pablo Neira Ayuso
@ 2017-01-27 16:51   ` Nicolas Dichtel
  2017-03-01 15:02     ` Nicolas Dichtel
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolas Dichtel @ 2017-01-27 16:51 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: davem, netdev

Le 26/01/2017 à 17:38, Pablo Neira Ayuso a écrit :
> From: Florian Westphal <fw@strlen.de>
> 
> This further refines the changes made to conntrack gc_worker in
> commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
> 
> The main idea of that change was to reduce the scan interval when evictions
> take place.
> 
> However, on the reporters' setup, there are 1-2 million conntrack entries
> in total and roughly 8k new (and closing) connections per second.
> 
> In this case we'll always evict at least one entry per gc cycle and scan
> interval is always at 1 jiffy because of this test:
> 
>  } else if (expired_count) {
>      gc_work->next_gc_run /= 2U;
>      next_run = msecs_to_jiffies(1);
> 
> being true almost all the time.
> 
> Given we scan ~10k entries per run its clearly wrong to reduce interval
> based on nonzero eviction count, it will only waste cpu cycles since a vast
> majorities of conntracks are not timed out.
> 
> Thus only look at the ratio (scanned entries vs. evicted entries) to make
> a decision on whether to reduce or not.
> 
> Because evictor is supposed to only kick in when system turns idle after
> a busy period, pick a high ratio -- this makes it 50%.  We thus keep
> the idea of increasing scan rate when its likely that table contains many
> expired entries.
> 
> In order to not let timed-out entries hang around for too long
> (important when using event logging, in which case we want to timely
> destroy events), we now scan the full table within at most
> GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
> timed-out entries sit in same slot.
> 
> I tested this with a vm under synflood (with
> sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
> 
> While flood is ongoing, interval now stays at its max rate
> (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
> 
> With feedback from Nicolas Dichtel.
> 
> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Pablo, is it possible to queue this patch (and the previous: 08/14) for the 4.9
stable?


Thank you,
Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux
  2017-01-27 16:51   ` Nicolas Dichtel
@ 2017-03-01 15:02     ` Nicolas Dichtel
  2017-03-01 15:38       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 22+ messages in thread
From: Nicolas Dichtel @ 2017-03-01 15:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso, netfilter-devel; +Cc: davem, netdev

Le 27/01/2017 à 17:51, Nicolas Dichtel a écrit :
> Le 26/01/2017 à 17:38, Pablo Neira Ayuso a écrit :
>> From: Florian Westphal <fw@strlen.de>
>>
>> This further refines the changes made to conntrack gc_worker in
>> commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
>>
>> The main idea of that change was to reduce the scan interval when evictions
>> take place.
>>
>> However, on the reporters' setup, there are 1-2 million conntrack entries
>> in total and roughly 8k new (and closing) connections per second.
>>
>> In this case we'll always evict at least one entry per gc cycle and scan
>> interval is always at 1 jiffy because of this test:
>>
>>  } else if (expired_count) {
>>      gc_work->next_gc_run /= 2U;
>>      next_run = msecs_to_jiffies(1);
>>
>> being true almost all the time.
>>
>> Given we scan ~10k entries per run its clearly wrong to reduce interval
>> based on nonzero eviction count, it will only waste cpu cycles since a vast
>> majorities of conntracks are not timed out.
>>
>> Thus only look at the ratio (scanned entries vs. evicted entries) to make
>> a decision on whether to reduce or not.
>>
>> Because evictor is supposed to only kick in when system turns idle after
>> a busy period, pick a high ratio -- this makes it 50%.  We thus keep
>> the idea of increasing scan rate when its likely that table contains many
>> expired entries.
>>
>> In order to not let timed-out entries hang around for too long
>> (important when using event logging, in which case we want to timely
>> destroy events), we now scan the full table within at most
>> GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
>> timed-out entries sit in same slot.
>>
>> I tested this with a vm under synflood (with
>> sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
>>
>> While flood is ongoing, interval now stays at its max rate
>> (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
>>
>> With feedback from Nicolas Dichtel.
>>
>> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
>> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> Pablo, is it possible to queue this patch (and the previous: 08/14) for the 4.9
> stable?

Pablo, should I submit it formally?


Regards,
Nicolas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux
  2017-03-01 15:02     ` Nicolas Dichtel
@ 2017-03-01 15:38       ` Pablo Neira Ayuso
  0 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2017-03-01 15:38 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: netfilter-devel, davem, netdev

On Wed, Mar 01, 2017 at 04:02:53PM +0100, Nicolas Dichtel wrote:
> Le 27/01/2017 à 17:51, Nicolas Dichtel a écrit :
> > Le 26/01/2017 à 17:38, Pablo Neira Ayuso a écrit :
> >> From: Florian Westphal <fw@strlen.de>
> >>
> >> This further refines the changes made to conntrack gc_worker in
> >> commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
> >>
> >> The main idea of that change was to reduce the scan interval when evictions
> >> take place.
> >>
> >> However, on the reporters' setup, there are 1-2 million conntrack entries
> >> in total and roughly 8k new (and closing) connections per second.
> >>
> >> In this case we'll always evict at least one entry per gc cycle and scan
> >> interval is always at 1 jiffy because of this test:
> >>
> >>  } else if (expired_count) {
> >>      gc_work->next_gc_run /= 2U;
> >>      next_run = msecs_to_jiffies(1);
> >>
> >> being true almost all the time.
> >>
> >> Given we scan ~10k entries per run its clearly wrong to reduce interval
> >> based on nonzero eviction count, it will only waste cpu cycles since a vast
> >> majorities of conntracks are not timed out.
> >>
> >> Thus only look at the ratio (scanned entries vs. evicted entries) to make
> >> a decision on whether to reduce or not.
> >>
> >> Because evictor is supposed to only kick in when system turns idle after
> >> a busy period, pick a high ratio -- this makes it 50%.  We thus keep
> >> the idea of increasing scan rate when its likely that table contains many
> >> expired entries.
> >>
> >> In order to not let timed-out entries hang around for too long
> >> (important when using event logging, in which case we want to timely
> >> destroy events), we now scan the full table within at most
> >> GC_MAX_SCAN_JIFFIES (16 seconds) even in worst-case scenario where all
> >> timed-out entries sit in same slot.
> >>
> >> I tested this with a vm under synflood (with
> >> sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
> >>
> >> While flood is ongoing, interval now stays at its max rate
> >> (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
> >>
> >> With feedback from Nicolas Dichtel.
> >>
> >> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> >> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> >> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
> >> Signed-off-by: Florian Westphal <fw@strlen.de>
> >> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> >> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> >> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> >> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > Pablo, is it possible to queue this patch (and the previous: 08/14) for the 4.9
> > stable?
> 
> Pablo, should I submit it formally?

Just requested this to Greg, sorry this didn't happen so far.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-03-01 15:38 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-26 16:37 [PATCH 00/14] Netfilter fixes for net Pablo Neira Ayuso
2017-01-26 16:37 ` [PATCH 01/14] netfilter: use fwmark_reflect in nf_send_reset Pablo Neira Ayuso
2017-01-26 16:37 ` [PATCH 02/14] tcp: fix mark propagation with fwmark_reflect enabled Pablo Neira Ayuso
2017-01-26 18:02   ` Eric Dumazet
2017-01-26 19:19     ` Pablo Neira Ayuso
2017-01-26 19:28       ` Eric Dumazet
2017-01-26 16:37 ` [PATCH 03/14] netfilter: nf_tables: fix spelling mistakes Pablo Neira Ayuso
2017-01-26 16:37 ` [PATCH 04/14] netfilter: rpfilter: fix incorrect loopback packet judgment Pablo Neira Ayuso
2017-01-26 16:37 ` [PATCH 05/14] netfilter: nf_tables: fix possible oops when dumping stateful objects Pablo Neira Ayuso
2017-01-26 16:37 ` [PATCH 06/14] netfilter: Fix typo in NF_CONNTRACK Kconfig option description Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 07/14] netfilter: ipt_CLUSTERIP: fix build error without procfs Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 08/14] netfilter: conntrack: remove GC_MAX_EVICTS break Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 09/14] netfilter: conntrack: refine gc worker heuristics, redux Pablo Neira Ayuso
2017-01-27 16:51   ` Nicolas Dichtel
2017-03-01 15:02     ` Nicolas Dichtel
2017-03-01 15:38       ` Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 10/14] netfilter: nf_tables: validate the name size when possible Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 11/14] netfilter: nft_log: restrict the log prefix length to 127 Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 12/14] netfilter: nf_tables: fix set->nelems counting with no NLM_F_EXCL Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 13/14] netfilter: nf_tables: deconstify walk callback function Pablo Neira Ayuso
2017-01-26 16:38 ` [PATCH 14/14] netfilter: nf_tables: bump set->ndeact on set flush Pablo Neira Ayuso
2017-01-26 17:59 ` [PATCH 00/14] Netfilter fixes for net David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.