netfilter-devel.vger.kernel.org archive mirror
* [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling
@ 2015-07-14 15:51 Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 1/6] netfilter: xtables: compute exact size needed for jumpstack Florian Westphal
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet

During NFWS 2015 Eric Dumazet suggested various ideas to make
the xtables table traverser function setup less expensive.

In particular, the *_do_table functions keep track of the current
stack pointer.

It appears that we can simplify this to always start from 0
(therefore allowing us to avoid the save/restore) provided we make sure
that we use an alternate jump stack when we enter the traverser recursively
via TEE target.

This implements some of Eric's ideas.
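
Roughly, the change can be sketched like this (a simplified sketch of
the idea, not the exact diff; see patches 3 and 4 for the real code):

    /* before: per-cpu stack offset, saved and restored on each call */
    stackptr = per_cpu_ptr(private->stackptr, cpu);
    origptr  = *stackptr;
    ...
    *stackptr = origptr;

    /* after: always start at 0; reentry via TEE moves to the upper
     * half of a double-sized per-cpu jumpstack, so there is nothing
     * to save or restore.
     */
    unsigned int stackidx = 0;
    jumpstack = (struct ipt_entry **)private->jumpstack[cpu];
    if (__this_cpu_read(nf_skb_duplicated))
        jumpstack += private->stacksize;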

NOTE: The last patch may break valid iptables rulesets.
It's the classic question of whether we're willing to reject bizarre
rulesets or not.  If this patch is acceptable, we can avoid one more
dereference by using percpu allocation for the jumpstack as follow-up
work.

If not, just ignore the last patch.

Florian Westphal (6):
      netfilter: xtables: compute exact size needed for jumpstack
      netfilter: move tee_active to core
      netfilter: xtables: don't save/restore jumpstack offset
      netfilter: add and use jump label for xt_tee
      netfilter: xtables: remove __pure annotation
      netfilter: xtables: add upper limit on call chain depth

 include/linux/netfilter.h          |   11 +++++
 include/linux/netfilter/x_tables.h |    8 +++-
 net/ipv4/netfilter/arp_tables.c    |   32 ++++++++---------
 net/ipv4/netfilter/ip_tables.c     |   68 +++++++++++++++++++++----------------
 net/ipv6/netfilter/ip6_tables.c    |   52 ++++++++++++++++------------
 net/netfilter/core.c               |    3 +
 net/netfilter/x_tables.c           |   31 +++++++++++-----
 net/netfilter/xt_TEE.c             |   15 ++++----
 8 files changed, 137 insertions(+), 83 deletions(-)



* [PATCH -next v2 1/6] netfilter: xtables: compute exact size needed for jumpstack
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
@ 2015-07-14 15:51 ` Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 2/6] netfilter: move tee_active to core Florian Westphal
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

The {arp,ip,ip6tables} jump stack is currently sized based
on the number of user chains.

However, it's rather unlikely that every user-defined chain jumps to the
next, so let's use the existing loop detection logic to also track the
chain depths.

The stacksize is then set to the largest chain depth seen.
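
For example, given a (hypothetical) ruleset such as:

    iptables -N C1
    iptables -N C2
    iptables -N C3
    iptables -A INPUT -j C1
    iptables -A C1    -j C2
    iptables -A INPUT -j C3

the old code sizes the stack for 3 user chains, although the deepest
call chain (INPUT -> C1 -> C2) only ever needs 2 slots.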

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 No changes since v1.

 net/ipv4/netfilter/arp_tables.c | 19 ++++++++++++-------
 net/ipv4/netfilter/ip_tables.c  | 28 ++++++++++++++++++----------
 net/ipv6/netfilter/ip6_tables.c | 23 +++++++++++++++--------
 net/netfilter/x_tables.c        |  4 ++++
 4 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 92305a1..ae6d0a1 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -372,10 +372,13 @@ static inline bool unconditional(const struct arpt_arp *arp)
 
 /* Figures out from what hook each rule can be called: returns 0 if
  * there are loops.  Puts hook bitmask in comefrom.
+ *
+ * Keeps track of largest call depth seen and stores it in newinfo->stacksize.
  */
-static int mark_source_chains(const struct xt_table_info *newinfo,
+static int mark_source_chains(struct xt_table_info *newinfo,
 			      unsigned int valid_hooks, void *entry0)
 {
+	unsigned int calldepth, max_calldepth = 0;
 	unsigned int hook;
 
 	/* No recursion; use packet counter to save back ptrs (reset
@@ -391,6 +394,7 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 
 		/* Set initial back pointer. */
 		e->counters.pcnt = pos;
+		calldepth = 0;
 
 		for (;;) {
 			const struct xt_standard_target *t
@@ -445,6 +449,8 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 					(entry0 + pos + size);
 				e->counters.pcnt = pos;
 				pos += size;
+				if (calldepth > 0)
+					--calldepth;
 			} else {
 				int newpos = t->verdict;
 
@@ -459,6 +465,10 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 						return 0;
 					}
 
+					if (entry0 + newpos != arpt_next_entry(e) &&
+					    ++calldepth > max_calldepth)
+						max_calldepth = calldepth;
+
 					/* This a jump; chase it. */
 					duprintf("Jump rule %u -> %u\n",
 						 pos, newpos);
@@ -475,6 +485,7 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 		next:
 		duprintf("Finished chain %u\n", hook);
 	}
+	newinfo->stacksize = max_calldepth;
 	return 1;
 }
 
@@ -664,9 +675,6 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
-		if (strcmp(arpt_get_target(iter)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1439,9 +1447,6 @@ static int translate_compat_table(const char *name,
 			break;
 		}
 		++i;
-		if (strcmp(arpt_get_target(iter1)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	if (ret) {
 		/*
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 6c72fbb..5e44b35 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -439,11 +439,15 @@ ipt_do_table(struct sk_buff *skb,
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
-   there are loops.  Puts hook bitmask in comefrom. */
+ * there are loops.  Puts hook bitmask in comefrom.
+ *
+ * Keeps track of largest call depth seen and stores it in newinfo->stacksize.
+ */
 static int
-mark_source_chains(const struct xt_table_info *newinfo,
+mark_source_chains(struct xt_table_info *newinfo,
 		   unsigned int valid_hooks, void *entry0)
 {
+	unsigned int calldepth, max_calldepth = 0;
 	unsigned int hook;
 
 	/* No recursion; use packet counter to save back ptrs (reset
@@ -457,6 +461,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 
 		/* Set initial back pointer. */
 		e->counters.pcnt = pos;
+		calldepth = 0;
 
 		for (;;) {
 			const struct xt_standard_target *t
@@ -518,6 +523,9 @@ mark_source_chains(const struct xt_table_info *newinfo,
 					(entry0 + pos + size);
 				e->counters.pcnt = pos;
 				pos += size;
+				WARN_ON_ONCE(calldepth == 0);
+				if (calldepth > 0)
+					--calldepth;
 			} else {
 				int newpos = t->verdict;
 
@@ -531,9 +539,14 @@ mark_source_chains(const struct xt_table_info *newinfo,
 								newpos);
 						return 0;
 					}
+					if (entry0 + newpos != ipt_next_entry(e) &&
+					    !(e->ip.flags & IPT_F_GOTO) &&
+					    ++calldepth > max_calldepth)
+						max_calldepth = calldepth;
+
 					/* This a jump; chase it. */
-					duprintf("Jump rule %u -> %u\n",
-						 pos, newpos);
+					duprintf("Jump rule %u -> %u, calldepth %d\n",
+						 pos, newpos, calldepth);
 				} else {
 					/* ... this is a fallthru */
 					newpos = pos + e->next_offset;
@@ -547,6 +560,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 		next:
 		duprintf("Finished chain %u\n", hook);
 	}
+	newinfo->stacksize = max_calldepth;
 	return 1;
 }
 
@@ -826,9 +840,6 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
-		if (strcmp(ipt_get_target(iter)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -1744,9 +1755,6 @@ translate_compat_table(struct net *net,
 		if (ret != 0)
 			break;
 		++i;
-		if (strcmp(ipt_get_target(iter1)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	if (ret) {
 		/*
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 3c35ced..baf0321 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -452,11 +452,15 @@ ip6t_do_table(struct sk_buff *skb,
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
-   there are loops.  Puts hook bitmask in comefrom. */
+ * there are loops.  Puts hook bitmask in comefrom.
+ *
+ * Keeps track of largest call depth seen and stores it in newinfo->stacksize.
+ */
 static int
-mark_source_chains(const struct xt_table_info *newinfo,
+mark_source_chains(struct xt_table_info *newinfo,
 		   unsigned int valid_hooks, void *entry0)
 {
+	unsigned int calldepth, max_calldepth = 0;
 	unsigned int hook;
 
 	/* No recursion; use packet counter to save back ptrs (reset
@@ -470,6 +474,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 
 		/* Set initial back pointer. */
 		e->counters.pcnt = pos;
+		calldepth = 0;
 
 		for (;;) {
 			const struct xt_standard_target *t
@@ -531,6 +536,8 @@ mark_source_chains(const struct xt_table_info *newinfo,
 					(entry0 + pos + size);
 				e->counters.pcnt = pos;
 				pos += size;
+				if (calldepth > 0)
+					--calldepth;
 			} else {
 				int newpos = t->verdict;
 
@@ -544,6 +551,11 @@ mark_source_chains(const struct xt_table_info *newinfo,
 								newpos);
 						return 0;
 					}
+					if (entry0 + newpos != ip6t_next_entry(e) &&
+					    !(e->ipv6.flags & IP6T_F_GOTO) &&
+					    ++calldepth > max_calldepth)
+						max_calldepth = calldepth;
+
 					/* This a jump; chase it. */
 					duprintf("Jump rule %u -> %u\n",
 						 pos, newpos);
@@ -560,6 +572,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 		next:
 		duprintf("Finished chain %u\n", hook);
 	}
+	newinfo->stacksize = max_calldepth;
 	return 1;
 }
 
@@ -839,9 +852,6 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
-		if (strcmp(ip6t_get_target(iter)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -1754,9 +1764,6 @@ translate_compat_table(struct net *net,
 		if (ret != 0)
 			break;
 		++i;
-		if (strcmp(ip6t_get_target(iter1)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	if (ret) {
 		/*
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d324fe7..4db7d60 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -749,6 +749,10 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	if (i->jumpstack == NULL)
 		return -ENOMEM;
 
+	/* ruleset without jumps -- no stack needed */
+	if (i->stacksize == 0)
+		return 0;
+
 	i->stacksize *= xt_jumpstack_multiplier;
 	size = sizeof(void *) * i->stacksize;
 	for_each_possible_cpu(cpu) {
-- 
2.0.5



* [PATCH -next v2 2/6] netfilter: move tee_active to core
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 1/6] netfilter: xtables: compute exact size needed for jumpstack Florian Westphal
@ 2015-07-14 15:51 ` Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 3/6] netfilter: xtables: don't save/restore jumpstack offset Florian Westphal
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

This prepares for a TEE-like expression in nftables.
We want to ensure only one duplicate is sent, so both will
use the same percpu variable to detect duplication.

The other use case is detection of recursive calls into xtables, but
since we don't want a dependency from nft on the xtables core, it is
put into core.c instead of the x_tables core.
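
The usage pattern (visible in the xt_TEE hunks below) is:

    __this_cpu_write(nf_skb_duplicated, true);
    ip_local_out(skb);  /* reenters the OUTPUT/POSTROUTING hooks */
    __this_cpu_write(nf_skb_duplicated, false);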

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 No changes since v1.

 include/linux/netfilter.h | 11 +++++++++++
 net/netfilter/core.c      |  3 +++
 net/netfilter/xt_TEE.c    | 13 ++++++-------
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 00050df..4881d48 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -385,4 +385,15 @@ extern struct nfq_ct_hook __rcu *nfq_ct_hook;
 static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
 #endif
 
+/**
+ * nf_skb_duplicated - TEE target has sent a packet
+ *
+ * When a xtables target sends a packet, the OUTPUT and POSTROUTING
+ * hooks are traversed again, i.e. nft and xtables are invoked recursively.
+ *
+ * This is used by xtables TEE target to prevent the duplicated skb from
+ * being duplicated again.
+ */
+DECLARE_PER_CPU(bool, nf_skb_duplicated);
+
 #endif /*__LINUX_NETFILTER_H*/
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index a0e5497..254032d 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -34,6 +34,9 @@ EXPORT_SYMBOL(nf_afinfo);
 const struct nf_ipv6_ops __rcu *nf_ipv6_ops __read_mostly;
 EXPORT_SYMBOL_GPL(nf_ipv6_ops);
 
+DEFINE_PER_CPU(bool, nf_skb_duplicated);
+EXPORT_SYMBOL_GPL(nf_skb_duplicated);
+
 int nf_register_afinfo(const struct nf_afinfo *afinfo)
 {
 	mutex_lock(&afinfo_mutex);
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index a747eb4..8950e79 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -37,7 +37,6 @@ struct xt_tee_priv {
 };
 
 static const union nf_inet_addr tee_zero_address;
-static DEFINE_PER_CPU(bool, tee_active);
 
 static struct net *pick_net(struct sk_buff *skb)
 {
@@ -88,7 +87,7 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	const struct xt_tee_tginfo *info = par->targinfo;
 	struct iphdr *iph;
 
-	if (__this_cpu_read(tee_active))
+	if (__this_cpu_read(nf_skb_duplicated))
 		return XT_CONTINUE;
 	/*
 	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
@@ -125,9 +124,9 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	ip_send_check(iph);
 
 	if (tee_tg_route4(skb, info)) {
-		__this_cpu_write(tee_active, true);
+		__this_cpu_write(nf_skb_duplicated, true);
 		ip_local_out(skb);
-		__this_cpu_write(tee_active, false);
+		__this_cpu_write(nf_skb_duplicated, false);
 	} else {
 		kfree_skb(skb);
 	}
@@ -170,7 +169,7 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tee_tginfo *info = par->targinfo;
 
-	if (__this_cpu_read(tee_active))
+	if (__this_cpu_read(nf_skb_duplicated))
 		return XT_CONTINUE;
 	skb = pskb_copy(skb, GFP_ATOMIC);
 	if (skb == NULL)
@@ -188,9 +187,9 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 		--iph->hop_limit;
 	}
 	if (tee_tg_route6(skb, info)) {
-		__this_cpu_write(tee_active, true);
+		__this_cpu_write(nf_skb_duplicated, true);
 		ip6_local_out(skb);
-		__this_cpu_write(tee_active, false);
+		__this_cpu_write(nf_skb_duplicated, false);
 	} else {
 		kfree_skb(skb);
 	}
-- 
2.0.5



* [PATCH -next v2 3/6] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 1/6] netfilter: xtables: compute exact size needed for jumpstack Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 2/6] netfilter: move tee_active to core Florian Westphal
@ 2015-07-14 15:51 ` Florian Westphal
  2015-07-14 15:51 ` [PATCH -next 4/6] netfilter: add and use jump label for xt_tee Florian Westphal
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

In most cases there is no reentrancy into ip/ip6tables.

For skbs sent by REJECT or SYNPROXY targets, there is one level
of reentrancy, but it's not relevant: those targets issue an absolute
verdict (ACCEPT, DROP, STOLEN, etc.), so the jumpstack can safely be
clobbered because it is not used afterwards.

So the only special case where it is relevant is the TEE target, which
returns XT_CONTINUE.

This patch changes ip(6)_do_table to always use the jump stack starting
from 0.

When we detect we're operating on an skb sent via TEE (percpu
nf_skb_duplicated is 1) we switch to an alternate stack to leave
the original one alone.

Since there is no TEE support for arptables, it doesn't need to
test whether TEE is active.

The jump stack overflow tests are no longer needed either: since
->stacksize is the largest call depth, we cannot exceed it.

A much better alternative to the external jumpstack would be to just
declare a jumps[32] stack on the local stack frame, but that would mean
we'd have to reject iptables rulesets that used to work before.

Another alternative would be to start rejecting rulesets with a larger
call depth, e.g. 1000 -- in this case it would be feasible to allocate the
entire stack in the percpu area which would avoid one dereference.
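
As a sketch, the per-cpu jumpstack is now laid out and selected like
this:

    /* x_tables allocates 2 * stacksize entries per cpu:
     *   slots [0 .. stacksize-1]            normal traversal
     *   slots [stacksize .. 2*stacksize-1]  reentry via TEE
     */
    jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);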

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Changes since v1:
 - fix typos in comments (Jan)

 include/linux/netfilter/x_tables.h |  1 -
 net/ipv4/netfilter/arp_tables.c    | 11 +++--------
 net/ipv4/netfilter/ip_tables.c     | 37 ++++++++++++++++++++-----------------
 net/ipv6/netfilter/ip6_tables.c    | 26 ++++++++++++++------------
 net/netfilter/x_tables.c           | 22 +++++++++++-----------
 5 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 286098a..1492845 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -222,7 +222,6 @@ struct xt_table_info {
 	 * @stacksize jumps (number of user chains) can possibly be made.
 	 */
 	unsigned int stacksize;
-	unsigned int __percpu *stackptr;
 	void ***jumpstack;
 
 	unsigned char entries[0] __aligned(8);
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index ae6d0a1..969fdbe 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -280,6 +280,9 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 	table_base = private->entries;
 	jumpstack  = (struct arpt_entry **)private->jumpstack[cpu];
 
+	/* No TEE support for arptables, so no need to switch to alternate
+	 * stack.  All targets that reenter must return absolute verdicts.
+	 */
 	e = get_entry(table_base, private->hook_entry[hook]);
 
 	acpar.in      = state->in;
@@ -325,11 +328,6 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 			}
 			if (table_base + v
 			    != arpt_next_entry(e)) {
-
-				if (stackidx >= private->stacksize) {
-					verdict = NF_DROP;
-					break;
-				}
 				jumpstack[stackidx++] = e;
 			}
 
@@ -337,9 +335,6 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 			continue;
 		}
 
-		/* Targets which reenter must return
-		 * abs. verdicts
-		 */
 		acpar.target   = t->u.kernel.target;
 		acpar.targinfo = t->data;
 		verdict = t->u.kernel.target->target(skb, &acpar);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 5e44b35..a2e4b01 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -296,12 +296,13 @@ ipt_do_table(struct sk_buff *skb,
 	const char *indev, *outdev;
 	const void *table_base;
 	struct ipt_entry *e, **jumpstack;
-	unsigned int *stackptr, origptr, cpu;
+	unsigned int stackidx, cpu;
 	const struct xt_table_info *private;
 	struct xt_action_param acpar;
 	unsigned int addend;
 
 	/* Initialization */
+	stackidx = 0;
 	ip = ip_hdr(skb);
 	indev = state->in ? state->in->name : nulldevname;
 	outdev = state->out ? state->out->name : nulldevname;
@@ -331,13 +332,20 @@ ipt_do_table(struct sk_buff *skb,
 	smp_read_barrier_depends();
 	table_base = private->entries;
 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
-	stackptr   = per_cpu_ptr(private->stackptr, cpu);
-	origptr    = *stackptr;
+
+	/* Switch to alternate jumpstack if we're being invoked via TEE.
+	 * TEE issues XT_CONTINUE verdict on original skb so we must not
+	 * clobber the jumpstack.
+	 *
+	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
+	 * but it is no problem since absolute verdict is issued by these.
+	 */
+	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	pr_debug("Entering %s(hook %u); sp at %u (UF %p)\n",
-		 table->name, hook, origptr,
+	pr_debug("Entering %s(hook %u), UF %p\n",
+		 table->name, hook,
 		 get_entry(table_base, private->underflow[hook]));
 
 	do {
@@ -383,28 +391,24 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned int)(-v) - 1;
 					break;
 				}
-				if (*stackptr <= origptr) {
+				if (stackidx == 0) {
 					e = get_entry(table_base,
 					    private->underflow[hook]);
 					pr_debug("Underflow (this is normal) "
 						 "to %p\n", e);
 				} else {
-					e = jumpstack[--*stackptr];
+					e = jumpstack[--stackidx];
 					pr_debug("Pulled %p out from pos %u\n",
-						 e, *stackptr);
+						 e, stackidx);
 					e = ipt_next_entry(e);
 				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				if (*stackptr >= private->stacksize) {
-					verdict = NF_DROP;
-					break;
-				}
-				jumpstack[(*stackptr)++] = e;
+				jumpstack[stackidx++] = e;
 				pr_debug("Pushed %p into pos %u\n",
-					 e, *stackptr - 1);
+					 e, stackidx - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -423,9 +427,8 @@ ipt_do_table(struct sk_buff *skb,
 			/* Verdict */
 			break;
 	} while (!acpar.hotdrop);
-	pr_debug("Exiting %s; resetting sp from %u to %u\n",
-		 __func__, *stackptr, origptr);
-	*stackptr = origptr;
+	pr_debug("Exiting %s; sp at %u\n", __func__, stackidx);
+
  	xt_write_recseq_end(addend);
  	local_bh_enable();
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index baf0321..531281f 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -324,12 +324,13 @@ ip6t_do_table(struct sk_buff *skb,
 	const char *indev, *outdev;
 	const void *table_base;
 	struct ip6t_entry *e, **jumpstack;
-	unsigned int *stackptr, origptr, cpu;
+	unsigned int stackidx, cpu;
 	const struct xt_table_info *private;
 	struct xt_action_param acpar;
 	unsigned int addend;
 
 	/* Initialization */
+	stackidx = 0;
 	indev = state->in ? state->in->name : nulldevname;
 	outdev = state->out ? state->out->name : nulldevname;
 	/* We handle fragments by dealing with the first fragment as
@@ -357,8 +358,15 @@ ip6t_do_table(struct sk_buff *skb,
 	cpu        = smp_processor_id();
 	table_base = private->entries;
 	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
-	stackptr   = per_cpu_ptr(private->stackptr, cpu);
-	origptr    = *stackptr;
+
+	/* Switch to alternate jumpstack if we're being invoked via TEE.
+	 * TEE issues XT_CONTINUE verdict on original skb so we must not
+	 * clobber the jumpstack.
+	 *
+	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
+	 * but it is no problem since absolute verdict is issued by these.
+	 */
+	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
@@ -406,20 +414,16 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned int)(-v) - 1;
 					break;
 				}
-				if (*stackptr <= origptr)
+				if (stackidx == 0)
 					e = get_entry(table_base,
 					    private->underflow[hook]);
 				else
-					e = ip6t_next_entry(jumpstack[--*stackptr]);
+					e = ip6t_next_entry(jumpstack[--stackidx]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				if (*stackptr >= private->stacksize) {
-					verdict = NF_DROP;
-					break;
-				}
-				jumpstack[(*stackptr)++] = e;
+				jumpstack[stackidx++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -437,8 +441,6 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!acpar.hotdrop);
 
-	*stackptr = origptr;
-
  	xt_write_recseq_end(addend);
  	local_bh_enable();
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 4db7d60..154447e 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -67,9 +67,6 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
-/* Allow this many total (re)entries. */
-static const unsigned int xt_jumpstack_multiplier = 2;
-
 /* Registration hooks for targets. */
 int xt_register_target(struct xt_target *target)
 {
@@ -688,8 +685,6 @@ void xt_free_table_info(struct xt_table_info *info)
 		kvfree(info->jumpstack);
 	}
 
-	free_percpu(info->stackptr);
-
 	kvfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -737,10 +732,6 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	unsigned int size;
 	int cpu;
 
-	i->stackptr = alloc_percpu(unsigned int);
-	if (i->stackptr == NULL)
-		return -ENOMEM;
-
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
 		i->jumpstack = vzalloc(size);
@@ -753,8 +744,17 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	if (i->stacksize == 0)
 		return 0;
 
-	i->stacksize *= xt_jumpstack_multiplier;
-	size = sizeof(void *) * i->stacksize;
+	/* Jumpstack needs to be able to record two full callchains, one
+	 * from the first rule set traversal, plus one table reentrancy
+	 * via -j TEE without clobbering the callchain that brought us to
+	 * TEE target.
+	 *
+	 * This is done by allocating two jumpstacks per cpu, on reentry
+	 * the upper half of the stack is used.
+	 *
+	 * see the jumpstack setup in ipt_do_table() for more details.
+	 */
+	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
 		if (size > PAGE_SIZE)
 			i->jumpstack[cpu] = vmalloc_node(size,
-- 
2.0.5



* [PATCH -next 4/6] netfilter: add and use jump label for xt_tee
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
                   ` (2 preceding siblings ...)
  2015-07-14 15:51 ` [PATCH -next v2 3/6] netfilter: xtables: don't save/restore jumpstack offset Florian Westphal
@ 2015-07-14 15:51 ` Florian Westphal
  2015-07-14 15:51 ` [PATCH -next 5/6] netfilter: xtables: remove __pure annotation Florian Westphal
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

Don't bother testing whether we need to switch to the alternate stack
unless the TEE target is in use.
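
Static keys patch the branch at runtime: while no TEE rules are loaded,
static_key_false() compiles down to a straight-line no-op, so the
per-packet fast path pays nothing.  The life cycle, as the hunks below
show:

    /* tee_tg_check(): a TEE rule was added, enable the branch */
    static_key_slow_inc(&xt_tee_enabled);

    /* packet path: only taken while at least one TEE rule exists */
    if (static_key_false(&xt_tee_enabled))
        jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);

    /* tee_tg_destroy(): a TEE rule was removed, disable again */
    static_key_slow_dec(&xt_tee_enabled);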

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v1 of this patch.

 include/linux/netfilter/x_tables.h | 7 +++++++
 net/ipv4/netfilter/ip_tables.c     | 3 ++-
 net/ipv6/netfilter/ip6_tables.c    | 3 ++-
 net/netfilter/x_tables.c           | 3 +++
 net/netfilter/xt_TEE.c             | 2 ++
 5 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 1492845..b006b71 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -3,6 +3,7 @@
 
 
 #include <linux/netdevice.h>
+#include <linux/static_key.h>
 #include <uapi/linux/netfilter/x_tables.h>
 
 /**
@@ -280,6 +281,12 @@ void xt_free_table_info(struct xt_table_info *info);
  */
 DECLARE_PER_CPU(seqcount_t, xt_recseq);
 
+/* xt_tee_enabled - true if x_tables needs to handle reentrancy
+ *
+ * Enabled if current ip(6)tables ruleset has at least one -j TEE rule.
+ */
+extern struct static_key xt_tee_enabled;
+
 /**
  * xt_write_recseq_begin - start of a write section
  *
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index a2e4b01..ff585bd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -340,7 +340,8 @@ ipt_do_table(struct sk_buff *skb,
 	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
 	 * but it is no problem since absolute verdict is issued by these.
 	 */
-	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
+	if (static_key_false(&xt_tee_enabled))
+		jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 531281f..ea6d105 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -366,7 +366,8 @@ ip6t_do_table(struct sk_buff *skb,
 	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
 	 * but it is no problem since absolute verdict is issued by these.
 	 */
-	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
+	if (static_key_false(&xt_tee_enabled))
+		jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 154447e..9b42b5e 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -727,6 +727,9 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock);
 DEFINE_PER_CPU(seqcount_t, xt_recseq);
 EXPORT_PER_CPU_SYMBOL_GPL(xt_recseq);
 
+struct static_key xt_tee_enabled __read_mostly;
+EXPORT_SYMBOL_GPL(xt_tee_enabled);
+
 static int xt_jumpstack_alloc(struct xt_table_info *i)
 {
 	unsigned int size;
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 8950e79..c5d6556 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -251,6 +251,7 @@ static int tee_tg_check(const struct xt_tgchk_param *par)
 	} else
 		info->priv = NULL;
 
+	static_key_slow_inc(&xt_tee_enabled);
 	return 0;
 }
 
@@ -262,6 +263,7 @@ static void tee_tg_destroy(const struct xt_tgdtor_param *par)
 		unregister_netdevice_notifier(&info->priv->notifier);
 		kfree(info->priv);
 	}
+	static_key_slow_dec(&xt_tee_enabled);
 }
 
 static struct xt_target tee_tg_reg[] __read_mostly = {
-- 
2.0.5



* [PATCH -next 5/6] netfilter: xtables: remove __pure annotation
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
                   ` (3 preceding siblings ...)
  2015-07-14 15:51 ` [PATCH -next 4/6] netfilter: add and use jump label for xt_tee Florian Westphal
@ 2015-07-14 15:51 ` Florian Westphal
  2015-07-14 15:51 ` [PATCH -next v2 6/6] netfilter: xtables: add upper limit on call chain depth Florian Westphal
  2015-07-15 17:17 ` [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Pablo Neira Ayuso
  6 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

sparse complains:
ip_tables.c:361:27: warning: incorrect type in assignment (different modifiers)
ip_tables.c:361:27:    expected struct ipt_entry *[assigned] e
ip_tables.c:361:27:    got struct ipt_entry [pure] *

This doesn't change the generated code.
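
(__pure expands to __attribute__((pure)), a promise that the function
has no side effects and that its result depends only on its arguments
and global memory, which lets the compiler merge repeated calls.  For
these one-line static inlines the body is visible anyway, which is why
dropping the annotation leaves the generated code unchanged.)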

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v1 of this patch.
 net/ipv4/netfilter/arp_tables.c | 2 +-
 net/ipv4/netfilter/ip_tables.c  | 2 +-
 net/ipv6/netfilter/ip6_tables.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 969fdbe..c416cb3 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -240,7 +240,7 @@ get_entry(const void *base, unsigned int offset)
 	return (struct arpt_entry *)(base + offset);
 }
 
-static inline __pure
+static inline
 struct arpt_entry *arpt_next_entry(const struct arpt_entry *entry)
 {
 	return (void *)entry + entry->next_offset;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index ff585bd..787f99e 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -276,7 +276,7 @@ static void trace_packet(const struct sk_buff *skb,
 }
 #endif
 
-static inline __pure
+static inline
 struct ipt_entry *ipt_next_entry(const struct ipt_entry *entry)
 {
 	return (void *)entry + entry->next_offset;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index ea6d105..4e21f80 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -305,7 +305,7 @@ static void trace_packet(const struct sk_buff *skb,
 }
 #endif
 
-static inline __pure struct ip6t_entry *
+static inline struct ip6t_entry *
 ip6t_next_entry(const struct ip6t_entry *entry)
 {
 	return (void *)entry + entry->next_offset;
-- 
2.0.5



* [PATCH -next v2 6/6] netfilter: xtables: add upper limit on call chain depth
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
                   ` (4 preceding siblings ...)
  2015-07-14 15:51 ` [PATCH -next 5/6] netfilter: xtables: remove __pure annotation Florian Westphal
@ 2015-07-14 15:51 ` Florian Westphal
  2015-07-15 17:17 ` [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Pablo Neira Ayuso
  6 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-14 15:51 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

1024 is a very aggressive limit -- it will most likely not break any
real-world ruleset, but it might break certain iptables test scripts out
there.

If we were to use this limit, it would become feasible to allocate the
jump stack directly via a percpu allocation (16 kbytes needed per cpu
in that case).
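
(Worst case with the limit in place: XT_MAX_STACKSIZE slots *
sizeof(void *) * 2 stacks = 1024 * 8 * 2 = 16 kbytes per cpu on
64-bit, small enough for a percpu allocation.)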

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 no changes since v1.

 net/netfilter/x_tables.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 9b42b5e..2be4f8e 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -39,6 +39,7 @@ MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
 MODULE_DESCRIPTION("{ip,ip6,arp,eb}_tables backend module");
 
 #define SMP_ALIGN(x) (((x) + SMP_CACHE_BYTES-1) & ~(SMP_CACHE_BYTES-1))
+#define XT_MAX_STACKSIZE 1024
 
 struct compat_delta {
 	unsigned int offset; /* offset in kernel */
@@ -735,6 +736,9 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	unsigned int size;
 	int cpu;
 
+	if (i->stacksize > XT_MAX_STACKSIZE)
+		return -ELOOP;
+
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
 		i->jumpstack = vzalloc(size);
-- 
2.0.5



* Re: [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling
  2015-07-14 15:51 [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Florian Westphal
                   ` (5 preceding siblings ...)
  2015-07-14 15:51 ` [PATCH -next v2 6/6] netfilter: xtables: add upper limit on call chain depth Florian Westphal
@ 2015-07-15 17:17 ` Pablo Neira Ayuso
  2015-07-15 18:52   ` Florian Westphal
  6 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2015-07-15 17:17 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, eric.dumazet

On Tue, Jul 14, 2015 at 05:51:05PM +0200, Florian Westphal wrote:
> During NFWS 2015 Eric Dumazet suggested various ideas to make
> the xtables table traverser function setup less expensive.
> 
> In particular, the *_do_table functions keep track of the current
> stack pointer.
> 
> It appears that we can simplify this to always start from 0
> (therefore allowing us to avoid the save/restore) provided we make sure
> that we use an alternate jump stack when we enter the traverser recursively
> via TEE target.
> 
> This implements some of Eric's ideas.

Series from 1 to 5 applied.

> NOTE: The last patch may break valid iptables rulesets.
> It's the classic question of whether we're willing to reject bizarre
> rulesets or not.  If this patch is acceptable, we can avoid one more
> dereference by using percpu allocation for the jumpstack as follow-up
> work.

If we take this patch into the tree, I'd wait for quite some time to
make sure nobody barfs at us with problems, so that we can still revert
it.

Otherwise, the (unlikely) scenario would require several reverts in a
row, starting from this patch through the further suggested
enhancements, right?


* Re: [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling
  2015-07-15 17:17 ` [PATCH -next v2 0/6] netfilter: xtables: improve jumpstack handling Pablo Neira Ayuso
@ 2015-07-15 18:52   ` Florian Westphal
  0 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2015-07-15 18:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, eric.dumazet

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> If we take this patch into the tree, I'd wait for quite some time to
> make sure nobody barfs at us with problems, so that we can still
> revert it.
> 
> Otherwise, the (unlikely) scenario would require several reverts in a
> row, starting from this patch through the further suggested
> enhancements, right?

Yes, that's exactly the problem.
So, for example, if we switched to allocating the jumpstack in the
percpu area (rather than just a pointer, as we do now), we would need
to revert that too, since percpu allocations have size limitations
(64k max, IIRC).

HOWEVER, the real problem is -- if you apply this, how long would
you propose to wait before we could leverage the limitation in further
patches?

I'd say 3 years minimum, and I'd rather see nft ready by then :)

So I'd say drop patch 6.
