netfilter-devel.vger.kernel.org archive mirror
* [PATCH -next 0/4] netfilter: xtables: improve jumpstack handling
@ 2015-07-08 21:15 Florian Westphal
  2015-07-08 21:15 ` [PATCH -next 1/4] xtables: compute exact size needed for jumpstack Florian Westphal
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Florian Westphal @ 2015-07-08 21:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet

During NFWS 2015, Eric Dumazet suggested various ideas to make
the setup work in the xtables table traversal functions less expensive.

In particular, the *_do_table functions keep track of the current
stack pointer.

It appears that we can simplify this to always start from 0
(thereby allowing us to avoid the save/restore), provided we make sure
that we use an alternate jump stack when we enter the traverser recursively
via the TEE target.

This series implements a few of Eric's suggestions.

NOTE: The last patch is only RFC material; see the patch description.

It's the classic question of whether we're willing to reject bizarre
rulesets or not.  If that patch is acceptable, we can avoid one more
dereference by using a percpu allocation for the jumpstack.

Florian Westphal (4):
      xtables: compute exact size needed for jumpstack
      netfilter: move tee_active to core
      netfilter: xtables: don't save/restore jumpstack offset
      netfilter: xtables: add upper limit on call chain depth

 include/linux/netfilter.h          |   11 ++++++
 include/linux/netfilter/x_tables.h |    1 
 net/ipv4/netfilter/arp_tables.c    |   31 +++++++++--------
 net/ipv4/netfilter/ip_tables.c     |   65 +++++++++++++++++++++----------------
 net/ipv6/netfilter/ip6_tables.c    |   49 ++++++++++++++++-----------
 net/netfilter/core.c               |    3 +
 net/netfilter/x_tables.c           |   30 ++++++++++-------
 net/netfilter/xt_TEE.c             |   13 +++----
 8 files changed, 122 insertions(+), 81 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH -next 1/4] xtables: compute exact size needed for jumpstack
  2015-07-08 21:15 [PATCH -next 0/4] netfilter: xtables: improve jumpstack handling Florian Westphal
@ 2015-07-08 21:15 ` Florian Westphal
  2015-07-08 21:15 ` [PATCH -next 2/4] netfilter: move tee_active to core Florian Westphal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2015-07-08 21:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

The {arp,ip,ip6tables} jump stack is currently sized based
on the number of user chains.

However, it's rather unlikely that every user-defined chain jumps to the
next, so let's use the existing loop detection logic to also track the
chain depths.

The stacksize is then set to the largest chain depth seen.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/ipv4/netfilter/arp_tables.c | 19 ++++++++++++-------
 net/ipv4/netfilter/ip_tables.c  | 28 ++++++++++++++++++----------
 net/ipv6/netfilter/ip6_tables.c | 23 +++++++++++++++--------
 net/netfilter/x_tables.c        |  4 ++++
 4 files changed, 49 insertions(+), 25 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 92305a1..ae6d0a1 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -372,10 +372,13 @@ static inline bool unconditional(const struct arpt_arp *arp)
 
 /* Figures out from what hook each rule can be called: returns 0 if
  * there are loops.  Puts hook bitmask in comefrom.
+ *
+ * Keeps track of largest call depth seen and stores it in newinfo->stacksize.
  */
-static int mark_source_chains(const struct xt_table_info *newinfo,
+static int mark_source_chains(struct xt_table_info *newinfo,
 			      unsigned int valid_hooks, void *entry0)
 {
+	unsigned int calldepth, max_calldepth = 0;
 	unsigned int hook;
 
 	/* No recursion; use packet counter to save back ptrs (reset
@@ -391,6 +394,7 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 
 		/* Set initial back pointer. */
 		e->counters.pcnt = pos;
+		calldepth = 0;
 
 		for (;;) {
 			const struct xt_standard_target *t
@@ -445,6 +449,8 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 					(entry0 + pos + size);
 				e->counters.pcnt = pos;
 				pos += size;
+				if (calldepth > 0)
+					--calldepth;
 			} else {
 				int newpos = t->verdict;
 
@@ -459,6 +465,10 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 						return 0;
 					}
 
+					if (entry0 + newpos != arpt_next_entry(e) &&
+					    ++calldepth > max_calldepth)
+						max_calldepth = calldepth;
+
 					/* This a jump; chase it. */
 					duprintf("Jump rule %u -> %u\n",
 						 pos, newpos);
@@ -475,6 +485,7 @@ static int mark_source_chains(const struct xt_table_info *newinfo,
 		next:
 		duprintf("Finished chain %u\n", hook);
 	}
+	newinfo->stacksize = max_calldepth;
 	return 1;
 }
 
@@ -664,9 +675,6 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
-		if (strcmp(arpt_get_target(iter)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1439,9 +1447,6 @@ static int translate_compat_table(const char *name,
 			break;
 		}
 		++i;
-		if (strcmp(arpt_get_target(iter1)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	if (ret) {
 		/*
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 6c72fbb..5e44b35 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -439,11 +439,15 @@ ipt_do_table(struct sk_buff *skb,
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
-   there are loops.  Puts hook bitmask in comefrom. */
+ * there are loops.  Puts hook bitmask in comefrom.
+ *
+ * Keeps track of largest call depth seen and stores it in newinfo->stacksize.
+ */
 static int
-mark_source_chains(const struct xt_table_info *newinfo,
+mark_source_chains(struct xt_table_info *newinfo,
 		   unsigned int valid_hooks, void *entry0)
 {
+	unsigned int calldepth, max_calldepth = 0;
 	unsigned int hook;
 
 	/* No recursion; use packet counter to save back ptrs (reset
@@ -457,6 +461,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 
 		/* Set initial back pointer. */
 		e->counters.pcnt = pos;
+		calldepth = 0;
 
 		for (;;) {
 			const struct xt_standard_target *t
@@ -518,6 +523,9 @@ mark_source_chains(const struct xt_table_info *newinfo,
 					(entry0 + pos + size);
 				e->counters.pcnt = pos;
 				pos += size;
+				WARN_ON_ONCE(calldepth == 0);
+				if (calldepth > 0)
+					--calldepth;
 			} else {
 				int newpos = t->verdict;
 
@@ -531,9 +539,14 @@ mark_source_chains(const struct xt_table_info *newinfo,
 								newpos);
 						return 0;
 					}
+					if (entry0 + newpos != ipt_next_entry(e) &&
+					    !(e->ip.flags & IPT_F_GOTO) &&
+					    ++calldepth > max_calldepth)
+						max_calldepth = calldepth;
+
 					/* This a jump; chase it. */
-					duprintf("Jump rule %u -> %u\n",
-						 pos, newpos);
+					duprintf("Jump rule %u -> %u, calldepth %d\n",
+						 pos, newpos, calldepth);
 				} else {
 					/* ... this is a fallthru */
 					newpos = pos + e->next_offset;
@@ -547,6 +560,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 		next:
 		duprintf("Finished chain %u\n", hook);
 	}
+	newinfo->stacksize = max_calldepth;
 	return 1;
 }
 
@@ -826,9 +840,6 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
-		if (strcmp(ipt_get_target(iter)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -1744,9 +1755,6 @@ translate_compat_table(struct net *net,
 		if (ret != 0)
 			break;
 		++i;
-		if (strcmp(ipt_get_target(iter1)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	if (ret) {
 		/*
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 3c35ced..baf0321 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -452,11 +452,15 @@ ip6t_do_table(struct sk_buff *skb,
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
-   there are loops.  Puts hook bitmask in comefrom. */
+ * there are loops.  Puts hook bitmask in comefrom.
+ *
+ * Keeps track of largest call depth seen and stores it in newinfo->stacksize.
+ */
 static int
-mark_source_chains(const struct xt_table_info *newinfo,
+mark_source_chains(struct xt_table_info *newinfo,
 		   unsigned int valid_hooks, void *entry0)
 {
+	unsigned int calldepth, max_calldepth = 0;
 	unsigned int hook;
 
 	/* No recursion; use packet counter to save back ptrs (reset
@@ -470,6 +474,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 
 		/* Set initial back pointer. */
 		e->counters.pcnt = pos;
+		calldepth = 0;
 
 		for (;;) {
 			const struct xt_standard_target *t
@@ -531,6 +536,8 @@ mark_source_chains(const struct xt_table_info *newinfo,
 					(entry0 + pos + size);
 				e->counters.pcnt = pos;
 				pos += size;
+				if (calldepth > 0)
+					--calldepth;
 			} else {
 				int newpos = t->verdict;
 
@@ -544,6 +551,11 @@ mark_source_chains(const struct xt_table_info *newinfo,
 								newpos);
 						return 0;
 					}
+					if (entry0 + newpos != ip6t_next_entry(e) &&
+					    !(e->ipv6.flags & IP6T_F_GOTO) &&
+					    ++calldepth > max_calldepth)
+						max_calldepth = calldepth;
+
 					/* This a jump; chase it. */
 					duprintf("Jump rule %u -> %u\n",
 						 pos, newpos);
@@ -560,6 +572,7 @@ mark_source_chains(const struct xt_table_info *newinfo,
 		next:
 		duprintf("Finished chain %u\n", hook);
 	}
+	newinfo->stacksize = max_calldepth;
 	return 1;
 }
 
@@ -839,9 +852,6 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
-		if (strcmp(ip6t_get_target(iter)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -1754,9 +1764,6 @@ translate_compat_table(struct net *net,
 		if (ret != 0)
 			break;
 		++i;
-		if (strcmp(ip6t_get_target(iter1)->u.user.name,
-		    XT_ERROR_TARGET) == 0)
-			++newinfo->stacksize;
 	}
 	if (ret) {
 		/*
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index d324fe7..4db7d60 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -749,6 +749,10 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	if (i->jumpstack == NULL)
 		return -ENOMEM;
 
+	/* ruleset without jumps -- no stack needed */
+	if (i->stacksize == 0)
+		return 0;
+
 	i->stacksize *= xt_jumpstack_multiplier;
 	size = sizeof(void *) * i->stacksize;
 	for_each_possible_cpu(cpu) {
-- 
2.1.0



* [PATCH -next 2/4] netfilter: move tee_active to core
  2015-07-08 21:15 [PATCH -next 0/4] netfilter: xtables: improve jumpstack handling Florian Westphal
  2015-07-08 21:15 ` [PATCH -next 1/4] xtables: compute exact size needed for jumpstack Florian Westphal
@ 2015-07-08 21:15 ` Florian Westphal
  2015-07-08 21:15 ` [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset Florian Westphal
  2015-07-08 21:15 ` [PATCH RFC -next 4/4] netfilter: xtables: add upper limit on call chain depth Florian Westphal
  3 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2015-07-08 21:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

This prepares for a TEE like expression in nftables.
We want to ensure only one duplicate is sent, so both will
use the same percpu variable to detect duplication.

The other use case is detection of recursive calls into xtables, but since
we don't want a dependency from nft on xtables, it's put into core.c
instead of the x_tables core.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter.h | 11 +++++++++++
 net/netfilter/core.c      |  3 +++
 net/netfilter/xt_TEE.c    | 13 ++++++-------
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index 00050df..4881d48 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -385,4 +385,15 @@ extern struct nfq_ct_hook __rcu *nfq_ct_hook;
 static inline void nf_ct_attach(struct sk_buff *new, struct sk_buff *skb) {}
 #endif
 
+/**
+ * nf_skb_duplicated - TEE target has sent a packet
+ *
+ * When a xtables target sends a packet, the OUTPUT and POSTROUTING
+ * hooks are traversed again, i.e. nft and xtables are invoked recursively.
+ *
+ * This is used by xtables TEE target to prevent the duplicated skb from
+ * being duplicated again.
+ */
+DECLARE_PER_CPU(bool, nf_skb_duplicated);
+
 #endif /*__LINUX_NETFILTER_H*/
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index a0e5497..254032d 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -34,6 +34,9 @@ EXPORT_SYMBOL(nf_afinfo);
 const struct nf_ipv6_ops __rcu *nf_ipv6_ops __read_mostly;
 EXPORT_SYMBOL_GPL(nf_ipv6_ops);
 
+DEFINE_PER_CPU(bool, nf_skb_duplicated);
+EXPORT_SYMBOL_GPL(nf_skb_duplicated);
+
 int nf_register_afinfo(const struct nf_afinfo *afinfo)
 {
 	mutex_lock(&afinfo_mutex);
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index a747eb4..8950e79 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -37,7 +37,6 @@ struct xt_tee_priv {
 };
 
 static const union nf_inet_addr tee_zero_address;
-static DEFINE_PER_CPU(bool, tee_active);
 
 static struct net *pick_net(struct sk_buff *skb)
 {
@@ -88,7 +87,7 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	const struct xt_tee_tginfo *info = par->targinfo;
 	struct iphdr *iph;
 
-	if (__this_cpu_read(tee_active))
+	if (__this_cpu_read(nf_skb_duplicated))
 		return XT_CONTINUE;
 	/*
 	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
@@ -125,9 +124,9 @@ tee_tg4(struct sk_buff *skb, const struct xt_action_param *par)
 	ip_send_check(iph);
 
 	if (tee_tg_route4(skb, info)) {
-		__this_cpu_write(tee_active, true);
+		__this_cpu_write(nf_skb_duplicated, true);
 		ip_local_out(skb);
-		__this_cpu_write(tee_active, false);
+		__this_cpu_write(nf_skb_duplicated, false);
 	} else {
 		kfree_skb(skb);
 	}
@@ -170,7 +169,7 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct xt_tee_tginfo *info = par->targinfo;
 
-	if (__this_cpu_read(tee_active))
+	if (__this_cpu_read(nf_skb_duplicated))
 		return XT_CONTINUE;
 	skb = pskb_copy(skb, GFP_ATOMIC);
 	if (skb == NULL)
@@ -188,9 +187,9 @@ tee_tg6(struct sk_buff *skb, const struct xt_action_param *par)
 		--iph->hop_limit;
 	}
 	if (tee_tg_route6(skb, info)) {
-		__this_cpu_write(tee_active, true);
+		__this_cpu_write(nf_skb_duplicated, true);
 		ip6_local_out(skb);
-		__this_cpu_write(tee_active, false);
+		__this_cpu_write(nf_skb_duplicated, false);
 	} else {
 		kfree_skb(skb);
 	}
-- 
2.1.0



* [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-08 21:15 [PATCH -next 0/4] netfilter: xtables: improve jumpstack handling Florian Westphal
  2015-07-08 21:15 ` [PATCH -next 1/4] xtables: compute exact size needed for jumpstack Florian Westphal
  2015-07-08 21:15 ` [PATCH -next 2/4] netfilter: move tee_active to core Florian Westphal
@ 2015-07-08 21:15 ` Florian Westphal
  2015-07-08 21:47   ` Jan Engelhardt
  2015-07-09  7:54   ` Eric Dumazet
  2015-07-08 21:15 ` [PATCH RFC -next 4/4] netfilter: xtables: add upper limit on call chain depth Florian Westphal
  3 siblings, 2 replies; 10+ messages in thread
From: Florian Westphal @ 2015-07-08 21:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

In most cases there is no reentrancy into ip/ip6tables.

For skbs sent by REJECT or SYNPROXY targets, there is one level
of reentrancy, but it's not relevant as those targets issue an absolute
verdict, i.e. the jumpstack can be clobbered since it's not used
after the target issues an absolute verdict (ACCEPT, DROP, STOLEN, etc.).

So the only special case where it is relevant is the TEE target, which
returns XT_CONTINUE.

This patch changes arp, ip, and ip6_do_table to always use the
jump stack starting from 0.

When we detect we're operating on an skb sent via TEE (percpu
nf_skb_duplicated is 1) we switch to an alternate stack to leave
the original one alone.

Since there is no TEE support for arptables, it doesn't need to
test if tee is active.

The jump stack overflow tests are no longer needed as well --
since ->stacksize is the largest call depth we cannot exceed it.

A much better alternative to the external jumpstack would be to just
declare a jumps[32] stack on the local stack frame, but that would mean
we'd have to reject iptables rulesets that used to work before.

Another alternative would be to start rejecting rulesets with a larger
call depth, e.g. 1000 -- in this case it would be feasible to allocate the
entire stack in the percpu area which would avoid one dereference.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter/x_tables.h |  1 -
 net/ipv4/netfilter/arp_tables.c    | 12 ++++--------
 net/ipv4/netfilter/ip_tables.c     | 37 ++++++++++++++++++++-----------------
 net/ipv6/netfilter/ip6_tables.c    | 26 ++++++++++++++------------
 net/netfilter/x_tables.c           | 22 +++++++++++-----------
 5 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 286098a..1492845 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -222,7 +222,6 @@ struct xt_table_info {
 	 * @stacksize jumps (number of user chains) can possibly be made.
 	 */
 	unsigned int stacksize;
-	unsigned int __percpu *stackptr;
 	void ***jumpstack;
 
 	unsigned char entries[0] __aligned(8);
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index ae6d0a1..0507ea6 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -280,6 +280,10 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 	table_base = private->entries;
 	jumpstack  = (struct arpt_entry **)private->jumpstack[cpu];
 
+	/* No TEE support for arptables, so no need to switch to alternate
+	 * stack.  All targets that reenter must return absolte verdicts.
+	 */
+
 	e = get_entry(table_base, private->hook_entry[hook]);
 
 	acpar.in      = state->in;
@@ -325,11 +329,6 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 			}
 			if (table_base + v
 			    != arpt_next_entry(e)) {
-
-				if (stackidx >= private->stacksize) {
-					verdict = NF_DROP;
-					break;
-				}
 				jumpstack[stackidx++] = e;
 			}
 
@@ -337,9 +336,6 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 			continue;
 		}
 
-		/* Targets which reenter must return
-		 * abs. verdicts
-		 */
 		acpar.target   = t->u.kernel.target;
 		acpar.targinfo = t->data;
 		verdict = t->u.kernel.target->target(skb, &acpar);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 5e44b35..29d47d7 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -296,12 +296,13 @@ ipt_do_table(struct sk_buff *skb,
 	const char *indev, *outdev;
 	const void *table_base;
 	struct ipt_entry *e, **jumpstack;
-	unsigned int *stackptr, origptr, cpu;
+	unsigned int stackidx, cpu;
 	const struct xt_table_info *private;
 	struct xt_action_param acpar;
 	unsigned int addend;
 
 	/* Initialization */
+	stackidx = 0;
 	ip = ip_hdr(skb);
 	indev = state->in ? state->in->name : nulldevname;
 	outdev = state->out ? state->out->name : nulldevname;
@@ -331,13 +332,20 @@ ipt_do_table(struct sk_buff *skb,
 	smp_read_barrier_depends();
 	table_base = private->entries;
 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
-	stackptr   = per_cpu_ptr(private->stackptr, cpu);
-	origptr    = *stackptr;
+
+	/* Switch to alternate jumpstack if we're being invoked via TEE.
+	 * The problem is that TEE issues XT_CONTINUE verdict on original
+	 * skb so we must not clobber the jumpstack.
+	 *
+	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
+	 * but its no problem since absolute verdict is issued by these.
+	 */
+	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	pr_debug("Entering %s(hook %u); sp at %u (UF %p)\n",
-		 table->name, hook, origptr,
+	pr_debug("Entering %s(hook %u), UF %p\n",
+		 table->name, hook,
 		 get_entry(table_base, private->underflow[hook]));
 
 	do {
@@ -383,28 +391,24 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned int)(-v) - 1;
 					break;
 				}
-				if (*stackptr <= origptr) {
+				if (stackidx == 0) {
 					e = get_entry(table_base,
 					    private->underflow[hook]);
 					pr_debug("Underflow (this is normal) "
 						 "to %p\n", e);
 				} else {
-					e = jumpstack[--*stackptr];
+					e = jumpstack[--stackidx];
 					pr_debug("Pulled %p out from pos %u\n",
-						 e, *stackptr);
+						 e, stackidx);
 					e = ipt_next_entry(e);
 				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				if (*stackptr >= private->stacksize) {
-					verdict = NF_DROP;
-					break;
-				}
-				jumpstack[(*stackptr)++] = e;
+				jumpstack[stackidx++] = e;
 				pr_debug("Pushed %p into pos %u\n",
-					 e, *stackptr - 1);
+					 e, stackidx - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -423,9 +427,8 @@ ipt_do_table(struct sk_buff *skb,
 			/* Verdict */
 			break;
 	} while (!acpar.hotdrop);
-	pr_debug("Exiting %s; resetting sp from %u to %u\n",
-		 __func__, *stackptr, origptr);
-	*stackptr = origptr;
+	pr_debug("Exiting %s; sp at %u\n", __func__, stackidx);
+
  	xt_write_recseq_end(addend);
  	local_bh_enable();
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index baf0321..4491335 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -324,12 +324,13 @@ ip6t_do_table(struct sk_buff *skb,
 	const char *indev, *outdev;
 	const void *table_base;
 	struct ip6t_entry *e, **jumpstack;
-	unsigned int *stackptr, origptr, cpu;
+	unsigned int stackidx, cpu;
 	const struct xt_table_info *private;
 	struct xt_action_param acpar;
 	unsigned int addend;
 
 	/* Initialization */
+	stackidx = 0;
 	indev = state->in ? state->in->name : nulldevname;
 	outdev = state->out ? state->out->name : nulldevname;
 	/* We handle fragments by dealing with the first fragment as
@@ -357,8 +358,15 @@ ip6t_do_table(struct sk_buff *skb,
 	cpu        = smp_processor_id();
 	table_base = private->entries;
 	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
-	stackptr   = per_cpu_ptr(private->stackptr, cpu);
-	origptr    = *stackptr;
+
+	/* Switch to alternate jumpstack if we're being invoked via TEE.
+	 * The problem is that TEE issues XT_CONTINUE verdict on original
+	 * skb so we must not clobber the jumpstack.
+	 *
+	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
+	 * but its no problem since absolute verdict is issued by these.
+	 */
+	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
@@ -406,20 +414,16 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned int)(-v) - 1;
 					break;
 				}
-				if (*stackptr <= origptr)
+				if (stackidx == 0)
 					e = get_entry(table_base,
 					    private->underflow[hook]);
 				else
-					e = ip6t_next_entry(jumpstack[--*stackptr]);
+					e = ip6t_next_entry(jumpstack[--stackidx]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				if (*stackptr >= private->stacksize) {
-					verdict = NF_DROP;
-					break;
-				}
-				jumpstack[(*stackptr)++] = e;
+				jumpstack[stackidx++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -437,8 +441,6 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!acpar.hotdrop);
 
-	*stackptr = origptr;
-
  	xt_write_recseq_end(addend);
  	local_bh_enable();
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 4db7d60..154447e 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -67,9 +67,6 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
-/* Allow this many total (re)entries. */
-static const unsigned int xt_jumpstack_multiplier = 2;
-
 /* Registration hooks for targets. */
 int xt_register_target(struct xt_target *target)
 {
@@ -688,8 +685,6 @@ void xt_free_table_info(struct xt_table_info *info)
 		kvfree(info->jumpstack);
 	}
 
-	free_percpu(info->stackptr);
-
 	kvfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -737,10 +732,6 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	unsigned int size;
 	int cpu;
 
-	i->stackptr = alloc_percpu(unsigned int);
-	if (i->stackptr == NULL)
-		return -ENOMEM;
-
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
 		i->jumpstack = vzalloc(size);
@@ -753,8 +744,17 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	if (i->stacksize == 0)
 		return 0;
 
-	i->stacksize *= xt_jumpstack_multiplier;
-	size = sizeof(void *) * i->stacksize;
+	/* Jumpstack needs to be able to record two full callchains, one
+	 * from the first rule set traversal, plus one table reentrancy
+	 * via -j TEE without clobbering the callchain that brought us to
+	 * TEE target.
+	 *
+	 * This is done by allocating two jumpstacks per cpu, on reentry
+	 * the upper half of the stack is used.
+	 *
+	 * see the jumpstack setup in ipt_do_table() for more details.
+	 */
+	size = sizeof(void *) * i->stacksize * 2u;
 	for_each_possible_cpu(cpu) {
 		if (size > PAGE_SIZE)
 			i->jumpstack[cpu] = vmalloc_node(size,
-- 
2.1.0



* [PATCH RFC -next 4/4] netfilter: xtables: add upper limit on call chain depth
  2015-07-08 21:15 [PATCH -next 0/4] netfilter: xtables: improve jumpstack handling Florian Westphal
                   ` (2 preceding siblings ...)
  2015-07-08 21:15 ` [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset Florian Westphal
@ 2015-07-08 21:15 ` Florian Westphal
  3 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2015-07-08 21:15 UTC (permalink / raw)
  To: netfilter-devel; +Cc: eric.dumazet, Florian Westphal

1024 is a very aggressive limit -- it will most likely not break any
real-world ruleset, but it might break certain iptables test scripts out
there.

If we were to use this limit it becomes feasible to allocate jump stack
directly via a percpu allocation (16kbytes needed per cpu in that case).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/x_tables.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 154447e..e043d7d 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -39,6 +39,7 @@ MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
 MODULE_DESCRIPTION("{ip,ip6,arp,eb}_tables backend module");
 
 #define SMP_ALIGN(x) (((x) + SMP_CACHE_BYTES-1) & ~(SMP_CACHE_BYTES-1))
+#define XT_MAX_STACKSIZE 1024
 
 struct compat_delta {
 	unsigned int offset; /* offset in kernel */
@@ -732,6 +733,9 @@ static int xt_jumpstack_alloc(struct xt_table_info *i)
 	unsigned int size;
 	int cpu;
 
+	if (i->stacksize > XT_MAX_STACKSIZE)
+		return -ELOOP;
+
 	size = sizeof(void **) * nr_cpu_ids;
 	if (size > PAGE_SIZE)
 		i->jumpstack = vzalloc(size);
-- 
2.1.0



* Re: [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-08 21:15 ` [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset Florian Westphal
@ 2015-07-08 21:47   ` Jan Engelhardt
  2015-07-09  0:29     ` Florian Westphal
  2015-07-09  7:54   ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Engelhardt @ 2015-07-08 21:47 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, eric.dumazet


On Wednesday 2015-07-08 23:15, Florian Westphal wrote:

>The jump stack overflow tests are no longer needed as well -- since 
>->stacksize is the largest call depth we cannot exceed it.

The tests were once added for the rare case that a cloned packet hits 
another TEE. Can we be sure they are no longer needed?


>+	/* No TEE support for arptables, so no need to switch to alternate
>+	 * stack.  All targets that reenter must return absolte verdicts.

absolute

>+	/* Switch to alternate jumpstack if we're being invoked via TEE.
>+	 * The problem is that TEE issues XT_CONTINUE verdict on original
>+	 * skb so we must not clobber the jumpstack.

Well that is not really a problem but a feature :)

>+	/* Switch to alternate jumpstack if we're being invoked via TEE.
>+	 * The problem is that TEE issues XT_CONTINUE verdict on original
>+	 * skb so we must not clobber the jumpstack.
>+	 *
>+	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
>+	 * but its no problem since absolute verdict is issued by these.

"but it is no problem"


* Re: [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-08 21:47   ` Jan Engelhardt
@ 2015-07-09  0:29     ` Florian Westphal
  2015-07-09  8:16       ` Jan Engelhardt
  0 siblings, 1 reply; 10+ messages in thread
From: Florian Westphal @ 2015-07-09  0:29 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Florian Westphal, netfilter-devel, eric.dumazet

Jan Engelhardt <jengelh@inai.de> wrote:
> On Wednesday 2015-07-08 23:15, Florian Westphal wrote:
> 
> >The jump stack overflow tests are no longer needed as well -- since 
> >->stacksize is the largest call depth we cannot exceed it.
> 
> The tests were once added for the rare case that a cloned packet hits 
> another TEE. Can we be sure they are no longer needed?

Hmm, not sure I understand.

If a TEE'd skb hits another TEE target there is no reentry since the
tee_active percpu indicator is true.

So where can we enter ip(6)tables *twice* via TEE?
Sure, a TEE'd packet can e.g. hit REJECT which then causes another
reentry into ip(6)tables. But it should be ok since we 'only' clobber
the "alternate" jumpstack and a DROP will be issued by REJECT.

Could you please outline a problematic scenario?  Thanks!

> >+	/* No TEE support for arptables, so no need to switch to alternate
> >+	 * stack.  All targets that reenter must return absolte verdicts.
> 
> absolute

Thanks, will fix

> >+	/* Switch to alternate jumpstack if we're being invoked via TEE.
> >+	 * The problem is that TEE issues XT_CONTINUE verdict on original
> >+	 * skb so we must not clobber the jumpstack.
> 
> Well that is not really a problem but a feature :)

Sorry, I did not mean to imply TEE was misbehaving.  I'll shorten this
to: "TEE will issue XT_CONTINUE verdict" ...

Thanks for reviewing.


* Re: [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-08 21:15 ` [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset Florian Westphal
  2015-07-08 21:47   ` Jan Engelhardt
@ 2015-07-09  7:54   ` Eric Dumazet
  2015-07-09  9:14     ` Florian Westphal
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-07-09  7:54 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Wed, 2015-07-08 at 23:15 +0200, Florian Westphal wrote:

> +
> +	/* Switch to alternate jumpstack if we're being invoked via TEE.
> +	 * The problem is that TEE issues XT_CONTINUE verdict on original
> +	 * skb so we must not clobber the jumpstack.
> +	 *
> +	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
> +	 * but its no problem since absolute verdict is issued by these.
> +	 */
> +	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);

This could eventually be guarded by

#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TEE)

Or even better, a jump label that would be enabled when TEE module is
loaded.





* Re: [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-09  0:29     ` Florian Westphal
@ 2015-07-09  8:16       ` Jan Engelhardt
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Engelhardt @ 2015-07-09  8:16 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, eric.dumazet

On Thursday 2015-07-09 02:29, Florian Westphal wrote:

>Jan Engelhardt <jengelh@inai.de> wrote:
>> On Wednesday 2015-07-08 23:15, Florian Westphal wrote:
>> 
>> >The jump stack overflow tests are no longer needed as well -- since 
>> >->stacksize is the largest call depth we cannot exceed it.
>> 
>> The tests were once added for the rare case that a cloned packet hits 
>> another TEE. Can we be sure they are no longer needed?
>
>If a TEE'd skb hits another TEE target there is no reentry since the
>tee_active percpu indicator is true.

Ah. I don't remember having written the "tee_active" code, but 
apparently I did some years ago.


* Re: [PATCH -next 3/4] netfilter: xtables: don't save/restore jumpstack offset
  2015-07-09  7:54   ` Eric Dumazet
@ 2015-07-09  9:14     ` Florian Westphal
  0 siblings, 0 replies; 10+ messages in thread
From: Florian Westphal @ 2015-07-09  9:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netfilter-devel

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > +	/* Switch to alternate jumpstack if we're being invoked via TEE.
> > +	 * The problem is that TEE issues XT_CONTINUE verdict on original
> > +	 * skb so we must not clobber the jumpstack.
> > +	 *
> > +	 * For recursion via REJECT or SYNPROXY the stack will be clobbered
> > +	 * but its no problem since absolute verdict is issued by these.
> > +	 */
> > +	jumpstack += private->stacksize * __this_cpu_read(nf_skb_duplicated);
> 
> This could eventually be garded by
> 
> #if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TEE)
> 
> Or even better, a jump label that would be enabled when TEE module is
> loaded.

I can add a static key, sure.

Thanks for the suggestion Eric.

