* nf-next: TEE 20100215
@ 2010-04-15 23:25 Jan Engelhardt
  2010-04-15 23:25 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel


Hi,


xt_TEE now has routing by oif. Please review. Does it look right to you?


The following changes since commit 9c6eb28aca52d562f3ffbaebaa56385df9972a43:
  Jan Engelhardt (1):
        netfilter: ipv6: add IPSKB_REROUTED exclusion to NF_HOOK/POSTROUTING invocation

are available in the git repository at:

  git://dev.medozas.de/linux master

Jan Engelhardt (4):
      netfilter: xtables: inclusion of xt_TEE
      netfilter: xtables2: make ip_tables reentrant
      netfilter: xt_TEE: have cloned packet travel through Xtables too
      netfilter: xtables: remove old comments about reentrancy

 include/linux/netfilter/Kbuild     |    1 +
 include/linux/netfilter/x_tables.h |    7 +
 include/linux/netfilter/xt_TEE.h   |    9 ++
 net/ipv4/netfilter/arp_tables.c    |    6 +-
 net/ipv4/netfilter/ip_tables.c     |   67 +++++-----
 net/ipv4/netfilter/ipt_REJECT.c    |    3 -
 net/ipv6/netfilter/ip6_tables.c    |   58 +++-----
 net/ipv6/netfilter/ip6t_REJECT.c   |    3 -
 net/netfilter/Kconfig              |    7 +
 net/netfilter/Makefile             |    1 +
 net/netfilter/x_tables.c           |   77 +++++++++++
 net/netfilter/xt_TEE.c             |  252 ++++++++++++++++++++++++++++++++++++
 12 files changed, 415 insertions(+), 76 deletions(-)
 create mode 100644 include/linux/netfilter/xt_TEE.h
 create mode 100644 net/netfilter/xt_TEE.c


* [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
@ 2010-04-15 23:25 ` Jan Engelhardt
  2010-04-19 12:20   ` Patrick McHardy
  2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

xt_TEE can be used to clone and reroute a packet. This can, for
example, be used to copy traffic at a router to another dedicated
machine for logging purposes.

References: http://www.gossamer-threads.com/lists/iptables/devel/68781
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/Kbuild   |    1 +
 include/linux/netfilter/xt_TEE.h |    9 ++
 net/ipv4/ip_output.c             |    1 +
 net/ipv6/ip6_output.c            |    1 +
 net/netfilter/Kconfig            |    7 +
 net/netfilter/Makefile           |    1 +
 net/netfilter/xt_TEE.c           |  256 ++++++++++++++++++++++++++++++++++++++
 7 files changed, 276 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/netfilter/xt_TEE.h
 create mode 100644 net/netfilter/xt_TEE.c

diff --git a/include/linux/netfilter/Kbuild b/include/linux/netfilter/Kbuild
index a5a63e4..48767cd 100644
--- a/include/linux/netfilter/Kbuild
+++ b/include/linux/netfilter/Kbuild
@@ -16,6 +16,7 @@ header-y += xt_RATEEST.h
 header-y += xt_SECMARK.h
 header-y += xt_TCPMSS.h
 header-y += xt_TCPOPTSTRIP.h
+header-y += xt_TEE.h
 header-y += xt_TPROXY.h
 header-y += xt_comment.h
 header-y += xt_connbytes.h
diff --git a/include/linux/netfilter/xt_TEE.h b/include/linux/netfilter/xt_TEE.h
new file mode 100644
index 0000000..55d4a50
--- /dev/null
+++ b/include/linux/netfilter/xt_TEE.h
@@ -0,0 +1,9 @@
+#ifndef _XT_TEE_TARGET_H
+#define _XT_TEE_TARGET_H
+
+struct xt_tee_tginfo {
+	union nf_inet_addr gw;
+	char oif[16];
+};
+
+#endif /* _XT_TEE_TARGET_H */
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f09135e..0abfdde 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,6 +309,7 @@ int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
+EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c10a38a..d09be7f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,6 +176,7 @@ int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
+EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 8055786..673a6c8 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -502,6 +502,13 @@ config NETFILTER_XT_TARGET_RATEEST
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config NETFILTER_XT_TARGET_TEE
+	tristate '"TEE" - packet cloning to alternate destination'
+	depends on NETFILTER_ADVANCED
+	---help---
+	This option adds a "TEE" target with which a packet can be cloned and
+	this clone be rerouted to another nexthop.
+
 config NETFILTER_XT_TARGET_TPROXY
 	tristate '"TPROXY" target support (EXPERIMENTAL)'
 	depends on EXPERIMENTAL
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index cd31afe..14e3a8f 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -59,6 +59,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_SECMARK) += xt_SECMARK.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TPROXY) += xt_TPROXY.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPMSS) += xt_TCPMSS.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP) += xt_TCPOPTSTRIP.o
+obj-$(CONFIG_NETFILTER_XT_TARGET_TEE) += xt_TEE.o
 obj-$(CONFIG_NETFILTER_XT_TARGET_TRACE) += xt_TRACE.o
 
 # matches
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
new file mode 100644
index 0000000..b3d7301
--- /dev/null
+++ b/net/netfilter/xt_TEE.c
@@ -0,0 +1,256 @@
+/*
+ *	"TEE" target extension for Xtables
+ *	Copyright © Sebastian Claßen, 2007
+ *	Jan Engelhardt, 2007-2010
+ *
+ *	based on ipt_ROUTE.c from Cédric de Launois
+ *	<delaunois@info.ucl.be>
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	version 2 or later, as published by the Free Software Foundation.
+ */
+#include <linux/ip.h>
+#include <linux/module.h>
+#include <linux/route.h>
+#include <linux/skbuff.h>
+#include <net/checksum.h>
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/route.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_TEE.h>
+
+#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+#	define WITH_CONNTRACK 1
+#	include <net/netfilter/nf_conntrack.h>
+#endif
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
+#	define WITH_IPV6 1
+#endif
+
+static const union nf_inet_addr tee_zero_address;
+
+static struct net *pick_net(struct sk_buff *skb)
+{
+#ifdef CONFIG_NET_NS
+	const struct dst_entry *dst;
+
+	if (skb->dev != NULL)
+		return dev_net(skb->dev);
+	dst = skb_dst(skb);
+	if (dst != NULL && dst->dev != NULL)
+		return dev_net(dst->dev);
+#endif
+	return &init_net;
+}
+
+static bool tee_tg_route_oif(struct flowi *f, struct net *net,
+			     const struct xt_tee_tginfo *info)
+{
+	const struct net_device *dev;
+
+	if (*info->oif != '\0')
+		return true;
+	dev = dev_get_by_name(net, info->oif);
+	if (dev == NULL)
+		return false;
+	f->oif = dev->ifindex;
+	return true;
+}
+
+static bool
+tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	struct net *net = pick_net(skb);
+	struct rtable *rt;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(fl));
+	if (!tee_tg_route_oif(&fl, net, info))
+		return false;
+	fl.nl_u.ip4_u.daddr = info->gw.ip;
+	fl.nl_u.ip4_u.tos   = RT_TOS(iph->tos);
+	fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE;
+	if (ip_route_output_key(net, &rt, &fl) != 0)
+		return false;
+
+	dst_release(skb_dst(skb));
+	skb_dst_set(skb, &rt->u.dst);
+	skb->dev      = rt->u.dst.dev;
+	skb->protocol = htons(ETH_P_IP);
+	return true;
+}
+
+static unsigned int
+tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+	struct iphdr *iph;
+
+	/*
+	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
+	 * the original skb, which should continue on its way as if nothing has
+	 * happened. The copy should be independently delivered to the TEE
+	 * --gateway.
+	 */
+	skb = pskb_copy(skb, GFP_ATOMIC);
+	if (skb == NULL)
+		return XT_CONTINUE;
+
+#ifdef WITH_CONNTRACK
+	/* Avoid counting cloned packets towards the original connection. */
+	nf_conntrack_put(skb->nfct);
+	skb->nfct     = &nf_conntrack_untracked.ct_general;
+	skb->nfctinfo = IP_CT_NEW;
+	nf_conntrack_get(skb->nfct);
+#endif
+	/*
+	 * If we are in PREROUTING/INPUT, the checksum must be recalculated
+	 * since the length could have changed as a result of defragmentation.
+	 *
+	 * We also decrease the TTL to mitigate potential TEE loops
+	 * between two hosts.
+	 *
+	 * Set %IP_DF so that the original source is notified of a potentially
+	 * decreased MTU on the clone route. IPv6 does this too.
+	 */
+	iph = ip_hdr(skb);
+	iph->frag_off |= htons(IP_DF);
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_IN)
+		--iph->ttl;
+	ip_send_check(iph);
+
+	/*
+	 * Xtables is not reentrant currently, so a choice has to be made:
+	 * 1. return absolute verdict for the original and let the cloned
+	 *    packet travel through the chains
+	 * 2. let the original continue travelling and not pass the clone
+	 *    to Xtables.
+	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
+	 * Because iph->check is already correct and we don't pass it to
+	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
+	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
+	 * invoke POSTROUTING on the cloned packet.
+	 */
+	IPCB(skb)->flags |= IPSKB_REROUTED;
+	if (tee_tg_route4(skb, info))
+		ip_output(skb);
+	else
+		kfree_skb(skb);
+
+	return XT_CONTINUE;
+}
+
+#ifdef WITH_IPV6
+static bool
+tee_tg_route6(struct sk_buff *skb, const struct xt_tee_tginfo *info)
+{
+	const struct ipv6hdr *iph = ipv6_hdr(skb);
+	struct net *net = pick_net(skb);
+	struct dst_entry *dst;
+	struct flowi fl;
+
+	memset(&fl, 0, sizeof(fl));
+	if (!tee_tg_route_oif(&fl, net, info))
+		return false;
+	fl.nl_u.ip6_u.daddr = info->gw.in6;
+	fl.nl_u.ip6_u.flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) |
+				  (iph->flow_lbl[1] << 8) | iph->flow_lbl[2];
+	dst = ip6_route_output(net, NULL, &fl);
+	if (dst == NULL)
+		return false;
+
+	dst_release(skb_dst(skb));
+	skb_dst_set(skb, dst);
+	skb->dev      = dst->dev;
+	skb->protocol = htons(ETH_P_IPV6);
+	return true;
+}
+
+static unsigned int
+tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+
+	skb = pskb_copy(skb, GFP_ATOMIC);
+	if (skb == NULL)
+		return XT_CONTINUE;
+
+#ifdef WITH_CONNTRACK
+	nf_conntrack_put(skb->nfct);
+	skb->nfct     = &nf_conntrack_untracked.ct_general;
+	skb->nfctinfo = IP_CT_NEW;
+	nf_conntrack_get(skb->nfct);
+#endif
+	if (par->hooknum == NF_INET_PRE_ROUTING ||
+	    par->hooknum == NF_INET_LOCAL_IN) {
+		struct ipv6hdr *iph = ipv6_hdr(skb);
+		--iph->hop_limit;
+	}
+	IP6CB(skb)->flags |= IP6SKB_REROUTED;
+	if (tee_tg_route6(skb, info))
+		ip6_output(skb);
+	else
+		kfree_skb(skb);
+
+	return XT_CONTINUE;
+}
+#endif /* WITH_IPV6 */
+
+static int tee_tg_check(const struct xt_tgchk_param *par)
+{
+	const struct xt_tee_tginfo *info = par->targinfo;
+
+	if (info->oif[sizeof(info->oif)-1] != '\0')
+		return -EINVAL;
+	/* 0.0.0.0 and :: not allowed */
+	return (memcmp(&info->gw, &tee_zero_address,
+	       sizeof(tee_zero_address)) == 0) ? -EINVAL : 0;
+}
+
+static struct xt_target tee_tg_reg[] __read_mostly = {
+	{
+		.name       = "TEE",
+		.revision   = 1,
+		.family     = NFPROTO_IPV4,
+		.target     = tee_tg4,
+		.targetsize = sizeof(struct xt_tee_tginfo),
+		.checkentry = tee_tg_check,
+		.me         = THIS_MODULE,
+	},
+#ifdef WITH_IPV6
+	{
+		.name       = "TEE",
+		.revision   = 1,
+		.family     = NFPROTO_IPV6,
+		.target     = tee_tg6,
+		.targetsize = sizeof(struct xt_tee_tginfo),
+		.checkentry = tee_tg_check,
+		.me         = THIS_MODULE,
+	},
+#endif
+};
+
+static int __init tee_tg_init(void)
+{
+	return xt_register_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+}
+
+static void __exit tee_tg_exit(void)
+{
+	xt_unregister_targets(tee_tg_reg, ARRAY_SIZE(tee_tg_reg));
+}
+
+module_init(tee_tg_init);
+module_exit(tee_tg_exit);
+MODULE_AUTHOR("Sebastian Claßen <sebastian.classen@freenet.ag>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_DESCRIPTION("Xtables: Reroute packet copy");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_TEE");
+MODULE_ALIAS("ip6t_TEE");
-- 
1.7.0.4



* [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
  2010-04-15 23:25 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
@ 2010-04-15 23:25 ` Jan Engelhardt
  2010-04-19 12:22   ` Patrick McHardy
  2010-04-15 23:25 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
  2010-04-15 23:25 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser. The setting is
configurable at runtime through sysfs and will take effect when a
table is replaced by a new one.

arp_tables is not touched, though, because there are just one or two
modules and further patches seek to collapse the table traverser
anyhow.
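
Roughly, the new scheme looks like this (simplified sketch with
placeholder names only; the real code in ipt_do_table/ip6t_do_table
below differs in the details):

	/* per CPU, picked up once per packet */
	jumpstack = private->jumpstack[smp_processor_id()];
	stackptr  = &private->stackptr[smp_processor_id()];

	/* jump into a user chain: remember where to come back to */
	if (*stackptr >= private->stacksize)
		return NF_DROP;	/* should not happen; drop to be safe */
	jumpstack[(*stackptr)++] = e;

	/* RETURN: pop, or fall back to the hook's underflow rule */
	if (*stackptr == 0)
		e = get_entry(table_base, private->underflow[hook]);
	else
		e = next_entry(jumpstack[--*stackptr]);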

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/x_tables.h |    7 +++
 net/ipv4/netfilter/arp_tables.c    |    6 ++-
 net/ipv4/netfilter/ip_tables.c     |   65 ++++++++++++++++--------------
 net/ipv6/netfilter/ip6_tables.c    |   56 ++++++++++----------------
 net/netfilter/x_tables.c           |   77 ++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 66 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 26ced0c..50c8672 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -401,6 +401,13 @@ struct xt_table_info {
 	unsigned int hook_entry[NF_INET_NUMHOOKS];
 	unsigned int underflow[NF_INET_NUMHOOKS];
 
+	/*
+	 * Number of user chains. Since tables cannot have loops, at most
+	 * @stacksize jumps (number of user chains) can possibly be made.
+	 */
+	unsigned int stacksize;
+	unsigned int *stackptr;
+	void ***jumpstack;
 	/* ipt_entry tables: one per CPU */
 	/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
 	void *entries[1];
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e8e363d..07a6990 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
+		if (strcmp(arpt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 18c5b15..70900ec 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb,
 	     const struct net_device *out,
 	     struct xt_table *table)
 {
-#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct iphdr *ip;
 	bool hotdrop = false;
@@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb,
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ipt_entry *e, *back;
+	struct ipt_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb,
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
+	pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n",
+		 table->name, hook, origptr,
+		 get_entry(table_base, private->underflow[hook]));
 
 	do {
 		const struct ipt_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip_packet_match(ip, indev, outdev,
 		    &e->ip, mtpar.fragoff)) {
  no_match:
@@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0) {
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+					pr_devel("Underflow (this is normal) "
+						 "to %p\n", e);
+				} else {
+					e = jumpstack[--*stackptr];
+					pr_devel("Pulled %p out from pos %u\n",
+						 e, *stackptr);
+					e = ipt_next_entry(e);
+				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ipt_entry *next = ipt_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
+				pr_devel("Pushed %p into pos %u\n",
+					 e, *stackptr - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb,
 		tgpar.targinfo = t->data;
 
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		/* Target might have changed stuff. */
 		ip = ip_hdr(skb);
 		if (verdict == IPT_CONTINUE)
@@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 	xt_info_rdunlock_bh();
-
+	pr_devel("Exiting %s; resetting sp from %u to %u\n",
+		 __func__, *stackptr, origptr);
+	*stackptr = origptr;
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
 #else
@@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ipt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f2b815e..2a2770b 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb,
 	      const struct net_device *out,
 	      struct xt_table *table)
 {
-#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	bool hotdrop = false;
 	/* Initializing verdict to NF_DROP keeps gcc happy. */
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ip6t_entry *e, *back;
+	struct ip6t_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb,
 
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
-
 	do {
 		const struct ip6t_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &mtpar.thoff, &mtpar.fragoff, &hotdrop)) {
  no_match:
@@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0)
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+				else
+					e = ip6t_next_entry(jumpstack[--*stackptr]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ip6t_entry *next = ip6t_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb,
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		if (verdict == IP6T_CONTINUE)
 			e = ip6t_next_entry(e);
 		else
@@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 
-#ifdef CONFIG_NETFILTER_DEBUG
-	tb_comefrom = NETFILTER_LINK_POISON;
-#endif
 	xt_info_rdunlock_bh();
+	*stackptr = origptr;
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
@@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ip6t_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8e23d8f..edde5c6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
+/* Allow this many total (re)entries. */
+static const unsigned int xt_jumpstack_multiplier = 2;
+
 /* Registration hooks for targets. */
 int
 xt_register_target(struct xt_target *target)
@@ -680,6 +683,26 @@ void xt_free_table_info(struct xt_table_info *info)
 		else
 			vfree(info->entries[cpu]);
 	}
+
+	if (info->jumpstack != NULL) {
+		if (sizeof(void *) * info->stacksize > PAGE_SIZE) {
+			for_each_possible_cpu(cpu)
+				vfree(info->jumpstack[cpu]);
+		} else {
+			for_each_possible_cpu(cpu)
+				kfree(info->jumpstack[cpu]);
+		}
+	}
+
+	if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->jumpstack);
+	else
+		kfree(info->jumpstack);
+	if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->stackptr);
+	else
+		kfree(info->stackptr);
+
 	kfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -724,6 +747,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock);
 DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks);
 EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks);
 
+static int xt_jumpstack_alloc(struct xt_table_info *i)
+{
+	unsigned int size;
+	int cpu;
+
+	size = sizeof(unsigned int) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->stackptr = vmalloc(size);
+	else
+		i->stackptr = kmalloc(size, GFP_KERNEL);
+	if (i->stackptr == NULL)
+		return -ENOMEM;
+	memset(i->stackptr, 0, size);
+
+	size = sizeof(void **) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->jumpstack = vmalloc(size);
+	else
+		i->jumpstack = kmalloc(size, GFP_KERNEL);
+	if (i->jumpstack == NULL)
+		return -ENOMEM;
+	memset(i->jumpstack, 0, size);
+
+	i->stacksize *= xt_jumpstack_multiplier;
+	size = sizeof(void *) * i->stacksize;
+	for_each_possible_cpu(cpu) {
+		if (size > PAGE_SIZE)
+			i->jumpstack[cpu] = vmalloc_node(size,
+				cpu_to_node(cpu));
+		else
+			i->jumpstack[cpu] = kmalloc_node(size,
+				GFP_KERNEL, cpu_to_node(cpu));
+		if (i->jumpstack[cpu] == NULL)
+			/*
+			 * Freeing will be done later on by the callers. The
+			 * chain is: xt_replace_table -> __do_replace ->
+			 * do_replace -> xt_free_table_info.
+			 */
+			return -ENOMEM;
+	}
+
+	return 0;
+}
 
 struct xt_table_info *
 xt_replace_table(struct xt_table *table,
@@ -732,6 +798,7 @@ xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	int ret;
 
 	/* Do the substitution. */
 	local_bh_disable();
@@ -746,6 +813,12 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
@@ -770,6 +843,10 @@ struct xt_table *xt_register_table(struct net *net,
 	struct xt_table_info *private;
 	struct xt_table *t, *table;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	/* Don't add one object to multiple lists. */
 	table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
 	if (!table) {
-- 
1.7.0.4



* [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
  2010-04-15 23:25 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
  2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
@ 2010-04-15 23:25 ` Jan Engelhardt
  2010-04-19 14:07   ` Patrick McHardy
  2010-04-15 23:25 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Since Xtables is now reentrant/nestable, the cloned packet can also go
through Xtables and be subject to rules itself.
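
To keep the clone from hitting the TEE rule again and cloning itself
indefinitely, the target now brackets the nested traversal with a
per-cpu flag. Condensed (see the diff below for the real code):

	static DEFINE_PER_CPU(bool, tee_active);

	if (percpu_read(tee_active))	/* nested call: we are the clone */
		return XT_CONTINUE;
	/* ... copy the skb, untrack it, adjust TTL, reroute ... */
	percpu_write(tee_active, true);
	ip_local_out(skb);		/* clone re-enters Xtables here */
	percpu_write(tee_active, false);
	return XT_CONTINUE;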

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv4/ip_output.c   |    1 -
 net/ipv6/ip6_output.c  |    1 -
 net/netfilter/xt_TEE.c |   40 ++++++++++++++++++----------------------
 3 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0abfdde..f09135e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -309,7 +309,6 @@ int ip_output(struct sk_buff *skb)
 			    ip_finish_output,
 			    !(IPCB(skb)->flags & IPSKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip_output);
 
 int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
 {
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d09be7f..c10a38a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -176,7 +176,6 @@ int ip6_output(struct sk_buff *skb)
 			    ip6_finish_output,
 			    !(IP6CB(skb)->flags & IP6SKB_REROUTED));
 }
-EXPORT_SYMBOL_GPL(ip6_output);
 
 /*
  *	xmit an sk_buff (used by TCP)
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index b3d7301..842e701 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -12,6 +12,7 @@
  */
 #include <linux/ip.h>
 #include <linux/module.h>
+#include <linux/percpu.h>
 #include <linux/route.h>
 #include <linux/skbuff.h>
 #include <net/checksum.h>
@@ -32,6 +33,7 @@
 #endif
 
 static const union nf_inet_addr tee_zero_address;
+static DEFINE_PER_CPU(bool, tee_active);
 
 static struct net *pick_net(struct sk_buff *skb)
 {
@@ -91,6 +93,8 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 	const struct xt_tee_tginfo *info = par->targinfo;
 	struct iphdr *iph;
 
+	if (percpu_read(tee_active))
+		return XT_CONTINUE;
 	/*
 	 * Copy the skb, and route the copy. Will later return %XT_CONTINUE for
 	 * the original skb, which should continue on its way as if nothing has
@@ -125,24 +129,13 @@ tee_tg4(struct sk_buff *skb, const struct xt_target_param *par)
 		--iph->ttl;
 	ip_send_check(iph);
 
-	/*
-	 * Xtables is not reentrant currently, so a choice has to be made:
-	 * 1. return absolute verdict for the original and let the cloned
-	 *    packet travel through the chains
-	 * 2. let the original continue travelling and not pass the clone
-	 *    to Xtables.
-	 * #2 is chosen. Normally, we would use ip_local_out for the clone.
-	 * Because iph->check is already correct and we don't pass it to
-	 * Xtables anyway, a shortcut to dst_output [forwards to ip_output] can
-	 * be taken. %IPSKB_REROUTED needs to be set so that ip_output does not
-	 * invoke POSTROUTING on the cloned packet.
-	 */
-	IPCB(skb)->flags |= IPSKB_REROUTED;
-	if (tee_tg_route4(skb, info))
-		ip_output(skb);
-	else
+	if (tee_tg_route4(skb, info)) {
+		percpu_write(tee_active, true);
+		ip_local_out(skb);
+		percpu_write(tee_active, false);
+	} else {
 		kfree_skb(skb);
-
+	}
 	return XT_CONTINUE;
 }
 
@@ -177,6 +170,8 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 {
 	const struct xt_tee_tginfo *info = par->targinfo;
 
+	if (percpu_read(tee_active))
+		return XT_CONTINUE;
 	skb = pskb_copy(skb, GFP_ATOMIC);
 	if (skb == NULL)
 		return XT_CONTINUE;
@@ -192,12 +187,13 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 		struct ipv6hdr *iph = ipv6_hdr(skb);
 		--iph->hop_limit;
 	}
-	IP6CB(skb)->flags |= IP6SKB_REROUTED;
-	if (tee_tg_route6(skb, info))
-		ip6_output(skb);
-	else
+	if (tee_tg_route6(skb, info)) {
+		percpu_write(tee_active, true);
+		ip6_local_out(skb);
+		percpu_write(tee_active, false);
+	} else {
 		kfree_skb(skb);
-
+	}
 	return XT_CONTINUE;
 }
 #endif /* WITH_IPV6 */
-- 
1.7.0.4



* [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy
  2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
                   ` (2 preceding siblings ...)
  2010-04-15 23:25 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
@ 2010-04-15 23:25 ` Jan Engelhardt
  2010-04-19 14:08   ` Patrick McHardy
  3 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-15 23:25 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/ipv4/netfilter/ip_tables.c   |    2 --
 net/ipv4/netfilter/ipt_REJECT.c  |    3 ---
 net/ipv6/netfilter/ip6_tables.c  |    2 --
 net/ipv6/netfilter/ip6t_REJECT.c |    3 ---
 4 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 70900ec..bb5e0d9 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -434,8 +434,6 @@ ipt_do_table(struct sk_buff *skb,
 			continue;
 		}
 
-		/* Targets which reenter must return
-		   abs. verdicts */
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
diff --git a/net/ipv4/netfilter/ipt_REJECT.c b/net/ipv4/netfilter/ipt_REJECT.c
index b026014..038fa0b 100644
--- a/net/ipv4/netfilter/ipt_REJECT.c
+++ b/net/ipv4/netfilter/ipt_REJECT.c
@@ -139,9 +139,6 @@ reject_tg(struct sk_buff *skb, const struct xt_target_param *par)
 {
 	const struct ipt_reject_info *reject = par->targinfo;
 
-	/* WARNING: This code causes reentry within iptables.
-	   This means that the iptables jump stack is now crap.  We
-	   must return an absolute verdict. --RR */
 	switch (reject->with) {
 	case IPT_ICMP_NET_UNREACHABLE:
 		send_unreach(skb, ICMP_NET_UNREACH);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 2a2770b..7afa117 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -451,8 +451,6 @@ ip6t_do_table(struct sk_buff *skb,
 			continue;
 		}
 
-		/* Targets which reenter must return
-		   abs. verdicts */
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
diff --git a/net/ipv6/netfilter/ip6t_REJECT.c b/net/ipv6/netfilter/ip6t_REJECT.c
index 55b9b2d..dad9762 100644
--- a/net/ipv6/netfilter/ip6t_REJECT.c
+++ b/net/ipv6/netfilter/ip6t_REJECT.c
@@ -179,9 +179,6 @@ reject_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 	struct net *net = dev_net((par->in != NULL) ? par->in : par->out);
 
 	pr_debug("%s: medium point\n", __func__);
-	/* WARNING: This code causes reentry within ip6tables.
-	   This means that the ip6tables jump stack is now crap.  We
-	   must return an absolute verdict. --RR */
 	switch (reject->with) {
 	case IP6T_ICMP6_NO_ROUTE:
 		send_unreach(net, skb, ICMPV6_NOROUTE, par->hooknum);
-- 
1.7.0.4



* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-15 23:25 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
@ 2010-04-19 12:20   ` Patrick McHardy
  2010-04-19 12:36     ` Jan Engelhardt
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2010-04-19 12:20 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> xt_TEE can be used to clone and reroute a packet. This can, for
> example, be used to copy traffic at a router to another dedicated
> machine for logging purposes.
> 
> References: http://www.gossamer-threads.com/lists/iptables/devel/68781

Applied, thanks Jan.

> +static bool tee_tg_route_oif(struct flowi *f, struct net *net,
> +			     const struct xt_tee_tginfo *info)
> +{
> +	const struct net_device *dev;
> +
> +	if (*info->oif != '\0')
> +		return true;
> +	dev = dev_get_by_name(net, info->oif);
> +	if (dev == NULL)
> +		return false;
> +	f->oif = dev->ifindex;
> +	return true;
> +}
> +
> +static bool
> +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
> +{
> +	const struct iphdr *iph = ip_hdr(skb);
> +	struct net *net = pick_net(skb);
> +	struct rtable *rt;
> +	struct flowi fl;
> +
> +	memset(&fl, 0, sizeof(fl));
> +	if (!tee_tg_route_oif(&fl, net, info))
> +		return false;

Redoing the interface lookup once per packet is really highly
suboptimal. I'll change that to do the lookup once per new
rule or simply in userspace.

Please also send the userspace patch you're using so I can
do some testing. Thanks.


* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
@ 2010-04-19 12:22   ` Patrick McHardy
  2010-04-19 12:54     ` Jan Engelhardt
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2010-04-19 12:22 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> Currently, the table traverser stores return addresses in the ruleset
> itself (struct ip6t_entry->comefrom). This has a well-known drawback:
> the jumpstack is overwritten on reentry, making it necessary for
> targets to return absolute verdicts. Also, the ruleset (which might
> be heavy memory-wise) needs to be replicated for each CPU that can
> possibly invoke ip6t_do_table.
> 
> This patch decouples the jumpstack from struct ip6t_entry and instead
> puts it into xt_table_info. Not being restricted by 'comefrom'
> anymore, we can set up a stack as needed. By default, there is room
> allocated for two entries into the traverser. The setting is
> configurable at runtime through sysfs and will take effect when a
> table is replaced by a new one.

The changelog is not up to date anymore, but ...

> 
> diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
> index 26ced0c..50c8672 100644
> --- a/include/linux/netfilter/x_tables.h
> +++ b/include/linux/netfilter/x_tables.h
> @@ -401,6 +401,13 @@ struct xt_table_info {
>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
>  	unsigned int underflow[NF_INET_NUMHOOKS];
>  
> +	/*
> +	 * Number of user chains. Since tables cannot have loops, at most
> +	 * @stacksize jumps (number of user chains) can possibly be made.
> +	 */
> +	unsigned int stacksize;
> +	unsigned int *stackptr;
> +	void ***jumpstack;
...
> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
> index 8e23d8f..edde5c6 100644
> --- a/net/netfilter/x_tables.c
> +++ b/net/netfilter/x_tables.c
> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
>  	[NFPROTO_IPV6]   = "ip6",
>  };
>  
> +/* Allow this many total (re)entries. */
> +static const unsigned int xt_jumpstack_multiplier = 2;
> +

Why aren't you using a define instead of saving the stack size
in the table info?


* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-19 12:20   ` Patrick McHardy
@ 2010-04-19 12:36     ` Jan Engelhardt
  2010-04-19 12:42       ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-19 12:36 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel


On Monday 2010-04-19 14:20, Patrick McHardy wrote:
>> +static bool
>> +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
>> +{
>> +	const struct iphdr *iph = ip_hdr(skb);
>> +	struct net *net = pick_net(skb);
>> +	struct rtable *rt;
>> +	struct flowi fl;
>> +
>> +	memset(&fl, 0, sizeof(fl));
>> +	if (!tee_tg_route_oif(&fl, net, info))
>> +		return false;
>
>Redoing the interface lookup once per packet is really highly
>suboptimal. I'll change that to do the lookup once per new
>rule or simply in userspace.

I thought about that too. But if you grab a reference to the dev on
rule insertion, the rule would stop working when you down and up an
interface, which does not match the regular iptables behavior (-i and
-o options) at all.


* Re: [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE
  2010-04-19 12:36     ` Jan Engelhardt
@ 2010-04-19 12:42       ` Patrick McHardy
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2010-04-19 12:42 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> On Monday 2010-04-19 14:20, Patrick McHardy wrote:
>>> +static bool
>>> +tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
>>> +{
>>> +	const struct iphdr *iph = ip_hdr(skb);
>>> +	struct net *net = pick_net(skb);
>>> +	struct rtable *rt;
>>> +	struct flowi fl;
>>> +
>>> +	memset(&fl, 0, sizeof(fl));
>>> +	if (!tee_tg_route_oif(&fl, net, info))
>>> +		return false;
>> Redoing the interface lookup once per packet is really highly
>> suboptimal. I'll change that to do the lookup once per new
>> rule or simply in userspace.
> 
> I thought about that too. But if you grab a reference to the dev on
> rule insertion, the rule would stop working when you down and up an
> interface, which does not match the regular iptables behavior (-i and
> -o options) at all.

Not down and up, unregister and register. But we don't need a
reference, just the ifindex. That can be updated on netdev events.


* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-19 12:22   ` Patrick McHardy
@ 2010-04-19 12:54     ` Jan Engelhardt
  2010-04-19 14:06       ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-19 12:54 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel


On Monday 2010-04-19 14:22, Patrick McHardy wrote:
>Jan Engelhardt wrote:
>> This patch decouples the jumpstack from struct ip6t_entry and instead
>> puts it into xt_table_info. Not being restricted by 'comefrom'
>> anymore, we can set up a stack as needed. By default, there is room
>> allocated for two entries into the traverser. The setting is
>> configurable at runtime through sysfs and will take effect when a
>> table is replaced by a new one.
>
>The changelog is not up to date anymore, but ...

Oops ;-)

>> --- a/include/linux/netfilter/x_tables.h
>> +++ b/include/linux/netfilter/x_tables.h
>> @@ -401,6 +401,13 @@ struct xt_table_info {
>>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
>>  	unsigned int underflow[NF_INET_NUMHOOKS];
>>  
>> +	/*
>> +	 * Number of user chains. Since tables cannot have loops, at most
>> +	 * @stacksize jumps (number of user chains) can possibly be made.
>> +	 */
>> +	unsigned int stacksize;
>> +	unsigned int *stackptr;
>> +	void ***jumpstack;
>...
>> --- a/net/netfilter/x_tables.c
>> +++ b/net/netfilter/x_tables.c
>> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
>>  	[NFPROTO_IPV6]   = "ip6",
>>  };
>>  
>> +/* Allow this many total (re)entries. */
>> +static const unsigned int xt_jumpstack_multiplier = 2;
>> +
>
>Why aren't you using a define instead of saving the stack size
>in the table info?

I don't see how a define does any good here. Since you were quoting
the multiplier line, I guess you could be confusing the multiplier
with the stored stacksize. FTR, the definition is:

table->stacksize := number_of_user_chains(#UC) * multiplier;

Since #UC is variable, so is stacksize, and so stacksize cannot
be replaced by a constant.
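
For example, a ruleset with 5 user-defined chains and the default
multiplier of 2 ends up with stacksize = 5 * 2 = 10, i.e. ten jump
slots per CPU.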


* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-19 12:54     ` Jan Engelhardt
@ 2010-04-19 14:06       ` Patrick McHardy
  2010-04-20 13:18         ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2010-04-19 14:06 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> On Monday 2010-04-19 14:22, Patrick McHardy wrote:
>>> --- a/include/linux/netfilter/x_tables.h
>>> +++ b/include/linux/netfilter/x_tables.h
>>> @@ -401,6 +401,13 @@ struct xt_table_info {
>>>  	unsigned int hook_entry[NF_INET_NUMHOOKS];
>>>  	unsigned int underflow[NF_INET_NUMHOOKS];
>>>  
>>> +	/*
>>> +	 * Number of user chains. Since tables cannot have loops, at most
>>> +	 * @stacksize jumps (number of user chains) can possibly be made.
>>> +	 */
>>> +	unsigned int stacksize;
>>> +	unsigned int *stackptr;
>>> +	void ***jumpstack;
>> ...
>>> --- a/net/netfilter/x_tables.c
>>> +++ b/net/netfilter/x_tables.c
>>> @@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
>>>  	[NFPROTO_IPV6]   = "ip6",
>>>  };
>>>  
>>> +/* Allow this many total (re)entries. */
>>> +static const unsigned int xt_jumpstack_multiplier = 2;
>>> +
>> Why aren't you using a define instead of saving the stack size
>> in the table info?
> 
> I don't see how a define does any good here. Since you were quoting
> the multiplier line, I guess you could be confusing the multiplier
> with stored stacksize. FTR, the definition is:
> 
> table->stacksize := number_of_user_chains(#UC) * multiplier;
> 
> Since #UC is variable, so is stacksize, and so stacksize cannot
> be replaced by a constant.

Right, thanks for the explanation. Applied.


* Re: [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too
  2010-04-15 23:25 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
@ 2010-04-19 14:07   ` Patrick McHardy
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2010-04-19 14:07 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> Since Xtables is now reentrant/nestable, the cloned packet can also go
> through Xtables and be subject to rules itself.

Applied, thanks Jan.


* Re: [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy
  2010-04-15 23:25 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
@ 2010-04-19 14:08   ` Patrick McHardy
  2010-04-20 12:24     ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2010-04-19 14:08 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Also applied, thanks Jan. I'll push everything out after
some more intensive testing.


* Re: [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy
  2010-04-19 14:08   ` Patrick McHardy
@ 2010-04-20 12:24     ` Patrick McHardy
  2010-04-20 12:29       ` Jan Engelhardt
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McHardy @ 2010-04-20 12:24 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

Patrick McHardy wrote:
> Also applied, thanks Jan. I'll push everything out after
> some more intensive testing.

Not using oif was broken due to an incorrect check for a device
name; there was also a device reference leak. I took the
opportunity to convert it to notifier-based device resolving.

If an oif is given, we register a netdevice notifier to resolve
the name on NETDEV_REGISTER or NETDEV_CHANGENAME and unresolve it
again on NETDEV_UNREGISTER or NETDEV_CHANGENAME (to a different name).
The behaviour should be equivalent to the runtime resolving.

Please review; if things are fine I'll commit the patch and
push everything out.

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 4796 bytes --]

diff --git a/include/linux/netfilter/xt_TEE.h b/include/linux/netfilter/xt_TEE.h
index 55d4a50..5c21d5c 100644
--- a/include/linux/netfilter/xt_TEE.h
+++ b/include/linux/netfilter/xt_TEE.h
@@ -4,6 +4,9 @@
 struct xt_tee_tginfo {
 	union nf_inet_addr gw;
 	char oif[16];
+
+	/* used internally by the kernel */
+	struct xt_tee_priv *priv __attribute__((aligned(8)));
 };
 
 #endif /* _XT_TEE_TARGET_H */
diff --git a/net/netfilter/xt_TEE.c b/net/netfilter/xt_TEE.c
index 842e701..1dbef1d 100644
--- a/net/netfilter/xt_TEE.c
+++ b/net/netfilter/xt_TEE.c
@@ -15,6 +15,7 @@
 #include <linux/percpu.h>
 #include <linux/route.h>
 #include <linux/skbuff.h>
+#include <linux/notifier.h>
 #include <net/checksum.h>
 #include <net/icmp.h>
 #include <net/ip.h>
@@ -32,6 +33,12 @@
 #	define WITH_IPV6 1
 #endif
 
+struct xt_tee_priv {
+	struct notifier_block	notifier;
+	struct xt_tee_tginfo	*tginfo;
+	int			oif;
+};
+
 static const union nf_inet_addr tee_zero_address;
 static DEFINE_PER_CPU(bool, tee_active);
 
@@ -49,20 +56,6 @@ static struct net *pick_net(struct sk_buff *skb)
 	return &init_net;
 }
 
-static bool tee_tg_route_oif(struct flowi *f, struct net *net,
-			     const struct xt_tee_tginfo *info)
-{
-	const struct net_device *dev;
-
-	if (*info->oif != '\0')
-		return true;
-	dev = dev_get_by_name(net, info->oif);
-	if (dev == NULL)
-		return false;
-	f->oif = dev->ifindex;
-	return true;
-}
-
 static bool
 tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
 {
@@ -72,8 +65,11 @@ tee_tg_route4(struct sk_buff *skb, const struct xt_tee_tginfo *info)
 	struct flowi fl;
 
 	memset(&fl, 0, sizeof(fl));
-	if (!tee_tg_route_oif(&fl, net, info))
-		return false;
+	if (info->priv) {
+		if (info->priv->oif == -1)
+			return false;
+		fl.oif = info->priv->oif;
+	}
 	fl.nl_u.ip4_u.daddr = info->gw.ip;
 	fl.nl_u.ip4_u.tos   = RT_TOS(iph->tos);
 	fl.nl_u.ip4_u.scope = RT_SCOPE_UNIVERSE;
@@ -149,8 +145,11 @@ tee_tg_route6(struct sk_buff *skb, const struct xt_tee_tginfo *info)
 	struct flowi fl;
 
 	memset(&fl, 0, sizeof(fl));
-	if (!tee_tg_route_oif(&fl, net, info))
-		return false;
+	if (info->priv) {
+		if (info->priv->oif == -1)
+			return false;
+		fl.oif = info->priv->oif;
+	}
 	fl.nl_u.ip6_u.daddr = info->gw.in6;
 	fl.nl_u.ip6_u.flowlabel = ((iph->flow_lbl[0] & 0xF) << 16) |
 				  (iph->flow_lbl[1] << 8) | iph->flow_lbl[2];
@@ -198,15 +197,70 @@ tee_tg6(struct sk_buff *skb, const struct xt_target_param *par)
 }
 #endif /* WITH_IPV6 */
 
+static int tee_netdev_event(struct notifier_block *this, unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct xt_tee_priv *priv;
+
+	priv = container_of(this, struct xt_tee_priv, notifier);
+	switch (event) {
+	case NETDEV_REGISTER:
+		if (!strcmp(dev->name, priv->tginfo->oif))
+			priv->oif = dev->ifindex;
+		break;
+	case NETDEV_UNREGISTER:
+		if (dev->ifindex == priv->oif)
+			priv->oif = -1;
+		break;
+	case NETDEV_CHANGENAME:
+		if (!strcmp(dev->name, priv->tginfo->oif))
+			priv->oif = dev->ifindex;
+		else if (dev->ifindex == priv->oif)
+			priv->oif = -1;
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
 static int tee_tg_check(const struct xt_tgchk_param *par)
 {
-	const struct xt_tee_tginfo *info = par->targinfo;
+	struct xt_tee_tginfo *info = par->targinfo;
+	struct xt_tee_priv *priv;
 
-	if (info->oif[sizeof(info->oif)-1] != '\0')
-		return -EINVAL;
 	/* 0.0.0.0 and :: not allowed */
-	return (memcmp(&info->gw, &tee_zero_address,
-	       sizeof(tee_zero_address)) == 0) ? -EINVAL : 0;
+	if (memcmp(&info->gw, &tee_zero_address,
+		   sizeof(tee_zero_address)) == 0)
+		return -EINVAL;
+
+	if (info->oif[0]) {
+		if (info->oif[sizeof(info->oif)-1] != '\0')
+			return -EINVAL;
+
+		priv = kmalloc(sizeof(*priv), GFP_KERNEL);
+		if (priv == NULL)
+			return -ENOMEM;
+
+		priv->tginfo  = info;
+		priv->oif     = -1;
+		priv->notifier.notifier_call = tee_netdev_event;
+		info->priv    = priv;
+
+		register_netdevice_notifier(&priv->notifier);
+	} else
+		info->priv = NULL;
+
+	return 0;
+}
+
+static void tee_tg_destroy(const struct xt_tgdtor_param *par)
+{
+	struct xt_tee_tginfo *info = par->targinfo;
+
+	if (info->priv) {
+		unregister_netdevice_notifier(&info->priv->notifier);
+		kfree(info->priv);
+	}
 }
 
 static struct xt_target tee_tg_reg[] __read_mostly = {
@@ -217,6 +271,7 @@ static struct xt_target tee_tg_reg[] __read_mostly = {
 		.target     = tee_tg4,
 		.targetsize = sizeof(struct xt_tee_tginfo),
 		.checkentry = tee_tg_check,
+		.destroy    = tee_tg_destroy,
 		.me         = THIS_MODULE,
 	},
 #ifdef WITH_IPV6
@@ -227,6 +282,7 @@ static struct xt_target tee_tg_reg[] __read_mostly = {
 		.target     = tee_tg6,
 		.targetsize = sizeof(struct xt_tee_tginfo),
 		.checkentry = tee_tg_check,
+		.destroy    = tee_tg_destroy,
 		.me         = THIS_MODULE,
 	},
 #endif


* Re: [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy
  2010-04-20 12:24     ` Patrick McHardy
@ 2010-04-20 12:29       ` Jan Engelhardt
  2010-04-20 12:36         ` Patrick McHardy
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-20 12:29 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

On Tuesday 2010-04-20 14:24, Patrick McHardy wrote:

>Patrick McHardy wrote:
>> Also applied, thanks Jan. I'll push everything out after
>> some more intensive testing.
>
>Not using oif was broken due to an incorrect check for a device
>name; there was also a device reference leak. I took the
>opportunity to convert it to notifier-based device resolving.
>
>If an oif is given, we register a netdevice notifier to resolve
>the name on NETDEV_REGISTER or NETDEV_CHANGENAME and unresolve it
>again on NETDEV_UNREGISTER or NETDEV_CHANGENAME (to a different name).
>The behaviour should be equivalent to the runtime resolving.
>
>Please review; if things are fine I'll commit the patch and
>push everything out.

Seems good to me.

>+		priv = kmalloc(sizeof(*priv), GFP_KERNEL);
>+		if (priv == NULL)
>+			return -ENOMEM;
>+
>+		priv->tginfo  = info;
>+		priv->oif     = -1;
>+		priv->notifier.notifier_call = tee_netdev_event;
>+
>+		register_netdevice_notifier(&priv->notifier);

I take it tee_netdev_event gets immediately called after the registration.


* Re: [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy
  2010-04-20 12:29       ` Jan Engelhardt
@ 2010-04-20 12:36         ` Patrick McHardy
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2010-04-20 12:36 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> On Tuesday 2010-04-20 14:24, Patrick McHardy wrote:
> 
>> Patrick McHardy wrote:
>>> Also applied, thanks Jan. I'll push everything out after
>>> some more intensive testing.
>> Not using oif was broken due to an incorrect check for a device
>> name; there was also a device reference leak. I took the
>> opportunity to convert it to notifier-based device resolving.
>>
>> If an oif is given, we register a netdevice notifier to resolve
>> the name on NETDEV_REGISTER or NETDEV_CHANGENAME and unresolve it
>> again on NETDEV_UNREGISTER or NETDEV_CHANGENAME (to a different name).
>> The behaviour should be equivalent to the runtime resolving.
>>
>> Please review; if things are fine I'll commit the patch and
>> push everything out.
> 
> Seems good to me.

Thanks.

>> +		priv = kmalloc(sizeof(*priv), GFP_KERNEL);
>> +		if (priv == NULL)
>> +			return -ENOMEM;
>> +
>> +		priv->tginfo  = info;
>> +		priv->oif     = -1;
>> +		priv->notifier.notifier_call = tee_netdev_event;
>> +
>> +		register_netdevice_notifier(&priv->notifier);
> 
> I take it tee_netdev_event gets immediately called after the registration.

Correct, it's invoked once for each existing netdevice during
registration.


* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-19 14:06       ` Patrick McHardy
@ 2010-04-20 13:18         ` Patrick McHardy
  2010-04-20 13:21           ` Patrick McHardy
  2010-04-20 18:26           ` Jan Engelhardt
  0 siblings, 2 replies; 21+ messages in thread
From: Patrick McHardy @ 2010-04-20 13:18 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Patrick McHardy wrote:
> Jan Engelhardt wrote:
>>>> +/* Allow this many total (re)entries. */
>>>> +static const unsigned int xt_jumpstack_multiplier = 2;
>>>> +
>>> Why aren't you using a define instead of saving the stack size
>>> in the table info?
>> I don't see how a define does any good here. Since you were quoting
>> the multiplier line, I guess you could be confusing the multiplier
>> with the stored stacksize. FTR, the definition is:
>>
>> table->stacksize := number_of_user_chains(#UC) * multiplier;
>>
>> Since #UC is variable, so is stacksize, and so stacksize cannot
>> be replaced by a constant.
> 
> Right, thanks for the explanation. Applied.

I just noticed a problem with this patch:

[  428.295752] BUG: sleeping function called from invalid context at
mm/slub.c:1705
[  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
[  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
[  428.295776] Call Trace:
[  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
[  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
[  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]
[  428.295839]  [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables]
[  428.295851]  [<f865abe1>] ? try_module_get+0x82/0x9b [x_tables]
[  428.295864]  [<f865b4c0>] xt_replace_table+0x3c/0x5f [x_tables]
[  428.295876]  [<f86b5dc3>] do_ipt_set_ctl+0x182/0x3d5 [ip_tables]
[  428.295922]  [<c037388f>] nf_sockopt+0x167/0x17c
[  428.295931]  [<c03738d8>] nf_setsockopt+0x1a/0x1f
[  428.295940]  [<c037dda4>] ip_setsockopt+0x60/0x84
[  428.295951]  [<c039260a>] raw_setsockopt+0x1f/0x62
[  428.295960]  [<c034d909>] sock_common_setsockopt+0x18/0x1d
[  428.295968]  [<c034bfb9>] sys_setsockopt+0x5e/0x79
[  428.295977]  [<c034d0a0>] sys_socketcall+0x12d/0x190
[  428.295987]  [<c0102a57>] sysenter_do_call+0x12/0x26

You probably shouldn't be allocating the jumpstack while BHs are
disabled.
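
In other words, the sleeping GFP_KERNEL allocation has to come before
local_bh_disable(); a simplified sketch of the intended ordering (not
the final code):

        ret = xt_jumpstack_alloc(newinfo);      /* may sleep: do it first */
        if (ret < 0) {
                *error = ret;
                return NULL;
        }

        local_bh_disable();                     /* atomic from here on */
        private = table->private;
        /* ... sanity checks, then swap in newinfo ... */
        local_bh_enable();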

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-20 13:18         ` Patrick McHardy
@ 2010-04-20 13:21           ` Patrick McHardy
  2010-04-20 18:26           ` Jan Engelhardt
  1 sibling, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2010-04-20 13:21 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Patrick McHardy wrote:
> Patrick McHardy wrote:
>> Jan Engelhardt wrote:
>>>>> +/* Allow this many total (re)entries. */
>>>>> +static const unsigned int xt_jumpstack_multiplier = 2;
>>>>> +
>>>> Why aren't you using a define instead of saving the stack size
>>>> in the table info?
>>> I don't see how a define does any good here. Since you were quoting
>>> the multiplier line, I guess you could be confusing the multiplier
>>> with the stored stacksize. FTR, the definition is:
>>>
>>> table->stacksize := number_of_user_chains(#UC) * multiplier;
>>>
>>> Since #UC is variable, so is stacksize, and so stacksize cannot
>>> be replaced by a constant.
>> Right, thanks for the explanation. Applied.
> 
> I just noticed a problem with this patch:
> 
> [  428.295752] BUG: sleeping function called from invalid context at
> mm/slub.c:1705
> [  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
> [  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
> [  428.295776] Call Trace:
> [  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
> [  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
> [  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]
> [  428.295839]  [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables]
> [  428.295851]  [<f865abe1>] ? try_module_get+0x82/0x9b [x_tables]
> [  428.295864]  [<f865b4c0>] xt_replace_table+0x3c/0x5f [x_tables]
> [  428.295876]  [<f86b5dc3>] do_ipt_set_ctl+0x182/0x3d5 [ip_tables]
> [  428.295922]  [<c037388f>] nf_sockopt+0x167/0x17c
> [  428.295931]  [<c03738d8>] nf_setsockopt+0x1a/0x1f
> [  428.295940]  [<c037dda4>] ip_setsockopt+0x60/0x84
> [  428.295951]  [<c039260a>] raw_setsockopt+0x1f/0x62
> [  428.295960]  [<c034d909>] sock_common_setsockopt+0x18/0x1d
> [  428.295968]  [<c034bfb9>] sys_setsockopt+0x5e/0x79
> [  428.295977]  [<c034d0a0>] sys_socketcall+0x12d/0x190
> [  428.295987]  [<c0102a57>] sysenter_do_call+0x12/0x26
> 
> You probably shouldn't be allocating the jumpstack while BHs are
> disabled.

I pushed the entire patchset out, please send a fix on top.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-20 13:18         ` Patrick McHardy
  2010-04-20 13:21           ` Patrick McHardy
@ 2010-04-20 18:26           ` Jan Engelhardt
  2010-04-21 12:47             ` Patrick McHardy
  1 sibling, 1 reply; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-20 18:26 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel


On Tuesday 2010-04-20 15:18, Patrick McHardy wrote:
>
>I just noticed a problem with this patch:
>
>[  428.295752] BUG: sleeping function called from invalid context at
>mm/slub.c:1705
>[  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
>[  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
>[  428.295776] Call Trace:
>[  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
>[  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
>[  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]
>[  428.295839]  [<f865b3bb>] xt_jumpstack_alloc+0x36/0xff [x_tables]
>
>You probably shouldn't be allocating the jumpstack while BHs are
>disabled.


The following patch should address this. The allocation is easy to
move since it only operates on newinfo itself.



The following changes since commit 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350:
  Bart De Schuymer (1):
        netfilter: bridge-netfilter: fix refragmenting IP traffic encapsulated in PPPoE traffic

are available in the git repository at:

  git://dev.medozas.de/linux master

Jan Engelhardt (1):
      netfilter: x_tables: move sleeping allocation outside BH-disabled region

 net/netfilter/x_tables.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

parent 6c79bf0f2440fd250c8fce8d9b82fcf03d4e8350 (v2.6.34-rc3-1335-g6c79bf0)
commit 3109d6b7342a5c8bf23697a86be12e733472d820
Author: Jan Engelhardt <jengelh@medozas.de>
Date:   Tue Apr 20 20:11:15 2010 +0200

netfilter: x_tables: move sleeping allocation outside BH-disabled region

The jumpstack allocation needs to be moved out of the critical region.
Corrects this notice:

BUG: sleeping function called from invalid context at mm/slub.c:1705
[  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
[  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
[  428.295776] Call Trace:
[  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
[  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
[  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 net/netfilter/x_tables.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 3ae3234..445de70 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -801,6 +801,12 @@ xt_replace_table(struct xt_table *table,
 	struct xt_table_info *private;
 	int ret;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	/* Do the substitution. */
 	local_bh_disable();
 	private = table->private;
@@ -814,12 +820,6 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
-	ret = xt_jumpstack_alloc(newinfo);
-	if (ret < 0) {
-		*error = ret;
-		return NULL;
-	}
-
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
-- 
# Created with git-export-patch

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-20 18:26           ` Jan Engelhardt
@ 2010-04-21 12:47             ` Patrick McHardy
  0 siblings, 0 replies; 21+ messages in thread
From: Patrick McHardy @ 2010-04-21 12:47 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netfilter-devel

Jan Engelhardt wrote:
> netfilter: x_tables: move sleeping allocation outside BH-disabled region
> 
> The jumpstack allocation needs to be moved out of the critical region.
> Corrects this notice:
> 
> BUG: sleeping function called from invalid context at mm/slub.c:1705
> [  428.295762] in_atomic(): 1, irqs_disabled(): 0, pid: 9111, name: iptables
> [  428.295771] Pid: 9111, comm: iptables Not tainted 2.6.34-rc1 #2
> [  428.295776] Call Trace:
> [  428.295791]  [<c012138e>] __might_sleep+0xe5/0xed
> [  428.295801]  [<c019e8ca>] __kmalloc+0x92/0xfc
> [  428.295825]  [<f865b3bb>] ? xt_jumpstack_alloc+0x36/0xff [x_tables]

Applied, thanks Jan.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant
  2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
@ 2010-04-13 23:21 ` Jan Engelhardt
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Engelhardt @ 2010-04-13 23:21 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel

Currently, the table traverser stores return addresses in the ruleset
itself (struct ip6t_entry->comefrom). This has a well-known drawback:
the jumpstack is overwritten on reentry, making it necessary for
targets to return absolute verdicts. Also, the ruleset (which might
be heavy memory-wise) needs to be replicated for each CPU that can
possibly invoke ip6t_do_table.

This patch decouples the jumpstack from struct ip6t_entry and instead
puts it into xt_table_info. Not being restricted by 'comefrom'
anymore, we can set up a stack as needed. By default, there is room
allocated for two entries into the traverser. The setting is
configurable at runtime through sysfs and will take effect when a
table is replaced by a new one.

arp_tables is not touched though, because there are only one or two
modules and further patches seek to collapse the table traverser
anyhow.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
---
 include/linux/netfilter/x_tables.h |    7 +++
 net/ipv4/netfilter/arp_tables.c    |    6 ++-
 net/ipv4/netfilter/ip_tables.c     |   65 ++++++++++++++++--------------
 net/ipv6/netfilter/ip6_tables.c    |   56 ++++++++++----------------
 net/netfilter/x_tables.c           |   77 ++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 66 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 26ced0c..50c8672 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -401,6 +401,13 @@ struct xt_table_info {
 	unsigned int hook_entry[NF_INET_NUMHOOKS];
 	unsigned int underflow[NF_INET_NUMHOOKS];
 
+	/*
+	 * Number of user chains. Since tables cannot have loops, at most
+	 * @stacksize jumps (number of user chains) can possibly be made.
+	 */
+	unsigned int stacksize;
+	unsigned int *stackptr;
+	void ***jumpstack;
 	/* ipt_entry tables: one per CPU */
 	/* Note : this field MUST be the last one, see XT_TABLE_INFO_SZ */
 	void *entries[1];
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e8e363d..07a6990 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -649,6 +649,9 @@ static int translate_table(struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			break;
 		++i;
+		if (strcmp(arpt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 	duprintf("translate_table: ARPT_ENTRY_ITERATE gives %d\n", ret);
 	if (ret != 0)
@@ -1774,8 +1777,7 @@ struct xt_table *arpt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 18c5b15..70900ec 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -321,8 +321,6 @@ ipt_do_table(struct sk_buff *skb,
 	     const struct net_device *out,
 	     struct xt_table *table)
 {
-#define tb_comefrom ((struct ipt_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	const struct iphdr *ip;
 	bool hotdrop = false;
@@ -330,7 +328,8 @@ ipt_do_table(struct sk_buff *skb,
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ipt_entry *e, *back;
+	struct ipt_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -356,19 +355,23 @@ ipt_do_table(struct sk_buff *skb,
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
+	pr_devel("Entering %s(hook %u); sp at %u (UF %p)\n",
+		 table->name, hook, origptr,
+		 get_entry(table_base, private->underflow[hook]));
 
 	do {
 		const struct ipt_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip_packet_match(ip, indev, outdev,
 		    &e->ip, mtpar.fragoff)) {
  no_match:
@@ -403,17 +406,28 @@ ipt_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0) {
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+					pr_devel("Underflow (this is normal) "
+						 "to %p\n", e);
+				} else {
+					e = jumpstack[--*stackptr];
+					pr_devel("Pulled %p out from pos %u\n",
+						 e, *stackptr);
+					e = ipt_next_entry(e);
+				}
 				continue;
 			}
 			if (table_base + v != ipt_next_entry(e) &&
 			    !(e->ip.flags & IPT_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ipt_entry *next = ipt_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
+				pr_devel("Pushed %p into pos %u\n",
+					 e, *stackptr - 1);
 			}
 
 			e = get_entry(table_base, v);
@@ -426,18 +440,7 @@ ipt_do_table(struct sk_buff *skb,
 		tgpar.targinfo = t->data;
 
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IPT_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		/* Target might have changed stuff. */
 		ip = ip_hdr(skb);
 		if (verdict == IPT_CONTINUE)
@@ -447,7 +450,9 @@ ipt_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 	xt_info_rdunlock_bh();
-
+	pr_devel("Exiting %s; resetting sp from %u to %u\n",
+		 __func__, *stackptr, origptr);
+	*stackptr = origptr;
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
 #else
@@ -455,8 +460,6 @@ ipt_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -838,6 +841,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ipt_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2086,8 +2092,7 @@ struct xt_table *ipt_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index f2b815e..2a2770b 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -351,15 +351,14 @@ ip6t_do_table(struct sk_buff *skb,
 	      const struct net_device *out,
 	      struct xt_table *table)
 {
-#define tb_comefrom ((struct ip6t_entry *)table_base)->comefrom
-
 	static const char nulldevname[IFNAMSIZ] __attribute__((aligned(sizeof(long))));
 	bool hotdrop = false;
 	/* Initializing verdict to NF_DROP keeps gcc happy. */
 	unsigned int verdict = NF_DROP;
 	const char *indev, *outdev;
 	const void *table_base;
-	struct ip6t_entry *e, *back;
+	struct ip6t_entry *e, **jumpstack;
+	unsigned int *stackptr, origptr, cpu;
 	const struct xt_table_info *private;
 	struct xt_match_param mtpar;
 	struct xt_target_param tgpar;
@@ -383,19 +382,19 @@ ip6t_do_table(struct sk_buff *skb,
 
 	xt_info_rdlock_bh();
 	private = table->private;
-	table_base = private->entries[smp_processor_id()];
+	cpu        = smp_processor_id();
+	table_base = private->entries[cpu];
+	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
+	stackptr   = &private->stackptr[cpu];
+	origptr    = *stackptr;
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 
-	/* For return from builtin chain */
-	back = get_entry(table_base, private->underflow[hook]);
-
 	do {
 		const struct ip6t_entry_target *t;
 		const struct xt_entry_match *ematch;
 
 		IP_NF_ASSERT(e);
-		IP_NF_ASSERT(back);
 		if (!ip6_packet_match(skb, indev, outdev, &e->ipv6,
 		    &mtpar.thoff, &mtpar.fragoff, &hotdrop)) {
  no_match:
@@ -432,17 +431,20 @@ ip6t_do_table(struct sk_buff *skb,
 					verdict = (unsigned)(-v) - 1;
 					break;
 				}
-				e = back;
-				back = get_entry(table_base, back->comefrom);
+				if (*stackptr == 0)
+					e = get_entry(table_base,
+					    private->underflow[hook]);
+				else
+					e = ip6t_next_entry(jumpstack[--*stackptr]);
 				continue;
 			}
 			if (table_base + v != ip6t_next_entry(e) &&
 			    !(e->ipv6.flags & IP6T_F_GOTO)) {
-				/* Save old back ptr in next entry */
-				struct ip6t_entry *next = ip6t_next_entry(e);
-				next->comefrom = (void *)back - table_base;
-				/* set back pointer to next entry */
-				back = next;
+				if (*stackptr >= private->stacksize) {
+					verdict = NF_DROP;
+					break;
+				}
+				jumpstack[(*stackptr)++] = e;
 			}
 
 			e = get_entry(table_base, v);
@@ -454,19 +456,7 @@ ip6t_do_table(struct sk_buff *skb,
 		tgpar.target   = t->u.kernel.target;
 		tgpar.targinfo = t->data;
 
-#ifdef CONFIG_NETFILTER_DEBUG
-		tb_comefrom = 0xeeeeeeec;
-#endif
 		verdict = t->u.kernel.target->target(skb, &tgpar);
-
-#ifdef CONFIG_NETFILTER_DEBUG
-		if (tb_comefrom != 0xeeeeeeec && verdict == IP6T_CONTINUE) {
-			printk("Target %s reentered!\n",
-			       t->u.kernel.target->name);
-			verdict = NF_DROP;
-		}
-		tb_comefrom = 0x57acc001;
-#endif
 		if (verdict == IP6T_CONTINUE)
 			e = ip6t_next_entry(e);
 		else
@@ -474,10 +464,8 @@ ip6t_do_table(struct sk_buff *skb,
 			break;
 	} while (!hotdrop);
 
-#ifdef CONFIG_NETFILTER_DEBUG
-	tb_comefrom = NETFILTER_LINK_POISON;
-#endif
 	xt_info_rdunlock_bh();
+	*stackptr = origptr;
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
@@ -486,8 +474,6 @@ ip6t_do_table(struct sk_buff *skb,
 		return NF_DROP;
 	else return verdict;
 #endif
-
-#undef tb_comefrom
 }
 
 /* Figures out from what hook each rule can be called: returns 0 if
@@ -869,6 +855,9 @@ translate_table(struct net *net, struct xt_table_info *newinfo, void *entry0,
 		if (ret != 0)
 			return ret;
 		++i;
+		if (strcmp(ip6t_get_target(iter)->u.user.name,
+		    XT_ERROR_TARGET) == 0)
+			++newinfo->stacksize;
 	}
 
 	if (i != repl->num_entries) {
@@ -2120,8 +2109,7 @@ struct xt_table *ip6t_register_table(struct net *net,
 {
 	int ret;
 	struct xt_table_info *newinfo;
-	struct xt_table_info bootstrap
-		= { 0, 0, 0, { 0 }, { 0 }, { } };
+	struct xt_table_info bootstrap = {0};
 	void *loc_cpu_entry;
 	struct xt_table *new_table;
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8e23d8f..edde5c6 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -62,6 +62,9 @@ static const char *const xt_prefix[NFPROTO_NUMPROTO] = {
 	[NFPROTO_IPV6]   = "ip6",
 };
 
+/* Allow this many total (re)entries. */
+static const unsigned int xt_jumpstack_multiplier = 2;
+
 /* Registration hooks for targets. */
 int
 xt_register_target(struct xt_target *target)
@@ -680,6 +683,26 @@ void xt_free_table_info(struct xt_table_info *info)
 		else
 			vfree(info->entries[cpu]);
 	}
+
+	if (info->jumpstack != NULL) {
+		if (sizeof(void *) * info->stacksize > PAGE_SIZE) {
+			for_each_possible_cpu(cpu)
+				vfree(info->jumpstack[cpu]);
+		} else {
+			for_each_possible_cpu(cpu)
+				kfree(info->jumpstack[cpu]);
+		}
+	}
+
+	if (sizeof(void **) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->jumpstack);
+	else
+		kfree(info->jumpstack);
+	if (sizeof(unsigned int) * nr_cpu_ids > PAGE_SIZE)
+		vfree(info->stackptr);
+	else
+		kfree(info->stackptr);
+
 	kfree(info);
 }
 EXPORT_SYMBOL(xt_free_table_info);
@@ -724,6 +747,49 @@ EXPORT_SYMBOL_GPL(xt_compat_unlock);
 DEFINE_PER_CPU(struct xt_info_lock, xt_info_locks);
 EXPORT_PER_CPU_SYMBOL_GPL(xt_info_locks);
 
+static int xt_jumpstack_alloc(struct xt_table_info *i)
+{
+	unsigned int size;
+	int cpu;
+
+	size = sizeof(unsigned int) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->stackptr = vmalloc(size);
+	else
+		i->stackptr = kmalloc(size, GFP_KERNEL);
+	if (i->stackptr == NULL)
+		return -ENOMEM;
+	memset(i->stackptr, 0, size);
+
+	size = sizeof(void **) * nr_cpu_ids;
+	if (size > PAGE_SIZE)
+		i->jumpstack = vmalloc(size);
+	else
+		i->jumpstack = kmalloc(size, GFP_KERNEL);
+	if (i->jumpstack == NULL)
+		return -ENOMEM;
+	memset(i->jumpstack, 0, size);
+
+	i->stacksize *= xt_jumpstack_multiplier;
+	size = sizeof(void *) * i->stacksize;
+	for_each_possible_cpu(cpu) {
+		if (size > PAGE_SIZE)
+			i->jumpstack[cpu] = vmalloc_node(size,
+				cpu_to_node(cpu));
+		else
+			i->jumpstack[cpu] = kmalloc_node(size,
+				GFP_KERNEL, cpu_to_node(cpu));
+		if (i->jumpstack[cpu] == NULL)
+			/*
+			 * Freeing will be done later on by the callers. The
+			 * chain is: xt_replace_table -> __do_replace ->
+			 * do_replace -> xt_free_table_info.
+			 */
+			return -ENOMEM;
+	}
+
+	return 0;
+}
 
 struct xt_table_info *
 xt_replace_table(struct xt_table *table,
@@ -732,6 +798,7 @@ xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	int ret;
 
 	/* Do the substitution. */
 	local_bh_disable();
@@ -746,6 +813,12 @@ xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0) {
+		*error = ret;
+		return NULL;
+	}
+
 	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
 
@@ -770,6 +843,10 @@ struct xt_table *xt_register_table(struct net *net,
 	struct xt_table_info *private;
 	struct xt_table *t, *table;
 
+	ret = xt_jumpstack_alloc(newinfo);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
 	/* Don't add one object to multiple lists. */
 	table = kmemdup(input_table, sizeof(struct xt_table), GFP_KERNEL);
 	if (!table) {
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-04-21 12:47 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-15 23:25 nf-next: TEE 20100215 Jan Engelhardt
2010-04-15 23:25 ` [PATCH 1/4] netfilter: xtables: inclusion of xt_TEE Jan Engelhardt
2010-04-19 12:20   ` Patrick McHardy
2010-04-19 12:36     ` Jan Engelhardt
2010-04-19 12:42       ` Patrick McHardy
2010-04-15 23:25 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
2010-04-19 12:22   ` Patrick McHardy
2010-04-19 12:54     ` Jan Engelhardt
2010-04-19 14:06       ` Patrick McHardy
2010-04-20 13:18         ` Patrick McHardy
2010-04-20 13:21           ` Patrick McHardy
2010-04-20 18:26           ` Jan Engelhardt
2010-04-21 12:47             ` Patrick McHardy
2010-04-15 23:25 ` [PATCH 3/4] netfilter: xt_TEE: have cloned packet travel through Xtables too Jan Engelhardt
2010-04-19 14:07   ` Patrick McHardy
2010-04-15 23:25 ` [PATCH 4/4] netfilter: xtables: remove old comments about reentrancy Jan Engelhardt
2010-04-19 14:08   ` Patrick McHardy
2010-04-20 12:24     ` Patrick McHardy
2010-04-20 12:29       ` Jan Engelhardt
2010-04-20 12:36         ` Patrick McHardy
  -- strict thread matches above, loose matches on Subject: below --
2010-04-13 23:21 nf-next: TEE only Jan Engelhardt
2010-04-13 23:21 ` [PATCH 2/4] netfilter: xtables2: make ip_tables reentrant Jan Engelhardt
