All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default
@ 2014-09-16 15:47 Florian Westphal
  2014-09-16 15:47 ` [PATCH 2/2 nf-next] net: bridge: deprecate call_iptables=1 default Florian Westphal
  2014-09-17 12:15 ` [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Pablo Neira Ayuso
  0 siblings, 2 replies; 4+ messages in thread
From: Florian Westphal @ 2014-09-16 15:47 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

Jesper reports that kernels with CONFIG_BRIDGE_NETFILTER=n show significantly
better performance vs. CONFIG_BRIDGE_NETFILTER=y, even with
bridge-nf-call-iptables=0.

This is because bridge registers some bridge netfilter hooks at
module load time, so the static key to bypass nf rule evaluation
via NF_HOOK() is false.

The hooks serve no purpose, unless iptables filtering for bridges is
desired (i.e., bridge-nf-call-*=1 and active iptables rules present).

The proper solution would be to just change the bridge-nf-call-iptables sysctl
default value to 0 and then register the hooks when user enables call-iptables
sysctl.  We cannot do that though since it breaks existing setups.

The next best solution is to delay registering of the hooks until
we know that

a) call-iptables sysctl is enabled (this is the default)
AND
b) ip(6)tables rules are loaded.

This adds br_nf_check_call_iptables() helper in bridge input before
bridge 'prerouting' (sic) hooks to perform this check.

IOW, if user does not turn off call-iptables sysctl on the bridge, hook
registering is only done if NFPROTO_IPV4/IPV6 hooks are registered as
well once the first packet arrives on a bridge port.

Doing this check for every packet is still faster than registering
the hooks unconditionally.  To not add overhead for setups where
the call-iptables hooks are required, a static key shortcut is provided.

As its not possible to register hooks from bh context (grabs mutex)
its scheduled via workqueue.

Note that, to not make this overly complicated, the hooks are not
unregistered again when the sysctl is disabled later on.

If user toggles call-iptables sysctl to 0 before adding any ports
to the bridge, those hooks are not registered even if iptables rules
are loaded.

Daniel reports following results for super_netperf/200/TCP_RR:
CONFIG_BRIDGE_NETFILTER=n: ~988k TPS
CONFIG_BRIDGE_NETFILTER=y: ~971k TPS
CONFIG_BRIDGE_NETFILTER=y /w patch: ~988k TPS

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_input.c     | 41 +++++++++++++++++++++++++++++
 net/bridge/br_netfilter.c | 67 ++++++++++++++++++++++++++++++++++++++---------
 net/bridge/br_private.h   | 11 ++++++++
 3 files changed, 106 insertions(+), 13 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 366c436..bd770b7 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -18,6 +18,11 @@
 #include <linux/netfilter_bridge.h>
 #include <linux/export.h>
 #include <linux/rculist.h>
+
+#ifdef CONFIG_BRIDGE_NETFILTER
+#include <linux/jump_label.h>
+#endif
+
 #include "br_private.h"
 
 /* Hook for brouter */
@@ -153,6 +158,39 @@ static int br_handle_local_finish(struct sk_buff *skb)
 	return 0;	 /* process further */
 }
 
+#ifdef CONFIG_BRIDGE_NETFILTER
+extern struct static_key brnf_hooks_active;
+
+static inline int br_nf_check_call_iptables(const struct net_bridge *br)
+{
+	bool ip, ip6, arp;
+	int i;
+
+	if (static_key_true(&brnf_hooks_active))
+		return 0;
+
+	ip  = brnf_call_iptables  || br->nf_call_iptables;
+	ip6 = brnf_call_ip6tables || br->nf_call_ip6tables;
+	arp = brnf_call_arptables || br->nf_call_arptables;
+
+	for (i=0; i < NF_MAX_HOOKS; i++) {
+		if (nf_hooks_active(NFPROTO_IPV4, i) && ip)
+			break;
+		if (nf_hooks_active(NFPROTO_IPV6, i) && ip6)
+			break;
+		if (nf_hooks_active(NFPROTO_ARP, i) && arp)
+			break;
+	}
+
+	if (i < NF_MAX_HOOKS)
+		return br_netfiler_hooks_init();
+
+	return 0;
+}
+#else
+static inline int br_nf_check_call_iptables(const struct net_bridge *br){}
+#endif
+
 /*
  * Return NULL if skb is handled
  * note: already called with rcu_read_lock
@@ -176,6 +214,9 @@ rx_handler_result_t br_handle_frame(struct sk_buff **pskb)
 
 	p = br_port_get_rcu(skb->dev);
 
+	if (br_nf_check_call_iptables(p->br))
+		goto drop;
+
 	if (unlikely(is_link_local_ether_addr(dest))) {
 		u16 fwd_mask = p->br->group_fwd_mask_required;
 
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index a615264..c6715a2 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -31,6 +31,8 @@
 #include <linux/netfilter_arp.h>
 #include <linux/in_route.h>
 #include <linux/inetdevice.h>
+#include <linux/workqueue.h>
+#include <linux/jump_label.h>
 
 #include <net/ip.h>
 #include <net/ipv6.h>
@@ -47,18 +49,19 @@
 #define store_orig_dstaddr(skb)	 (skb_origaddr(skb) = ip_hdr(skb)->daddr)
 #define dnat_took_place(skb)	 (skb_origaddr(skb) != ip_hdr(skb)->daddr)
 
+struct static_key brnf_hooks_active __read_mostly;
+static DEFINE_MUTEX(brnf_hook_mutex);
+
 #ifdef CONFIG_SYSCTL
 static struct ctl_table_header *brnf_sysctl_header;
-static int brnf_call_iptables __read_mostly = 1;
-static int brnf_call_ip6tables __read_mostly = 1;
-static int brnf_call_arptables __read_mostly = 1;
 static int brnf_filter_vlan_tagged __read_mostly = 0;
 static int brnf_filter_pppoe_tagged __read_mostly = 0;
 static int brnf_pass_vlan_indev __read_mostly = 0;
+
+int brnf_call_iptables __read_mostly = 1;
+int brnf_call_ip6tables __read_mostly = 1;
+int brnf_call_arptables __read_mostly = 1;
 #else
-#define brnf_call_iptables 1
-#define brnf_call_ip6tables 1
-#define brnf_call_arptables 1
 #define brnf_filter_vlan_tagged 0
 #define brnf_filter_pppoe_tagged 0
 #define brnf_pass_vlan_indev 0
@@ -1059,6 +1062,45 @@ static struct ctl_table brnf_table[] = {
 };
 #endif
 
+static void __br_register_hooks_init(struct work_struct *w)
+{
+	int ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
+	if (ret == 0)
+		static_key_slow_inc(&brnf_hooks_active);
+
+	mutex_unlock(&brnf_hook_mutex);
+	kfree(w);
+
+	WARN_ONCE(ret, "could not register bridge netfilter hook (error %d)",
+									ret);
+}
+
+int br_netfiler_hooks_init(void)
+{
+	struct work_struct *w;
+
+	/* called from bh context, cannot sleep */
+	if (!mutex_trylock(&brnf_hook_mutex))
+		return -EBUSY;
+
+	if (static_key_true(&brnf_hooks_active)) {
+		mutex_unlock(&brnf_hook_mutex);
+		return 0;
+	}
+
+	w = kzalloc(sizeof(*w), GFP_ATOMIC);
+	if (!w) {
+		mutex_unlock(&brnf_hook_mutex);
+		return -ENOMEM;
+	}
+
+	INIT_WORK(w, __br_register_hooks_init);
+	schedule_work(w);
+	/* worker will unlock mutex */
+
+	return 0;
+}
+
 int __init br_netfilter_init(void)
 {
 	int ret;
@@ -1067,17 +1109,11 @@ int __init br_netfilter_init(void)
 	if (ret < 0)
 		return ret;
 
-	ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
-	if (ret < 0) {
-		dst_entries_destroy(&fake_dst_ops);
-		return ret;
-	}
 #ifdef CONFIG_SYSCTL
 	brnf_sysctl_header = register_net_sysctl(&init_net, "net/bridge", brnf_table);
 	if (brnf_sysctl_header == NULL) {
 		printk(KERN_WARNING
 		       "br_netfilter: can't register to sysctl.\n");
-		nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
 		dst_entries_destroy(&fake_dst_ops);
 		return -ENOMEM;
 	}
@@ -1088,7 +1124,12 @@ int __init br_netfilter_init(void)
 
 void br_netfilter_fini(void)
 {
-	nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
+	/* sync with possibly running __br_register_hooks_init() */
+	mutex_lock(&brnf_hook_mutex);
+
+	if (static_key_enabled(&brnf_hooks_active))
+		nf_unregister_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
+	mutex_unlock(&brnf_hook_mutex);
 #ifdef CONFIG_SYSCTL
 	unregister_net_sysctl_table(brnf_sysctl_header);
 #endif
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 62a7fa2..e3b80ce 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -755,6 +755,17 @@ static inline int br_vlan_enabled(struct net_bridge *br)
 int br_netfilter_init(void);
 void br_netfilter_fini(void);
 void br_netfilter_rtable_init(struct net_bridge *);
+int br_netfiler_hooks_init(void);
+
+#ifdef CONFIG_SYSCTL
+extern int brnf_call_iptables;
+extern int brnf_call_ip6tables;
+extern int brnf_call_arptables;
+#else
+#define brnf_call_iptables 1
+#define brnf_call_ip6tables 1
+#define brnf_call_arptables 1
+#endif
 #else
 #define br_netfilter_init()	(0)
 #define br_netfilter_fini()	do { } while (0)
-- 
1.8.1.5


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2 nf-next] net: bridge: deprecate call_iptables=1 default
  2014-09-16 15:47 [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Florian Westphal
@ 2014-09-16 15:47 ` Florian Westphal
  2014-09-17 12:15 ` [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Pablo Neira Ayuso
  1 sibling, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2014-09-16 15:47 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netdev, Florian Westphal

add a warning and tell users to set call_iptables to 0 or 1
depending on desired behaviour before adding ports to the bridge.

Might allow changing default to 0 in the future.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/bridge/br_netfilter.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index c6715a2..0b19a81 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -58,9 +58,9 @@ static int brnf_filter_vlan_tagged __read_mostly = 0;
 static int brnf_filter_pppoe_tagged __read_mostly = 0;
 static int brnf_pass_vlan_indev __read_mostly = 0;
 
-int brnf_call_iptables __read_mostly = 1;
-int brnf_call_ip6tables __read_mostly = 1;
-int brnf_call_arptables __read_mostly = 1;
+int brnf_call_iptables __read_mostly = 2;
+int brnf_call_ip6tables __read_mostly = 2;
+int brnf_call_arptables __read_mostly = 2;
 #else
 #define brnf_filter_vlan_tagged 0
 #define brnf_filter_pppoe_tagged 0
@@ -1065,9 +1065,16 @@ static struct ctl_table brnf_table[] = {
 static void __br_register_hooks_init(struct work_struct *w)
 {
 	int ret = nf_register_hooks(br_nf_ops, ARRAY_SIZE(br_nf_ops));
-	if (ret == 0)
+	if (ret == 0) {
 		static_key_slow_inc(&brnf_hooks_active);
 
+		if ((brnf_call_iptables|brnf_call_ip6tables|brnf_call_arptables) == 2)
+			pr_info("bridge: automatic filtering via arp/ip/ip6tables "
+				"is deprecated and it will "
+				"be disabled soon. "
+				"Set bridge-nf-call-{ip,ip6,arp}tables sysctls to 1 or 0 "
+				"before adding bridge ports instead.\n");
+	}
 	mutex_unlock(&brnf_hook_mutex);
 	kfree(w);
 
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default
  2014-09-16 15:47 [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Florian Westphal
  2014-09-16 15:47 ` [PATCH 2/2 nf-next] net: bridge: deprecate call_iptables=1 default Florian Westphal
@ 2014-09-17 12:15 ` Pablo Neira Ayuso
  2014-09-17 12:42   ` Florian Westphal
  1 sibling, 1 reply; 4+ messages in thread
From: Pablo Neira Ayuso @ 2014-09-17 12:15 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, netdev

Hi Florian,

On Tue, Sep 16, 2014 at 05:47:36PM +0200, Florian Westphal wrote:
> Jesper reports that kernels with CONFIG_BRIDGE_NETFILTER=n show significantly
> better performance vs. CONFIG_BRIDGE_NETFILTER=y, even with
> bridge-nf-call-iptables=0.
> 
> This is because bridge registers some bridge netfilter hooks at
> module load time, so the static key to bypass nf rule evaluation
> via NF_HOOK() is false.
> 
> The hooks serve no purpose, unless iptables filtering for bridges is
> desired (i.e., bridge-nf-call-*=1 and active iptables rules present).
> 
> The proper solution would be to just change the bridge-nf-call-iptables sysctl
> default value to 0 and then register the hooks when user enables call-iptables
> sysctl.  We cannot do that though since it breaks existing setups.
>
> The next best solution is to delay registering of the hooks until
> we know that
> 
> a) call-iptables sysctl is enabled (this is the default)
> AND
> b) ip(6)tables rules are loaded.
> 
> This adds br_nf_check_call_iptables() helper in bridge input before
> bridge 'prerouting' (sic) hooks to perform this check.
> 
> IOW, if user does not turn off call-iptables sysctl on the bridge, hook
> registering is only done if NFPROTO_IPV4/IPV6 hooks are registered as
> well once the first packet arrives on a bridge port.
> 
> Doing this check for every packet is still faster than registering
> the hooks unconditionally.  To not add overhead for setups where
> the call-iptables hooks are required, a static key shortcut is provided.
> 
> As its not possible to register hooks from bh context (grabs mutex)
> its scheduled via workqueue.

My main concern with this approach is that we may let packets go
through unfiltered for some little time until the worker thread has
the chance to register the hooks.

Alternatives that I see for these are:

* pr_info to indicate the br_netfilter enable by default is
  deprecated. Similar to your small patch 2/2, but it will take quite
  some time until we can finally change this to zero.

* I think we can unregister the hooks when we notice that all
  bridge-nf-call-*tables are zero from the sysctl. We register them if
  any of them becomes 1 again.

Let me know, thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default
  2014-09-17 12:15 ` [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Pablo Neira Ayuso
@ 2014-09-17 12:42   ` Florian Westphal
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Westphal @ 2014-09-17 12:42 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, netdev

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > As its not possible to register hooks from bh context (grabs mutex)
> > its scheduled via workqueue.
> 
> My main concern with this approach is that we may let packets go
> through unfiltered for some little time until the worker thread has
> the chance to register the hooks.

They are supposed to be dropped until hook is there.
However, Nikolay Alexandrov just (correctly...) pointed out to me via irc that this mutex
abuse is against the rules (lock/unlock in different threads!).

So, please toss this series.

> Alternatives that I see for these are:
> * pr_info to indicate the br_netfilter enable by default is
>   deprecated. Similar to your small patch 2/2, but it will take quite
>   some time until we can finally change this to zero.

Right.  I had planned to send a revert for patch 1 at some point,
plus a change to default to 0.

> * I think we can unregister the hooks when we notice that all
>   bridge-nf-call-*tables are zero from the sysctl. We register them if
>   any of them becomes 1 again.

Thats what I thought, but we now also have per-bridge sysfs knobs,
i.e. sysctl could be 0 and someone enabling br42->call_iptables ....

Its doable, but it gets ugly.
I don't think its worth the pain to care about "sometimes
enabled/disabled".

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-09-17 12:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-16 15:47 [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Florian Westphal
2014-09-16 15:47 ` [PATCH 2/2 nf-next] net: bridge: deprecate call_iptables=1 default Florian Westphal
2014-09-17 12:15 ` [PATCH 1/2 nf-next] net: bridge: don't register netfilter call_iptables hooks by default Pablo Neira Ayuso
2014-09-17 12:42   ` Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.