All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
       [not found] <20140430092905.GA4318@localhost>
@ 2014-05-02 15:40 ` Vasily Averin
  2014-05-02 22:55   ` Florian Westphal
  0 siblings, 1 reply; 20+ messages in thread
From: Vasily Averin @ 2014-05-02 15:40 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso
  Cc: netfilter-devel, Stephen Hemminger, Patrick McHardy

Dear Pablo, Florian,

could you please take look at patch below?
I've added per network namespace flag for manage of ipv4 defragmentation
in bridge. In OpenVZ we can have enabled conntracks in some container and
disabled in another ones. Seems this functionality was not merged into
mainline yet, however I hope it will be done sooner or later.

I'm not sure about name of flag variable and about its location.
I believe you can have better ideas about this.

Also I have one more question -- about defrag user check.

In my patch I use
if ((user >= IP_DEFRAG_CONNTRACK_BRIDGE_IN) && 
    (user <= __IP_DEFRAG_CONNTRACK_BRIDGE_IN))
because nf_ct_defrag_user can add zone.

I've found defrag user check in ip_expire() -- but it does not take
account of zone.
Is it a bug in ip_expire() or I missed something?

---[patch rfc]---
Currently bridge can silently drop ipv4 fragments.
If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
br_nf_pre_routing defragments incoming ipv4 fragments, but skb->nfct check
in br_nf_dev_queue_xmit does not allow to re-fragment combined packet back,
and therefore it is dropped in br_dev_queue_push_xmit without incrementing
of any failcounters.

According to Patrick McHardy, bridge should not defragment and fragment
packets unless conntrack is enabled.

This patch adds per network namespace flag to manage ipv4 defragmentation
in bridge.

Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 include/net/netns/conntrack.h                  |    1 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    2 +
 net/ipv4/netfilter/nf_defrag_ipv4.c            |   39 ++++++++++++++++++++++-
 3 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 773cce3..7589937 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -25,6 +25,7 @@ struct nf_proto_net {
 struct nf_generic_net {
 	struct nf_proto_net pn;
 	unsigned int timeout;
+	bool br_ipv4_defrag_disabled;
 };
 
 struct nf_tcp_net {
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index d807822..5f773d4 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -87,6 +87,20 @@ static unsigned int ipv4_conntrack_defrag(const struct nf_hook_ops *ops,
 		enum ip_defrag_users user =
 			nf_ct_defrag_user(ops->hooknum, skb);
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK) && defined (CONFIG_BRIDGE_NETFILTER)
+		if ((user >= IP_DEFRAG_CONNTRACK_BRIDGE_IN) &&
+		    (user <= __IP_DEFRAG_CONNTRACK_BRIDGE_IN)) {
+#ifdef CONFIG_NET_NS
+			struct net *net = skb->sk->sk_net;
+#else
+			struct net *net = &init_net;
+#endif
+			/* A bridge should not defragment and fragment packets. 
+			   We only do it if connection tracking is enabled. */
+			if (net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled)
+				return NF_ACCEPT;
+		}
+#endif
 		if (nf_ct_ipv4_gather_frags(skb, user))
 			return NF_STOLEN;
 	}
@@ -110,14 +124,37 @@ static struct nf_hook_ops ipv4_defrag_ops[] = {
 	},
 };
 
+static int nf_defrag_ipv4_net_init(struct net *net)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled = true;
+#endif
+	return 0;
+}
+
+static struct pernet_operations nf_defrag_ipv4_net_ops = {
+	.init = nf_defrag_ipv4_net_init,
+};
+
 static int __init nf_defrag_init(void)
 {
-	return nf_register_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	int ret = 0;
+
+	ret = register_pernet_subsys(&nf_defrag_ipv4_net_ops);
+	if (ret)
+		goto out;
+
+	ret = nf_register_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	if (ret)
+		unregister_pernet_subsys(&nf_defrag_ipv4_net_ops);
+out:
+	return ret;
 }
 
 static void __exit nf_defrag_fini(void)
 {
 	nf_unregister_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	unregister_pernet_subsys(&nf_defrag_ipv4_net_ops);
 }
 
 void nf_defrag_ipv4_enable(void)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-02 15:40 ` [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled Vasily Averin
@ 2014-05-02 22:55   ` Florian Westphal
  2014-05-03  7:15     ` Vasily Averin
                       ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Florian Westphal @ 2014-05-02 22:55 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel,
	Stephen Hemminger, Patrick McHardy

Vasily Averin <vvs@parallels.com> wrote:
> Also I have one more question -- about defrag user check.
> 
> In my patch I use
> if ((user >= IP_DEFRAG_CONNTRACK_BRIDGE_IN) && 
>     (user <= __IP_DEFRAG_CONNTRACK_BRIDGE_IN))
> because nf_ct_defrag_user can add zone.
> 
> I've found defrag user check in ip_expire() -- but it does not take
> account of zone.
> Is it a bug in ip_expire() or I missed something?

Looks like a bug to me.

> ---[patch rfc]---
> Currently bridge can silently drop ipv4 fragments.
> If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
> br_nf_pre_routing defragments incoming ipv4 fragments, but skb->nfct check
> in br_nf_dev_queue_xmit does not allow to re-fragment combined packet back,
> and therefore it is dropped in br_dev_queue_push_xmit without incrementing
> of any failcounters.
> 
> According to Patrick McHardy, bridge should not defragment and fragment
> packets unless conntrack is enabled.
> 
> This patch adds per network namespace flag to manage ipv4 defragmentation
> in bridge.
> 
> Signed-off-by: Vasily Averin <vvs@openvz.org>

Are we sure this is required rather than just removing the skb->nfct
test in br_nf_dev_queue_xmit() and be done with it?

Because that seems a lot saner to me, I fail to see how

if (skb->protocol == htons(ETH_P_IP) &&
           skb->len + nf_bridge_mtu_reduction(skb) >
		    skb->dev->mtu && !skb_is_gso(skb)) {

Would evaluate as 'true' without nf_defrag_ipv4 module loaded.

[ its from br_nf_dev_queue_xmit function ]

>  struct nf_generic_net {
>  	struct nf_proto_net pn;
>  	unsigned int timeout;
> +	bool br_ipv4_defrag_disabled;
>  };

>  struct nf_tcp_net {
> diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
> index d807822..5f773d4 100644
> --- a/net/ipv4/netfilter/nf_defrag_ipv4.c
> +++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
> @@ -87,6 +87,20 @@ static unsigned int ipv4_conntrack_defrag(const struct nf_hook_ops *ops,
>  		enum ip_defrag_users user =
>  			nf_ct_defrag_user(ops->hooknum, skb);
>  
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK) && defined (CONFIG_BRIDGE_NETFILTER)
> +		if ((user >= IP_DEFRAG_CONNTRACK_BRIDGE_IN) &&
> +		    (user <= __IP_DEFRAG_CONNTRACK_BRIDGE_IN)) {
> +#ifdef CONFIG_NET_NS
> +			struct net *net = skb->sk->sk_net;
> +#else
> +			struct net *net = &init_net;
> +#endif
> +			/* A bridge should not defragment and fragment packets. 
> +			   We only do it if connection tracking is enabled. */
> +			if (net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled)
> +				return NF_ACCEPT;
> +		}
> +#endif

I wonder who would be responsible to set br_ipv4_defrag_disabled to false
to enable conntracking on a bridge again?

Did i miss something in the patch?

Thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-02 22:55   ` Florian Westphal
@ 2014-05-03  7:15     ` Vasily Averin
  2014-05-03  7:18     ` [PATCH RFC v2] " Vasily Averin
  2014-05-03 23:39     ` [PATCH RFC] " Pablo Neira Ayuso
  2 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-03  7:15 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, netfilter-devel, Stephen Hemminger, Patrick McHardy

On 05/03/2014 02:55 AM, Florian Westphal wrote:
> Vasily Averin <vvs@parallels.com> wrote:
>> I've found defrag user check in ip_expire() -- but it does not take
>> account of zone.
>> Is it a bug in ip_expire() or I missed something?
> 
> Looks like a bug to me.

Thanks, I've submitted separated patch to fix it.

>> ---[patch rfc]---
>> This patch adds per network namespace flag to manage ipv4 defragmentation
>> in bridge.
> 
> Are we sure this is required rather than just removing the skb->nfct
> test in br_nf_dev_queue_xmit() and be done with it?
> Because that seems a lot saner to me, I fail to see how
> 
> if (skb->protocol == htons(ETH_P_IP) &&
>            skb->len + nf_bridge_mtu_reduction(skb) >
> 		    skb->dev->mtu && !skb_is_gso(skb)) {
> 
> Would evaluate as 'true' without nf_defrag_ipv4 module loaded.
> 
> [ its from br_nf_dev_queue_xmit function ]

I think you are right, seems skb->nfct check will be extra anyway.

However my patch fixes wrong processing packets in bridge in case disabled conntracks.
Probably Patrick can elaborate in more details why this is bad.

I have noticed only that currently bridge without conntracks can silently merge 2 small
packets if their common size < mtu. Probably it can be unexpected for processing on destination side.
However if we enable connection tracking -- we'll get the same situation again.

> I wonder who would be responsible to set br_ipv4_defrag_disabled to false
> to enable conntracking on a bridge again?
> 
> Did i miss something in the patch?

I'm sorry, you are right, changes of nf_conntrack_ipv4 pernet_operations was lost 
I'll resend resend updated patch soon.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC v2] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-02 22:55   ` Florian Westphal
  2014-05-03  7:15     ` Vasily Averin
@ 2014-05-03  7:18     ` Vasily Averin
  2014-05-03 23:39     ` [PATCH RFC] " Pablo Neira Ayuso
  2 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-03  7:18 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netfilter-devel, Stephen Hemminger, Patrick McHardy, Pablo Neira Ayuso

Currently bridge can silently drop ipv4 fragments.
If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
br_nf_pre_routing defragments incoming ipv4 fragments, but skb->nfct check
in br_nf_dev_queue_xmit does not allow to re-fragment combined packet back,
and therefore it is dropped in br_dev_queue_push_xmit without incrementing
of any failcounters.

According to Patrick McHardy, bridge should not defragment and fragment
packets unless conntrack is enabled.

This patch adds per network namespace flag to manage ipv4 defragmentation
in bridge.

v2: added missed hooks: nf_conntrack pernet_operations hooks changes flag state

Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 include/net/netns/conntrack.h                  |    1 +
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    2 +
 net/ipv4/netfilter/nf_defrag_ipv4.c            |   39 +++++++++++++++++++++++-
 3 files changed, 41 insertions(+), 1 deletions(-)

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 773cce3..7589937 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -25,6 +25,7 @@ struct nf_proto_net {
 struct nf_generic_net {
 	struct nf_proto_net pn;
 	unsigned int timeout;
+	bool br_ipv4_defrag_disabled;
 };
 
 struct nf_tcp_net {
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 8127dc8..9ca9333 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -432,6 +432,7 @@ static int ipv4_net_init(struct net *net)
 		pr_err("nf_conntrack_ipv4: pernet registration failed\n");
 		goto out_ipv4;
 	}
+	net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled = false;
 	return 0;
 out_ipv4:
 	nf_ct_l4proto_pernet_unregister(net, &nf_conntrack_l4proto_icmp);
@@ -445,6 +446,7 @@ out_tcp:
 
 static void ipv4_net_exit(struct net *net)
 {
+	net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled = true;
 	nf_ct_l3proto_pernet_unregister(net, &nf_conntrack_l3proto_ipv4);
 	nf_ct_l4proto_pernet_unregister(net, &nf_conntrack_l4proto_icmp);
 	nf_ct_l4proto_pernet_unregister(net, &nf_conntrack_l4proto_udp4);
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index 12e13bd..0bd499b 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -86,6 +86,20 @@ static unsigned int ipv4_conntrack_defrag(const struct nf_hook_ops *ops,
 		enum ip_defrag_users user =
 			nf_ct_defrag_user(ops->hooknum, skb);
 
+#if IS_ENABLED(CONFIG_NF_CONNTRACK) && defined (CONFIG_BRIDGE_NETFILTER)
+		if ((user >= IP_DEFRAG_CONNTRACK_BRIDGE_IN) &&
+		    (user <= __IP_DEFRAG_CONNTRACK_BRIDGE_IN)) {
+#ifdef CONFIG_NET_NS
+			struct net *net = skb->sk->sk_net;
+#else
+			struct net *net = &init_net;
+#endif
+			/* A bridge should not defragment and fragment packets. 
+			   We only do it if connection tracking is enabled. */
+			if (net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled)
+				return NF_ACCEPT;
+		}
+#endif
 		if (nf_ct_ipv4_gather_frags(skb, user))
 			return NF_STOLEN;
 	}
@@ -109,14 +123,37 @@ static struct nf_hook_ops ipv4_defrag_ops[] = {
 	},
 };
 
+static int nf_defrag_ipv4_net_init(struct net *net)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+	net->ct.nf_ct_proto.generic.br_ipv4_defrag_disabled = true;
+#endif
+	return 0;
+}
+
+static struct pernet_operations nf_defrag_ipv4_net_ops = {
+	.init = nf_defrag_ipv4_net_init,
+};
+
 static int __init nf_defrag_init(void)
 {
-	return nf_register_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	int ret = 0;
+
+	ret = register_pernet_subsys(&nf_defrag_ipv4_net_ops);
+	if (ret)
+		goto out;
+
+	ret = nf_register_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	if (ret)
+		unregister_pernet_subsys(&nf_defrag_ipv4_net_ops);
+out:
+	return ret;
 }
 
 static void __exit nf_defrag_fini(void)
 {
 	nf_unregister_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	unregister_pernet_subsys(&nf_defrag_ipv4_net_ops);
 }
 
 void nf_defrag_ipv4_enable(void)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-02 22:55   ` Florian Westphal
  2014-05-03  7:15     ` Vasily Averin
  2014-05-03  7:18     ` [PATCH RFC v2] " Vasily Averin
@ 2014-05-03 23:39     ` Pablo Neira Ayuso
  2014-05-04  0:23       ` Florian Westphal
                         ` (3 more replies)
  2 siblings, 4 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2014-05-03 23:39 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Vasily Averin, netfilter-devel, Stephen Hemminger, Patrick McHardy

On Sat, May 03, 2014 at 12:55:22AM +0200, Florian Westphal wrote:
[...]
> > ---[patch rfc]---
> > Currently bridge can silently drop ipv4 fragments.
> > If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
> > br_nf_pre_routing defragments incoming ipv4 fragments, but skb->nfct check
> > in br_nf_dev_queue_xmit does not allow to re-fragment combined packet back,
> > and therefore it is dropped in br_dev_queue_push_xmit without incrementing
> > of any failcounters.
> > 
> > According to Patrick McHardy, bridge should not defragment and fragment
> > packets unless conntrack is enabled.
> > 
> > This patch adds per network namespace flag to manage ipv4 defragmentation
> > in bridge.
> > 
> > Signed-off-by: Vasily Averin <vvs@openvz.org>
> 
> Are we sure this is required rather than just removing the skb->nfct
> test in br_nf_dev_queue_xmit() and be done with it?
> 
> Because that seems a lot saner to me, I fail to see how
> 
> if (skb->protocol == htons(ETH_P_IP) &&
>            skb->len + nf_bridge_mtu_reduction(skb) >
> 		    skb->dev->mtu && !skb_is_gso(skb)) {
> 
> Would evaluate as 'true' without nf_defrag_ipv4 module loaded.
> 
> [ its from br_nf_dev_queue_xmit function ]

I think we still may see IP packets larger than the mtu in that path.
It would be a rare case since we need that the bridge has different
(smaller) mtu than the sender, but still possible. The is_skb_forwardable()
check in the current tree snapshot comes just a bit later, so if we
remove that skb->nfct, the bridge will fragment large packets.

In general, I believe bridges should silently drop packets that are
larger than the mtu and they should perform no fragmentation handling,
no gathering and no [re]fragmentation. They are transparent devices
that operate at layer 2.

The conntrack case is a special case that forces us to enable
fragmentation handling since we get sort of a bridge that inspects
layer 3 and 4 packet information. So we have sort of, let's call it, a
mutant bridge.

We also have the tproxy target and the socket match, they seem to
require defragmentation as well, I'm afraid the skb->nfct check will
not help for those cases. I think that we need some counter to know
how many clients we have that require the gathering + fragmentation
code, so if we have at least one, we have to enable it.

Perhaps we can also display a message to inform the user that
netfilter fragmentation handling is enabled.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-03 23:39     ` [PATCH RFC] " Pablo Neira Ayuso
@ 2014-05-04  0:23       ` Florian Westphal
  2014-05-04 11:15         ` Pablo Neira Ayuso
  2014-05-04 20:06       ` Bart De Schuymer
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2014-05-04  0:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Vasily Averin, netfilter-devel,
	Stephen Hemminger, Patrick McHardy

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > ---[patch rfc]---
> > > Currently bridge can silently drop ipv4 fragments.
> > > If node have loaded nf_defrag_ipv4 module but have no nf_conntrack_ipv4,
> > > br_nf_pre_routing defragments incoming ipv4 fragments, but skb->nfct check
> > > in br_nf_dev_queue_xmit does not allow to re-fragment combined packet back,
> > > and therefore it is dropped in br_dev_queue_push_xmit without incrementing
> > > of any failcounters.
> > > 
> > > According to Patrick McHardy, bridge should not defragment and fragment
> > > packets unless conntrack is enabled.
> > > 
> > > This patch adds per network namespace flag to manage ipv4 defragmentation
> > > in bridge.
> > > 
> > > Signed-off-by: Vasily Averin <vvs@openvz.org>
> > 
> > Are we sure this is required rather than just removing the skb->nfct
> > test in br_nf_dev_queue_xmit() and be done with it?
> > 
> > Because that seems a lot saner to me, I fail to see how
> > 
> > if (skb->protocol == htons(ETH_P_IP) &&
> >            skb->len + nf_bridge_mtu_reduction(skb) >
> > 		    skb->dev->mtu && !skb_is_gso(skb)) {
> > 
> > Would evaluate as 'true' without nf_defrag_ipv4 module loaded.
> > 
> > [ its from br_nf_dev_queue_xmit function ]
> 
> I think we still may see IP packets larger than the mtu in that path.
> It would be a rare case since we need that the bridge has different
> (smaller) mtu than the sender, but still possible. The is_skb_forwardable()
> check in the current tree snapshot comes just a bit later, so if we
> remove that skb->nfct, the bridge will fragment large packets.

I have to confess that I never tried that out; I assumed the nic
would toss the over-mtu frame.

> In general, I believe bridges should silently drop packets that are
> larger than the mtu and they should perform no fragmentation handling,
> no gathering and no [re]fragmentation. They are transparent devices
> that operate at layer 2.

Not sure I agree.  Silent drops are bad (or perhaps I misunderstand you,
if we do 'silent drop' in br_nf_dev_queue_xmit there should at least be
a mib counter of some sort).

The last part stands of course, I agree that bridges should be
transparent and not do frag handling etc.

> The conntrack case is a special case that forces us to enable
> fragmentation handling since we get sort of a bridge that inspects
> layer 3 and 4 packet information. So we have sort of, let's call it, a
> mutant bridge.

Yes 8-/

> We also have the tproxy target and the socket match, they seem to
> require defragmentation as well, I'm afraid the skb->nfct check will
> not help for those cases. I think that we need some counter to know
> how many clients we have that require the gathering + fragmentation
> code, so if we have at least one, we have to enable it.

Last time I tried TPROXY on top of bridge it was a pain in the neck.

Essentially one has to build a 'brouter' and force packets
upwards the stack (DROP via ebtables in broute table).

Such packets will not be seen by the bridge since they're routed
normally via the ip stack for local delivery.

(-j TPROXY needs policy routing for the redirect to work).

It is also rather fragile in my experience (due to ebtables just
seeing ethernet frames doing 'broute DROP only for tcp port 80' doesn't work
universally since we don't see netfilter-defragmented packets at that stage).

All things considered I think that just doing the re-fragmentation (aka
just remove skb->nfct test) is really the least-sucky one of the options
we have.

If you do IP NAT/TPROXY/conntrack on bridges you're already asking for varying
degrees of layering violations, so I think it would at least be preferable to
have one that "works" :-)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-04  0:23       ` Florian Westphal
@ 2014-05-04 11:15         ` Pablo Neira Ayuso
  0 siblings, 0 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2014-05-04 11:15 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Vasily Averin, netfilter-devel, Stephen Hemminger, Patrick McHardy

On Sun, May 04, 2014 at 02:23:17AM +0200, Florian Westphal wrote:
[...]
> > We also have the tproxy target and the socket match, they seem to
> > require defragmentation as well, I'm afraid the skb->nfct check will
> > not help for those cases. I think that we need some counter to know
> > how many clients we have that require the gathering + fragmentation
> > code, so if we have at least one, we have to enable it.
> 
> Last time I tried TPROXY on top of bridge it was a pain in the neck.
> 
> Essentially one has to build a 'brouter' and force packets
> upwards the stack (DROP via ebtables in broute table).
> 
> Such packets will not be seen by the bridge since they're routed
> normally via the ip stack for local delivery.
> 
> (-j TPROXY needs policy routing for the redirect to work).
> 
> It is also rather fragile in my experience (due to ebtables just
> seeing ethernet frames doing 'broute DROP only for tcp port 80' doesn't work
> universally since we don't see netfilter-defragmented packets at that stage).

All those bridge-nf-call-* were quite a hack IMO, I don't think this
is the only extension with problems.

> All things considered I think that just doing the re-fragmentation (aka
> just remove skb->nfct test) is really the least-sucky one of the options
> we have.

I see, and I think it's reasonable to assume that if nf_defrag_* is
loaded, the user expects that its bridge may fragment traffic. OK,
let's remove the skb->nfct check there.

> If you do IP NAT/TPROXY/conntrack on bridges you're already asking for varying
> degrees of layering violations, so I think it would at least be preferable to
> have one that "works" :-)

Perhaps it would be good to restrict extensions that we know that
don't work/have severe limitations to iptables/ip6tables, at least
those that we really know that don't work or need some more bits to
get them working.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-03 23:39     ` [PATCH RFC] " Pablo Neira Ayuso
  2014-05-04  0:23       ` Florian Westphal
@ 2014-05-04 20:06       ` Bart De Schuymer
  2014-05-04 23:01         ` Pablo Neira Ayuso
  2014-05-05 12:55       ` [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge Vasily Averin
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
  3 siblings, 1 reply; 20+ messages in thread
From: Bart De Schuymer @ 2014-05-04 20:06 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal
  Cc: Vasily Averin, netfilter-devel, Stephen Hemminger, Patrick McHardy

Pablo Neira Ayuso schreef op 4/05/2014 1:39:
> I think we still may see IP packets larger than the mtu in that path.
> It would be a rare case since we need that the bridge has different
> (smaller) mtu than the sender, but still possible. The is_skb_forwardable()
> check in the current tree snapshot comes just a bit later, so if we
> remove that skb->nfct, the bridge will fragment large packets.
>
> In general, I believe bridges should silently drop packets that are
> larger than the mtu and they should perform no fragmentation handling,
> no gathering and no [re]fragmentation. They are transparent devices
> that operate at layer 2.

I agree. I don't think it's a good idea to commit code that would do 
fragmentation of IP packets that weren't defragmented first.

> The conntrack case is a special case that forces us to enable
> fragmentation handling since we get sort of a bridge that inspects
> layer 3 and 4 packet information. So we have sort of, let's call it, a
> mutant bridge.
>
> We also have the tproxy target and the socket match, they seem to
> require defragmentation as well, I'm afraid the skb->nfct check will
> not help for those cases. I think that we need some counter to know
> how many clients we have that require the gathering + fragmentation
> code, so if we have at least one, we have to enable it.

If I understood Vasily correctly, in his setup ip_defrag is being called 
from code that isn't connection tracking. Glancing at the code, at least 
IP virtual server and the code that handles the router attention IP 
option also call ip_defrag.

Isn't there an easy way to see that the skb contains a defragmented IP 
packet? If there were, then it seems replacing the "skb->nfct != NULL" 
by "is_defragmented(skb)" would suffice, no?
I see no reason to artificially restrict defrag/refrag to connection 
tracking.

cheers,

Bart


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled
  2014-05-04 20:06       ` Bart De Schuymer
@ 2014-05-04 23:01         ` Pablo Neira Ayuso
  0 siblings, 0 replies; 20+ messages in thread
From: Pablo Neira Ayuso @ 2014-05-04 23:01 UTC (permalink / raw)
  To: Bart De Schuymer
  Cc: Florian Westphal, Vasily Averin, netfilter-devel,
	Stephen Hemminger, Patrick McHardy

On Sun, May 04, 2014 at 10:06:27PM +0200, Bart De Schuymer wrote:
> If I understood Vasily correctly, in his setup ip_defrag is being
> called from code that isn't connection tracking. Glancing at the
> code, at least IP virtual server and the code that handles the
> router attention IP option also call ip_defrag.
> 
> Isn't there an easy way to see that the skb contains a defragmented
> IP packet? If there were, then it seems replacing the "skb->nfct !=
> NULL" by "is_defragmented(skb)" would suffice, no?

We didn't find any way that a packet larger than the mtu can hit that
code. The (re-)fragmentation only applies to a skb that fulfills
skb_has_frag_list(), so no need to restrict it.

> I see no reason to artificially restrict defrag/refrag to connection
> tracking.

After Vasily's patch, fragmentation/defragmentation on a bridge will
basically depend on if nf_defrag_ipv4 is loaded or not.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge
  2014-05-03 23:39     ` [PATCH RFC] " Pablo Neira Ayuso
  2014-05-04  0:23       ` Florian Westphal
  2014-05-04 20:06       ` Bart De Schuymer
@ 2014-05-05 12:55       ` Vasily Averin
  2014-05-05 20:57         ` Florian Westphal
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
  3 siblings, 1 reply; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:55 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy

Pablo,

I've implemented your wishes, could you please review my new patch set?

> In general, I believe bridges should silently drop packets that are
> larger than the mtu and they should perform no fragmentation handling,
> no gathering and no [re]fragmentation. They are transparent devices
> that operate at layer 2.
> 
> The conntrack case is a special case that forces us to enable
> fragmentation handling since we get sort of a bridge that inspects
> layer 3 and 4 packet information. So we have sort of, let's call it, a
> mutant bridge.
> 
> We also have the tproxy target and the socket match, they seem to
> require defragmentation as well, I'm afraid the skb->nfct check will
> not help for those cases. I think that we need some counter to know
> how many clients we have that require the gathering + fragmentation
> code, so if we have at least one, we have to enable it.
> 
> Perhaps we can also display a message to inform the user that
> netfilter fragmentation handling is enabled.

For nf_conntrack_ipv4 I increment counter once only,
For TPROXY target and socket match I increment counter on checkentry and
decrement on destroy hook. So if these modules are just loaded but are not
used in net namespace, they will not affect ipv4 defragmentation.

Please let me know if you have some better ideas.

Vasily Averin (7):
  nf: added per net namespace ipv4 defragmentation users counter
  nf: initialization of ipv4 defragmentation users counter
  nf: increment and decrement for ipv4 defragmentation users counter
  nf: ipv4 defragmentation users counter changes in nf_conntrack_ipv4
  nf: ipv4 defragmentation users counter changes in TPROXY target
  nf: ipv4 defragmentation users counter changes in xt_socket match
  nf: use counter to manage ipv4 defragmentation on bridge

 include/net/net_namespace.h                    |    3 ++
 include/net/netfilter/ipv4/nf_defrag_ipv4.h    |   13 +++++++
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    2 +
 net/ipv4/netfilter/nf_defrag_ipv4.c            |   46 +++++++++++++++++++++++-
 net/netfilter/xt_TPROXY.c                      |   11 +++++-
 net/netfilter/xt_socket.c                      |   19 ++++++++++
 6 files changed, 92 insertions(+), 2 deletions(-)

-- 
1.7.5.4


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/7] nf: added per net namespace ipv4 defragmentation users counter
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
@ 2014-05-05 12:55         ` Vasily Averin
  2014-05-05 12:55         ` [PATCH 2/7] nf: initialization of " Vasily Averin
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:55 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 include/net/net_namespace.h |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 5f9eb26..d82e4b2 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -113,6 +113,9 @@ struct net {
 #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
 	struct netns_nf_frag	nf_frag;
 #endif
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)
+	atomic_t		br_defrag_ipv4_users_count;
+#endif
 	struct sock		*nfnl;
 	struct sock		*nfnl_stash;
 #endif
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/7] nf: initialization of ipv4 defragmentation users counter
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
  2014-05-05 12:55         ` [PATCH 1/7] nf: added per net namespace ipv4 defragmentation users counter Vasily Averin
@ 2014-05-05 12:55         ` Vasily Averin
  2014-05-05 12:56         ` [PATCH 3/7] nf: increment and decrement functions for " Vasily Averin
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:55 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 net/ipv4/netfilter/nf_defrag_ipv4.c |   32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index f40f321..f82685c 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -110,14 +110,44 @@ static struct nf_hook_ops ipv4_defrag_ops[] = {
 	},
 };
 
+static int nf_defrag_ipv4_net_init(struct net *net)
+{
+	atomic_set(&net->br_defrag_ipv4_users_count, 0);
+	return 0;
+}
+
+static void nf_defrag_ipv4_net_exit(struct net *net)
+{
+	if (unlikely(atomic_read(&net->br_defrag_ipv4_users_count) != 0)) {
+		pr_warn("net %p cleanup: br_defrag_users_count = %d\n",
+			net, atomic_read(&net->br_defrag_ipv4_users_count));
+	}
+}
+
+static struct pernet_operations nf_defrag_ipv4_net_ops = {
+	.init = nf_defrag_ipv4_net_init,
+	.exit = nf_defrag_ipv4_net_exit,
+};
+
 static int __init nf_defrag_init(void)
 {
-	return nf_register_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	int ret = 0;
+
+	ret = register_pernet_subsys(&nf_defrag_ipv4_net_ops);
+	if (ret)
+		goto out;
+
+	ret = nf_register_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	if (ret)
+		unregister_pernet_subsys(&nf_defrag_ipv4_net_ops);
+out:
+	return ret;
 }
 
 static void __exit nf_defrag_fini(void)
 {
 	nf_unregister_hooks(ipv4_defrag_ops, ARRAY_SIZE(ipv4_defrag_ops));
+	unregister_pernet_subsys(&nf_defrag_ipv4_net_ops);
 }
 
 void nf_defrag_ipv4_enable(void)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/7] nf: increment and decrement functions for ipv4 defragmentation users counter
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
  2014-05-05 12:55         ` [PATCH 1/7] nf: added per net namespace ipv4 defragmentation users counter Vasily Averin
  2014-05-05 12:55         ` [PATCH 2/7] nf: initialization of " Vasily Averin
@ 2014-05-05 12:56         ` Vasily Averin
  2014-05-05 12:56         ` [PATCH 4/7] nf: ipv4 defragmentation users counter changes in nf_conntrack_ipv4 module Vasily Averin
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:56 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 include/net/netfilter/ipv4/nf_defrag_ipv4.h |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/net/netfilter/ipv4/nf_defrag_ipv4.h b/include/net/netfilter/ipv4/nf_defrag_ipv4.h
index f01ef20..2a433ba 100644
--- a/include/net/netfilter/ipv4/nf_defrag_ipv4.h
+++ b/include/net/netfilter/ipv4/nf_defrag_ipv4.h
@@ -3,4 +3,17 @@
 
 void nf_defrag_ipv4_enable(void);
 
+static inline void inc_br_defrag_ipv4_users_count(struct net *net)
+{
+	if (atomic_inc_return(&net->br_defrag_ipv4_users_count) == 1)
+		pr_info("net %p: bridge ipv4 defragmentation is enabled\n",
+			net);
+}
+
+static inline void dec_br_defrag_ipv4_users_count(struct net *net)
+{
+	if (atomic_dec_and_test(&net->br_defrag_ipv4_users_count))
+		pr_info("net %p: bridge ipv4 defragmentation is disabled\n",
+			net);
+}
 #endif /* _NF_DEFRAG_IPV4_H */
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/7] nf: ipv4 defragmentation users counter changes in nf_conntrack_ipv4 module
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
                           ` (2 preceding siblings ...)
  2014-05-05 12:56         ` [PATCH 3/7] nf: increment and decrement functions for " Vasily Averin
@ 2014-05-05 12:56         ` Vasily Averin
  2014-05-05 12:56         ` [PATCH 5/7] nf: ipv4 defragmentation users counter changes in TPROXY target Vasily Averin
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:56 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 8127dc8..1376055 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -432,6 +432,7 @@ static int ipv4_net_init(struct net *net)
 		pr_err("nf_conntrack_ipv4: pernet registration failed\n");
 		goto out_ipv4;
 	}
+	inc_br_defrag_ipv4_users_count(net);
 	return 0;
 out_ipv4:
 	nf_ct_l4proto_pernet_unregister(net, &nf_conntrack_l4proto_icmp);
@@ -445,6 +446,7 @@ out_tcp:
 
 static void ipv4_net_exit(struct net *net)
 {
+	dec_br_defrag_ipv4_users_count(net);
 	nf_ct_l3proto_pernet_unregister(net, &nf_conntrack_l3proto_ipv4);
 	nf_ct_l4proto_pernet_unregister(net, &nf_conntrack_l4proto_icmp);
 	nf_ct_l4proto_pernet_unregister(net, &nf_conntrack_l4proto_udp4);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/7] nf: ipv4 defragmentation users counter changes in TPROXY target
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
                           ` (3 preceding siblings ...)
  2014-05-05 12:56         ` [PATCH 4/7] nf: ipv4 defragmentation users counter changes in nf_conntrack_ipv4 module Vasily Averin
@ 2014-05-05 12:56         ` Vasily Averin
  2014-05-05 12:56         ` [PATCH 6/7] nf: ipv4 defragmentation users counter changes in xt_socket match Vasily Averin
  2014-05-05 12:56         ` [PATCH 7/7] nf: use counter to manage ipv4 defragmentation on bridge Vasily Averin
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:56 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 net/netfilter/xt_TPROXY.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/xt_TPROXY.c b/net/netfilter/xt_TPROXY.c
index ef8a926..ce99ad4 100644
--- a/net/netfilter/xt_TPROXY.c
+++ b/net/netfilter/xt_TPROXY.c
@@ -528,14 +528,21 @@ static int tproxy_tg4_check(const struct xt_tgchk_param *par)
 	const struct ipt_ip *i = par->entryinfo;
 
 	if ((i->proto == IPPROTO_TCP || i->proto == IPPROTO_UDP)
-	    && !(i->invflags & IPT_INV_PROTO))
+	    && !(i->invflags & IPT_INV_PROTO)) {
+		inc_br_defrag_ipv4_users_count(par->net);
 		return 0;
+	}
 
 	pr_info("Can be used only in combination with "
 		"either -p tcp or -p udp\n");
 	return -EINVAL;
 }
 
+void tproxy_tg4_destroy(const struct xt_tgdtor_param *par)
+{
+	dec_br_defrag_ipv4_users_count(par->net);
+}
+
 static struct xt_target tproxy_tg_reg[] __read_mostly = {
 	{
 		.name		= "TPROXY",
@@ -545,6 +552,7 @@ static struct xt_target tproxy_tg_reg[] __read_mostly = {
 		.revision	= 0,
 		.targetsize	= sizeof(struct xt_tproxy_target_info),
 		.checkentry	= tproxy_tg4_check,
+		.destroy	= tproxy_tg4_destroy,
 		.hooks		= 1 << NF_INET_PRE_ROUTING,
 		.me		= THIS_MODULE,
 	},
@@ -556,6 +564,7 @@ static struct xt_target tproxy_tg_reg[] __read_mostly = {
 		.revision	= 1,
 		.targetsize	= sizeof(struct xt_tproxy_target_info_v1),
 		.checkentry	= tproxy_tg4_check,
+		.destroy	= tproxy_tg4_destroy,
 		.hooks		= 1 << NF_INET_PRE_ROUTING,
 		.me		= THIS_MODULE,
 	},
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/7] nf: ipv4 defragmentation users counter changes in xt_socket match
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
                           ` (4 preceding siblings ...)
  2014-05-05 12:56         ` [PATCH 5/7] nf: ipv4 defragmentation users counter changes in TPROXY target Vasily Averin
@ 2014-05-05 12:56         ` Vasily Averin
  2014-05-05 12:56         ` [PATCH 7/7] nf: use counter to manage ipv4 defragmentation on bridge Vasily Averin
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:56 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 net/netfilter/xt_socket.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 1ba6793..982bd58 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -388,6 +388,12 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 }
 #endif
 
+static int socket_mt4_v0_check(const struct xt_mtchk_param *par)
+{
+	inc_br_defrag_ipv4_users_count(par->net);
+	return 0;
+}
+
 static int socket_mt_v1_check(const struct xt_mtchk_param *par)
 {
 	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
@@ -396,6 +402,8 @@ static int socket_mt_v1_check(const struct xt_mtchk_param *par)
 		pr_info("unknown flags 0x%x\n", info->flags & ~XT_SOCKET_FLAGS_V1);
 		return -EINVAL;
 	}
+	if (par->family == NFPROTO_IPV4)
+		inc_br_defrag_ipv4_users_count(par->net);
 	return 0;
 }
 
@@ -407,15 +415,24 @@ static int socket_mt_v2_check(const struct xt_mtchk_param *par)
 		pr_info("unknown flags 0x%x\n", info->flags & ~XT_SOCKET_FLAGS_V2);
 		return -EINVAL;
 	}
+	if (par->family == NFPROTO_IPV4)
+		inc_br_defrag_ipv4_users_count(par->net);
 	return 0;
 }
 
+static void socket_mt4_destroy(const struct xt_mtdtor_param *par)
+{
+	dec_br_defrag_ipv4_users_count(par->net);
+}
+
 static struct xt_match socket_mt_reg[] __read_mostly = {
 	{
 		.name		= "socket",
 		.revision	= 0,
 		.family		= NFPROTO_IPV4,
 		.match		= socket_mt4_v0,
+		.checkentry	= socket_mt4_v0_check,
+		.destroy	= socket_mt4_destroy,
 		.hooks		= (1 << NF_INET_PRE_ROUTING) |
 				  (1 << NF_INET_LOCAL_IN),
 		.me		= THIS_MODULE,
@@ -426,6 +443,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV4,
 		.match		= socket_mt4_v1_v2,
 		.checkentry	= socket_mt_v1_check,
+		.destroy	= socket_mt4_destroy,
 		.matchsize	= sizeof(struct xt_socket_mtinfo1),
 		.hooks		= (1 << NF_INET_PRE_ROUTING) |
 				  (1 << NF_INET_LOCAL_IN),
@@ -450,6 +468,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 		.family		= NFPROTO_IPV4,
 		.match		= socket_mt4_v1_v2,
 		.checkentry	= socket_mt_v2_check,
+		.destroy	= socket_mt4_destroy,
 		.matchsize	= sizeof(struct xt_socket_mtinfo1),
 		.hooks		= (1 << NF_INET_PRE_ROUTING) |
 				  (1 << NF_INET_LOCAL_IN),
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/7] nf: use counter to manage ipv4 defragmentation on bridge
       [not found]       ` <cover.1399292146.git.vvs@openvz.org>
                           ` (5 preceding siblings ...)
  2014-05-05 12:56         ` [PATCH 6/7] nf: ipv4 defragmentation users counter changes in xt_socket match Vasily Averin
@ 2014-05-05 12:56         ` Vasily Averin
  6 siblings, 0 replies; 20+ messages in thread
From: Vasily Averin @ 2014-05-05 12:56 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso; +Cc: netfilter-devel, Patrick McHardy


Signed-off-by: Vasily Averin <vvs@openvz.org>
---
 net/ipv4/netfilter/nf_defrag_ipv4.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c b/net/ipv4/netfilter/nf_defrag_ipv4.c
index f82685c..40cbd05 100644
--- a/net/ipv4/netfilter/nf_defrag_ipv4.c
+++ b/net/ipv4/netfilter/nf_defrag_ipv4.c
@@ -87,6 +87,20 @@ static unsigned int ipv4_conntrack_defrag(const struct nf_hook_ops *ops,
 		enum ip_defrag_users user =
 			nf_ct_defrag_user(ops->hooknum, skb);
 
+#if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4) && defined (CONFIG_BRIDGE_NETFILTER)
+		if ((user >= IP_DEFRAG_CONNTRACK_BRIDGE_IN) &&
+		    (user <= __IP_DEFRAG_CONNTRACK_BRIDGE_IN)) {
+			struct net *net = sock_net(skb->sk);
+
+			/* A bridge should not defragment and fragment packets. 
+			 * However if connection tracking is enabled or
+			 * if some target (TPROXY) or matches (socket) are used 
+			 * we enable ipv4 defragmentation on bridge
+			 */
+			if (atomic_read(&net->br_defrag_ipv4_users_count) == 0)
+				return NF_ACCEPT;
+		}
+#endif
 		if (nf_ct_ipv4_gather_frags(skb, user))
 			return NF_STOLEN;
 	}
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge
  2014-05-05 12:55       ` [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge Vasily Averin
@ 2014-05-05 20:57         ` Florian Westphal
  2014-05-07 13:27           ` Vasily Averin
  0 siblings, 1 reply; 20+ messages in thread
From: Florian Westphal @ 2014-05-05 20:57 UTC (permalink / raw)
  To: Vasily Averin
  Cc: Florian Westphal, Pablo Neira Ayuso, netfilter-devel, Patrick McHardy

Vasily Averin <vvs@parallels.com> wrote:
> For nf_conntrack_ipv4 I increment counter once only,
> For TPROXY target and socket match I increment counter on checkentry and
> decrement on destroy hook. So if these modules are just loaded but are not
> used in net namespace, they will not affect ipv4 defragmentation.
> Please let me know if you have some better ideas.

bridges defrag packets (if the nf_defrag_ipv4 is loaded) because
brnf_call_iptables sysctl is set to 1 by default.

What about making this sysctl per-netns?

That way a bridge running inside a netns could disable
iptables processing, it seems to be global switch at this time.

This way you could not enable iptables processing on a bridge
without defrag enabled (again, if the module is loaded), OTOH I
don't see why one would want iptables on a bridge without conntrack
(might as well just use ebtables).

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge
  2014-05-05 20:57         ` Florian Westphal
@ 2014-05-07 13:27           ` Vasily Averin
  2014-05-07 18:49             ` Bart De Schuymer
  0 siblings, 1 reply; 20+ messages in thread
From: Vasily Averin @ 2014-05-07 13:27 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel, Patrick McHardy

On 05/06/2014 12:57 AM, Florian Westphal wrote:
> Vasily Averin <vvs@parallels.com> wrote:
>> For nf_conntrack_ipv4 I increment counter once only,
>> For TPROXY target and socket match I increment counter on checkentry and
>> decrement on destroy hook. So if these modules are just loaded but are not
>> used in net namespace, they will not affect ipv4 defragmentation.
>> Please let me know if you have some better ideas.
> 
> bridges defrag packets (if the nf_defrag_ipv4 is loaded) because
> brnf_call_iptables sysctl is set to 1 by default.
> 
> What about making this sysctl per-netns?

I think it is great idea,
I'm agree it's much better than my patch set.

However, could anybody explain,
if nobody likes bridge-netfilters, why according sysctls are enabled in kernel by default?
I've found in RHEL6 tries to disable them via /etc/sysctl.conf
however it doesn't work when bridge module is loaded after applying settings saved in sysctl.conf

Thank you,
	Vasily Averin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge
  2014-05-07 13:27           ` Vasily Averin
@ 2014-05-07 18:49             ` Bart De Schuymer
  0 siblings, 0 replies; 20+ messages in thread
From: Bart De Schuymer @ 2014-05-07 18:49 UTC (permalink / raw)
  To: Vasily Averin, Florian Westphal
  Cc: Pablo Neira Ayuso, netfilter-devel, Patrick McHardy

Vasily Averin schreef op 7/05/2014 15:27:
> On 05/06/2014 12:57 AM, Florian Westphal wrote:
>> Vasily Averin <vvs@parallels.com> wrote:
>>> For nf_conntrack_ipv4 I increment counter once only,
>>> For TPROXY target and socket match I increment counter on checkentry and
>>> decrement on destroy hook. So if these modules are just loaded but are not
>>> used in net namespace, they will not affect ipv4 defragmentation.
>>> Please let me know if you have some better ideas.
>>
>> bridges defrag packets (if the nf_defrag_ipv4 is loaded) because
>> brnf_call_iptables sysctl is set to 1 by default.
>>
>> What about making this sysctl per-netns?
>
> I think it is great idea,
> I'm agree it's much better than my patch set.

No objections from me.

> However, could anybody explain,
> if nobody likes bridge-netfilters, why according sysctls are enabled in kernel by default?

If nobody likes it, it's quite simple: don't enable bridge-nf when 
configuring the kernel.
Here are the reasons why the defaults are set to 1:
- Backwards compatibility (the sysctl options were added later)
- Default behaviour should be independent of sysctl being enabled or not
- If someone compiles bridge-nf into the kernel, one might expect that 
it's intended to be used.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-05-07 18:50 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20140430092905.GA4318@localhost>
2014-05-02 15:40 ` [PATCH RFC] Bridge: do not defragment packets unless connection tracking is enabled Vasily Averin
2014-05-02 22:55   ` Florian Westphal
2014-05-03  7:15     ` Vasily Averin
2014-05-03  7:18     ` [PATCH RFC v2] " Vasily Averin
2014-05-03 23:39     ` [PATCH RFC] " Pablo Neira Ayuso
2014-05-04  0:23       ` Florian Westphal
2014-05-04 11:15         ` Pablo Neira Ayuso
2014-05-04 20:06       ` Bart De Schuymer
2014-05-04 23:01         ` Pablo Neira Ayuso
2014-05-05 12:55       ` [PATCH RFC 0/7] users counter to manage ipv4 defragmentation on bridge Vasily Averin
2014-05-05 20:57         ` Florian Westphal
2014-05-07 13:27           ` Vasily Averin
2014-05-07 18:49             ` Bart De Schuymer
     [not found]       ` <cover.1399292146.git.vvs@openvz.org>
2014-05-05 12:55         ` [PATCH 1/7] nf: added per net namespace ipv4 defragmentation users counter Vasily Averin
2014-05-05 12:55         ` [PATCH 2/7] nf: initialization of " Vasily Averin
2014-05-05 12:56         ` [PATCH 3/7] nf: increment and decrement functions for " Vasily Averin
2014-05-05 12:56         ` [PATCH 4/7] nf: ipv4 defragmentation users counter changes in nf_conntrack_ipv4 module Vasily Averin
2014-05-05 12:56         ` [PATCH 5/7] nf: ipv4 defragmentation users counter changes in TPROXY target Vasily Averin
2014-05-05 12:56         ` [PATCH 6/7] nf: ipv4 defragmentation users counter changes in xt_socket match Vasily Averin
2014-05-05 12:56         ` [PATCH 7/7] nf: use counter to manage ipv4 defragmentation on bridge Vasily Averin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.