* RFC: Disable defered bridge hooks by default
@ 2006-07-04  9:26 Patrick McHardy
  2006-07-04  9:27 ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-04  9:26 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 168 bytes --]

Finally got to taking care of the first part of getting
rid of the deferred bridge hooks. Bart, does this look
correct to you? This stuff confuses me badly :) Thanks.



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1761 bytes --]

[NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks

When a packet without any chunks is received, the newconntrack variable
in sctp_packet contains an out of bounds value that is used to look up an
pointer from the array of timeouts, which is then dereferenced, resulting
in a crash. Make sure at least a single chunk is present.

Problem noticed by George A. Theall <theall@tenablesecurity.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 4044a218db83433791d45cd453a6311d7dcbd17d
tree 37ea0a16426151bd3606264ea8c462bbf6f3decd
parent a39727f212426b9d5f9267b3318a2afaf9922d3b
author Patrick McHardy <kaber@trash.net> Fri, 30 Jun 2006 05:27:26 +0200
committer Patrick McHardy <kaber@trash.net> Fri, 30 Jun 2006 05:27:26 +0200

 net/ipv4/netfilter/ip_conntrack_proto_sctp.c |    2 +-
 net/netfilter/nf_conntrack_proto_sctp.c      |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/netfilter/ip_conntrack_proto_sctp.c b/net/ipv4/netfilter/ip_conntrack_proto_sctp.c
index 0416073..2d3612c 100644
--- a/net/ipv4/netfilter/ip_conntrack_proto_sctp.c
+++ b/net/ipv4/netfilter/ip_conntrack_proto_sctp.c
@@ -254,7 +254,7 @@ static int do_basic_checks(struct ip_con
 	}
 
 	DEBUGP("Basic checks passed\n");
-	return 0;
+	return count == 0;
 }
 
 static int new_state(enum ip_conntrack_dir dir,
diff --git a/net/netfilter/nf_conntrack_proto_sctp.c b/net/netfilter/nf_conntrack_proto_sctp.c
index 0c6da49..9dab81d 100644
--- a/net/netfilter/nf_conntrack_proto_sctp.c
+++ b/net/netfilter/nf_conntrack_proto_sctp.c
@@ -259,7 +259,7 @@ static int do_basic_checks(struct nf_con
 	}
 
 	DEBUGP("Basic checks passed\n");
-	return 0;
+	return count == 0;
 }
 
 static int new_state(enum ip_conntrack_dir dir,


* Re: RFC: Disable defered bridge hooks by default
  2006-07-04  9:26 RFC: Disable defered bridge hooks by default Patrick McHardy
@ 2006-07-04  9:27 ` Patrick McHardy
  2006-07-08  0:36   ` Tom Eastep
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-04  9:27 UTC (permalink / raw)
  To: Bart De Schuymer; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 374 bytes --]

Patrick McHardy wrote:
> Finally got to taking care of the first part of getting
> rid of the deferred bridge hooks. Bart, does this look
> correct to you? This stuff confuses me badly :) Thanks.
> 
> ------------------------------------------------------------------------
> 
> [NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks

D'oh, wrong patch.
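
For orientation, the behavioural change in the patch attached to this message can be modelled in userspace roughly as follows. This is only a sketch, assuming the two new sysctls default to 0 as in the patch; the struct, the enum and the function name are hypothetical stand-ins, and only BRNF_DEFERED_HOOK and its value are taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel verdicts; values matter only inside this model. */
enum verdict { VERDICT_ACCEPT, VERDICT_STOP };

/* Flag name and value as introduced by the patch. */
#define BRNF_DEFERED_HOOK 0x20

/* Hypothetical container for the two proposed sysctls (both default to 0). */
struct brnf_sysctls {
    int defer_iptables;
    int defer_ip6tables;
};

/*
 * Hypothetical helper modelling the new tail of the local-out path:
 * only when deferral is enabled for this IP version is the packet
 * flagged and the IP-layer hook traversal stopped, so that the hooks
 * can run later from the bridge layer. Otherwise the IP-layer hooks
 * run normally.
 */
static enum verdict local_out_verdict(const struct brnf_sysctls *s,
                                      unsigned int ip_version,
                                      unsigned int *mask)
{
    assert(s != NULL && mask != NULL);

    if ((ip_version == 4 && s->defer_iptables) ||
        (ip_version == 6 && s->defer_ip6tables)) {
        *mask |= BRNF_DEFERED_HOOK;
        return VERDICT_STOP;
    }
    return VERDICT_ACCEPT;
}
```

With both fields left at 0 the function always returns VERDICT_ACCEPT, which corresponds to the "disabled by default" behaviour in the subject line.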



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 4264 bytes --]

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 43ab119..cfbf892 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -248,3 +248,21 @@ Why:	The interface no longer has any cal
 Who:	Nick Piggin <npiggin@suse.de>
 
 ---------------------------
+
+What:	Bridge netfilter deferred IPv4/IPv6 output hook calling
+When:	January 2007
+Why:	The deferred output hooks are a bad layering violation causing
+	lots of unusual and broken behaviour on bridge devices.
+	Examples include broken QoS classification using the MARK or
+	CLASSIFY targets, broken behaviour with the IPsec policy match,
+	broken connection tracking with VLAN on a bridge, ...
+
+	Their only use is to enable bridge output port filtering within
+	iptables with the physdev match, which can just as well be done by
+	combining iptables and ebtables using netfilter marks. Until they
+	are removed, the deferral will be deactivated by default and
+	needs to be manually enabled by users requiring this behaviour
+	through /proc/sys/net/bridge/bridge-nf-defer-{iptables,ip6tables}.
+Who:	Patrick McHardy <kaber@trash.net>
+
+---------------------------
diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index a75b84b..7f27bfd 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -46,6 +46,7 @@ #define BRNF_BRIDGED_DNAT		0x02
 #define BRNF_DONT_TAKE_PARENT		0x04
 #define BRNF_BRIDGED			0x08
 #define BRNF_NF_BRIDGE_PREROUTING	0x10
+#define BRNF_DEFERED_HOOK		0x20
 
 
 /* Only used in br_forward.c */
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 98338ed..8f54428 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -763,6 +763,8 @@ enum {
 	NET_BRIDGE_NF_CALL_IPTABLES = 2,
 	NET_BRIDGE_NF_CALL_IP6TABLES = 3,
 	NET_BRIDGE_NF_FILTER_VLAN_TAGGED = 4,
+	NET_BRIDGE_NF_DEFER_IPTABLES = 5,
+	NET_BRIDGE_NF_DEFER_IP6TABLES = 6,
 };
 
 /* CTL_PROC names: */
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 3da9264..3068e90 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -55,6 +55,8 @@ #ifdef CONFIG_SYSCTL
 static struct ctl_table_header *brnf_sysctl_header;
 static int brnf_call_iptables = 1;
 static int brnf_call_ip6tables = 1;
+static int brnf_defer_iptables = 0;
+static int brnf_defer_ip6tables = 0;
 static int brnf_call_arptables = 1;
 static int brnf_filter_vlan_tagged = 1;
 #else
@@ -729,7 +731,9 @@ #endif
 		NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
 			skb->dev, br_forward_finish);
 		goto out;
-	}
+	} else if (!(nf_bridge->mask & BRNF_DEFERED_HOOK))
+		return NF_ACCEPT;
+
 	realoutdev = bridge_parent(skb->dev);
 	if (!realoutdev)
 		return NF_DROP;
@@ -885,6 +889,7 @@ #if defined(CONFIG_VLAN_8021Q) || define
 #endif
 	    ) {
 		struct nf_bridge_info *nf_bridge;
+		struct iphdr *ip;
 
 		if (!skb->nf_bridge) {
 #ifdef CONFIG_SYSCTL
@@ -892,7 +897,7 @@ #ifdef CONFIG_SYSCTL
 			   the version should be 4 or 6. We can't use
 			   skb->protocol because that isn't set on
 			   PF_INET(6)/LOCAL_OUT. */
-			struct iphdr *ip = skb->nh.iph;
+			ip = skb->nh.iph;
 
 			if (ip->version == 4 && !brnf_call_iptables)
 				return NF_ACCEPT;
@@ -921,7 +926,11 @@ #if defined(CONFIG_VLAN_8021Q) || define
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		return NF_STOP;
+		if ((ip->version == 4 && brnf_defer_iptables) ||
+		    (ip->version == 6 && brnf_defer_ip6tables)) {
+			nf_bridge->mask |= BRNF_DEFERED_HOOK;
+			return NF_STOP;
+		}
 	}
 
 	return NF_ACCEPT;
@@ -1051,6 +1060,22 @@ static ctl_table brnf_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &brnf_sysctl_call_tables,
 	},
+	{
+		.ctl_name	= NET_BRIDGE_NF_DEFER_IPTABLES,
+		.procname	= "bridge-nf-defer-iptables",
+		.data		= &brnf_defer_iptables,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &brnf_sysctl_call_tables,
+	},
+	{
+		.ctl_name	= NET_BRIDGE_NF_DEFER_IP6TABLES,
+		.procname	= "bridge-nf-defer-ip6tables",
+		.data		= &brnf_defer_ip6tables,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &brnf_sysctl_call_tables,
+	},
 	{ .ctl_name = 0 }
 };
 


* Re: RFC: Disable defered bridge hooks by default
  2006-07-04  9:27 ` Patrick McHardy
@ 2006-07-08  0:36   ` Tom Eastep
  2006-07-08  3:01     ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Eastep @ 2006-07-08  0:36 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 3135 bytes --]

Patrick McHardy wrote:

> +Why:	The deferred output hooks are a bad layering violation causing
> +	lots of unusual and broken behaviour on bridge devices.
> +	Examples include broken QoS classification using the MARK or
> +	CLASSIFY targets, broken behaviour with the IPsec policy match,
> +	broken connection tracking with VLAN on a bridge, ...
> +
> +	Their only use is to enable bridge output port filtering within
> +	iptables with the physdev match, which can just as well be done by
> +	combining iptables and ebtables using netfilter marks.

Patrick,

Once again, netfilter marks are the solution of last resort. This is
becoming very painful for those of us who produce general Netfilter
configuration tools. The situation is exacerbated by the fact that
ebtables doesn't support modifying the mark value via logical AND/OR and
the other fwmark consumers (tc, ip) don't allow a mask when testing the
fwmark value.

Be that as it may, I'm having difficulty trying to apply your proposed
approach to Shorewall.

Here's an example of a forwarding rule in Shorewall:

	ACCEPT	foo	bar	tcp	25

'foo' and 'bar' are "zone" names and in a bridged configuration each may
be associated with a different port on a bridge. Today, there is a
single rule that sends all 'foo' -> 'bar' packets to a separate chain
(this isn't the exact rule but it illustrates the point):

	iptables -A FORWARD -i bridge -o bridge -m physdev \
	--physdev-in <foo port> --physdev-out <bar port> -j foo2bar

Given that the ebtables filter table FORWARD chain is traversed before
the iptables filter table FORWARD chain, that single iptables command
can be replaced by:

	ebtables -A FORWARD -o <bar port> -j mark \
	--set-mark <bar mark> # --or-mark would be real handy here

	iptables -A FORWARD -i <bridge> -o <bridge> -m physdev \
	--physdev-in <foo port> -m mark --mark <bar mark> -j foo2bar

The Shorewall rule listed above then creates:

	iptables -A foo2bar -p tcp --dport 25 -j ACCEPT

At the end of the foo2bar chain is an unconditional rule that is
determined by the effective foo->bar policy specified by the user;
policy values include DROP, REJECT, ACCEPT and CONTINUE (CONTINUE is
used for nested zones).

A similar approach is taken for locally-generated packets. There is a
single rule to direct all 'fw' to 'bar' traffic to the 'fw2bar' chain
("fw" is the default name for the zone comprised of the local system):

	iptables -A OUTPUT -o <bridge> -m physdev \
	--physdev-out <bar port> -j fw2bar

As with forwarding, the fw2bar chain ends with a rule that enforces the
fw->bar policy.

Predictably, the Shorewall rule:

	ACCEPT	fw	bar	tcp	25

generates:

	iptables -A fw2bar -p tcp --dport 25 -j ACCEPT

I see no sensible way to eliminate the --physdev-out usage in the OUTPUT
chain using ebtables/iptables and marking. What am I missing?

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]


* Re: RFC: Disable defered bridge hooks by default
  2006-07-08  0:36   ` Tom Eastep
@ 2006-07-08  3:01     ` Patrick McHardy
  2006-07-10  9:56       ` Amin Azez
                         ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-08  3:01 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Tom Eastep wrote:
> Patrick McHardy wrote:
> 
> 
>>+Why:	The deferred output hooks are a bad layering violation causing
>>+	lots of unusual and broken behaviour on bridge devices.
>>+	Examples include broken QoS classification using the MARK or
>>+	CLASSIFY targets, broken behaviour with the IPsec policy match,
>>+	broken connection tracking with VLAN on a bridge, ...
>>+
>>+	Their only use is to enable bridge output port filtering within
>>+	iptables with the physdev match, which can just as well be done by
>>+	combining iptables and ebtables using netfilter marks.
> 
> 
> Patrick,
> 
> Once again, netfilter marks are the solution of last resort. This is
> becoming very painful for those of us who produce general Netfilter
> configuration tools. The situation is exacerbated by the fact that
> ebtables doesn't support modifying the mark value via logical AND/OR and
> the other fwmark consumers (tc, ip) don't allow a mask when testing the
> fwmark value.

I understand your problems perfectly, one of my netfilter backgrounds
is creating (proprietary) high-level tools as well (aka typical
appliance vendor). I know the problems getting along with netfilter
marks and specifying reasonable limits, but this stuff has created
so many problems that I just don't care. If we need more bits, so be
it, and introducing bitwise operations to ebtables MARK can only be
a good thing anyway (and for that matter, in every other spot using
nfmark).
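
A sketch of why bitwise mark operations matter when several uses share one mark. This is a plain userspace model in C, not ebtables or iptables code; the bit layout (PORT_MASK, QOS_MASK) and all function names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical partition of the 32-bit mark, chosen only for this example. */
#define PORT_MASK 0x000000ffu /* low 8 bits: bridge output port id */
#define QOS_MASK  0x0000ff00u /* next 8 bits: traffic class */
#define QOS_SHIFT 8

/* Plain "set mark": the whole mark is overwritten, other users lose their bits. */
static uint32_t mark_set(uint32_t mark, uint32_t value)
{
    (void)mark; /* the old value is discarded on purpose */
    return value;
}

/* Masked set: only the bits selected by mask are replaced. */
static uint32_t mark_set_masked(uint32_t mark, uint32_t value, uint32_t mask)
{
    return (mark & ~mask) | (value & mask);
}

/* Masked test: compare only the bits selected by mask. */
static int mark_match(uint32_t mark, uint32_t value, uint32_t mask)
{
    return (mark & mask) == (value & mask);
}
```

With masked set and masked test, a QoS class stored in the upper field survives a later write of the port id into the lower field; with the unmasked mark_set() the second write destroys the first. A consumer that cannot apply a mask when testing has to know the full combined value, which is the difficulty described above for tc and ip.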

> Be that as it may, I'm having difficulty trying to apply your proposed
> approach to Shorewall.
> 
> Here's an example of a forwarding rule in Shorewall:
> 
> 	ACCEPT	foo	bar	tcp	25
> 
> 'foo' and 'bar' are "zone" names and in a bridged configuration each may
> be associated with a different port on a bridge. Today, there is a
> single rule that sends all 'foo' -> 'bar' packets to a separate chain
> (this isn't the exact rule but it illustrates the point):
> 
> 	iptables -A FORWARD -i bridge -o bridge -m physdev \
> 	--physdev-in <foo port> --physdev-out <bar port> -j foo2bar
> 
> Given that the ebtables filter table FORWARD chain is traversed before
> the iptables filter table FORWARD chain, that single iptables command
> can be replaced by:
> 
> 	ebtables -A FORWARD -o <bar port> -j mark \
> 	--set-mark <bar mark> # --or-mark would be real handy here
> 
> 	iptables -A FORWARD -i <bridge> -o <bridge> -m physdev \
> 	--physdev-in <foo port> -m mark --mark <bar mark> -j foo2bar
> 
> The Shorewall rule listed above then creates:
> 
> 	iptables -A foo2bar -p tcp --dport 25 -j ACCEPT
> 
> At the end of the foo2bar chain is an unconditional rule that is
> determined by the effective foo->bar policy specified by the user;
> policy values include DROP, REJECT, ACCEPT and CONTINUE (CONTINUE is
> used for nested zones).
> 
> A similar approach is taken for locally-generated packets. There is a
> single rule to direct all 'fw' to 'bar' traffic to the 'fw2bar' chain
> ("fw" is the default name for the zone comprised of the local system):
> 
> 	iptables -A OUTPUT -o <bridge> -m physdev \
> 	--physdev-out <bar port> -j fw2bar
> 
> As with forwarding, the fw2bar chain ends with a rule that enforces the
> fw->bar policy.
> 
> Predictably, the Shorewall rule:
> 
> 	ACCEPT	fw	bar	tcp	25
> 
> generates:
> 
> 	iptables -A fw2bar -p tcp --dport 25 -j ACCEPT
> 
> I see no sensible way to eliminate the --physdev-out usage in the OUTPUT
> chain using ebtables/iptables and marking. What am I missing?

I'm a lazy reader, so I didn't follow this entirely. But:

"-i <bridge> -o <bridge> "

implies you're using this for purely bridged traffic. The feature
we're going to remove only affects locally generated traffic exiting
on a bridge device, in that case iptables _can't_ know the output
port. But you can do your iptables matching, mark matching packets
and filter on the mark within ebtables. Hmm .. actually that makes
me wonder why purely bridged traffic wouldn't have the same problem,
I can only imagine that it does similar violations. Let me look
into this ..


* Re: RFC: Disable defered bridge hooks by default
  2006-07-08  3:01     ` Patrick McHardy
@ 2006-07-10  9:56       ` Amin Azez
  2006-07-11  8:28         ` Patrick McHardy
  2006-07-11 20:34       ` Tom Eastep
  2006-07-12  6:16       ` Philip Craig
  2 siblings, 1 reply; 39+ messages in thread
From: Amin Azez @ 2006-07-10  9:56 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Patrick McHardy wrote:
> Tom Eastep wrote:
>> Patrick McHardy wrote:
>>
>>
>>> +Why:	The deferred output hooks are a bad layering violation causing
>>> +	lots of unusual and broken behaviour on bridge devices.
>>> +	Examples include broken QoS classification using the MARK or
>>> +	CLASSIFY targets, broken behaviour with the IPsec policy match,
>>> +	broken connection tracking with VLAN on a bridge, ...
>>> +
>>> +	Their only use is to enable bridge output port filtering within
>>> +	iptables with the physdev match, which can just as well be done by
>>> +	combining iptables and ebtables using netfilter marks.
>>
>> Patrick,
>>
>> Once again, netfilter marks are the solution of last resort. This is
>> becoming very painful for those of us who produce general Netfilter
>> configuration tools. The situation is exacerbated by the fact that
>> ebtables doesn't support modifying the mark value via logical AND/OR and
>> the other fwmark consumers (tc, ip) don't allow a mask when testing the
>> fwmark value.
> 
> I understand your problems perfectly, one of my netfilter backgrounds
> is creating (proprietary) high-level tools as well (aka typical
> appliance vendor). I know the problems getting along with netfilter
> marks and specifying reasonable limits, but this stuff has created
> so many problems that I just don't care. If we need more bits, so be
> it, and introducing bitwise operations to ebtables MARK can only be
> a good thing anyway (and for that matter, in every other spot using
> nfmark).


This morning in the shower I was wondering if I would have to add back
in what you are just taking out; however I am willing to accept this
qualified expert opinion!

This leads me to suggest addition of ipt_marks instead of ipt_mark

Not only is "the" mark overutilized but the problems of managing free
bit-ranges in iptables/ebtables (if/when ebtables supports masking) will
also be too troublesome to bear.

I re-suggest adding multiple on-demand storage slots to conntrack (and
now also skb's), for storing labelled cookie-type values for checking
later, by iptables or ebtables or anything else.

May I copy my latest RFC attempt here (still a work in progress):

A lot of iptables modules patch ip_conntrack.h to allocate storage.

To reduce the number of conntrack or iptables modules that taint and grow
the size of an skb or conntrack (even when they are not loaded into the
kernel), I suggest another way for iptables modules to have per-conntrack
storage.

More detail of advantages

Sometimes data does not need to be stored for every conntrack, for
example per-subnet data aggregated over all ip addresses in a subnet.
Current style is to implement some kind of hash or list (often from
scratch) and store data there, but often the hash key is a function of
some of the skb or conntrack fields.

Sometimes the hash stores per-conntrack data, which could be stored in
the conntrack directly; sometimes the hash stores data aggregated for
multiple connections, but the conntrack can cache a pointer to the
entry in that hash/list, making access quicker.

(I think the kernel also needs common multi-indexed collections. but
anyway...)

Implementation

One slot in the conntrack is a pointer to an array of pointers.

Each module that wants to use storage registers and receives an
index from the storage manager. This index is used as the offset to the
array element containing data for that module.

Complications

As more modules register we approach the current problem with a lot of
extra storage allocated per conntrack, but hopefully only for modules
that are actually loaded. If modules delay requesting an offset till
they first need one, that problem will be reduced except for people who
briefly play with all modules.

ID re-use.

It will be hard to re-use an id if conntracks still exist with non-zero
values for that slot; if the slot is re-allocated the new module using
the old ID may read bad data.

One solution is to maintain a global version counter. This is stamped
into each allocated conntrack. The version counter is updated when a
slot is re-allocated.

The brief #define style API used to retrieve a slot from the conntrack
will compare the conntrack recorded version stamp to the storage manager
version stamp when that slot was last allocated. If the conntrack
version stamp is old, then the value is obviously from the previous user
and is zeroed before being returned.

The other problem is that of growing the array for older conntracks, if new
slots have been made available since the conntrack had its storage
array allocated. The same solution applies, except that the array would
also need growing, which implies a lock in the conntrack that must be
obtained before using (or growing) the array.
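
The version-stamp scheme above can be sketched in userspace C as follows. This is only a model under stated assumptions: a fixed-size slot array (so the growth and locking problem is left out), no freeing of the previous owner's data, and a version bump on every registration rather than only on re-allocation. All names are hypothetical and none of this is an existing conntrack API. Scrubbing every stale slot at once, instead of only the one being read, is a choice made in this sketch so that a single stamp per entry is enough.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 4 /* fixed size: array growth is out of scope here */

/* Hypothetical storage manager shared by all modules. */
struct slot_manager {
    unsigned long version;                 /* bumped on every registration */
    int in_use[MAX_SLOTS];
    unsigned long allocated_at[MAX_SLOTS]; /* version when slot was last handed out */
};

/* Hypothetical stand-in for a conntrack entry. */
struct conn {
    unsigned long stamp;     /* manager version this entry was last synced to */
    void *slots[MAX_SLOTS];
};

/* Hand out the first free slot index, or -1 if none is left. */
static int slot_register(struct slot_manager *m)
{
    for (int i = 0; i < MAX_SLOTS; i++) {
        if (!m->in_use[i]) {
            m->in_use[i] = 1;
            m->allocated_at[i] = ++m->version;
            return i;
        }
    }
    return -1;
}

static void slot_unregister(struct slot_manager *m, int idx)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    m->in_use[idx] = 0;
}

static void conn_init(struct conn *c, const struct slot_manager *m)
{
    c->stamp = m->version;
    for (int i = 0; i < MAX_SLOTS; i++)
        c->slots[i] = NULL;
}

/* Zero every slot that was (re)allocated after this entry's stamp. */
static void conn_refresh(struct conn *c, const struct slot_manager *m)
{
    if (c->stamp == m->version)
        return;
    for (int i = 0; i < MAX_SLOTS; i++)
        if (m->allocated_at[i] > c->stamp)
            c->slots[i] = NULL; /* value belonged to a previous owner */
    c->stamp = m->version;
}

static void *conn_get(struct conn *c, const struct slot_manager *m, int idx)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    conn_refresh(c, m);
    return c->slots[idx];
}

static void conn_set(struct conn *c, const struct slot_manager *m, int idx, void *v)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    conn_refresh(c, m);
    c->slots[idx] = v;
}
```

The per-access cost in this model is one integer comparison in the common case; the loop only runs for entries that predate a registration.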



Do these overheads waste any advantage in merely trying to save space in
the conntrack?

Does this neatly solve the problem of passing state between tables?

Do these overheads waste any advantage in using the conntrack as a
caching mechanism?

Sam


* Re: RFC: Disable defered bridge hooks by default
  2006-07-10  9:56       ` Amin Azez
@ 2006-07-11  8:28         ` Patrick McHardy
  2006-07-11  9:33           ` Amin Azez
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-11  8:28 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Amin Azez wrote:
> This leads me to suggest addition of ipt_marks instead of ipt_mark
> 
> Not only is "the" mark overutilized but the problems of managing free
> bit-ranges in iptables/ebtables (if/when ebtables supports masking) will
> also be too troublesome to bear.
> 
> I re-suggest adding multiple on-demand storage slots to conntrack (and
> now also skb's), for storing labelled cookie-type values for checking
> later, by iptables or ebtables or anything else.
> 
> May I copy my latest RFC attempt here (still a work in progress):
> 
> [...]
> 
> Do these overheads waste any advantage in merely trying to save space in
> the conntrack?
> 
> Does this neatly solve the problem of passing state between tables?
> 
> Do these overheads waste any advantage in using the conntrack as a
> caching mechanism?

I would prefer to see some code before commenting. My gut feeling
is that it is overly complex. Have you looked at Rusty's ct_extend?


* Re: RFC: Disable defered bridge hooks by default
  2006-07-11  8:28         ` Patrick McHardy
@ 2006-07-11  9:33           ` Amin Azez
  0 siblings, 0 replies; 39+ messages in thread
From: Amin Azez @ 2006-07-11  9:33 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Patrick McHardy wrote:
> Amin Azez wrote:
>> This leads me to suggest addition of ipt_marks instead of ipt_mark
>>
>> Not only is "the" mark overutilized but the problems of managing free
>> bit-ranges in iptables/ebtables (if/when ebtables supports masking) will
>> also be too troublesome to bear.
>>
>> I re-suggest adding multiple on-demand storage slots to conntrack (and
>> now also skb's), for storing labelled cookie-type values for checking
>> later, by iptables or ebtables or anything else.
>>
>> May I copy my latest RFC attempt here (still a work in progress):
>>
>> [...]
>>
>> Do these overheads waste any advantage in merely trying to save space in
>> the conntrack?
>>
>> Does this neatly solve the problem of passing state between tables?
>>
>> Do these overheads waste any advantage in using the conntrack as a
>> caching mechanism?
> 
> I would prefer to see some code before commenting. My gut feeling
> is that it is overly complex. Have you looked at Rusty's ct_extend?


Thank you for pointing this out. I guess ct_extend is what I am after.
And where the extended storage is an integer it ought to be easily
available from *_tables.

I'll give this a go over.

Sam


* Re: RFC: Disable defered bridge hooks by default
  2006-07-08  3:01     ` Patrick McHardy
  2006-07-10  9:56       ` Amin Azez
@ 2006-07-11 20:34       ` Tom Eastep
  2006-07-11 21:29         ` Patrick McHardy
  2006-07-12  6:16       ` Philip Craig
  2 siblings, 1 reply; 39+ messages in thread
From: Tom Eastep @ 2006-07-11 20:34 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 2079 bytes --]

Patrick McHardy wrote:
> Tom Eastep wrote:

>>
>> A similar approach is taken for locally-generated packets. There is a
>> single rule to direct all 'fw' to 'bar' traffic to the 'fw2bar' chain
>> ("fw" is the default name for the zone comprised of the local system):
>>
>> 	iptables -A OUTPUT -o <bridge> -m physdev \
>> 	--physdev-out <bar port> -j fw2bar
>>
>> As with forwarding, the fw2bar chain ends with a rule that enforces the
>> fw->bar policy.
>>
>> Predictably, the Shorewall rule:
>>
>> 	ACCEPT	fw	bar	tcp	25
>>
>> generates:
>>
>> 	iptables -A fw2bar -p tcp --dport 25 -j ACCEPT
>>
>> I see no sensible way to eliminate the --physdev-out usage in the OUTPUT
>> chain using ebtables/iptables and marking. What am I missing?
> 
> I'm a lazy reader, so I didn't follow this entirely. But:
> 
> "-i <bridge> -o <bridge> "
> 
> implies you're using this for purely bridged traffic. The feature
> we're going to remove only affects locally generated traffic exiting
> on a bridge device, in that case iptables _can't_ know the output
> port.

And that is the case that I'm concerned about.

> But you can do your iptables matching, mark matching packets
> and filter on the mark within ebtables.

I was afraid that's what you were going to suggest. If Shorewall was an
appliance that only supported a limited set of configurations, I could entertain
that approach; as it is, I'm not sure.

I'm going to issue a warning to my users that Shorewall support for
bridge/firewalls may be discontinued in the future. If in the next six months, I
can come up with code that is clean enough to go forward with, I'll rescind the
announcement.

So that I understand the playing field, --physdev-out will no longer be
supported out of the FORWARD and OUTPUT chains (all tables); is that correct?

Thanks,
-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]


* Re: RFC: Disable defered bridge hooks by default
  2006-07-11 20:34       ` Tom Eastep
@ 2006-07-11 21:29         ` Patrick McHardy
  2006-07-12 22:41           ` Tom Eastep
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-11 21:29 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Tom Eastep wrote:
> Patrick McHardy wrote:
> 
>>But you can do your iptables matching, mark matching packets
>>and filter on the mark within ebtables.
> 
> 
> I was afraid that's what you were going to suggest. If Shorewall was an
> appliance that only supported a limited set of configurations, I could entertain
> that approach; as it is, I'm not sure.
> 
> I'm going to issue a warning to my users that Shorewall support for
> bridge/firewalls may be discontinued in the future. If in the next six months, I
> can come up with code that is clean enough to go forward with, I'll rescind the
> announcement.

That sounds overly dramatic to me.

> So that I understand the playing field, --physdev-out will no longer be
> supported out of the FORWARD and OUTPUT chains (all tables); is that correct?

For locally generated traffic (-o br0), yes. This feature is going to
be removed, but I think it might be more useful to gather some data
among your users about who actually needs this. I did some google-research
myself, and I wasn't able to find more than a handful of examples
of people actually using it this way. I would certainly be interested
in this data; if it really is needed by significantly more users
than I thought, I will consider migration strategies more seriously than
before. So far I'm not convinced that this really will pose a problem.


* Re: RFC: Disable defered bridge hooks by default
  2006-07-08  3:01     ` Patrick McHardy
  2006-07-10  9:56       ` Amin Azez
  2006-07-11 20:34       ` Tom Eastep
@ 2006-07-12  6:16       ` Philip Craig
  2006-07-13  0:20         ` Tom Eastep
  2006-07-13  7:46         ` Patrick McHardy
  2 siblings, 2 replies; 39+ messages in thread
From: Philip Craig @ 2006-07-12  6:16 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

Patrick McHardy wrote:
> I'm a lazy reader, so I didn't follow this entirely. But:
> 
> "-i <bridge> -o <bridge> "
> 
> implies you're using this for purely bridged traffic. The feature
> we're going to remove only affects locally generated traffic exiting
> on a bridge device, in that case iptables _can't_ know the output
> port.

It also affects forwarded packets that are routed out the bridge device?
ie anything that is routed out a bridge device, because iptables sees
the packet before bridging does its stuff.  Purely bridged packets are
fine though, because they don't actually leave the bridge device, the
bridging code just calls the netfilter hooks at the appropriate time.

Would it be possible to call the bridging code from the iptables
forward/output hooks, store the result somewhere, and then when the
packet is passed to the bridge device, just use the stored result?
So fix the problem by calling bridging early, instead of deferring
iptables?  I'm not sure if this could handle packets that need to
be sent out multiple ports.

And possibly related, could this be done for arp lookups too, so
that output mac matching is possible?

> But you can do your iptables matching, mark matching packets
> and filter on the mark within ebtables.

I haven't thought about this too much, but for a high-level tool,
the rules can theoretically be too complicated for this to be feasible.
The worst case will be to perform all the possible matches that are
supported by iptables but not ebtables, and encode the result of all
of these into the mark.  And even if this can be optimized, I can
imagine the code to do it being quite complex.


* Re: RFC: Disable defered bridge hooks by default
  2006-07-11 21:29         ` Patrick McHardy
@ 2006-07-12 22:41           ` Tom Eastep
  2006-07-13  7:35             ` Patrick McHardy
  2006-07-13  9:56             ` Amin Azez
  0 siblings, 2 replies; 39+ messages in thread
From: Tom Eastep @ 2006-07-12 22:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 1994 bytes --]

Patrick McHardy wrote:
> Tom Eastep wrote:
>> Patrick McHardy wrote:
>>
>>> But you can do your iptables matching, mark matching packets
>>> and filter on the mark within ebtables.
>>
>> I was afraid that's what you were going to suggest. If Shorewall was an
>> appliance that only supported a limited set of configurations, I could entertain
>> that approach; as it is, I'm not sure.
>>
>> I'm going to issue a warning to my users that Shorewall support for
>> bridge/firewalls may be discontinued in the future. If in the next six months, I
>> can come up with code that is clean enough to go forward with, I'll rescind the
>> announcement.
> 
> That sounds overly dramatic to me.

Probably ... my apologies.

> 
>> So that I understand the playing field, --physdev-out will no longer be
>> supported out of the FORWARD and OUTPUT chains (all tables); is that correct?
> 
> For locally generated traffic (-o br0), yes. This feature is going to
> be removed, but I think it might be more useful to gather some data
> among your users on who actually needs this. I did some google-research
> myself, and I wasn't able to find more than a handful of examples
> of people actually using it this way. I certainly would be interested
> in this data; if it really is needed by a significantly larger number
> than I thought, I will consider migration strategies more seriously than
> before. So far I'm not convinced that this really will pose a problem.

I copied you privately on the survey that I sent out to Shorewall users -- so
far, two out of four responders filter traffic from the firewall to the bridge
based on egress port and three out of four filter outbound bridge traffic routed
from an unbridged interface based on egress port.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-12  6:16       ` Philip Craig
@ 2006-07-13  0:20         ` Tom Eastep
  2006-07-13  0:42           ` David Miller
  2006-07-13  7:31           ` Patrick McHardy
  2006-07-13  7:46         ` Patrick McHardy
  1 sibling, 2 replies; 39+ messages in thread
From: Tom Eastep @ 2006-07-13  0:20 UTC (permalink / raw)
  To: Philip Craig
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 3427 bytes --]

Philip Craig wrote:
> Patrick McHardy wrote:

> 
>> But you can do your iptables matching, mark matching packets
>> and filter on the mark within ebtables.
> 
> I haven't thought about this too much, but for a high-level tool,
> the rules can theoretically be too complicated for this to be feasible.
> The worst case will be to perform all the possible matches that are
> supported by iptables but not ebtables, and encode the result of all
> of these into the mark.  And even if this can be optimized, I can
> imagine the code to do it being quite complex.
> 

The proposed solution requires high-level tools to have two completely
different paradigms for filtering traffic; one for when the output
device is a bridge and the packet is locally-generated (and possibly
when the input device is not the same bridge -- I'm still not clear on
that point) and the other is applied everywhere else. In one case,
filtering rules are contained in the filter table and use standard
filter targets -- in the other case, they must be placed in the mangle
table and use the MARK target. That, in and of itself, is an ugly change
for Shorewall which has a nice uniform structure for generating
filtering rules.

I think that the mark values can be optimized down to two fwmark bits
per Shorewall zone. If the first bit is set, traffic is allowed to the
zone; if the other bit is set, the traffic is not allowed (our users
lose the ability to use the REJECT target since ebtables can only ACCEPT
or DROP). Two separate bits are required so that Shorewall can detect
packets that don't match any rules and hence are subject to the
applicable user-specified policy (MARK isn't terminating).
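
A minimal sketch of the two-bits-per-zone encoding described above, in
Python. The zone names, the bit layout and the function names are
hypothetical and are not taken from Shorewall; checking the deny bit before
the allow bit is also an arbitrary choice made for the illustration.

```python
# Hypothetical sketch: two fwmark bits per zone (an allow bit and a deny bit).
# Zone names and bit positions are assumptions, not Shorewall's actual scheme.
ZONES = ["loc", "dmz", "net"]


def zone_bits(zone):
    """Return (allow_mask, deny_mask) for the given zone."""
    i = ZONES.index(zone)
    return 1 << (2 * i), 1 << (2 * i + 1)


def verdict(mark, zone, policy="DROP"):
    """Decode what a bridge-level filter would do for traffic to `zone`."""
    allow, deny = zone_bits(zone)
    if mark & deny:
        return "DROP"
    if mark & allow:
        return "ACCEPT"
    # Neither bit set: no rule matched, so the zone policy applies.
    return policy
```

Having both bits clear is what lets the unmatched case be told apart from an
explicit deny, which is why a single bit per zone would not be enough.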

In Shorewall, there needs to be a few more fwmark bits reserved to
indicate if and at what level the packet should be logged if it is
dropped and a similar number of bits to indicate if and at what level
the packet should be logged if it is accepted. And finally, a few more
fwmark bits will need to be allocated to indicate the contents of the
log-prefix. From what I gather, ebtables doesn't support ULOG so that
form of logging won't be available.

Shorewall supports 'actions' which are essentially user-defined filter
chains; actions can be invoked in other filtering rules (a --jump to the
action chain results). With --physdev-out limited as proposed, actions
must generate either a filter table chain or a mangle table chain or
both (the mangle table chains must be tailored for each destination zone
in the rules where the action is invoked since the mark values are
dependent on the destination zone in the invoking rule). Shorewall today
needs to tailor the chain based on the invoking context and that part of
the code is complex and error prone.

The original support for bridge-port based filtering was added to
Shorewall in a weekend, including the documentation (I can only work on
Shorewall off hours). To maintain even reduced support for this feature
will require considerably more effort and will result, I fear, in a less
maintainable product.

In summary, I think that the complexity is probably manageable but this
is really ugly ...

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  0:20         ` Tom Eastep
@ 2006-07-13  0:42           ` David Miller
  2006-07-13  0:45             ` Tom Eastep
  2006-07-13  7:31           ` Patrick McHardy
  1 sibling, 1 reply; 39+ messages in thread
From: David Miller @ 2006-07-13  0:42 UTC (permalink / raw)
  To: teastep; +Cc: netfilter-devel, bdschuym, philipc, kaber

From: Tom Eastep <teastep@shorewall.net>
Date: Wed, 12 Jul 2006 17:20:21 -0700

> From what I gather, ebtables doesn't support ULOG so that
> form of logging won't be available.

See net/bridge/netfilter/ebt_ulog.c, it does support this.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  0:42           ` David Miller
@ 2006-07-13  0:45             ` Tom Eastep
  2006-07-13  9:45               ` Amin Azez
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Eastep @ 2006-07-13  0:45 UTC (permalink / raw)
  To: David Miller; +Cc: netfilter-devel, bdschuym, philipc, kaber

[-- Attachment #1: Type: text/plain, Size: 546 bytes --]

David Miller wrote:
> From: Tom Eastep <teastep@shorewall.net>
> Date: Wed, 12 Jul 2006 17:20:21 -0700
> 
>> From what I gather, ebtables doesn't support ULOG so that
>> form of logging won't be available.
> 
> See net/bridge/netfilter/ebt_ulog.c, it does support this.

Thanks, David -- I stand corrected.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  0:20         ` Tom Eastep
  2006-07-13  0:42           ` David Miller
@ 2006-07-13  7:31           ` Patrick McHardy
  1 sibling, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13  7:31 UTC (permalink / raw)
  To: Tom Eastep
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Philip Craig

On Wed, 12 Jul 2006, Tom Eastep wrote:

> Philip Craig wrote:
>> Patrick McHardy wrote:
>
>>
>>> But you can do your iptables matching, mark matching packets
>>> and filter on the mark within ebtables.
>>
>> I haven't thought about this too much, but for a high-level tool,
>> the rules can theoretically be too complicated for this to be feasible.
>> The worst case will be to perform all the possible matches that are
>> supported by iptables but not ebtables, and encode the result of all
>> of these into the mark.  And even if this can be optimized, I can
>> imagine the code to do it being quite complex.


From what I can see, you would need one mark value for each combination
of bridge port and target.
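
As a rough illustration of that count, the sketch below enumerates one mark
value per (bridge port, target) pair. The port names, the target list and
the helper name are assumptions chosen for the example, not part of any
existing tool.

```python
from itertools import product

PORTS = ["eth0", "eth1"]       # hypothetical bridge ports
TARGETS = ["ACCEPT", "DROP"]   # verdicts the bridge-level filter can apply

# Mark 0 is left unused so that an unmarked packet stays distinguishable.
MARKS = {combo: n for n, combo in enumerate(product(PORTS, TARGETS), start=1)}


def bridge_verdict(mark, out_port, default="ACCEPT"):
    """Return the target encoded in `mark` if it applies to `out_port`."""
    for (port, target), value in MARKS.items():
        if value == mark and port == out_port:
            return target
    return default
```

With two ports and two targets this needs four mark values; the number grows
as ports times targets.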

> The proposed solution requires high-level tools to have two completely
> different paradigms for filtering traffic; one for when the output
> device is a bridge and the packet is locally-generated (and possibly
> when the input device is not the same bridge -- I'm still not clear on
> that point) and the other is applied everywhere else. In one case,
> filtering rules are contained in the filter table and use standard
> filter targets -- in the other case, they must be placed in the mangle
> table and use the MARK target. That, in and of itself, is an ugly change
> for Shorewall which has a nice uniform structure for generating
> filtering rules.


True, that isn't very nice. But it's only required for filtering
based on bridge ports, not for the bridge device itself. I'm
not sure if this is possible, but since you're filtering based
on the bridge port, I assume you allow only certain IP addresses
or something like that. Couldn't you just route them directly
to the bridge port instead?

> I think that the mark values can be optimized down to two fwmark bits
> per Shorewall zone. If the first bit is set, traffic is allowed to the
> zone; if the other bit is set, the traffic is not allowed (our users
> lose the ability to use the REJECT target since ebtables can only ACCEPT
> or DROP). Two separate bits are required so that Shorewall can detect
> packets that don't match any rules and hence are subject to the
> applicable user-specified policy (MARK isn't terminating).
>
> In Shorewall, there needs to be a few more fwmark bits reserved to
> indicate if and at what level the packet should be logged if it is
> dropped and a similar number of bits to indicate if and at what level
> the packet should be logged if it is accepted. And finally, a few more
> fwmark bits will need to be allocated to indicate the contents of the
> log-prefix. From what I gather, ebtables doesn't support ULOG so that
> form of logging won't be available.
>
> Shorewall supports 'actions' which are essentially user-defined filter
> chains; actions can be invoked in other filtering rules (a --jump to the
> action chain results). With --physdev-out limited as proposed, actions
> must generate either a filter table chain or a mangle table chain or
> both (the mangle table chains must be tailored for each destination zone
> in the rules where the action is invoked since the mark values are
> dependent on the destination zone in the invoking rule). Shorewall today
> needs to tailor the chain based on the invoking context and that part of
> the code is complex and error prone.
>
> The original support for bridge-port based filtering was added to
> Shorewall in a weekend, including the documentation (I can only work on
> Shorewall off hours). To maintain even reduced support for this feature
> will require considerably more effort and will result, I fear, in a less
> maintainable product.
>
> In summary, I think that the complexity is probably manageable but this
> is really ugly ...


It's getting harder, no doubt about that. But if you check bugzilla and
the lists, we had tons of reports of other things this breaks, and I
lean towards having consistent behaviour instead of a rarely used
feature. The simple fact is that the output bridge port is outside of
the scope of what iptables can know for routed traffic, and this feature
should never have been introduced in the first place (in which case
I guess nobody would be complaining). I'm open to suggestions on how to
make it easier for users to get rid of this, but we still need to
get rid of it.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-12 22:41           ` Tom Eastep
@ 2006-07-13  7:35             ` Patrick McHardy
  2006-07-13 14:11               ` Tom Eastep
  2006-07-13  9:56             ` Amin Azez
  1 sibling, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13  7:35 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

On Wed, 12 Jul 2006, Tom Eastep wrote:

> Patrick McHardy wrote:
>>
>>> So that I understand the playing field, --physdev-out will no longer be
>>> supported out of the FORWARD and OUTPUT chains (all tables); is that correct?
>>
>> For locally generated traffic (-o br0), yes. This feature is going to
>> be removed, but I think it might be more useful to gather some data
>> among your users on who actually needs this. I did some google-research
>> myself, and I wasn't able to find more than a handful of examples
>> of people actually using it this way. I certainly would be interested
>> in this data; if it really is needed by a significantly larger number
>> than I thought, I will consider migration strategies more seriously than
>> before. So far I'm not convinced that this really will pose a problem.
>
> I copied you privately on the survey that I sent out to Shorewall users -- so
> far, two out of four responders filter traffic from the firewall to the bridge
> based on egress port and three out of four filter outbound bridge traffic routed
> from an unbridged interface based on egress port.


Thanks. I'm currently travelling; I'll look at this on Sunday when I get
back home (I haven't received it yet, I think). Do you know which criteria
they filter on in addition to the bridge port?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-12  6:16       ` Philip Craig
  2006-07-13  0:20         ` Tom Eastep
@ 2006-07-13  7:46         ` Patrick McHardy
  2006-07-13  8:12           ` Philip Craig
  2006-07-13 14:11           ` Amin Azez
  1 sibling, 2 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13  7:46 UTC (permalink / raw)
  To: Philip Craig
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

On Wed, 12 Jul 2006, Philip Craig wrote:

> Patrick McHardy wrote:
>> I'm a lazy reader, so I didn't follow this entirely. But:
>>
>> "-i <bridge> -o <bridge> "
>>
>> implies you're using this for purely bridged traffic. The feature
>> we're going to remove only affects locally generated traffic exiting
>> on a bridge device, in that case iptables _can't_ know the output
>> port.
>
> It also affects forwarded packets that are routed out the bridge device?
> ie anything that is routed out a bridge device, because iptables sees
> the packet before bridging does its stuff.  Purely bridged packets are
> fine though, because they don't actually leave the bridge device, the
> bridging code just calls the netfilter hooks at the appropriate time.

Exactly.

> Would it be possible to call the bridging code from the iptables
> forward/output hooks, store the result somewhere, and then when the
> packet is passed to the bridge device, just use the stored result?
> So fix the problem by calling bridging early, instead of deferring
> iptables?  I'm not sure if this could handle packets that need to
> be sent out multiple ports.

No, it couldn't. It's questionable behaviour to do this anyway; think
about something you REJECT without looking at the output port - you
will reject once for each output port. The entire idea of using
iptables from the bridge code seems flawed to me. Conntrack might
also get very confused under certain circumstances.

> And possibly related, could this be done for arp lookups too, so
> that output mac matching is possible?

Theoretically yes, but I'm trying to get rid of a problem here,
not introduce new ones :)

>> But you can do your iptables matching, mark matching packets
>> and filter on the mark within ebtables.
>
> I haven't thought about this too much, but for a high-level tool,
> the rules can theoretically be too complicated for this to be feasible.
> The worst case will be to perform all the possible matches that are
> supported by iptables but not ebtables, and encode the result of all
> of these into the mark.  And even if this can be optimized, I can
> imagine the code to do it being quite complex.


It's a bit easier, but I don't see that as a real problem; it's a
high-level tool after all, and it's OK if things get a bit complicated.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  7:46         ` Patrick McHardy
@ 2006-07-13  8:12           ` Philip Craig
  2006-07-13  8:36             ` Patrick McHardy
  2006-07-13 14:11           ` Amin Azez
  1 sibling, 1 reply; 39+ messages in thread
From: Philip Craig @ 2006-07-13  8:12 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

Patrick McHardy wrote:
> On Wed, 12 Jul 2006, Philip Craig wrote:
>
>> Patrick McHardy wrote:
>> Would it be possible to call the bridging code from the iptables
>> forward/output hooks, store the result somewhere, and then when the
>> packet is passed to the bridge device, just use the stored result?
>> So fix the problem by calling bridging early, instead of deferring
>> iptables?  I'm not sure if this could handle packets that need to
>> be sent out multiple ports.
>
> No, it couldn't. It's questionable behaviour to do this anyway; think
> about something you REJECT without looking at the output port - you
> will reject once for each output port. The entire idea of using
> iptables from the bridge code seems flawed to me. Conntrack might
> also get very confused under certain circumstances.

These are also problems for purely bridged packets.  br_flood calls
__br_forward for every port, which calls the bridge forward hook, and
that passes the packet on to the ipv4 forward hook.  Should this be
changed so that the ipv4 forward hook is called earlier, and filtering
on output port is never possible?
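
A simplified Python model of the call chain described above, to show why a
flooded packet meets the IPv4 forward hook once per port. This is not kernel
code: the function names only echo the ones mentioned in this message, and
the bodies, arguments and the always-REJECT rule are assumptions.

```python
# Simplified model, not kernel code. Bodies and arguments are illustrative.
def ipv4_forward_hook(packet, out_port, log):
    # A rule that rejects without looking at the output port.
    log.append(("REJECT", packet, out_port))
    return False  # the packet is not forwarded


def br_forward(packet, out_port, log):
    # The bridge forward hook passes the packet on to the IPv4 forward hook.
    return ipv4_forward_hook(packet, out_port, log)


def br_flood(packet, ports, in_port, log):
    # Flooding visits every port except the one the packet arrived on.
    return [p for p in ports if p != in_port and br_forward(packet, p, log)]
```

Flooding one packet over a three-port bridge leaves two REJECT entries in
the log, one per candidate output port.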

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  8:12           ` Philip Craig
@ 2006-07-13  8:36             ` Patrick McHardy
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13  8:36 UTC (permalink / raw)
  To: Philip Craig
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

On Thu, 13 Jul 2006, Philip Craig wrote:

> Patrick McHardy wrote:
>> On Wed, 12 Jul 2006, Philip Craig wrote:
>>
>>> Patrick McHardy wrote:
>>> Would it be possible to call the bridging code from the iptables
>>> forward/output hooks, store the result somewhere, and then when the
>>> packet is passed to the bridge device, just use the stored result?
>>> So fix the problem by calling bridging early, instead of deferring
>>> iptables?  I'm not sure if this could handle packets that need to
>>> be sent out multiple ports.
>>
>> No, it couldn't. It's questionable behaviour to do this anyway; think
>> about something you REJECT without looking at the output port - you
>> will reject once for each output port. The entire idea of using
>> iptables from the bridge code seems flawed to me. Conntrack might
>> also get very confused under certain circumstances.
>
> These are also problems for purely bridged packets.  br_flood calls
> __br_forward for every port, which calls the bridge forward hook, and
> that passes the packet on to the ipv4 forward hook.  Should this be
> changed so that the ipv4 forward hook is called earlier, and filtering
> on output port is never possible?

As I already mentioned in this thread, I believe the concept of calling
iptables from bridging is flawed. We have ebtables to do filtering within
a bridge. But this is getting slightly off-topic.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  0:45             ` Tom Eastep
@ 2006-07-13  9:45               ` Amin Azez
  0 siblings, 0 replies; 39+ messages in thread
From: Amin Azez @ 2006-07-13  9:45 UTC (permalink / raw)
  To: netfilter-devel; +Cc: netfilter-devel, bdschuym, philipc, kaber

Tom Eastep wrote:
> David Miller wrote:
>> From: Tom Eastep <teastep@shorewall.net>
>> Date: Wed, 12 Jul 2006 17:20:21 -0700
>>
>>> From what I gather, ebtables doesn't support ULOG so that
>>> form of logging won't be available.
>> See net/bridge/netfilter/ebt_ulog.c, it does support this.
> 
> Thanks, David -- I stand corrected.


Beware, though: ebtables ulog and iptables ulog have a different
netlink packet structure, so make sure you use the right #include when
compiling userspace.


Sam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-12 22:41           ` Tom Eastep
  2006-07-13  7:35             ` Patrick McHardy
@ 2006-07-13  9:56             ` Amin Azez
  1 sibling, 0 replies; 39+ messages in thread
From: Amin Azez @ 2006-07-13  9:56 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Tom Eastep wrote:
> I copied you privately on the survey that I sent out to Shorewall users -- so
> far, two out of four responders filter traffic from the firewall to the bridge
> based on egress port and three out of four filter outbound bridge traffic routed
> from an unbridged interface based on egress port.


I've been thinking how similar the problems of apache rewrite rules are
to the problems of iptables we discuss here.

Apache folk ended up doing a fake "look-ahead" to show how a request
would be handled and so how some of the request fields that are not set
WOULD be set. In apache such look-ahead could cause recursion but
request "notes" (like iptables marks) or other mechanisms are used so
that rewrite matches can tell when they are operating under
look-ahead and stop being clever in those cases.

It seems like some iptables rules want to say: If the packet WOULD have
been handled thus, then handle it like this instead. Of course, when
there is more than one type of rule doing this, finding out how a packet
"WOULD" have been routed can be "interesting".

Maybe we can fix up bridging behaviour to follow routing rules.
What if bridged packets could be routed to the target ip and mac and
bridge, as if it were an existing route in the route table? This perhaps
would involve mild freakery in the routing code. From an external view
it would still look like bridging? (I want a conntrack to be able to
cache a route anyway so that a route won't change on established
connections even if the routing results would be different);

The odds of Patrick slapping his head with exasperation at my imperfect
understanding are quite high, I admit. (Perhaps I'm sidestepping too
many other important bridging features.)

Sam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  7:35             ` Patrick McHardy
@ 2006-07-13 14:11               ` Tom Eastep
  2006-07-13 14:45                 ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Eastep @ 2006-07-13 14:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 575 bytes --]

Patrick McHardy wrote:

> 
> 
> Thanks. I'm currently travelling, I'll look at this on Sunday when I get
> back home (haven't received it yet I think). Do you know on which criteria
> they filter in addition to the bridge port?
> 

Typically, they will also filter on protocol and the destination port (if the
protocol is TCP or UDP).

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 254 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13  7:46         ` Patrick McHardy
  2006-07-13  8:12           ` Philip Craig
@ 2006-07-13 14:11           ` Amin Azez
  2006-07-13 14:50             ` Patrick McHardy
  1 sibling, 1 reply; 39+ messages in thread
From: Amin Azez @ 2006-07-13 14:11 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

* Patrick McHardy wrote, On 13/07/06 08:46:

>> Would it be possible to call the bridging code from the iptables
>> forward/output hooks, store the result somewhere, and then when the
>> packet is passed to the bridge device, just use the stored result?
>> So fix the problem by calling bridging early, instead of deferring
>> iptables?  I'm not sure if this could handle packets that need to
>> be sent out multiple ports.
> 
> No, it couldn't. It's questionable behaviour to do this anyway; think
> about something you REJECT without looking at the output port - you
> will reject once for each output port. The entire idea of using
> iptables from the bridge code seems flawed to me. Conntrack might
> also get very confused under certain circumstances.
> 
>> And possibly related, could this be done for arp lookups too, so
>> that output mac matching is possible?
> 
> Theoretically yes, but I'm trying to get rid of a problem here,
> not introduce new ones :)

The difficulty I am now trying to solve is that of SNAT-ing bridged
traffic depending on the output port.

Bridging is really layer2 routing and so it makes a lot of sense to
handle it like this.

I'm currently looking at adding a bridge look-ahead target (similar to
Philip's suggestion) to see where a packet is likely to be bridged, so I
can use that. It probably wants combining with physdev so that physdev
can use whichever strategy is suited, whether the packet is routed or to
be bridged.

Sam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 14:11               ` Tom Eastep
@ 2006-07-13 14:45                 ` Patrick McHardy
  2006-07-13 15:31                   ` Tom Eastep
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13 14:45 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer


On Thu, 13 Jul 2006, Tom Eastep wrote:

> Patrick McHardy wrote:
>
>>
>> Thanks. I'm currently travelling, I'll look at this on Sunday when I get
>> back home (haven't received it yet I think). Do you know on which criteria
>> they filter in addition to the bridge port?
>>
>
> Typically, they will also filter on protocol and the destination port (if the
> protocol is TCP or UDP).


And they really differentiate between different bridge ports for this, i.e.
port eth0 may receive packets on port 80, port eth1 may not?

I don't really see why this can't be done purely within ebtables; it too
can filter based on protocol and port numbers. Do you also know of
examples where it's really necessary to filter on bridge port and use
iptables' capabilities?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 14:11           ` Amin Azez
@ 2006-07-13 14:50             ` Patrick McHardy
  2006-07-13 15:29               ` Amin Azez
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13 14:50 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

On Thu, 13 Jul 2006, Amin Azez wrote:

> * Patrick McHardy wrote, On 13/07/06 08:46:
>
>>> And possibly related, could this be done for arp lookups too, so
>>> that output mac matching is possible?
>>
>> Theoretically yes, but I'm trying to get rid of a problem here,
>> not introduce new ones :)
>
> The difficulty I am now trying to solve is that of SNAT-ing bridged
> traffic depending on the output port.
>
> Bridging is really layer2 routing and so it makes a lot of sense to
> handle it like this.

And what will the usage scenario of this be?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 14:50             ` Patrick McHardy
@ 2006-07-13 15:29               ` Amin Azez
  2006-07-19 16:36                 ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: Amin Azez @ 2006-07-13 15:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

* Patrick McHardy wrote, On 13/07/06 15:50:
> On Thu, 13 Jul 2006, Amin Azez wrote:
> 
>> * Patrick McHardy wrote, On 13/07/06 08:46:
>>
>>>> And possibly related, could this be done for arp lookups too, so
>>>> that output mac matching is possible?
>>>
>>> Theoretically yes, but I'm trying to get rid of a problem here,
>>> not introduce new ones :)
>>
>> The difficulty I am now trying to solve is that of SNAT-ing bridged
>> traffic depending on the output port.
>>
>> Bridging is really layer2 routing and so it makes a lot of sense to
>> handle it like this.
> 
> And what will the usage scenario of this be?

Intercepting some traffic types and sending them via alternate gateways,
(i.e. VPN gateways if the VPN is up) for errr... people who don't want
to change the network default gateway for the whole network; also, for
managing certain services for multi-subnet'd segments. I guess it's part
of the fashion to do things in layer2 these days as it involves less
infrastructure change. For them.

I think Philip was right and my slight suggestion is wrong; I can't
"look ahead", as ebtables would need to process the look-ahead too,
influencing it (as well as counting the look-ahead packets).

I think the answer is cracking open the bridge code to fit the same model
as the IP routing code, so that it works out how to bridge the packet at
just the same point the routing code calculates a route and then an output
device, but saves the answer for later, when the final hooks have been
traversed.
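
A minimal Python sketch of that idea, under heavy assumptions: the Packet
type, the dictionary standing in for the bridge forwarding table, and the
two stage functions are all hypothetical, and none of this reflects how the
bridge code is actually structured.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dst_mac: str
    out_ports: list = field(default_factory=list)  # stored bridging decision


def bridge_lookup(fdb, ports, dst_mac):
    """Known MAC gives one port; an unknown MAC means flooding to all ports."""
    port = fdb.get(dst_mac)
    return [port] if port is not None else list(ports)


def route_stage(pkt, fdb, ports):
    # Decide the egress ports at the point where a route would be chosen.
    pkt.out_ports = bridge_lookup(fdb, ports, pkt.dst_mac)


def transmit_stage(pkt):
    # Later, once the final hooks have run, reuse the stored answer.
    return pkt.out_ports
```

The flooding case is where this gets awkward: the stored answer is a list of
ports, so any match on "the" output port has more than one value to consider.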

I think if ebtables becomes the answer it will just grow to be as big as
iptables with all the extra matches iptables has, just as iptables has
extra -p tcp and -p udp matches.

Sam

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 14:45                 ` Patrick McHardy
@ 2006-07-13 15:31                   ` Tom Eastep
  2006-07-15 14:32                     ` Tom Eastep
  2006-07-19 14:21                     ` Patrick McHardy
  0 siblings, 2 replies; 39+ messages in thread
From: Tom Eastep @ 2006-07-13 15:31 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

Patrick McHardy wrote:
> 
> On Thu, 13 Jul 2006, Tom Eastep wrote:
> 
>> Patrick McHardy wrote:
>>
>>>
>>> Thanks. I'm currently travelling, I'll look at this on Sunday when I get
>>> back home (haven't received it yet I think). Do you know on which
>>> criteria
>>> they filter in addition to the bridge port?
>>>
>>
>> Typically, they will also filter on protocol and the destination port
>> (if the
>> protocol is TCP or UDP).
> 
> 
> And they really differentiate between different bridge ports for this, i.e.
> port eth0 may receive packets on port 80, port eth1 may not?

That's correct.

> 
> I don't really see why this can't be done purely within ebtables; it too
> can filter based on protocol and port numbers. Do you also know of
> examples where it's really necessary to filter on bridge port and use
> iptables' capabilities?
> 

In Shorewall, filtering on bridge port is usually done *first* to select the
appropriate rule chain. In that rule chain, the user has access to more or less
the full box of iptables tools (at least those that are supported by kernels
from kernel.org). To what extent those tools are actually used, I don't know.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key



* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 15:31                   ` Tom Eastep
@ 2006-07-15 14:32                     ` Tom Eastep
  2006-07-19 14:21                     ` Patrick McHardy
  1 sibling, 0 replies; 39+ messages in thread
From: Tom Eastep @ 2006-07-15 14:32 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

On Thu, 2006-07-13 at 08:31 -0700, Tom Eastep wrote:

> 
> In Shorewall, filtering on bridge port is usually done *first* to select the
> appropriate rule chain. In that rule chain, the user has access to more or less
> the full box of iptables tools (at least those that are supported by kernels
> from kernel.org). To what extent those tools are actually used, I don't know.
> 

From my quick survey, most Shorewall bridge users are filtering OUTPUT
traffic and/or routed traffic based on egress port; where additional
filtering is being used, it is most commonly based on source IP,
protocol and destination port. One respondent is also using ipt_recent.

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key


* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 15:31                   ` Tom Eastep
  2006-07-15 14:32                     ` Tom Eastep
@ 2006-07-19 14:21                     ` Patrick McHardy
  2006-07-19 15:50                       ` Tom Eastep
  1 sibling, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-19 14:21 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Tom Eastep wrote:
> Patrick McHardy wrote:
> 
>>I don't really see why this can't be done purely within ebtables, it too
>>can filter based on protocol and port numbers. Do you also know of
>>examples where it's really necessary to filter on bridge port and use
>>iptables' capabilities?
>>
> 
> 
> In Shorewall, filtering on bridge port is usually done *first* to select the
> appropriate rule chain. In that rule chain, the user has access to more or less
> the full box of iptables tools (at least those that are supported by kernels
> from kernel.org). To what extent those tools are actually used, I don't know.

Is there anything preventing your users from routing outgoing
packets to the bridge ports directly? I assume if they use IP/port
filters they should already have a pretty good idea of what's located
behind a bridge port and don't really need the bridge to route the
packets.


* Re: RFC: Disable defered bridge hooks by default
  2006-07-19 14:21                     ` Patrick McHardy
@ 2006-07-19 15:50                       ` Tom Eastep
  2006-07-19 16:02                         ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: Tom Eastep @ 2006-07-19 15:50 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 2146 bytes --]

On Wed, 2006-07-19 at 16:21 +0200, Patrick McHardy wrote:
> Tom Eastep wrote:
> > Patrick McHardy wrote:
> > 
> >>I don't really see why this can't be done purely within ebtables, it too
> >>can filter based on protocol and port numbers. Do you also know of
> >>examples where it's really necessary to filter on bridge port and use
> >>iptables' capabilities?
> >>
> > 
> > 
> > In Shorewall, filtering on bridge port is usually done *first* to select the
> > appropriate rule chain. In that rule chain, the user has access to more or less
> > the full box of iptables tools (at least those that are supported by kernels
> > from kernel.org). To what extent those tools are actually used, I don't know.
> 
> Is there anything preventing your users from routing outgoing
> packets to the bridge ports directly? I assume if they use IP/port
> filters they should already have a pretty good idea of what's located
> behind a bridge port and don't really need the bridge to route the
> packets.
> 

It is possible to define a Shorewall zone as a set of (interface,
network) pairs (where 'network' may be a single host) rather than a set
of (bridge, port) pairs. So instead of defining a bridged zone as 'all
hosts whose traffic goes in/out port P on bridge B', the user can define
the same zone as (B,net1),(B,net2),... where net1,net2,... defines the
same set of hosts (extra care must be taken when a default route goes
through a bridge port)

While this alternative definition is more cumbersome and error-prone for
the user to configure and maintain, it removes the need for filtering by
bridge port entirely. That approach is probably what I will recommend to
current users when --physdev-out is no longer available.

Shorewall also supports defining a zone as a set of (interface, ipset)
pairs which is a workable alternative for those users who have installed
ipset support.
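
To make the alternative zone definition concrete, here is a minimal sketch of
membership testing against a list of (interface, network) pairs. It is only an
illustration of the idea, not Shorewall code: the struct, the function names
and the host-byte-order address representation are all assumptions made for
the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical zone entry: an interface name plus an IPv4 network. */
struct zone_entry {
    const char *iface;   /* e.g. the bridge device itself, "br0" */
    uint32_t    net;     /* network address, host byte order */
    unsigned    prefix;  /* prefix length, expected to be 0..32 */
};

/* Build a netmask from a prefix length without ever shifting by 32. */
static uint32_t prefix_mask(unsigned prefix)
{
    return prefix == 0 ? 0 : 0xFFFFFFFFu << (32 - prefix);
}

/* True if (iface, addr) falls inside any (interface, network) pair. */
static bool zone_contains(const struct zone_entry *zone, size_t n,
                          const char *iface, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(zone[i].prefix);

        if (strcmp(zone[i].iface, iface) == 0 &&
            (addr & mask) == (zone[i].net & mask))
            return true;
    }
    return false;
}
```

With a zone of {("br0", 192.168.1.0/24), ("br0", 10.0.0.5/32)}, a packet for
192.168.1.66 on br0 is in the zone without anyone looking at the bridge port.
The cost is the one mentioned above: the list has to be kept in sync with
where the hosts really are.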

-Tom
-- 
Tom Eastep    \ Nothing is foolproof to a sufficiently talented fool
Shoreline,     \ http://shorewall.net
Washington USA  \ teastep@shorewall.net
PGP Public Key   \ https://lists.shorewall.net/teastep.pgp.key


* Re: RFC: Disable defered bridge hooks by default
  2006-07-19 15:50                       ` Tom Eastep
@ 2006-07-19 16:02                         ` Patrick McHardy
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-19 16:02 UTC (permalink / raw)
  To: Tom Eastep; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

Tom Eastep wrote:
> On Wed, 2006-07-19 at 16:21 +0200, Patrick McHardy wrote:
> 
>>Is there anything preventing your users from routing outgoing
>>packets to the bridge ports directly? I assume if they use IP/port
>>filters they should already have a pretty good idea of what's located
>>behind a bridge port and don't really need the bridge to route the
>>packets.
>>
> 
> 
> It is possible to define a Shorewall zone as a set of (interface,
> network) pairs (where 'network' may be a single host) rather than a set
> of (bridge, port) pairs. So instead of defining a bridged zone as 'all
> hosts whose traffic goes in/out port P on bridge B', the user can define
> the same zone as (B,net1),(B,net2),... where net1,net2,... defines the
> same set of hosts (extra care must be taken when a default route goes
> through a bridge port)
> 
> While this alternative definition is more cumbersome and error-prone for
> the user to configure and maintain, it removes the need for filtering by
> bridge port entirely. That approach is probably what I will recommend to
> current users when --physdev-out is no longer available.

That should work if the location of the hosts is already known. But in
that case I still wonder why route to the bridge at all .. the ports
should work as well as long as you know where the individual hosts are
located.

Anyway, glad you found a viable alternative for shorewall.


* Re: RFC: Disable defered bridge hooks by default
  2006-07-13 15:29               ` Amin Azez
@ 2006-07-19 16:36                 ` Patrick McHardy
       [not found]                   ` <44BE624E.5080307@ufomechanic.net>
  0 siblings, 1 reply; 39+ messages in thread
From: Patrick McHardy @ 2006-07-19 16:36 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

Amin Azez wrote:
> Intercepting some traffic types and sending them via alternate gateways,
> (i.e. VPN gateways if the VPN is up) for errr... people who don't want
> to change the network default gateway for the whole network; also, for
> managing certain services for multi-subnet'd segments. I guess it's part
> of the fashion to do things in layer2 these days as it involves less
> infrastructure change. For them.
> 
> I think Philip was right and my slight suggestion is wrong; I can't
> "look ahead" as ebtables would need to process the "look ahead" also,
> and influencing the look ahead (as well as counting the look-ahead packets)
> 
> I think cracking open the bridge code to fit the same model as the ip
> routing code, so it works out how to bridge it at just the same point
> the routing code calculates a route and then output device; but saves
> the answer for later when the final hooks have been traversed.

The routing code shouldn't know anything about the bridge FDB.

> I think if ebtables becomes the answer it will just grow to be as big as
> iptables with all the extra matches iptables has, just as iptables has
> extra -p tcp and -p udp matches.

That's where MARK will be useful, but only if the target is available
in ebtables. SNAT based on bridge port won't be possible anymore.


* Re: RFC: Disable defered bridge hooks by default
       [not found]                   ` <44BE624E.5080307@ufomechanic.net>
@ 2006-07-19 17:15                     ` Patrick McHardy
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-19 17:15 UTC (permalink / raw)
  To: Amin Azez; +Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep

Amin Azez wrote:
> * Patrick McHardy wrote, On 19/07/06 17:36:
> 
>>The routing code shouldn't know anything about the bridge FDB.
>>  
> 
> True, I only suggest that the bridge work out what it is going to do
> with the packet just after routing (if any) but before post-routing. I
> can't think of much else that will happen to the packets between
> postrouting and bridge output that would spoil this.
> 
> I wasn't suggesting that bridging and routing modules should peer over
> each other's shoulders, but rather than defer the hooks we bring forward
> the bridge decision.


But at that point again you can't even be sure if the packet will
finally exit a bridge device. TC action could direct the packet
to a different device from the bridge output queue. There is no
clean solution for this which integrates well with all the other
things Linux networking can do.

>>>I think if ebtables becomes the answer it will just grow to be as big as
>>>iptables with all the extra matches iptables has, just as iptables has
>>>extra -p tcp and -p udp matches.
>>>    
>>
>>That's where MARK will be useful, but only if the target is available
>>in ebtables. SNAT based on bridge port won't be possible anymore.
>>  
> 
> Yes but the ebtables could get as big as a Turing machine as the possible
> ebtables actions end up getting encoded into the mark; with such side
> deferred effects as:
> * set dscp to various values
> * set mark (for tc)
> * drop
> 
> all multiplied by the number of output ports, so that these actions can
> be taken based on the output port.


Worst case will need some space, but common sense tells me that this is
not what users are doing (DSCP based on bridge output port combined with
criteria only usable within iptables? What for?). Marking for tc can
be done within iptables, and ebtables just needs to encode the output
port in the mark if you really need it (why would you? you have a
separate device, the bridge port, to do QoS on).
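
As a rough sketch of what "encode the output port in the mark" could look
like, assume the 32-bit mark is split so that the top 8 bits carry a bridge
port index and the low 24 bits stay free for whatever iptables put there for
tc. The split, the macro names and the helpers are assumptions made for this
example, not anything ebtables or the kernel defines.

```c
#include <stdint.h>

/* Assumed layout: bits 31..24 = bridge port index, bits 23..0 = tc class. */
#define PORT_SHIFT 24u
#define PORT_MASK  0xFF000000u

/* Store a port index in the mark without disturbing the low 24 bits. */
static uint32_t mark_set_port(uint32_t mark, uint8_t port)
{
    return (mark & ~PORT_MASK) | ((uint32_t)port << PORT_SHIFT);
}

/* Recover the port index written by mark_set_port(). */
static uint8_t mark_get_port(uint32_t mark)
{
    return (uint8_t)((mark & PORT_MASK) >> PORT_SHIFT);
}

/* Recover the part of the mark that was left for classification. */
static uint32_t mark_get_class(uint32_t mark)
{
    return mark & ~PORT_MASK;
}
```

The space needed grows with the number of ports, not with the number of
(action, port) combinations, because the action itself stays encoded in the
low bits by whichever tool decided it.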

Anyway, I'll submit the patch soon, which gives us at least 6 more months
to work this out.


* Re:  RFC: Disable defered bridge hooks by default
  2006-07-13 12:56 bdschuym@pandora.be
@ 2006-07-13 14:38 ` Patrick McHardy
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-13 14:38 UTC (permalink / raw)
  To: bdschuym@pandora.be
  Cc: Netfilter Development Mailinglist, Bart De Schuymer,
	Philip Craig, Tom Eastep

On Thu, 13 Jul 2006, bdschuym@pandora.be wrote:

>> From: Patrick McHardy [mailto:kaber@trash.net]
>>
>> As I already mentioned in this thread, I believe the concept of calling
>> iptables from bridging is flawed. We have ebtables to do filtering within
>> a bridge. But this is getting slightly off-topic.
>
> ebtables can't do things like stateful IP firewalling. IMO, things like
> transparent stateful firewalling are way too cool to just throw out of Linux
> because it troubles iptables somewhat in special cases.


Sure, I didn't suggest this. But the fact is that it doesn't work very
well together. Just using ip_conntrack (without NAT) without problems
should be relatively easy, it's NAT and iptables that create problems.
Saying the problem cases are "special" (because they create problems)
does not help, users expect them to work because the way it presents
itself suggests they would work. But don't fear, I don't have any
further plans besides getting rid of the deferred hooks.


* Re:  RFC: Disable defered bridge hooks by default
@ 2006-07-13 12:56 bdschuym@pandora.be
  2006-07-13 14:38 ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: bdschuym@pandora.be @ 2006-07-13 12:56 UTC (permalink / raw)
  To: Patrick McHardy, Philip Craig
  Cc: Netfilter Development Mailinglist, Bart De Schuymer, Tom Eastep


>----- Original message -----
>From: Patrick McHardy [mailto:kaber@trash.net]
>Sent: Thursday, July 13, 2006 10:36 AM
>To: 'Philip Craig'
>CC: 'Netfilter Development Mailinglist', 'Bart De Schuymer', 'Tom Eastep'
>Subject: Re: RFC: Disable defered bridge hooks by default
>
>On Thu, 13 Jul 2006, Philip Craig wrote:
>
>> Patrick McHardy wrote:
>>> On Wed, 12 Jul 2006, Philip Craig wrote:
>>>
>>>> Patrick McHardy wrote:
>>>> Would it be possible to call the bridging code from the iptables
>>>> forward/output hooks, store the result somewhere, and then when the
>>>> packet is passed to the bridge device, just use the stored result?
>>>> So fix the problem by calling bridging early, instead of deferring
>>>> iptables?  I'm not sure if this could handle packets that need to
>>>> be sent out multiple ports.
>>>
>>> No, it couldn't. It's questionable behaviour to do this anyway, think
>>> about something you REJECT without looking at the output port - you
>>> will reject once for each output port. The entire idea of using
>>> iptables from the bridge code seems flawed to me. Conntrack might
>>> also get very confused under certain circumstances.
>>
>> These are also problems for purely bridged packets.  br_flood calls
>> __br_forward for every port, which calls the bridge forward hook, and
>> that passes the packet on to the ipv4 forward hook.  Should this be
>> changed so that the ipv4 forward hook is called earlier, and filtering
>> on output port is never possible?
>
>As I already mentioned in this thread, I believe the concept of calling
>iptables from bridging is flawed. We have ebtables to do filtering within
>a bridge. But this is getting slightly off-topic.

ebtables can't do things like stateful IP firewalling. IMO, things like 
transparent stateful firewalling are way too cool to just throw out of Linux 
because it troubles iptables somewhat in special cases.

cheers,
Bart


* Re: RFC: Disable defered bridge hooks by default
  2006-07-07 10:17 bdschuym@pandora.be
@ 2006-07-07 10:24 ` Patrick McHardy
  0 siblings, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-07 10:24 UTC (permalink / raw)
  To: bdschuym@pandora.be; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

bdschuym@pandora.be wrote:
>>----- Original message -----
>>From: Patrick McHardy [mailto:kaber@trash.net]
>>
>>This is the latest version. Fixed the crash and added a
>>runtime warning to the physdev match and on-the-fly activation
>>when needed, so we don't need the sysctls. Works fine in
>>a few simple tests, if there are no further objections
>>I'll submit this version.
> 
> 
> I'd add something like the code below on top of your patch. That way no false warnings are given if the user specifies she only wants to look at bridged packets.
> 
> --- net/netfilter/xt_physdev.c.old	2006-07-07 12:11:44.000000000 +0200
> +++ net/netfilter/xt_physdev.c	2006-07-07 12:14:14.000000000 +0200
> @@ -114,7 +114,9 @@ checkentry(const char *tablename,
>  	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
>  		return 0;
>  	if (brnf_deferred_hooks == 0 && info->bitmask & XT_PHYSDEV_OP_OUT &&
> -	    hook_mask & ((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD))) {
> +	    hook_mask & ((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD)) &&
> +	    (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
> +	       info->invert & XT_PHYSDEV_OP_BRIDGED)) {
>  		printk(KERN_WARNING "physdev match: using --physdev-out in the "
>  		       "OUTPUT or FORWARD chain is deprecated and breaks other "
>  		       "things, it will be removed in January 2007. See "

Right, I forgot about that, will add and also update the text.
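
For reference, the extended condition in the quoted patch can be read as a
small predicate. The sketch below restates it outside the kernel; the DEMO_*
constants are stand-ins chosen for illustration (the real values come from the
xt_physdev and netfilter headers), and the check on brnf_deferred_hooks itself
is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only, not taken from the headers. */
#define DEMO_OP_OUT          0x02u
#define DEMO_OP_BRIDGED      0x04u
#define DEMO_HOOK_FORWARD    2
#define DEMO_HOOK_LOCAL_OUT  3

/*
 * True when a physdev rule should trigger the warning and switch the
 * deferred hooks on: it matches on the output port, it sits in OUTPUT
 * or FORWARD, and it is not restricted to bridged packets.
 */
static bool needs_deferred_hooks(uint8_t bitmask, uint8_t invert,
                                 unsigned hook_mask)
{
    bool uses_out = bitmask & DEMO_OP_OUT;
    bool out_or_fwd = hook_mask & ((1u << DEMO_HOOK_LOCAL_OUT) |
                                   (1u << DEMO_HOOK_FORWARD));
    /* Bridged-only means the option is set and not inverted. */
    bool bridged_only = (bitmask & DEMO_OP_BRIDGED) &&
                        !(invert & DEMO_OP_BRIDGED);

    return uses_out && out_or_fwd && !bridged_only;
}
```

A rule combining the output-port match with a non-inverted bridged-only match
returns false here, which is the "no false warnings" case from the patch; the
same rule with the bridged-only match inverted returns true again.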


* Re:  RFC: Disable defered bridge hooks by default
@ 2006-07-07 10:17 bdschuym@pandora.be
  2006-07-07 10:24 ` Patrick McHardy
  0 siblings, 1 reply; 39+ messages in thread
From: bdschuym@pandora.be @ 2006-07-07 10:17 UTC (permalink / raw)
  To: Patrick McHardy, bdschuym@pandora.be
  Cc: Netfilter Development Mailinglist, Bart De Schuymer


>----- Original message -----
>From: Patrick McHardy [mailto:kaber@trash.net]
>Sent: Friday, July 7, 2006 05:37 AM
>To: bdschuym@pandora.be
>CC: 'Bart De Schuymer', 'Netfilter Development Mailinglist'
>Subject: Re: RFC: Disable defered bridge hooks by default
>
>bdschuym@pandora.be wrote:
>> Looks good, but please use deferred instead of defered :)
>
>This is the latest version. Fixed the crash and added a
>runtime warning to the physdev match and on-the-fly activation
>when needed, so we don't need the sysctls. Works fine in
>a few simple tests, if there are no further objections
>I'll submit this version.

I'd add something like the code below on top of your patch. That way no false warnings are given if the user specifies she only wants to look at bridged packets.

cheers,
Bart

--- net/netfilter/xt_physdev.c.old	2006-07-07 12:11:44.000000000 +0200
+++ net/netfilter/xt_physdev.c	2006-07-07 12:14:14.000000000 +0200
@@ -114,7 +114,9 @@ checkentry(const char *tablename,
 	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
 		return 0;
 	if (brnf_deferred_hooks == 0 && info->bitmask & XT_PHYSDEV_OP_OUT &&
-	    hook_mask & ((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD))) {
+	    hook_mask & ((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD)) &&
+	    (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
+	       info->invert & XT_PHYSDEV_OP_BRIDGED)) {
 		printk(KERN_WARNING "physdev match: using --physdev-out in the "
 		       "OUTPUT or FORWARD chain is deprecated and breaks other "
 		       "things, it will be removed in January 2007. See "

>


* Re: RFC: Disable defered bridge hooks by default
       [not found] <W8195318669268441152182124@nocme1bl6.telenet-ops.be>
  2006-07-06 10:49 ` Patrick McHardy
@ 2006-07-07  3:37 ` Patrick McHardy
  1 sibling, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-07  3:37 UTC (permalink / raw)
  To: bdschuym@pandora.be; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

[-- Attachment #1: Type: text/plain, Size: 346 bytes --]

bdschuym@pandora.be wrote:
> Looks good, but please use deferred instead of defered :)

This is the latest version. Fixed the crash and added a
runtime warning to the physdev match and on-the-fly activation
when needed, so we don't need the sysctls. Works fine in
a few simple tests, if there are no further objections
I'll submit this version.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 4351 bytes --]

[NETFILTER]: bridge netfilter: add deferred output hooks to feature-removal-schedule

Add deferred output hooks to feature-removal-schedule and disable them by
default. Until their removal they will be activated by the physdev match
when needed.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 47190a17ac996af8d1f1bcbc5650ff64a0ab171d
tree 52644df0d058e6174ffa37166ea69b59e62c556e
parent 120bda20c6f64b32e8bfbdd7b34feafaa5f5332e
author Patrick McHardy <kaber@trash.net> Fri, 07 Jul 2006 05:34:59 +0200
committer Patrick McHardy <kaber@trash.net> Fri, 07 Jul 2006 05:34:59 +0200

 Documentation/feature-removal-schedule.txt |   16 ++++++++++++++++
 include/linux/netfilter_bridge.h           |    3 +++
 net/bridge/br_netfilter.c                  |   11 +++++++++--
 net/netfilter/xt_physdev.c                 |    9 +++++++++
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 99f219a..e6aa994 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -266,3 +266,19 @@ Why:	The interrupt related SA_* flags ar
 Who:	Thomas Gleixner <tglx@linutronix.de>
 
 ---------------------------
+
+What:	Bridge netfilter deferred IPv4/IPv6 output hook calling
+When:	January 2007
+Why:	The deferred output hooks are a layering violation causing lots
+	of unusual and broken behaviour on bridge devices. Examples of
+	things they break include QoS classification using the MARK or
+	CLASSIFY targets, the IPsec policy match and connection tracking
+	with VLANs on a bridge. Their only use is to enable bridge output
+	port filtering within iptables with the physdev match, which can
+	just as well be done by combining iptables and ebtables using
+	netfilter marks. Until it will get removed the hook deferral is
+	disabled by default and is only enabled when needed.
+
+Who:	Patrick McHardy <kaber@trash.net>
+
+---------------------------
diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index 8776402..8d5bd29 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -45,6 +45,7 @@ #define BRNF_BRIDGED_DNAT		0x02
 #define BRNF_DONT_TAKE_PARENT		0x04
 #define BRNF_BRIDGED			0x08
 #define BRNF_NF_BRIDGE_PREROUTING	0x10
+#define BRNF_DEFERRED_HOOK		0x20
 
 
 /* Only used in br_forward.c */
@@ -79,6 +80,8 @@ struct bridge_skb_cb {
 		__u32 ipv4;
 	} daddr;
 };
+
+extern int brnf_deferred_hooks;
 #endif /* CONFIG_BRIDGE_NETFILTER */
 
 #endif /* __KERNEL__ */
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 8298a51..dfe242f 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -61,6 +61,8 @@ #else
 #define brnf_filter_vlan_tagged 1
 #endif
 
+int brnf_deferred_hooks;
+
 static __be16 inline vlan_proto(const struct sk_buff *skb)
 {
 	return vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
@@ -721,7 +723,9 @@ #endif
 		NF_HOOK(PF_BRIDGE, NF_BR_FORWARD, skb, realindev,
 			skb->dev, br_forward_finish);
 		goto out;
-	}
+	} else if (!(nf_bridge->mask & BRNF_DEFERRED_HOOK))
+		return NF_ACCEPT;
+
 	realoutdev = bridge_parent(skb->dev);
 	if (!realoutdev)
 		return NF_DROP;
@@ -913,7 +917,10 @@ #if defined(CONFIG_VLAN_8021Q) || define
 		if (out->priv_flags & IFF_802_1Q_VLAN)
 			nf_bridge->netoutdev = (struct net_device *)out;
 #endif
-		return NF_STOP;
+		if (brnf_deferred_hooks) {
+			nf_bridge->mask |= BRNF_DEFERRED_HOOK;
+			return NF_STOP;
+		}
 	}
 
 	return NF_ACCEPT;
diff --git a/net/netfilter/xt_physdev.c b/net/netfilter/xt_physdev.c
index 5fe4c9d..7b9dba2 100644
--- a/net/netfilter/xt_physdev.c
+++ b/net/netfilter/xt_physdev.c
@@ -113,6 +113,15 @@ checkentry(const char *tablename,
 	if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
 	    info->bitmask & ~XT_PHYSDEV_OP_MASK)
 		return 0;
+	if (brnf_deferred_hooks == 0 && info->bitmask & XT_PHYSDEV_OP_OUT &&
+	    hook_mask & ((1 << NF_IP_LOCAL_OUT) | (1 << NF_IP_FORWARD))) {
+		printk(KERN_WARNING "physdev match: using --physdev-out in the "
+		       "OUTPUT or FORWARD chain is deprecated and breaks other "
+		       "things, it will be removed in January 2007. See "
+		       "Documentation/feature-removal-schedule.txt for "
+		       "details.\n");
+		brnf_deferred_hooks = 1;
+	}
 	return 1;
 }
 


* Re: RFC: Disable defered bridge hooks by default
       [not found] <W8195318669268441152182124@nocme1bl6.telenet-ops.be>
@ 2006-07-06 10:49 ` Patrick McHardy
  2006-07-07  3:37 ` Patrick McHardy
  1 sibling, 0 replies; 39+ messages in thread
From: Patrick McHardy @ 2006-07-06 10:49 UTC (permalink / raw)
  To: bdschuym@pandora.be; +Cc: Netfilter Development Mailinglist, Bart De Schuymer

bdschuym@pandora.be wrote:
>>----- Original message -----
>>From: Patrick McHardy [mailto:kaber@trash.net]
>>Sent: Tuesday, July 4, 2006 11:27 AM
>>To: 'Bart De Schuymer'
>>CC: 'Netfilter Development Mailinglist'
>>Subject: Re: RFC: Disable defered bridge hooks by default
>>
>>Patrick McHardy wrote:
>>
>>>Finally got to taking care of the first part of getting
>>>rid of the defered bridge hooks. Bart, does this look
>>>correct to you? This stuff confuses me badly :) Thanks.
>>>
> 
> 
> Looks good, but please use deferred instead of defered :)

Sure :) I tested it in the meantime and it crashes somewhere,
will repost after fixing that.


end of thread, other threads:[~2006-07-19 17:15 UTC | newest]

Thread overview: 39+ messages
2006-07-04  9:26 RFC: Disable defered bridge hooks by default Patrick McHardy
2006-07-04  9:27 ` Patrick McHardy
2006-07-08  0:36   ` Tom Eastep
2006-07-08  3:01     ` Patrick McHardy
2006-07-10  9:56       ` Amin Azez
2006-07-11  8:28         ` Patrick McHardy
2006-07-11  9:33           ` Amin Azez
2006-07-11 20:34       ` Tom Eastep
2006-07-11 21:29         ` Patrick McHardy
2006-07-12 22:41           ` Tom Eastep
2006-07-13  7:35             ` Patrick McHardy
2006-07-13 14:11               ` Tom Eastep
2006-07-13 14:45                 ` Patrick McHardy
2006-07-13 15:31                   ` Tom Eastep
2006-07-15 14:32                     ` Tom Eastep
2006-07-19 14:21                     ` Patrick McHardy
2006-07-19 15:50                       ` Tom Eastep
2006-07-19 16:02                         ` Patrick McHardy
2006-07-13  9:56             ` Amin Azez
2006-07-12  6:16       ` Philip Craig
2006-07-13  0:20         ` Tom Eastep
2006-07-13  0:42           ` David Miller
2006-07-13  0:45             ` Tom Eastep
2006-07-13  9:45               ` Amin Azez
2006-07-13  7:31           ` Patrick McHardy
2006-07-13  7:46         ` Patrick McHardy
2006-07-13  8:12           ` Philip Craig
2006-07-13  8:36             ` Patrick McHardy
2006-07-13 14:11           ` Amin Azez
2006-07-13 14:50             ` Patrick McHardy
2006-07-13 15:29               ` Amin Azez
2006-07-19 16:36                 ` Patrick McHardy
     [not found]                   ` <44BE624E.5080307@ufomechanic.net>
2006-07-19 17:15                     ` Patrick McHardy
     [not found] <W8195318669268441152182124@nocme1bl6.telenet-ops.be>
2006-07-06 10:49 ` Patrick McHardy
2006-07-07  3:37 ` Patrick McHardy
2006-07-07 10:17 bdschuym@pandora.be
2006-07-07 10:24 ` Patrick McHardy
2006-07-13 12:56 bdschuym@pandora.be
2006-07-13 14:38 ` Patrick McHardy
