netfilter-devel.vger.kernel.org archive mirror
* [PATCH] bridge:fragmented packets dropped by bridge
@ 2019-07-30 12:25 Rundong Ge
  2019-07-30 12:35 ` Florian Westphal
  2019-07-30 12:41 ` Nikolay Aleksandrov
  0 siblings, 2 replies; 8+ messages in thread
From: Rundong Ge @ 2019-07-30 12:25 UTC
  To: davem
  Cc: kuznet, yoshfuji, netdev, pablo, kadlec, fw, roopa,
	netfilter-devel, coreteam, bridge, nikolay, linux-kernel,
	rdong.ge

Given following setup:
-modprobe br_netfilter
-echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
-brctl addbr br0
-brctl addif br0 enp2s0
-brctl addif br0 enp3s0
-brctl addif br0 enp6s0
-ifconfig enp2s0 mtu 1300
-ifconfig enp3s0 mtu 1500
-ifconfig enp6s0 mtu 1500
-ifconfig br0 up

                 multi-port
mtu1500 - mtu1500|bridge|1500 - mtu1500
  A                  |            B
                   mtu1300

With netfilter defragmentation/conntrack enabled, fragmented
packets from A will be defragmented in prerouting, and refragmented
at postrouting.
But in this scenario the bridge finds that frag_max_size (1500) is
larger than the dst MTU stored in the fake_rtable, which is always
equal to the bridge's MTU (1300), so the packets are dropped.

This patch modifies ip_skb_dst_mtu() to use the output device's MTU
instead of the bridge's MTU when the bridge refragments.

Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
---
 include/net/ip.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/net/ip.h b/include/net/ip.h
index 29d89de..0512de3 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
 					  const struct sk_buff *skb)
 {
+	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
+		return min(skb->dev->mtu, IP_MAX_MTU);
 	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
 		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
 
-- 
1.8.3.1
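
For readers following the thread, the effect of the two added lines can be modelled in user space. This is a sketch under stated assumptions, not kernel code: `struct model_dev`, `struct model_dst`, `struct model_skb`, `MODEL_DST_FAKE_RTABLE` and `model_skb_dst_mtu()` are hypothetical stand-ins invented for illustration, and the unpatched path is reduced to "return the MTU stored in the dst". Only the value 0xFFFF for IP_MAX_MTU is taken from the kernel.

```c
#include <stdbool.h>

#define MODEL_IP_MAX_MTU      0xFFFFu /* same value as the kernel's IP_MAX_MTU */
#define MODEL_DST_FAKE_RTABLE 0x1u    /* hypothetical flag bit, model only */

/* Hypothetical stand-ins for net_device, dst_entry and sk_buff. */
struct model_dev { unsigned int mtu; };
struct model_dst { unsigned int flags; unsigned int mtu; };
struct model_skb {
	const struct model_dst *dst; /* the bridge's fake_rtable in the thread */
	const struct model_dev *dev; /* output device, may be NULL */
};

static unsigned int model_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * With patched == true, a dst flagged as a fake rtable yields the
 * output device's MTU (capped at MODEL_IP_MAX_MTU). Otherwise the
 * MTU stored in the dst is returned, which in the reported setup
 * is the bridge's MTU.
 */
static unsigned int model_skb_dst_mtu(const struct model_skb *skb, bool patched)
{
	if (patched && (skb->dst->flags & MODEL_DST_FAKE_RTABLE) && skb->dev)
		return model_min(skb->dev->mtu, MODEL_IP_MAX_MTU);
	return skb->dst->mtu;
}
```

In the reported setup the dst carries the bridge's MTU of 1300 and the output port has MTU 1500, so the model returns 1300 without the patch and 1500 with it.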



* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-07-30 12:25 [PATCH] bridge:fragmented packets dropped by bridge Rundong Ge
@ 2019-07-30 12:35 ` Florian Westphal
  2019-07-30 13:50   ` Rundong Ge
  2019-08-26  7:59   ` Jan Engelhardt
  2019-07-30 12:41 ` Nikolay Aleksandrov
  1 sibling, 2 replies; 8+ messages in thread
From: Florian Westphal @ 2019-07-30 12:35 UTC
  To: Rundong Ge
  Cc: davem, kuznet, yoshfuji, netdev, pablo, kadlec, fw, roopa,
	netfilter-devel, coreteam, bridge, nikolay, linux-kernel

Rundong Ge <rdong.ge@gmail.com> wrote:
> Given following setup:
> -modprobe br_netfilter
> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> -brctl addbr br0
> -brctl addif br0 enp2s0
> -brctl addif br0 enp3s0
> -brctl addif br0 enp6s0
> -ifconfig enp2s0 mtu 1300
> -ifconfig enp3s0 mtu 1500
> -ifconfig enp6s0 mtu 1500
> -ifconfig br0 up
> 
>                  multi-port
> mtu1500 - mtu1500|bridge|1500 - mtu1500
>   A                  |            B
>                    mtu1300

How can a bridge forward a frame from A/B to mtu1300?

> With netfilter defragmentation/conntrack enabled, fragmented
> packets from A will be defragmented in prerouting, and refragmented
> at postrouting.

Yes, but I don't see how that relates to the problem at hand.

> But in this scenario the bridge finds that frag_max_size (1500) is
> larger than the dst MTU stored in the fake_rtable, which is always
> equal to the bridge's MTU (1300), so the packets are dropped.

What happens without netfilter or non-fragmented packets?

> This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> of bridge's mtu in bridge refragment.

It seems quite a hack?  The above setup should use a router, not a bridge.


* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-07-30 12:25 [PATCH] bridge:fragmented packets dropped by bridge Rundong Ge
  2019-07-30 12:35 ` Florian Westphal
@ 2019-07-30 12:41 ` Nikolay Aleksandrov
  2019-07-30 13:58   ` Rundong Ge
  2019-08-26  2:45   ` Rundong Ge
  1 sibling, 2 replies; 8+ messages in thread
From: Nikolay Aleksandrov @ 2019-07-30 12:41 UTC
  To: Rundong Ge, davem
  Cc: kuznet, yoshfuji, netdev, pablo, kadlec, fw, roopa,
	netfilter-devel, coreteam, bridge, linux-kernel

On 30/07/2019 15:25, Rundong Ge wrote:
> Given following setup:
> -modprobe br_netfilter
> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> -brctl addbr br0
> -brctl addif br0 enp2s0
> -brctl addif br0 enp3s0
> -brctl addif br0 enp6s0
> -ifconfig enp2s0 mtu 1300
> -ifconfig enp3s0 mtu 1500
> -ifconfig enp6s0 mtu 1500
> -ifconfig br0 up
> 
>                  multi-port
> mtu1500 - mtu1500|bridge|1500 - mtu1500
>   A                  |            B
>                    mtu1300
> 
> With netfilter defragmentation/conntrack enabled, fragmented
> packets from A will be defragmented in prerouting, and refragmented
> at postrouting.
> But in this scenario the bridge finds that frag_max_size (1500) is
> larger than the dst MTU stored in the fake_rtable, which is always
> equal to the bridge's MTU (1300), so the packets are dropped.
> 
> This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> of bridge's mtu in bridge refragment.
> 
> Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> ---
>  include/net/ip.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index 29d89de..0512de3 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
>  					  const struct sk_buff *skb)
>  {
> +	if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> +		return min(skb->dev->mtu, IP_MAX_MTU);
>  	if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
>  		bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
>  
> 

I don't think this is correct, there's a reason why the bridge chooses the smallest
possible MTU out of its members and this is simply a hack to circumvent it.
If you really like to do so just set the bridge MTU manually, we've added support
so it won't change automatically to the smallest, but then how do you pass packets
1500 -> 1300 in this setup ?

You're talking about the frag_size check in br_nf_ip_fragment(), right ?



* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-07-30 12:35 ` Florian Westphal
@ 2019-07-30 13:50   ` Rundong Ge
  2019-08-26  7:59   ` Jan Engelhardt
  1 sibling, 0 replies; 8+ messages in thread
From: Rundong Ge @ 2019-07-30 13:50 UTC
  To: Florian Westphal
  Cc: davem, kuznet, yoshfuji, netdev, Pablo Neira Ayuso, kadlec,
	Roopa Prabhu, netfilter-devel, coreteam, bridge, nikolay,
	linux-kernel

> How can a bridge forward a frame from A/B to mtu1300?
Users are free to set different MTUs on bridge ports. In our case there
is only TCP traffic between A/B and the mtu1300 port, and MSS
negotiation keeps those packets under 1300 bytes.

> What happens without netfilter or non-fragmented packets?
Without br_netfilter it works fine: there is no defragmentation or
refragmentation, and fragmented packets egress directly.

Florian Westphal <fw@strlen.de> wrote on Tue, Jul 30, 2019 at 8:35 PM:
>
> Rundong Ge <rdong.ge@gmail.com> wrote:
> > Given following setup:
> > -modprobe br_netfilter
> > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> > -brctl addbr br0
> > -brctl addif br0 enp2s0
> > -brctl addif br0 enp3s0
> > -brctl addif br0 enp6s0
> > -ifconfig enp2s0 mtu 1300
> > -ifconfig enp3s0 mtu 1500
> > -ifconfig enp6s0 mtu 1500
> > -ifconfig br0 up
> >
> >                  multi-port
> > mtu1500 - mtu1500|bridge|1500 - mtu1500
> >   A                  |            B
> >                    mtu1300
>
> How can a bridge forward a frame from A/B to mtu1300?
>
> > With netfilter defragmentation/conntrack enabled, fragmented
> > packets from A will be defragmented in prerouting, and refragmented
> > at postrouting.
>
> Yes, but I don't see how that relates to the problem at hand.
>
> > But in this scenario the bridge finds that frag_max_size (1500) is
> > larger than the dst MTU stored in the fake_rtable, which is always
> > equal to the bridge's MTU (1300), so the packets are dropped.
>
> What happens without netfilter or non-fragmented packets?
>
> > This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> > of bridge's mtu in bridge refragment.
>
> It seems quite a hack?  The above setup should use a router, not a bridge.


* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-07-30 12:41 ` Nikolay Aleksandrov
@ 2019-07-30 13:58   ` Rundong Ge
  2019-08-26  2:45   ` Rundong Ge
  1 sibling, 0 replies; 8+ messages in thread
From: Rundong Ge @ 2019-07-30 13:58 UTC
  To: Nikolay Aleksandrov
  Cc: davem, kuznet, yoshfuji, netdev, Pablo Neira Ayuso, kadlec,
	Florian Westphal, Roopa Prabhu, netfilter-devel, coreteam,
	bridge, linux-kernel

Yes, it is about the frag_size check in br_nf_ip_fragment(). As I said,
without br_netfilter packet forwarding works fine.
It seems odd to me that br_nf_dev_queue_xmit() uses the output device's
MTU to decide whether to fragment, but br_nf_ip_fragment() then uses the
bridge's MTU to do the actual fragmentation.

In this case the fragmented packets fit the output device's MTU but were
still dropped, which I don't think is right.
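
The mismatch described above can be sketched as a small user-space model. It is a simplification based only on what this thread says about br_nf_dev_queue_xmit() and br_nf_ip_fragment(), not a copy of the kernel logic: `model_bridge_xmit()` and the `model_verdict` values are hypothetical names, and details such as GSO, the DF bit and reserved header space are deliberately left out.

```c
/* Hypothetical outcome of the bridge transmit path in this model. */
enum model_verdict {
	MODEL_SEND_AS_IS, /* packet fits the output device, no fragmentation */
	MODEL_REFRAGMENT, /* packet is split again and sent */
	MODEL_DROP        /* refragmentation refused, packet dropped */
};

/*
 * pkt_len:       length of the reassembled packet
 * frag_max_size: largest fragment seen during defragmentation
 * out_dev_mtu:   MTU of the output port (used for the first decision)
 * bridge_mtu:    MTU stored in the fake_rtable (used for the second check)
 */
static enum model_verdict model_bridge_xmit(unsigned int pkt_len,
					    unsigned int frag_max_size,
					    unsigned int out_dev_mtu,
					    unsigned int bridge_mtu)
{
	/* Step 1: the decision to fragment looks at the output device. */
	if (pkt_len <= out_dev_mtu)
		return MODEL_SEND_AS_IS;

	/* Step 2: the fragment-size check looks at the bridge's MTU. */
	if (frag_max_size > bridge_mtu)
		return MODEL_DROP;

	return MODEL_REFRAGMENT;
}
```

With an output port MTU of 1500, a bridge MTU of 1300 and original fragments of up to 1500 bytes, the model drops the packet even though every original fragment would have fit the output port. Passing 1500 as the last argument, which is roughly what the patch aims for, lets the same packet through.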

Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote on Tue, Jul 30, 2019 at 8:41 PM:
>
> On 30/07/2019 15:25, Rundong Ge wrote:
> > Given following setup:
> > -modprobe br_netfilter
> > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> > -brctl addbr br0
> > -brctl addif br0 enp2s0
> > -brctl addif br0 enp3s0
> > -brctl addif br0 enp6s0
> > -ifconfig enp2s0 mtu 1300
> > -ifconfig enp3s0 mtu 1500
> > -ifconfig enp6s0 mtu 1500
> > -ifconfig br0 up
> >
> >                  multi-port
> > mtu1500 - mtu1500|bridge|1500 - mtu1500
> >   A                  |            B
> >                    mtu1300
> >
> > With netfilter defragmentation/conntrack enabled, fragmented
> > packets from A will be defragmented in prerouting, and refragmented
> > at postrouting.
> > > But in this scenario the bridge finds that frag_max_size (1500) is
> > > larger than the dst MTU stored in the fake_rtable, which is always
> > > equal to the bridge's MTU (1300), so the packets are dropped.
> >
> > This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> > of bridge's mtu in bridge refragment.
> >
> > Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> > ---
> >  include/net/ip.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/include/net/ip.h b/include/net/ip.h
> > index 29d89de..0512de3 100644
> > --- a/include/net/ip.h
> > +++ b/include/net/ip.h
> > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
> >  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
> >                                         const struct sk_buff *skb)
> >  {
> > +     if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> > +             return min(skb->dev->mtu, IP_MAX_MTU);
> >       if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
> >               bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> >
> >
>
> I don't think this is correct, there's a reason why the bridge chooses the smallest
> possible MTU out of its members and this is simply a hack to circumvent it.
> If you really like to do so just set the bridge MTU manually, we've added support
> so it won't change automatically to the smallest, but then how do you pass packets
> 1500 -> 1300 in this setup ?
>
> You're talking about the frag_size check in br_nf_ip_fragment(), right ?
>


* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-07-30 12:41 ` Nikolay Aleksandrov
  2019-07-30 13:58   ` Rundong Ge
@ 2019-08-26  2:45   ` Rundong Ge
  1 sibling, 0 replies; 8+ messages in thread
From: Rundong Ge @ 2019-08-26  2:45 UTC
  To: Nikolay Aleksandrov
  Cc: davem, kuznet, yoshfuji, netdev, Pablo Neira Ayuso, kadlec,
	Florian Westphal, Roopa Prabhu, netfilter-devel, coreteam,
	bridge, linux-kernel

On Tue, Jul 30, 2019 at 8:41 PM Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
>
> On 30/07/2019 15:25, Rundong Ge wrote:
> > Given following setup:
> > -modprobe br_netfilter
> > -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> > -brctl addbr br0
> > -brctl addif br0 enp2s0
> > -brctl addif br0 enp3s0
> > -brctl addif br0 enp6s0
> > -ifconfig enp2s0 mtu 1300
> > -ifconfig enp3s0 mtu 1500
> > -ifconfig enp6s0 mtu 1500
> > -ifconfig br0 up
> >
> >                  multi-port
> > mtu1500 - mtu1500|bridge|1500 - mtu1500
> >   A                  |            B
> >                    mtu1300
> >
> > With netfilter defragmentation/conntrack enabled, fragmented
> > packets from A will be defragmented in prerouting, and refragmented
> > at postrouting.
> > > But in this scenario the bridge finds that frag_max_size (1500) is
> > > larger than the dst MTU stored in the fake_rtable, which is always
> > > equal to the bridge's MTU (1300), so the packets are dropped.
> >
> > This modifies ip_skb_dst_mtu to use the out dev's mtu instead
> > of bridge's mtu in bridge refragment.
> >
> > Signed-off-by: Rundong Ge <rdong.ge@gmail.com>
> > ---
> >  include/net/ip.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/include/net/ip.h b/include/net/ip.h
> > index 29d89de..0512de3 100644
> > --- a/include/net/ip.h
> > +++ b/include/net/ip.h
> > @@ -450,6 +450,8 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
> >  static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
> >                                         const struct sk_buff *skb)
> >  {
> > +     if ((skb_dst(skb)->flags & DST_FAKE_RTABLE) && skb->dev)
> > +             return min(skb->dev->mtu, IP_MAX_MTU);
> >       if (!sk || !sk_fullsock(sk) || ip_sk_use_pmtu(sk)) {
> >               bool forwarding = IPCB(skb)->flags & IPSKB_FORWARDED;
> >
> >
>
> I don't think this is correct, there's a reason why the bridge chooses the smallest
> possible MTU out of its members and this is simply a hack to circumvent it.
> If you really like to do so just set the bridge MTU manually, we've added support
> so it won't change automatically to the smallest, but then how do you pass packets
> 1500 -> 1300 in this setup ?
>
> You're talking about the frag_size check in br_nf_ip_fragment(), right ?
>

Hi Nikolay,
My setup may not be common. May I ask whether there is a reason to use
the output port's MTU for the re-fragment check but the bridge's MTU for
the re-fragmentation itself?
Is it expected that the bridge's MTU affects re-fragmentation of FORWARD
traffic? I used to think the bridge's MTU only affected OUTPUT traffic
sent from "br0".
The modification in this patch replaces the MTU taken from the
fake_rtable, which is only used in the FORWARD re-fragment path, so it
won't affect local traffic from "br0".

Thanks
Raydodn


* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-07-30 12:35 ` Florian Westphal
  2019-07-30 13:50   ` Rundong Ge
@ 2019-08-26  7:59   ` Jan Engelhardt
  2019-08-28  9:21     ` Rundong Ge
  1 sibling, 1 reply; 8+ messages in thread
From: Jan Engelhardt @ 2019-08-26  7:59 UTC
  To: Florian Westphal
  Cc: Rundong Ge, davem, kuznet, yoshfuji, netdev, pablo, kadlec,
	roopa, netfilter-devel, coreteam, bridge, nikolay, linux-kernel


On Tuesday 2019-07-30 14:35, Florian Westphal wrote:
>Rundong Ge <rdong.ge@gmail.com> wrote:
>> Given following setup:
>> -modprobe br_netfilter
>> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
>> -brctl addbr br0
>> -brctl addif br0 enp2s0
>> -brctl addif br0 enp3s0
>> -brctl addif br0 enp6s0
>> -ifconfig enp2s0 mtu 1300
>> -ifconfig enp3s0 mtu 1500
>> -ifconfig enp6s0 mtu 1500
>> -ifconfig br0 up
>> 
>>                  multi-port
>> mtu1500 - mtu1500|bridge|1500 - mtu1500
>>   A                  |            B
>>                    mtu1300
>
>How can a bridge forward a frame from A/B to mtu1300?

There might be a misunderstanding here judging from the shortness of this
thread.

I understood it such that the bridge ports (eth0,eth1) have MTU 1500, yet br0
(in essence the third bridge port if you so wish) itself has MTU 1300.

Therefore, frame forwarding from eth0 to eth1 should succeed, since the
1300-byte MTU is only relevant if the bridge decides the packet needs to be
locally delivered.


* Re: [PATCH] bridge:fragmented packets dropped by bridge
  2019-08-26  7:59   ` Jan Engelhardt
@ 2019-08-28  9:21     ` Rundong Ge
  0 siblings, 0 replies; 8+ messages in thread
From: Rundong Ge @ 2019-08-28  9:21 UTC
  To: Jan Engelhardt
  Cc: Florian Westphal, davem, kuznet, yoshfuji, netdev,
	Pablo Neira Ayuso, kadlec, Roopa Prabhu, netfilter-devel,
	coreteam, bridge, Nikolay Aleksandrov, linux-kernel

Jan Engelhardt <jengelh@inai.de> wrote on Mon, Aug 26, 2019 at 3:59 PM:
>
>
> On Tuesday 2019-07-30 14:35, Florian Westphal wrote:
> >Rundong Ge <rdong.ge@gmail.com> wrote:
> >> Given following setup:
> >> -modprobe br_netfilter
> >> -echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
> >> -brctl addbr br0
> >> -brctl addif br0 enp2s0
> >> -brctl addif br0 enp3s0
> >> -brctl addif br0 enp6s0
> >> -ifconfig enp2s0 mtu 1300
> >> -ifconfig enp3s0 mtu 1500
> >> -ifconfig enp6s0 mtu 1500
> >> -ifconfig br0 up
> >>
> >>                  multi-port
> >> mtu1500 - mtu1500|bridge|1500 - mtu1500
> >>   A                  |            B
> >>                    mtu1300
> >
> >How can a bridge forward a frame from A/B to mtu1300?
>
> There might be a misunderstanding here judging from the shortness of this
> thread.
>
> I understood it such that the bridge ports (eth0,eth1) have MTU 1500, yet br0
> (in essence the third bridge port if you so wish) itself has MTU 1300.
>
> Therefore, frame forwarding from eth0 to eth1 should succeed, since the
> 1300-byte MTU is only relevant if the bridge decides the packet needs to be
> locally delivered.

With this setup, when I run "ping B -l 2000" from A, the fragmented
packets are dropped by the bridge.
When "/proc/sys/net/bridge/bridge-nf-call-iptables" is on, the bridge
defragments at PREROUTING and re-fragments at POSTROUTING. During
re-fragmentation, br_nf_ip_fragment() checks whether the max frag size
is larger than the bridge's MTU; if it is, the packets are dropped.
This patch uses the outdev's MTU instead of the bridge's MTU in
br_nf_ip_fragment().
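
To make the numbers in the ping example concrete, the fragment sizes can be worked out with a short user-space sketch. Several assumptions are baked in and none of them come from the thread itself: "-l 2000" is taken to mean a 2000-byte ICMP payload, the IPv4 header is 20 bytes with no options, and the ICMP header is 8 bytes. The helper names are hypothetical, and both helpers require an MTU larger than the IPv4 header.

```c
#define MODEL_IPV4_HDR_LEN 20u /* assumes no IP options */
#define MODEL_ICMP_HDR_LEN 8u

/* Payload bytes carried per fragment: rounded down to a multiple of 8. */
static unsigned int model_per_fragment_payload(unsigned int mtu)
{
	return (mtu - MODEL_IPV4_HDR_LEN) & ~7u;
}

/* Size, including the IP header, of the largest fragment on the wire. */
static unsigned int model_largest_fragment(unsigned int l4_len, unsigned int mtu)
{
	if (l4_len + MODEL_IPV4_HDR_LEN <= mtu)
		return l4_len + MODEL_IPV4_HDR_LEN; /* fits, not fragmented */
	return model_per_fragment_payload(mtu) + MODEL_IPV4_HDR_LEN;
}

/* Number of fragments needed to carry l4_len bytes over the given MTU. */
static unsigned int model_fragment_count(unsigned int l4_len, unsigned int mtu)
{
	unsigned int per_frag = model_per_fragment_payload(mtu);

	if (l4_len + MODEL_IPV4_HDR_LEN <= mtu)
		return 1;
	return (l4_len + per_frag - 1) / per_frag;
}
```

Under these assumptions host A sends 2000 + 8 = 2008 bytes of ICMP over its 1500-byte MTU as two fragments, the larger of which is 1500 bytes on the wire. That 1500 is the max frag size the bridge later compares against its own MTU of 1300, which is why the check fails.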

