All of lore.kernel.org
 help / color / mirror / Atom feed
* Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
@ 2019-02-07 22:15 Sander Eikelenboom
  2019-02-08  7:07 ` Florian Westphal
  0 siblings, 1 reply; 4+ messages in thread
From: Sander Eikelenboom @ 2019-02-07 22:15 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, David S. Miller, netdev, linux-kernel

L.S.,

While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression with NAT.
(using an nftables firewall with NAT and connection tracking).

Unfortunately it isn't too obvious since no errors are logged, but on clients it
causes symptoms like firefox intermittently not being able to load pages with:
    Network Protocol Error
    An error occurred during a connection to www.example.com
    The page you are trying to view cannot be shown because an error in the network protocol was detected.
    Please contact the website owners to inform them of this problem.

But it's only intermittently, so i can still visit some webpages with clients, 
could be that packet size and or fragments are at play ?

So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
but to no avail. 

After that I tried to git bisect and ended up with:

faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
commit faec18dbb0405c7d4dda025054511dc3a6696918
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Dec 13 16:01:33 2018 +0100

    netfilter: nat: remove l4proto->manip_pkt
    
    This removes the last l4proto indirection, the two callers, the l3proto
    packet mangling helpers for ipv4 and ipv6, now call the
    nf_nat_l4proto_manip_pkt() helper.
    
    nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
    they contain no functionality anymore to not clutter this patch.
    
    Next patch will remove the empty files and the nf_nat_l4proto
    struct.
    
    nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
    other nat manip functionality as well, not just udp and udplite.
    
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

:040000 040000 22d8706921e03cbd6d78a6ebcc5f253ccfd2bf0c b6f8ab2779215b4495dfe641f50e798da73859ac M  include
:040000 040000 af212a756f1acf00cbe45c3be5b71f38f01f1d34 165c440f9e6f2e05738628a19b51f7603f95752a M  net

Any ideas or debugging hints ?

--
Sander

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
  2019-02-07 22:15 Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt Sander Eikelenboom
@ 2019-02-08  7:07 ` Florian Westphal
  2019-02-08 11:54   ` Florian Westphal
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Westphal @ 2019-02-08  7:07 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Florian Westphal, Pablo Neira Ayuso, David S. Miller, netdev,
	linux-kernel

Sander Eikelenboom <linux@eikelenboom.it> wrote:
> L.S.,
> 
> While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression with NAT.
> (using an nftables firewall with NAT and connection tracking).
> 
> Unfortunately it isn't too obvious since no errors are logged, but on clients it
> causes symptoms like firefox intermittently not being able to load pages with:
>     Network Protocol Error
>     An error occurred during a connection to www.example.com
>     The page you are trying to view cannot be shown because an error in the network protocol was detected.
>     Please contact the website owners to inform them of this problem.
> 
> But it's only intermittently, so i can still visit some webpages with clients, 
> could be that packet size and or fragments are at play ?
> 
> So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
> but to no avail. 
> 
> After that I tried to git bisect and ended up with:
> 
> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
> commit faec18dbb0405c7d4dda025054511dc3a6696918
> Author: Florian Westphal <fw@strlen.de>
> Date:   Thu Dec 13 16:01:33 2018 +0100
> 
>     netfilter: nat: remove l4proto->manip_pkt

Thanks, this is immensely helpful.

I think I see the bug, we can't use target->dst.protonum in
nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing
with a related icmp packet.

I will send a patch in a few hours when I get back.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
  2019-02-08  7:07 ` Florian Westphal
@ 2019-02-08 11:54   ` Florian Westphal
  2019-02-08 15:34     ` Sander Eikelenboom
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Westphal @ 2019-02-08 11:54 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Sander Eikelenboom, Pablo Neira Ayuso, David S. Miller, netdev,
	linux-kernel

Florian Westphal <fw@strlen.de> wrote:
> Sander Eikelenboom <linux@eikelenboom.it> wrote:
> > L.S.,
> > 
> > While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression with NAT.
> > (using an nftables firewall with NAT and connection tracking).
> > 
> > Unfortunately it isn't too obvious since no errors are logged, but on clients it
> > causes symptoms like firefox intermittently not being able to load pages with:
> >     Network Protocol Error
> >     An error occurred during a connection to www.example.com
> >     The page you are trying to view cannot be shown because an error in the network protocol was detected.
> >     Please contact the website owners to inform them of this problem.
> > 
> > But it's only intermittently, so i can still visit some webpages with clients, 
> > could be that packet size and or fragments are at play ?
> > 
> > So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
> > e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
> > but to no avail. 
> > 
> > After that I tried to git bisect and ended up with:
> > 
> > faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
> > commit faec18dbb0405c7d4dda025054511dc3a6696918
> > Author: Florian Westphal <fw@strlen.de>
> > Date:   Thu Dec 13 16:01:33 2018 +0100
> > 
> >     netfilter: nat: remove l4proto->manip_pkt
> 
> Thanks, this is immensely helpful.
> 
> I think I see the bug, we can't use target->dst.protonum in
> nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing
> with a related icmp packet.
> 
> I will send a patch in a few hours when I get back.

Sander, does this patch fix things for you?

Thanks!

diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
--- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
@@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
 
 	/* Change outer to look like the reply to an incoming packet */
 	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+	target.dst.protonum = IPPROTO_ICMP;
 	if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
 		return 0;
 
diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
--- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
@@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
 	}
 
 	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
+	target.dst.protonum = IPPROTO_ICMPV6;
 	if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
 		return 0;
 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt
  2019-02-08 11:54   ` Florian Westphal
@ 2019-02-08 15:34     ` Sander Eikelenboom
  0 siblings, 0 replies; 4+ messages in thread
From: Sander Eikelenboom @ 2019-02-08 15:34 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, David S. Miller, netdev, linux-kernel

On 08/02/2019 12:54, Florian Westphal wrote:
> Florian Westphal <fw@strlen.de> wrote:
>> Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>> L.S.,
>>>
>>> While trying out a 5.0-RC5 kernel I seem to have stumbled over a regression with NAT.
>>> (using an nftables firewall with NAT and connection tracking).
>>>
>>> Unfortunately it isn't too obvious since no errors are logged, but on clients it
>>> causes symptoms like firefox intermittently not being able to load pages with:
>>>     Network Protocol Error
>>>     An error occurred during a connection to www.example.com
>>>     The page you are trying to view cannot be shown because an error in the network protocol was detected.
>>>     Please contact the website owners to inform them of this problem.
>>>
>>> But it's only intermittently, so i can still visit some webpages with clients, 
>>> could be that packet size and or fragments are at play ?
>>>
>>> So I tried testing with git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git with 
>>> e8c32c32b48c2e889704d8ca0872f92eb027838e as last commit, to be sure to have the latest netdev has to offer,
>>> but to no avail. 
>>>
>>> After that I tried to git bisect and ended up with:
>>>
>>> faec18dbb0405c7d4dda025054511dc3a6696918 is the first bad commit
>>> commit faec18dbb0405c7d4dda025054511dc3a6696918
>>> Author: Florian Westphal <fw@strlen.de>
>>> Date:   Thu Dec 13 16:01:33 2018 +0100
>>>
>>>     netfilter: nat: remove l4proto->manip_pkt
>>
>> Thanks, this is immensely helpful.
>>
>> I think I see the bug, we can't use target->dst.protonum in
>> nf_nat_l4proto_manip_pkt(), it will be TCP in case we're dealing
>> with a related icmp packet.
>>
>> I will send a patch in a few hours when I get back.
> 
> Sander, does this patch fix things for you?

Hi Florian,

You may stick on a reported/tested-by if you like.
Thanks for the swift fix !

--
Sander

> 
> Thanks!
> 
> diff --git a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> --- a/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> +++ b/net/ipv4/netfilter/nf_nat_l3proto_ipv4.c
> @@ -215,6 +215,7 @@ int nf_nat_icmp_reply_translation(struct sk_buff *skb,
>  
>  	/* Change outer to look like the reply to an incoming packet */
>  	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +	target.dst.protonum = IPPROTO_ICMP;
>  	if (!nf_nat_ipv4_manip_pkt(skb, 0, &target, manip))
>  		return 0;
>  
> diff --git a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> --- a/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> +++ b/net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
> @@ -226,6 +226,7 @@ int nf_nat_icmpv6_reply_translation(struct sk_buff *skb,
>  	}
>  
>  	nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);
> +	target.dst.protonum = IPPROTO_ICMPV6;
>  	if (!nf_nat_ipv6_manip_pkt(skb, 0, &target, manip))
>  		return 0;
>  
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-02-08 15:33 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-07 22:15 Kernel 5.0-rc5 regression with NAT, bisected to: netfilter: nat: remove l4proto->manip_pkt Sander Eikelenboom
2019-02-08  7:07 ` Florian Westphal
2019-02-08 11:54   ` Florian Westphal
2019-02-08 15:34     ` Sander Eikelenboom

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.