* How to troubleshoot (suspected) flowtable lockups/packet drops?
@ 2021-03-16 15:43 Martin Gignac
  2021-03-16 23:05 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Gignac @ 2021-03-16 15:43 UTC (permalink / raw)
  To: netfilter

Hi,

A while back I set up flowtables on my firewall, which is running
Fedora Server 33. I defined all of my network interfaces (physical and
virtual) as flowtable devices:

    flowtable f {
            hook ingress priority filter
            devices = { tun0, bond0, dummy0, bond1.999, bond1,
vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0,
eno4, eno3, eno2, eno1 }
    }

I then configured the forward chain to offload all IPv4/IPv6 TCP and
UDP traffic to the flowtable:

    chain forward {
        type filter hook forward priority filter; policy drop;
        ip protocol { tcp, udp } flow offload @f
        ip6 nexthdr { tcp, udp } flow offload @f
        ct state established,related counter packets 0 bytes 0 accept
        ct state invalid drop
        [...] (various accept rules)
    }

This seemed to be working fine until yesterday, when an IPv6 SSH
session through an OpenVPN tunnel (terminating on the firewall)
between my home computer and a VM at work started locking up. I would
then start a new IPv6 SSH session to the same VM and it would work fine
for a short while, and then it would lock up as well. The lockups would
last a few seconds to a few minutes, and then resolve themselves
without me doing anything. It would work for a short while, then it
would lock up again, and so on. IPv4 sessions did not seem to be
affected.

I tcpdump'ed the incoming OpenVPN traffic on the tun0 interface while
simultaneously tcpdump'ing on the outgoing interface towards the VM,
and I noticed that when the lockups occurred, I would see incoming
traffic but no outgoing traffic. So at least I eliminated issues on
the Internet since traffic *was* coming in on the VPN.
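
Roughly, the two captures looked like this, with <vm-iface> standing in
for the VM-facing interface and 41000 being the SSH source port used in
the trace rule below:

    tcpdump -ni tun0 'tcp port 41000'
    tcpdump -ni <vm-iface> 'tcp port 41000'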

I then added a rule in my trace chain to filter for IPv6 traffic
coming from my home computer with the source port of one of the SSH
connections I had that kept locking up:

    chain trace_chain {
        type filter hook prerouting priority -301;
        ip6 saddr 2682:272:9000:6::1:10 tcp sport 41000 meta nftrace set 1
    }

I ran 'nft monitor trace' and initially I didn't see anything, which I
assumed to be normal since the ASCII diagram at
https://wiki.nftables.org/wiki-nftables/index.php/Flowtable shows that
traffic gets shunted to the flowtable before the prerouting hook.
Then, the SSH session locked up again, and right before it resumed, I
suddenly saw an entry appear in the traces, matching this rule:

    ct state established,related

No other packet appeared UNTIL the SSH session locked up again, and
right before it resumed once more. Can someone explain this
behavior? I don't fully understand how flowtables work, but it
seems to me that suddenly there are no more hits for that specific
flow in the flowtable, and after a while the next packet in the
session no longer bypasses the classic forwarding path. That packet
then matches 'ct state established,related' and an established
conntrack entry, which then puts a new flow in the flowtable, and the
subsequent packets then once again bypass the classic forwarding
path... until it locks up again.

I'm not sure where to look at this stage. I wanted to inspect the
entry in the flowtable, much like one can do for conntrack sessions,
but I couldn't find any info on the web regarding whether this is
actually possible or not.

Does anybody have any flowtable troubleshooting tips for me?

Thanks,
-Martin

P.S. The OS is Fedora Server 33 (kernel 5.10.17-200.fc33.x86_64) with
a manually compiled nftables (v0.9.8).


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-16 15:43 How to troubleshoot (suspected) flowtable lockups/packet drops? Martin Gignac
@ 2021-03-16 23:05 ` Pablo Neira Ayuso
  2021-03-17  1:37   ` Martin Gignac
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2021-03-16 23:05 UTC (permalink / raw)
  To: Martin Gignac; +Cc: netfilter

On Tue, Mar 16, 2021 at 11:43:32AM -0400, Martin Gignac wrote:
> Hi,
> 
> A while back I set up flowtables on my firewall, which is running
> Fedora Server 33. I defined all of my network interfaces (physical and
> virtual) as flowtable devices:

If you enable counters, the flowtable updates the conntrack table
counters. So you can consult them via conntrack -L.

See below in your flowtable definition.

>     flowtable f {
>             hook ingress priority filter
>             devices = { tun0, bond0, dummy0, bond1.999, bond1,
> vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0,
> eno4, eno3, eno2, eno1 }

              counter

>     }

The flowtable datapath updates the counters right before the packet
transmission.
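
For example, something like this (untested; adjust the filter to the
flow you are chasing) lets you watch a given offloaded entry:

# watch -n1 'conntrack -L -p tcp --dport 22 2>/dev/null | grep OFFLOAD'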

> I then configured the forward chain to offload all IPv4/IPv6 TCP and
> UDP traffic to the flowtable:
> 
>     chain forward {
>         type filter hook forward priority filter; policy drop;
>         ip protocol { tcp, udp } flow offload @f
>         ip6 nexthdr { tcp, udp } flow offload @f
>         ct state established,related counter packets 0 bytes 0 accept
>         ct state invalid drop
>         [...] (various accept rules)
>     }
> 
> This seemed to be working fine until yesterday, when an IPv6 SSH
> session through an OpenVPN tunnel (terminating on the firewall)
> between my home computer and a VM at work started locking up. I would
> then start a new IPv6 SSH session to the same VM and it would work fine
> for a short while, and then it would lock up as well. The lockups would
> last a few seconds to a few minutes, and then resolve themselves
> without me doing anything. It would work for a short while, then it
> would lock up again, and so on. IPv4 sessions did not seem to be
> affected.
> 
> I tcpdump'ed the incoming OpenVPN traffic on the tun0 interface while
> simultaneously tcpdump'ing on the outgoing interface towards the VM,
> and I noticed that when the lockups occurred, I would see incoming
> traffic but no outgoing traffic. So at least I eliminated issues on
> the Internet since traffic *was* coming in on the VPN.
> 
> I then added a rule in my trace chain to filter for IPv6 traffic
> coming from my home computer with the source port of one of the SSH
> connections I had that kept locking up:
> 
>     chain trace_chain {
>         type filter hook prerouting priority -301;
>         ip6 saddr 2682:272:9000:6::1:10 tcp sport 41000 meta nftrace set 1
>     }
> 
> I ran 'nft monitor trace' and initially I didn't see anything, which I
> assumed to be normal since the ASCII diagram at
> https://wiki.nftables.org/wiki-nftables/index.php/Flowtable shows that
> traffic gets shunted to the flowtable before the prerouting hook.
> Then, the SSH session locked up again, and right before it resumed, I
> suddenly saw an entry appear in the traces, matching this rule:
> 
>     ct state established,related
> 
> No other packet appeared UNTIL the SSH session locked up again, and
> right before it resumed once more. Can someone explain this
> behavior? I don't fully understand how flowtables work, but it
> seems to me that suddenly there are no more hits for that specific
> flow in the flowtable, and after a while the next packet in the
> session no longer bypasses the classic forwarding path. That packet
> then matches 'ct state established,related' and an established
> conntrack entry, which then puts a new flow in the flowtable, and the
> subsequent packets then once again bypass the classic forwarding
> path... until it locks up again.
>
> I'm not sure where to look at this stage. I wanted to inspect the
> entry in the flowtable, much like one can do for conntrack sessions,
> but I couldn't find any info on the web regarding whether this is
> actually possible or not.

No way to dump the flowtable content yet, but that is doable.

> Does anybody have any flowtable troubleshooting tips for me?

Quick description of the flowtable datapath:

#1 if lookup fails (no match) => pass up (to classic forwarding path)
#2 if packet exceeds mtu => pass up
#3 tcp fin or rst => pass up
... from this point on packet follows the flowtable datapath
#4 nat
#5 decrement ttl
#6 update counters (if enabled)
#7 xmit

A flowtable entry stays in the flowtable for 30 seconds; if no
packets are seen in that time, the entry expires and packets go back
to the classic forwarding path.

Your trace rule shows no packets being passed up until the connection
resumes from the stall.
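
A cheap way to cross-check is a counter rule in front of the offload
rules, so it only ticks for packets that the flowtable passes up to
the classic path, e.g. (untested; table/chain names taken from your
ruleset, sport from your trace rule):

# nft insert rule inet filter forward tcp sport 41000 counter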

Can you see the flowtable counters being updated?

> Thanks,
> -Martin
> 
> P.S. The OS is Fedora Server 33 (kernel 5.10.17-200.fc33.x86_64) with
> a manually compiled nftables (v0.9.8).

This kernel contains 8d6bca156e47 ("netfilter: flowtable: fix tcp and
udp header checksum update"), so this might be a different issue.


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-16 23:05 ` Pablo Neira Ayuso
@ 2021-03-17  1:37   ` Martin Gignac
  2021-03-17 10:34     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Gignac @ 2021-03-17  1:37 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

Hi Pablo,

How can I add the 'counter' option to the existing flowtable? Is this
possible? I tried the following:

    [magi@s116r2l1fw01a ~]$ sudo nft add flowtable inet filter f {
hook ingress priority filter \; counter \;}

It didn't complain, but I couldn't see the 'counter' option when I
listed the flowtable:

    [magi@s116r2l1fw01a ~]$ sudo nft list flowtables
    table inet filter {
            flowtable f {
                    hook ingress priority filter
                    devices = { tun0, bond0, dummy0, bond1.999, bond1,
vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0,
eno4, eno3, eno2, eno1 }
            }
    }
    table ip nat {
    }

And looking at the output of conntrack -L, the counters didn't
increase past the initial flow creation packet:

    tcp      6 src=192.168.125.3 dst=192.168.40.254 sport=53420
dport=22 packets=1 bytes=60 src=192.168.40.254 dst=192.168.125.3
sport=22 dport=53420 packets=1 bytes=60 [OFFLOAD] mark=0 use=2

How do I enable the 'counter' option exactly (since I'm clearly doing
something wrong)?

Thanks,
-Martin


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-17  1:37   ` Martin Gignac
@ 2021-03-17 10:34     ` Pablo Neira Ayuso
  2021-03-17 19:07       ` Martin Gignac
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2021-03-17 10:34 UTC (permalink / raw)
  To: Martin Gignac; +Cc: netfilter

On Tue, Mar 16, 2021 at 09:37:44PM -0400, Martin Gignac wrote:
> Hi Pablo,
> 
> How can I add the 'counter' option to the existing flowtable? Is this
> possible? I tried the following:
> 
>     [magi@s116r2l1fw01a ~]$ sudo nft add flowtable inet filter f {
> hook ingress priority filter \; counter \;}

Sorry, support for updating flowtable properties is missing:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210317103156.23859-1-pablo@netfilter.org/
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210317103156.23859-2-pablo@netfilter.org/

> It didn't complain, but I couldn't see the 'counter' option when I
> listed the flowtable:
> 
>     [magi@s116r2l1fw01a ~]$ sudo nft list flowtables
>     table inet filter {
>             flowtable f {
>                     hook ingress priority filter
>                     devices = { tun0, bond0, dummy0, bond1.999, bond1,
> vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0,
> eno4, eno3, eno2, eno1 }
>             }
>     }
>     table ip nat {
>     }
> 
> And looking at the output of conntrack -L, the counters didn't
> increase past the initial flow creation packet:
> 
>     tcp      6 src=192.168.125.3 dst=192.168.40.254 sport=53420
> dport=22 packets=1 bytes=60 src=192.168.40.254 dst=192.168.125.3
> sport=22 dport=53420 packets=1 bytes=60 [OFFLOAD] mark=0 use=2
> 
> How do I enable the 'counter' option exactly (since I'm clearly doing
> something wrong)?

Set the counter flag at flowtable creation time, i.e. flowtable 'f'
should not exist yet.

# cat flowtable.nft
table inet filter {
        flowtable f {
                hook ingress priority filter
                devices = { tun0, bond0, dummy0, bond1.999, bond1, vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0, eno4, eno3, eno2, eno1 }
                counter
        }
}
# nft -f flowtable.nft
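
Then you can double-check that the counter flag made it in, e.g.:

# nft list flowtable inet filter f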


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-17 10:34     ` Pablo Neira Ayuso
@ 2021-03-17 19:07       ` Martin Gignac
  2021-03-17 20:42         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Gignac @ 2021-03-17 19:07 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

> Set the counter flag at flowtable creation time, i.e. flowtable 'f'
> should not exist yet.

I tried creating a file like this:

    delete flowtable inet filter f

    table inet filter {

        flowtable f {
            hook ingress priority filter - 1
            devices = { tun0, bond0, dummy0, bond1.999, bond1,
vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0,
eno4, eno3, eno2, eno1 }
            counter
        }
    }

And then running nft -f <filename> on it, but I got these errors:

    <filename>:1:30-30: Error: Could not process rule: Device or resource busy
    delete flowtable inet filter f

I assume this is because the flowtable is in use, so it cannot be deleted.

Short of rebooting the Linux server (which I cannot do right now since
I have many people relying on it), is there any kind of way for me to
re-create the flowtable with the added 'counter' parameter without
impacting traffic?

Thanks,
-Martin


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-17 19:07       ` Martin Gignac
@ 2021-03-17 20:42         ` Pablo Neira Ayuso
  2021-03-17 22:01           ` Martin Gignac
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2021-03-17 20:42 UTC (permalink / raw)
  To: Martin Gignac; +Cc: netfilter

On Wed, Mar 17, 2021 at 03:07:55PM -0400, Martin Gignac wrote:
> > Set the counter flag at flowtable creation time, i.e. flowtable 'f'
> > should not exist yet.
> 
> I tried creating a file like this:
> 
>     delete flowtable inet filter f
> 
>     table inet filter {
> 
>         flowtable f {
>             hook ingress priority filter - 1
>             devices = { tun0, bond0, dummy0, bond1.999, bond1,
> vrf-conntrackd, vrf-mgmt, enp66s0f1, enp66s0f0, enp5s0f1, enp5s0f0,
> eno4, eno3, eno2, eno1 }
>             counter
>         }
>     }
> 
> And then running nft -f <filename> on it, but I got these errors:
> 
>     <filename>:1:30-30: Error: Could not process rule: Device or resource busy
>     delete flowtable inet filter f
> 
> I assume this is because the flowtable is in use, so it cannot be deleted.
> 
> Short of rebooting the Linux server (which I cannot do right now since
> I have many people relying on it), is there any kind of way for me to
> re-create the flowtable with the added 'counter' parameter without
> impacting traffic?

It should be possible to:

 delete rule inet filter y handle 3
 delete flowtable inet filter f

but the transaction code for the flowtable is buggy :-\

Two more fixes: It looks like EEXIST is also bogusly reported in case of
add-after-delete flowtable in the same batch.

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210317201957.13165-1-pablo@netfilter.org/
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210317201957.13165-2-pablo@netfilter.org/

I made a regression test for nft to make sure this works fine in the
future:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20210317203636.14869-1-pablo@netfilter.org/


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-17 20:42         ` Pablo Neira Ayuso
@ 2021-03-17 22:01           ` Martin Gignac
  2021-03-17 22:28             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Gignac @ 2021-03-17 22:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

If I just run:

    nft list ruleset > rules.txt

add:

    flush ruleset

to the top of the file, add the:

    counter

statement to the flowtable section, and then:

    nft -f rules.txt

This should atomically add the "counter", but not impact traffic in
any way, shape or form, correct?
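
i.e., roughly (an untested sketch of the whole sequence):

    nft list ruleset > rules.txt
    sed -i '1i flush ruleset' rules.txt
    # hand-edit rules.txt: add 'counter' inside 'flowtable f { ... }'
    nft -f rules.txt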

-Martin


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-17 22:01           ` Martin Gignac
@ 2021-03-17 22:28             ` Pablo Neira Ayuso
  2021-03-18  2:23               ` Martin Gignac
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2021-03-17 22:28 UTC (permalink / raw)
  To: Martin Gignac; +Cc: netfilter

On Wed, Mar 17, 2021 at 06:01:43PM -0400, Martin Gignac wrote:
> If I just run:
> 
>     nft list ruleset > rules.txt
> 
> add:
> 
>     flush ruleset
> 
> to the top of the file, add the:
> 
>     counter
> 
> statement to the flowtable section, and then:
> 
>     nft -f rules.txt
> 
> This should atomically add the "counter", but not impact traffic in
> any way, shape or form, correct?

It turns on the packet and byte counters:

# conntrack -L
tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=57758 dport=5201 packets=1998758 bytes=87532896157 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=57758 packets=1966493 bytes=102257896 [OFFLOAD] mark=0 use=2

You also have to enable counters in conntrack:

echo 1 > /proc/sys/net/netfilter/nf_conntrack_acct 
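
To make it persistent across reboots, something like this should work
(the file name is just an example):

echo 'net.netfilter.nf_conntrack_acct = 1' > /etc/sysctl.d/90-conntrack-acct.conf
sysctl -p /etc/sysctl.d/90-conntrack-acct.conf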


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-17 22:28             ` Pablo Neira Ayuso
@ 2021-03-18  2:23               ` Martin Gignac
  2021-03-18 16:20                 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Martin Gignac @ 2021-03-18  2:23 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

Hi Pablo,

I was finally able to reproduce the IPv6 lockup with the flowtable
counters turned on. I had conntrack -L running under 'watch' with some
greps to isolate the specific flow I wanted to check out. I also had a
tcpdump running on the OpenVPN tun interface and another tcpdump
running on the bonded VLAN interface to compare both.

When a lockup occurred, as I said earlier, I could see some packets
coming in on the bonded VLAN interface but not being sent out the tun0
interface. When those packets came in, I *did* see the packet count
increase by one for the "packet=" metric for that specific direction
for every one of those packets.

Sometimes, after some time being locked up, the state of the session
would move back to "ESTABLISHED [ASSURED]" (but traffic would remain
"stuck") until the point where traffic would suddenly resume, and then
the session would move back to "[OFFLOAD]" state again.

Commenting out the rule that offloaded IPv6 to the flowtable in the
ruleset, and reloading that ruleset with "nft -f rules.txt",
immediately fixed the lockup.

Am I the only person that's reported any kind of issue with flowtable
and IPv6? Maybe it's something about my setup...

-Martin

On Wed, Mar 17, 2021 at 6:28 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
>
> On Wed, Mar 17, 2021 at 06:01:43PM -0400, Martin Gignac wrote:
> > If I just run:
> >
> >     nft list ruleset > rules.txt
> >
> > add:
> >
> >     flush ruleset
> >
> > to the top of the file, add the:
> >
> >     counter
> >
> > statement to the flowtable section, and then:
> >
> >     nft -f rules.txt
> >
> > This should atomically add the "counter", but not impact traffic in
> > any way, shape or form, correct?
>
> It turns on the packet and byte counters:
>
> # conntrack -L
> tcp      6 src=10.141.10.2 dst=192.168.10.2 sport=57758 dport=5201 packets=1998758 bytes=87532896157 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=57758 packets=1966493 bytes=102257896 [OFFLOAD] mark=0 use=2
>
> You also have to enable counters in conntrack:
>
> echo 1 > /proc/sys/net/netfilter/nf_conntrack_acct


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-18  2:23               ` Martin Gignac
@ 2021-03-18 16:20                 ` Pablo Neira Ayuso
  2021-03-18 17:00                   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2021-03-18 16:20 UTC (permalink / raw)
  To: Martin Gignac; +Cc: netfilter

On Wed, Mar 17, 2021 at 10:23:04PM -0400, Martin Gignac wrote:
> Hi Pablo,
> 
> I was finally able to reproduce the IPv6 lockup with the flowtable
> counters turned on. I had conntrack -L running under 'watch' with some
> greps to isolate the specific flow I wanted to check out. I also had a
> tcpdump running on the OpenVPN tun interface and another tcpdump
> running on the bonded VLAN interface to compare both.
> 
> When a lockup occurred, as I said earlier, I could see some packets
> coming in on the bonded VLAN interface but not being sent out the tun0
> interface. When those packets came in, I *did* see the packet count
> increase by one for the "packet=" metric for that specific direction
> for every one of those packets.
> 
> Sometimes, after some time being locked up, the state of the session
> would move back to "ESTABLISHED [ASSURED]" (but traffic would remain
> "stuck") until the point where traffic would suddenly resume, and then
> the session would move back to "[OFFLOAD]" state again.
> 
> Commenting out the rule that offloaded IPv6 to the flowtable in the
> ruleset, and reloading that ruleset with "nft -f rules.txt",
> immediately fixed the lockup.
> 
> Am I the only person that's reported any kind of issue with flowtable
> and IPv6? Maybe it's something about my setup...

My IPv6 testbed is working fine here.

I just checked that kernel-5.10.23-200.fc33 contains

commit 8d6bca156e47d68551750a384b3ff49384c67be3
Author: Sven Auhagen <sven.auhagen@voleatech.de>
Date:   Tue Feb 2 18:01:16 2021 +0100

    netfilter: flowtable: fix tcp and udp header checksum update
    
    When updating the tcp or udp header checksum on port nat, the function
    inet_proto_csum_replace2 is called with the last parameter pseudohdr
    set to true. This leads to an error in the case that GRO is used and
    packets are split up in GSO. The tcp or udp checksum of all packets
    is then incorrect.
    
    The error is probably masked due to the fact that most network drivers
    implement tcp/udp checksum offloading. It also only happens when GRO is
    applied and not on single packets.
    
    The error is most visible when using a pppoe connection, which does not
    trigger the tcp/udp checksum offload.

which looks similar to your issue.

I don't have access to kernel 5.10.17-200.fc33.x86_64; it's been
replaced in the mirrors I have access to by kernel-5.10.23-200.fc33.

It would be good to confirm you have this fix before looking somewhere
else.


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-18 16:20                 ` Pablo Neira Ayuso
@ 2021-03-18 17:00                   ` Pablo Neira Ayuso
  2021-03-18 17:24                     ` Martin Gignac
  0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2021-03-18 17:00 UTC (permalink / raw)
  To: Martin Gignac; +Cc: netfilter

On Thu, Mar 18, 2021 at 05:20:59PM +0100, Pablo Neira Ayuso wrote:
> On Wed, Mar 17, 2021 at 10:23:04PM -0400, Martin Gignac wrote:
> > Hi Pablo,
> > 
> > I was finally able to reproduce the IPv6 lockup with the flowtable
> > counters turned on. I had conntrack -L running under 'watch' with some
> > greps to isolate the specific flow I wanted to check out. I also had a
> > tcpdump running on the OpenVPN tun interface and another tcpdump
> > running on the bonded VLAN interface to compare both.
> > 
> > When a lockup occurred, as I said earlier, I could see some packets
> > coming in on the bonded VLAN interface but not being sent out the tun0
> > interface. When those packets came in, I *did* see the packet count
> > increase by one for the "packet=" metric for that specific direction
> > for every one of those packets.
> > 
> > Sometimes, after some time being locked up, the state of the session
> > would move back to "ESTABLISHED [ASSURED]" (but traffic would remain
> > "stuck") until the point where traffic would suddenly resume, and then
> > the session would move back to "[OFFLOAD]" state again.
> > 
> > Commenting out the rule that offloaded IPv6 to the flowtable in the
> > ruleset, and reloading that ruleset with "nft -f rules.txt",
> > immediately fixed the lockup.
> > 
> > Am I the only person that's reported any kind of issue with flowtable
> > and IPv6? Maybe it's something about my setup...
> 
> My IPv6 testbed is working fine here.
> 
> I just checked that kernel-5.10.23-200.fc33 contains
> 
> commit 8d6bca156e47d68551750a384b3ff49384c67be3
> Author: Sven Auhagen <sven.auhagen@voleatech.de>
> Date:   Tue Feb 2 18:01:16 2021 +0100
> 
>     netfilter: flowtable: fix tcp and udp header checksum update
>     
>     When updating the tcp or udp header checksum on port nat, the function
>     inet_proto_csum_replace2 is called with the last parameter pseudohdr
>     set to true. This leads to an error in the case that GRO is used and
>     packets are split up in GSO. The tcp or udp checksum of all packets
>     is then incorrect.
>     
>     The error is probably masked due to the fact that most network drivers
>     implement tcp/udp checksum offloading. It also only happens when GRO is
>     applied and not on single packets.
>     
>     The error is most visible when using a pppoe connection, which does not
>     trigger the tcp/udp checksum offload.
> 
> which looks similar to your issue.
> 
> I don't have access to kernel 5.10.17-200.fc33.x86_64; it's been
> replaced in the mirrors I have access to by kernel-5.10.23-200.fc33.
> 
> It would be good to confirm you have this fix before looking somewhere
> else.

I just checked, 5.10.17-200.fc33.x86_64 already contains the fix above.
No need to check.


* Re: How to troubleshoot (suspected) flowtable lockups/packet drops?
  2021-03-18 17:00                   ` Pablo Neira Ayuso
@ 2021-03-18 17:24                     ` Martin Gignac
  0 siblings, 0 replies; 12+ messages in thread
From: Martin Gignac @ 2021-03-18 17:24 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter

> I just checked, 5.10.17-200.fc33.x86_64 already contains the fix above.
> No need to check.

Bummer. Thanks for checking!

-Martin

