[PATCH net] netfilter: nf_flow_table: fix teardown flow timeout

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout
@ 2022-05-09  7:29 Oz Shlomo
  2022-05-09  8:51 ` Sven Auhagen
  0 siblings, 1 reply; 6+ messages in thread
From: Oz Shlomo @ 2022-05-09  7:29 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Felix Fietkau
  Cc: netdev, netfilter-devel, Florian Westphal, Paul Blakey, Oz Shlomo

Connections leaving the established state (due to RST / FIN TCP packets)
set the flow table teardown flag. The packet path continues to set lower
timeout value as per the new TCP state but the offload flag remains set.
Hence, the conntrack garbage collector may race to undo the timeout
adjustment of the packet path, leaving the conntrack entry in place with
the internal offload timeout (one day).

Return the connection's ownership to conntrack upon teardown by clearing
the offload flag and fixing the established timeout value. The flow table
GC thread will asynchonrnously free the flow table and hardware offload
entries.

Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
---
 net/netfilter/nf_flow_table_core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 3db256da919b..ef080dbd4fd0 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -375,6 +375,9 @@ void flow_offload_teardown(struct flow_offload *flow)
 	set_bit(NF_FLOW_TEARDOWN, &flow->flags);

 	flow_offload_fixup_ct_state(flow->ct);
+	flow_offload_fixup_ct_timeout(flow->ct);
+
+	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown);

-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout
  2022-05-09  7:29 [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout Oz Shlomo
@ 2022-05-09  8:51 ` Sven Auhagen
  2022-05-09 12:18   ` Oz Shlomo
  0 siblings, 1 reply; 6+ messages in thread
From: Sven Auhagen @ 2022-05-09  8:51 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: Pablo Neira Ayuso, Felix Fietkau, netdev, netfilter-devel,
	Florian Westphal, Paul Blakey

Hi Oz,

thank you, this patch fixes the race between ct gc and flowtable teardown.
There is another big problem though in the code currently and I will send a patch
in a minute.

The flowtable teardown code always forces the ct state back to established
and adds the established timeout to it even if it is in CLOSE or FIN WAIT
which ultimately leads to a huge number of dead states in established state.

I will CC you on the patch, where I also stumbled upon your issue.

Best
Sven

On Mon, May 09, 2022 at 10:29:16AM +0300, Oz Shlomo wrote:
> Connections leaving the established state (due to RST / FIN TCP packets)
> set the flow table teardown flag. The packet path continues to set lower
> timeout value as per the new TCP state but the offload flag remains set.
> Hence, the conntrack garbage collector may race to undo the timeout
> adjustment of the packet path, leaving the conntrack entry in place with
> the internal offload timeout (one day).
> 
> Return the connection's ownership to conntrack upon teardown by clearing
> the offload flag and fixing the established timeout value. The flow table
> GC thread will asynchonrnously free the flow table and hardware offload
> entries.
> 
> Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
> Reviewed-by: Paul Blakey <paulb@nvidia.com>
> ---
>  net/netfilter/nf_flow_table_core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 3db256da919b..ef080dbd4fd0 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -375,6 +375,9 @@ void flow_offload_teardown(struct flow_offload *flow)
>  	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
>  
>  	flow_offload_fixup_ct_state(flow->ct);
> +	flow_offload_fixup_ct_timeout(flow->ct);
> +
> +	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
>  }
>  EXPORT_SYMBOL_GPL(flow_offload_teardown);
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout
  2022-05-09  8:51 ` Sven Auhagen
@ 2022-05-09 12:18   ` Oz Shlomo
  2022-05-09 12:27     ` Sven Auhagen
  0 siblings, 1 reply; 6+ messages in thread
From: Oz Shlomo @ 2022-05-09 12:18 UTC (permalink / raw)
  To: Sven Auhagen
  Cc: Pablo Neira Ayuso, Felix Fietkau, netdev, netfilter-devel,
	Florian Westphal, Paul Blakey

Hi Sven,

On 5/9/2022 11:51 AM, Sven Auhagen wrote:
> Hi Oz,
> 
> thank you, this patch fixes the race between ct gc and flowtable teardown.
> There is another big problem though in the code currently and I will send a patch
> in a minute.
> 
> The flowtable teardown code always forces the ct state back to established
> and adds the established timeout to it even if it is in CLOSE or FIN WAIT
> which ultimately leads to a huge number of dead states in established state.

The system's design assumes that connections are added to nf flowtable 
when they are in established state and are removed when they are about 
to leave the established state.
It is the front-end's responsibility to enforce this behavior.
Currently nf flowtable is used by nft and tc act_ct.

act_ct removes the connection from nf flowtable when a fin/rst packet is 
received. So the connection is still in established state when it is 
removed from nf flow table (see tcf_ct_flow_table_lookup).
act_ct then calls nf_conntrack_in (for the same packet) which will 
transition the connection to close/fin_wait state .

I am less familiar with nft internals.

> 
> I will CC you on the patch, where I also stumbled upon your issue.

I reviewed the patch but I did not understand how it addresses the issue 
that is fixed here.

The issue here is that IPS_OFFLOAD_BIT remains set when teardown is 
called and the connection transitions to close/fin_wait state.
That can potentially race with the nf gc which assumes that the 
connection is owned by nf flowtable and sets a one day timeout.

> 
> Best
> Sven
> 
> On Mon, May 09, 2022 at 10:29:16AM +0300, Oz Shlomo wrote:
>> Connections leaving the established state (due to RST / FIN TCP packets)
>> set the flow table teardown flag. The packet path continues to set lower
>> timeout value as per the new TCP state but the offload flag remains set.
>> Hence, the conntrack garbage collector may race to undo the timeout
>> adjustment of the packet path, leaving the conntrack entry in place with
>> the internal offload timeout (one day).
>>
>> Return the connection's ownership to conntrack upon teardown by clearing
>> the offload flag and fixing the established timeout value. The flow table
>> GC thread will asynchonrnously free the flow table and hardware offload
>> entries.
>>
>> Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
>> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
>> Reviewed-by: Paul Blakey <paulb@nvidia.com>
>> ---
>>   net/netfilter/nf_flow_table_core.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>> index 3db256da919b..ef080dbd4fd0 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -375,6 +375,9 @@ void flow_offload_teardown(struct flow_offload *flow)
>>   	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
>>   
>>   	flow_offload_fixup_ct_state(flow->ct);
>> +	flow_offload_fixup_ct_timeout(flow->ct);
>> +
>> +	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
>>   }
>>   EXPORT_SYMBOL_GPL(flow_offload_teardown);
>>   
>> -- 
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout
  2022-05-09 12:18   ` Oz Shlomo
@ 2022-05-09 12:27     ` Sven Auhagen
  2022-05-09 13:01       ` Oz Shlomo
  0 siblings, 1 reply; 6+ messages in thread
From: Sven Auhagen @ 2022-05-09 12:27 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: Pablo Neira Ayuso, Felix Fietkau, netdev, netfilter-devel,
	Florian Westphal, Paul Blakey

On Mon, May 09, 2022 at 03:18:42PM +0300, Oz Shlomo wrote:
Hi Oz,

> Hi Sven,
> 
> On 5/9/2022 11:51 AM, Sven Auhagen wrote:
> > Hi Oz,
> > 
> > thank you, this patch fixes the race between ct gc and flowtable teardown.
> > There is another big problem though in the code currently and I will send a patch
> > in a minute.
> > 
> > The flowtable teardown code always forces the ct state back to established
> > and adds the established timeout to it even if it is in CLOSE or FIN WAIT
> > which ultimately leads to a huge number of dead states in established state.
> 
> The system's design assumes that connections are added to nf flowtable when
> they are in established state and are removed when they are about to leave
> the established state.
> It is the front-end's responsibility to enforce this behavior.
> Currently nf flowtable is used by nft and tc act_ct.
> 
> act_ct removes the connection from nf flowtable when a fin/rst packet is
> received. So the connection is still in established state when it is removed
> from nf flow table (see tcf_ct_flow_table_lookup).
> act_ct then calls nf_conntrack_in (for the same packet) which will
> transition the connection to close/fin_wait state .
> 
> I am less familiar with nft internals.
> 

It is added when the ct state is established but not the TCP state.
Sorry, I was probably a little unclear about what is ct and what is TCP.

The TCP 3 way handshake is basically stopped after the second packet
because this will create the flow table entry.
The 3rd packet ACK is not seen by nftables anymore but passes through
the flowtable already which leaves the TCP state in SYN_RECV.

The flowtable is passing TCP FIN/RST up to nftables/slowpath for
processing while the flowtable entry is still active.
The TCP FIN will move the TCP state from SYN_RECV to FIN_WAIT and
also at some point the gc triggers the flowtable teardown.
The flowtable teardown then sets the TCP state back to ESTABLISHED
and puts the long ESTABLISHED timeout into the TCP state even though
the TCP connection is closed.

My patch basically also incorporates your change by adding the clearing
of the offload bit to the end of the teardown processing.
The problem in general though is that the teardown has no check or
assumption about what the current status of the TCP connection is.
It just sets it to established.

I hope that it explains it in a better way.

Best
Sven

> > 
> > I will CC you on the patch, where I also stumbled upon your issue.
> 
> I reviewed the patch but I did not understand how it addresses the issue
> that is fixed here.
> 
> The issue here is that IPS_OFFLOAD_BIT remains set when teardown is called
> and the connection transitions to close/fin_wait state.
> That can potentially race with the nf gc which assumes that the connection
> is owned by nf flowtable and sets a one day timeout.
> 
> > 
> > Best
> > Sven
> > 
> > On Mon, May 09, 2022 at 10:29:16AM +0300, Oz Shlomo wrote:
> > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > set the flow table teardown flag. The packet path continues to set lower
> > > timeout value as per the new TCP state but the offload flag remains set.
> > > Hence, the conntrack garbage collector may race to undo the timeout
> > > adjustment of the packet path, leaving the conntrack entry in place with
> > > the internal offload timeout (one day).
> > > 
> > > Return the connection's ownership to conntrack upon teardown by clearing
> > > the offload flag and fixing the established timeout value. The flow table
> > > GC thread will asynchonrnously free the flow table and hardware offload
> > > entries.
> > > 
> > > Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
> > > Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
> > > Reviewed-by: Paul Blakey <paulb@nvidia.com>
> > > ---
> > >   net/netfilter/nf_flow_table_core.c | 3 +++
> > >   1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > > index 3db256da919b..ef080dbd4fd0 100644
> > > --- a/net/netfilter/nf_flow_table_core.c
> > > +++ b/net/netfilter/nf_flow_table_core.c
> > > @@ -375,6 +375,9 @@ void flow_offload_teardown(struct flow_offload *flow)
> > >   	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > >   	flow_offload_fixup_ct_state(flow->ct);
> > > +	flow_offload_fixup_ct_timeout(flow->ct);
> > > +
> > > +	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > >   }
> > >   EXPORT_SYMBOL_GPL(flow_offload_teardown);
> > > -- 
> > > 1.8.3.1
> > > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout
  2022-05-09 12:27     ` Sven Auhagen
@ 2022-05-09 13:01       ` Oz Shlomo
  2022-05-09 13:14         ` Sven Auhagen
  0 siblings, 1 reply; 6+ messages in thread
From: Oz Shlomo @ 2022-05-09 13:01 UTC (permalink / raw)
  To: Sven Auhagen
  Cc: Pablo Neira Ayuso, Felix Fietkau, netdev, netfilter-devel,
	Florian Westphal, Paul Blakey

Hi Sven,

On 5/9/2022 3:27 PM, Sven Auhagen wrote:
> On Mon, May 09, 2022 at 03:18:42PM +0300, Oz Shlomo wrote:
> Hi Oz,
> 
>> Hi Sven,
>>
>> On 5/9/2022 11:51 AM, Sven Auhagen wrote:
>>> Hi Oz,
>>>
>>> thank you, this patch fixes the race between ct gc and flowtable teardown.
>>> There is another big problem though in the code currently and I will send a patch
>>> in a minute.
>>>
>>> The flowtable teardown code always forces the ct state back to established
>>> and adds the established timeout to it even if it is in CLOSE or FIN WAIT
>>> which ultimately leads to a huge number of dead states in established state.
>>
>> The system's design assumes that connections are added to nf flowtable when
>> they are in established state and are removed when they are about to leave
>> the established state.
>> It is the front-end's responsibility to enforce this behavior.
>> Currently nf flowtable is used by nft and tc act_ct.
>>
>> act_ct removes the connection from nf flowtable when a fin/rst packet is
>> received. So the connection is still in established state when it is removed
>> from nf flow table (see tcf_ct_flow_table_lookup).
>> act_ct then calls nf_conntrack_in (for the same packet) which will
>> transition the connection to close/fin_wait state .
>>
>> I am less familiar with nft internals.
>>
> 
> It is added when the ct state is established but not the TCP state.
> Sorry, I was probably a little unclear about what is ct and what is TCP.
> 
> The TCP 3 way handshake is basically stopped after the second packet
> because this will create the flow table entry.
> The 3rd packet ACK is not seen by nftables anymore but passes through
> the flowtable already which leaves the TCP state in SYN_RECV.
> 
> The flowtable is passing TCP FIN/RST up to nftables/slowpath for
> processing while the flowtable entry is still active.
> The TCP FIN will move the TCP state from SYN_RECV to FIN_WAIT and
> also at some point the gc triggers the flowtable teardown.

So perhaps TCP FIN packets processing should first call 
flow_offload_teardown and then process the packet through the slow path.
This will ensure that the flowtable entry is invalidated when the 
connection is still in established state (similar to what act_ct is doing).

> The flowtable teardown then sets the TCP state back to ESTABLISHED
> and puts the long ESTABLISHED timeout into the TCP state even though
> the TCP connection is closed.
> 
> My patch basically also incorporates your change by adding the clearing
> of the offload bit to the end of the teardown processing.
> The problem in general though is that the teardown has no check or
> assumption about what the current status of the TCP connection is.
> It just sets it to established.
> 

AFAIU your patch clears the bit at flow_offload_del.
If so, then it can still race as flow_offload_del is called by the 
flowtable gc thread while the flow is deleted by flow_offload_teardown 
on the dp thread.

> I hope that it explains it in a better way.
> 
> Best
> Sven
> 
>>>
>>> I will CC you on the patch, where I also stumbled upon your issue.
>>
>> I reviewed the patch but I did not understand how it addresses the issue
>> that is fixed here.
>>
>> The issue here is that IPS_OFFLOAD_BIT remains set when teardown is called
>> and the connection transitions to close/fin_wait state.
>> That can potentially race with the nf gc which assumes that the connection
>> is owned by nf flowtable and sets a one day timeout.
>>
>>>
>>> Best
>>> Sven
>>>
>>> On Mon, May 09, 2022 at 10:29:16AM +0300, Oz Shlomo wrote:
>>>> Connections leaving the established state (due to RST / FIN TCP packets)
>>>> set the flow table teardown flag. The packet path continues to set lower
>>>> timeout value as per the new TCP state but the offload flag remains set.
>>>> Hence, the conntrack garbage collector may race to undo the timeout
>>>> adjustment of the packet path, leaving the conntrack entry in place with
>>>> the internal offload timeout (one day).
>>>>
>>>> Return the connection's ownership to conntrack upon teardown by clearing
>>>> the offload flag and fixing the established timeout value. The flow table
>>>> GC thread will asynchonrnously free the flow table and hardware offload
>>>> entries.
>>>>
>>>> Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
>>>> Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
>>>> Reviewed-by: Paul Blakey <paulb@nvidia.com>
>>>> ---
>>>>    net/netfilter/nf_flow_table_core.c | 3 +++
>>>>    1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
>>>> index 3db256da919b..ef080dbd4fd0 100644
>>>> --- a/net/netfilter/nf_flow_table_core.c
>>>> +++ b/net/netfilter/nf_flow_table_core.c
>>>> @@ -375,6 +375,9 @@ void flow_offload_teardown(struct flow_offload *flow)
>>>>    	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
>>>>    	flow_offload_fixup_ct_state(flow->ct);
>>>> +	flow_offload_fixup_ct_timeout(flow->ct);
>>>> +
>>>> +	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(flow_offload_teardown);
>>>> -- 
>>>> 1.8.3.1
>>>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout
  2022-05-09 13:01       ` Oz Shlomo
@ 2022-05-09 13:14         ` Sven Auhagen
  0 siblings, 0 replies; 6+ messages in thread
From: Sven Auhagen @ 2022-05-09 13:14 UTC (permalink / raw)
  To: Oz Shlomo
  Cc: Pablo Neira Ayuso, Felix Fietkau, netdev, netfilter-devel,
	Florian Westphal, Paul Blakey

Hi Oz,

On Mon, May 09, 2022 at 04:01:37PM +0300, Oz Shlomo wrote:
> Hi Sven,
> 
> On 5/9/2022 3:27 PM, Sven Auhagen wrote:
> > On Mon, May 09, 2022 at 03:18:42PM +0300, Oz Shlomo wrote:
> > Hi Oz,
> > 
> > > Hi Sven,
> > > 
> > > On 5/9/2022 11:51 AM, Sven Auhagen wrote:
> > > > Hi Oz,
> > > > 
> > > > thank you, this patch fixes the race between ct gc and flowtable teardown.
> > > > There is another big problem though in the code currently and I will send a patch
> > > > in a minute.
> > > > 
> > > > The flowtable teardown code always forces the ct state back to established
> > > > and adds the established timeout to it even if it is in CLOSE or FIN WAIT
> > > > which ultimately leads to a huge number of dead states in established state.
> > > 
> > > The system's design assumes that connections are added to nf flowtable when
> > > they are in established state and are removed when they are about to leave
> > > the established state.
> > > It is the front-end's responsibility to enforce this behavior.
> > > Currently nf flowtable is used by nft and tc act_ct.
> > > 
> > > act_ct removes the connection from nf flowtable when a fin/rst packet is
> > > received. So the connection is still in established state when it is removed
> > > from nf flow table (see tcf_ct_flow_table_lookup).
> > > act_ct then calls nf_conntrack_in (for the same packet) which will
> > > transition the connection to close/fin_wait state .
> > > 
> > > I am less familiar with nft internals.
> > > 
> > 
> > It is added when the ct state is established but not the TCP state.
> > Sorry, I was probably a little unclear about what is ct and what is TCP.
> > 
> > The TCP 3 way handshake is basically stopped after the second packet
> > because this will create the flow table entry.
> > The 3rd packet ACK is not seen by nftables anymore but passes through
> > the flowtable already which leaves the TCP state in SYN_RECV.
> > 
> > The flowtable is passing TCP FIN/RST up to nftables/slowpath for
> > processing while the flowtable entry is still active.
> > The TCP FIN will move the TCP state from SYN_RECV to FIN_WAIT and
> > also at some point the gc triggers the flowtable teardown.
> 
> So perhaps TCP FIN packets processing should first call
> flow_offload_teardown and then process the packet through the slow path.
> This will ensure that the flowtable entry is invalidated when the connection
> is still in established state (similar to what act_ct is doing).
> 

It does so at this point but it will again be set to established during flowtable delete.
If you clear the IPS_OFFLOAD_BIT in the flow_offload_teardown, as your patch does,
you will create a race condition between ct gc and flow_offload_del which will
now access ct data which have been deallocated by ct gc and will lead to
memory corruption.
So your patch will not fix the problem correctly and leads to more issues.

My patch moves all the work to flow_offload_del since this is the last step
taken before a flowtable flow is deallocated and therefore a ct state must be
safe to access. Only after the function has run all its code, the
IPS_OFFLOAD_BIT can safely be cleared.

There are more problems, if the flowtable can not process a packet for any
reason, it will push it up to nftables/slowpath.
So we might miss the FIN in the flowtable code and this might also
change the TCP state.
Therefore the TCP state must be set to ESTABLISHED somewhere in the
flowtable code so we can be sure that it has changed at
flow_offload_del or it is still established.

There is no way of resolving this on the nftable side only.

Best
Sven

> > The flowtable teardown then sets the TCP state back to ESTABLISHED
> > and puts the long ESTABLISHED timeout into the TCP state even though
> > the TCP connection is closed.
> > 
> > My patch basically also incorporates your change by adding the clearing
> > of the offload bit to the end of the teardown processing.
> > The problem in general though is that the teardown has no check or
> > assumption about what the current status of the TCP connection is.
> > It just sets it to established.
> > 
> 
> AFAIU your patch clears the bit at flow_offload_del.
> If so, then it can still race as flow_offload_del is called by the flowtable
> gc thread while the flow is deleted by flow_offload_teardown on the dp
> thread.
> 
> > I hope that it explains it in a better way.
> > 
> > Best
> > Sven
> > 
> > > > 
> > > > I will CC you on the patch, where I also stumbled upon your issue.
> > > 
> > > I reviewed the patch but I did not understand how it addresses the issue
> > > that is fixed here.
> > > 
> > > The issue here is that IPS_OFFLOAD_BIT remains set when teardown is called
> > > and the connection transitions to close/fin_wait state.
> > > That can potentially race with the nf gc which assumes that the connection
> > > is owned by nf flowtable and sets a one day timeout.
> > > 
> > > > 
> > > > Best
> > > > Sven
> > > > 
> > > > On Mon, May 09, 2022 at 10:29:16AM +0300, Oz Shlomo wrote:
> > > > > Connections leaving the established state (due to RST / FIN TCP packets)
> > > > > set the flow table teardown flag. The packet path continues to set lower
> > > > > timeout value as per the new TCP state but the offload flag remains set.
> > > > > Hence, the conntrack garbage collector may race to undo the timeout
> > > > > adjustment of the packet path, leaving the conntrack entry in place with
> > > > > the internal offload timeout (one day).
> > > > > 
> > > > > Return the connection's ownership to conntrack upon teardown by clearing
> > > > > the offload flag and fixing the established timeout value. The flow table
> > > > > GC thread will asynchonrnously free the flow table and hardware offload
> > > > > entries.
> > > > > 
> > > > > Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
> > > > > Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
> > > > > Reviewed-by: Paul Blakey <paulb@nvidia.com>
> > > > > ---
> > > > >    net/netfilter/nf_flow_table_core.c | 3 +++
> > > > >    1 file changed, 3 insertions(+)
> > > > > 
> > > > > diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> > > > > index 3db256da919b..ef080dbd4fd0 100644
> > > > > --- a/net/netfilter/nf_flow_table_core.c
> > > > > +++ b/net/netfilter/nf_flow_table_core.c
> > > > > @@ -375,6 +375,9 @@ void flow_offload_teardown(struct flow_offload *flow)
> > > > >    	set_bit(NF_FLOW_TEARDOWN, &flow->flags);
> > > > >    	flow_offload_fixup_ct_state(flow->ct);
> > > > > +	flow_offload_fixup_ct_timeout(flow->ct);
> > > > > +
> > > > > +	clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
> > > > >    }
> > > > >    EXPORT_SYMBOL_GPL(flow_offload_teardown);
> > > > > -- 
> > > > > 1.8.3.1
> > > > > 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-09 13:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-09  7:29 [PATCH net] netfilter: nf_flow_table: fix teardown flow timeout Oz Shlomo
2022-05-09  8:51 ` Sven Auhagen
2022-05-09 12:18   ` Oz Shlomo
2022-05-09 12:27     ` Sven Auhagen
2022-05-09 13:01       ` Oz Shlomo
2022-05-09 13:14         ` Sven Auhagen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).