All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH nf-next] flow_table: do not try to add already offloaded entries
@ 2022-06-27 14:38 Marcelo Ricardo Leitner
  2022-06-27 15:19 ` Oz Shlomo
  0 siblings, 1 reply; 7+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-06-27 14:38 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Pablo Neira Ayuso, Oz Shlomo, Paul Blakey

Currently, whenever act_ct tries to match a packet against the flow
table, it will also try to refresh the offload. That is, at the end
of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().

The problem is that flow_offload_refresh() will try to offload entries
that are actually already offloaded, leading to expensive and useless
work. Before this patch, with a simple iperf3 test on OVS + TC
(hw_offload=true) + CT test entirely in sw, it looks like:

- 39,81% tcf_classify
   - fl_classify
      - 37,09% tcf_action_exec
         + 33,18% tcf_mirred_act
         - 2,69% tcf_ct_act
            - 2,39% tcf_ct_flow_table_lookup
               - 1,67% queue_work_on
                  - 1,52% __queue_work
                       1,20% try_to_wake_up
         + 0,80% tcf_pedit_act
      + 2,28% fl_mask_lookup

The patch here aborts the add operation if the entry is already present
in hw. With this patch, then:

- 43,94% tcf_classify
   - fl_classify
      - 39,64% tcf_action_exec
         + 38,00% tcf_mirred_act
         - 1,04% tcf_ct_act
              0,63% tcf_ct_flow_table_lookup
      + 3,19% fl_mask_lookup

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/netfilter/nf_flow_table_offload.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
 {
 	struct flow_offload_work *offload;
 
+	if (test_bit(NF_FLOW_HW, &flow->flags))
+		return;
+
 	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
 	if (!offload)
 		return;
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries
  2022-06-27 14:38 [PATCH nf-next] flow_table: do not try to add already offloaded entries Marcelo Ricardo Leitner
@ 2022-06-27 15:19 ` Oz Shlomo
  2022-06-27 17:06   ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 7+ messages in thread
From: Oz Shlomo @ 2022-06-27 15:19 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner, netfilter-devel; +Cc: Pablo Neira Ayuso, Paul Blakey

Hi Marcelo,

On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
> Currently, whenever act_ct tries to match a packet against the flow
> table, it will also try to refresh the offload. That is, at the end
> of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
> 
> The problem is that flow_offload_refresh() will try to offload entries
> that are actually already offloaded, leading to expensive and useless
> work. Before this patch, with a simple iperf3 test on OVS + TC

Packets of offloaded connections are expected to process in hardware.
As such, it is not expected to receive packets in software from 
offloaded connections.

However, hardware offload may fail due to various reasons (e.g. size 
limits, insertion rate throttling etc.).
The "refresh" mechanism is the enabler for offload retries.


> (hw_offload=true) + CT test entirely in sw, it looks like:
> 
> - 39,81% tcf_classify
>     - fl_classify
>        - 37,09% tcf_action_exec
>           + 33,18% tcf_mirred_act
>           - 2,69% tcf_ct_act
>              - 2,39% tcf_ct_flow_table_lookup
>                 - 1,67% queue_work_on
>                    - 1,52% __queue_work
>                         1,20% try_to_wake_up
>           + 0,80% tcf_pedit_act
>        + 2,28% fl_mask_lookup
> 
> The patch here aborts the add operation if the entry is already present
> in hw. With this patch, then:
> 
> - 43,94% tcf_classify
>     - fl_classify
>        - 39,64% tcf_action_exec
>           + 38,00% tcf_mirred_act
>           - 1,04% tcf_ct_act
>                0,63% tcf_ct_flow_table_lookup
>        + 3,19% fl_mask_lookup
> 
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---
>   net/netfilter/nf_flow_table_offload.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
> --- a/net/netfilter/nf_flow_table_offload.c
> +++ b/net/netfilter/nf_flow_table_offload.c
> @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
>   {
>   	struct flow_offload_work *offload;
>   
> +	if (test_bit(NF_FLOW_HW, &flow->flags))
> +		return;
> +

This change will make the refresh call obsolete as the NF_FLOW_HW bit is 
set on the first flow offload attempt.

>   	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
>   	if (!offload)
>   		return;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries
  2022-06-27 15:19 ` Oz Shlomo
@ 2022-06-27 17:06   ` Marcelo Ricardo Leitner
  2022-06-28 13:13     ` Oz Shlomo
  0 siblings, 1 reply; 7+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-06-27 17:06 UTC (permalink / raw)
  To: Oz Shlomo; +Cc: netfilter-devel, Pablo Neira Ayuso, Paul Blakey

Hi Oz,

On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
> Hi Marcelo,
> 
> On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
> > Currently, whenever act_ct tries to match a packet against the flow
> > table, it will also try to refresh the offload. That is, at the end
> > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
> > 
> > The problem is that flow_offload_refresh() will try to offload entries
> > that are actually already offloaded, leading to expensive and useless
> > work. Before this patch, with a simple iperf3 test on OVS + TC
> 
> Packets of offloaded connections are expected to process in hardware.
> As such, it is not expected to receive packets in software from offloaded
> connections.
> 
> However, hardware offload may fail due to various reasons (e.g. size limits,
> insertion rate throttling etc.).
> The "refresh" mechanism is the enabler for offload retries.

Thanks for the quick review.

Right. I don't mean to break this mechanism. Act_ct can also be used
in semi/pure sw datapath, and then the premise of packets being
expected to be handled in hw is not valid anymore. I can provide a
more detailed use case if you need.

> 
> 
> > (hw_offload=true) + CT test entirely in sw, it looks like:
> > 
> > - 39,81% tcf_classify
> >     - fl_classify
> >        - 37,09% tcf_action_exec
> >           + 33,18% tcf_mirred_act
> >           - 2,69% tcf_ct_act
> >              - 2,39% tcf_ct_flow_table_lookup
> >                 - 1,67% queue_work_on
> >                    - 1,52% __queue_work
> >                         1,20% try_to_wake_up
> >           + 0,80% tcf_pedit_act
> >        + 2,28% fl_mask_lookup
> > 
> > The patch here aborts the add operation if the entry is already present
> > in hw. With this patch, then:
> > 
> > - 43,94% tcf_classify
> >     - fl_classify
> >        - 39,64% tcf_action_exec
> >           + 38,00% tcf_mirred_act
> >           - 1,04% tcf_ct_act
> >                0,63% tcf_ct_flow_table_lookup
> >        + 3,19% fl_mask_lookup
> > 
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > ---
> >   net/netfilter/nf_flow_table_offload.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
> > --- a/net/netfilter/nf_flow_table_offload.c
> > +++ b/net/netfilter/nf_flow_table_offload.c
> > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
> >   {
> >   	struct flow_offload_work *offload;
> > +	if (test_bit(NF_FLOW_HW, &flow->flags))
> > +		return;
> > +
> 
> This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
> on the first flow offload attempt.

Oh oh.. I was quite sure it was getting cleared when the entry was
removed from HW, but not.

So instead of the if() above, what about:
+       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))

AFAICT it will keep trying while the entry is not present in the flow
table, and stop while it is there. Once the entry is aged from HW, it
is also removed from the flow table, so this part should be okay. 
But if the offload failed for some reason like you said above, and the
entry is left on the flow table, it won't retry until it ages out from
the flow table.

If you expect that this situation can be easily triggered, we may need
to add a rate limit instead then. Even for these connections that
failed to offload, this "busy retrying" is expensive and may backfire
in such situation.

> 
> >   	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
> >   	if (!offload)
> >   		return;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries
  2022-06-27 17:06   ` Marcelo Ricardo Leitner
@ 2022-06-28 13:13     ` Oz Shlomo
  2022-06-28 15:25       ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 7+ messages in thread
From: Oz Shlomo @ 2022-06-28 13:13 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netfilter-devel, Pablo Neira Ayuso, Paul Blakey

Hi Marcelo,


On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote:
> Hi Oz,
> 
> On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
>> Hi Marcelo,
>>
>> On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
>>> Currently, whenever act_ct tries to match a packet against the flow
>>> table, it will also try to refresh the offload. That is, at the end
>>> of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
>>>
>>> The problem is that flow_offload_refresh() will try to offload entries
>>> that are actually already offloaded, leading to expensive and useless
>>> work. Before this patch, with a simple iperf3 test on OVS + TC
>>
>> Packets of offloaded connections are expected to process in hardware.
>> As such, it is not expected to receive packets in software from offloaded
>> connections.
>>
>> However, hardware offload may fail due to various reasons (e.g. size limits,
>> insertion rate throttling etc.).
>> The "refresh" mechanism is the enabler for offload retries.
> 
> Thanks for the quick review.
> 
> Right. I don't mean to break this mechanism. Act_ct can also be used
> in semi/pure sw datapath, and then the premise of packets being
> expected to be handled in hw is not valid anymore. I can provide a
> more detailed use case if you need.

It is clear that the refresh design introduces some overhead when act_ct 
is used in a pure sw datapath.

> 
>>
>>
>>> (hw_offload=true) + CT test entirely in sw, it looks like:
>>>
>>> - 39,81% tcf_classify
>>>      - fl_classify
>>>         - 37,09% tcf_action_exec
>>>            + 33,18% tcf_mirred_act
>>>            - 2,69% tcf_ct_act
>>>               - 2,39% tcf_ct_flow_table_lookup
>>>                  - 1,67% queue_work_on
>>>                     - 1,52% __queue_work
>>>                          1,20% try_to_wake_up
>>>            + 0,80% tcf_pedit_act
>>>         + 2,28% fl_mask_lookup
>>>
>>> The patch here aborts the add operation if the entry is already present
>>> in hw. With this patch, then:
>>>
>>> - 43,94% tcf_classify
>>>      - fl_classify
>>>         - 39,64% tcf_action_exec
>>>            + 38,00% tcf_mirred_act
>>>            - 1,04% tcf_ct_act
>>>                 0,63% tcf_ct_flow_table_lookup
>>>         + 3,19% fl_mask_lookup
>>>
>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> ---
>>>    net/netfilter/nf_flow_table_offload.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>>> index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
>>> --- a/net/netfilter/nf_flow_table_offload.c
>>> +++ b/net/netfilter/nf_flow_table_offload.c
>>> @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
>>>    {
>>>    	struct flow_offload_work *offload;
>>> +	if (test_bit(NF_FLOW_HW, &flow->flags))
>>> +		return;
>>> +
>>
>> This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
>> on the first flow offload attempt.
> 
> Oh oh.. I was quite sure it was getting cleared when the entry was
> removed from HW, but not.
> 
> So instead of the if() above, what about:
> +       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))

I think this will set the IPS_HW_OFFLOAD_BIT prematurely.
Currently this bit is set only when the the flow has been successfully 
offloaded.

> 
> AFAICT it will keep trying while the entry is not present in the flow
> table, and stop while it is there. Once the entry is aged from HW, it
> is also removed from the flow table, so this part should be okay.
> But if the offload failed for some reason like you said above, and the
> entry is left on the flow table, it won't retry until it ages out from
> the flow table.

But then we will never have the chance to re-install it in hardware 
while the connection is still active.

> 
> If you expect that this situation can be easily triggered, we may need
> to add a rate limit instead then. Even for these connections that
> failed to offload, this "busy retrying" is expensive and may backfire
> in such situation.

Perhaps we can refresh only if the flow_block callbacks list is not empty.


> 
>>
>>>    	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
>>>    	if (!offload)
>>>    		return;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries
  2022-06-28 13:13     ` Oz Shlomo
@ 2022-06-28 15:25       ` Marcelo Ricardo Leitner
  2022-06-28 15:34         ` Oz Shlomo
  0 siblings, 1 reply; 7+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-06-28 15:25 UTC (permalink / raw)
  To: Oz Shlomo; +Cc: netfilter-devel, Pablo Neira Ayuso, Paul Blakey

On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote:
> Hi Marcelo,
> 
> 
> On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote:
> > Hi Oz,
> > 
> > On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
> > > Hi Marcelo,
> > > 
> > > On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
> > > > Currently, whenever act_ct tries to match a packet against the flow
> > > > table, it will also try to refresh the offload. That is, at the end
> > > > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
> > > > 
> > > > The problem is that flow_offload_refresh() will try to offload entries
> > > > that are actually already offloaded, leading to expensive and useless
> > > > work. Before this patch, with a simple iperf3 test on OVS + TC
> > > 
> > > Packets of offloaded connections are expected to process in hardware.
> > > As such, it is not expected to receive packets in software from offloaded
> > > connections.
> > > 
> > > However, hardware offload may fail due to various reasons (e.g. size limits,
> > > insertion rate throttling etc.).
> > > The "refresh" mechanism is the enabler for offload retries.
> > 
> > Thanks for the quick review.
> > 
> > Right. I don't mean to break this mechanism. Act_ct can also be used
> > in semi/pure sw datapath, and then the premise of packets being
> > expected to be handled in hw is not valid anymore. I can provide a
> > more detailed use case if you need.
> 
> It is clear that the refresh design introduces some overhead when act_ct is
> used in a pure sw datapath.

Cool.

> 
> > 
> > > 
> > > 
> > > > (hw_offload=true) + CT test entirely in sw, it looks like:
> > > > 
> > > > - 39,81% tcf_classify
> > > >      - fl_classify
> > > >         - 37,09% tcf_action_exec
> > > >            + 33,18% tcf_mirred_act
> > > >            - 2,69% tcf_ct_act
> > > >               - 2,39% tcf_ct_flow_table_lookup
> > > >                  - 1,67% queue_work_on
> > > >                     - 1,52% __queue_work
> > > >                          1,20% try_to_wake_up
> > > >            + 0,80% tcf_pedit_act
> > > >         + 2,28% fl_mask_lookup
> > > > 
> > > > The patch here aborts the add operation if the entry is already present
> > > > in hw. With this patch, then:
> > > > 
> > > > - 43,94% tcf_classify
> > > >      - fl_classify
> > > >         - 39,64% tcf_action_exec
> > > >            + 38,00% tcf_mirred_act
> > > >            - 1,04% tcf_ct_act
> > > >                 0,63% tcf_ct_flow_table_lookup
> > > >         + 3,19% fl_mask_lookup
> > > > 
> > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > > > ---
> > > >    net/netfilter/nf_flow_table_offload.c | 3 +++
> > > >    1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> > > > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
> > > > --- a/net/netfilter/nf_flow_table_offload.c
> > > > +++ b/net/netfilter/nf_flow_table_offload.c
> > > > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
> > > >    {
> > > >    	struct flow_offload_work *offload;
> > > > +	if (test_bit(NF_FLOW_HW, &flow->flags))
> > > > +		return;
> > > > +
> > > 
> > > This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
> > > on the first flow offload attempt.
> > 
> > Oh oh.. I was quite sure it was getting cleared when the entry was
> > removed from HW, but not.
> > 
> > So instead of the if() above, what about:
> > +       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))
> 
> I think this will set the IPS_HW_OFFLOAD_BIT prematurely.
> Currently this bit is set only when the the flow has been successfully
> offloaded.

Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry.

> 
> > 
> > AFAICT it will keep trying while the entry is not present in the flow
> > table, and stop while it is there. Once the entry is aged from HW, it
> > is also removed from the flow table, so this part should be okay.
> > But if the offload failed for some reason like you said above, and the
> > entry is left on the flow table, it won't retry until it ages out from
> > the flow table.
> 
> But then we will never have the chance to re-install it in hardware while
> the connection is still active.

Unless it goes idle, yes.

> 
> > 
> > If you expect that this situation can be easily triggered, we may need
> > to add a rate limit instead then. Even for these connections that
> > failed to offload, this "busy retrying" is expensive and may backfire
> > in such situation.
> 
> Perhaps we can refresh only if the flow_block callbacks list is not empty.

It may not be empty even for sw datapath. If you have packets coming
from the wire towards a veth/virtio, for example. It will likely
having a matching act_ct with the same zone number on both directions.
And/or if a zone is shared, as the the flow table then is also shared.

> 
> 
> > 
> > > 
> > > >    	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
> > > >    	if (!offload)
> > > >    		return;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries
  2022-06-28 15:25       ` Marcelo Ricardo Leitner
@ 2022-06-28 15:34         ` Oz Shlomo
  2022-06-28 15:44           ` Marcelo Ricardo Leitner
  0 siblings, 1 reply; 7+ messages in thread
From: Oz Shlomo @ 2022-06-28 15:34 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netfilter-devel, Pablo Neira Ayuso, Paul Blakey



On 6/28/2022 6:25 PM, Marcelo Ricardo Leitner wrote:
> On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote:
>> Hi Marcelo,
>>
>>
>> On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote:
>>> Hi Oz,
>>>
>>> On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
>>>> Hi Marcelo,
>>>>
>>>> On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
>>>>> Currently, whenever act_ct tries to match a packet against the flow
>>>>> table, it will also try to refresh the offload. That is, at the end
>>>>> of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
>>>>>
>>>>> The problem is that flow_offload_refresh() will try to offload entries
>>>>> that are actually already offloaded, leading to expensive and useless
>>>>> work. Before this patch, with a simple iperf3 test on OVS + TC
>>>>
>>>> Packets of offloaded connections are expected to process in hardware.
>>>> As such, it is not expected to receive packets in software from offloaded
>>>> connections.
>>>>
>>>> However, hardware offload may fail due to various reasons (e.g. size limits,
>>>> insertion rate throttling etc.).
>>>> The "refresh" mechanism is the enabler for offload retries.
>>>
>>> Thanks for the quick review.
>>>
>>> Right. I don't mean to break this mechanism. Act_ct can also be used
>>> in semi/pure sw datapath, and then the premise of packets being
>>> expected to be handled in hw is not valid anymore. I can provide a
>>> more detailed use case if you need.
>>
>> It is clear that the refresh design introduces some overhead when act_ct is
>> used in a pure sw datapath.
> 
> Cool.
> 
>>
>>>
>>>>
>>>>
>>>>> (hw_offload=true) + CT test entirely in sw, it looks like:
>>>>>
>>>>> - 39,81% tcf_classify
>>>>>       - fl_classify
>>>>>          - 37,09% tcf_action_exec
>>>>>             + 33,18% tcf_mirred_act
>>>>>             - 2,69% tcf_ct_act
>>>>>                - 2,39% tcf_ct_flow_table_lookup
>>>>>                   - 1,67% queue_work_on
>>>>>                      - 1,52% __queue_work
>>>>>                           1,20% try_to_wake_up
>>>>>             + 0,80% tcf_pedit_act
>>>>>          + 2,28% fl_mask_lookup
>>>>>
>>>>> The patch here aborts the add operation if the entry is already present
>>>>> in hw. With this patch, then:
>>>>>
>>>>> - 43,94% tcf_classify
>>>>>       - fl_classify
>>>>>          - 39,64% tcf_action_exec
>>>>>             + 38,00% tcf_mirred_act
>>>>>             - 1,04% tcf_ct_act
>>>>>                  0,63% tcf_ct_flow_table_lookup
>>>>>          + 3,19% fl_mask_lookup
>>>>>
>>>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>>> ---
>>>>>     net/netfilter/nf_flow_table_offload.c | 3 +++
>>>>>     1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
>>>>> index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
>>>>> --- a/net/netfilter/nf_flow_table_offload.c
>>>>> +++ b/net/netfilter/nf_flow_table_offload.c
>>>>> @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
>>>>>     {
>>>>>     	struct flow_offload_work *offload;
>>>>> +	if (test_bit(NF_FLOW_HW, &flow->flags))
>>>>> +		return;
>>>>> +
>>>>
>>>> This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
>>>> on the first flow offload attempt.
>>>
>>> Oh oh.. I was quite sure it was getting cleared when the entry was
>>> removed from HW, but not.
>>>
>>> So instead of the if() above, what about:
>>> +       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))
>>
>> I think this will set the IPS_HW_OFFLOAD_BIT prematurely.
>> Currently this bit is set only when the the flow has been successfully
>> offloaded.
> 
> Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry.
> 
>>
>>>
>>> AFAICT it will keep trying while the entry is not present in the flow
>>> table, and stop while it is there. Once the entry is aged from HW, it
>>> is also removed from the flow table, so this part should be okay.
>>> But if the offload failed for some reason like you said above, and the
>>> entry is left on the flow table, it won't retry until it ages out from
>>> the flow table.
>>
>> But then we will never have the chance to re-install it in hardware while
>> the connection is still active.
> 
> Unless it goes idle, yes.
> 
>>
>>>
>>> If you expect that this situation can be easily triggered, we may need
>>> to add a rate limit instead then. Even for these connections that
>>> failed to offload, this "busy retrying" is expensive and may backfire
>>> in such situation.
>>
>> Perhaps we can refresh only if the flow_block callbacks list is not empty.
> 
> It may not be empty even for sw datapath. If you have packets coming
> from the wire towards a veth/virtio, for example. It will likely
> having a matching act_ct with the same zone number on both directions.
> And/or if a zone is shared, as the the flow table then is also shared.

Hmm, I was thinking that you are targeting the use case of deployments 
with no ct offload supporting hardware.

CT action is usually proceeded by a goto action. So, a filter with ct 
and goto action list will offload even if the last chain forwards to a 
veth/virtio device.

> 
>>
>>
>>>
>>>>
>>>>>     	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
>>>>>     	if (!offload)
>>>>>     		return;

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries
  2022-06-28 15:34         ` Oz Shlomo
@ 2022-06-28 15:44           ` Marcelo Ricardo Leitner
  0 siblings, 0 replies; 7+ messages in thread
From: Marcelo Ricardo Leitner @ 2022-06-28 15:44 UTC (permalink / raw)
  To: Oz Shlomo; +Cc: netfilter-devel, Pablo Neira Ayuso, Paul Blakey

On Tue, Jun 28, 2022 at 06:34:26PM +0300, Oz Shlomo wrote:
> 
> 
> On 6/28/2022 6:25 PM, Marcelo Ricardo Leitner wrote:
> > On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote:
> > > Hi Marcelo,
> > > 
> > > 
> > > On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote:
> > > > Hi Oz,
> > > > 
> > > > On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
> > > > > Hi Marcelo,
> > > > > 
> > > > > On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
> > > > > > Currently, whenever act_ct tries to match a packet against the flow
> > > > > > table, it will also try to refresh the offload. That is, at the end
> > > > > > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
> > > > > > 
> > > > > > The problem is that flow_offload_refresh() will try to offload entries
> > > > > > that are actually already offloaded, leading to expensive and useless
> > > > > > work. Before this patch, with a simple iperf3 test on OVS + TC
> > > > > 
> > > > > Packets of offloaded connections are expected to process in hardware.
> > > > > As such, it is not expected to receive packets in software from offloaded
> > > > > connections.
> > > > > 
> > > > > However, hardware offload may fail due to various reasons (e.g. size limits,
> > > > > insertion rate throttling etc.).
> > > > > The "refresh" mechanism is the enabler for offload retries.
> > > > 
> > > > Thanks for the quick review.
> > > > 
> > > > Right. I don't mean to break this mechanism. Act_ct can also be used
> > > > in semi/pure sw datapath, and then the premise of packets being
> > > > expected to be handled in hw is not valid anymore. I can provide a
> > > > more detailed use case if you need.
> > > 
> > > It is clear that the refresh design introduces some overhead when act_ct is
> > > used in a pure sw datapath.
> > 
> > Cool.
> > 
> > > 
> > > > 
> > > > > 
> > > > > 
> > > > > > (hw_offload=true) + CT test entirely in sw, it looks like:
> > > > > > 
> > > > > > - 39,81% tcf_classify
> > > > > >       - fl_classify
> > > > > >          - 37,09% tcf_action_exec
> > > > > >             + 33,18% tcf_mirred_act
> > > > > >             - 2,69% tcf_ct_act
> > > > > >                - 2,39% tcf_ct_flow_table_lookup
> > > > > >                   - 1,67% queue_work_on
> > > > > >                      - 1,52% __queue_work
> > > > > >                           1,20% try_to_wake_up
> > > > > >             + 0,80% tcf_pedit_act
> > > > > >          + 2,28% fl_mask_lookup
> > > > > > 
> > > > > > The patch here aborts the add operation if the entry is already present
> > > > > > in hw. With this patch, then:
> > > > > > 
> > > > > > - 43,94% tcf_classify
> > > > > >       - fl_classify
> > > > > >          - 39,64% tcf_action_exec
> > > > > >             + 38,00% tcf_mirred_act
> > > > > >             - 1,04% tcf_ct_act
> > > > > >                  0,63% tcf_ct_flow_table_lookup
> > > > > >          + 3,19% fl_mask_lookup
> > > > > > 
> > > > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > > > > > ---
> > > > > >     net/netfilter/nf_flow_table_offload.c | 3 +++
> > > > > >     1 file changed, 3 insertions(+)
> > > > > > 
> > > > > > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> > > > > > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
> > > > > > --- a/net/netfilter/nf_flow_table_offload.c
> > > > > > +++ b/net/netfilter/nf_flow_table_offload.c
> > > > > > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
> > > > > >     {
> > > > > >     	struct flow_offload_work *offload;
> > > > > > +	if (test_bit(NF_FLOW_HW, &flow->flags))
> > > > > > +		return;
> > > > > > +
> > > > > 
> > > > > This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
> > > > > on the first flow offload attempt.
> > > > 
> > > > Oh oh.. I was quite sure it was getting cleared when the entry was
> > > > removed from HW, but not.
> > > > 
> > > > So instead of the if() above, what about:
> > > > +       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))
> > > 
> > > I think this will set the IPS_HW_OFFLOAD_BIT prematurely.
> > > Currently this bit is set only when the the flow has been successfully
> > > offloaded.
> > 
> > Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry.
> > 
> > > 
> > > > 
> > > > AFAICT it will keep trying while the entry is not present in the flow
> > > > table, and stop while it is there. Once the entry is aged from HW, it
> > > > is also removed from the flow table, so this part should be okay.
> > > > But if the offload failed for some reason like you said above, and the
> > > > entry is left on the flow table, it won't retry until it ages out from
> > > > the flow table.
> > > 
> > > But then we will never have the chance to re-install it in hardware while
> > > the connection is still active.
> > 
> > Unless it goes idle, yes.
> > 
> > > 
> > > > 
> > > > If you expect that this situation can be easily triggered, we may need
> > > > to add a rate limit instead then. Even for these connections that
> > > > failed to offload, this "busy retrying" is expensive and may backfire
> > > > in such situation.
> > > 
> > > Perhaps we can refresh only if the flow_block callbacks list is not empty.
> > 
> > It may not be empty even for sw datapath. If you have packets coming
> > from the wire towards a veth/virtio, for example. It will likely
> > having a matching act_ct with the same zone number on both directions.
> > And/or if a zone is shared, as the the flow table then is also shared.
> 
> Hmm, I was thinking that you are targeting the use case of deployments with
> no ct offload supporting hardware.

Ahh.

> 
> CT action is usually proceeded by a goto action. So, a filter with ct and
> goto action list will offload even if the last chain forwards to a
> veth/virtio device.

Right, but there's the outbound traffic. Traffic coming from a
container that is using veth, and going out to the wire.
And also traffic from one container to another in the same host, that
can also trigger this, if at least one of them is not annotated to use
the offloading.
These will be using ingress tc filters on veth interfaces, which won't
offload.

> 
> > 
> > > 
> > > 
> > > > 
> > > > > 
> > > > > >     	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
> > > > > >     	if (!offload)
> > > > > >     		return;

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-06-28 15:44 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-27 14:38 [PATCH nf-next] flow_table: do not try to add already offloaded entries Marcelo Ricardo Leitner
2022-06-27 15:19 ` Oz Shlomo
2022-06-27 17:06   ` Marcelo Ricardo Leitner
2022-06-28 13:13     ` Oz Shlomo
2022-06-28 15:25       ` Marcelo Ricardo Leitner
2022-06-28 15:34         ` Oz Shlomo
2022-06-28 15:44           ` Marcelo Ricardo Leitner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.