* Yet another approach for implementing connection tracking offload
@ 2019-02-18 19:00 Yossi Kuperman
  2019-02-22 22:23 ` Marcelo Leitner
From: Yossi Kuperman @ 2019-02-18 19:00 UTC (permalink / raw)
  To: Guy Shattah, Aaron Conole, John Hurley, Simon Horman,
	Justin Pettit, Gregory Rose, Eelco Chaudron, Flavio Leitner,
	Florian Westphal, Jiri Pirko, Rashid Khan, Sushil Kulkarni,
	Andy Gospodarek, Roi Dayan, Yossi Kuperman, Or Gerlitz,
	Rony Efraim, davem, Marcelo Leitner, Paul Blakey
  Cc: netdev

Hello All,

Following is a description of yet another possible approach to implement connection tracking offload. We would like to hear your opinion. There is the “native” way of implementing such an offload by mirroring the software tables to hardware. This way seems straightforward and simple, but real life is much more complicated than that. Alternatively, we can merge the data-path flows (separated by recirc_id) and offload a single flow to hardware.  
 
The general idea is quite simple. When the OVS daemon configures TC with a filter that recirculates, the driver merely pretends to offload it and returns success. Upon packet arrival (in software) we let it traverse TC as usual, except that now we notify the driver on each successful match. By doing this, the driver has all the necessary information to merge the participating flows, including the connection tracking 5-tuple, into one equivalent flow. We do such a merge and offload only if the connection is established. Note: the same mechanism used to communicate the 5-tuple to the driver can be used to notify it of a filter match.
 
It is the driver's responsibility to build and maintain the list of filters a (specific) packet hit along the TC walk. Once we reach the last filter (a terminating one, e.g., forward), the driver posts work on a dedicated work-queue. In this work-queue context, we merge the participating filters and create a new filter that is logically equivalent (match + actions). The merge itself is not as complicated as it might seem: TC does all the heavy lifting, and this is not a random list of filters. At this point, we configure the hardware with one filter: either we have a match and the packet is handled by the hardware, or we don't and the packet goes to software unmodified.
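To make the merge step concrete, here is a minimal userspace sketch (all structures and names are made up for illustration; this is not actual TC or driver code). It assumes the connection 5-tuple subsumes the per-stage matches and simply concatenates the per-stage actions while dropping the recirculation itself:

#include <string.h>

#define MAX_ACTIONS 8

struct five_tuple {
	unsigned int   src_ip, dst_ip;
	unsigned short src_port, dst_port;
	unsigned char  proto;
};

struct filter {
	struct five_tuple match;      /* simplified per-stage match (assumed subsumed by the 5-tuple here) */
	int actions[MAX_ACTIONS];     /* opaque action ids */
	int n_actions;
	int recirculates;             /* non-terminating stage: last action recirculates */
};

/*
 * Merge the ordered list of filters one packet hit into a single filter
 * that is logically equivalent: the match becomes the connection 5-tuple,
 * and the actions are the per-stage actions concatenated, minus the
 * recirculation action itself.
 */
static void merge_chain(const struct filter *chain, int n,
			const struct five_tuple *ct_tuple,
			struct filter *out)
{
	memset(out, 0, sizeof(*out));
	out->match = *ct_tuple;

	for (int i = 0; i < n; i++) {
		for (int j = 0; j < chain[i].n_actions; j++) {
			if (chain[i].recirculates && j == chain[i].n_actions - 1)
				continue;   /* drop the recirculate action */
			if (out->n_actions < MAX_ACTIONS)
				out->actions[out->n_actions++] = chain[i].actions[j];
		}
	}
}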
 
Going along this path we must tackle two things: 1) counters and 2) TC filter deletion. 1) We must maintain TC counters as the user expects. Each merged filter holds a list of the filters it is derived from (its parents). Once a counter update is available for a merged filter, the driver must update the corresponding parents appropriately. 2) Upon TC filter deletion, it is mandatory to remove all the derived (merged) filters from the hardware as a consequence.
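For illustration, a rough sketch of that bookkeeping (again, hypothetical structures rather than real driver code): counter deltas reported for a merged filter are pushed to every parent, and deleting a TC filter tears down every merged filter derived from it:

#include <stdint.h>

#define MAX_PARENTS 8
#define MAX_MERGED  64

struct tc_filter {
	uint64_t packets, bytes;                /* counters TC reports to the user */
};

struct merged_filter {
	struct tc_filter *parents[MAX_PARENTS]; /* filters this one was derived from */
	int n_parents;
	int in_hw;                              /* currently installed in hardware */
};

static struct merged_filter merged[MAX_MERGED];

/* Hardware reported a counter delta for a merged filter: each parent the
 * packet would have hit in software gets the same delta. */
static void merged_stats_update(struct merged_filter *m,
				uint64_t dpkts, uint64_t dbytes)
{
	for (int i = 0; i < m->n_parents; i++) {
		m->parents[i]->packets += dpkts;
		m->parents[i]->bytes   += dbytes;
	}
}

/* A TC filter was deleted: every merged filter derived from it must be
 * removed from the hardware as well. */
static void tc_filter_deleted(const struct tc_filter *f)
{
	for (int i = 0; i < MAX_MERGED; i++) {
		for (int j = 0; j < merged[i].n_parents; j++) {
			if (merged[i].parents[j] == f) {
				merged[i].in_hw = 0;  /* stands in for the actual hw removal */
				break;
			}
		}
	}
}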
 
 
Pros & Cons
 
Pros: 1) Circumvents the complexity involved in continuing in software where the hardware left off. 2) Simplifies the hardware pipeline to a single filter, which might improve the overall performance.
 
Cons: 1) Only applicable to OVS-oriented filters; will not support priorities and overlapping filters. 2) The merge logic might consume CPU cycles, which could impact the rate at which filters can be offloaded. However, this overhead is believed to be negligible if implemented carefully. 3) Requires TC/flower to notify the driver on each filter match (that is the only change needed above the driver).
 
 
Both approaches share the same software model; most of the code above the driver is shared. This approach can be considered temporary until the hardware matures.
 
What do you think about this approach?
 
If something is not clear please let me know and I will do my best to clarify.
 
Cheers,
Kuperman



* Re: Yet another approach for implementing connection tracking offload
  2019-02-18 19:00 Yet another approach for implementing connection tracking offload Yossi Kuperman
@ 2019-02-22 22:23 ` Marcelo Leitner
  2019-02-26 13:35   ` Yossi Kuperman
From: Marcelo Leitner @ 2019-02-22 22:23 UTC (permalink / raw)
  To: Yossi Kuperman
  Cc: Guy Shattah, Aaron Conole, John Hurley, Simon Horman,
	Justin Pettit, Gregory Rose, Eelco Chaudron, Flavio Leitner,
	Florian Westphal, Jiri Pirko, Rashid Khan, Sushil Kulkarni,
	Andy Gospodarek, Roi Dayan, Or Gerlitz, Rony Efraim, davem,
	Paul Blakey, netdev

On Mon, Feb 18, 2019 at 07:00:19PM +0000, Yossi Kuperman wrote:
> Hello All,
> 
> Following is a description of yet another possible approach to
> implement connection tracking offload. We would like to hear your
> opinion. There is the “native” way of implementing such an offload
> by mirroring the software tables to hardware. This way seems
> straightforward and simple, but real life is much more complicated
> than that. Alternatively, we can merge the data-path flows
> (separated by recirc_id) and offload a single flow to hardware.  
>  
> The general idea is quite simple. When OVS-daemon configures TC with
> a filter that recirculates, the driver merely pretends to offload it
> and returns success. Upon packet arrival (in software) we let it

This has the potential to be a support nightmare: things that should be
and seem to be offloaded, but actually are not and are not expected to be.
IOW, not_in_hw should still be there somehow.

> traverse TC as usual, except that now we notify the driver on each
> successful match. By doing this, the driver has all the necessary
> information to merge the participating flows---including connection
> tracking 5-tuple---into one equivalent flow. We do such a merge and
> offload only if the connection is established. Note: the same
> mechanism to communicate a 5-tuple to the driver can be used to
> notify on a filter match.
>  
> It is the driver's responsibility to build and maintain the list of
> filters a (specific) packet hit along the TC walk. Once we reach the

I'm assuming this could be shared code amongst the drivers, like
DIM. We really don't want different algorithms for this.
With that, we would be using the driver as just temporary storage
for the matches/actions. Maybe we can do that with skb extensions?

Like, turn on a flight recorder extension and have one special last
tc action (or even embed it into core tc) process it when the
traversal finishes, and only then call the driver to update whatever
is needed.
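Roughly what I have in mind, as a userspace toy only (none of these names exist anywhere today):

#include <stdio.h>

#define REC_MAX 16

/* In the kernel this could live in an skb extension; here it is just a
 * struct carried alongside a toy packet. */
struct flight_recorder {
	unsigned long cookies[REC_MAX];   /* cookies of the filters matched, in order */
	int n;
};

struct packet {
	struct flight_recorder rec;
};

/* Each filter that matches appends its cookie. */
static void record_match(struct packet *pkt, unsigned long cookie)
{
	if (pkt->rec.n < REC_MAX)
		pkt->rec.cookies[pkt->rec.n++] = cookie;
}

/* Run once when traversal ends (the "special last action"): only here is
 * the driver told about the whole chain, so the merge logic can stay in
 * shared code above the drivers. */
static void flight_recorder_flush(struct packet *pkt)
{
	printf("packet matched %d filters:", pkt->rec.n);
	for (int i = 0; i < pkt->rec.n; i++)
		printf(" %#lx", pkt->rec.cookies[i]);
	printf("\n");
}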

> last filter (a terminating one, e.g., forward) the driver posts
> work on a dedicated work-queue. In this work-queue context, we merge
> the participating filters and create a new filter that is logically
> equal (match + actions). The merge itself is not as complicated as it
> might seem; TC does all the heavy lifting, this is not a random list
> of filters. At this point, we configure the hardware with one
> filter, either we have a match and the packet is handled by the
> hardware, or we don’t and the packet goes to software unmodified.
>  
> Going along this path we must tackle two things: 1) counters and 2)
> TC filter deletion. 1) We must maintain TC counters as the user
> expects. Each merged filter holds a list of filters it is derived
> from, parents. Once an update is available for a merged filter
> counter, the driver must update the corresponding parents
> appropriately. 2) Upon TC filter deletion it is mandatory to remove
> all the derived (merged) filters from the hardware as a consequence.

I'm failing to see how this would address two features of CT:

- window validation. How would the card know that it should perform
  window validation on these packets?

- flow stats. If it offloads the result of a merge, how can it
  retrieve the specific stats for each 5-tuple? Even with the lists
  you mentioned, once the 5-tuples are aggregated, we can't separate
  them anymore.

>  
>  
> Pros & Cons
>  
> Pros: 1) Circumvent the complexity involved with continuation in
> software where the hardware left off. 2) Simplifies the hardware
> pipeline with only one filter and might improve the overall
> performance.
>  
> Cons: 1) Only applicable to OVS-oriented filters, will not support
> priorities and overlapping filters. 2) Merger logic might consume
> CPU cycles which might impact the rate of filters we can offload.
> However, this overhead is believed to be negligible, if implemented
> carefully. 3) Requires TC/flower to notify the driver on each filter
> match (that is the only change needed above the driver).

On a more general note, I have the feeling that this would be best
done in OvS:
- it has 2 flow "translations": one from OpenFlow to the OvS/tc sw
  datapath, and then OvS/tc sw to hw. If the resulting flows are
  better for the hw, aren't they better for the sw datapath too?

- if this is for OvS-oriented filters, why can't/shouldn't OvS do this
  processing instead?


Somehow I keep thinking this is replacing conntrack (as in the
conntrack subsystem) with ipset offloading.

All in all, any approach is probably fine but I'm struggling to see a
path forward from this to actual CT offloading. Maybe I missed
something.

Cheers,
Marcelo

>  
>  
> Both approaches share the same software model, most of the code
> above the driver is shared. This approach can be considered
> temporary until the hardware matures.
>  
> What do you think about this approach?
>  
> If something is not clear please let me know and I will do my best
> to clarify.
>  
> Cheers, Kuperman
> 


* Re: Yet another approach for implementing connection tracking offload
  2019-02-22 22:23 ` Marcelo Leitner
@ 2019-02-26 13:35   ` Yossi Kuperman
From: Yossi Kuperman @ 2019-02-26 13:35 UTC (permalink / raw)
  To: Marcelo Leitner
  Cc: Guy Shattah, Aaron Conole, John Hurley, Simon Horman,
	Justin Pettit, Gregory Rose, Eelco Chaudron, Flavio Leitner,
	Florian Westphal, Jiri Pirko, Rashid Khan, Sushil Kulkarni,
	Andy Gospodarek, Roi Dayan, Or Gerlitz, Rony Efraim, davem,
	Paul Blakey, netdev, Oz Shlomo


On 23/02/2019 0:23, Marcelo Leitner wrote:
> On Mon, Feb 18, 2019 at 07:00:19PM +0000, Yossi Kuperman wrote:
>> Hello All,
>>
>> Following is a description of yet another possible approach to
>> implement connection tracking offload. We would like to hear your
>> opinion. There is the “native” way of implementing such an offload
>> by mirroring the software tables to hardware. This way seems
>> straightforward and simple, but real life is much more complicated
>> than that. Alternatively, we can merge the data-path flows
>> (separated by recirc_id) and offload a single flow to hardware.
>>   
>> The general idea is quite simple. When OVS-daemon configures TC with
>> a filter that recirculates, the driver merely pretends to offload it
>> and returns success. Upon packet arrival (in software) we let it
> This has the potential to be a support nightmare: things that should be
> and seem to be offloaded, but actually are not and are not expected to be.
> IOW, not_in_hw should still be there somehow.

Agree, we should handle this properly.

Please note we have a similar behavior with MT. From the TC perspective it
may seem that everything is offloaded (in_hw), but we will encounter a
miss for every new connection.


>> traverse TC as usual, except that now we notify the driver on each
>> successful match. By doing this, the driver has all the necessary
>> information to merge the participating flows---including connection
>> tracking 5-tuple---into one equivalent flow. We do such a merge and
>> offload only if the connection is established. Note: the same
>> mechanism to communicate a 5-tuple to the driver can be used to
>> notify on a filter match.
>>   
>> It is the driver's responsibility to build and maintain the list of
>> filters a (specific) packet hit along the TC walk. Once we reach the
> I'm assuming this could be shared code amongst the drivers, like
> DIM. We really don't want different algorithms for this.
> With that, we would be using the driver as just temporary storage
> for the matches/actions. Maybe we can do that with skb extensions?
>
> Like, turn on a flight recorder extension and have one special last
> tc action (or even embed it into core tc) process it when the
> traversal finishes, and only then call the driver to update whatever
> is needed.
>
It is a possibility.


>> last filter (a terminating one, e.g., forward) the driver posts
>> work on a dedicated work-queue. In this work-queue context, we merge
>> the participating filters and create a new filter that is logically
>> equal (match + actions). The merge itself is not as complicated as it
>> might seem; TC does all the heavy lifting, this is not a random list
>> of filters. At this point, we configure the hardware with one
>> filter, either we have a match and the packet is handled by the
>> hardware, or we don’t and the packet goes to software unmodified.
>>   
>> Going along this path we must tackle two things: 1) counters and 2)
>> TC filter deletion. 1) We must maintain TC counters as the user
>> expects. Each merged filter holds a list of filters it is derived
>> from, parents. Once an update is available for a merged filter
>> counter, the driver must update the corresponding parents
>> appropriately. 2) Upon TC filter deletion it is mandatory to remove
>> all the derived (merged) filters from the hardware as a consequence.
> I'm failing to see how this would address two features of CT:
>
> - window validation. How would the card know that it should perform
>    window validation on these packets?

The driver should allocate a context and map it appropriately, in a similar
way to what one would do for the MT approach. It is possible that one context
is shared between two or more merged flows. The context holds the TCP state
in hardware.
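
As a rough illustration (hypothetical names, not actual driver code), the shape would be something like:

#include <stdint.h>

/* One TCP-state context per connection; the hardware tracks the window here. */
struct ct_context {
	uint32_t snd_nxt, rcv_nxt;   /* simplified stand-ins for the tracked window */
	uint16_t max_win;
	int refcnt;                  /* how many merged flows point at this context */
};

/* A merged flow references the context of the connection it was built from;
 * both directions (and any other merged flows of the same connection) share it. */
struct merged_flow {
	struct ct_context *ct;
	/* match + actions omitted */
};

static void merged_flow_attach_ct(struct merged_flow *m, struct ct_context *ctx)
{
	m->ct = ctx;
	ctx->refcnt++;               /* context is released only when the last flow goes */
}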

> - flow stats. If it offloads the result of a merge, how can it
>    retrieve the specific stats for each 5-tuple? Even with the lists
>    you mentioned, once the 5-tuples are aggregated, we can't separate
>    them anymore.

Each merged flow maintains a list of flows it is derived from, including
any 5-tuple involved. We still have an entry for each 5-tuple in the driver,
and we keep it up-to-date based on statistics from related merged flows.
Upper layers can query the driver for the relevant information given a
5-tuple.
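
Something along these lines, again just a sketch with made-up names:

#include <stdint.h>

#define MAX_CONNS 256

struct tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
	uint8_t  proto;
};

struct conn_entry {
	struct tuple key;
	uint64_t packets, bytes;     /* aggregated over every merged flow of this connection */
	int used;
};

static struct conn_entry conns[MAX_CONNS];

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->proto == b->proto;
}

/* Update path: a merged flow derived from this connection reported a delta. */
static void conn_stats_add(const struct tuple *key, uint64_t dpkts, uint64_t dbytes)
{
	for (int i = 0; i < MAX_CONNS; i++) {
		if (conns[i].used && tuple_eq(&conns[i].key, key)) {
			conns[i].packets += dpkts;
			conns[i].bytes   += dbytes;
			return;
		}
	}
}

/* Query path: upper layers ask for the stats of a given 5-tuple. */
static int conn_stats_get(const struct tuple *key, uint64_t *pkts, uint64_t *bytes)
{
	for (int i = 0; i < MAX_CONNS; i++) {
		if (conns[i].used && tuple_eq(&conns[i].key, key)) {
			*pkts  = conns[i].packets;
			*bytes = conns[i].bytes;
			return 0;
		}
	}
	return -1;   /* unknown connection */
}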


Am I missing something?

>>   
>>   
>> Pros & Cons
>>   
>> Pros: 1) Circumvent the complexity involved with continuation in
>> software where the hardware left off. 2) Simplifies the hardware
>> pipeline with only one filter and might improve the overall
>> performance.
>>   
>> Cons: 1) Only applicable to OVS-oriented filters, will not support
>> priorities and overlapping filters. 2) Merger logic might consume
>> CPU cycles which might impact the rate of filters we can offload.
>> However, this overhead is believed to be negligible, if implemented
>> carefully. 3) Requires TC/flower to notify the driver on each filter
>> match (that is the only change needed above the driver).
> On a more general note, I have the feeling that this would be best
> done in OvS:
> - it has 2 flow "translations": one from OpenFlow to OvS/tc sw
>    datapath, and then OvS/tc sw to hw. If the resulting flows are
>    better for the hw, aren't they better for the sw datapath too?

A valid point. I'm not sure by how much it will improve the slow-path
performance, but it will surely complicate the code. Doing this "trick"
definitely improves the hardware performance.

> - if this is for OvS-oriented filters, why can't/shouldn't OvS do this
>    processing instead?
>
It is something we have considered before, as it makes sense. However,
going this way means a costly "travel" to OVS user-space for every new
connection (different 5-tuple).

> Somehow I keep thinking this is replacing conntrack (as in the
> conntrack subsystem) with ipset offloading.
>
> All in all, any approach is probably fine but I'm struggling to see a
> path forward from this to actual CT offloading. Maybe I missed
> something.
> Cheers,
> Marcelo
>
>>   
>>   
>> Both approaches share the same software model, most of the code
>> above the driver is shared. This approach can be considered
>> temporary until the hardware matures.
>>   
>> What do you think about this approach?
>>   
>> If something is not clear please let me know and I will do my best
>> to clarify.
>>   
>> Cheers, Kuperman
>>

