* Centralizing support for TCAM?
       [not found] <57d4a2db-ca3b-909a-073a-52ecceb428f2@gmail.com>
@ 2016-09-02 17:18 ` Florian Fainelli
  2016-09-02 18:49   ` John Fastabend
  0 siblings, 1 reply; 8+ messages in thread

From: Florian Fainelli @ 2016-09-02 17:18 UTC (permalink / raw)
  To: netdev
  Cc: jiri, idosh, john.fastabend, ast, davem, jhs, ecree, andrew,
      vivien.didelot, john.fastabend

Hi all,

(apologies for the long CC list and the fact that I can't type email
addresses correctly)

While working on adding support for the Broadcom Ethernet switches'
Compact Field Processor (which is essentially a TCAM plus
action/policer/rate-meter RAMs, 256 entries), I started working with
the ethtool::rxnfc API, which is actually kind of nice in that it fits
my simple use case of being able to insert rules at a given or
driver-selected location, and it has a pretty good flow representation
for common things you may match: TCP/UDP v4/v6 (not so much for non-IP
or L2 stuff, though you can use the flow extension representation). It
lacks support for more complex actions other than redirecting to a
particular port/queue.

Now ethtool::rxnfc is one possible user, but tc and netfilter also
are, and they are more powerful and extensible. Since this is a
resource-constrained piece of hardware, it would suck for people to
have to implement all three APIs if we could instead come up with a
central one that satisfies the superset offered by tc + netfilter. We
can surely imagine a use case where we centralize the whole matching +
action into a Domain Specific Language that we compile into eBPF and
then translate into whatever the HW understands, although that raises
the question of where we put the translation tool: user space or
kernel space.

So what's everybody's take on this?

Thanks!
--
Florian

^ permalink raw reply	[flat|nested] 8+ messages in thread
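For context, inserting a rule through the ethtool::rxnfc (n-tuple) interface looks roughly like this; the device name, queue number, and rule location are examples only, and whether a given rule is accepted depends entirely on the driver:

```shell
# Steer TCP/IPv4 packets with destination port 80 to RX queue 2,
# placing the rule at location 1 ("loc 1" may be omitted to let the
# driver pick a slot).
ethtool -N eth0 flow-type tcp4 dst-port 80 action 2 loc 1

# Show the installed classification rules, then delete rule 1 again.
ethtool -n eth0
ethtool -N eth0 delete 1
```

As the mail notes, the match coverage is decent for TCP/UDP over v4/v6, but the action vocabulary is limited to queue redirection (plus drop, expressed as `action -1`).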
* Re: Centralizing support for TCAM?
  2016-09-02 17:18 ` Centralizing support for TCAM? Florian Fainelli
@ 2016-09-02 18:49   ` John Fastabend
  2016-09-02 19:02     ` Andrew Lunn
  2016-09-03  7:09     ` Jiri Pirko
  0 siblings, 2 replies; 8+ messages in thread

From: John Fastabend @ 2016-09-02 18:49 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: jiri, idosh, john.fastabend, ast, davem, jhs, ecree, andrew,
      vivien.didelot

On 16-09-02 10:18 AM, Florian Fainelli wrote:
> Hi all,

Hi Florian,

> (apologies for the long CC list and the fact that I can't type email
> addresses correctly)

My favorite topic ;)

> While working on adding support for the Broadcom Ethernet switches'
> Compact Field Processor (which is essentially a TCAM plus
> action/policer/rate-meter RAMs, 256 entries), I started working with
> the ethtool::rxnfc API, which is actually kind of nice in that it fits
> my simple use case of being able to insert rules at a given or
> driver-selected location, and it has a pretty good flow representation
> for common things you may match: TCP/UDP v4/v6 (not so much for non-IP
> or L2 stuff, though you can use the flow extension representation). It
> lacks support for more complex actions other than redirecting to a
> particular port/queue.

When I was doing this for one of the products I work on, I decided
that extending ethtool was likely not a good approach and that
building a netlink interface would be a better choice. My reasons were
mainly that extending ethtool is a bit painful with respect to keeping
structure compatibility across versions, and I also had use cases that
wanted notifications; both are made easier by using netlink. However,
my netlink port + extensions were not accepted, were called a "kernel
bypass", and the general opinion was that they were not going to be
accepted upstream. Hence the 'tc' effort.
> Now ethtool::rxnfc is one possible user, but tc and netfilter also
> are, and they are more powerful and extensible. Since this is a
> resource-constrained piece of hardware, it would suck for people to
> have to implement all three APIs if we could instead come up with a
> central one that satisfies the superset offered by tc + netfilter.

My opinion is that tc and netfilter are sufficiently different that
building a common layer is challenging, and actually more complex than
just implementing two interfaces. Always happy to review code though.

There is also an already established packet flow through tc,
netfilter, fdb, and l3 in Linux that folks want to maintain. At the
moment I just don't see the need for a common layer, IMO.

Also, adding another layer of abstraction, so that we end up doing
multiple translations into and out of these layers, adds overhead.
Eventually I need to get reasonable operations per second on the TCAM
tables. Reasonable for me is somewhere in the 50k to 100k
add/del/update commands per second. I'm hesitant to create more
abstractions than are actually needed.

> We can surely imagine a use case where we centralize the whole
> matching + action into a Domain Specific Language that we compile
> into eBPF and then translate into whatever the HW understands,
> although that raises the question of where we put the translation
> tool: user space or kernel space.

The eBPF to HW translation I started to look at, but gave up. The
issue was that the program space of eBPF is much larger than any
traditional parser/table hardware implementation can support, so most
programs get rejected (obvious observation, right?). I'm more inclined
to build hardware that can support eBPF than to restrict eBPF to fit
into a parser/table model.

Surely something like P4 (DSL) -> eBPF -> HW can constrain the eBPF
programs so they can be loaded without issues. This might be
worthwhile, but mapping it onto 'tc' classifiers like cls_{u32|flower}
is a bit more straightforward.
> So what's everybody's take on this?

Seems like a good time to bring up my other issue. When I have a
pipeline with multiple TCAM tables, I was trying to figure out how to
abstract that in Linux. Something like the following:

  TCAM -> exact match -> TCAM -> exact match

So for now I was thinking of lifting two netdevs into Linux, something
like ethx-frontend and ethx-backend, where rules added to the frontend
go into the front part of the pipeline and rules added to the backend
go into the second half of the pipeline.

It probably needs more thought.

> Thanks!

Not sure that helps, but my suggestion is to see if the
cls_u32/cls_flower implementation that exists today solves at least
the TCAM entry problem. Note that the "order" field in u32 allows you
to place rules in a user-specified order.

.John
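A sketch of the cls_u32 approach suggested here, with the "order" parameter picking the rule position; the interface name is hypothetical, the exact u32 syntax varies a little between iproute2 versions, and whether the rule lands in a hardware TCAM depends on the driver:

```shell
# Attach an ingress qdisc to hang filters off.
tc qdisc add dev eth0 ingress

# Drop TCP traffic to port 80; "order 10" sets the node id within the
# hash table, giving a user-chosen position among same-priority rules.
tc filter add dev eth0 parent ffff: protocol ip prio 1 \
    u32 order 10 \
    match ip protocol 6 0xff \
    match ip dport 80 0xffff \
    action drop
```

The `prio` value orders filters relative to each other, while `order` places the entry within a u32 hash table, so together they give fairly fine-grained control over rule placement.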
* Re: Centralizing support for TCAM?
  2016-09-02 18:49 ` John Fastabend
@ 2016-09-02 19:02   ` Andrew Lunn
  0 siblings, 0 replies; 8+ messages in thread

From: Andrew Lunn @ 2016-09-02 19:02 UTC (permalink / raw)
  To: John Fastabend
  Cc: Florian Fainelli, netdev, jiri, idosh, john.fastabend, ast,
      davem, jhs, ecree, vivien.didelot

> I need to get reasonable operations per second on the TCAM tables.
> Reasonable for me is somewhere in the 50k to 100k add/del/update
> commands per second. I'm hesitant to create more abstractions than
> are actually needed.

Hi John

That is an interesting requirement. Could you explain why? The Marvell
TCAM is accessed using MDIO. Maybe it can do 50 add/del/updates per
second.

For a WiFi access point, or cable modem, even 50 per second seems
ample; they get set at boot and never changed.

        Andrew
* Re: Centralizing support for TCAM?
  2016-09-02 18:49 ` John Fastabend
  2016-09-02 19:02   ` Andrew Lunn
@ 2016-09-03  7:09   ` Jiri Pirko
  2016-09-06  3:44     ` Alexei Starovoitov
  1 sibling, 1 reply; 8+ messages in thread

From: Jiri Pirko @ 2016-09-03 7:09 UTC (permalink / raw)
  To: John Fastabend
  Cc: Florian Fainelli, netdev, jiri, idosh, john.fastabend, ast,
      davem, jhs, ecree, andrew, vivien.didelot

Fri, Sep 02, 2016 at 08:49:34PM CEST, john.fastabend@gmail.com wrote:
>On 16-09-02 10:18 AM, Florian Fainelli wrote:
>> Hi all,
>
>Hi Florian,
>
>> (apologies for the long CC list and the fact that I can't type email
>> addresses correctly)
>
>My favorite topic ;)
>
>> While working on adding support for the Broadcom Ethernet switches'
>> Compact Field Processor (which is essentially a TCAM plus
>> action/policer/rate-meter RAMs, 256 entries), I started working with
>> the ethtool::rxnfc API, which is actually kind of nice in that it
>> fits my simple use case of being able to insert rules at a given or
>> driver-selected location, and it has a pretty good flow
>> representation for common things you may match: TCP/UDP v4/v6 (not
>> so much for non-IP or L2 stuff, though you can use the flow
>> extension representation). It lacks support for more complex actions
>> other than redirecting to a particular port/queue.
>
>When I was doing this for one of the products I work on, I decided
>that extending ethtool was likely not a good approach and that
>building a netlink interface would be a better choice. My reasons were
>mainly that extending ethtool is a bit painful with respect to keeping
>structure compatibility across versions, and I also had use cases that
>wanted notifications; both are made easier by using netlink. However,
>my netlink port + extensions were not accepted, were called a "kernel
>bypass", and the general opinion was that they were not going to be
>accepted upstream. Hence the 'tc' effort.

Ethtool should die peacefully. Don't poke at it in the process...
>> Now ethtool::rxnfc is one possible user, but tc and netfilter also
>> are, and they are more powerful and extensible. Since this is a
>> resource-constrained piece of hardware, it would suck for people to
>> have to implement all three APIs if we could instead come up with a
>> central one that satisfies the superset offered by tc + netfilter.
>
>My opinion is that tc and netfilter are sufficiently different that
>building a common layer is challenging, and actually more complex than
>just implementing two interfaces. Always happy to review code though.

In February, Pablo did some work on finding a common intermediate
layer for the classifier-action subsystem. It was rejected with the
argument of unnecessary overhead. Makes sense to me. After that, you
introduced u32 tc offload. Since then, a couple more tc classifiers
and actions have been offloaded.

I believe that for Florian's use case, TC is a great fit. You can just
use cls_flower with a couple of actions.

My colleagues are working hard on enabling cls_flower offload. You can
easily benefit from that. In mlxsw we also plan to use it for
offloading our TCAM ACLs.

>There is also an already established packet flow through tc,
>netfilter, fdb, and l3 in Linux that folks want to maintain. At the
>moment I just don't see the need for a common layer, IMO.
>
>Also, adding another layer of abstraction, so that we end up doing
>multiple translations into and out of these layers, adds overhead.
>Eventually I need to get reasonable operations per second on the TCAM
>tables. Reasonable for me is somewhere in the 50k to 100k
>add/del/update commands per second. I'm hesitant to create more
>abstractions than are actually needed.
>
>> We can surely imagine a use case where we centralize the whole
>> matching + action into a Domain Specific Language that we compile
>> into eBPF and then translate into whatever the HW understands,
>> although that raises the question of where we put the translation
>> tool: user space or kernel space.
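A sketch of the cls_flower route recommended here; the interface name is hypothetical, and the `skip_sw` flag, which requests a hardware-only rule, is only honored by drivers that actually implement flower offload:

```shell
# Attach an ingress qdisc to hang filters off.
tc qdisc add dev eth0 ingress

# Match TCP to destination port 80 and drop it; "skip_sw" asks for a
# hardware-only rule, so the command fails if the driver cannot
# offload it (omit the flag to fall back to software matching).
tc filter add dev eth0 parent ffff: protocol ip prio 1 \
    flower ip_proto tcp dst_port 80 skip_sw \
    action drop
```

Compared to the ethtool::rxnfc interface, the same match is expressed here with the full tc action vocabulary (drop, mirred redirect, police, etc.) available behind it.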
>The eBPF to HW translation I started to look at, but gave up. The
>issue was that the program space of eBPF is much larger than any
>traditional parser/table hardware implementation can support, so most
>programs get rejected (obvious observation, right?). I'm more inclined
>to build hardware that can support eBPF than to restrict eBPF to fit
>into a parser/table model.

+1
I have been thinking a lot about this, and I believe that parsing a
bpf program in drivers into some pre-defined tables is quite complex.
I think that bpf is just very unsuitable to offload if you don't have
hw which can directly interpret it.
I know that Alexei disagrees :)

>Surely something like P4 (DSL) -> eBPF -> HW can constrain the eBPF
>programs so they can be loaded without issues. This might be
>worthwhile, but mapping it onto 'tc' classifiers like cls_{u32|flower}
>is a bit more straightforward.
>
>> So what's everybody's take on this?
>
>Seems like a good time to bring up my other issue. When I have a
>pipeline with multiple TCAM tables, I was trying to figure out how to
>abstract that in Linux. Something like the following:
>
>  TCAM -> exact match -> TCAM -> exact match
>
>So for now I was thinking of lifting two netdevs into Linux, something
>like ethx-frontend and ethx-backend, where rules added to the frontend
>go into the front part of the pipeline and rules added to the backend
>go into the second half of the pipeline.
>
>It probably needs more thought.
>
>> Thanks!
>
>Not sure that helps, but my suggestion is to see if the
>cls_u32/cls_flower implementation that exists today solves at least
>the TCAM entry problem. Note that the "order" field in u32 allows you
>to place rules in a user-specified order.
>
>.John
* Re: Centralizing support for TCAM?
  2016-09-03  7:09 ` Jiri Pirko
@ 2016-09-06  3:44   ` Alexei Starovoitov
  2016-09-06 11:17     ` Jamal Hadi Salim
  0 siblings, 1 reply; 8+ messages in thread

From: Alexei Starovoitov @ 2016-09-06 3:44 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: John Fastabend, Florian Fainelli, netdev, jiri, idosh,
      john.fastabend, ast, davem, jhs, ecree, andrew, vivien.didelot

On Sat, Sep 03, 2016 at 09:09:50AM +0200, Jiri Pirko wrote:
> Fri, Sep 02, 2016 at 08:49:34PM CEST, john.fastabend@gmail.com wrote:
> >On 16-09-02 10:18 AM, Florian Fainelli wrote:
> >> Hi all,
> >
> >Hi Florian,
> >
> >> (apologies for the long CC list and the fact that I can't type
> >> email addresses correctly)
> >
> >My favorite topic ;)
> >
> >> While working on adding support for the Broadcom Ethernet
> >> switches' Compact Field Processor (which is essentially a TCAM
> >> plus action/policer/rate-meter RAMs, 256 entries), I started
> >> working with the ethtool::rxnfc API, which is actually kind of
> >> nice in that it fits my simple use case of being able to insert
> >> rules at a given or driver-selected location, and it has a pretty
> >> good flow representation for common things you may match: TCP/UDP
> >> v4/v6 (not so much for non-IP or L2 stuff, though you can use the
> >> flow extension representation). It lacks support for more complex
> >> actions other than redirecting to a particular port/queue.
> >
> >When I was doing this for one of the products I work on, I decided
> >that extending ethtool was likely not a good approach and that
> >building a netlink interface would be a better choice. My reasons
> >were mainly that extending ethtool is a bit painful with respect to
> >keeping structure compatibility across versions, and I also had use
> >cases that wanted notifications; both are made easier by using
> >netlink. However, my netlink port + extensions were not accepted,
> >were called a "kernel bypass", and the general opinion was that they
> >were not going to be accepted upstream.
> >Hence the 'tc' effort.
>
> Ethtool should die peacefully. Don't poke at it in the process...
>
> >> Now ethtool::rxnfc is one possible user, but tc and netfilter also
> >> are, and they are more powerful and extensible. Since this is a
> >> resource-constrained piece of hardware, it would suck for people
> >> to have to implement all three APIs if we could instead come up
> >> with a central one that satisfies the superset offered by tc +
> >> netfilter.
> >
> >My opinion is that tc and netfilter are sufficiently different that
> >building a common layer is challenging, and actually more complex
> >than just implementing two interfaces. Always happy to review code
> >though.
>
> In February, Pablo did some work on finding a common intermediate
> layer for the classifier-action subsystem. It was rejected with the
> argument of unnecessary overhead. Makes sense to me. After that, you
> introduced u32 tc offload. Since then, a couple more tc classifiers
> and actions have been offloaded.
>
> I believe that for Florian's use case, TC is a great fit. You can
> just use cls_flower with a couple of actions.
>
> My colleagues are working hard on enabling cls_flower offload. You
> can easily benefit from that. In mlxsw we also plan to use it for
> offloading our TCAM ACLs.
>
> >There is also an already established packet flow through tc,
> >netfilter, fdb, and l3 in Linux that folks want to maintain. At the
> >moment I just don't see the need for a common layer, IMO.
> >
> >Also, adding another layer of abstraction, so that we end up doing
> >multiple translations into and out of these layers, adds overhead.
> >Eventually I need to get reasonable operations per second on the
> >TCAM tables. Reasonable for me is somewhere in the 50k to 100k
> >add/del/update commands per second. I'm hesitant to create more
> >abstractions than are actually needed.
> >> We can surely imagine a use case where we centralize the whole
> >> matching + action into a Domain Specific Language that we compile
> >> into eBPF and then translate into whatever the HW understands,
> >> although that raises the question of where we put the translation
> >> tool: user space or kernel space.
> >
> >The eBPF to HW translation I started to look at, but gave up. The
> >issue was that the program space of eBPF is much larger than any
> >traditional parser/table hardware implementation can support, so
> >most programs get rejected (obvious observation, right?). I'm more
> >inclined to build hardware that can support eBPF than to restrict
> >eBPF to fit into a parser/table model.
>
> +1
> I have been thinking a lot about this, and I believe that parsing a
> bpf program in drivers into some pre-defined tables is quite complex.
> I think that bpf is just very unsuitable to offload if you don't have
> hw which can directly interpret it.
> I know that Alexei disagrees :)

lol :)
Compiling bpf into a fixed-pipeline asic is definitely not easy.
The problem with adding new cls classifiers and actions to match what
configurable hw does isn't pretty either. The fixed pipeline isn't
interesting beyond l2/l3, and flow-based hw features are mostly
useless in the ToR. I'm not against adding new classifiers, since
that's better than an sdk, but we won't be using such tc features
either.

Since this thread is about tcams... my 0.02 here is that they're
pretty bad in the nic (host) due to power consumption, and in the ToR
a tcam is only good as part of an algorithmic lpm solution. There it
won't even be seen as a tcam. Instead the fancy algorithms will use
exact match + tcam + aux data to pack as many routes into such an
'algorithmic lpm' as possible, so I cannot see what a tcam used as an
actual tcam can be good for.
* Re: Centralizing support for TCAM?
  2016-09-06  3:44 ` Alexei Starovoitov
@ 2016-09-06 11:17   ` Jamal Hadi Salim
  2016-09-06 13:31     ` Andrew Lunn
  0 siblings, 1 reply; 8+ messages in thread

From: Jamal Hadi Salim @ 2016-09-06 11:17 UTC (permalink / raw)
  To: Alexei Starovoitov, Jiri Pirko
  Cc: John Fastabend, Florian Fainelli, netdev, jiri, idosh,
      john.fastabend, ast, davem, ecree, andrew, vivien.didelot

On 16-09-05 11:44 PM, Alexei Starovoitov wrote:
> lol :)
> Compiling bpf into a fixed-pipeline asic is definitely not easy.
> The problem with adding new cls classifiers and actions to match what
> configurable hw does isn't pretty either. The fixed pipeline isn't
> interesting beyond l2/l3, and flow-based hw features are mostly
> useless in the ToR.

The openflow cargo cult grew around those ACLs (before the pragmatism
of table sizes, and the realization that there is more to networking
than some silly ACLs or defining everything as a table). But that
doesn't make things outside of L2/3 useless. A few players like Google
are using those ASIC ACLs quite effectively.

> I'm not against adding new classifiers, since that's better than an
> sdk, but we won't be using such tc features either.

We are seeing use of those ACLs on ToRs and spines (with tc). Yes,
tiny table spaces are a problem, but new hardware allows for
expansion, and people who use those tables are factoring in these
issues. You are not going to beat the performance numbers these things
offer. There is a lifespan of maybe 3-4 years where you are not going
to beat those numbers with s/ware without spending more $ and space.

> Since this thread is about tcams... my 0.02 here is that they're
> pretty bad in the nic (host) due to power consumption, and in the ToR
> a tcam is only good as part of an algorithmic lpm solution. There it
> won't even be seen as a tcam. Instead the fancy algorithms will use
> exact match + tcam + aux data to pack as many routes into such an
> 'algorithmic lpm' as possible, so I cannot see what a tcam used as an
> actual tcam can be good for.
Agreed on tcams.

cheers,
jamal
* Re: Centralizing support for TCAM?
  2016-09-06 11:17 ` Jamal Hadi Salim
@ 2016-09-06 13:31   ` Andrew Lunn
  2016-09-06 13:36     ` Jamal Hadi Salim
  0 siblings, 1 reply; 8+ messages in thread

From: Andrew Lunn @ 2016-09-06 13:31 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Alexei Starovoitov, Jiri Pirko, John Fastabend, Florian
      Fainelli, netdev, jiri, idosh, john.fastabend, ast, davem,
      ecree, vivien.didelot

> > Since this thread is about tcams... my 0.02 here is that they're
> > pretty bad in the nic (host) due to power consumption, and in the
> > ToR a tcam is only good as part of an algorithmic lpm solution.
> > There it won't even be seen as a tcam. Instead the fancy algorithms
> > will use exact match + tcam + aux data to pack as many routes into
> > such an 'algorithmic lpm' as possible, so I cannot see what a tcam
> > used as an actual tcam can be good for.
>
> Agreed on tcams.

So if I'm reading this right, you are talking about big switches, top
of racks, etc., and you don't see much use for the TCAM.

Florian and I are interested in the other end of the scale: little
5-10 port switches in SoHo, STB, WiFi access points, etc. At the
moment, firewalling in such devices is done by the CPU. If we can
offload some of the firewall rules to the TCAM, we would be happy.

        Andrew
* Re: Centralizing support for TCAM?
  2016-09-06 13:31 ` Andrew Lunn
@ 2016-09-06 13:36   ` Jamal Hadi Salim
  0 siblings, 0 replies; 8+ messages in thread

From: Jamal Hadi Salim @ 2016-09-06 13:36 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Alexei Starovoitov, Jiri Pirko, John Fastabend, Florian
      Fainelli, netdev, jiri, idosh, john.fastabend, ast, davem,
      ecree, vivien.didelot

On 16-09-06 09:31 AM, Andrew Lunn wrote:
> So if I'm reading this right, you are talking about big switches, top
> of racks, etc., and you don't see much use for the TCAM.
>
> Florian and I are interested in the other end of the scale: little
> 5-10 port switches in SoHo, STB, WiFi access points, etc. At the
> moment, firewalling in such devices is done by the CPU. If we can
> offload some of the firewall rules to the TCAM, we would be happy.

As with all discussions on netdev, that was a distraction ;->
Use tc, as mentioned by Jiri. Please, please, no ethtool.

cheers,
jamal
end of thread, other threads: [~2016-09-06 13:36 UTC | newest]

Thread overview: 8+ messages
      [not found] <57d4a2db-ca3b-909a-073a-52ecceb428f2@gmail.com>
2016-09-02 17:18 ` Centralizing support for TCAM? Florian Fainelli
2016-09-02 18:49   ` John Fastabend
2016-09-02 19:02     ` Andrew Lunn
2016-09-03  7:09     ` Jiri Pirko
2016-09-06  3:44       ` Alexei Starovoitov
2016-09-06 11:17         ` Jamal Hadi Salim
2016-09-06 13:31           ` Andrew Lunn
2016-09-06 13:36             ` Jamal Hadi Salim