Netdev Archive on lore.kernel.org
 help / Atom feed
* TC stats / hw offload question
@ 2019-02-05 19:41 Edward Cree
  2019-02-06  2:20 ` Jamal Hadi Salim
  0 siblings, 1 reply; 9+ messages in thread
From: Edward Cree @ 2019-02-05 19:41 UTC (permalink / raw)
  To: netdev; +Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang

Regarding TC_CLSFLOWER_STATS, when a filter rule modifies the length of
 the packet, e.g. by adding a VLAN or encap header, should the bytes
 counter count the length of the packet _before_ edits (i.e. as seen by
 the match), or the length after edits?  If the latter, what is the
 correct behaviour when (say) a packet is mirrored as-is but also
 redirected with encapsulation?
The fact that the stats live in the struct tc_action suggests a per-
 action connection that would imply post-edit length, but then in
 tcf_exts_dump_stats() we only look at the first action, which seems to
 imply we really want the pre-edit length.
I can't find any kind of doc or spec defining what behaviour is required.

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-05 19:41 TC stats / hw offload question Edward Cree
@ 2019-02-06  2:20 ` Jamal Hadi Salim
  2019-02-08 10:26   ` Edward Cree
  0 siblings, 1 reply; 9+ messages in thread
From: Jamal Hadi Salim @ 2019-02-06  2:20 UTC (permalink / raw)
  To: Edward Cree, netdev; +Cc: Jiri Pirko, Cong Wang

On 2019-02-05 2:41 p.m., Edward Cree wrote:
> Regarding TC_CLSFLOWER_STATS, when a filter rule modifies the length of
>   the packet, e.g. by adding a VLAN or encap header, should the bytes
>   counter count the length of the packet _before_ edits (i.e. as seen by
>   the match), or the length after edits?  

Classifier stats - when they exist - should be on
"classified" packets. i.e before edits.
Edits are typically done by actions. And even then we dont
have counters which indicate the post edit resizing.

> If the latter, what is the
>   correct behaviour when (say) a packet is mirrored as-is but also
>   redirected with encapsulation?

It is that ambiguity that make it hard to maintain a single
counter for post-edit size. packet and byte counts are on
"entry".

> The fact that the stats live in the struct tc_action suggests a per-
>   action connection that would imply post-edit length,

The classifiers dont mod the packets. The actions do. And they
maintain stats on the size on "entry" i.e pre-edit.

> but then in
>   tcf_exts_dump_stats() we only look at the first action, which seems to
>   imply we really want the pre-edit length.
> I can't find any kind of doc or spec defining what behaviour is required.
> 

Each action keeps its own counters. If you did something like:

tc match using flower blah \
action vlan push tag ... \
action redirect to egress of eth0

And you submited a packet of size x bytes,
then the "match" would record x bytes.
the "vlan action" would record x bytes.
the "redirect" would record size x+vlaninfo bytes
the egress of eth0 would  recorr x+vlaninfo bytes

cheers,
jamal

> -Ed
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-06  2:20 ` Jamal Hadi Salim
@ 2019-02-08 10:26   ` Edward Cree
  2019-02-09 17:39     ` Jamal Hadi Salim
  0 siblings, 1 reply; 9+ messages in thread
From: Edward Cree @ 2019-02-08 10:26 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev; +Cc: Jiri Pirko, Cong Wang

On 06/02/19 02:20, Jamal Hadi Salim wrote:
> The classifiers dont mod the packets. The actions do. And they
> maintain stats on the size on "entry" i.e pre-edit. 
Thank you for clearing that up.

> Each action keeps its own counters. If you did something like:
>
> tc match using flower blah \
> action vlan push tag ... \
> action redirect to egress of eth0
>
> And you submited a packet of size x bytes,
> then the "match" would record x bytes.
Sorry, where would it record that?  I can't find any stats counters on
 the "match" either in the software path or the offload API.

> the "vlan action" would record x bytes.
> the "redirect" would record size x+vlaninfo bytes
> the egress of eth0 would  recorr x+vlaninfo bytes
Am I right in thinking that offloaded counters don't do that?  As far
 as I can tell, the drivers with flower offload all use
 tcf_exts_stats_update() which takes a single 'bytes' count and adds
 it to all the actions.  (Presumably this is pre-edit length.)

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-08 10:26   ` Edward Cree
@ 2019-02-09 17:39     ` Jamal Hadi Salim
  2019-02-11 11:44       ` Edward Cree
  0 siblings, 1 reply; 9+ messages in thread
From: Jamal Hadi Salim @ 2019-02-09 17:39 UTC (permalink / raw)
  To: Edward Cree, netdev
  Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek,
	PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski

On 2019-02-08 5:26 a.m., Edward Cree wrote:
> On 06/02/19 02:20, Jamal Hadi Salim wrote:

>> tc match using flower blah \
>> action vlan push tag ... \
>> action redirect to egress of eth0
>>
>> And you submited a packet of size x bytes,
>> then the "match" would record x bytes.
 >
> Sorry, where would it record that?

Sorry - that was hypothetical on my part. We dont
record the bytes in the classifier. We do (on some
classifiers) record hit counts, so we can see how many
times a lookup on a specific filter happened and how
many times that lookup resulted in a match.
See u32 (or a few others Cong has been updating
recently).

>  I can't find any stats counters on
>   the "match" either in the software path or the offload API.

Hasnt been necessary thus far.
Is your end goal to match and count?
Most hardware has stats/counters as a separate table.
Assuming yours does as well, then if all you want was
to just match and accept, then you would add a filter
with an accept/ok. i.e something along the lines of:

tc match using flower blah \
action ok index 12345

The "action index" field can be used to map to a counter
table index in hardware. Note if you dont explicitly
specify the index, the kernel (and in this case h/w)
issues one for you.

>> the "vlan action" would record x bytes.
>> the "redirect" would record size x+vlaninfo bytes
>> the egress of eth0 would  recorr x+vlaninfo bytes
> Am I right in thinking that offloaded counters don't do that?  As far
>   as I can tell, the drivers with flower offload all use
>   tcf_exts_stats_update() which takes a single 'bytes' count and adds
>   it to all the actions.  (Presumably this is pre-edit length.)

On the pre/post edit, perhaps some of the hardware
owners could chime in +Ccing a few.

To the tcf_exts_stats_update():
Note, most people interested in h/w offload will choose to skip
sw (explicitly defined when you add  a rule).
The update synchronizes the hardware stats/counters
to the kernel - so when you dump the policies in such a setup
you are seeing what is reflected in hardware.

cheers,
jamal

PS: I have to say just keeping one counter is at times
insufficient. Example, the policer records how many bytes
were seen, not how many bytes were allowed through.
For tunnels, it is easy to compute post-edit size because
you can easily compute later the added/removed bytes.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-09 17:39     ` Jamal Hadi Salim
@ 2019-02-11 11:44       ` Edward Cree
  2019-02-14 12:39         ` Jamal Hadi Salim
  0 siblings, 1 reply; 9+ messages in thread
From: Edward Cree @ 2019-02-11 11:44 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek,
	PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski

On 09/02/19 17:39, Jamal Hadi Salim wrote:
> On 2019-02-08 5:26 a.m., Edward Cree wrote:
>>   I can't find any stats counters on
>>   the "match" either in the software path or the offload API.
>
> Hasnt been necessary thus far.
> Is your end goal to match and count?
My end goal is to implement TC offload in some hw we're designing
 here at Solarflare.  So I'm trying to determine what hardware is
 expected/required to do.
It might be possible to design our new hw so that we can attach a
 counter to every action, if that's what TC wants.  But since the
 other vendors don't seem to do that, I wondered if there was a
 reason, or if perhaps the counter resources (and PCI bw to read
 them) could be saved if all those separate counters aren't really
 needed.  Right now the design we are considering would only count
 packets as-matched, i.e. before any edits.  That's fine for encap
 — you can calculate the bytes correction in SW — but not for decap
 since in principle the length of the RXed outer headers could
 vary (e.g. you might have IP options there).

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-11 11:44       ` Edward Cree
@ 2019-02-14 12:39         ` Jamal Hadi Salim
  2019-02-14 15:17           ` Andy Gospodarek
  2019-02-18 18:56           ` Edward Cree
  0 siblings, 2 replies; 9+ messages in thread
From: Jamal Hadi Salim @ 2019-02-14 12:39 UTC (permalink / raw)
  To: Edward Cree, netdev
  Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek,
	PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski


On 2019-02-11 6:44 a.m., Edward Cree wrote:

>> Hasnt been necessary thus far.
>> Is your end goal to match and count?
> My end goal is to implement TC offload in some hw we're designing
>   here at Solarflare.  So I'm trying to determine what hardware is
>   expected/required to do.
> It might be possible to design our new hw so that we can attach a
>   counter to every action, if that's what TC wants. 

It makes sense to have a counter on every action - even if it is
for debugging purposes. The two most basic actions are "drop" or
"accept". In TC speak the default action is "classid x:y" which
typically is to select a queue or give the flow some identity
(one should be able to use the same action on h/w ingress as well
to select a rx DMA ring for example, but that seems uncommon).

Note, your counters should also be shareable; example, count all
the drops in one counter across multiple flows as in the following
case where counter index 1 is used.

tc flower match foo action drop index 1
tc flower match bar action drop index 1


>   But since the
>   other vendors don't seem to do that, I wondered if there was a
>   reason, or if perhaps the counter resources (and PCI bw to read
>   them) could be saved if all those separate counters aren't really
>   needed.  

Probably nobody has paid attention or asked as you did.
Will let the h/w folks speak for themselves. My understanding
based on experience is counters are cheap. Most modern NICs
and ASICs have a gazillion of them at their disposal.

> Right now the design we are considering would only count
>   packets as-matched, i.e. before any edits.  That's fine for encap
>   — you can calculate the bytes correction in SW — but not for decap
>   since in principle the length of the RXed outer headers could
>   vary (e.g. you might have IP options there).
>

ok, so not much in terms of other types of actions.
But for abstraction sake maybe use "flowid x:y" and have counters
associated with that. Or even make this optional and only attach
a counter if someone says "action ok" and allow them to specify
the counter index (assuming you architecture has an indexed table
of counters).

cheers,
jamal


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-14 12:39         ` Jamal Hadi Salim
@ 2019-02-14 15:17           ` Andy Gospodarek
  2019-02-18 18:56           ` Edward Cree
  1 sibling, 0 replies; 9+ messages in thread
From: Andy Gospodarek @ 2019-02-14 15:17 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Edward Cree, netdev, Jiri Pirko, Cong Wang, Or Gerlitz,
	PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski

On Thu, Feb 14, 2019 at 07:39:28AM -0500, Jamal Hadi Salim wrote:
> 
> On 2019-02-11 6:44 a.m., Edward Cree wrote:
> 
> >   But since the
> >   other vendors don't seem to do that, I wondered if there was a
> >   reason, or if perhaps the counter resources (and PCI bw to read
> >   them) could be saved if all those separate counters aren't really
> >   needed.
> 
> Probably nobody has paid attention or asked as you did.

That's not totally true, but I'm glad to hear that others are
considering it.

> Will let the h/w folks speak for themselves. My understanding
> based on experience is counters are cheap. Most modern NICs
> and ASICs have a gazillion of them at their disposal.

Counters can be cheap to implement (though that does not always mean
that everyone chooses to add many of them to hardware), but the real
cost to them, as Edward points out, is the cost of accessing them an
keeping them up to date.  If we are concerned about keeping track of
flow counters on thousands (or more) flows the cost on the PCI bus can
be quite high.

We have been looking at a few options to deal with tracking this massive
number of counts, but are open to hearing what others feel they would
like to see happen to this.  Though stats sometimes seem boring to some
developers, they are critical and we have been considering whether or
not we need to think about different driver or global infra to handle
it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-14 12:39         ` Jamal Hadi Salim
  2019-02-14 15:17           ` Andy Gospodarek
@ 2019-02-18 18:56           ` Edward Cree
  2019-02-18 19:37             ` Edward Cree
  1 sibling, 1 reply; 9+ messages in thread
From: Edward Cree @ 2019-02-18 18:56 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek,
	PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski

On 14/02/19 12:39, Jamal Hadi Salim wrote:
> On 2019-02-11 6:44 a.m., Edward Cree wrote:
>> My end goal is to implement TC offload in some hw we're designing
>>   here at Solarflare.  So I'm trying to determine what hardware is
>>   expected/required to do.
>> It might be possible to design our new hw so that we can attach a
>>   counter to every action, if that's what TC wants. 
>
> It makes sense to have a counter on every action - even if it is
> for debugging purposes. The two most basic actions are "drop" or
> "accept". In TC speak the default action is "classid x:y" which
> typically is to select a queue or give the flow some identity
Perhaps I was insufficiently clear; we're only looking at the switching
 side of things (e.g. OVS offload) right now; we don't yet have a plan
 for 'delivery' filters (I imagine we'll probably initially port over
 our ethtool ntuple filter handling from ef10, though we may end up
 going down the TC route there).
So I think at the moment 'classid' isn't relevant (?)

> Note, your counters should also be shareable; example, count all
> the drops in one counter across multiple flows as in the following
> case where counter index 1 is used.
>
> tc flower match foo action drop index 1
> tc flower match bar action drop index 1
[...]
> allow them to specify
> the counter index (assuming you architecture has an indexed table
> of counters).
Our architecture allocates objects (including counters) and returns
 opaque handles to them, so we'd need a software table to connect
 counter index to FW counter ID.
Also, sharing counters in hw causes extra work for the driver code
 that keeps track of which encap actions are getting hit (so it can
 keep the neighbour entries alive).  Maybe summing the shared
 counters in sw is easier than that, I'm not sure (or maybe encap
 action counters should just be kept separate from the counters we
 report to TC).

TBH I'm coming to the conclusion that what we should do for the first
 version of our driver is just to create a counter per rule and report
 it against the first action (only), and for now ignore the index (or
 maybe require it to be set to some distinguished value, like 0, to
 mean 'allocate', so that as a future extension we can support
 shareable counters).

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: TC stats / hw offload question
  2019-02-18 18:56           ` Edward Cree
@ 2019-02-18 19:37             ` Edward Cree
  0 siblings, 0 replies; 9+ messages in thread
From: Edward Cree @ 2019-02-18 19:37 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev
  Cc: Jiri Pirko, Cong Wang, Or Gerlitz, Andy Gospodarek,
	PJ Waskiewicz, Anjali Singhai Jain, Jakub Kicinski

On 18/02/19 18:56, Edward Cree wrote:
> (or
>  maybe require it to be set to some distinguished value, like 0, to
>  mean 'allocate', so that as a future extension we can support
>  shareable counters).
Turns out 0 isn't an option, because TC already internally swallows
 that and allocates its own index (same as if 'index' wasn't specified
 on cmdline at all), unless we can change TC to pass that 0 on to the
 driver.

-Ed

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, back to index

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-05 19:41 TC stats / hw offload question Edward Cree
2019-02-06  2:20 ` Jamal Hadi Salim
2019-02-08 10:26   ` Edward Cree
2019-02-09 17:39     ` Jamal Hadi Salim
2019-02-11 11:44       ` Edward Cree
2019-02-14 12:39         ` Jamal Hadi Salim
2019-02-14 15:17           ` Andy Gospodarek
2019-02-18 18:56           ` Edward Cree
2019-02-18 19:37             ` Edward Cree

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox